How I Know Your Data Science/ML Project Will Fail Before You Even Begin

Written with Stephen Pettinato.

Data science is a paradox: it has been called the “sexiest job of the 21st century,” yet it sees project failure rates of 70-85%. And surprisingly, demand for data professionals still far exceeds supply.

This combination of high demand and high failure rates is counterintuitive. Why do businesses keep investing in data resources? Because they have to. According to McKinsey research, the gap between leaders and laggards in data adoption is growing, and only a tiny fraction of the potential value has been unlocked. Frustration builds as businesses invest in data teams and don’t feel they are getting a good return.

Data professionals are frustrated, too. A peculiar trait of data folks is their intense curiosity. We want to know why things work at a deeper level than most. We also want to build products and make recommendations that drive impact. It is soul-crushing to invest deeply in a project only to watch it fall short of its potential.

This post focuses on factors that you, a data professional, can control during a project. We’ve seen hundreds of data projects over the past 10+ years and distilled the patterns that correlate with successful outcomes. And the majority of these patterns exist before you even begin!

The first part of this post lists the top ten issues that lead to failure before a project begins. The second part provides a general template that you can use.

Failure Drivers

Issues that lead to a 90%+ failure rate:

You can’t answer the question “Why are we doing this work?” And “because [insert important stakeholder] says so” is not a good answer. A specific ask from a stakeholder is very rarely the right problem to solve.
You can’t explain why this work is meaningful or discuss the opportunity cost of your time. More often than not, a current “good enough” solution is still “good enough,” even if it annoys a team. Remember that saying yes to this work means saying no to other opportunities.
You aren’t clear on how you will measure success. Get specific.
The deliverables are fuzzy, placeholders, or unclear. I see this as a sign that you and your stakeholders are not aligned.
Your stakeholders can’t answer any of the above questions. This situation is another big sign that you and your stakeholders are not aligned. And you can be sure that your leadership is talking to your stakeholders.

Issues that lead to a 70%+ failure rate:

You don’t have semi-frequent milestones. Splitting a project into bite-sized pieces is helpful in so many ways.
Your stakeholder section looks like the company’s “About Us” page. Too many stakeholders create a mess; too few are ineffective. Find what works best for your company culture, and don’t be afraid to prune over time.
You don’t discuss how the project ends. Document the ending so that your team can account for maintenance in project planning. Why does skipping this step lead to failure? It’s incredible how fast you can overburden yourself as an organization if you aren’t clear on priorities and where you spend your time.
You shared a slide deck with me. OK, I’ll admit this is a cheap shot. I’ve successfully delivered large projects that began as a slide deck, and slide culture is so deeply ingrained at some companies that you don’t have a choice. However, you can still write out your plan in detail and use slides to summarize it for a presentation.
Your design document points to few or no other internal resources. I’m skeptical of project pitches that rely little on existing architecture and systems. It’s a warning sign that you are duplicating work or didn’t get enough feedback on your design. If the system is bespoke, specify why.

Here’s a project template that helps address the above issues. This template generalizes well across industries and team compositions because it focuses on the first principles for a successful project. First-principles thinking, sometimes called reasoning from first principles, means breaking down a problem into its fundamental components and building up from there. The approach both makes complex problems more accessible and stimulates creativity.

Project Template

This project template (GitHub link) is a strategic tool designed to guide teams through planning, executing, and successfully completing projects. It emphasizes adaptability, feedback, and continuous improvement as fundamental components of effective project management.

A word of caution—completing this document should not be mistaken for thorough project planning. It's a dynamic tool meant to evolve alongside your project.

Project Lead Tips:

Flexibility: This template is a foundational guide, not a strict rulebook. It's designed to be adjusted and adapted to fit your project's unique needs and challenges.
Valuable Feedback: Not all feedback will be equally useful, but every input is an opportunity to learn. Prioritize insights that align with your project goals, but remain open to unexpected lessons.
Team Collaboration: Project leadership involves guiding the vision and structure of your initiative, but achieving your goals is a team effort. Encourage ownership by delegating sections of this document to team members.
Document Autonomy: This template should serve as a comprehensive resource for anyone involved in or joining the project at any stage, facilitating onboarding, stakeholder meetings, and project reviews without requiring additional explanation.

Leadership Tips:

Engagement: After everyone has read this document, walk through it in detail with your team, without slides, to foster a deeper understanding and alignment on project goals and strategies.
Beyond Checkboxes: Encourage your team to approach this document as a living record of their planning and thought processes, not just a formality to be completed.
Contextual Clarity: Providing clear, upfront context about project expectations and directions can significantly impact your team's motivation and alignment. Misalignments can occur; when they do, own them, clarify, and realign.
Project Visibility: Keep this document and your project's progress visible and accessible. An environment where projects are openly shared lets others learn from, contribute to, and build upon existing work.

High-Level Overview

Definition: This section serves as the executive summary of your project, succinctly capturing its essence, goals, and value proposition in one to two sentences. It's the elevator pitch of your project, setting the stage for the detailed planning that follows by clearly articulating the project's purpose, its significance to stakeholders, and the metrics by which success will be measured.
Nature: Strategic and foundational, providing a concise overview aligning stakeholders with the project's core objectives and anticipated impact.
Example: "Our project aims to implement an advanced recommender system that personalizes content selections, with the goal of boosting user engagement by 30% and increasing retention rates by 20% within the first year."

Problem Statement

Definition: The problem statement identifies and articulates the key issues or opportunities that the project seeks to address. It lays the groundwork for the project by detailing existing challenges, the necessity of the project for solving these issues, and the potential benefits of the project's successful execution. This section is essential for justifying the project's initiation and rallying stakeholder support around a shared understanding of the objectives.
Nature: Analytical and diagnostic, highlighting the discrepancy between the current state and the desired future state, thereby establishing the project's rationale and urgency.
Example: "The existing approach to content delivery employs a one-size-fits-all strategy that fails to engage a diverse user base, leading to decreased user activity and retention. We anticipate significantly enhancing the user experience by implementing a personalized recommender system, thus encouraging longer and more frequent engagement."

Objectives and Outcomes

Definition: This section outlines the project's strategic goals (objectives) and the expected benefits or changes resulting from achieving those goals (outcomes). Objectives should align with the broader aims of the business or organization, while outcomes should provide a clear vision of the project's intended impact, detailing how successful completion will improve current conditions or solve existing problems.
Nature: Forward-looking and goal-oriented, defining specific targets for the project and outlining the anticipated benefits to stakeholders.
Example: "Objective: Develop and deploy a machine learning-based recommender system to personalize user content feeds, thereby enhancing user engagement and satisfaction. Outcome: Achieve a 25% increase in average user session time and a 15% rise in monthly active users within six months of system deployment."
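An outcome like this is only useful if someone can check it after launch. As a minimal sketch, the targets from the example can be written down as explicit thresholds on relative lift over a pre-launch baseline. Everything here is hypothetical: the metric names, the baseline and observed values, and helpers such as `evaluate_outcomes` are illustrations, not part of the template.

```python
from dataclasses import dataclass


@dataclass
class OutcomeTarget:
    """A measurable outcome: which metric to watch and the minimum lift required."""
    metric: str
    min_relative_lift: float  # e.g. 0.25 means "at least +25% over baseline"


def relative_lift(baseline: float, observed: float) -> float:
    """Relative change from the pre-launch baseline, e.g. 12.0 -> 15.0 is 0.25."""
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a relative lift")
    return (observed - baseline) / baseline


def evaluate_outcomes(
    targets: list[OutcomeTarget],
    baseline: dict[str, float],
    observed: dict[str, float],
) -> dict[str, tuple[float, bool]]:
    """Map each target metric to (observed lift, whether the target was met)."""
    results = {}
    for target in targets:
        lift = relative_lift(baseline[target.metric], observed[target.metric])
        results[target.metric] = (lift, lift >= target.min_relative_lift)
    return results


if __name__ == "__main__":
    # Targets mirror the example outcome; the measurements are made up.
    targets = [
        OutcomeTarget("avg_session_minutes", 0.25),
        OutcomeTarget("monthly_active_users", 0.15),
    ]
    baseline = {"avg_session_minutes": 12.0, "monthly_active_users": 200_000}
    observed = {"avg_session_minutes": 15.6, "monthly_active_users": 224_000}
    for metric, (lift, met) in evaluate_outcomes(targets, baseline, observed).items():
        print(f"{metric}: {lift:+.0%} ({'met' if met else 'missed'})")
```

With these made-up numbers, session time clears its bar (+30% against +25%) while monthly active users falls short (+12% against +15%). Writing the threshold down before launch is the point; the code is just a reminder that "success" should be something you can evaluate mechanically.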

Deliverables

Definition: Deliverables refer to the tangible or intangible products, services, or results that the project will produce. This section specifies what the project will deliver upon completion, linking these outputs directly to the project's objectives to illustrate how the deliverables contribute to achieving the desired outcomes.
Nature: Concrete and outcome-focused, clearly defining the end products or results of the project efforts.
Example: "The project will deliver a fully operational, machine learning-based recommender system, integrated seamlessly into the existing content platform. Additionally, comprehensive documentation on the system's architecture and algorithms and a detailed user guide for system administrators will be provided."

Stakeholders, Roles, and Responsibilities

Definition: Identifies all parties with a vested interest in the project, detailing their roles, responsibilities, and expected contributions. This section is crucial for clarifying the project's governance structure, ensuring that all participants understand their roles and expectations.
Nature: Clarifying and organizational, establishing a clear framework for project governance and accountability.
Example: "Project Sponsor: Provides overall direction and resources. Data Science Team: Leads the development of the recommender algorithm. Engineering Team: Responsible for integrating the algorithm into the platform. End Users: Offer feedback and insights to refine the system."

Schedule and Milestones

Definition: Outlines the anticipated timeline for the project, including key milestones, deadlines for each significant phase, and any dependencies that might impact the project schedule. This framework is vital for planning, tracking progress, and ensuring timely project execution.
Nature: Temporal and structured, offering a chronological blueprint for the project's execution.
Example: "Milestone 1: Complete the initial data collection and preprocessing by the end of Q1. Milestone 2: Develop and test the recommender system algorithm by the end of Q2. Final Milestone: Fully integrate the recommender system and launch it to users by the end of Q3."
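Dependencies are where schedules tend to slip, so it can help to make them explicit. The sketch below assumes Python 3.9+ and uses the standard-library `graphlib` module to group a hypothetical milestone dependency map into waves of work that can proceed in parallel; it raises an error if the dependencies are circular. The milestone names, including the invented `api_scaffolding` task, are illustrative only.

```python
from graphlib import TopologicalSorter


def milestone_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group milestones into waves; everything in a wave can proceed in parallel.

    `dependencies` maps each milestone to the milestones that must finish first.
    Raises graphlib.CycleError (a ValueError subclass) if dependencies are circular.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates the graph and detects cycles up front
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable, readable output
        waves.append(ready)
        sorter.done(*ready)  # treat the whole wave as finished before the next one
    return waves


if __name__ == "__main__":
    # Hypothetical breakdown of the example milestones; "api_scaffolding" is an
    # invented engineering task that can run alongside data collection.
    plan = {
        "data_collection": set(),
        "api_scaffolding": set(),
        "preprocessing": {"data_collection"},
        "model_dev_and_test": {"preprocessing"},
        "platform_integration": {"model_dev_and_test", "api_scaffolding"},
        "launch": {"platform_integration"},
    }
    for i, wave in enumerate(milestone_waves(plan), start=1):
        print(f"Wave {i}: {', '.join(wave)}")
```

A long chain of single-item waves, as in this example, is itself a signal: every milestone blocks the next, so any delay pushes the launch date.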

Requirements

Functional Requirements

Definition: Specifies the behaviors, features, and functionalities the project's deliverables must exhibit to satisfy user and stakeholder needs effectively. These requirements focus on what the system is expected to do, from operational functions to user interactions.
Nature: Detailed and user-focused, guiding the development of systems that meet specific user needs and project objectives.
Example: "The recommender system must dynamically process user interactions, tailor content recommendations in real time, and support scalability to accommodate up to 100,000 concurrent users without compromising performance."

Technical Requirements

Definition: Details the technical specifications, standards, and environmental conditions that the project must adhere to. This includes requirements for software and hardware, integration protocols, security standards, and performance criteria, ensuring that the project aligns with technical feasibility and industry standards.
Nature: Precise and technical, ensuring the project's deliverables are technically viable and secure.
Example: "The system will be developed on a scalable cloud computing platform, utilizing data encryption for security, adhering to GDPR for data privacy, and featuring API integration for compatibility with existing content management systems."

Budget and Resources

Definition: Provides a comprehensive financial overview and resource allocation plan for the project. This section breaks down expected costs and lists the material, technological, and human resources required for project success. Accurate budget forecasting and strategic resource allocation here help maximize efficiency and return on investment.
Nature: Quantitative and strategic, ensuring the project is well-supported financially and materially.
Example: "The project is budgeted at $1.2 million, covering data acquisition, development, integration, and deployment phases. Required resources include a team of data scientists for algorithm development, software engineers for system integration, and cloud computing infrastructure for hosting the system."

Communication Plan and Stakeholder Engagement

Definition: Outlines a comprehensive strategy for effective information exchange and stakeholder involvement throughout the project lifecycle. This plan includes mechanisms for regular updates, feedback loops, and collaborative decision-making processes, ensuring that stakeholders remain informed, engaged, and aligned with the project's progress and outcomes.
Nature: Inclusive and communicative, fostering transparency, collaboration, and stakeholder buy-in.
Example: "Communication strategies include weekly progress updates via email, monthly stakeholder engagement meetings to gather input and adjust strategies as needed, and a dedicated project channel on Slack for daily communication and rapid feedback."

Risk Management

Definition: Establishes a systematic approach to identifying, assessing, and mitigating risks that could impact the project's success. This proactive strategy involves analyzing potential risks, determining their likelihood and potential impact, and developing plans to address these risks effectively, ensuring the project's resilience and preparedness for unforeseen challenges.
Nature: Proactive and preventative, aimed at minimizing project vulnerabilities and enhancing adaptability.
Example: "Identified risk: Potential delays in data acquisition due to evolving data privacy regulations. Mitigation strategy includes early engagement with data privacy experts, developing a flexible project timeline to accommodate regulatory review, and exploring alternative data sources as needed."
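Likelihood and impact are often combined on a simple risk matrix, with each rated on a small scale and the score taken as their product. The sketch below assumes a 1-5 scale, an arbitrary cutoff of 12 for "needs a mitigation plan now," and a hypothetical `Risk` record; only the privacy-regulation risk comes from the example above, and the other two entries are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: int  # 1 = rare ... 5 = almost certain (scale is an assumption)
    impact: int      # 1 = negligible ... 5 = severe
    mitigation: str = ""

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 scale.
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        """Common risk-matrix score: likelihood x impact, from 1 to 25."""
        return self.likelihood * self.impact


def prioritize(risks: list[Risk], threshold: int = 12) -> list[Risk]:
    """Return risks scoring at or above `threshold`, highest score first."""
    flagged = [r for r in risks if r.score >= threshold]
    return sorted(flagged, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    register = [
        Risk("Data acquisition delayed by evolving privacy regulations", 3, 4,
             "Engage privacy experts early; build slack into the timeline"),
        # The two risks below are invented for illustration.
        Risk("Poor recommendations for new users (cold start)", 4, 4,
             "Fall back to popularity-based content"),
        Risk("Cloud costs exceed budget", 2, 3, "Set billing alerts"),
    ]
    for risk in prioritize(register):
        print(f"[{risk.score:>2}] {risk.description} -> {risk.mitigation}")
```

The scores are deliberately crude; their value is forcing the team to rate each risk explicitly and to attach a mitigation to the ones that cross the line.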

Close-Out and Maintenance

Definition: Describes the processes and activities involved in formally concluding the project, documenting its successes and lessons learned, and transitioning to ongoing operations or maintenance. This section emphasizes the importance of evaluating project outcomes, ensuring the deliverables' sustainability, and establishing a framework for continuous improvement, including regular reviews and updates to adapt to changing user needs and technological advancements.
Nature: Conclusive and forward-looking, focusing on securing the project's long-term impact and adaptability.
Example: "Upon project completion, a comprehensive post-deployment review will be conducted to evaluate the system's performance and user satisfaction. A dedicated maintenance team will be established to provide ongoing support, monitor system effectiveness, and implement necessary updates and improvements, ensuring the recommender system continues to meet user needs and remains at the forefront of technological advancements."