How I Know Your Data Science/ML Project Will Fail Before You Even Begin


Written with Stephen Pettinato.

Data science is a paradox: it has been called the “sexiest job of the 21st century,” yet it sees 70-85% project failure rates. And surprisingly, the demand for data professionals still far exceeds supply.

This combination of high demand and high failure rates is counterintuitive. Why do businesses keep investing in data resources? Companies keep investing because they have to. According to McKinsey Research, the gap between leaders and laggards in data adoption is growing, and only a tiny fraction of the potential value is unlocked. Frustration builds as businesses invest in data teams and don’t feel they are getting a good return.

Data professionals are frustrated, too. A peculiar trait of data folks is their intense curiosity. We want to know why things work at a deeper level than most. We also want to build products and make recommendations that drive impact. It is soul-crushing when we invest deeply in a project only to watch it not reach its full potential.

This post focuses on factors that you, a data professional, can control during a project. We’ve seen hundreds of data projects over the past 10+ years and distilled the patterns that correlate with successful outcomes. And the majority of these patterns exist before you even begin!

The first part of this post lists the top ten issues that lead to failure before a project begins. The second part provides a general template that you can use.

Failure Drivers

Issues that lead to a 90%+ failure rate:

  • You can’t answer why we are doing this work. And “because [insert important stakeholder] says so” is not a good answer. Very rarely is a specific ask from a stakeholder the right problem to solve.
  • You can’t answer why this work is meaningful or discuss the opportunity cost of your time. More often than not, a current “good enough” solution is still “good enough,” even if it is an annoyance for a team. Remember that a yes on your time is also a no to other opportunities.
  • You aren’t clear how you measure success. Get specific.
  • The deliverables are fuzzy, placeholders, or unclear. I see this as a sign that you and your stakeholders are not aligned.
  • Your stakeholders can’t answer any of the above questions. This situation is another big sign that you and your stakeholders are not aligned. And you can be sure that your leadership is talking to your stakeholders.

Issues that lead to a 70%+ failure rate:

  • You don’t have semi-frequent milestones. Splitting a project into bite-sized pieces is helpful in so many ways.
  • Your stakeholder section looks like the company’s about us page. Too many stakeholders are a mess. Too few stakeholders are ineffective. Find what works best for your company culture, and don’t be afraid to prune over time.
  • You don’t discuss how the project ends. Document the end to ensure that your team can account for maintenance in project planning. Why does this step lead to failure? It’s incredible how fast you can overburden yourself as an organization if you aren’t clear on priorities and where you spend your time.
  • You shared a slide deck with me. OK, I’ll admit this is a cheap shot. I’ve successfully delivered large projects that began as a slide deck. And slide culture is so deeply ingrained in some company cultures that you don’t have a choice. However, you can still write out your plan in detail and use slides to summarize for a presentation.
  • Your design document points to little or no other internal resources. I’m skeptical about pitching a project that relies little on current architecture and systems. It’s a warning sign that you are duplicating work or didn’t get enough feedback on your design. If the system is bespoke, then specify why.

Here’s a project template that helps address the above issues. This template generalizes well across industries and team compositions because it focuses on the first principles for a successful project. First-principles thinking, sometimes called reasoning from first principles, means breaking down a problem into its fundamental components and building up from there. The approach both makes complex problems more accessible and stimulates creativity.

Project Template

This design template (GitHub link) helps you maximize success on your projects. The value of this template comes from working through the process: gathering feedback, exploring solutions, and adjusting as you learn.

A word of caution: don’t confuse a filled-out document with going through the process effectively.

Project lead tips:

  • Be flexible and adjust the template as needed. This template is a starting point and guide. It is not a prescriptive destination.
  • Not all feedback is valuable. You can learn a lot from some discussions. And sometimes, you gain nothing. However, you can never learn less by gathering input.
  • Work as a team on the design. While it’s your job as the project lead to own the overall structure and voice, it’s not your job to write everything yourself. Get your team involved by assigning ownership to sections.
  • This document needs to stand on its own. A nice benefit is this helps with onboarding, meeting new stakeholders, and project reviews.

Leadership tips:

  • Ask your team to walk you through the document once you’ve read through it deeply. And don’t use slides.
  • Avoid setting OKRs and goals that focus on filling out this document. There is a fine line between incentivizing your team to gather feedback and think through solutions vs. your team filling it out to check a box.
  • Give your team as much context upfront as you can. It’s incredibly demotivating for a group to jam on a design for weeks only to hear they went in a wildly different direction than you expected. It’s your fault when this happens, so when it does, own up to it and provide better expectations.
  • Incentivize your team to keep this document updated throughout the project. Also, if your organizational culture allows, encourage your team to make projects discoverable so that others can build on their work.

Problem Statement

This section answers the question, “Why are we doing this work?” Think expansively rather than reductively at first. For example, ask questions like, “In a perfect world where this data exists and you have a great prediction, what would you do?” Very rarely is a specific ask from a stakeholder the right problem to solve. Tips:

  • If you don’t understand why you are doing this work, don’t do it.
  • “Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.” - John Tukey

Business Case and Current State
Why is this project important? How is this problem solved today? Thought starters:

  • What business metric(s) improve by doing this work? How does this project align with larger company goals?
  • What happens if we don’t do this work? More often than not, a current “good enough” solution is still “good enough,” even if it is an annoyance for a team. Remember that a “yes” on your time is also a “no” to other opportunities.

Success Metrics
How do you know the project is successful? Get specific. Thought starters:

  • Will it be a “2% increase of conversion rate” or some meaningful change to a business metric?
  • Is it a production model with a minimum percentage uptime?
  • Can you run an experiment such as an AB test? What other metrics are important?
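To make “get specific” concrete, here is a minimal sketch of how an AB test result could be checked against a stated success metric. It assumes the “2% increase” means a relative lift, uses a standard two-proportion z-test, and picks a 5% significance level; the function names, thresholds, and counts are all hypothetical and should be replaced with whatever you and your stakeholders agree on.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare variant B against control A.

    Returns (relative_lift, two_sided_p_value) using a pooled
    two-proportion z-test.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value for a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return (p_b - p_a) / p_a, p_value


def meets_success_metric(lift, p_value, min_lift=0.02, alpha=0.05):
    """Assumed success rule: lift of at least min_lift, significant at alpha."""
    return lift >= min_lift and p_value < alpha


# Hypothetical counts: 1,000 of 20,000 control users convert vs. 1,100 of 20,000.
lift, p_value = two_proportion_z_test(1000, 20000, 1100, 20000)
print(f"lift={lift:.1%}, p={p_value:.3f}, success={meets_success_metric(lift, p_value)}")
```

Writing the rule down as code before the experiment runs forces the team to agree on the metric, the minimum change, and the evidence bar up front.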

Lessons Learned
What did you learn from the last phase of the same project? Or what did you learn from a previous, similar project?

Requirements
What are the high-level requirements? Work with the stakeholders and come up with ~3-5 bullet points such as:

  • Make a model that predicts XYZ.
  • Model results should be available in the data warehouse and updated periodically.
  • Business owners should understand how to query and use these results.

Deliverables
What are the final deliverables? Work with the stakeholders to come up with deliverables, using the above requirements as examples, such as:

  • A presentation explaining the model outputs, feature importance, exploratory visuals, and recommended next steps.
  • A specific database location for model results with documentation on data refresh frequency.
  • Training for business owners on how to query the results and a data dictionary.

Tips:

  • If you don’t understand what the high-level goals are, then keep iterating. Often the high-level goals are too high-level, and it’s up to us as data professionals to refine and articulate them, which is great because we get to own and guide the project.
  • Do not start a project without at least one explicitly defined and agreed-upon deliverable with a target due date.

Prioritization
Work with your broader organization to evaluate the priority of this project against other projects.

Your organization may say that this project is not worth doing right now or even not worth doing at all. Document this decision here and review it with stakeholders. Priorities can shift, and it’s OK to redesign or reprioritize a project at a later date. What is not relevant to this prioritization decision is the timing of the request. Be wary of pressure to prioritize a project simply because a stakeholder requested it x months ago.

Stakeholders

Who are the project stakeholders? List names and roles. Tips:

  • No development should start until the scope is finalized-ish. And be careful about scope creep during a project. It’s OK if the goal changes and we adjust. What’s not OK is if a significant scope change isn’t discussed and agreed on.
  • Too many stakeholders are a mess. Too few stakeholders are ineffective. Find what works best for your company culture, and don’t be afraid to prune over time.
  • Be flexible. You’ll know when you are ready to start; feedback helps refine the design and approach and allows folks to invest in the results.

Ethics

Does your project cause harm? Thought starters:

  • Can the data product harm users or the organization? If so, how can you reduce these risks?
  • How are you capturing, storing, and retaining user data? Does your system comply with data-related laws such as GDPR and CCPA?
  • Are your model results and analyses biased against specific demographics?
  • How transparent is the decision-making logic in the system?

System Architecture

What’s this going to look like at the end? Be specific—diagrams, a picture or handwritten design, dashboard mocks, etc. Ideally, you’re able to link to pre-existing resources as well.

Inputs and Outputs
What are all the data inputs and outputs? Specify exactly where each one lives: directories, formats, schema.tablename, etc.
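One way to keep this section honest is to record each input and output as structured data and flag any entry that is still vague. The sketch below is only an illustration: the `DataAsset` fields, the example table names, and the idea of treating “TBD” as a placeholder are all assumptions, not part of the template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    name: str
    direction: str    # "input" or "output"
    location: str     # e.g. schema.tablename or a directory path
    data_format: str  # e.g. "warehouse table", "parquet", "csv"


def vague_assets(assets):
    """Return names of assets whose location or format is blank or 'TBD'."""
    placeholders = {"", "tbd"}
    return [
        a.name
        for a in assets
        if a.location.strip().lower() in placeholders
        or a.data_format.strip().lower() in placeholders
    ]


# Hypothetical entries for illustration only.
ASSETS = [
    DataAsset("orders", "input", "analytics.orders", "warehouse table"),
    DataAsset("model_scores", "output", "TBD", "warehouse table"),
]

print(vague_assets(ASSETS))
```

Any name this check returns is a question to resolve with your team before development starts.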

Product Usage
How will this data product be used? What systems and teams will this data product interact with at a detailed level? Be specific. For example:

  • Data flows from this system to X and Y and influences decision Z.
  • Data flows into G and is used for H, K, and J.

Algorithm/Mocks/Report
Rename this header as appropriate for the project. This part is a bit of a grab bag and depends on the problem. Thought starters:

ML models

  • It’s tempting to spend way too much time on this one, so use this documentation to put some guardrails in place.
  • The suggested approach is an iterative one, such as: “Build X and Y, review with the team, and evaluate based on target metric Z. If the target metric is > T%, launch; otherwise, decide next steps.”
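The iterative guardrail above can be written down as an explicit decision rule. This is a rough sketch, not a prescribed process: the function name, the action labels, and the iteration budget (a cap on how many build-and-review rounds you allow before escalating) are assumptions added for illustration.

```python
def review_iteration(metric_value, threshold, iterations_used, iteration_budget):
    """Return the next action after the team reviews one iteration.

    metric_value: observed value of the agreed target metric (Z).
    threshold: the agreed launch bar (T).
    iteration_budget: assumed cap on build-and-review rounds.
    """
    if metric_value > threshold:
        return "launch"
    if iterations_used >= iteration_budget:
        # Budget spent without clearing the bar: stop tuning and
        # take the decision back to stakeholders.
        return "escalate"
    return "iterate"


# Hypothetical example: the bar is 0.80 and the team allows three rounds.
print(review_iteration(0.75, 0.80, iterations_used=1, iteration_budget=3))
print(review_iteration(0.82, 0.80, iterations_used=2, iteration_budget=3))
```

Agreeing on the threshold and the budget in the design document is what keeps model tuning from expanding indefinitely.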

Dashboard or report

  • Mocks are essential to decide if this solution fulfills the requirements. You need to review them with stakeholders.
  • In place of a specific mock, an iterative approach is to build X and Y, review with the team, then decide the next steps.
  • What calculations/aggregations are necessary? Sometimes this is obvious in a mock.

Corner cases

  • Document any weird or essential corner cases.
  • Team review is essential as everyone has their pet corner case.

Milestones

What are you delivering and when? Having a project split into bite-sized pieces is helpful in many ways:

  • Detailed tickets with clear to-do items help maintain focus.
  • Milestones communicate progress to wider stakeholders.
  • Milestones provide guardrails to stay on track.
  • Milestones are incremental and allow for an iterative approach to a project.
  • Be flexible. Too much detail in a ticket entails too much time spent writing tickets; too little leaves the ticket vague and makes completion unclear.

Close-Out

How does this project end? Document the end to ensure that your team can account for maintenance in project planning. Thought starters:

  • Some projects need maintenance to keep running.
  • Some projects can spur on another project.
  • Some projects are done with a deliverable.
  • Some projects are handed off to other teams.
Jason Gilbertson
