We have seen it happen more times than we can count. A team runs a proof of concept, the results look great, leadership gets excited, and then — six months later — the thing is still not live. Or worse, it is live but nobody uses it.
In our experience, AI projects fail more often than conventional software projects. The reasons are usually not technical. They come down to assumptions that never got questioned.
The demo is not the product
The most common failure starts here. A proof of concept runs on clean, handpicked data, with generous compute, and often with a human in the loop to catch edge cases. The demo works beautifully. Everyone concludes the hard part is done.
It is not. The hard part is making it work on messy, inconsistent, real-world data — at scale, automatically, without someone checking every output. That gap is enormous, and most teams do not discover it until they are deep into implementation.
The proof of concept answers "can this work?" The production system has to answer "will this work every time, for every user, on data we have never seen before?"
The data problem gets underestimated every time
In a demo, data is your friend. You pick examples that show the model in its best light. In production, you get everything — incomplete records, format inconsistencies, duplicate entries, values that should not exist.
We have seen AI projects where 60% of the actual build time went into cleaning data and validating the pipelines that carry it. None of that was scoped at the start because everyone assumed the data was "basically ready."
Before you start any AI project, spend serious time with your actual data. Not sample data. Not cleaned data. The real stuff, in the state it actually lives in your systems.
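As a rough illustration of what "spending time with your actual data" can look like, here is a minimal sketch that counts the problems listed above: missing values, duplicates, and values that should not exist. The function names, field names, and sample records are all hypothetical, and a real audit would need checks specific to your own systems.

```python
from collections import Counter


def is_non_negative_number(value):
    """Hypothetical validity rule: the value must parse as a number >= 0."""
    try:
        return float(value) >= 0
    except (TypeError, ValueError):
        return False


def profile_records(records, required_fields, key_field, validators):
    """Count common data-quality problems in raw records (a list of dicts)."""
    issues = Counter()
    seen_keys = set()
    for record in records:
        missing = set()
        # Incomplete records: required field absent or blank.
        for field in required_fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                issues[f"missing:{field}"] += 1
                missing.add(field)
        # Values that should not exist: present but failing a validity rule.
        for field, is_valid in validators.items():
            if field not in missing and not is_valid(record.get(field)):
                issues[f"invalid:{field}"] += 1
        # Duplicate entries: the same key seen more than once.
        key = record.get(key_field)
        if key in seen_keys:
            issues["duplicate"] += 1
        else:
            seen_keys.add(key)
    return dict(issues)


# Hypothetical raw records, in the state they might actually live in a system.
records = [
    {"id": 1, "email": "a@example.com", "amount": "19.99"},
    {"id": 2, "email": "", "amount": "5.00"},
    {"id": 2, "email": "b@example.com", "amount": None},
    {"id": 3, "email": "c@example.com", "amount": "-4.00"},
]

report = profile_records(
    records,
    required_fields=["email", "amount"],
    key_field="id",
    validators={"amount": is_non_negative_number},
)
print(report)
```

Running something like this against a raw export, rather than a curated sample, tends to surface the cleaning work early enough to scope it.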
The wrong success metric
Teams often optimise for accuracy on a test set. That is a reasonable starting point, but it is not the same as being useful in production. We have seen models with 94% test accuracy that completely failed in practice because the 6% of errors fell on the most important edge cases, the ones with the highest business impact.
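To make that concrete, here is a small sketch with invented numbers. It assumes each case can be given a rough cost for getting it wrong; the labels, the costs, and the split of 94 routine cases to 6 high-stakes ones are purely illustrative, not data from a real project.

```python
def accuracy(cases):
    """Fraction of cases where the prediction matches the actual label."""
    correct = sum(1 for c in cases if c["predicted"] == c["actual"])
    return correct / len(cases)


def error_cost(cases):
    """Total business cost of the cases the model got wrong."""
    return sum(c["cost_if_wrong"] for c in cases if c["predicted"] != c["actual"])


# Hypothetical evaluation set: 94 routine cases handled correctly,
# 6 high-stakes cases missed. Costs are made-up units.
routine = [{"predicted": "ok", "actual": "ok", "cost_if_wrong": 1}] * 94
high_stakes = [{"predicted": "ok", "actual": "fraud", "cost_if_wrong": 500}] * 6
cases = routine + high_stakes

acc = accuracy(cases)
cost = error_cost(cases)
total_at_risk = sum(c["cost_if_wrong"] for c in cases)
share_lost = cost / total_at_risk

print(f"accuracy: {acc:.0%}")
print(f"share of total cost at risk that was lost: {share_lost:.0%}")
```

In this toy setup the model is right on 94 of 100 cases, yet the errors account for almost all of the cost at risk. The accuracy figure alone would never show that.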
Before you build, define what "good enough" looks like in terms of business outcomes, not model metrics. How many errors can users tolerate before they stop trusting the system? What happens when the model is wrong? Is there a fallback?
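One common shape for a fallback is to act on the model's answer only when it is confident enough, and hand everything else to a person. The sketch below assumes the model exposes a confidence score between 0 and 1. The `Decision` type, the `decide` function, and the 0.8 threshold are hypothetical placeholders that would need tuning against real outcomes.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str
    source: str  # "model" or "human_review"


def decide(label, confidence, threshold=0.8):
    """Use the model's label only when its confidence clears the threshold."""
    if confidence >= threshold:
        return Decision(label=label, source="model")
    # Below the threshold, do not guess: route the case to a person.
    return Decision(label="needs_review", source="human_review")


confident = decide("approve", confidence=0.95)
uncertain = decide("approve", confidence=0.55)
print(confident)
print(uncertain)
```

Where the threshold sits is a business decision as much as a technical one: it trades the cost of human review against the cost of a wrong automated answer.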
No one owns it after launch
AI systems degrade over time. The world changes, user behaviour shifts, and the data the model was trained on stops being representative. Without someone responsible for monitoring and retraining, most AI systems quietly get worse over months — and nobody notices until the business impact is obvious.
Plan for ongoing ownership before you build. It does not need to be a full-time role, but it does need to be someone's actual responsibility.
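Monitoring does not have to be elaborate to be useful. As one possible starting point, the sketch below compares recent values of a single input feature against a baseline captured at training time and flags a large shift in the mean. The function names, the sample values, and the threshold of two standard deviations are assumptions for illustration; production monitoring would typically track several features and the model's outputs as well.

```python
import statistics


def drift_score(baseline, recent):
    """Shift of the recent mean from the baseline mean, in baseline standard deviations."""
    base_mean = statistics.mean(baseline)
    base_stdev = statistics.stdev(baseline)
    recent_mean = statistics.mean(recent)
    if base_stdev == 0:
        # A constant baseline: any change at all counts as an unbounded shift.
        return 0.0 if recent_mean == base_mean else float("inf")
    return abs(recent_mean - base_mean) / base_stdev


def has_drifted(baseline, recent, threshold=2.0):
    """Flag the feature when its mean has moved more than `threshold` deviations."""
    return drift_score(baseline, recent) > threshold


# Hypothetical feature values: training-time baseline and two recent windows.
baseline = [10, 12, 11, 13, 12, 11, 10, 12]
stable_week = [11, 12, 10, 13]
shifted_week = [18, 19, 17, 20]

print(has_drifted(baseline, stable_week))
print(has_drifted(baseline, shifted_week))
```

A check like this only helps if someone is responsible for looking at the result and deciding what to do about it, which is the point of assigning ownership up front.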
What we do differently
When we scope an AI project, we spend the first week on the data and the business constraints — before writing a single line of model code. We want to understand:
- What does the system need to do when the model is uncertain?
- What are the highest-stakes error types, and how do we catch them?
- Who owns this post-launch, and what does monitoring look like?
- What does "live in production" actually mean for this use case?
The answers to those questions shape everything else. Sometimes they change the entire approach. That is usually a good thing — better to find out in week one than in month six.
If you are planning an AI project, we are happy to talk through the scope before you commit to anything. No pitch. Just an honest conversation about what would actually work.