Root Cause Analysis Series: The 5 Whys for untangling digital product problems
The 5 Whys is a root cause analysis technique that asks "why" repeatedly, traditionally five times in a row. Each answer peels back one layer of explanation until the team reaches a system or process cause it can actually change. It works best when a problem keeps recurring and no one can name a clear trigger. The discipline pushes a team to stop patching symptoms and start fixing the structure that produced them.
What The 5 Whys is and where it came from
The 5 Whys is generally credited to Sakichi Toyoda, founder of Toyota Industries, and is usually dated to the 1930s. It became a core habit inside the Toyota Production System under Taiichi Ohno. It later spread across Lean manufacturing, Six Sigma, quality management, and software operations worldwide. Toyota still uses it on the production floor today, running a 5 Whys on defects so that the same issue does not surface in the next batch.
The core idea is simple. Most teams stop explaining a problem at the surface, where the symptom lives. The 5 Whys forces the conversation deeper, into the process, the policy, or the structure that allowed the symptom to appear. That is the layer the team can actually redesign. A fix applied at any shallower layer is a temporary patch dressed up as a fix.
How to run a 5 Whys session (with an example)
Step 1: Define the problem precisely
Before you ask the first "why", write the problem as something observable and measurable. "The website has issues" is not a problem statement. "Conversion rate dropped 40 percent month over month" is.
Example problem: Mobile app users are complaining that payments fail. Transaction success rate has dropped from 95 percent to 60 percent.
Step 2: Why number one
Why did transaction success rate drop to 60 percent?
Because the payment gateway returns an error during the confirmation step.
Step 3: Why number two
Why is the payment gateway returning an error?
Because the API call to the payment provider is using a request format that no longer matches their spec.
Step 4: Why number three
Why is the request format wrong?
Because the payment provider released a new API version, and the engineering team never received the change notice.
Step 5: Why number four
Why did the team miss the change notice?
Because there is no alert or monitoring in place for changes from third-party services.
Step 6: Why number five
Why is there no monitoring for third-party services?
Because vendor management has no required step for subscribing to change logs or release notes from external APIs.
Root cause: Vendor management has no systematic process for tracking changes in external dependencies.
Durable fix: Add a third-party checklist that requires every external service to have a subscribed change feed, an automated weekly API health check, and a named owner who reviews vendor release notes each month.
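The durable fix above is a checklist, and a checklist is easy to audit automatically. Below is a minimal, hypothetical Python sketch of such an audit. Several details are assumptions for illustration only and do not come from any specific tool:

- the `ExternalService` record and its field names
- the sample service names and owner
- the 7-day and 31-day thresholds, read from "weekly" and "each month"

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed thresholds for a "weekly" health check and a "monthly" release-note review.
HEALTH_CHECK_MAX_AGE = timedelta(days=7)
RELEASE_NOTE_REVIEW_MAX_AGE = timedelta(days=31)


@dataclass
class ExternalService:
    """Hypothetical record for one third-party dependency."""
    name: str
    change_feed_subscribed: bool
    owner: Optional[str]  # named person responsible for this vendor
    last_health_check: Optional[date]
    last_release_note_review: Optional[date]


def audit(service: ExternalService, today: date) -> list[str]:
    """Return the checklist items this service currently fails (empty list = compliant)."""
    gaps = []
    if not service.change_feed_subscribed:
        gaps.append("no subscribed change feed")
    if not service.owner:
        gaps.append("no named owner")
    if (service.last_health_check is None
            or today - service.last_health_check > HEALTH_CHECK_MAX_AGE):
        gaps.append("API health check missing or older than 7 days")
    if (service.last_release_note_review is None
            or today - service.last_release_note_review > RELEASE_NOTE_REVIEW_MAX_AGE):
        gaps.append("release notes not reviewed in the last month")
    return gaps


# Hypothetical services and a fixed "today" so the output is repeatable.
today = date(2024, 5, 22)
services = [
    ExternalService("payment-provider", False, None, date(2024, 5, 1), None),
    ExternalService("email-provider", True, "alice", date(2024, 5, 20), date(2024, 5, 2)),
]
for service in services:
    print(service.name, audit(service, today) or "ok")
```

In this sample, the payment provider fails all four checks, which is exactly the gap the chain uncovered. The email provider passes.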
Notice how the early answers (gateway error, wrong API format) would have generated quick patches. The real fix lives at layer five, in how the company manages vendors. Stopping at layer two restores service for a week, but the same class of problem returns the next time a vendor pushes an update.
A useful sanity check: "5" is shorthand for "deep enough", not a hard rule. Some problems land at layer three, others need layer seven. You have reached a real cause when the answer points at something your team controls directly.
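One way to make this stopping rule concrete is to record the chain as data and stop at the first answer that points at a process or policy the team owns. This is a minimal sketch under a few assumptions:

- The team records that yes/no judgment by hand for each layer.
- The field name `points_at_owned_process` is hypothetical.
- The True/False values below simply mirror this article's conclusion that the payment example bottoms out at layer five.

The code does not decide what the team controls. It only applies the rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    why: str
    because: str
    # The team's own judgment: does this answer name a process or policy we own?
    points_at_owned_process: bool


def find_root_cause(chain: list[Layer]) -> Optional[int]:
    """Return the 1-based depth of the first layer that points at an owned process, or None."""
    for depth, layer in enumerate(chain, start=1):
        if layer.points_at_owned_process:
            return depth
    return None  # chain is not deep enough yet: keep asking why


# The payment example, encoded as data.
payment_chain = [
    Layer("Why did transaction success drop to 60 percent?",
          "The payment gateway returns an error at confirmation.", False),
    Layer("Why is the gateway returning an error?",
          "The request format no longer matches the provider's spec.", False),
    Layer("Why is the request format wrong?",
          "The provider released a new API version and the notice was missed.", False),
    # Judged here as a missing tool, not yet the policy that allowed it.
    Layer("Why did the team miss the change notice?",
          "No alert or monitoring exists for third-party changes.", False),
    Layer("Why is there no monitoring for third-party services?",
          "Vendor management has no step for subscribing to change logs.", True),
]

print(find_root_cause(payment_chain))  # → 5
```

A chain that returns `None` has not gone deep enough. A chain where the answer comes back at depth three is simply a shorter problem.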
Common pitfalls
Four patterns derail a 5 Whys session more than any others.
Stopping at a person. Answers like "because the developer forgot" feel concrete, but they end the analysis at the wrong layer. The deeper question is why the system let one person become a single point of failure. Teams that stop at a name drift into blame culture, and the same gap appears the next time a different person is on the hook.
Jumping branches mid-chain. It is tempting to switch lanes when an answer hints at a juicier cause elsewhere. The result is a tangled chain where layer two is about vendors, layer three is about hiring, and layer four is about culture. No one can act on it. Keep one chain disciplined, then run a separate chain if you suspect a parallel cause.
Running the session alone. A 5 Whys done in a single head becomes a private theory. The chain should include two or three people who actually touched the process, with evidence (logs, metrics, tickets) backing each "why". If the only evidence is memory, the answers are speculation.
Padding the chain to hit five. Some teams treat "5" as the target rather than the depth. The fix is to test each layer against a simple question: "Does this answer point at something we control?" If yes, stop. If no, keep going.
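The evidence and padding pitfalls can be turned into a quick review pass over a written chain. This is a minimal sketch under a few assumptions:

- Each layer lists its supporting evidence, such as log links, metric names, or ticket IDs.
- Each layer carries the team's own yes/no judgment on whether it points at a process the team owns.
- The field names and the sample chain are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    because: str
    evidence: list[str] = field(default_factory=list)  # log links, metric names, ticket IDs
    points_at_owned_process: bool = False  # the team's judgment, recorded by hand


def review_chain(chain: list[Layer]) -> list[str]:
    """Flag layers with no evidence and layers that continue past the first owned cause."""
    warnings = []
    stop_at = None  # depth of the first layer that points at an owned process
    for depth, layer in enumerate(chain, start=1):
        if not layer.evidence:
            warnings.append(f"layer {depth}: no evidence, answer may be speculation")
        if layer.points_at_owned_process and stop_at is None:
            stop_at = depth
    if stop_at is None:
        warnings.append("no layer points at a process the team owns: keep asking why")
    elif stop_at < len(chain):
        extra = len(chain) - stop_at
        warnings.append(f"{extra} layer(s) after layer {stop_at} go past an owned cause (padding?)")
    return warnings


# Hypothetical chain: layer 3 rests on memory, layer 5 was added just to reach five.
chain = [
    Layer("Gateway returns an error at confirmation", ["gateway error logs"]),
    Layer("Request format no longer matches the provider spec", ["provider API docs"]),
    Layer("Team never received the change notice"),
    Layer("Vendor management has no change-log subscription step", ["vendor checklist"], True),
    Layer("Leadership does not prioritize reliability"),
]
for warning in review_chain(chain):
    print(warning)
```

The check is deliberately shallow. It cannot tell good evidence from bad, but it makes a memory-only layer or a padded tail visible before the team acts on the chain.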
Compared to other tools in the RCA Series
In SUFFIX's RCA toolkit, Problem-Analysis (the 4-axis framework) is the gateway. It classifies the problem by clarity of symptom, scope, urgency, and ownership before you pick a tool. Fact-Based Thinking, drawn from McKinsey practice, sharpens the problem statement so the analysis starts on solid ground.
Fishbone Diagram is the right call when a problem touches multiple functions at once and you need broad coverage across people, process, tools, and environment. Fault Tree Analysis (FTA) is built for problems with several parallel causes where AND/OR logic matters. FMEA is the proactive option, used before failure occurs to rank risks by Severity, Occurrence, and Detection.
Change Analysis is the natural pair to 5 Whys when there is a clear before-and-after moment. Barrier Analysis maps which defensive layers held and which broke. Parent Cause and Management Oversight zoom out to the organizational layer, where ownership models and reporting structures shape why problems repeat.
5 Whys is the lightest tool in the kit. It starts fastest, requires no template, and works well when the team has a working hypothesis to test layer by layer. Pick it when the problem is recurring, the cause feels structural, and you want a single disciplined chain rather than broad coverage.
When NOT to use The 5 Whys
5 Whys is weak in three situations.
When the problem has multiple parallel causes that interact, a single "why" chain picks one branch and misses the others. Fishbone Diagram fits better. If the parallel causes follow strict AND/OR logic, FTA is sharper.
When the problem started right after a clear change, Change Analysis is faster. Comparing before and after surfaces the shifted variable directly. You can then run 5 Whys on that variable for the structural cause.
When the goal is to prevent failure rather than diagnose one, FMEA is the right tool, scoring risks before they occur.
There is also a softer limit. If the chain leans on one person's memory, answers drift toward speculation. Run it with people who touched the process, not managers who heard about it secondhand.
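The before-and-after comparison behind Change Analysis, the faster option when a problem follows a clear change, can be sketched as a diff of two snapshots of the system's variables. The snapshot keys and values below are hypothetical. In practice they might come from deploy records, config files, or vendor changelogs. The variable that shifted is then the natural starting point for a 5 Whys chain.

```python
def changed_variables(before: dict, after: dict) -> dict:
    """Return {key: (before_value, after_value)} for every variable that differs.

    A key missing from one snapshot is treated as None on that side.
    """
    keys = sorted(before.keys() | after.keys())
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


# Hypothetical snapshots taken before and after the success rate started falling.
before = {"app_version": "3.2.0", "provider_api_version": "v1", "checkout_flow": "A"}
after = {"app_version": "3.2.0", "provider_api_version": "v2", "checkout_flow": "A"}

print(changed_variables(before, after))
# → {'provider_api_version': ('v1', 'v2')}
```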
Use case for digital product teams
For digital product teams, 5 Whys earns its keep on the recurring incidents that quick patches never fully close. Payment failures, sporadic checkout drops, sync issues with third-party services, and onboarding completion rates that keep slipping after every "fix" all reward a disciplined chain.
The SUFFIX way to run it is to keep the session short, evidence-based, and shared. That means two or three people who touched the process, one shared document, and the actual data (logs, dashboards, vendor release notes) in front of the team. Write the problem in measurable terms and run the chain to the layer that points at a process or policy you own. Then translate the answer into a durable fix with a named owner and a recheck date.
For executives, 5 Whys is the right tool when the question is "why does this problem keep coming back even after we fixed it?" It pushes the team past the most recent patch and into the conditions that made the patch necessary, which is usually where the real budget conversation lives.
FAQ
What is The 5 Whys and who created it?
Do you always have to ask "why" exactly five times?
How is The 5 Whys different from Change Analysis?
How do you know you have reached the real root cause?
Writer
Director
Jate Saitthiti