Root Cause Analysis Series: The 5 Whys for untangling digital product problems

The 5 Whys is a root cause analysis technique that asks "why" repeatedly, classically five times in a row, peeling back each layer of an answer until the team reaches a system or process cause it can actually change. It works best when a problem keeps recurring and no one can name a clear trigger. The discipline pushes a team to stop patching symptoms and start fixing the structure that produced them.

What The 5 Whys is and where it came from

The 5 Whys is generally credited to Sakichi Toyoda, founder of Toyota Industries. It became a core habit inside the Toyota Production System under Taiichi Ohno and later spread across Lean manufacturing, Six Sigma, quality management, and software operations worldwide. Toyota still uses it on the production floor, running a 5 Whys on defects to make sure the same issue does not surface in the next batch.

The core idea is simple. Most teams stop explaining a problem at the surface, where the symptom lives. The 5 Whys forces the conversation deeper, into the process, the policy, or the structure that allowed the symptom to appear. That is the layer the team can actually redesign. A fix aimed at any layer above it is a temporary patch dressed up as a permanent one.

How to run a 5 Whys session (with an example)

Step 1: Define the problem precisely

Before you ask the first "why", write the problem as something observable and measurable. "The website has issues" is not a problem statement. "Conversion rate dropped 40 percent month over month" is.

Example problem: Mobile app users are complaining that payments fail. Transaction success rate has dropped from 95 percent to 60 percent.

Step 2: Why number one

Why did transaction success rate drop to 60 percent?

Because the payment gateway returns an error during the confirmation step.

Step 3: Why number two

Why is the payment gateway returning an error?

Because the API call to the payment provider is using a request format that no longer matches their spec.

Step 4: Why number three

Why is the request format wrong?

Because the payment provider released a new API version, and the engineering team never received the change notice.

Step 5: Why number four

Why did the team miss the change notice?

Because there is no alert or monitoring in place for changes from third-party services.

Step 6: Why number five

Why is there no monitoring for third-party services?

Because vendor management has no required step for subscribing to change logs or release notes from external APIs.

Root cause: Vendor management has no systematic process for tracking changes in external dependencies.

Durable fix: Add a third-party checklist that requires every external service to have a subscribed change feed, an automated weekly API health check, and a named owner who reviews vendor release notes each month.
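To show how that checklist can become something a team checks automatically rather than remembers, here is a minimal Python sketch. It assumes a hypothetical vendor registry in which every external service records a change-feed URL, a named owner, and the dates of its last API health check and last release-notes review. The field names, the `audit_vendor` and `audit_registry` helpers, and the 7-day and 31-day thresholds (chosen to mirror "weekly" and "monthly") are all illustrative assumptions, not a prescribed SUFFIX tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical thresholds mirroring the "weekly" health check and "monthly" review.
HEALTH_CHECK_MAX_AGE = timedelta(days=7)
REVIEW_MAX_AGE = timedelta(days=31)


@dataclass
class VendorRecord:
    """One external dependency in a hypothetical vendor registry."""
    name: str
    change_feed_url: Optional[str]     # subscribed changelog / release-notes feed
    owner: Optional[str]               # named person who reviews release notes
    last_health_check: Optional[date]  # last automated API health check run
    last_notes_review: Optional[date]  # last manual review of vendor release notes


def audit_vendor(v: VendorRecord, today: date) -> list[str]:
    """Return the checklist gaps for one vendor (an empty list means compliant)."""
    gaps = []
    if not v.change_feed_url:
        gaps.append("no subscribed change feed")
    if not v.owner:
        gaps.append("no named owner")
    if v.last_health_check is None or today - v.last_health_check > HEALTH_CHECK_MAX_AGE:
        gaps.append("weekly API health check overdue")
    if v.last_notes_review is None or today - v.last_notes_review > REVIEW_MAX_AGE:
        gaps.append("monthly release-notes review overdue")
    return gaps


def audit_registry(vendors: list[VendorRecord], today: date) -> dict[str, list[str]]:
    """Map each non-compliant vendor name to its list of gaps."""
    report = {}
    for v in vendors:
        gaps = audit_vendor(v, today)
        if gaps:
            report[v.name] = gaps
    return report


# Illustrative usage with made-up vendors and dates.
today = date(2024, 6, 1)
vendors = [
    VendorRecord("payment-gateway", None, "alice", date(2024, 5, 30), date(2024, 3, 1)),
    VendorRecord("email-service", "https://example.com/changelog", "bob",
                 date(2024, 5, 28), date(2024, 5, 15)),
]
print(audit_registry(vendors, today))
# → {'payment-gateway': ['no subscribed change feed', 'monthly release-notes review overdue']}
```

The design choice is to report gaps rather than fail outright, so the named owner sees exactly which part of the checklist lapsed for which vendor.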

Notice how the early answers (gateway error, wrong API format) would have generated quick patches. The real fix lives at layer five, in how the company manages vendors. Stopping at layer two restores service for now, but the same class of problem returns the next time a vendor pushes an update.

A useful sanity check: "5" is shorthand for "deep enough", not a hard rule. Some problems land at layer three; others need layer seven. You have reached a real cause when the answer points at something your team controls directly.

Common pitfalls

Four patterns derail a 5 Whys session more than any others.

Stopping at a person. Answers like "because the developer forgot" feel concrete, but they end the analysis at the wrong layer. The deeper question is why the system let one person become a single point of failure. Teams that stop at a name drift into blame culture, and the same gap appears the next time a different person is on the hook.

Jumping branches mid-chain. It is tempting to switch lanes when an answer hints at a juicier cause elsewhere. The result is a tangled chain where layer two is about vendors, layer three is about hiring, and layer four is about culture. No one can act on it. Keep one chain disciplined, then run a separate chain if you suspect a parallel cause.

Running the session alone. A 5 Whys done in a single head becomes a private theory. The chain should include two or three people who actually touched the process, with evidence (logs, metrics, tickets) backing each "why". If the only evidence is memory, the answers are speculation.

Padding the chain to hit five. Some teams treat "5" as the target rather than a measure of depth. The fix is to test each layer against one simple question: "does this answer point at something we control?" If yes, stop. If no, keep going.
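The stopping rule and the pitfalls above can be turned into a quick review of a written chain. The Python sketch below is a minimal illustration, assuming the team records each layer with its answer, the evidence behind it, and two judgments made in the session: whether the answer stops at a person, and whether it points at something the team controls. The `Why` record and `review_chain` helper are hypothetical. The judgments still come from the people in the room; the code only flags layers without evidence, chains that end at a person or never reach a controllable cause, and layers added after the root cause was already reached.

```python
from dataclasses import dataclass, field


@dataclass
class Why:
    """One layer of a 5 Whys chain (hypothetical record format)."""
    question: str
    answer: str
    evidence: list[str] = field(default_factory=list)  # logs, metrics, tickets
    blames_person: bool = False   # answer stops at "someone forgot"
    team_controls: bool = False   # answer points at a process or policy the team owns


def review_chain(chain: list[Why]) -> dict:
    """Find the first controllable layer and flag common pitfalls."""
    issues = []
    stop_at = None  # 1-based index of the first layer the team controls
    for i, layer in enumerate(chain, start=1):
        if not layer.evidence:
            issues.append(f"layer {i}: no evidence, treat as speculation")
        if stop_at is None and layer.team_controls and not layer.blames_person:
            stop_at = i
    if stop_at is None:
        if chain and chain[-1].blames_person:
            issues.append("chain ends at a person; ask why the system allowed it")
        else:
            issues.append("no controllable cause yet; keep asking why")
    elif stop_at < len(chain):
        issues.append(f"{len(chain) - stop_at} layer(s) after layer {stop_at} may be padding")
    return {"root_cause_layer": stop_at, "issues": issues}


# The payment example from this article, with hypothetical evidence labels.
payment_chain = [
    Why("Why did success rate drop to 60%?", "Gateway errors at confirmation",
        ["gateway error logs"]),
    Why("Why is the gateway erroring?", "Request format no longer matches provider spec",
        ["API error responses"]),
    Why("Why is the format wrong?", "Provider shipped a new API version; notice never arrived",
        ["provider changelog"]),
    Why("Why was the notice missed?", "No monitoring for third-party changes",
        ["alerting config"]),
    Why("Why is there no monitoring?", "Vendor management has no change-log subscription step",
        ["vendor checklist"], team_controls=True),
]
print(review_chain(payment_chain))
# → {'root_cause_layer': 5, 'issues': []}
```

A chain such as a single layer reading "the developer forgot a config flag", with no evidence attached, would come back with no root cause layer and two issues: missing evidence and a chain that ends at a person.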

Compared to other tools in the RCA Series

In SUFFIX's RCA toolkit, Problem-Analysis (the 4-axis framework) is the gateway. It classifies the problem by clarity of symptom, scope, urgency, and ownership before you pick a tool. Fact-Based Thinking, drawn from McKinsey practice, sharpens the problem statement so the analysis starts on solid ground.

Fishbone Diagram is the right call when a problem touches multiple functions at once and you need broad coverage across people, process, tools, and environment. Fault Tree Analysis (FTA) is built for problems with several parallel causes where AND/OR logic matters. FMEA is the proactive option, used before failure occurs to rank risks by Severity, Occurrence, and Detection.

Change Analysis is the natural pair to 5 Whys when there is a clear before-and-after moment. Barrier Analysis maps which defensive layers held and which broke. Parent Cause and Management Oversight zoom out to the organizational layer, where ownership models and reporting structures shape why problems repeat.

5 Whys is the lightest tool in the kit. It starts fastest, requires no template, and works well when the team has a working hypothesis to test layer by layer. Pick it when the problem is recurring, the cause feels structural, and you want a single disciplined chain rather than broad coverage.

When NOT to use The 5 Whys

5 Whys is weak in three situations.

When the problem has multiple parallel causes that interact, a single "why" chain picks one branch and misses the others. Fishbone Diagram fits better. If the parallel causes follow strict AND/OR logic, FTA is sharper.

When the problem started right after a clear change, Change Analysis is faster. Comparing before and after surfaces the shifted variable directly. You can then run 5 Whys on that variable for the structural cause.

When the goal is to prevent failure rather than diagnose one, FMEA is the right tool, scoring risks before they occur.

There is also a softer limit. If the chain leans on one person's memory, answers drift toward speculation. Run it with people who touched the process, not managers who heard about it secondhand.

Use case for digital product teams

For digital product teams, 5 Whys earns its keep on the recurring incidents that quick patches never fully close. Payment failures, sporadic checkout drops, sync issues with third-party services, and onboarding completion rates that keep slipping after every "fix" all reward a disciplined chain.

The SUFFIX way to run it is to keep the session short, evidence-based, and shared: two or three people who touched the process, one document, and the actual data (logs, dashboards, vendor release notes) in front of the team. Write the problem in measurable terms, run the chain to the layer that points at a process or policy you own, then translate the answer into a durable fix with a named owner and a recheck date.

For executives, 5 Whys is the right tool when the question is "why does this problem keep coming back even after we fixed it?" It pushes the team past the most recent patch and into the conditions that made the patch necessary, which is usually where the real budget conversation lives.

FAQ

What is The 5 Whys and who created it?

The 5 Whys is a root cause analysis technique that asks "why" repeatedly to move from the visible symptom of a problem down to the underlying system or process cause. It is generally credited to Sakichi Toyoda at Toyota Industries and was refined inside the Toyota Production System under Taiichi Ohno. From there it spread across Lean manufacturing, Six Sigma, and software operations worldwide because it is cheap and fast to run.

Do you always have to ask "why" exactly five times?

No. The "5" is a guideline, not a hard rule. The real test is whether the answer points at something your team controls directly, like a process, a policy, or a structural decision. Some problems land at layer three. Others need layer seven. If the answer still rests on an external party, a single person's mistake, or "bad luck", the chain is not done. Keep going until you reach a layer where the team can change the system that produced the problem, and the fix is something you can actually own.

How is The 5 Whys different from Change Analysis?

Change Analysis is built for problems that started right after a clear change, like a new system, a workflow shift, or a vendor swap. It compares the before and after states to find the variable that moved. The 5 Whys is built for problems with no obvious trigger, where the team needs to drill down through layers of cause. Both belong in the root cause analysis toolkit and can be combined. Use Change Analysis to identify the shifted factor first, then run The 5 Whys on that factor to find the deeper structural cause.

How do you know you have reached the real root cause?

The main signal is that the answer points at something your team can actually change, like a process, a policy, an ownership model, or an org structure. When you fix that layer, the same problem does not come back in a new form. If the answer still blames an external party, a single person, or random chance, keep going. A true root cause usually means redesigning how the team works, not just adding another reminder to an existing workflow.

Writer

Director

Jate Saitthiti