Systematic techniques for tracing engineering failures back to their origins, from the simple 5-Whys to probabilistic fault trees and barrier analysis used in aerospace, process-plant, and infrastructure investigations.
A bridge has fallen, a reactor has tripped, a fuselage panel has separated in flight. The immediate cause is obvious from the wreckage. What is far less obvious, and far more important, is why that immediate cause was able to happen. Root-cause analysis (RCA) is the structured process engineers use to keep asking why until they reach the conditions, decisions, or gaps in a system that made the failure possible in the first place.
The discipline ranges from a conversation tool you can sketch on a whiteboard in twenty minutes to a full probabilistic model that takes months to build. At the lightweight end sit the 5-Whys and the fishbone diagram, good for straightforward industrial incidents and for framing early-stage hypotheses. At the quantitative end sit Fault Tree Analysis (FTA), Failure Mode and Effects Analysis (FMEA), and FMEA's quantitative extension, FMECA (Failure Mode, Effects, and Criticality Analysis). Aerospace, nuclear, and process-plant industries have used these tools for decades to demonstrate safety cases before a failure happens and to reconstruct it precisely after one does.
What connects all these methods is a common goal: to distinguish the root cause, the condition whose removal would break the failure chain entirely, from the proximate cause and the contributing factors that sit on top of it. Get that distinction wrong and you fix the symptom, reassure management, and wait for the next incident to remind you that the real problem is still in the ground.
Fast, conversational, and often surprisingly deep.
The 5-Whys is usually attributed to Sakichi Toyoda, and Taiichi Ohno made it a core part of the Toyota Production System in the 1950s. The premise is disarmingly simple: once you state the problem, ask why it occurred. Take the answer and ask why again. Repeat until you reach something actionable at the system level rather than the component level. The canonical Toyota example starts with a machine stopping. Why? An overload tripped the fuse. Why? The bearing was insufficiently lubricated. Why? The oil pump was not drawing enough oil. Why? The pump inlet strainer was clogged. Why? No schedule existed for cleaning it. The chain ends at a maintenance scheduling gap, and closing that gap is the actual fix.
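The chain above can be written down as data, which makes it easy to record in an investigation log. This is only a minimal sketch: the function name `five_whys` and the wording of each answer are illustrative choices, not part of any Toyota procedure.

```python
# The machine-stoppage chain from the text, recorded as an ordered list of answers.
problem = "The machine stopped"
answers = [
    "An overload tripped the fuse",
    "The bearing was insufficiently lubricated",
    "The oil pump was not drawing enough oil",
    "The pump inlet strainer was clogged",
    "No schedule existed for cleaning the strainer",
]

def five_whys(problem, answers):
    """Pair each 'why?' with the statement it interrogates."""
    chain = []
    current = problem
    for answer in answers:
        chain.append((f"Why? ({current})", answer))
        current = answer  # the next 'why?' is asked of this answer
    return chain

chain = five_whys(problem, answers)
root_cause = chain[-1][1]  # the last answer is the candidate root cause

for question, answer in chain:
    print(question, "->", answer)
```

The last entry is only a candidate root cause: the investigator still has to confirm that it is actionable at the system level.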
The Ishikawa (fishbone or cause-and-effect) diagram, developed by Kaoru Ishikawa in the 1960s for quality control at Kawasaki, takes the same idea and makes it spatial. The problem sits at the head of the fish. In the 6M model used in manufacturing, six categories of cause radiate off the spine: Machine, Method, Material, Man (people), Measurement, and Environment. Service industries often use an 8P variant with its own category set. Each branch is populated through brainstorming. The result is a visual map that shows which cause categories are dense with potential contributors.
For forensic engineering, fishbone diagrams are most useful in the early scoping phase of an investigation, when the team is mapping what it does not yet know. They keep attention on all cause categories and prevent premature focus on the most visible physical failure before the human, procedural, and organisational branches have been explored.
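A fishbone diagram is, structurally, a mapping from cause category to candidate causes. The sketch below uses the six 6M categories named above; the brainstormed causes are hypothetical and exist only to show how dense and empty branches can be spotted.

```python
CATEGORIES = ["Machine", "Method", "Material", "Man", "Measurement", "Environment"]

# Hypothetical brainstorm output as (category, candidate cause) pairs.
candidates = [
    ("Machine", "Bearing worn beyond tolerance"),
    ("Machine", "Oil pump strainer clogged"),
    ("Method", "No cleaning schedule for the strainer"),
    ("Method", "Lubrication check missing from start-up checklist"),
    ("Man", "Operator not trained to read the oil-pressure gauge"),
    ("Measurement", "Oil-pressure gauge out of calibration"),
]

def build_fishbone(candidates):
    """Group candidate causes under their category; reject unknown categories."""
    fishbone = {category: [] for category in CATEGORIES}
    for category, cause in candidates:
        if category not in fishbone:
            raise ValueError(f"Unknown category: {category}")
        fishbone[category].append(cause)
    return fishbone

fishbone = build_fishbone(candidates)
density = {category: len(causes) for category, causes in fishbone.items()}
unexplored = [category for category, count in density.items() if count == 0]

print(density)
print("Branches with no candidates yet:", unexplored)
```

Empty branches are as informative as dense ones at the scoping stage, because they mark cause categories the team has not yet examined.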
Work backward from the accident to the combinations of events that made it possible.
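Fault Tree Analysis was developed at Bell Telephone Laboratories in 1961 for the US Air Force Minuteman missile program. NASA and the nuclear-power industry later adopted it, and it became the de facto standard for safety cases in aerospace and the process industries. Standard references include the NRC Fault Tree Handbook (NUREG-0492) and IEC 61025; it is also one of the techniques used to support functional-safety assessments under IEC 61511. The method starts from a single undesired top event and works downward through AND and OR gates until it reaches basic events that can be assigned a probability. The table below compares it with the other formal methods covered here.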
| Method | Direction | Best for | Output |
|---|---|---|---|
| Fault Tree Analysis (FTA) | Top-down (deductive) | Specific known failure, safety case for a system | Minimal cut sets, top-event probability |
| FMEA | Bottom-up (inductive) | Design review, no failure yet observed | Failure mode list, effect severity |
| FMECA | Bottom-up (inductive) | Prioritising corrective actions by risk | Criticality ranking, Risk Priority Number |
| Event and Causal Factor Chart (ECFC) | Chronological | Accident sequence reconstruction | Timeline with barriers and decisions |
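
The two FTA outputs in the table, minimal cut sets and top-event probability, can be illustrated on a small tree. A fault tree combines basic events through AND gates (the output occurs only if all inputs occur) and OR gates (the output occurs if any input occurs). The sketch below is a simplified illustration, not a production FTA tool: the tree, the event names, and the probabilities are hypothetical, and the probability calculation assumes independent basic events that each appear only once in the tree.

```python
from itertools import product
from math import prod

# A node is either a basic-event name (str) or a (gate, [children]) tuple.
# Hypothetical top event: vessel rupture from overpressure.
TOP = ("AND", [
    ("OR", ["relief_valve_stuck", "relief_valve_isolated"]),
    ("OR", ["pressure_controller_fails",
            ("AND", ["operator_misses_gauge_reading", "high_pressure_alarm_fails"])]),
])

# Hypothetical per-demand probabilities for each basic event.
P = {
    "relief_valve_stuck": 0.01,
    "relief_valve_isolated": 0.005,
    "pressure_controller_fails": 0.02,
    "operator_misses_gauge_reading": 0.1,
    "high_pressure_alarm_fails": 0.05,
}

def minimise(sets):
    """Drop any cut set that strictly contains another cut set."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]

def minimal_cut_sets(node):
    """Return the minimal combinations of basic events that cause the node."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [minimal_cut_sets(child) for child in children]
    if gate == "OR":    # any child's cut set is enough
        sets = [s for sets_of_child in child_sets for s in sets_of_child]
    elif gate == "AND":  # need one cut set from every child at once
        sets = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"Unknown gate: {gate}")
    return minimise(sets)

def top_probability(node, p):
    """Gate-by-gate probability; valid only for independent, non-repeated events."""
    if isinstance(node, str):
        return p[node]
    gate, children = node
    probs = [top_probability(child, p) for child in children]
    if gate == "AND":
        return prod(probs)
    if gate == "OR":
        return 1 - prod(1 - q for q in probs)
    raise ValueError(f"Unknown gate: {gate}")

cuts = minimal_cut_sets(TOP)
single_points = [c for c in cuts if len(c) == 1]  # single-point failures
p_top = top_probability(TOP, P)

for cut in sorted(cuts, key=len):
    print(sorted(cut))
print("Single-point failures:", single_points)
print("Top-event probability:", p_top)
```

A cut set containing a single event is a single-point failure. When basic events repeat across branches, the gate-by-gate product is no longer exact and the probability has to be computed from the cut sets instead.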
Start from each component and ask what it can do wrong before the system fails.
FMEA was formalised by the US military in MIL-P-1629 in 1949, initially for the design of military aircraft and missile systems. It is now mandated or strongly recommended across automotive (AIAG FMEA manual, 4th edition, and SAE J1739), aerospace (SAE ARP5580), semiconductor, and medical-device industries. The core worksheet captures five things for each failure mode: the component that fails, the mode of failure (how it fails), the effect on the next higher assembly and on the system, the current controls in place, and the severity rating.
In a post-failure forensic investigation, FMEA works in reverse. The investigator constructs the failure-mode inventory for the failed component, asks which mode matches the observed fracture surface, deformation pattern, or functional loss, then checks whether that mode was foreseen in the original design review. If the failure mode appears in the original FMEA with a high-severity rating but no corrective action was taken, that is a significant finding for a liability determination.
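The reverse lookup described above can be sketched as a search over the original worksheet. The five worksheet fields come from the text; the `corrective_action` field, the 1 to 10 severity scale, the `HIGH_SEVERITY` threshold, and the sample rows are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class FmeaRow:
    component: str
    failure_mode: str
    effect: str
    current_controls: str
    severity: int                # assumed scale: 1 (negligible) to 10 (catastrophic)
    corrective_action: str = ""  # empty string means no action was recorded

HIGH_SEVERITY = 8  # hypothetical threshold for a "high-severity" rating

# Hypothetical rows from an original design-review FMEA.
worksheet = [
    FmeaRow("Lug bolt", "Fatigue fracture", "Loss of panel attachment",
            "Visual inspection at overhaul", severity=9),
    FmeaRow("Lug bolt", "Thread stripping", "Reduced clamp load",
            "Torque check at assembly", severity=5,
            corrective_action="Torque audit added"),
]

def assess_foreseeability(worksheet, component, observed_mode):
    """Check whether the observed failure mode was anticipated in the FMEA."""
    for row in worksheet:
        if row.component == component and row.failure_mode == observed_mode:
            if row.severity >= HIGH_SEVERITY and not row.corrective_action:
                return "foreseen, high severity, no corrective action"
            return "foreseen"
    return "not foreseen"

print(assess_foreseeability(worksheet, "Lug bolt", "Fatigue fracture"))
print(assess_foreseeability(worksheet, "Lug bolt", "Corrosion pitting"))
```

In practice the match is rarely an exact string comparison; the investigator has to judge whether the observed mode falls within the scope of a listed one.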
Time is a dimension of root cause, not just a backdrop.
Event and Causal Factor Charting (ECFC) is a timeline-based RCA method that places events on a horizontal time axis and arranges causal factors and conditions on branches beneath them. Unlike FTA, which is atemporal, ECFC preserves the sequence of decisions, actions, and missed opportunities that led to an incident. It was developed in the 1980s by the US Department of Energy for nuclear plant incidents and is now widely used in oil-and-gas, transportation, and healthcare investigations.
The timeline structure is particularly valuable for forensic engineering because most major failures are caused not by a single event but by a converging sequence of small decisions, deferred maintenance items, and ignored signals. The Texas City refinery explosion in 2005 is a clear example: the ECFC produced by the Baker Panel showed that at least fifteen decision points preceding the blowdown drum overfill were visible opportunities to interrupt the chain. Each looked manageable on its own. Placed on a chart together, they form a damning progression.
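The structure of an ECFC, events in time order with causal factors hanging beneath them, can be sketched with a few lines of code. The incident below is entirely hypothetical and is not drawn from any real investigation; the `missed_opportunity` flag is an illustrative addition for marking decision points.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    time: datetime
    description: str
    causal_factors: list = field(default_factory=list)
    missed_opportunity: bool = False  # a point where the chain could have been broken

# Hypothetical incident, deliberately entered out of order.
events = [
    Event(datetime(2020, 3, 2, 14, 40), "Pump seizes and trips on overload",
          ["Bearing ran without lubrication"]),
    Event(datetime(2020, 1, 15, 9, 0), "Strainer cleaning deferred to next outage",
          ["No written cleaning interval", "Outage backlog"], missed_opportunity=True),
    Event(datetime(2020, 3, 2, 13, 5), "Low oil-pressure alarm acknowledged, no action",
          ["Alarm known to give false trips"], missed_opportunity=True),
]

def build_chart(events):
    """Place events on the time axis in chronological order."""
    return sorted(events, key=lambda e: e.time)

def render(chart):
    """Render each event with its causal factors indented beneath it."""
    lines = []
    for event in chart:
        flag = " [missed opportunity]" if event.missed_opportunity else ""
        lines.append(f"{event.time:%Y-%m-%d %H:%M}  {event.description}{flag}")
        for factor in event.causal_factors:
            lines.append(f"    - {factor}")
    return "\n".join(lines)

chart = build_chart(events)
missed = [e for e in chart if e.missed_opportunity]
print(render(chart))
print(f"{len(missed)} missed opportunities before the final event")
```

Sorting by timestamp is the step that turns a pile of interview notes into a sequence that can be inspected for decision points.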
Every harm means a barrier was missing, weak, or defeated.
Barrier Analysis, drawn from Haddon's energy-release model and developed further by Johnson (MORT) and Svenson, asks a focused question: what barriers should have stood between the hazard and the target, and why did they not? A barrier is anything (a physical guard, a procedure, an administrative control, or protective equipment) whose purpose is to prevent the hazard from reaching the person, asset, or environment.
For each barrier the analyst asks: was this barrier present? Was it adequate for the actual hazard energy or magnitude? Was it used correctly? A barrier may be present on paper, absent in practice, degraded through wear, or bypassed by a workaround that gradually became routine. Barrier Analysis often reveals that the same failure would have been prevented if any one of three or four independent barriers had actually functioned, making it powerful evidence that the failure was a systemic management problem rather than a single operator mistake.
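The three questions asked of each barrier map directly onto a small record type. This is a sketch under stated assumptions: the field names, the status labels, and the example barriers are hypothetical, chosen to mirror the missing, weak, or defeated wording used above.

```python
from dataclasses import dataclass

@dataclass
class Barrier:
    name: str
    kind: str             # "physical", "procedural", "administrative", or "ppe"
    present: bool         # was the barrier actually in place?
    adequate: bool        # was it sized for the actual hazard energy or magnitude?
    used_correctly: bool  # was it used as intended, not bypassed?

def assess(barrier):
    """Classify a barrier by asking the three questions in order."""
    if not barrier.present:
        return "missing"
    if not barrier.adequate:
        return "weak"
    if not barrier.used_correctly:
        return "defeated"
    return "effective"

# Hypothetical barriers between a pressurised line and a maintenance worker.
barriers = [
    Barrier("Isolation valve lock-out", "physical", True, True, False),
    Barrier("Permit-to-work sign-off", "administrative", False, False, False),
    Barrier("Line depressurisation procedure", "procedural", True, False, True),
    Barrier("Face shield", "ppe", True, True, True),
]

results = {b.name: assess(b) for b in barriers}
failed = [name for name, status in results.items() if status != "effective"]

for name, status in results.items():
    print(f"{name}: {status}")
print(f"{len(failed)} of {len(barriers)} barriers did not function")
```

When several independent barriers appear in the failed list, the pattern supports a systemic finding rather than a single operator mistake.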
No single technique covers every kind of failure; investigators mix them deliberately.
A real forensic investigation rarely uses one RCA method in isolation. A typical approach starts with a 5-Whys and fishbone session to build the initial hypothesis map, then moves to ECFC to lay out the timeline and identify missed barriers, then uses FTA for any sub-system where a quantitative probability argument is needed for litigation. FMEA worksheets from the original design review, if available, are compared to the observed failure mode to assess foreseeability.
| Investigation phase | Recommended method | Output used for |
|---|---|---|
| Initial scoping and hypothesis generation | 5-Whys, Ishikawa diagram | Identifying areas for further investigation |
| Sequence reconstruction | Event and Causal Factor Chart | Establishing timeline, finding decision points |
| System safety analysis | Fault Tree Analysis | Quantifying failure probability, identifying single-point failures |
| Design foreseeability | FMEA / FMECA review | Establishing whether the failure mode was anticipated |
| Barrier and management failure | Barrier Analysis / MORT | Attributing responsibility to organisational controls |
A court case adds one further demand: the method chosen must be explainable to a non-specialist fact-finder. FTA diagrams can be presented as exhibits if simplified carefully. FMEA worksheets translate well because they are tabular and reference specific component names. The investigator's job is both to use rigorous methods and to make those methods accessible enough that the conclusions can be tested under cross-examination rather than accepted on authority.