How Safety Management Systems and systemic failure models explain why organisations produce accidents despite individual competence, using Reason's Swiss Cheese Model, bow-tie analysis, and the Deepwater Horizon disaster as a defining case.
On 20 April 2010, the Deepwater Horizon semi-submersible drilling rig exploded in the Gulf of Mexico. Eleven workers died. The well flowed uncontrolled for 87 days, releasing an estimated 4.9 million barrels of oil. The Presidential Commission's final report and the separate Bly Report commissioned by BP agreed on one point: this was not a failure caused by one reckless decision or one faulty component. The Presidential Commission went further and traced the blowout to failures of management: a Safety Management System that had systematically produced a tolerance for risk that no single engineer or executive had ever explicitly approved.
Safety Management Systems (SMS) are the formal organisational structures, policies, procedures, and cultures through which companies attempt to manage safety risk. When they work, they identify hazards before they cause harm, maintain the barriers that prevent escalation, and create a culture in which workers can raise concerns without fear. When they break down, they produce the conditions James Reason identified in his Swiss Cheese Model: multiple holes in defensive layers lining up until nothing stands between a hazard and a disaster.
For forensic engineers, the SMS dimension of an accident investigation is where the analysis moves from the physical evidence (what broke) to the organisational evidence (why the system allowed it to break). This distinction is not merely philosophical. In litigation, it is the difference between a finding of individual negligence and a finding of organisational negligence, and those findings attach to different defendants, different damages, and different regulatory consequences.
Accidents happen when holes in layers of defence align, not when one person makes one mistake.
James Reason, then Professor of Psychology at the University of Manchester, published his influential Human Error in 1990 and refined the Swiss Cheese Model through the 1990s. The model has been adopted in aviation (ICAO Annex 19 SMS framework), healthcare, nuclear power, and the offshore oil industry as the conceptual foundation for systemic accident investigation.
Reason distinguishes between two broad failure categories. Active failures are the unsafe acts committed by front-line workers: the operator who failed to close a valve, the pilot who misread the altitude, the nurse who gave the wrong dosage. These are visible and immediate, and they are almost always what early-stage investigations focus on. Latent conditions are the hidden failures that incubate within the organisation: a maintenance system that deferred inspection, a training programme that never covered an unusual scenario, a risk register that categorised a known hazard as acceptable because no one had been hurt yet.
The practical implication is that disciplining or retraining the front-line operator who committed the active failure is the least effective corrective action available. That individual will be replaced or retrained, but the latent conditions remain for the next operator in the same situation. Sustainable improvement requires identifying and closing the holes in the upstream layers: the design, the procedure, the training, the supervision, and the management culture that tolerated the precursor signals.
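The argument above can be made concrete with a toy calculation. Reason's model is qualitative, so everything numeric here is an assumption: the layer names, the per-demand failure probabilities, and above all the treatment of layers as statistically independent (real latent conditions often weaken several layers at once). This is a minimal sketch of the reasoning, not a validated risk model.

```python
from math import prod

def breach_probability(hole_probs):
    """Chance that a hazard passes through every layer, assuming the
    layers fail independently (a simplifying assumption)."""
    return prod(hole_probs)

# Hypothetical per-demand failure probabilities for four layers of defence.
nominal = {"design": 0.01, "procedure": 0.05, "supervision": 0.05, "operator": 0.10}

# Latent conditions widen the holes in two upstream layers.
degraded = {**nominal, "procedure": 0.5, "supervision": 0.5}

# Corrective action aimed only at the front line: operator error rate halved.
operator_fixed = {**degraded, "operator": 0.05}

# Corrective action aimed upstream: the latent conditions are closed.
upstream_fixed = {**degraded, "procedure": 0.05, "supervision": 0.05}

scenarios = [
    ("nominal", nominal),
    ("degraded", degraded),
    ("operator fixed", operator_fixed),
    ("upstream fixed", upstream_fixed),
]
for label, layers in scenarios:
    print(f"{label:>15}: {breach_probability(layers.values()):.2e}")
```

With these made-up numbers, halving the operator's error rate still leaves the breach probability about fifty times the nominal value, while closing the upstream holes returns it to nominal. The specific ratio means nothing; the direction of the comparison is the point.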
A bow-tie shows, on one diagram, what stops an accident happening and what limits the harm when prevention fails.
Bow-tie analysis emerged from the Shell and ICI major-hazard process-safety tradition in the 1980s and was formalised as a risk management tool by the Energy Institute, DNV, and others in the 2000s. Its structure is deliberate: the knot at the centre of the bow-tie is the critical event, the single most important moment to prevent. The left half shows the threat pathways and the prevention barriers. The right half shows the consequence pathways and the mitigation barriers.
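The bow-tie structure maps naturally onto a small data model. The sketch below is one possible representation, not a standard schema: the class names `Barrier`, `Pathway`, and `BowTie`, the `effective` flag, and the example threats and barriers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Barrier:
    name: str
    effective: bool  # did the evidence show this barrier worked on demand?

@dataclass
class Pathway:
    label: str  # a threat (left side) or a consequence (right side)
    barriers: list = field(default_factory=list)

    def is_open(self):
        """A pathway is open when none of its barriers was effective."""
        return not any(b.effective for b in self.barriers)

@dataclass
class BowTie:
    critical_event: str  # the knot at the centre
    threats: list = field(default_factory=list)       # prevention side
    consequences: list = field(default_factory=list)  # mitigation side

    def open_pathways(self):
        """List the pathways on each side with no effective barrier."""
        return {
            "prevention": [p.label for p in self.threats if p.is_open()],
            "mitigation": [p.label for p in self.consequences if p.is_open()],
        }

# Hypothetical example, not drawn from any real investigation.
bowtie = BowTie(
    critical_event="loss of containment",
    threats=[
        Pathway("corrosion", [Barrier("inspection programme", False),
                              Barrier("protective coating", False)]),
        Pathway("overpressure", [Barrier("relief valve", True)]),
    ],
    consequences=[
        Pathway("fire", [Barrier("gas detection", False),
                         Barrier("deluge system", True)]),
        Pathway("environmental release", [Barrier("secondary containment", False)]),
    ],
)
print(bowtie.open_pathways())
```

In an investigation, the open pathways are the ones where the analysis turns to why every barrier on that route had degraded at the same time.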
An SMS is only as good as the management attention that keeps it alive.
A complete SMS is most often described using the four components set out in the ICAO Annex 19 framework for aviation: safety policy and objectives, safety risk management, safety assurance, and safety promotion. Other regimes, including the International Safety Management Code (ISM Code) for shipping, OSHA's Process Safety Management standard (29 CFR 1910.119) for process industries, and industry frameworks such as API RP 75 for offshore oil, organise their requirements differently but can broadly be mapped onto the same four functional pillars.
| SMS pillar | What it requires | Common failure mode in accident investigations |
|---|---|---|
| Safety policy and objectives | Clear accountability, measurable safety targets, management commitment | Production pressure overrides stated safety commitments; stop-work authority not exercised |
| Safety risk management | Hazard identification, risk assessment, barrier management, change management | Process changes not reassessed; MOC (management of change) bypassed under schedule pressure |
| Safety assurance | Monitoring, auditing, incident reporting, performance measurement | Near-miss reports filed and closed without root-cause investigation; audit findings deferred |
| Safety promotion | Training, competency, communication, safety culture development | Competency gaps not closed; safety culture assessments show fear of reprisal for raising concerns |
The most common systemic failure pattern the forensic engineer encounters is the SMS that looks complete on paper and is hollow in practice. A company can have a beautiful Process Safety Management manual, a full set of procedure documents, a completed risk register, and a functioning audit schedule, and still produce a catastrophic accident, because none of those documents were treated as living controls that actually governed decisions under production pressure. The distance between the written SMS and the practiced SMS is where most organisational accidents live.
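One simple way to put a number on the gap between the written and the practiced SMS is to tabulate how audit findings were actually handled, pillar by pillar. The sketch below assumes a hypothetical record format of (pillar, status) pairs and made-up data; a real analysis would work from the organisation's own dated audit and corrective-action records.

```python
from collections import Counter

# Hypothetical audit-finding records: (SMS pillar, status).
findings = [
    ("safety risk management", "closed"),
    ("safety risk management", "deferred"),
    ("safety assurance", "deferred"),
    ("safety assurance", "deferred"),
    ("safety promotion", "closed"),
]

def deferred_share(records):
    """Fraction of findings per pillar that were deferred rather than closed."""
    total = Counter(pillar for pillar, _ in records)
    deferred = Counter(pillar for pillar, status in records if status == "deferred")
    # Counter returns 0 for pillars with no deferred findings.
    return {pillar: deferred[pillar] / total[pillar] for pillar in total}

shares = deferred_share(findings)
for pillar, share in shares.items():
    print(f"{pillar}: {share:.0%} of findings deferred")
```

A high deferred share does not prove a hollow SMS on its own, but it tells the investigator which pillar's records to read first.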
The same act can be individual negligence, organisational negligence, or both, and the difference determines who pays and who is prosecuted.
Where corporate manslaughter, corporate homicide, or equivalent offences exist, they typically require proof that a senior management failure (not just a front-line worker's error) caused the death. The UK Corporate Manslaughter and Corporate Homicide Act 2007 requires a 'gross breach of a relevant duty of care' and makes the organisation liable only if 'the way in which its activities are managed or organised by its senior management' is a substantial element in that breach. Demonstrating that breach requires exactly the kind of SMS gap analysis that Reason's framework enables.
In the United States, criminal prosecution for major industrial accidents typically runs through a mix of environmental statutes (Clean Water Act, Clean Air Act) and workplace safety regulations (OSHA) rather than a corporate manslaughter framework. Civil liability under US law allows plaintiffs to reach senior management and the corporation through theories of negligent supervision, failure to maintain a safe workplace, and negligent SMS design or execution.
Every SMS failure pattern described in this topic appeared in the 2010 Gulf of Mexico blowout.
The Bly Report (BP's internal investigation), the Presidential Commission report, and the US Chemical Safety Board's investigation of the Macondo well blowout together constitute one of the most thoroughly documented industrial SMS failures in history. The primary physical cause was a failure of the cement job on the production casing, which allowed hydrocarbons to migrate up the wellbore. But the cement failure was one hole in one layer. What made the accident catastrophic was the alignment of holes in every other defensive layer.
The Deepwater Horizon investigation is a masterclass in how the same set of facts can be framed as individual human error (the crew misread the negative pressure test) or as systemic SMS failure (the company had no clear procedure for interpreting the test, a culture of schedule pressure that discouraged raising concerns, and multiple previous warning signals that were not investigated to root cause). Both framings are factually supported. BP ultimately paid more than 65 billion US dollars in fines, damages, and cleanup costs, a figure that reflects both the direct physical harm and the depth of the organisational failure.
Translating Swiss Cheese and bow-tie findings into language a judge and jury can work with.
The challenge in presenting systemic failure evidence is making the connection between abstract organisational concepts and specific acts of commission or omission that a court can evaluate. An expert who testifies that 'the safety culture was deficient' without pointing to specific, documented, dated evidence will be attacked on cross-examination as offering impressionistic opinion. The same analysis grounded in SMS audit records, management review minutes, incident reports, and deferred corrective actions is far more durable.
The sequence that survives cross-examination typically runs: (1) state what the defendant's SMS required, citing the specific SMS document and section; (2) identify the specific evidence that this requirement was not met, citing dated records; (3) show how this gap produced the specific conditions that led to the accident, using the barrier analysis or bow-tie as the connective structure; (4) state what a competent operator in the same industry would have done, citing industry standards and comparator evidence if available. Each step is document-based and falsifiable, which is the kind of testable reasoning the Daubert reliability inquiry in US federal courts looks for.
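The four-step sequence above can be treated as a checklist in which every step must carry a citation. The sketch below is one hypothetical way to structure that check; the class names `Step` and `Finding`, the field names, and the placeholder sources are all invented for illustration and do not come from any court rule or standard.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    statement: str
    source: Optional[str] = None  # document and section, or a dated record

@dataclass
class Finding:
    requirement: Step  # (1) what the SMS required
    breach: Step       # (2) evidence the requirement was not met
    causation: Step    # (3) how the gap led to the accident
    benchmark: Step    # (4) what a competent operator would have done

    def unsupported_steps(self):
        """Return the names of steps that cite no source."""
        steps = {
            "requirement": self.requirement,
            "breach": self.breach,
            "causation": self.causation,
            "benchmark": self.benchmark,
        }
        return [name for name, step in steps.items() if not step.source]

# Hypothetical finding with placeholder citations.
finding = Finding(
    requirement=Step("Changes to procedure require a documented risk review.",
                     source="SMS manual, change-management section (placeholder)"),
    breach=Step("No risk review was recorded for the revised procedure.",
                source="change log entry, dated (placeholder)"),
    causation=Step("The unreviewed change removed a prevention barrier."),
    benchmark=Step("Industry practice requires review before implementation.",
                   source="industry standard (placeholder)"),
)
print(finding.unsupported_steps())
```

Any step returned by `unsupported_steps` is the part of the opinion most exposed on cross-examination, because it rests on the expert's assertion instead of a record.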