How engineers distinguish a unit that deviated from its own approved design specification from one built correctly to a flawed design, with quality-management tools and the Therac-25 software defect as a case study.
Two products roll off the same assembly line. The engineer approved the design, the materials were specified, the tolerances were set. One product works exactly as intended. The other fractures at a load well below what the design should handle, injures someone, and ends up in a courtroom. The design is sound; the engineering drawings specify a good product. But this particular unit was not built to those drawings. A fastener torque was too low. A weld did not fully fuse. A heat-treatment cycle was interrupted. This is a manufacturing defect, and it is analytically distinct from a design defect in ways that change the entire investigation.
Establishing a manufacturing defect requires evidence. Not just the broken part in front of you, but the specification the manufacturer approved, the process records that should show how the unit was made, the quality checks that should have caught any deviation, and the inspection data that would confirm or deny that the system worked. This is where the quality management infrastructure of a modern manufacturer becomes central to the litigation.
Software makes all of this harder. Software has no dimensions to measure, no microstructure to examine under a scanning electron microscope (SEM), no torque value to compare against a specification. Yet software defects can be manufacturing defects in the legal sense, as the Therac-25 radiation machine demonstrated with lethal clarity in the mid-1980s. A development process that let a race condition reach a released version of safety-critical code was a failure of the quality system that governed software development, and its consequences were overdoses that killed or seriously injured patients who had come for cancer treatment.
The question is not whether the design is safe. The question is whether this unit matched the design.
A manufacturing defect is, by definition, a deviation from the manufacturer's own specification. The engineer's task is to show two things in sequence. First, what was the specification? Second, how did this unit depart from it?
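The two-step comparison can be sketched in a few lines of Python. This is a minimal illustration, not a procedure from any standard: the characteristic names, units, and limits below are hypothetical, and a real specification would come from the approved drawings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpecLimit:
    """One requirement from the approved specification (hypothetical values)."""
    name: str
    unit: str
    lower: Optional[float] = None  # None means no lower bound
    upper: Optional[float] = None  # None means no upper bound


def find_deviations(spec, measured):
    """Return (name, reason) for each characteristic that is unmeasured or out of limits."""
    deviations = []
    for limit in spec:
        if limit.name not in measured:
            deviations.append((limit.name, "not measured"))
            continue
        value = measured[limit.name]
        if limit.lower is not None and value < limit.lower:
            deviations.append((limit.name, f"{value} {limit.unit} below minimum {limit.lower}"))
        elif limit.upper is not None and value > limit.upper:
            deviations.append((limit.name, f"{value} {limit.unit} above maximum {limit.upper}"))
    return deviations


# Step 1: what was the specification? (illustrative numbers only)
spec = [
    SpecLimit("fastener_torque", "N·m", lower=45.0, upper=55.0),
    SpecLimit("hardness", "HRC", lower=28.0, upper=34.0),
    SpecLimit("weld_throat", "mm", lower=4.0),
]

# Step 2: how did this unit depart from it?
measured = {"fastener_torque": 38.5, "hardness": 31.0}

deviations = find_deviations(spec, measured)
for name, reason in deviations:
    print(f"{name}: {reason}")
```

Reporting an unmeasured characteristic separately from an out-of-limit one keeps a gap in the evidence from being read as conformance.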
A factory's own monitoring data is often the most powerful evidence in the case.
Modern manufacturing processes generate substantial records. Heat treatment charts record the temperature profile and time of every batch. Welding logs capture current, voltage, and wire feed rate. CMM printouts record measured dimensions of each part. Torque wrench logs show the setting used on each assembly shift. These records are the factual backbone of any manufacturing defect investigation, and they are obtained through discovery or regulatory disclosure.
Statistical Process Control (SPC) records are particularly valuable. A manufacturer running SPC will have control charts plotting a key process variable over time. The rules for these charts, which originated with Walter Shewhart, were popularised by W. Edwards Deming, and are now standardised in the ISO 7870 series, identify signals that the process has moved outside its natural variation. If the control chart for the relevant process shows an out-of-control signal on or around the date the plaintiff's unit was manufactured, it is strong evidence that the process was unstable at that time. Control limits are calculated from the process's own data and are not the same as specification limits, so the engineer must still connect the signal to an out-of-specification condition in the unit itself.
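As a rough illustration of how such signals are detected, the sketch below builds an individuals chart from a baseline period and applies two commonly used rules: a point beyond the 3-sigma control limits, and a run of eight consecutive points on one side of the centre line. The temperature readings are invented, and the choice of rules and run length is an assumption; a manufacturer's own SPC procedure defines which rules actually apply.

```python
from statistics import mean


def individuals_limits(baseline):
    """Centre line and 3-sigma limits for an individuals chart, from baseline data."""
    centre = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / 1.128  # d2 constant for moving ranges of size 2
    return centre - 3 * sigma_hat, centre, centre + 3 * sigma_hat


def out_of_control_signals(values, lcl, centre, ucl, run_length=8):
    """Return (index, rule) for each point that triggers one of the two rules."""
    signals = []
    run_side, run_count = 0, 0
    for i, x in enumerate(values):
        if x > ucl or x < lcl:
            signals.append((i, "beyond control limit"))
        side = (x > centre) - (x < centre)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side, run_count = side, (1 if side != 0 else 0)
        if run_count == run_length:
            signals.append((i, f"run of {run_length} on one side of centre"))
    return signals


# Hypothetical furnace temperatures (deg C) from a period accepted as in control.
baseline = [850, 852, 849, 851, 850, 848, 851, 849, 850, 852]
lcl, centre, ucl = individuals_limits(baseline)

# Hypothetical readings around the production date of the unit in question.
production = [851, 850, 849, 857, 850]
signals = out_of_control_signals(production, lcl, centre, ucl)
print(signals)
```

The limits here come entirely from the baseline readings, which is why a flagged point shows instability rather than, by itself, a breach of the drawing tolerance.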
When records are missing or incomplete, courts in some jurisdictions have drawn adverse inferences. A manufacturer that is required by its own quality plan to retain SPC data for five years but cannot produce the records for the relevant period must explain why. The absence of records that should exist is itself a fact that the engineer should flag as a finding, not simply note as a limitation.
Certification means a system existed. The investigation determines whether it worked.
ISO 9001 is the world's most widely adopted quality management standard. Its current version (ISO 9001:2015) requires organisations to establish documented processes for design, production, measurement, customer complaint handling, corrective action, and management review. Certification by an accredited body confirms that an auditor found the system in place and operating; it does not guarantee that every product leaving the factory is defect-free.
| ISO 9001 element | Litigation relevance | What the engineer looks for |
|---|---|---|
| Design controls (Clause 8.3) | Establishes the approved design specification that defines conformance for manufacturing defect analysis | Design review records, approved drawings, change history, FMEA outputs |
| Production process controls (Clause 8.5) | Shows what process parameters were required and monitored during manufacture | Work instructions, process sheets, monitoring records, SPC charts |
| Measurement and calibration (Clause 7.1.5) | Ensures inspection instruments were accurate when used to pass the plaintiff's unit | Calibration certificates for measurement equipment used on the production date |
| Nonconformance control (Clause 8.7) | Records instances where the process produced out-of-spec product | Nonconformance reports (NCRs), disposition records, concessions and waivers |
| Corrective action (Clause 10.2) | Documents how previously identified defects were addressed | Prior complaints similar to the plaintiff's failure; closed vs. open corrective actions |
In practice, the engineer reviews the quality management system documentation not to evaluate the system in the abstract but to identify what records should exist and then determine whether those records for the plaintiff's unit are present, complete, and consistent with the claimed specification compliance.
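That review amounts to comparing a list of records that should exist against what was actually produced. A minimal sketch follows, assuming a hypothetical list of required record types and a hypothetical production date; each produced record is represented only by the period it covers.

```python
from datetime import date

# Hypothetical record types a quality plan might require for one production lot.
REQUIRED_RECORDS = [
    "work_instruction",
    "heat_treat_chart",
    "spc_chart",
    "calibration_certificate",
    "final_inspection",
]


def audit_records(produced, production_date):
    """Classify each required record type against the production date."""
    findings = {}
    for record_type in REQUIRED_RECORDS:
        periods = produced.get(record_type, [])
        if not periods:
            findings[record_type] = "missing"
        elif any(start <= production_date <= end for start, end in periods):
            findings[record_type] = "present"
        else:
            findings[record_type] = "does not cover production date"
    return findings


production_date = date(2021, 3, 15)

# Records produced in disclosure, as (start, end) coverage periods. Invented data.
produced = {
    "work_instruction": [(date(2020, 6, 1), date(2022, 5, 31))],
    "heat_treat_chart": [(date(2021, 3, 15), date(2021, 3, 15))],
    "calibration_certificate": [(date(2020, 1, 10), date(2021, 1, 9))],
    "final_inspection": [(date(2021, 3, 15), date(2021, 3, 15))],
}

findings = audit_records(produced, production_date)
for record_type, status in findings.items():
    print(f"{record_type}: {status}")
```

In this invented example the SPC chart is missing outright and the calibration certificate had lapsed before the production date, two different findings that call for different follow-up questions.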
Removing hardware safety interlocks and trusting software alone, a decision with fatal consequences.
The Therac-25 was a computer-controlled linear accelerator built by Atomic Energy of Canada Limited (AECL) and introduced in 1982 for radiation therapy. It was designed to operate in two modes: a low-energy electron mode and a high-energy X-ray mode that required a metal target and a beam flattener to be positioned in the beam path. The Therac-25 differed from its predecessors (the Therac-6 and Therac-20) in a critical way: the hardware interlocks that physically prevented the high-energy beam from firing when the target was absent were removed, with software checks substituted instead.
Between June 1985 and January 1987, at least six patients in Canada and the United States received massive radiation overdoses from Therac-25 machines. Three died. The injuries (severe radiation burns, neurological damage, and in some cases death) resulted from the machine delivering its high-energy beam in electron mode, without the target assembly in place, at doses orders of magnitude above the therapeutic level. The patients experienced what felt like electric shocks and reported intense heat; many did not understand what had happened to them until days or weeks later, when radiation damage became apparent.
The root cause, meticulously documented by Nancy Leveson and Clark Turner in their 1993 case study in IEEE Computer, "An Investigation of the Therac-25 Accidents," was a race condition in the software controlling the machine. In their account, the data-entry routine (Datent) and the tasks that configured the machine communicated through shared variables, including a flag signalling that data entry was complete. When a radiotherapy technician entered a treatment mode and then edited it within a narrow time window, the edit could go unregistered because that flag had already been set, leaving the machine configured for a prescription that no longer matched what the screen showed. The safety check then passed on stale data, and the beam fired at high energy with the target absent.
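The stale-data pattern can be shown with a toy model. The sketch below is not the Therac-25 code and does not reproduce its structure: every name is hypothetical, and the task interleaving is written out as a fixed sequence of steps rather than run on real threads, so that the unlucky ordering is reproducible.

```python
class Machine:
    """Toy shared state between an operator-input task and a setup task."""

    def __init__(self):
        self.requested_mode = None    # what the operator's screen shows
        self.configured_mode = None   # what the hardware was actually set up for
        self.entry_complete = False   # shared flag consulted by the safety check


def enter(machine, mode):
    machine.requested_mode = mode
    machine.entry_complete = True


def edit(machine, mode, invalidates_setup):
    machine.requested_mode = mode
    if invalidates_setup:
        machine.entry_complete = False  # forces setup to be repeated before firing


def run_interleaving(edit_invalidates_setup):
    """Replay one unlucky ordering: the edit lands while setup is still in progress."""
    m = Machine()
    enter(m, "xray")
    snapshot = m.requested_mode                  # setup task reads the mode, then takes time
    edit(m, "electron", edit_invalidates_setup)  # operator edits inside the setup window
    m.configured_mode = snapshot                 # setup finishes using the value read earlier
    fired = m.entry_complete                     # safety check looks only at the flag
    return fired, m.configured_mode, m.requested_mode


buggy = run_interleaving(edit_invalidates_setup=False)
guarded = run_interleaving(edit_invalidates_setup=True)
print("buggy:", buggy)
print("guarded:", guarded)
```

In the first run the check passes even though the configured mode no longer matches the request; in the second, the edit clears the flag and the check refuses to fire. Timing-dependent faults like this appear only under particular orderings, which is why ordinary functional testing can miss them.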
The Therac-25 case is used in software engineering education worldwide because it illustrates several quality system failures simultaneously. The race condition itself was a software manufacturing defect in the sense that it was a deviation from correct behaviour that the design intent required. But surrounding it were deeper quality system failures: the reliance on a single software check in place of independent hardware interlocks; the absence of adequate testing for timing-dependent failure modes; an error-reporting system that displayed cryptic codes rather than actionable alarms; and an initial investigation process that attributed the first injuries to operator error rather than machine malfunction, delaying identification of the systemic problem.
The engineer's job is to translate quality-system evidence into the deviation-from-specification narrative the court needs.
Quality management records are technically dense and procedurally unfamiliar to non-engineers. The forensic engineer's report must translate them into a clear narrative: this is what the specification required; these are the records that should document compliance; here is the specific record that shows non-compliance; and here is how that non-compliance caused the failure that harmed the plaintiff.
For software, the equivalents of process control records are version control logs, test results, code review records, and defect tracking system entries. The question is whether the software release process included adequate testing for timing-dependent and concurrency failure modes, and whether the scope of testing was appropriate for a safety-critical application where the consequences of a software error were lethal.
Getting the distinction right early determines what evidence matters and what does not.
In many cases the distinction between manufacturing and design defect is blurred at the outset of the investigation. The engineer should resist reaching a conclusion before examining both the failed unit and the specification. A useful initial diagnostic is to ask: if we built another unit exactly to the drawings and tested it under the same conditions, would it fail the same way?
| Indicator | Points toward manufacturing defect | Points toward design defect |
|---|---|---|
| Conforming exemplars tested | Exemplars do not replicate failure under same conditions | Exemplars replicate the failure under similar conditions |
| Physical examination | Specific local deviation found (weld underrun, wrong hardness, missing feature) | No deviation from specification; unit conforms throughout |
| Field failure rate | Isolated failure; no pattern in product line | Pattern of failures across multiple units from same design |
| Prior complaints | No prior complaints for this failure mode | Prior complaints about the same failure mode |
| Design standard review | Product design meets applicable standards | Product design falls below applicable standards, or a safer alternative design (SAD) exists |
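The indicators in the table can be tracked as a simple tally while the investigation is still open. The sketch below is only a working aid under assumed indicator names; it weights every indicator equally, which a real analysis would not, and it does not replace engineering judgment about what each finding means.

```python
# Each entry maps an indicator to (theory if observed True, theory if observed False).
# Indicator names are hypothetical labels for the rows of the table above.
INDICATORS = {
    "exemplars_replicate_failure": ("design", "manufacturing"),
    "local_deviation_found": ("manufacturing", "design"),
    "failure_pattern_across_units": ("design", "manufacturing"),
    "prior_complaints_same_mode": ("design", "manufacturing"),
    "design_meets_standards": ("manufacturing", "design"),
}


def tally(observations):
    """Count how many indicators point toward each theory; None means not yet known."""
    counts = {"manufacturing": 0, "design": 0, "unknown": 0}
    for name, (if_true, if_false) in INDICATORS.items():
        observed = observations.get(name)
        if observed is None:
            counts["unknown"] += 1
        else:
            counts[if_true if observed else if_false] += 1
    return counts


# Invented interim findings: complaint history has not been reviewed yet.
observations = {
    "exemplars_replicate_failure": False,
    "local_deviation_found": True,
    "failure_pattern_across_units": False,
    "design_meets_standards": True,
}

counts = tally(observations)
print(counts)
```

Keeping an explicit "unknown" count makes it visible when a conclusion is being drawn before all the indicators have been examined.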
In the Therac-25 context, the initial attribution of injuries to operator error is a classic example of prematurely ruling out a manufacturing (software quality) defect. The first technicians to encounter the malfunction reported the error to AECL and were told the machine passed all safety checks. It was only when multiple independent incidents occurred at different facilities that the pattern of a systemic quality failure became undeniable.
A machine part is tested and found to have a hardness significantly above the specification maximum. Which defect theory is most directly applicable?