Cognitive Bias, Expert Testimony and the 2009 NAS Critique

The discipline-shaping critique and its lasting impact: the treatment of fire investigation in the 2009 NAS report 'Strengthening Forensic Science in the United States' (the lack of empirical foundation for several pattern-based origin-determination conclusions, the high error rates revealed in proficiency testing, the call for population-frequency anchoring of opinions); the structural responses of sequential unmasking, linear ACE-V, and blind verification; the courtroom presentation of probability statements and Bayesian reasoning; expert-witness gatekeeping under Daubert, Frye, BSA 2023, CrimPR, and the Cairns checklist; and the modern best-practice manuals that have responded.

Last updated: 18 Jun 2026

The 2009 National Academy of Sciences report "Strengthening Forensic Science in the United States" found that core fire investigation pattern indicators, including alligator charring, low burn lines, and concrete spalling, lacked controlled experimental validation and that examiner proficiency was not systematically tested. Subsequent UL-NIST blind studies found error rates above 20 per cent in some rounds. The report directly triggered Daubert admissibility challenges in US courts and accelerated structural reforms including sequential unmasking, blind verification, and evaluative probability reporting. These reforms have partially but not completely closed the evidentiary gap the NAS identified.

In 2009, the US National Academy of Sciences published "Strengthening Forensic Science in the United States: A Path Forward." Its discussion of fire investigation documented that pattern-based indicators used to classify fires as incendiary lacked empirical foundation, had never been validated against controlled experimental fires, and were applied by examiners whose proficiency was not systematically tested. The report drew a clear line: fire investigation before 2009 operated largely on untested heuristics, and the post-2009 discipline has been working to close that gap.

Key takeaways

The NAS found that alligator charring, low burn lines, and concrete spalling had never been tested for specificity in controlled accidental fires; the UL-NIST blind studies (2011) found examiner error rates above 20 per cent in some rounds.
The legal consequence runs through the Daubert error-rate factor: without specificity data, the false-positive rate for incendiary classification is unknown, and federal courts can exclude the opinion.
Itiel Dror's UCL research on fingerprint examiners showed that the same evidence can be interpreted differently by the same experienced examiner depending on the case context provided. The mechanism is domain-general and applies equally to fire pattern interpretation, establishing that bias is structural, not personal.
Sequential unmasking addresses contextual bias by requiring all physical scene observations to be documented before the examiner receives police intelligence, witness accounts, or financial background on suspects.
In the Cameron Todd Willingham case, a review for the Texas Forensic Science Commission in 2009 found that the pattern indicators used to support the original arson conviction were inconsistent with modern fire science under NFPA 921.

This topic traces the NAS critique, its evidentiary consequences, and the structural responses that have partially but not completely closed the gap. For the accreditation and proficiency-testing frameworks the NAS called for, see the topic on quality systems: ISO 17025, NABL, ANAB and UKAS.

By the end of this topic you will be able to:

Explain the four core findings of the 2009 NAS report on fire investigation: the empirical-foundation gap, the Daubert error-rate problem, the cognitive-bias problem, and the proficiency-testing absence.
Describe the three main cognitive bias mechanisms in fire investigation (contextual bias, anchoring bias, expectation bias) and explain why structural rather than motivational remedies are required.
Compare sequential unmasking, blind verification, and linear ACE-V as structural responses to contextual contamination, including their operational constraints in field versus laboratory settings.
Explain the Bayesian likelihood ratio framework for evaluative reporting and identify why population-level specificity data limits its application to field fire pattern evidence.
Compare Daubert, Frye, CrimPR Part 19, and BSA 2023 Section 39 as expert-evidence gatekeeping mechanisms and assess their respective capacity to screen unreliable fire investigation testimony.

The 2009 NAS Report: Core Findings on Fire Investigation

The NAS committee, co-chaired by Harry T. Edwards (senior judge, US Court of Appeals for the DC Circuit) and Constantine Gatsonis (biostatistician, Brown University), reviewed fire investigation alongside fingerprints, bite marks, hair analysis, and other disciplines. Its fire-investigation findings, reported in Chapter 5 with the other discipline-by-discipline assessments, raised four categories of concern.

First, the empirical foundation problem. Many pattern indicators used to determine fire origin, including the shape of the burn pattern at the presumed area of origin, alligator charring depth and surface texture, low burn lines running along floor surfaces, and the distribution of calcination on gypsum wallboard, had been applied as heuristics derived from the personal experience of trained investigators. The NAS found that few of these indicators had been subjected to controlled experimental testing to determine their specificity: that is, how often they also occur in accidental fires with no added accelerant. The UL research published between 2004 and 2011, in which NFPA 921-trained investigators blind-interpreted fire scene photographs from both arson and accidental fires, found error rates that were significantly higher than the near-zero error rates implied by examiner confidence. In some rounds, examiners disagreed with each other and with the known reference classification at rates above 20 per cent.

Second, the error-rate problem for Daubert. The NAS report explicitly framed this as a Daubert admissibility issue. Under Daubert v. Merrell Dow Pharmaceuticals (1993), a federal trial judge deciding whether to admit expert testimony considers, among other factors, the method's known or potential error rate. If no controlled studies had been conducted to determine the false-positive rate of inferring deliberate fire-setting from pattern indicators alone, that factor could not be satisfied, and fire investigation testimony built on those indicators was legally vulnerable however experienced the expert was.

Third, the cognitive-bias problem. The NAS noted that fire investigators in most jurisdictions collected scene evidence, interviewed witnesses, reviewed police intelligence about the suspect, and then formed an opinion in the same investigation. This structure creates confirmation bias: an investigator who is told before examining the scene that the building owner is in financial difficulty and has recently increased his insurance will interpret ambiguous burn patterns through that frame. The NAS called for separation of the scene-observation function from the opinion-formation function, and for external check mechanisms.

Fourth, the proficiency-testing problem. The NAS found that proficiency testing for fire investigators was neither mandatory nor systematically conducted. The CTS fire debris scheme (for laboratory analysts) existed, but field fire investigation proficiency testing (where investigators examine test scenes and have their origin-and-cause conclusions checked against known reference) was rarely applied. This meant that error rates could not be empirically measured even if the discipline had wanted to report them.

The Science Behind the Critique: What Controlled Experiments Showed

The NIST experiments most directly relevant to the NAS fire critique were conducted at the Bureau of Alcohol, Tobacco, Firearms and Explosives (ATF) Fire Research Laboratory in Maryland and published through the NIJ grant programme. John DeHaan's laboratory research on burn pattern formation in furnished rooms, published progressively from the early 1990s through the 2000s, demonstrated that patterns previously attributed exclusively to accelerant use could be reproduced in ordinary accidental fires given the right fuel load and ventilation conditions. DeHaan's work formed part of the empirical basis for successive editions of NFPA 921.

The Underwriters Laboratories project, a collaboration between UL's Fire Safety Research Institute and NIST, ran a series of controlled arson-vs-accident test fires in which trained investigators (working under NFPA 921) were blinded to the actual cause and asked to classify each scene. Results published in 2011 showed that even experienced NFPA 921-trained investigators produced false-positive arson classifications in a material fraction of accidental test fires, and false-negative classifications (missing deliberate fires) in a smaller fraction of incendiary test fires. These error rates, while lower than the pre-NFPA 921 era, confirmed that pattern-only origin determination carried residual uncertainty that was not being disclosed to courts.
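The false-positive and false-negative rates discussed above are simple proportions once a blind study has known ground truth. The following is a minimal Python sketch of that arithmetic. The `BlindStudyCounts` record and every count in it are hypothetical and chosen only for illustration; they are not the published study figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlindStudyCounts:
    """Hypothetical tallies from a blind study with known ground truth."""
    accidental_called_incendiary: int  # false positives
    accidental_called_accidental: int  # true negatives
    incendiary_called_accidental: int  # false negatives
    incendiary_called_incendiary: int  # true positives


def false_positive_rate(c: BlindStudyCounts) -> float:
    """Share of known-accidental test fires wrongly classified as incendiary."""
    total = c.accidental_called_incendiary + c.accidental_called_accidental
    if total == 0:
        raise ValueError("no accidental test fires in the study")
    return c.accidental_called_incendiary / total


def false_negative_rate(c: BlindStudyCounts) -> float:
    """Share of known-incendiary test fires wrongly classified as accidental."""
    total = c.incendiary_called_accidental + c.incendiary_called_incendiary
    if total == 0:
        raise ValueError("no incendiary test fires in the study")
    return c.incendiary_called_accidental / total


# Invented counts for illustration only.
example = BlindStudyCounts(
    accidental_called_incendiary=12,
    accidental_called_accidental=48,
    incendiary_called_accidental=3,
    incendiary_called_incendiary=37,
)
print(f"false-positive rate: {false_positive_rate(example):.1%}")
print(f"false-negative rate: {false_negative_rate(example):.1%}")
```

Keeping the two rates separate matters for the legal argument: the false-positive rate is the one that bears on a wrongful incendiary classification, and it can only be computed when the study includes known-accidental fires.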

The BRE (Building Research Establishment) in the UK conducted parallel research on fire patterns in UK residential construction, which differs from US residential construction in materials (brick and block construction is more common in the UK, timber-frame more common in the US). BRE findings, published through the Home Office Scientific Development Branch and the Journal of Fire Sciences, similarly found that ventilation-controlled burning produced floor-level burn patterns and deep char deposits that could be mistaken for accelerant indicators under older interpretive frameworks.

In India, the National Accreditation Board for Testing and Calibration Laboratories (NABL) and the Central Forensic Science Laboratory network have not, as of 2025, sponsored published controlled-experiment validation studies on fire pattern indicators in Indian construction types (which include concrete-framed multistorey construction, unreinforced masonry, and vernacular building types absent from US and UK research). This represents a gap: the NAS critique was anchored to US construction and US investigative practice, and its direct applicability to Indian scene patterns in reinforced concrete buildings has not been empirically tested.

Error-rate cascade in pattern-based fire investigation: at each step from scene observation to courtroom opinion, unverified heuristics introduce cumulative uncertainty that no single validation study has yet fully quantified across all construction types.

Cognitive Bias Mechanisms in Fire Investigation

Research in cognitive psychology applied to forensic science, led principally by Itiel Dror at University College London and collaborators including Jennifer Mnookin, Simon Cole, and scholars affiliated with the Innocence Project, has identified several bias mechanisms that apply specifically to fire investigation.

Contextual bias (also called task-irrelevant contextual information bias) occurs when information about the case context available to the examiner before or during analysis influences the direction of the result. In fire investigation, the most common form is the investigating officer briefing the fire examiner on suspected motive before the scene examination begins. An examiner who is told that the property owner has a history of insurance fraud and has been seen near the scene in the hours before the fire will approach the scene with a hypothesis that is not purely evidence-driven. Ambiguous patterns, including char depths that could represent either accelerant use or extended burning, will be interpreted in the direction of the pre-existing hypothesis.

Anchoring bias occurs when an investigator's initial judgment about origin or cause, formed early in the scene examination, becomes resistant to revision as conflicting evidence emerges later in the examination. Fire scenes are excavated progressively, and evidence about origin typically appears at different times. An examiner who concludes from early surface-level burn patterns that the fire started in a particular corner may resist revising that conclusion when deeper excavation reveals an electrical fault in an adjacent location.

Expectation bias (observer effect) occurs when the examiner knows what a previous examiner found and allows that finding to constrain their own independent analysis. In multi-examiner cases (for example, where an insurance investigator precedes a police investigator), knowledge of the earlier finding predisposes the second examiner to confirm rather than independently evaluate.

Itiel Dror's controlled studies on fingerprint examiners, published from 2006 onwards, demonstrated that the same fingerprint could be classified differently by the same examiner depending on the contextual information provided, even when the actual print had not changed. Although these studies focused on fingerprints, the underlying bias mechanism is domain-general, and subsequent research by Charman and colleagues extended the finding to fire investigation contexts specifically.

Three cognitive bias mechanisms in fire investigation, their trigger conditions, and the structural remedy each requires: contextual bias is addressed by sequential unmasking, anchoring bias by staged documentation before excavation proceeds, and expectation bias by genuinely blind re-examination.
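Sequential unmasking is a procedural rule, so it can be expressed as a small state machine: context cannot be released until the scene observations are locked. The sketch below is one hedged illustration in Python. The `SceneExamination` class and its method names are hypothetical and do not reproduce any agency's case-management system or published protocol.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneExamination:
    """Hypothetical record that enforces observation-before-context ordering."""
    observations: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)
    locked: bool = False

    def record_observation(self, note: str) -> None:
        # Physical findings may only be added before the record is locked.
        if self.locked:
            raise RuntimeError("observations are locked; file a dated addendum")
        self.observations.append(note)

    def lock_observations(self) -> None:
        # Locking with nothing documented would defeat the purpose.
        if not self.observations:
            raise RuntimeError("no scene observations documented yet")
        self.locked = True

    def release_context(self, item: str) -> None:
        # Police intelligence, witness accounts, and financial background
        # are withheld until the physical record is fixed.
        if not self.locked:
            raise PermissionError("context withheld until observations are locked")
        self.context.append(item)


exam = SceneExamination()
exam.record_observation("char depth survey, living room floor")
exam.record_observation("calcination map, north wall")
exam.lock_observations()
exam.release_context("owner's insurance history")
```

The design choice is that the ordering is enforced by the record itself rather than by the examiner's good intentions, which mirrors the point that the remedy for contextual bias is structural rather than motivational.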

Courtroom Presentation: Probability Statements and Bayesian Reasoning

The traditional fire investigation courtroom opinion is categorical: "In my opinion, the fire was incendiary in origin." The NAS critique, and subsequent development in the evaluative-reporting literature, calls for probabilistic language that discloses the uncertainty inherent in the opinion. The transition is most advanced in England and Wales, where the FSR Codes of Practice mandate evaluative reporting for DNA and progressively encourage it for other forensic disciplines including fire debris analysis.

The Bayesian evaluative framework expresses the evidential weight of a finding as a likelihood ratio (LR): the probability of the evidence (the observed pattern or residue finding) given the prosecution's proposition (accelerant was added), divided by the probability of the evidence given the defence's proposition (no accelerant was added; the observed pattern resulted from ordinary pyrolysis). An LR greater than 1 supports the prosecution's proposition; an LR less than 1 supports the defence's proposition. The examiner does not assign a probability to guilt or innocence; that is for the jury. The examiner only quantifies how much more or less probable the evidence is under one proposition than the other.
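The likelihood ratio and its role in Bayes' rule can be shown in a few lines. The following Python sketch uses hypothetical function names and invented probabilities. The prior odds belong to the fact-finder, not the examiner, and appear here only to show how an LR would combine with them.

```python
def likelihood_ratio(p_e_given_hp: float, p_e_given_hd: float) -> float:
    """LR = P(E | prosecution proposition) / P(E | defence proposition)."""
    if not 0.0 <= p_e_given_hp <= 1.0:
        raise ValueError("P(E|Hp) must lie in [0, 1]")
    if not 0.0 < p_e_given_hd <= 1.0:
        raise ValueError("P(E|Hd) must lie in (0, 1]")
    return p_e_given_hp / p_e_given_hd


def posterior_odds(prior_odds: float, lr: float) -> float:
    """Odds form of Bayes' rule: posterior odds = prior odds x LR."""
    if prior_odds < 0 or lr < 0:
        raise ValueError("odds and LR must be non-negative")
    return prior_odds * lr


# Invented inputs: the finding is seen in 80% of accelerated fires
# and in 2% of non-accelerated fires.
lr = likelihood_ratio(p_e_given_hp=0.80, p_e_given_hd=0.02)

# The examiner reports only the LR. A fact-finder who started at
# odds of 1 to 2 (0.5) would update as follows.
updated = posterior_odds(prior_odds=0.5, lr=lr)
print(f"LR is about {lr:.0f}; posterior odds are about {updated:.0f} to 1")
```

With these invented inputs the LR is about 40, meaning the evidence is about 40 times more probable under the prosecution's proposition. The same LR would move a sceptical fact-finder and a credulous one to different posterior odds, which is why the examiner stops at the ratio.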

For fire debris GC-MS findings, the LR can be empirically grounded: population-level data from fire debris proficiency testing and OSAC validation studies provide the denominator (how often does a substrate produce the observed chromatographic pattern in the absence of added accelerant?). For field fire pattern evidence (burn shapes, char depth, calcination), the empirical data for LR computation are sparse. This is precisely the NAS critique: without the population-level frequency data that would populate the denominator of the LR, probabilistic opinion language becomes verbal rather than empirical, and the apparent precision of Bayesian reporting conceals a fundamentally unsupported inference.
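The effect of a sparse denominator can be made concrete by putting an interval around it. The Python sketch below assumes the denominator P(E|Hd) is estimated from k occurrences of a pattern in n known-accidental test fires and uses a Wilson score interval (z = 1.96, roughly 95 per cent). The counts, the fixed numerator, and the function names are all hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion k/n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def lr_range(p_e_given_hp: float, k: int, n: int) -> Tuple[float, float]:
    """Range of LRs implied by the uncertainty in P(E|Hd) alone."""
    low, high = wilson_interval(k, n)
    lr_min = p_e_given_hp / high
    # A lower bound of zero on the denominator leaves the LR unbounded above.
    lr_max = p_e_given_hp / low if low > 0 else math.inf
    return lr_min, lr_max


# Same observed frequency (20%), very different sample sizes.
sparse = lr_range(p_e_given_hp=0.8, k=2, n=10)
dense = lr_range(p_e_given_hp=0.8, k=200, n=1000)
print("10 test fires:   LR between %.1f and %.1f" % sparse)
print("1000 test fires: LR between %.1f and %.1f" % dense)
```

With only ten reference fires the plausible LR spans nearly an order of magnitude, so a single reported number would overstate what the data support. This sketch ignores uncertainty in the numerator, which would widen the range further.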

The UK Court of Appeal in R v. T [2010] EWCA Crim 2439 (a footwear case) and the subsequent discussion in R v. Dlugosz [2013] EWCA Crim 2 warned against LR evidence presented without an adequate empirical database. The Forensic Science Regulator's guidance and the Royal Statistical Society's guidance on expert probability evidence both emphasise that LR reporting with inadequate population data is not inherently safer than categorical reporting: it creates a false veneer of quantitative precision over an essentially subjective judgment.

In India, the Bharatiya Sakshya Adhiniyam 2023 (BSA) Section 39 governing expert evidence does not prescribe a specific reporting language or LR framework. The courts have consistently treated categorical expert opinion as the expected format. Bayesian-framed evidence has appeared in a small number of DNA cases before the Supreme Court and Delhi High Court, but fire investigation opinions in Indian courts remain almost entirely categorical, mirroring the pre-2009 pattern in the US and UK. The transition, if it comes, is likely to lag the UK timeline by a decade or more.

Reporting style	What the examiner states	Strengths	Limitations	Jurisdiction trend
Categorical	The fire was incendiary in origin.	Straightforward for juries; maps to legal fact questions	Conceals uncertainty; binary when reality is probabilistic	Still dominant in India and most US state courts
Qualified categorical	In my opinion, the fire was most probably incendiary, based on X, Y, Z indicators.	Acknowledges limitations without abandoning conclusion	Still non-quantitative; qualifier 'probably' is undefined	Common in US federal courts post-Daubert
Verbal LR (evaluative)	The evidence is more consistent with deliberate fire-setting than with accidental fire.	Direction of inference explicit; avoids false precision	'More consistent' undefined; no empirical calibration	FSR-encouraged in England and Wales
Numerical LR	The evidence is approximately 40 times more probable under the prosecution's proposition than the defence's proposition.	Quantified; Bayes-coherent; integrates with other evidence	Requires population database; complex for juries	Aspirational in UK fire debris; rare in field investigation

Expert-Witness Gatekeeping: Daubert, Frye, BSA 2023 and CrimPR

In the United States, the two dominant admissibility standards are Daubert (applied in federal court and a majority of state courts) and Frye (applied in a minority of state courts, including California, Illinois, and New York). Under Daubert, the trial judge weighs a non-exclusive set of factors: whether the methodology (1) is testable, (2) has been peer-reviewed, (3) is generally accepted, and (4) has a known or potential error rate. The NAS critique directly weakened fire investigation testimony on factor (4): if no controlled studies established the error rate of pattern-based incendiary classification, the court had no error rate to weigh. Following the NAS report, defence challenges to fire investigation testimony using the Daubert framework increased, and several federal and state courts excluded or limited testimony where the challenged indicators (such as alligator charring or low burn patterns with no accelerant detected, both catalogued in the topic on origin and cause determination) were the primary basis of the incendiary classification.

The Frye standard, older and more permissive, requires only that the methodology be generally accepted within the relevant scientific community. Because NFPA 921 itself is generally accepted, testimony based on NFPA 921 methodology tends to survive Frye challenges more easily. However, the "generally accepted" criterion has its own complexity: NFPA 921 is generally accepted as a framework, but specific pattern indicators within that framework may not be generally accepted as definitive incendiary evidence, and Frye scrutiny at that granularity is available.

In the UK, expert evidence in criminal proceedings is governed by Criminal Procedure Rules Part 19. CrimPR Rule 19.4 requires the expert to provide a statement of the expert's qualifications, the methodology, the limitations of the methodology, and whether the methodology represents currently recognised standards. The "Cairns checklist," developed from the judgment in R v. Dlugosz and the subsequent work of the Law Commission, provides a structured framework for assessing whether expert evidence meets the CrimPR threshold. The FSR Codes of Practice, now statutory under the Forensic Science Regulator Act 2021, intersect with CrimPR Part 19 by providing the regulatory baseline for "currently recognised standards."

In India, the admissibility of expert opinion under BSA 2023 Section 39 (successor to IEA Section 45) rests with the court's assessment of the expert's competence. Unlike Daubert, there is no structured methodology test. The expert is cross-examined on credentials, experience, and method, but the court is not formally required to conduct a pre-admission gatekeeping hearing (the Indian equivalent of a Daubert hearing is not a codified feature of Indian evidence law). Defence challenge of fire investigation opinions in Indian courts therefore focuses on cross-examination of the witness rather than pre-trial exclusion. This gives defence counsel the opportunity to expose methodological weakness, but it also means that the trial judge, who is the finder of fact, hears problematic testimony without a pre-screening filter.

Key terms

NAS Report 2009: 'Strengthening Forensic Science in the United States: A Path Forward,' published by the National Academy of Sciences. Found that fire investigation pattern indicators lacked empirical validation, that proficiency testing was absent, and that Daubert-relevant error rates were unknown. Triggered a decade of reform in US forensic science.
Contextual bias: The influence of task-irrelevant case background (suspect financial difficulties, police intelligence) on the direction of an examiner's technical analysis. Demonstrated experimentally for forensic examiners by Itiel Dror and colleagues at UCL. Structurally addressed by sequential unmasking.
Sequential unmasking: An examination protocol in which case context is released to the examiner in a controlled sequence: technical-analysis steps are completed and documented before investigative context is revealed, preventing premature contextual contamination of pattern interpretation.
Likelihood ratio (LR): The ratio of the probability of the evidence under the prosecution's proposition to the probability of the evidence under the defence's proposition. The Bayesian evaluative reporting framework replaces categorical 'incendiary' or 'accidental' conclusions with an LR that quantifies the direction and strength of the evidence.
Daubert standard: The US federal court expert-admissibility test established in Daubert v. Merrell Dow Pharmaceuticals (1993), under which the trial judge considers whether the expert's methodology is testable, peer-reviewed, and generally accepted, and whether it has a known or potential error rate. Applies in federal court and a majority of US state courts.
Frye standard: An older US expert-admissibility test (Frye v. United States, 1923) requiring only that the methodology be generally accepted in the relevant scientific community. Less demanding than Daubert; applies in a minority of state courts, including California, Illinois, and New York.
Blind verification: A quality-assurance step in which a second analyst re-examines an exhibit independently, without knowledge of the first analyst's conclusion. Addresses expectation bias but is structurally difficult for field fire scene re-examination, where the live scene is no longer available.
ACE-V: Analysis, Comparison, Evaluation, Verification. A linear examination framework that documents each stage of the forensic analysis separately, creating an audit trail that allows later review of whether observation and evaluation were sequentially separated.
BSA 2023 Section 39: The expert-evidence provision of India's Bharatiya Sakshya Adhiniyam 2023, successor to Indian Evidence Act Section 45. Governs the admissibility of expert opinion without a Daubert-equivalent structured gatekeeping hearing; challenge is conducted through cross-examination rather than pre-trial exclusion.
CrimPR Part 19: Part 19 of the Criminal Procedure Rules (England and Wales). The expert-witness procedural code requiring disclosure of qualifications, methodology, limitations, and whether the method represents currently recognised standards. Intersects with the FSR Codes of Practice.

Worked example

Post-Conviction Review of a 1998 Arson Conviction Using NAS 2009 Methodology

When the NAS report landed in 2009, it did not just change how future fires would be investigated. It also opened older convictions that rested on pattern indicators alone to post-conviction challenge.

Scene: A 1998 conviction for arson with intent to endanger life is referred to a post-conviction review body. The original investigation concluded incendiary cause based on three findings: alligatoring char on floorboards interpreted as indicating rapid accelerant-fuelled fire; a low burn pattern at the living room door threshold; and the investigator's testimony that fire development was inconsistent with the reported electrical fault without an accelerant.

Step 1 (NAS methodology gap analysis): The reviewing expert applies NFPA 921 (2021 edition) criteria. Finding 1: NFPA 921 Chapter 7 now explicitly states that alligatoring char morphology is not a reliable accelerant indicator; large shiny alligator char is associated with post-flashover rapid burning of ordinary cellulosic fuels. Finding 2: post-flashover burning of ordinary furnishings regularly produces floor-level damage indistinguishable from poured-accelerant patterns without chemical testing. Finding 3: the original investigator offered no HRR calculation or fire modelling to support the fire development opinion.

Step 2 (Bias identification): Applying a linear sequential unmasking analysis to the original case file, the reviewing expert identifies confirmation bias: the investigator's contemporaneous notes use the word "arson" as a heading before any physical evidence analysis was completed. Fire scene photographs were labelled with interpretive terms ("pour pattern," "trailer") before chemical testing results were available.

Step 3 (Negative result suppression): The original GC-MS laboratory report is retrieved. The analyst reported "no ignitable liquid residues detected." This negative result was not discussed in the investigating officer's report or the expert's court testimony. Failure to integrate a negative chemical finding with a pattern-based arson conclusion is a significant methodological error under any edition of NFPA 921.

Conclusion: The post-conviction review concludes that the three pattern indicators relied upon at trial are not, under current methodology, sufficient to support an incendiary determination, and the negative GC-MS result directly contradicts the accelerant hypothesis. The case is referred to the Criminal Cases Review Commission with a finding that the original expert testimony did not meet the scientific reliability standard articulated in the NAS 2009 report.

Practice

The 2009 NAS report on forensic science identified a specific evidentiary problem with fire pattern indicators as used in origin-and-cause determination. What was the core finding?

Has the 2009 NAS critique of fire investigation been fully addressed?

Partially. NFPA 921 (2024 revision), OSAC validation studies, updated ANAB supplemental requirements, and FSR evaluative-reporting guidance have closed significant gaps in laboratory fire debris practice. However, population-level specificity data for pattern indicators across diverse construction types, fuels, and ventilation conditions remain incomplete. UL-NIST and BRE research programmes are progressively filling the database, but pattern-only origin determination is not yet fully validated across all construction types encountered globally.

Does cognitive bias affect laboratory fire debris analysts the same way it affects field investigators?

Yes, though the mechanisms differ. GC-MS analysts who are told the case hypothesis before reading the chromatogram, or who are asked to verify a colleague's result, are susceptible to the same contextual and confirmation bias that Dror demonstrated in fingerprint examiners. The structural remedy is blind analysis: the analyst must not receive the submitting investigator's working conclusion before interpreting the chromatographic data. Several ANAB-accredited laboratories have formalised this as a documented SOP step.

Is the Bayesian likelihood ratio framework legally required for fire evidence in England and Wales?

Not mandated in the strict sense, but the FSR Codes of Practice require examiners to frame findings in terms of propositions rather than categorical conclusions, which is the evaluative reporting framework. Verbal LR framing ('the evidence is more probable under the prosecution proposition') is the standard expected; numerical LRs are used where an empirical database supports them. For field fire pattern evidence, the absence of robust population data means numerical LRs are rarely offered. The accreditation requirements underpinning this framework are covered in the topic on [quality systems: ISO 17025, NABL, ANAB and UKAS](/topics/forensic-fire-arson-explosives/quality-systems-iso-17025-nabl-anab-ukas-and-proficiency-testing-for-fae-labs).
