Cognitive Bias, Error and Quality in Odontology

The documented sources of error in forensic odontology, including contextual bias in comparison tasks, well-publicised casework failures, and the mitigation strategies of blind verification, linear-up testing, and proficiency programmes.

Last updated: 19 Jun 2026

Forensic odontology comparison tasks are subject to the same cognitive biases documented across all pattern-recognition disciplines: contextual information that is irrelevant to the comparison can shift an examiner's conclusion, and a second reviewer told the first examiner's opinion will tend to replicate it rather than independently assess it. Controlled studies using identical evidence presented under different case narratives have produced opposite conclusions from the same experienced examiners. The field's current response combines structural procedural controls, principally blind verification and linear-up comparison, with externally validated proficiency testing as a condition of accreditation.

Forensic comparison disciplines assume that a trained examiner looking at the same evidence will reach the same conclusion. Controlled studies have shown that the same bite-mark photographs, presented to examiners under different case narratives, can produce opposite conclusions. A mark assessed as a positive identification when the examiner is told the suspect has confessed is assessed as inconclusive when the examiner is told the suspect has a strong alibi.

Cognitive bias is a structural feature of human perception, not a character flaw. Forensic science's longstanding claim to objectivity made it slow to acknowledge that its practitioners were subject to the same perceptual pressures that psychology had documented in eyewitness memory, medical diagnosis, and financial forecasting.

This topic covers the main bias types that have been demonstrated in forensic odontology contexts, the error cases that drew the most serious professional and judicial attention, and the operational countermeasures (blind verification, linear-up comparison, and proficiency testing) that represent the field's current best practice for managing these risks.

By the end of this topic you will be able to:

Distinguish the principal bias types operating in forensic comparison (confirmation bias, anchoring, contextual bias, investigator effect) and explain why structural controls are required rather than voluntary effort.
Describe the blind verification and linear-up procedures, explain the specific bias each targets, and state their current accreditation status under ISO 17025 and ABFO guidelines.
Summarise the evidentiary significance of the documented casework failures in bite-mark attribution and disaster victim identification, including their role in subsequent Daubert hearings and legislative reform.
Interpret proficiency testing results, including the ABFO 1990s exercises, in terms of known error rate and inter-examiner agreement, and explain why these metrics matter for admissibility.
Explain sequential unmasking and information-firewall procedures as workflow-level bias controls distinct from individual examiner technique.

Key terms

Contextual bias: The distortion of a forensic judgment by case-irrelevant information, such as knowing the suspect's criminal history or that a confession exists, that should not influence a technical comparison.
Confirmation bias: The tendency to search for and weight evidence that confirms an existing hypothesis while discounting or not seeking disconfirming evidence. In forensic comparison, it manifests as finding 'matches' after being told who the suspect is.
Blind verification: A quality control procedure in which a second examiner reviews the evidence without being told the first examiner's conclusion. Prevents the second reviewer from anchoring on an established opinion rather than forming an independent one.
Linear-up: A structured comparison procedure modelled on the eyewitness line-up, in which the examiner assesses a set of candidates without knowing which is the suspect, reducing target-present bias.
Proficiency testing: Exercises in which examiners work through test comparisons whose correct answers are known to the test provider but not to the examiner, so that accuracy and agreement can be measured. Results reveal error rates and identify training gaps, and are a cornerstone of laboratory accreditation.
Inter-examiner agreement: The degree to which independent examiners reach the same conclusion when shown the same evidence. Low inter-examiner agreement is a signal that a method's conclusions are not reproducible and therefore not reliable as a scientific basis for testimony.

Bias types in forensic comparison

Itiel Dror, a cognitive neuroscientist at University College London, has published extensively on how contextual information distorts forensic comparison judgments. His 2006 experiment with latent fingerprint examiners showed that the same pairs of prints, presented under different contextual frames, elicited contradictory conclusions from the same examiners. Variants of that design have since been replicated in fingerprints, footwear marks, and bite-mark analysis.

Several bias types recur across the odontology literature, and the mitigation strategy for each differs.

Confirmation bias: the examiner looks for features that confirm a proposed match and spends less effort searching for features that would exclude it. Especially pronounced when the examiner knows who the suspect is before doing the comparison.
Anchoring: when a first examiner has concluded 'identification', a second examiner doing a review is pulled toward that conclusion even if the evidence would support 'inconclusive' independently.
Contextual bias proper: irrelevant case facts (a criminal record, a confession, a prior conviction) shift the threshold at which the examiner decides the features are sufficient for a conclusion.
Investigator effect: when the forensic examiner is embedded within the investigative team, social pressure to support the working hypothesis can operate below the level of conscious awareness.

Documented error cases in odontology

The errors that attracted the most scrutiny came in two categories: false identifications in disaster victim identification (DVI) settings, and false bite-mark attributions leading to wrongful convictions. Both categories produced formal investigations and drove reforms.

In DVI work, the most studied failure involved misidentifications in the aftermath of the 2004 Indian Ocean tsunami. The sheer scale (an estimated 220,000 dead across 14 countries) and the pressure to return remains to families quickly created conditions in which confirmation bias could operate powerfully. Post-event reviews found cases where dental comparison was completed too quickly with poor-quality antemortem records and without proper reconciliation. Some families received the wrong remains.

In criminal contexts, the most consequential errors involved bite-mark testimony in homicide cases. Beyond Ray Krone (Arizona, convicted 1992, exonerated by DNA in 2002), several Mississippi cases from the 1990s were linked to a single forensic odontologist, Michael West. West used an idiosyncratic ultraviolet fluorescence method involving special filtered glasses to detect bite marks invisible to other examiners, a technique never subjected to peer review or proficiency testing.

Timeline of major reforms in forensic odontology methodology.

Blind verification in practice

Blind verification has the clearest evidence base of any quality control procedure in forensic comparison. A second examiner reviews the same evidence independently, without being told the first examiner's conclusion. If both reach the same conclusion, confidence is higher. If they disagree, the disagreement itself is informative data about the evidence, not an error to be suppressed.

In a non-blind review, the second examiner is informed of the first conclusion before starting. Studies consistently show that this produces high agreement not because the conclusions are reliable but because the second examiner anchors on the first opinion. The apparent inter-examiner agreement is artefactual. Blind review breaks this anchoring by design.

Procedure	What examiner 2 knows	Risk	Use in accredited labs
Open review	First examiner's conclusion and reasoning	High anchoring; agreement is inflated	Declining; flagged by FSR and ABFO
Blind review	Evidence only; no prior conclusion	Low anchoring; disagreement surfaces genuine uncertainty	Recommended; required by ISO 17025 in practice
Linear-up	Set of candidates; target unknown	Reduces target-present bias in bite-mark work	Recommended for bite-mark comparisons post-2016

Implementing blind review requires a case management procedure that withholds the first conclusion until the second examiner has submitted an independent finding. Evidence from fingerprint and DNA laboratories that have introduced the procedure indicates the operational disruption is manageable and the quality improvement is measurable.
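The withholding step can be sketched in a few lines of Python. This is illustrative only: the class name BlindReviewCase, its methods, the examiner names, and the conclusion labels are all hypothetical and do not describe any real case management system. It shows one way to enforce the rule that an examiner cannot see another finding until their own is on record.

```python
class BlindReviewCase:
    """Hypothetical case record that withholds findings until the viewer has submitted."""

    def __init__(self, case_id):
        self.case_id = case_id
        self._findings = {}  # examiner name -> recorded conclusion

    def submit(self, examiner, conclusion):
        # Each examiner records exactly one independent finding.
        if examiner in self._findings:
            raise ValueError(f"{examiner} has already submitted a finding")
        self._findings[examiner] = conclusion

    def reveal_to(self, examiner):
        # The firewall: other findings are released only after the
        # requesting examiner's own finding is on record.
        if examiner not in self._findings:
            raise PermissionError("submit your own finding before viewing others")
        return {name: c for name, c in self._findings.items() if name != examiner}

    def outcome(self):
        # Disagreement is reported as data about the evidence, not hidden.
        if len(self._findings) < 2:
            return "pending"
        return "agree" if len(set(self._findings.values())) == 1 else "disagree"


# Illustrative use with made-up examiners and conclusions.
case = BlindReviewCase("C-001")
case.submit("examiner_1", "identification")
case.submit("examiner_2", "inconclusive")
status = case.outcome()  # the disagreement is surfaced for further review
```

The design choice worth noting is that the restriction lives in the record itself rather than in an instruction to examiners, which matches the point that controls should be structural rather than voluntary.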

Proficiency testing: measuring what the field can actually do

Proficiency testing presents examiners with test problems that have known correct answers, concealed from the examiner. Results are scored and reported, usually with aggregate statistics across the group. This is standard practice in medical laboratory accreditation (under ISO 15189) and in forensic DNA typing (under ISO 17025 and FBI QAS). It has been applied more unevenly in comparison disciplines.

The ABFO ran formal proficiency exercises in the late 1980s and 1990s. The results were not reassuring. In one widely cited 1999 exercise, 32 ABFO diplomates assessed four bite-mark cases. Agreement on whether the injury was a human bite mark at all was below 85%. Agreement on specific comparisons was lower. These results were used as evidence in subsequent Daubert hearings to argue that the discipline did not have a known or acceptable error rate.
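To see what an agreement figure does and does not say, it helps to compute it. The sketch below is a minimal illustration with made-up responses, not data from the ABFO exercise; the function names modal_agreement and pairwise_agreement are hypothetical. It contrasts the share of examiners giving the most common answer with the share of examiner pairs who actually agree with each other.

```python
from collections import Counter
from itertools import combinations


def modal_agreement(conclusions):
    """Share of examiners who gave the single most common conclusion."""
    counts = Counter(conclusions)
    return counts.most_common(1)[0][1] / len(conclusions)


def pairwise_agreement(conclusions):
    """Share of all examiner pairs whose conclusions match."""
    pairs = list(combinations(conclusions, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


# Made-up example: 10 examiners asked whether an injury is a human bite mark.
responses = ["human bite"] * 7 + ["not a bite"] * 3

modal = modal_agreement(responses)        # 7 of 10 examiners
pairwise = pairwise_agreement(responses)  # 24 of 45 pairs
```

In this invented example, 70% of examiners give the majority answer, yet only 24 of the 45 possible pairs (about 53%) agree with each other. A headline agreement percentage can therefore overstate how often two independently chosen examiners would reach the same conclusion.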

Proficiency test workflow for forensic odontology examiners.

Post-2016, both the ABFO and INTERPOL DVI have moved toward requiring documented proficiency and competency testing as a condition of participation in casework. ISO 17025-accredited dental comparison laboratories now include proficiency exercises in their quality management systems. The shift is from self-assessed expertise to externally validated performance.

Managing contextual information in the workflow

Instructing examiners to be more objective does not reduce contextual bias. The cognitive mechanisms that produce it are automatic and largely unconscious. Effective interventions are structural: changing who receives what information, and when, rather than relying on individual willpower.

Sequential unmasking: release case information to the examiner in stages, starting with only what is needed for the comparison, and adding investigative context only after the technical conclusion is recorded.
Task-relevant versus task-irrelevant information: the examiner needs the antemortem records and the postmortem findings. They do not need to know the suspect's criminal history, whether a confession exists, or what other forensic evidence has been found.
Documentation of the comparison before consultation: recording the features observed and the conclusion reached before discussing the case with investigators creates a contemporaneous record that is harder to retrospectively revise.
Separation of the verification examiner: the examiner performing blind verification should not have informal access to the case file, the investigating officers, or the first examiner's notes before completing their own assessment.
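The controls listed above can be expressed as a simple information firewall. The following Python is a sketch under stated assumptions: the class MaskedCaseFile, its field names, and the example contents are hypothetical, and a real laboratory system would add access logging and an amendment process. It shows task-relevant material being available from the start, and investigative context being released only after the technical conclusion has been recorded with a timestamp.

```python
from datetime import datetime, timezone


class MaskedCaseFile:
    """Hypothetical case file that releases context only after a conclusion is recorded."""

    def __init__(self, task_relevant, context):
        self._task_relevant = dict(task_relevant)  # needed for the comparison
        self._context = dict(context)              # investigative background
        self.conclusion = None
        self.recorded_at = None

    def visible_items(self):
        # Before the conclusion exists, only task-relevant items are shown.
        items = dict(self._task_relevant)
        if self.conclusion is not None:
            items.update(self._context)
        return items

    def record_conclusion(self, conclusion):
        # The first recorded conclusion is timestamped and cannot be overwritten,
        # giving a contemporaneous record made before any consultation.
        if self.conclusion is not None:
            raise ValueError("conclusion already recorded")
        self.conclusion = conclusion
        self.recorded_at = datetime.now(timezone.utc)


# Illustrative use with placeholder contents.
case_file = MaskedCaseFile(
    task_relevant={"antemortem_records": "chart A", "postmortem_findings": "chart B"},
    context={"confession_reported": True, "criminal_history": "withheld"},
)
before = set(case_file.visible_items())
case_file.record_conclusion("inconclusive")
after = set(case_file.visible_items())
```

Here the variable before contains only the two comparison items, while after also includes the investigative context, mirroring the staged release that sequential unmasking describes.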

Quality management systems and external oversight

Quality management systems in forensic laboratories, required under ISO 17025 accreditation, create an institutional framework for managing bias risk. These systems define procedures, require documentation, mandate proficiency testing, and produce an audit trail for external review.

For forensic odontology specifically, the Texas Forensic Science Commission in 2016 issued a report recommending that bite-mark evidence not be admitted in Texas courts until its reliability had been better established. This was significant because it came from a state oversight body, not a defence attorney, and it applied the same framework the commission uses to assess laboratory error in DNA cases. Several other states have since restricted or excluded bite-mark evidence on similar reasoning.

In DVI contexts, INTERPOL's DVI guidelines require an independent second confirmation before a positive identification is recorded on the DVI Reconciliation Form. The confirmation must be performed by a different expert, blinded to the first finding. This is operationally embedded quality control, not an add-on.

Worked example

Bite-mark proficiency test at an ABFO workshop: what the results showed

A controlled test reveals the gap between practitioner confidence and measurable accuracy.

At an ABFO annual workshop in the late 1990s, 32 board-certified forensic odontologists were presented with four bite-mark cases assembled from actual case materials. The cases included marks of varying quality, suspect dentition models, and photographs. Examiners completed assessments independently, recording whether each mark was a human bite, whether it was suitable for comparison, and where they placed the conclusion on the five-tier scale.

Case 1 (clear mark, one suspect): Examiners agreed it was a human bite mark in 98% of responses. Agreement on the comparison conclusion: 73% reached 'consistent with' or above; 27% were more cautious or inconclusive.
Case 2 (degraded mark, two suspects): Agreement that the injury was a human bite fell to 63%. Examiners reached three different conclusions across the five-tier scale. Some included one suspect; others included the other; several excluded both.
Overall false-positive rate: in cases where the 'correct' answer was known (suspects verified innocent by other evidence), a non-trivial fraction of examiners still reached inclusion conclusions. This is the figure cited in Daubert challenges as demonstrating an unacceptable error rate.
Post-exercise discussion: disagreements were not resolved by consensus. The exercise revealed that different examiners weighted the same features differently, with no agreed weighting protocol.

The exercise was not published as a formal study, but its findings circulated in the legal and scientific communities and were cited in the NAS 2009 report. It remains the most frequently cited empirical data point about bite-mark examiner agreement in US legal contexts. The exercise demonstrated not examiner negligence but a methodological deficit: the absence of standardised feature-weighting meant that confidence in results was not supported by measured accuracy across practitioners.
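The false-positive figure discussed in this example is a simple proportion, and a short sketch makes the definition explicit. The Python below uses invented responses, not the workshop data, and the conclusion labels in INCLUSION are placeholder assumptions rather than the official wording of any scale. It counts, among comparisons where the suspect is known not to be the source, how many examiners still reported an inclusion.

```python
# Placeholder labels treated as "inclusion" conclusions (an assumption for this sketch).
INCLUSION = {"identification", "probable", "consistent with"}


def false_positive_rate(responses):
    """responses: iterable of (conclusion, suspect_is_true_source) pairs.

    Returns the share of known non-source comparisons that received
    an inclusion conclusion.
    """
    non_source = [c for c, is_source in responses if not is_source]
    if not non_source:
        raise ValueError("no known non-source comparisons to score")
    return sum(c in INCLUSION for c in non_source) / len(non_source)


# Made-up responses: four comparisons against a known non-source, one true source.
responses = [
    ("consistent with", False),
    ("excluded", False),
    ("inconclusive", False),
    ("identification", False),
    ("identification", True),
]
rate = false_positive_rate(responses)  # 2 inclusions out of 4 non-source comparisons
```

Note that an inconclusive answer is not counted as a false positive here. How inconclusive responses are scored is itself a design decision that changes the reported error rate, which is one reason raw figures from different exercises are hard to compare.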

Check your understanding

An investigator tells the forensic odontologist that the suspect has already confessed before the comparison is done. What bias risk does this create?

Key Takeaways

Contextual and confirmation bias are documented in forensic odontology through controlled studies: the same evidence yields different conclusions when presented with different case stories.
Error cases, including multiple Mississippi wrongful convictions and post-conviction DNA exonerations, established that bite-mark identification could produce false positives in real cases, not just in controlled tests.
Blind verification, where the second examiner forms an independent opinion before learning the first conclusion, is the single most effective procedural control and is now required by ISO 17025-aligned quality systems.
Proficiency testing at ABFO workshops revealed examiner disagreement rates that have been cited in Daubert hearings as evidence the discipline lacked a known acceptable error rate.
Structural changes (sequential unmasking, information firewalls, and linear-up procedures) are more effective than relying on individual examiners' willpower to resist bias.

What is contextual bias in forensic odontology?

Contextual bias occurs when an examiner's opinion is influenced by case information that is irrelevant to the comparison task, such as knowing the suspect's identity or seeing a confession before doing the comparison. In bite-mark work, studies have shown that examiners given the same evidence but different contextual information can reach opposite conclusions.

What is a linear-up and how does it reduce bias?

A linear-up is a structured comparison procedure in which the examiner assesses a set of candidates (dental models or charts) without knowing which one is the suspect. It mirrors the eyewitness line-up procedure from psychology. By blinding the examiner to target identity, it reduces the tendency to confirm a pre-selected match.

What did the Mayfield fingerprint case teach forensic odontology?

The 2004 Mayfield misidentification, in which the FBI incorrectly matched a latent fingerprint to an American lawyer in the Madrid bombing case, was caused in part by confirmation bias: verifying examiners who knew that a match to Mayfield had already been declared found features to support that conclusion. The case prompted forensic disciplines, including odontology, to examine how contextual information should be managed in comparison work.

How do proficiency tests work in forensic odontology?

Proficiency tests present examiners with a set of simulated comparison problems where the correct answers are known to the test designer but not the examiner. Results are compared against the known answers and reported as pass/fail or scored. The ABFO has run proficiency exercises; results from early tests in the 1990s revealed significant disagreement among experienced examiners, prompting methodological reform.
