History of Statistical Evidence in Courts

Courts have used statistical arguments for more than a century, but landmark cases show how probability can mislead juries when presented without rigorous foundations. This topic traces that history from early cases through People v. Collins and R v. Sally Clark, drawing lessons for expert witnesses who use numbers in court.

Last updated: 24 Jun 2026

Statistical evidence in criminal courts has a history marked by genuine insight and serious error. From the nineteenth-century cases involving actuarial life tables to the DNA databases of the present day, courts have asked experts to attach numbers to evidence, and those experts have sometimes provided numbers that were technically wrong, conceptually confused, or both. The two cases that define the modern debate are People v. Collins (California Supreme Court, 1968) and R v. Sally Clark (England and Wales Court of Appeal, 2003). In Collins, a prosecutor used unsupported probabilities and the product rule to produce an absurdly precise match statistic; the California Supreme Court reversed the conviction partly on statistical grounds. In Clark, a paediatrician cited a figure of 1 in 73 million for two cot deaths in the same family. The figure was wrong, the reasoning behind it contained two distinct statistical fallacies, and Clark served more than three years in prison before her acquittal.

Both cases illustrate the same underlying problem: probability statements made without rigorous foundations, presented to lay juries who lack the tools to evaluate them, in an adversarial setting where the opposing side may lack the expertise to challenge them effectively. This is not a problem confined to early legal history. Courts in the United States, the United Kingdom, Australia, Germany, India, and elsewhere continue to receive statistical evidence in cases involving DNA, fingerprints, tool marks, and questioned documents, and the risk of misuse remains real. India's Bharatiya Sakshya Adhiniyam 2023 (which replaced the Indian Evidence Act 1872) carries forward the framework for expert opinion evidence; similar provisions exist in US Federal Rule of Evidence 702 and the UK's Criminal Procedure Rules. The legal structures for admitting expert evidence were not built with statistical reasoning specifically in mind, and the gap between what the law requires and what rigorous probability demands has been the site of recurring miscarriages of justice.

The history of statistical evidence in courts is, in large part, a catalogue of the same errors recurring in new domains. Understanding which errors recur, why they are cognitively seductive, and what institutional and technical responses have been developed is essential for any forensic practitioner who uses numbers in a courtroom.

By the end of this topic you will be able to:

Describe the statistical errors made in People v. Collins and explain why the California Supreme Court reversed the conviction.
Identify the two distinct fallacies in the Sally Clark case and distinguish the independence assumption error from the prosecutor's fallacy.
Define the prosecutor's fallacy and the defence fallacy, and explain each using a concrete numerical example.
Explain what a likelihood ratio is and why it is the preferred format for reporting evaluative forensic conclusions in UK and ENFSI guidance.
List at least four duties that an expert witness using statistical evidence owes to a court, drawing on post-Clark judicial and professional guidance.

Key terms

Prosecutor's fallacy: The error of treating P(evidence | innocent) as if it equals P(innocent | evidence). A small random-match probability does not itself establish a high probability of guilt; Bayes' theorem and a prior are required to make that conversion.
Defence fallacy: The mirror error: arguing that because many people in the population share a feature, the evidence against any individual who has that feature is weak. This ignores the fact that only a very small number of those people are actually suspects in the case.
Product rule (probability): The rule P(A and B) = P(A) x P(B), which holds only when A and B are independent. Applying it to positively correlated events produces probabilities that can be far smaller than the true values, making a coincidence appear impossibly rare.
Likelihood ratio (LR): The ratio of the probability of the observed evidence given the prosecution hypothesis to the probability of the same evidence given the defence hypothesis. An LR greater than 1 supports the prosecution hypothesis; an LR less than 1 supports the defence hypothesis.
Transposition of the conditional: The logical error of swapping a conditional probability with its converse: claiming that P(E|H) = P(H|E). This is formally equivalent to the prosecutor's fallacy and is one of the most common errors in forensic testimony.
Reference class: The population against which a probability or frequency is calculated. Choosing the wrong reference class, for example using general population allele frequencies rather than frequencies within the relevant ethnic subgroup, produces misleading statistics.

Early statistical arguments in court

Courts received numerical evidence long before modern forensic science. Actuarial tables were used in civil litigation about life expectancy from the eighteenth century. Fingerprint frequency arguments were made from the early twentieth century, initially without any validated population database. The Dreyfus affair in France (1894 to 1906) involved disputed handwriting comparison evidence, and the mathematician Henri Poincaré testified that the mathematical arguments presented by the prosecution expert were invalid: an early instance of a scientist formally challenging the statistical claims of another expert in court.

In the United States, the use of blood group typing in paternity and criminal cases from the 1920s onwards introduced the idea of expressing forensic conclusions as frequency statements. By the 1950s, serological evidence was routinely presented with probability figures, usually without validation studies for the specific population involved. The pattern was consistent: a new forensic method produced numbers, practitioners presented those numbers to courts, and neither the courts nor the legal profession had the tools to evaluate whether the numbers were well-founded.

The same dynamic appeared in the United Kingdom in the nineteenth-century trial of Florence Maybrick (1889) and in several arsenic-poisoning cases where chemical analyses were presented with an implied precision that the underlying methods could not support. The concept that an expert might be technically accurate about a measurement but misleading about its forensic significance was not well developed in legal doctrine until much later.

People v. Collins (1968): probability without foundations

People v. Collins, decided by the California Supreme Court in 1968, remains the leading case study in how probability can be misused in court. Witnesses to a robbery described the perpetrators as a blonde woman with a ponytail and a bearded black man, who left in a yellow car. The prosecution called a mathematics instructor who assigned probabilities to each characteristic (for example, one in ten for a woman with a ponytail, one in four for a man with a moustache) and multiplied them together to produce a figure of 1 in 12 million. The prosecutor then argued that the chance of there being another couple in the area matching all these criteria was essentially zero, so the defendants must be the perpetrators.

The California Supreme Court identified several distinct errors. First, the assumed probabilities had no empirical basis; they were invented. Second, even if the probabilities were correct, the product rule applies only to independent events, and it was not established that these physical characteristics are independent of each other in the population (a beard and a moustache, for example, tend to occur together). Third, even accepting the 1 in 12 million figure as correct, the argument that the defendants were therefore guilty commits the prosecutor's fallacy: the probability that a random couple shares these features is not the same as the probability that these particular defendants are innocent. The court reversed the conviction.
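The multiplication in Collins is easy to reproduce, which is part of why it was persuasive. The sketch below uses the six figures usually quoted from the judgment (only some of them are listed above, so treat the full list as an assumption) and then shows, with a purely hypothetical conditional probability, how far the product rule can understate a joint probability when two features tend to occur together.

```python
from fractions import Fraction
from math import prod

# Figures as usually quoted from the Collins judgment (assumption: the text
# above lists only some of them). None had an empirical basis.
collins_factors = {
    "partly yellow automobile": Fraction(1, 10),
    "man with moustache": Fraction(1, 4),
    "woman with ponytail": Fraction(1, 10),
    "woman with blonde hair": Fraction(1, 3),
    "black man with beard": Fraction(1, 10),
    "interracial couple in car": Fraction(1, 1000),
}

# Product rule applied as if all six features were independent.
naive_product = prod(collins_factors.values())
print("Naive product:", naive_product)

# Hypothetical illustration of dependence between two features.
p_moustache = Fraction(1, 4)
p_beard = Fraction(1, 10)
p_beard_given_moustache = Fraction(3, 10)  # hypothetical value, not from the case

joint_naive = p_moustache * p_beard                      # assumes independence
joint_dependent = p_moustache * p_beard_given_moustache  # uses the conditional
print("Understatement factor:", joint_dependent / joint_naive)
```

With these illustrative numbers the naive product understates the joint probability of the two features by a factor of three; the real factor in Collins is unknown because no data were ever collected.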

Collins is also notable for an additional mathematical point raised in the judgment. Even if the 1 in 12 million figure were correct, probability theory shows that when you have a large population to search across and find one matching couple, the probability that the matching couple is the actual perpetrator is substantially less than certainty; there is a meaningful chance that more than one couple in the relevant area matches the description. The court's analysis of this point was an early judicial engagement with the concept that low random-match probability and high probability of guilt are not the same thing.
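The point can be checked numerically. The sketch below is a minimal illustration, assuming a binomial model in which each couple in the area independently matches the description with probability 1 in 12 million, and assuming (hypothetically) that the area contains 12 million couples. It computes the probability that at least two couples match, given that at least one does. The function name and the population size are illustrative choices, not figures taken from the trial record.

```python
from math import exp

def prob_another_match(n_couples: int, p_match: float) -> float:
    """P(at least two matching couples | at least one matching couple).

    Binomial model: each of n_couples matches independently with p_match.
    """
    p_none = (1 - p_match) ** n_couples
    p_exactly_one = n_couples * p_match * (1 - p_match) ** (n_couples - 1)
    p_at_least_one = 1 - p_none
    return (1 - p_none - p_exactly_one) / p_at_least_one

N_COUPLES = 12_000_000  # hypothetical size of the relevant population
result = prob_another_match(N_COUPLES, 1 / N_COUPLES)

# Poisson approximation with an expected count of 1, for comparison.
poisson_approx = (1 - 2 * exp(-1)) / (1 - exp(-1))

print(f"Binomial: {result:.3f}, Poisson approximation: {poisson_approx:.3f}")
```

Under these assumptions the probability is a little over 40 percent, so even a correct 1 in 12 million figure would leave a substantial chance that some other couple in the area also fits the description.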

R v. Sally Clark (2003): two fallacies and a wrongful conviction

Sally Clark was a British solicitor convicted in 1999 of murdering her two infant sons, both of whom had died suddenly. The prosecution case included evidence from Sir Roy Meadow, a paediatrician who cited a statistic from the CESDI (Confidential Enquiry into Stillbirths and Deaths in Infancy) report: the chance of a cot death in a family with Clark's socioeconomic profile was approximately 1 in 8,543. He then squared this figure to produce approximately 1 in 73 million, stating this as the probability of two cot deaths occurring by chance in the same family.

Two separate statistical errors combined to produce the figure. The first was the independence assumption: squaring the single-death probability treats the two events as statistically independent. In fact, cot death risk has a familial component. If one child in a family dies of sudden infant death syndrome, the risk for subsequent children is elevated, not the same as the base population risk. The correct calculation, accounting for the known familial correlation, would have produced a much higher probability than 1 in 73 million. The second error was then introduced when the prosecution used this figure: the 1 in 73 million figure was presented to the jury as, in effect, the probability that Clark was innocent. That is the prosecutor's fallacy. The probability of two cot deaths given innocence is not the same as the probability of innocence given two cot deaths. To convert between them requires knowing the prior probability of guilt and the probability of the alternative (murder of two children), neither of which was supplied.

Error	What was claimed	What should have been said
Independence assumption	P(2 cot deaths) = P(1 cot death)^2 = approximately 1/73,000,000	Cot death risk is correlated within families; squaring the single-death probability is not valid
Prosecutor's fallacy	1 in 73 million is the chance Clark is innocent	1 in 73 million was P(2 deaths | chance), not P(innocent | 2 deaths); Bayes' theorem is required to derive the latter
Reference class	Figure from one socioeconomic profile applied universally	The CESDI figure applied to a specific risk profile; its applicability to Clark's exact circumstances was not verified
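The arithmetic behind the first row of the table can be sketched as follows. The squared figure is reproduced exactly; the dependence adjustment uses a relative-risk multiplier that is purely hypothetical and is not an estimate from the CESDI report or any other study. The function name is likewise illustrative.

```python
from fractions import Fraction

p_single = Fraction(1, 8543)  # single cot death figure cited at trial

# What was done at trial: square the figure, as if the deaths were independent.
p_squared = p_single ** 2
print("Squared figure: 1 in", p_squared.denominator)

def p_two_deaths(p_first: Fraction, relative_risk: int) -> Fraction:
    """P(two deaths) when a first death multiplies the risk of a second.

    relative_risk is a hypothetical multiplier for illustration only.
    """
    p_second_given_first = min(Fraction(1), relative_risk * p_first)
    return p_first * p_second_given_first

adjusted = p_two_deaths(p_single, relative_risk=10)  # 10 is illustrative
print("With dependence: about 1 in", round(1 / adjusted))
```

Any multiplier above 1 makes two deaths more probable than the squared figure suggests, which is why the independence assumption mattered so much to the headline number.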

The Royal Statistical Society issued a public statement in October 2001, after Clark's first appeal had failed in 2000, noting that the statistical evidence had not been presented in an appropriate way and that the figure had no statistical basis. Her second appeal in 2003 succeeded, partly on statistical grounds and partly because of undisclosed microbiological evidence. She was released after serving more than three years. Clark died in 2007. The case resulted in a formal review of other cases in which Meadow had given evidence and in significant changes to judicial directions on statistical evidence in England and Wales.

The prosecutor's fallacy and the defence fallacy

The prosecutor's fallacy and the defence fallacy are the two canonical errors in courtroom probability. Both arise from the same underlying confusion: the failure to distinguish between a conditional probability and its converse. In formal terms, P(E|H) is not the same as P(H|E), and treating one as the other is called the transposition of the conditional.

The prosecutor's fallacy in its simplest form: a DNA profile has a random match probability of 1 in 1 million. The prosecutor argues: there is only a 1 in 1 million chance that a random person shares this profile, so there is only a 1 in 1 million chance the defendant is innocent. This is wrong. The 1 in 1 million figure is P(matching profile | person is not the source). The probability that the defendant is not the source, P(not the source | matching profile), depends additionally on: how many other people in the relevant population might have been the source, what other evidence exists, and the prior probability before the DNA evidence was considered. In a city of 10 million people, approximately 10 people would be expected to share the profile by chance. If the defendant was identified solely by database trawl with no other evidence, the probability that they are the source given the match is only around 1 in 10, which leaves roughly a 9 in 10 chance that they are not the source, rather than the 1 in 1 million the prosecutor claimed.

The defence fallacy runs in the opposite direction: because 10 people in the city share the profile, the evidence against any one of them is weak. This is also wrong. In a case where the suspect pool has been narrowed by geography, opportunity, or other evidence to a small group, the base rate among that group may be very different from the base rate in the general population. The relevant question is the probability of a match within the realistic suspect pool, not within the entire city.
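Both fallacies can be examined with the same small calculation. The sketch below assumes, for illustration only, that before the DNA evidence the defendant and each other member of the pool were equally likely to be the source, and that the true source always matches. The function name and the pool sizes are hypothetical.

```python
def prob_source_given_match(n_alternatives: int, rmp: float) -> float:
    """P(defendant is the source | defendant matches).

    Assumes a uniform prior over the defendant and n_alternatives other
    people, and that the true source matches with certainty.
    """
    expected_other_matches = n_alternatives * rmp
    return 1 / (1 + expected_other_matches)

RMP = 1 / 1_000_000  # random match probability from the example above

# Prosecutor's fallacy: whole city as the pool, no other evidence.
city = prob_source_given_match(10_000_000, RMP)

# Defence fallacy: pool already narrowed to 50 people by other evidence.
narrow_pool = prob_source_given_match(50, RMP)

print(f"City-wide pool: {city:.3f}")
print(f"Pool of 50:     {narrow_pool:.5f}")
```

With the whole city as the pool the result is about 1 in 11, nowhere near the near-certainty the prosecutor claimed; with a pool of 50 it is above 0.9999, so the same match is far from weak evidence. The random match probability is identical in both runs; only the prior changes.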

The likelihood ratio framework is the professional response to both fallacies. A likelihood ratio of 1 million means the evidence is 1 million times more probable if the prosecution hypothesis is true than if the defence hypothesis is true. This statement is precise, avoids both fallacies, and leaves the jury to weigh it alongside all other evidence. The European Network of Forensic Science Institutes (ENFSI) Guideline for Evaluative Reporting (2016) requires this format for evaluative conclusions in member laboratories. Similar guidance has been adopted by the Forensic Science Regulator in England and Wales, and by forensic science bodies in Australia and New Zealand. See Role of Statistics in Evidence Evaluation for the full framework.
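In symbols, the relationship between the likelihood ratio and the question the jury must answer is the odds form of Bayes' theorem. The notation is an assumption introduced here: E is the evidence, H_p the prosecution hypothesis, and H_d the defence hypothesis.

```latex
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
  \;=\;
\underbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}_{\text{likelihood ratio}}
  \times
\underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
```

The expert's evidence speaks only to the middle term. The prior odds, and therefore the posterior odds, depend on the rest of the case.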

Other landmark cases and jurisdictions

People v. Collins and R v. Sally Clark are the most widely cited cases, but the same categories of error appear across jurisdictions. In R v. Deen (England, 1994), the Court of Appeal reversed a conviction where a forensic scientist had transposed the conditional in explaining a DNA match, telling the jury that the probability of the defendant being innocent was 1 in 3 million, a statement that confused P(match | innocent) with P(innocent | match). In R v. Adams (England, 1996 and 1998), the Court of Appeal considered whether Bayesian reasoning should be presented to juries using formal probability calculations, ultimately concluding that juries should not be directed to reason according to Bayes' theorem as a formula, but should receive evidence in terms of likelihood ratios with verbal guidance.

In the United States, the National Research Council's 1996 report (NRC II) on DNA evidence in the courts was a landmark response to the perception that DNA statistics were being misused. The Council's earlier 1992 report had introduced the ceiling principle; NRC II replaced it with the theta correction for population substructure in allele frequency calculations. Both reports responded to defence challenges in cases including People v. Castro (New York, 1989), where a trial court ruled that the DNA results of the prosecution's laboratory were inadmissible because the laboratory had not followed its own protocols. In Australia, the High Court in Festa v. The Queen (2001) addressed the reliability of coincidence evidence where a complainant identified the accused by an unusual vehicle, and statistical arguments about vehicle frequency were central to the appeal.

Germany provides a contrasting institutional model. German courts use a mixed panel of professional and lay judges rather than a pure jury system, and the role of expert witnesses is more closely supervised by the court. Studies of DNA evidence in German criminal proceedings show fewer instances of the prosecutor's fallacy in reported judgments, though they are not absent. The European Court of Human Rights has addressed statistical evidence in several cases concerning the right to a fair trial under Article 6 of the European Convention on Human Rights, holding that inadequate explanation of statistical evidence to a jury can in principle amount to a breach of the right to a fair hearing.

In India, the Bharatiya Sakshya Adhiniyam 2023 governs the admissibility of expert opinion under sections that track the former provisions of the Indian Evidence Act 1872. DNA evidence is now explicitly addressed in the Code, but statistical framework requirements for how probability conclusions should be presented remain underdeveloped in Indian case law compared to the detailed judicial guidance now available in England and Wales or the ENFSI guidelines used across Europe. The Digital Personal Data Protection Act 2023 introduces additional considerations for how population databases used to generate forensic statistics may be assembled and accessed.

Lessons for expert witnesses using statistical evidence

The recurring cases produce a clear set of duties for any expert who uses statistical arguments in court. These duties have been codified in professional guidelines, judicial directions, and law commission reports across multiple jurisdictions, and they converge on the same points.

State assumptions explicitly. Every probability figure rests on assumptions about independence, reference population, and method validity. These must be stated in the report, not left implicit.
Report in likelihood ratio format. The ENFSI (2016) and Forensic Science Regulator (2020) guidelines require evaluative conclusions to be expressed as likelihood ratios with a verbal scale (for example, LR > 10,000 described as very strong support). This format avoids both the prosecutor's and defence fallacies.
Stay within your expertise. An expert in paediatric medicine (as in the Clark case) who presents a statistical calculation should have that calculation reviewed by a statistician before trial. The expert should not present statistical conclusions beyond their domain competence.
Do not conflate conditional probabilities. The expert must not state or imply that P(E|H) equals P(H|E). If asked to comment on the probability of guilt, the expert should explain that this is not within their remit: their role is to report the likelihood ratio, not the posterior probability.
Disclose the reference database. Any frequency or probability figure must be tied to a named, validated reference database. See Population Databases for Forensic Statistics for what constitutes an adequate database.
Account for correlated features. The product rule cannot be applied to features that are correlated. Independence must be verified empirically or an adjusted calculation must be used.

Worked example

Tracing the prosecutor's fallacy through a DNA case

Walk through a concrete DNA match scenario to identify where the prosecutor's fallacy enters and how to correct it.

A DNA profile from a crime scene is compared to the defendant's reference sample. The forensic scientist calculates a random match probability of 1 in 500,000 for the general population. At trial, the prosecutor says: the chance that someone other than the defendant left this DNA is 1 in 500,000, so there is only a 1 in 500,000 chance the defendant is innocent. Identify each error and construct the correct statement.

Identify the figure that was actually calculated. The 1 in 500,000 figure is P(a random person from the population shares this profile | they are not the source). This is a probability of the evidence given a hypothesis about the source, not a probability of guilt.
Identify the claim the prosecutor made. The prosecutor treated P(evidence | innocent) as if it equals P(innocent | evidence). These are not equal. Swapping them is the transposition of the conditional, the formal definition of the prosecutor's fallacy.
Consider the relevant population. If the realistic suspect pool (people with opportunity, geography, and motive) contained 50 people, the expected number of people in that pool who share the profile by chance is about 50 x 1/500,000 = 0.0001, so the chance that anyone else in the pool matches is roughly 1 in 10,000. The denominator that matters is the suspect pool, not the world.
Apply Bayes' theorem correctly. To find P(defendant is the source | matching profile), we need: the prior probability the defendant is the source before DNA evidence, and the likelihood ratio. If the LR is 500,000 (the evidence is 500,000 times more likely if the defendant is the source than if they are not), this updates the prior odds by a factor of 500,000. The expert may report the LR; the posterior probability is a matter for the jury to determine from all evidence.
Construct the correct statement. The correct statement is: the DNA profile is 500,000 times more likely to be observed if the defendant is the source than if a random person from the relevant population is the source. This is very strong support for the proposition that the defendant is the source. It does not establish guilt, which depends on all case circumstances.
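The steps above can be sketched in code. The likelihood ratio of 500,000 comes from the worked example; the prior probabilities are hypothetical values chosen only to show how much the answer depends on them, and the function name is illustrative.

```python
def posterior_probability(prior_probability: float, likelihood_ratio: float) -> float:
    """Update a prior probability with a likelihood ratio (odds form of Bayes)."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

LR = 500_000  # from the worked example

# Hypothetical priors: a very large pool, a mid-sized pool, a pool of 50.
hypothetical_priors = {
    "1 in 10,000,000": 1 / 10_000_000,
    "1 in 100,000": 1 / 100_000,
    "1 in 50": 1 / 50,
}

for label, prior in hypothetical_priors.items():
    posterior = posterior_probability(prior, LR)
    print(f"Prior {label}: P(source | match) = {posterior:.4f}")
```

The same likelihood ratio yields a posterior below 5 percent with the first prior and above 99.99 percent with the last. That spread is why the expert reports the likelihood ratio and leaves the posterior to the jury.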

Key Takeaways

People v. Collins (1968) established that probabilities presented in court must have an empirical basis and that the product rule cannot be applied to features that are not established to be statistically independent; the California Supreme Court reversed the conviction on these grounds.
R v. Sally Clark (2003) involved two compounding errors: an unjustified independence assumption in squaring the single-cot-death probability, and the prosecutor's fallacy in presenting the resulting figure as the probability of innocence rather than the probability of the evidence under the chance hypothesis.
The prosecutor's fallacy (treating P(evidence | innocent) as P(innocent | evidence)) and the defence fallacy (ignoring the relevant suspect pool when interpreting a random match probability) are the two canonical errors, both arising from transposition of the conditional.
The likelihood ratio format, required by ENFSI (2016) and adopted by forensic regulators in the UK, Australia, and elsewhere, avoids both fallacies by reporting the relative probability of the evidence under competing hypotheses rather than the probability of either hypothesis.
Expert witnesses using statistical evidence owe duties to the court, not the instructing party: to state assumptions, disclose the reference database, avoid transposing conditionals, stay within domain expertise, and account for correlation when applying probability rules.

What was the prosecutor's fallacy in R v. Sally Clark?

The prosecution cited Roy Meadow's claim that the chance of two cot deaths in one family was 1 in 73 million, treating this as the probability of innocence. Two errors combined: the independence assumption was wrong (cot death risk is correlated within families), and the figure was then misread as the probability that Clark was innocent rather than the probability of the observed deaths if she was innocent. Clark was wrongly convicted. She was acquitted on her second appeal in 2003.

What statistical error was made in People v. Collins?

The prosecution multiplied together probabilities for several visual characteristics of the defendants as if those characteristics were statistically independent, which was not established. The product rule for independent events does not apply when events are correlated. The California Supreme Court overturned the conviction in 1968, citing the misuse of probability and the absence of evidence that the assumed probabilities were correct.

What is the prosecutor's fallacy?

The prosecutor's fallacy is the error of treating P(evidence | innocent) as if it equals P(innocent | evidence). A very small probability of the evidence occurring by chance does not translate directly into a high probability of guilt. Bayes' theorem is required to convert between these probabilities, and that conversion depends on the prior probability of guilt, which must be supplied by the full case facts rather than the statistical evidence alone.

What duties do forensic statisticians have as expert witnesses?

Expert witnesses who use statistical arguments must state their assumptions clearly, disclose the limits of any probability estimate, avoid conflating different conditional probabilities, and present results in formats that a lay jury can interpret correctly. Courts in several jurisdictions now require that likelihood ratios be accompanied by a verbal equivalent scale and that assumptions underlying any calculation be disclosed and, where possible, tested against relevant population data.

How did the UK respond to the Sally Clark case and similar miscarriages?

The Royal Statistical Society issued a public statement in 2001 criticising the statistical evidence in Clark's trial. The UK Court of Appeal subsequently developed guidance on the use of statistical evidence in criminal proceedings. The Law Commission review (2011) and subsequent judicial directions have reinforced requirements for transparency about assumptions and for expert witnesses to stay within the limits of their expertise when presenting probabilistic conclusions.
