How a profile turns into a number a jury can weigh: population allele-frequency databases, Hardy-Weinberg and theta correction, the product rule for RMP, the likelihood ratio framing favoured by ENFSI and SWGDAM, and the prosecutor's fallacy and defence attorney's fallacy that the People v. Simpson trial put on a global stage.
A forensic DNA examiner hands the courtroom not just a profile but a number: the probability that a randomly selected, unrelated person from the relevant population would share that profile by chance. That number, the random match probability (RMP), converts the chemistry of allele calling into an evidential weight that a judge or jury can reason about. Getting it right requires a chain of statistical decisions, each grounded in population genetics, and getting it wrong, in either direction, has produced some of the most consequential miscarriages of justice in the history of forensic science.
The statistical framework used today in the United States, the United Kingdom, Australia, and across the EU rests on three foundational elements: a population allele-frequency database that estimates how common each allele is in the relevant population; the Hardy-Weinberg equilibrium (HWE) model and its theta-correction refinement, which translate individual allele frequencies into genotype frequencies; and the product rule, which multiplies genotype frequencies across independent loci to reach the profile-level RMP. The likelihood ratio (LR) then places that RMP inside a more complete probabilistic argument by comparing the probability of the evidence under the prosecution hypothesis against the probability under the defence hypothesis.
Two recurring errors in presenting DNA statistics to juries have been named, argued about, and litigated across three decades. The prosecutor's fallacy equates the RMP with the probability that the defendant is innocent, a mathematically invalid transposition of the conditional. The defence attorney's fallacy uses the size of a large population to suggest that many unrelated people could share the profile, ignoring the base rates that apply to a particular suspect in a specific investigation. Both errors appeared during the O.J. Simpson trial in 1995, on a global stage, and the resulting academic and legal literature became a standard reference for forensic statistics training worldwide.
Every RMP begins with a question about population genetics: how common is this allele among people who might plausibly be the unknown contributor?
An allele frequency is the proportion of that allele in a defined reference population. For locus D8S1179, allele 13, the allele frequency in the US Caucasian reference database might be 0.3367 (approximately one-third of chromosomes at that locus carry allele 13). The FBI's population frequency databases, published and periodically updated, provide allele frequencies for US Caucasian, African American, and Hispanic populations at all CODIS loci. SWGDAM (Scientific Working Group for DNA Analysis Methods) adopted these databases as the US standard and has published several updates since the CODIS 20 expansion.
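As a rough illustration of how such a frequency is estimated, the sketch below counts alleles in a small set of reference genotypes. The five-person sample and the function name `allele_frequencies` are hypothetical; real databases rest on far larger samples and apply additional rules that are not shown here.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Estimate allele frequencies from (allele, allele) genotypes at one locus."""
    # Each person contributes two alleles, so the denominator is 2 * sample size.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical five-person reference sample (illustrative only).
sample = [(13, 15), (13, 13), (12, 14), (13, 14), (10, 15)]
freqs = allele_frequencies(sample)
print(freqs[13])  # allele 13 appears on 4 of the 10 sampled chromosomes
```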
The UK Forensic Science Service (FSS) and its successor organisations (Eurofins Forensics, LGC, AFNI) maintain separate population databases for UK Caucasian, UK Asian (predominantly South Asian), and UK Afro-Caribbean reference groups. The ENFSI DNA Working Group coordinates publication of European allele frequency data across participating member-state laboratories, with data pooled from national reference collections. The Indian CFSL has published allele frequency data for Indian populations at the CODIS 13 loci, and several peer-reviewed studies have generated data for the additional CODIS 20 loci in north Indian, south Indian, and tribal population groups.
The choice of reference database for a given case is itself an interpretive decision. If a suspect is of mixed ancestry, or if the ancestry of the unknown contributor is genuinely uncertain, examiners in the US and UK are guided by SWGDAM and ENFSI recommendations to use the database that yields the most conservative (highest) RMP estimate, or to report RMPs across multiple population groups. Courts in the US (Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579, 1993) and the UK (R v. Doheny and Adams, 1996 4 All ER 481) have addressed the admissibility of RMP calculations and the basis on which population database choices must be disclosed to the defence.
A century-old population genetics theorem is the statistical backbone of every genotype frequency calculation in forensic DNA work.
The Hardy-Weinberg equilibrium (HWE) predicts genotype frequencies in an ideally large, randomly mating population with no selection, migration, or mutation. For a diploid individual at a single locus, HWE states that the expected frequency of a heterozygous genotype carrying alleles i and j is 2*p(i)*p(j), and the expected frequency of a homozygous genotype carrying allele i twice is p(i)^2. The factor of 2 for heterozygotes accounts for the two ways the genotype can arise: allele i from the mother and allele j from the father, or allele j from the mother and allele i from the father.
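The two HWE expressions can be sketched in a few lines. The function name `hwe_genotype_frequency` is hypothetical; the frequency 0.3367 is the D8S1179 allele 13 figure quoted earlier, and 0.10 is an illustrative value for a second allele.

```python
def hwe_genotype_frequency(p_i, p_j=None):
    """Expected genotype frequency under Hardy-Weinberg equilibrium.

    Pass one allele frequency for a homozygote (i, i),
    or two allele frequencies for a heterozygote (i, j).
    """
    if p_j is None:
        return p_i ** 2       # homozygote: p(i)^2
    return 2 * p_i * p_j      # heterozygote: 2 * p(i) * p(j)

het = hwe_genotype_frequency(0.3367, 0.10)  # about 0.067
hom = hwe_genotype_frequency(0.3367)        # about 0.113
print(f"heterozygote: {het:.4f}, homozygote: {hom:.4f}")
```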
HWE holds reasonably well for autosomal STR loci in large, randomly mating populations. However, real human populations depart from this ideal in a predictable direction: non-random mating, population structure, and genetic drift within subpopulations lead to a slight excess of homozygotes relative to HWE expectation, a phenomenon measured by Wright's fixation index (FST). This matters for forensic statistics because assuming HWE in a substructured population slightly understates homozygote frequencies, making the RMP slightly more favourable to the prosecution than the data warrant.
The correction for population substructure is the theta (FST) correction, formalised by Balding and Nichols in 1994 and incorporated into SWGDAM guidelines. The theta-corrected genotype frequency for a heterozygote carrying alleles i and j is: 2[(theta + (1-theta)p(i))(theta + (1-theta)p(j))] / [(1+theta)(1+2*theta)]. For a homozygote carrying allele i twice, the corresponding expression is: [(2*theta + (1-theta)p(i))(3*theta + (1-theta)p(i))] / [(1+theta)(1+2*theta)]. Both expressions reduce to the Hardy-Weinberg values when theta = 0. A theta value of 0.01 is commonly used for the major ethnic groups in US and UK databases; a value of 0.03 is used for isolated or historically structured populations where FST values are higher.
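A minimal sketch of the theta correction follows. It uses the heterozygote formula given above and, for homozygotes, the standard Balding-Nichols form [(2*theta + (1-theta)p(i))(3*theta + (1-theta)p(i))] / [(1+theta)(1+2*theta)]. The function name and the allele frequencies are illustrative assumptions.

```python
def theta_corrected_frequency(p_i, p_j=None, theta=0.01):
    """Balding-Nichols theta-corrected genotype frequency.

    Pass one allele frequency for a homozygote, two for a heterozygote.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_j is None:
        # Homozygote (i, i)
        return (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i) / denom
    # Heterozygote (i, j)
    return 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j) / denom

het_corrected = theta_corrected_frequency(0.3367, 0.10, theta=0.01)
het_uncorrected = theta_corrected_frequency(0.3367, 0.10, theta=0.0)  # plain HWE
hom_corrected = theta_corrected_frequency(0.3367, theta=0.01)
print(f"{het_uncorrected:.4f} -> {het_corrected:.4f}")
```

For these inputs the corrected frequencies are larger than the uncorrected ones, which pushes the reported RMP in the direction that favours the defendant.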
The ENFSI DNA Working Group's 2016 document and SWGDAM's 2010 guidelines both recommend using theta = 0.01 as a conservative default for most casework in major population groups. The UK Forensic Science Regulator's codes adopt the same value. Courts in New Zealand and Australia have explicitly considered theta correction in admissibility hearings, and the Australian model Forensic DNA laboratory standards (published by the National Institute of Forensic Science, NIFS) specify that theta must be applied and the value disclosed in the expert report.
Multiplying genotype frequencies across twenty independent loci produces a number so small that writing it in conventional notation requires exponents, and that number is what the jury actually hears.
The product rule states that the probability of observing a joint genotype across multiple independent loci is the product of the individual genotype frequencies at each locus. Independence is justified by the chromosomal distribution of the CODIS loci: the 20 core loci are spread across the autosomes, with most located on different chromosomes. Where two loci share a chromosome, they are treated as separated by sufficient genetic distance to be effectively unlinked for the purpose of the product rule in a large, randomly mating population.
For a 20-locus CODIS profile with genotype frequencies at each locus between approximately 0.01 and 0.20 per heterozygous genotype, the 20-locus RMP in a major US population group is typically in the range of 1 in 10 to the power of 20 or smaller. In practical reporting terms, US laboratories using GlobalFiler and CODIS 20 routinely report RMPs rarer than 1 in 10 quintillion (1 in 10 to the power of 19) for single-source profiles from known contributors. UK laboratories using ESS17 with SE33 report similar orders of magnitude.
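The product rule can be sketched as follows. Working in log10 keeps the arithmetic readable and avoids very small floating-point numbers. The 20 per-locus genotype frequencies are hypothetical (all set to 0.08, inside the range quoted above), and `log10_rmp` is an illustrative helper name.

```python
import math

def log10_rmp(genotype_freqs):
    """Return log10 of the profile RMP: the sum of log10 per-locus genotype frequencies."""
    return sum(math.log10(f) for f in genotype_freqs)

# Hypothetical profile: 20 independent loci, each with genotype frequency 0.08.
locus_freqs = [0.08] * 20
log_rmp = log10_rmp(locus_freqs)
print(f"RMP is about 1 in 10^{-log_rmp:.1f}")  # about 1 in 10^21.9
```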
Reported RMP values are often presented to juries and in court documents in several forms: "1 in 10 quintillion", "the probability that a randomly selected unrelated individual from the general population would have this profile is approximately 1 in 1 followed by 19 zeros", or a verbal statement along the lines of "this profile is unique to this individual in all reasonable probability". ENFSI's 2016 guidelines specifically recommend against this last formulation on the grounds that it conflates the RMP with a statement about uniqueness that has not been separately demonstrated; SWGDAM's guidelines recommend reporting the numeric estimate with appropriate confidence intervals.
The RMP answers one question; the likelihood ratio answers the question a court actually needs answered.
The RMP is the probability of the evidence (the matching profile) given the hypothesis that someone other than the defendant was the source. The likelihood ratio (LR) places this in a more complete framework by comparing that probability against the probability of the evidence given the prosecution's hypothesis.
Formally: LR = P(evidence | prosecution hypothesis) / P(evidence | defence hypothesis). For a simple single-source case where the prosecution hypothesis is "the defendant is the donor of the crime-scene profile" and the defence hypothesis is "an unknown unrelated individual is the donor", and where the defendant matches at all loci: P(evidence | Hp) = 1 (if the defendant is the donor, observing their own profile at the crime scene is certain), and P(evidence | Hd) = RMP. Therefore LR = 1 / RMP.
An LR of 10 quintillion (the reciprocal of an RMP of 1 in 10 quintillion) means: the evidence is 10 quintillion times more probable under the prosecution hypothesis than under the defence hypothesis. This formulation is more informative than the RMP alone because it explicitly states both hypotheses and makes clear that the number is a ratio, not a probability of guilt. ENFSI's Guidelines on Evaluative Reporting in Forensic Science (2015) and SWGDAM's 2010 guidelines both endorse the LR framing. The Association of Forensic Science Providers (AFSP) in the UK issued guidance in 2009 stating that all UK forensic science providers should express DNA evidence as a likelihood ratio in their reports.
The LR is the evidential component that is then combined with the prior odds (reflecting all the non-DNA evidence in the case) using Bayes' theorem in its odds form: posterior odds = prior odds x LR. Critically, the forensic DNA examiner reports only the LR. The determination of guilt, which requires the trier of fact to apply the LR to the prior odds and reach a posterior view, is the court's task, not the scientist's.
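The division of labour described above can be sketched numerically. The RMP of 1 in a million and the prior odds of 1 to 10,000 are deliberately modest, hypothetical figures chosen so the arithmetic is easy to follow; the function names are illustrative, and in practice the prior is for the trier of fact to assess, not the examiner.

```python
def likelihood_ratio_single_source(rmp):
    """LR for a full single-source match: P(E | Hp) = 1, P(E | Hd) = RMP."""
    return 1.0 / rmp

def posterior_odds(prior_odds, lr):
    """Bayes' theorem in odds form."""
    return prior_odds * lr

def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis to a probability."""
    return odds / (1.0 + odds)

lr = likelihood_ratio_single_source(1e-6)  # examiner's contribution
prior = 1 / 10_000                         # hypothetical non-DNA prior odds
post = posterior_odds(prior, lr)           # about 100 to 1
print(f"posterior probability: {odds_to_probability(post):.3f}")
```

The same LR produces very different posterior odds under different priors, which is why the LR alone is not a probability of guilt.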
Two fallacies with proper names and documented court appearances have distorted DNA evidence in front of juries on every continent.
The prosecutor's fallacy is the incorrect equating of P(evidence | not guilty) with P(not guilty | evidence). In plain language: the prosecutor (or the expert witness on the prosecution side) says "the probability that an innocent person would match is 1 in 10 billion, so the probability that the defendant is innocent is 1 in 10 billion". This is the transposition of the conditional. It ignores the base rate (prior probability of guilt before the DNA evidence is weighed) and incorrectly treats the RMP as if it were the posterior probability of innocence.
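A small numerical sketch shows how far apart the two conditionals can be. It assumes, purely for illustration, that before the DNA evidence the defendant and each of `n_alternatives` other people are equally likely to be the source; that flat prior, the function name, and the numbers are assumptions, not part of any guideline.

```python
def prob_not_source_given_match(rmp, n_alternatives):
    """P(defendant is not the source | defendant matches), under a flat prior.

    The defendant and each of n_alternatives other people are assumed equally
    likely a priori to be the source; an unrelated non-source matches with
    probability rmp.
    """
    expected_innocent_matches = n_alternatives * rmp
    return expected_innocent_matches / (1.0 + expected_innocent_matches)

rmp = 1e-6                 # P(match | not the source): 1 in a million
n_alternatives = 1_000_000
p_not_source = prob_not_source_given_match(rmp, n_alternatives)
print(f"P(match | not source) = {rmp}, P(not source | match) = {p_not_source}")
```

Under these assumptions the RMP is 1 in a million, yet the probability that the defendant is not the source is about one half. Quoting the first number as if it were the second is the transposed conditional.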
The most famous public airing of this error was the O.J. Simpson trial in 1995. The defence, cross-examining Cellmark Diagnostics laboratory director Robin Cotton, pushed back on the prosecution's use of match probabilities. The trial record, covered by virtually every major global news network, introduced the prosecutor's fallacy (and the defence attorney's fallacy) to a lay audience in a way no academic paper had previously achieved. The subsequent academic literature by Peter Donnelly, Jonathan Koehler, and others on cognitive biases in juror interpretation of DNA statistics has been cited in courts in the US, the UK, the Netherlands, and Australia.
R v. Doheny and Adams (1996) in the England and Wales Court of Appeal is the leading UK case on the proper presentation of DNA evidence to juries. The Court of Appeal held that the DNA expert should give evidence of the match probability (or LR) and should not be asked to express a view on the probability of guilt. The court also held that the Bayes' theorem approach (multiplying prior odds by LR to reach posterior odds) should not be explained to jurors by expert witnesses because juries are not trained statisticians, a position that has remained controversial in the academic literature but has been consistently followed in UK practice.
The defence attorney's fallacy is the incorrect argument that a match is nearly worthless because many people could share the profile. For example: the RMP is 1 in 10 million and the world has 8 billion people, so about 800 people globally would be expected to match, the defendant is merely one of 800, and the DNA evidence therefore proves nothing. This argument is fallacious for two reasons: it ignores the geographical and circumstantial constraints that eliminate most of the 8 billion from consideration, and it fails to account for the other evidence in the case that already narrows the relevant population to a much smaller group.
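The effect of the chosen reference pool can be sketched directly: the expected number of coincidental matches is the RMP multiplied by the number of people who could plausibly be the source. The RMP of 1 in 10 million and the pool of 10,000 plausible donors are illustrative assumptions.

```python
def expected_matches(rmp, population_size):
    """Expected number of unrelated people in a pool who would match by chance."""
    return rmp * population_size

rmp = 1e-7  # illustrative RMP of 1 in 10 million

whole_world = expected_matches(rmp, 8_000_000_000)  # pool used in the fallacy
plausible_pool = expected_matches(rmp, 10_000)      # hypothetical realistic pool
print(f"world: {whole_world:.0f}, plausible pool: {plausible_pool:.3f}")
```

Treating the whole world as the suspect pool gives about 800 expected matches; a pool restricted to people who could realistically have been at the scene gives about 0.001. Even the inflated figure narrows 8 billion people down to a few hundred, so the evidence is never worth nothing.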
Two trials on two continents: the cases that shaped how DNA statistics are taught, governed, and challenged worldwide.
The 1995 O.J. Simpson double-murder trial in Los Angeles generated the most scrutinised forensic DNA evidence presentation in history. The trial was broadcast live on television across the United States and covered internationally. The prosecution's DNA evidence, presented by FBI analysts and Cellmark Diagnostics scientists, included RFLP and PCR-based profiles from blood found at the crime scene, on the murder victims, on the defendant's Ford Bronco, on a glove, and at the defendant's Rockingham estate. The reported match probabilities were extremely small, in the range of 1 in 57 billion for some profiles, but the prosecution's presentation suffered from what independent analysts later characterised as a version of the prosecutor's fallacy in the way statistics were framed in closing argument.
The defence's DNA challenge, led by Barry Scheck and Peter Neufeld, co-founders of the Innocence Project, attacked chain of custody, contamination handling at the LAPD Crime Lab, and the statistical interpretation. The acquittal was widely attributed to jury distrust of the forensic evidence rather than to statistical error per se, but the trial permanently changed forensic DNA testimony training. After 1995, every major forensic science training programme in the US, UK, and Australia included explicit instruction on the prosecutor's fallacy and defence attorney's fallacy.
R v. Adams (No 1, 1996 and No 2, 1998) in England and Wales raised a different question. Denis Adams was convicted on DNA evidence alone; the victim did not identify him and indicated that he was not her attacker. His defence team attempted to introduce a full Bayesian analysis to the jury, using a prior probability estimated from the victim's identification evidence and updating it with the DNA LR. The Court of Appeal rejected this approach on both occasions, holding that asking a jury to perform Bayesian calculations imposes an inappropriate mathematical burden on the lay decision-maker. The result is that UK practice since 1998 has been to report the LR as a verbal expression of evidential weight ("the evidence provides extremely strong support for the prosecution hypothesis") rather than to walk the jury through Bayes' theorem.
Indian courts have not yet directly confronted the prosecutor's fallacy or the defence attorney's fallacy in a landmark DNA case, though the issue has been flagged in the 1993 Supreme Court judgment in Goutam Kundu v. State of West Bengal (paternity context) and in the parliamentary debates on the DNA Technology Bill, where NALSAR University Law Review and the Vidhi Centre for Legal Policy both submitted comments on statistical literacy in the judiciary as a precondition for sound DNA admissibility frameworks.
| Concept | Correct statement | Fallacious version | Jurisdiction where addressed |
|---|---|---|---|
| Prosecutor's fallacy | P(evidence \| not the donor) = 1 in 10 billion | P(innocent \| this evidence) = 1 in 10 billion | R v. Doheny (UK 1996); People v. Collins (US 1968 precursor); O.J. Simpson (US 1995) |
| Defence attorney's fallacy | The RMP is 1 in 10 million; given geographic and circumstantial priors, the expected number of matching unrelated individuals in the relevant suspect pool is near zero | There are 8 billion people in the world, so about 800 could match; DNA proves nothing | Addressed in SWGDAM 2010 guidelines; Balding textbook standard reference |
| Likelihood ratio | LR = P(evidence \| Hp) / P(evidence \| Hd) = 1 / RMP for single-source full profile | LR = probability of guilt | R v. Doheny (UK); ENFSI 2015 Guidelines; SWGDAM 2010 |
At locus D8S1179, a crime-scene profile shows a heterozygous genotype 13, 15. The allele frequencies in the relevant population database are p(13) = 0.34 and p(15) = 0.11. Using simple Hardy-Weinberg without theta correction, the genotype frequency is: