Allele Frequencies, Random Match Probability and the Likelihood Ratio

How a profile turns into a number a jury can weigh: population allele-frequency databases, Hardy-Weinberg and theta correction, the product rule for RMP, the likelihood ratio framing favoured by ENFSI and SWGDAM, and the prosecutor's fallacy and defence attorney's fallacy that the People v. Simpson trial put on a global stage.

Last updated: 19 Jun 2026

A forensic DNA examiner hands the courtroom not just a profile but a number: the probability that a randomly selected, unrelated person from the relevant population would share that profile by chance. That number, the random match probability (RMP), converts allele calls into evidential weight. Getting it right requires grounded statistical decisions; getting it wrong has produced serious miscarriages of justice.

Key takeaways

The RMP is calculated by multiplying HWE-based genotype frequencies across all typed loci using population allele-frequency databases from SWGDAM (US), FSS (UK), or CFSL (India).
The theta (FST) correction, typically set at 0.01 for major population groups, adjusts for population substructure using the Balding-Nichols (1994) formula and generally makes the reported RMP larger (the profile is treated as more common), which is the conservative direction.
A 20-locus CODIS profile typically yields an RMP in the range of 1 in 10^20, far exceeding any realistic suspect-pool size.
The likelihood ratio (LR = 1 / RMP for a full single-source profile) is the form endorsed by ENFSI (2015) and SWGDAM (2010) for court presentation; it explicitly states both prosecution and defence hypotheses.
The prosecutor's fallacy equates P(evidence | innocent) with P(innocent | evidence); R v. Doheny (1996) established that UK experts should report the match probability (or LR) and must not opine on the probability of guilt.

The statistical framework used today in the United States, the United Kingdom, Australia, and across the EU rests on three foundational elements: a population allele-frequency database that estimates how common each allele is in the relevant population; the Hardy-Weinberg equilibrium (HWE) model and its theta-correction refinement, which translate individual allele frequencies into genotype frequencies; and the product rule, which multiplies genotype frequencies across independent loci to reach the profile-level RMP. The likelihood ratio (LR) then places that RMP inside a more complete probabilistic argument by comparing the probability of the evidence under the prosecution hypothesis against the probability under the defence hypothesis.

Two recurring errors in presenting DNA statistics to juries have been named, argued about, and litigated across three decades. The prosecutor's fallacy equates the RMP with the probability that the defendant is innocent, a mathematically invalid transposition of the conditional. The defence attorney's fallacy uses the size of a large population to suggest that many unrelated people could share the profile, ignoring the base rates that apply to a particular suspect in a specific investigation. Both errors appeared during the O.J. Simpson trial in 1995, on a global stage, and the resulting academic and legal literature became a standard reference for forensic statistics training worldwide.

Allele Frequencies and Population Reference Databases

Every RMP begins with a question about population genetics: how common is this allele among people who might plausibly be the unknown contributor?

An allele frequency is the proportion of that allele in a defined reference population. For locus D8S1179, allele 13, the allele frequency in the US Caucasian reference database might be 0.3367 (approximately one-third of chromosomes at that locus carry allele 13). The FBI's population frequency databases, published and periodically updated, provide allele frequencies for US Caucasian, African American, and Hispanic populations at all CODIS loci. SWGDAM (Scientific Working Group for DNA Analysis Methods) adopted these databases as the US standard and has published several updates since the CODIS 20 expansion.
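As a rough illustration of where such a figure comes from, the sketch below estimates allele frequencies by simple allele counting. The five-person sample and the function name `allele_frequencies` are hypothetical; published databases rest on far larger samples and typically apply adjustments for rare alleles that are not shown here.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Estimate allele frequencies from a list of (allele_a, allele_b) genotypes."""
    counts = Counter()
    for allele_a, allele_b in genotypes:
        counts[allele_a] += 1
        counts[allele_b] += 1
    total = sum(counts.values())  # two alleles counted per person
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical sample at a single locus (illustration only).
sample = [(13, 14), (13, 13), (12, 15), (13, 15), (14, 14)]
freqs = allele_frequencies(sample)
print(freqs[13])  # allele 13 appears on 4 of the 10 chromosomes sampled
```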

The UK Forensic Science Service (FSS) and its successor organisations (Eurofins Forensics, LGC, AFNI) maintain separate population databases for UK Caucasian, UK Asian (predominantly South Asian), and UK Afro-Caribbean reference groups. The ENFSI DNA Working Group coordinates publication of European allele frequency data across participating member-state laboratories, with data pooled from national reference collections. The Indian CFSL has published allele frequency data for Indian populations at the CODIS 13 loci, and several peer-reviewed studies have generated data for the additional CODIS 20 loci in north Indian, south Indian, and tribal population groups.

The choice of reference database for a given case is itself an interpretive decision. If a suspect is of mixed ancestry, or if the ancestry of the unknown contributor is genuinely uncertain, examiners in the US and UK are guided by SWGDAM and ENFSI recommendations to use the database that yields the most conservative (highest) RMP estimate, or to report RMPs across multiple population groups. Courts in the US (Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579, 1993) and the UK (R v. Doheny and Adams, 1996 4 All ER 481) have addressed the admissibility of RMP calculations and the basis on which population database choices must be disclosed to the defence. The full admissibility framework is examined in admissibility and ethics: Daubert, Frye and ELSI.
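The "most conservative database" rule can be sketched in a few lines. The database labels and RMP values below are invented for illustration, and `most_conservative_rmp` is a hypothetical helper, not part of any laboratory software.

```python
def most_conservative_rmp(rmp_by_population):
    """Return the (population, RMP) pair with the highest RMP.

    The highest RMP treats the profile as most common, which is the
    estimate most favourable to the defendant.
    """
    population = max(rmp_by_population, key=rmp_by_population.get)
    return population, rmp_by_population[population]

# Hypothetical RMPs for one profile computed against three reference databases.
rmps = {"database_A": 2.1e-21, "database_B": 7.5e-22, "database_C": 4.0e-21}
population, rmp = most_conservative_rmp(rmps)
print(population, rmp)
```

A laboratory that prefers to report all groups would simply present the whole dictionary rather than its maximum.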

Hardy-Weinberg Equilibrium and Theta Correction

A century-old population genetics theorem is the statistical backbone of every genotype frequency calculation in forensic DNA work.

The Hardy-Weinberg equilibrium (HWE) predicts genotype frequencies in an ideally large, randomly mating population with no selection, migration, or mutation. For a diploid individual at a single locus, HWE states that the expected frequency of a heterozygous genotype carrying alleles i and j is 2*p(i)*p(j), and the expected frequency of a homozygous genotype carrying allele i twice is p(i)^2. The multiplication by 2 for heterozygotes accounts for the two possible ways to inherit allele i from one parent and allele j from the other.
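The two HWE expressions translate directly into code. This is a minimal sketch; the function names and the allele frequencies used in the example (0.34 and 0.11) are illustrative assumptions rather than values from any specific database.

```python
def hwe_heterozygote(p_i, p_j):
    """Expected frequency of a heterozygous genotype i,j under HWE."""
    return 2 * p_i * p_j

def hwe_homozygote(p_i):
    """Expected frequency of a homozygous genotype i,i under HWE."""
    return p_i ** 2

het = hwe_heterozygote(0.34, 0.11)  # about 0.0748
hom = hwe_homozygote(0.34)          # about 0.1156
print(het, hom)
```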

HWE holds reasonably well for autosomal STR loci in large, randomly mating populations. However, real human populations depart from this ideal in a predictable direction: non-random mating, population structure, and genetic drift within subpopulations lead to a slight excess of homozygotes relative to HWE expectation, a phenomenon measured by Wright's fixation index (FST). This matters for forensic statistics because assuming HWE in a substructured population slightly understates homozygote frequencies, making the RMP slightly more favourable to the prosecution than the data warrant.

The correction for population substructure is the theta (FST) correction, formalised by Balding and Nichols in 1994 and incorporated into SWGDAM guidelines. The theta-corrected genotype frequency for a heterozygote carrying alleles i and j is: 2[theta + (1-theta)p(i)][theta + (1-theta)p(j)] / [(1+theta)(1+2*theta)]. For a homozygote carrying allele i twice, it is: [2*theta + (1-theta)p(i)][3*theta + (1-theta)p(i)] / [(1+theta)(1+2*theta)]. Both expressions reduce to the plain HWE forms when theta = 0. A theta value of 0.01 is commonly used for the major ethnic groups in US and UK databases; a value of 0.03 is used for isolated or historically structured populations where FST values are higher.
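A short sketch of the Balding-Nichols correction follows. The homozygote expression used here is the standard Balding-Nichols form, [2*theta + (1-theta)p(i)][3*theta + (1-theta)p(i)] / [(1+theta)(1+2*theta)]; the function names and the example allele frequencies are assumptions chosen for illustration.

```python
def theta_heterozygote(p_i, p_j, theta=0.01):
    """Balding-Nichols corrected frequency for a heterozygous genotype i,j."""
    numerator = 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j)
    return numerator / ((1 + theta) * (1 + 2 * theta))

def theta_homozygote(p_i, theta=0.01):
    """Balding-Nichols corrected frequency for a homozygous genotype i,i."""
    numerator = (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i)
    return numerator / ((1 + theta) * (1 + 2 * theta))

het = theta_heterozygote(0.34, 0.11)  # about 0.0800, versus 0.0748 under plain HWE
hom = theta_homozygote(0.34)          # about 0.1269, versus 0.1156 under plain HWE
print(het, hom)
```

With theta set to 0 both functions return the plain HWE values. For these example frequencies the corrected values are larger, which is the conservative direction; for heterozygotes with two very common alleles the corrected value can be marginally smaller, so the effect should be checked rather than assumed.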

The ENFSI DNA Working Group's 2016 document and SWGDAM's 2010 guidelines both recommend using theta = 0.01 as a conservative default for most casework in major population groups. The UK Forensic Science Regulator's codes adopt the same value. Courts in New Zealand and Australia have explicitly considered theta correction in admissibility hearings, and the Australian model Forensic DNA laboratory standards (published by the National Institute of Forensic Science, NIFS) specify that theta must be applied and the value disclosed in the expert report.

Product rule for RMP: genotype frequencies at three loci are computed using HWE (with theta correction) and multiplied to give the profile-level probability; the loci must be statistically independent (unlinked at the population level) to justify the multiplication.

The Product Rule and the Final RMP

Multiplying genotype frequencies across twenty independent loci produces a number so small that writing it in conventional notation requires exponents, and that number is what the jury actually hears.

The product rule states that the probability of observing a joint genotype across multiple independent loci is the product of the individual genotype frequencies at each locus. Independence is justified by the chromosomal distribution of the CODIS 20 loci: most of the 20 core loci sit on different chromosomes, and those that share a chromosome (for example CSF1PO and D5S818 on chromosome 5) are separated by sufficient genetic distance to be treated as effectively unlinked for the purpose of the product rule in a large randomly-mating population.

For a 20-locus CODIS profile with genotype frequencies at each locus between approximately 0.01 and 0.20 per heterozygous genotype, the 20-locus RMP in a major US population group is typically in the range of 1 in 10^20 or smaller. In practical reporting terms, US laboratories using GlobalFiler and CODIS 20 routinely report RMPs rarer than 1 in 10 quintillion (10^19) for single-source profiles from known contributors. UK laboratories using ESS17 with SE33 report similar orders of magnitude.
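The multiplication itself is easiest to follow on a log scale, which also keeps the reported exponent explicit. In this sketch every locus is given a hypothetical genotype frequency of 0.10 purely for illustration; `log10_rmp` is an assumed helper name.

```python
import math

def log10_rmp(genotype_frequencies):
    """Sum of log10 genotype frequencies, equal to log10 of their product."""
    return sum(math.log10(f) for f in genotype_frequencies)

# Hypothetical per-locus genotype frequencies for a 20-locus profile.
per_locus = [0.10] * 20
exponent = log10_rmp(per_locus)
print(f"RMP is about 1 in 10^{-exponent:.1f}")
```

Replacing the list with real theta-corrected genotype frequencies gives the casework figure; the log form is simply a readable way to carry twenty small multiplications.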

Reported RMP values are often presented to juries and in court documents in multiple equivalent forms: "1 in 100 quintillion", "the probability that a randomly selected unrelated individual from the general population would have this profile is approximately 1 in 1 followed by 20 zeros", or as a verbal equivalent along the lines of "this profile is unique to this individual in all reasonable probability". ENFSI's 2016 guidelines specifically recommend against this last formulation on the grounds that it conflates the RMP with a statement about uniqueness that has not been separately demonstrated; SWGDAM's guidelines recommend reporting the numeric estimate with appropriate confidence intervals.

The Likelihood Ratio Framework

The RMP answers one question; the likelihood ratio answers the question a court actually needs answered.

The RMP is the probability of the evidence (the matching profile) given the hypothesis that someone other than the defendant was the source. The likelihood ratio (LR) places this in a more complete framework by comparing that probability against the probability of the evidence given the prosecution's hypothesis.

Formally: LR = P(evidence | prosecution hypothesis) / P(evidence | defence hypothesis). For a simple single-source case where the prosecution hypothesis is "the defendant is the donor of the crime-scene profile" and the defence hypothesis is "an unknown unrelated individual is the donor", and where the defendant matches at all loci: P(evidence | Hp) = 1 (if the defendant is the donor, observing their own profile at the crime scene is certain), and P(evidence | Hd) = RMP. Therefore LR = 1 / RMP.

An LR of 10 quintillion (the reciprocal of an RMP of 1 in 10 quintillion) means that the evidence is 10 quintillion times more probable under the prosecution hypothesis than under the defence hypothesis. This formulation is more informative than the RMP alone because it explicitly states both hypotheses and makes clear that the number is a ratio, not a probability of guilt. When mixtures are involved, the LR is computed by mixture deconvolution and probabilistic genotyping software rather than directly from product-rule calculations. ENFSI's Guidelines on Evaluative Reporting in Forensic Science (2015) and SWGDAM's 2010 guidelines both endorse the LR framing. The Association of Forensic Science Providers (AFSP) in the UK issued guidance in 2009 stating that all UK forensic science providers should express DNA evidence as a likelihood ratio in their reports.

The LR is the evidential component that is then combined with the prior odds (reflecting all the non-DNA evidence in the case) using Bayes' theorem to yield the posterior odds: posterior odds = prior odds × LR. Critically, the forensic DNA examiner reports only the LR. The determination of guilt, which requires the trier of fact to apply the LR to the prior odds and reach a posterior view, is the court's task, not the scientist's.
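The odds form of Bayes' theorem is a single multiplication, sketched below. The RMP of 1 in a billion and the prior odds of 1 to 1,000,000 are hypothetical numbers chosen only to make the arithmetic visible; in a real case the prior odds belong to the trier of fact, not to the examiner.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds x LR."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis to a probability."""
    return odds / (1 + odds)

rmp = 1e-9               # hypothetical RMP for a full single-source profile
lr = 1 / rmp             # LR = 1 / RMP in the simple single-source case
prior = 1 / 1_000_000    # hypothetical prior odds that the defendant is the donor

post = posterior_odds(prior, lr)   # about 1000 to 1
prob = odds_to_probability(post)   # about 0.999
print(post, prob)
```

Even with an LR of a billion, the posterior probability here is about 0.999 rather than 1 minus the RMP, because the prior odds matter.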

The Prosecutor's Fallacy and the Defence Attorney's Fallacy

Two fallacies with proper names and documented court appearances have distorted DNA evidence in front of juries on every continent.

The prosecutor's fallacy is the incorrect equating of P(evidence | not guilty) with P(not guilty | evidence). In plain language: the prosecutor (or the expert witness on the prosecution side) says "the probability that an innocent person would match is 1 in 10 billion, so the probability that the defendant is innocent is 1 in 10 billion". This is the transposition of the conditional. It ignores the base rate (prior probability of guilt before the DNA evidence is weighed) and incorrectly treats the RMP as if it were the posterior probability of innocence.
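The size of the error can be made concrete with a simple model. The sketch assumes, purely for illustration, that before the DNA result the defendant and each of a fixed number of other unrelated people were equally likely to be the donor, and that there is no other evidence; the pool size of one million and the function name are hypothetical.

```python
def probability_not_source(rmp, other_possible_donors):
    """P(defendant is not the donor | match) under an equal-prior pool model.

    Assumes each of the other possible donors is unrelated and was, before
    the DNA evidence, exactly as likely as the defendant to be the donor.
    """
    expected_coincidental = other_possible_donors * rmp
    return expected_coincidental / (1 + expected_coincidental)

rmp = 1e-10        # 1 in 10 billion, as in the example above
pool = 1_000_000   # hypothetical number of other possible donors

p_not_source = probability_not_source(rmp, pool)
print(p_not_source)  # about 1e-4, roughly a million times larger than the RMP
```

Under these assumptions the probability that the defendant is not the donor is close to pool × RMP, not the RMP itself, which is exactly the gap the transposed conditional hides.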

The most famous public airing of this error was the O.J. Simpson trial in 1995. The defence, cross-examining Robin Cotton, laboratory director of Cellmark Diagnostics, pushed back on the prosecution's use of match probabilities. The trial record, covered by virtually every major global news network, introduced the prosecutor's fallacy (and the defence attorney's fallacy) to a lay audience in a way no academic paper had previously achieved. The subsequent academic literature by Peter Donnelly, Jonathan Koehler, and others on cognitive biases in juror interpretation of DNA statistics has been cited in courts in the US, the UK, the Netherlands, and Australia.

R v. Doheny and Adams (1996) in the England and Wales Court of Appeal is the leading UK case on the proper presentation of DNA evidence to juries (the broader admissibility framework is examined in admissibility and ethics). The Court of Appeal held that the DNA expert should give evidence of the match probability (or LR) and should not be asked to express a view on the probability of guilt. The court also held that the Bayes' theorem approach (multiplying prior odds by LR to reach posterior odds) should not be explained to jurors by expert witnesses because juries are not trained statisticians, a position that has remained controversial in the academic literature but has been consistently followed in UK practice.

The defence attorney's fallacy is the incorrect argument that, with an RMP of (say) 1 in 10 million and a UK population of 67 million, only six or seven other people in the UK would be expected to match the profile, but because the world has 8 billion people there might be around 800 matches globally, so the DNA evidence proves nothing. This argument is fallacious for two reasons: it ignores the geographical and circumstantial constraints that eliminate most of the 8 billion from consideration, and it fails to account for the other evidence in the case that already narrows the relevant population to a much smaller group.
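The arithmetic behind the fallacy and its rebuttal can be sketched as expected-match counts. The RMP of 1 in 10 million and the relevant pool of 50,000 people are illustrative assumptions, and `expected_matches` is a hypothetical helper.

```python
def expected_matches(rmp, population_size):
    """Expected number of unrelated people in a population sharing the profile."""
    return rmp * population_size

rmp = 1e-7  # hypothetical RMP of 1 in 10 million

world = expected_matches(rmp, 8_000_000_000)   # about 800
relevant_pool = expected_matches(rmp, 50_000)  # about 0.005
print(world, relevant_pool)
```

The global figure is what the fallacy quotes; the figure for the people who could realistically have left the sample is what bears on the case.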

Correct LR framing versus two named fallacies: the prosecutor's fallacy transposes the conditional (reads P(evidence | innocent) as P(innocent | evidence)); the defence attorney's fallacy ignores the constrained suspect pool and treats the global population as the reference.

The Simpson Trial, R v. Adams and Lessons for Cross-Jurisdictional Practice

Two trials on two continents: the cases that shaped how DNA statistics are taught, governed, and challenged worldwide.

The 1995 O.J. Simpson double-murder trial in Los Angeles generated the most scrutinised forensic DNA evidence presentation in history. The trial was broadcast live on television across the United States and covered internationally. The prosecution's DNA evidence, presented by FBI analysts and Cellmark Diagnostics scientists, included RFLP and PCR-based profiles from blood found at the crime scene, on the murder victims, on the defendant's Ford Bronco, on a glove, and at the defendant's Rockingham estate. The match statistics were extreme, in the range of 1 in 57 billion for some profiles, but the prosecution's presentation suffered from what independent analysts later characterised as a version of the prosecutor's fallacy in the way statistics were framed in closing argument.

The defence's forensic challenge, led by Barry Scheck and Peter Neufeld (co-founders of the Innocence Project), attacked chain of custody, contamination handling at the LAPD Crime Lab, and the statistical interpretation. The acquittal was widely attributed to jury distrust of the forensic evidence rather than to statistical error per se, but the trial permanently changed forensic DNA testimony training. After 1995, every major forensic science training programme in the US, UK, and Australia included explicit instruction on the prosecutor's fallacy and defence attorney's fallacy.

R v. Adams (No 1, 1996 and No 2, 1998) in England and Wales raised a different question. Denis Adams was convicted on DNA evidence alone; the victim did not identify him as her attacker. His defence team attempted to introduce a full Bayesian analysis to the jury, using a prior probability estimated from the victim's identification evidence and updating it with the DNA LR. The Court of Appeal rejected this approach on both occasions, holding that asking a jury to perform Bayesian calculations imposes an inappropriate mathematical burden on the lay decision-maker. The result is that UK practice since 1998 has been to report the LR as a verbal expression of evidential weight ("the evidence is extremely strong support for the prosecution hypothesis") rather than to walk the jury through Bayes' theorem.

Indian courts have not yet directly confronted the prosecutor's fallacy or the defence attorney's fallacy in a landmark DNA case, though the issue has been flagged in the 1993 Supreme Court judgement in Goutam Kundu v. State of West Bengal (paternity context) and in the parliamentary debates on the DNA Technology Bill, where NALSAR University Law Review and the Vidhi Centre for Legal Policy both submitted comments on statistical literacy in the judiciary as a precondition for sound DNA admissibility frameworks.

Concept	Correct statement	Fallacious version	Jurisdiction where addressed
Prosecutor's fallacy	P(evidence | not the donor) = 1 in 10 billion	P(innocent | this evidence) = 1 in 10 billion	R v. Doheny (UK 1996); People v. Collins (US 1968 precursor); O.J. Simpson (US 1995)
Defence attorney's fallacy	The RMP is 1 in 10 million; given geographic and circumstantial priors, the expected number of matching unrelated individuals in the relevant suspect pool is near zero	There are 8 billion people in the world; 800 could match; DNA proves nothing	Addressed in SWGDAM 2010 guidelines; Balding textbook standard reference
Likelihood ratio	LR = P(evidence | Hp) / P(evidence | Hd) = 1 / RMP for single-source full profile	LR = probability of guilt	R v. Doheny (UK); ENFSI 2015 Guidelines; SWGDAM 2010

Key terms

Random match probability (RMP): The probability that a randomly selected unrelated individual from the relevant population would share the observed DNA profile by chance. Computed as the product of HWE-based genotype frequencies across all loci.
Hardy-Weinberg equilibrium (HWE): The prediction that genotype frequencies in a large, randomly mating population equal the products of allele frequencies. Underpins the computation of genotype frequencies from allele frequency databases.
Theta (FST) correction: A correction factor applied to HWE genotype frequency calculations to account for population substructure (non-random mating within subpopulations). Typically 0.01 for major population groups, 0.03 for structured or isolated populations.
Product rule: The multiplication of genotype frequencies across independent loci to compute the profile-level RMP. Justified by the statistical independence (absence of population-level linkage) of the CODIS and ESS loci.
Likelihood ratio (LR): The ratio of the probability of the evidence under the prosecution hypothesis to the probability under the defence hypothesis. For a full single-source profile: LR = 1 / RMP.
Prosecutor's fallacy: The incorrect equation of P(evidence | innocent) with P(innocent | evidence). Treats the RMP as if it were the posterior probability of innocence, ignoring the base rate (prior odds).
Defence attorney's fallacy: The incorrect argument that a large global or national population means many unrelated individuals could share the profile, while ignoring the geographic and circumstantial constraints that define the relevant suspect population.
Prior odds: The odds of the prosecution hypothesis relative to the defence hypothesis before the DNA evidence is considered. Combined with the LR by Bayes' theorem to yield the posterior odds.
SWGDAM: Scientific Working Group for DNA Analysis Methods. The US inter-agency body that publishes interpretation guidelines for forensic DNA analysis, including RMP and LR calculation standards.
ENFSI DNA Working Group: The European Network of Forensic Science Institutes DNA Working Group. Coordinates European forensic DNA standards, including the evaluative reporting guidelines adopted by most EU member-state laboratories.

Worked example

Calculating the RMP and Avoiding the Prosecutor's Fallacy: A Murder Trial Scenario

A crime-scene STR profile matches the defendant at all 20 CODIS loci. The forensic expert calculates the RMP and then faces cross-examination on the prosecutor's fallacy. How should the evidence be presented?

Scene: A Crown Court murder trial (England). A full 20-locus GlobalFiler profile from blood at the scene matches the defendant at every locus. The forensic expert for the prosecution has computed the random-match probability.

Step 1 (HWE product-rule calculation): For each of the 20 CODIS loci, the examiner retrieves allele frequencies from the validated UK Caucasian population database (FSS/NIST-derived). Under plain HWE, a heterozygous locus with alleles i and j has genotype frequency 2 × p(i) × p(j) and a homozygous locus has genotype frequency p(i)²; the examiner applies the theta-corrected forms of both. The product of the 20 per-locus genotype frequencies gives the combined RMP = 1 in 8.4 × 10^20.

Step 2 (Correct LR framing): The expert states in their report: "Under the proposition that the defendant is the donor of the blood, the probability of obtaining this DNA evidence is 1. Under the proposition that a random unrelated individual from the relevant population is the donor, the probability is 1 in 8.4 × 10^20. The evidence is overwhelmingly more consistent with the defendant being the donor than with an unrelated person being the donor."

Step 3 (Prosecutor's fallacy attempt): During examination-in-chief, prosecuting counsel asks: "Does this mean there is only a 1 in 8.4 × 10^20 chance the defendant is innocent?" The expert declines the framing and explains: "No. The probability I have given is the chance that a random unrelated person would share this profile. Whether the defendant is guilty requires the jury to weigh this probability alongside all other evidence in the case. That is a question for the jury, not for me. To equate the two would be what courts have called the prosecutor's fallacy."

Step 4 (R v. Doheny application): The judge directs the jury following R v. Doheny (1996): the expert has provided the match probability; the jury must assess its weight in the context of the whole case. The judge specifically directs them not to treat the match probability as a probability of guilt.

Conclusion: The RMP calculation uses HWE with theta correction across 20 independent loci; the LR interpretation follows ENFSI evaluative reporting guidelines; and the court presentation follows the R v. Doheny framework. All three elements are direct applications of the statistical and legal principles in this topic.

Frequently asked questions

When is the product rule valid for DNA statistics and how is the RMP calculated?

The product rule multiplies per-locus genotype frequencies across all typed loci to calculate the combined RMP. It is valid when loci are statistically independent, meaning alleles at one locus carry no information about alleles at another. Most of the 20 CODIS core loci are on different chromosomes, and those that share a chromosome are far enough apart to be treated as unlinked at the population level, which supports independence. Under HWE the frequency is 2 x p(i) x p(j) for a heterozygous genotype (alleles i and j) and p(i)^2 for a homozygous genotype; the Balding-Nichols theta correction is then applied. Population geneticists Lewontin and Hartl challenged the product rule in the 1990s; those concerns were largely addressed by larger databases and theta corrections.

What is the theta correction in DNA statistics and when should it be used?

Theta (FST) is the population differentiation parameter (Wright's fixation index), applied to forensic genotype frequencies by Balding and Nichols (1994). It corrects for the slight excess of homozygosity in real subpopulations relative to HWE predictions. The corrected homozygous frequency is [2theta + (1-theta) x p] x [3theta + (1-theta) x p] / [(1 + theta) x (1 + 2theta)]. Theta = 0.01 is used for major outbred populations in the US and UK; theta = 0.03 applies to more structured populations. SWGDAM and ENFSI both specify which theta values must be used. The correction generally makes the reported RMP larger (the profile is treated as more common), which is the conservative and therefore appropriate direction for evidence used against a defendant.

What is the difference between the prosecutor's fallacy and the defence attorney's fallacy?

The prosecutor's fallacy transposes the conditional: it equates P(evidence | innocent) with P(innocent | evidence), treating the RMP as if it were the posterior probability of innocence. R v. Doheny (1996) in England and Wales is the leading case prohibiting this framing. The defence attorney's fallacy ignores base rates: it argues that a large global population means many people could share the profile, while ignoring the geographic and circumstantial constraints that define the relevant suspect pool. Both errors appeared during the O.J. Simpson trial in 1995 and became standard subjects in post-1995 forensic statistics training worldwide.

How do Indian courts currently handle DNA match probability and the likelihood ratio?

Indian courts admit DNA expert opinion under BSA 2023 Section 39 but lack a landmark ruling equivalent to R v. Doheny establishing a framework for DNA statistics. In practice, FSL expert witnesses report RMPs; courts treat these as weight-of-evidence material. The absence of a specific judicial framework for the prosecutor's fallacy means expert report framing is critical. Parliamentary debates on the DNA Technology Bill cited NALSAR and Vidhi Centre analysis calling for statistical literacy training in the judiciary as a precondition for sound DNA admissibility.

Practice

At locus D8S1179, a crime-scene profile shows a heterozygous genotype 13, 15. The allele frequencies in the relevant population database are p(13) = 0.34 and p(15) = 0.11. Using simple Hardy-Weinberg without theta correction, the genotype frequency is:
