The Concept of Random Match Probability

Random match probability (RMP) is the chance that a randomly chosen, unrelated person would share the evidence characteristics observed at a crime scene. This topic explains how RMP is calculated, where it sits within a probabilistic framework for evaluating forensic evidence, and why it must never be treated as the probability that the defendant is innocent.

Random match probability (RMP) is the probability that a randomly selected, unrelated individual drawn from the relevant population would by chance possess the same evidence characteristics as those observed in the crime-scene sample. It is a population frequency estimate, not a statement about the defendant. A DNA profile shared by one person in a billion describes how rare that profile is; it does not, on its own, state the probability that the person in the dock left the sample. Confusing these two quantities is the single most consequential statistical error in forensic courtrooms, and understanding the distinction is the starting point for all rigorous probabilistic evidence evaluation.

RMP sits inside a broader probabilistic framework for evaluating forensic evidence. That framework begins with a pair of competing propositions: the prosecution hypothesis (the defendant is the source of the evidence) and the defence hypothesis (an unknown, unrelated person is the source). The likelihood ratio (LR) measures how much more probable the observed evidence is under the prosecution hypothesis than under the defence hypothesis. Under the simplest assumptions, the LR is simply 1 divided by the RMP, but the LR framework extends naturally to mixed profiles, close relatives, and sub-population effects in ways that the RMP alone cannot capture. Bayesian inference then combines the LR with the prior probability to produce a posterior probability, which is the quantity a fact-finder actually needs.

The history of RMP in courts is partly a record of misuse. Judges in England, the United States, Australia, and India have all encountered expert witnesses who stated, or implied, that an RMP of one in a million means there is only a one-in-a-million chance the defendant is innocent. That statement is wrong in a precise mathematical sense. Correcting it requires naming the error (the prosecutor's fallacy), understanding why it is an error, and knowing what a correctly framed probabilistic statement looks like. Those three tasks are the core of this topic.

By the end of this topic you will be able to:

  • Define random match probability and explain what population it refers to and what it does not say about the defendant.
  • Describe how RMP is estimated from allele frequency databases using the product rule and the theta correction for population sub-structure.
  • Explain the relationship between RMP and the likelihood ratio, and identify when the simple equivalence LR = 1/RMP breaks down.
  • Identify and correct the prosecutor's fallacy and the defence fallacy in written or oral forensic evidence.
  • Place RMP within a Bayesian framework and explain why the posterior probability of the prosecution hypothesis requires prior information that is not contained in the RMP alone.
Key terms
Random match probability (RMP)
The probability that a randomly selected, unrelated individual from the reference population shares the same evidence characteristics as the crime-scene sample. A population frequency estimate, not a statement about whether the defendant is the source.
Product rule
The rule that, under independence, the joint probability of a multi-locus genotype equals the product of the individual locus genotype frequencies. Used to compute RMP for multi-locus STR profiles. Independence across loci is supported for unlinked markers but must be verified for the specific population database in use.
Theta (FST) correction
A correction factor applied to allele frequency estimates to account for population sub-structure (the tendency for allele frequencies to differ between sub-populations). Using theta inflates the RMP slightly, making the statistic conservative in favour of the defendant. Typical values range from 0.01 to 0.03 for well-characterised populations.
Likelihood ratio (LR)
The ratio of the probability of the observed evidence under the prosecution hypothesis to the probability of the same evidence under the defence hypothesis. Under simple single-source assumptions, LR = 1/RMP. The LR is the correct measure of the evidential weight of a match and is the quantity used in evaluative reporting.
Prosecutor's fallacy
The error of treating the RMP (or the odds implied by it) as the probability that the defendant is innocent, or as the probability that the defendant did not leave the sample. Formally, it confuses P(evidence | defendant not the source) with P(defendant not the source | evidence). The two are approximately equal only in the special case where the prior probability is 0.5 (even prior odds) and the RMP is small.
Defence fallacy
The converse error of dismissing the evidence by arguing that, because the RMP implies that many people in a large population would share the profile, the match has no weight. This fails because the relevant population is not the whole world but the realistic pool of alternative suspects, which may be small. Both fallacies misrepresent what the RMP is measuring.

What RMP is and what it is not

Random match probability answers a specific question: if you picked one person at random from the relevant population, what is the probability that person would match the evidence? The question is not about the defendant. It is a statement about the rarity of the evidence characteristic in the population, nothing more.

The question RMP does not answer is: given that we observe a match, what is the probability the defendant is the source? That posterior probability depends on three things: the RMP, the size and composition of the realistic alternative suspect pool, and any other evidence in the case. The last two cannot be derived from the RMP. A forensic scientist who reports an RMP without acknowledging this distinction is giving the fact-finder an incomplete picture.

The population underlying the RMP also requires definition. A DNA STR profile RMP calculated from a South Asian reference database will differ from one calculated from a European database, sometimes by an order of magnitude, because allele frequencies differ across populations. The choice of reference population should reflect the realistic pool of potential contributors, not the most favourable or least favourable database for the prosecution. In practice, many laboratories report RMP from multiple reference databases and cite the most conservative (largest) estimate, that is, the one most favourable to the defendant, as the primary figure. See Population Databases for Forensic Statistics for database selection criteria.

Calculating RMP from allele frequencies

For a standard short tandem repeat (STR) DNA profile, RMP is computed using the product rule. At each locus, the genotype frequency is estimated from allele frequencies in a reference database. For a heterozygous genotype with alleles a and b at a given locus, the genotype frequency is 2 * p(a) * p(b). For a homozygous genotype with allele a, it is p(a)^2. These per-locus frequencies are then multiplied together across all typed loci to give the profile frequency.

The product rule assumes that genotypes at different loci are statistically independent (linkage equilibrium). The per-locus formulas above additionally assume Hardy-Weinberg equilibrium, meaning the two alleles within a locus are independent. For loci on different chromosomes (unlinked loci), the independence assumption is well supported. For closely linked loci on the same chromosome, linkage disequilibrium can cause the product rule to underestimate or overestimate the true profile frequency, and practitioners must use databases or methods that account for this.
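The product rule can be sketched in a few lines of Python. This is a minimal illustration, not a validated casework tool: the function names are hypothetical, and only the first pair of allele frequencies (0.22 and 0.14) comes from this topic. The other frequencies are invented to make the example run.

```python
from math import prod
from typing import Optional, Sequence, Tuple


def genotype_frequency(p_a: float, p_b: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency at a single locus.

    Heterozygote (two different alleles): 2 * p_a * p_b.
    Homozygote (p_b omitted):             p_a ** 2.
    """
    if p_b is None:
        return p_a ** 2
    return 2 * p_a * p_b


def profile_rmp(loci: Sequence[Tuple[float, ...]]) -> float:
    """Product rule: multiply the per-locus genotype frequencies."""
    return prod(genotype_frequency(*alleles) for alleles in loci)


# Hypothetical three-locus profile. Each tuple holds the allele
# frequencies at one locus: two values for a heterozygote, one for a
# homozygote. Only (0.22, 0.14) is taken from the text.
profile = [(0.22, 0.14), (0.10,), (0.31, 0.08)]

rmp = profile_rmp(profile)
print(f"RMP = {rmp:.3e} (about 1 in {1 / rmp:,.0f})")
```

A real profile has many more loci, which is why the final figure becomes so small: each additional locus multiplies the running product by a number well below 1.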

| Step | Locus example | Calculation | Result |
|---|---|---|---|
| Allele frequencies from database | D3S1358: allele 16 (p = 0.22), allele 18 (p = 0.14) | Heterozygous genotype: 2 × 0.22 × 0.14 | 0.0616 |
| Theta correction (θ = 0.01) | D3S1358 | 2 × [θ + (1−θ) × p(16)] × [θ + (1−θ) × p(18)] / [(1+θ)(1+2θ)] | ≈ 0.0657 |
| Repeat for each locus | All 15 loci in profile | Multiply per-locus frequencies | Product across loci |
| Final RMP | Complete 15-locus profile | Product of all per-locus frequencies | e.g. 1 in 10^18 |

The theta (FST) correction accounts for population sub-structure. Within any broadly defined population (say, South Asian or sub-Saharan African), there are sub-groups whose allele frequencies differ from the overall average. Using the raw average frequency underestimates the true genotype frequency for someone from a sub-group where that allele is more common. Applying a theta correction, with theta typically set at 0.01 to 0.03, makes the RMP larger (more conservative, more favourable to the defendant) by inflating the effective allele frequency. The National Research Council (1996) report and subsequent guidance from laboratories in the UK, US, and Australia all recommend applying theta.
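The effect of theta can be checked numerically. The sketch below assumes the subpopulation-corrected genotype formulas given in the NRC (1996) report (equations 4.10a and 4.10b), which include the normalising denominator (1+θ)(1+2θ). The function name is hypothetical, and the allele frequencies are the D3S1358 values quoted in the table above.

```python
from typing import Optional


def theta_genotype_frequency(
    p_a: float, p_b: Optional[float] = None, theta: float = 0.01
) -> float:
    """Genotype frequency with a theta (FST) correction.

    Assumed formulas (NRC 1996, equations 4.10a and 4.10b):
      homozygote:   [2t + (1-t)p][3t + (1-t)p] / [(1+t)(1+2t)]
      heterozygote: 2[t + (1-t)p_a][t + (1-t)p_b] / [(1+t)(1+2t)]
    With theta = 0 both reduce to the Hardy-Weinberg values.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:  # homozygous genotype
        return (
            (2 * theta + (1 - theta) * p_a)
            * (3 * theta + (1 - theta) * p_a)
            / denom
        )
    return (
        2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom
    )


# D3S1358 heterozygote 16/18 with p(16) = 0.22 and p(18) = 0.14.
uncorrected = theta_genotype_frequency(0.22, 0.14, theta=0.0)
corrected = theta_genotype_frequency(0.22, 0.14, theta=0.01)
print(f"uncorrected: {uncorrected:.4f}")
print(f"theta=0.01:  {corrected:.4f}")
```

With θ = 0.01 the per-locus frequency rises from 0.0616 to about 0.0657. The increase at one locus is modest, but it compounds across every locus in the profile, so the corrected RMP for a full profile can be noticeably larger than the uncorrected one.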

From RMP to likelihood ratio

The likelihood ratio is the preferred currency of forensic evidence evaluation in most contemporary laboratory reporting guidelines, including those issued by the Forensic Science Regulator in England and Wales, the Scientific Working Group for DNA Analysis Methods (SWGDAM) in the United States, and the European Network of Forensic Science Institutes (ENFSI). The LR expresses the evidential weight of a match as a ratio: how many times more probable is the evidence under the prosecution hypothesis than under the defence hypothesis?

Under the simplest scenario, a single-source profile where the crime-scene sample matches the defendant and there is no transfer or persistence issue, the prosecution hypothesis is that the defendant deposited the sample. The probability of observing the match under this hypothesis is 1 (if the defendant is the source, we expect a match). The defence hypothesis is that an unknown, unrelated person deposited the sample. The probability of observing the match under this hypothesis is the RMP. So LR = 1 / RMP. An RMP of 1 in a billion gives an LR of 1 billion: the match is a billion times more probable if the defendant is the source than if a random person is.
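Written out symbolically, the single-source argument looks like this. The notation is an assumption introduced here for illustration: E is the observed match, H_p the prosecution hypothesis, and H_d the defence hypothesis.

```latex
\[
\mathrm{LR}
  = \frac{P(E \mid H_p)}{P(E \mid H_d)}
  = \frac{1}{\mathrm{RMP}},
\qquad
\text{e.g. } \mathrm{RMP} = 10^{-9}
  \;\Longrightarrow\; \mathrm{LR} = 10^{9}.
\]
```

The numerator equals 1 only under the simplifying assumptions stated above. Once mixtures, relatives, or possible error enter the picture, neither term reduces to such a simple form.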

The LR framework also generalises naturally to evidence other than DNA. Fingerprint experts in several jurisdictions now report their conclusions as LRs rather than categorical match/no-match opinions. Toolmark, glass fragment, fibre, and shoeprint evidence can all be evaluated using the same LR structure, though the underlying probability models differ. The role of statistics in evidence evaluation across these disciplines is examined in Role of Statistics in Evidence Evaluation.

Bayesian reasoning and posterior probability

Bayes' theorem relates prior probability to posterior probability through the likelihood ratio. In the context of a forensic match: the posterior odds that the defendant is the source, given the evidence, equal the prior odds multiplied by the LR. Symbolically: posterior odds = prior odds × LR. The prior odds represent everything the fact-finder believed about the defendant's culpability before hearing the forensic evidence. The LR then updates that belief.

This framework makes explicit why RMP alone cannot establish that the defendant is the source, let alone guilt or innocence. Suppose the LR is 10 million (corresponding to an RMP of 1 in 10 million). If the prior odds were 1 to 1,000 (say, the defendant was one of roughly 1,000 plausible suspects with no other distinguishing evidence), the posterior odds are 10 million / 1,000 = 10,000 to 1 in favour of the defendant being the source. If the prior odds were 1 to 1 (other evidence already made the defendant as likely as not to be the source), the posterior odds become 10 million to 1. The RMP is the same in both cases; the posterior odds differ by a factor of 1,000.
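The two scenarios above can be reproduced with a short calculation. This is an illustrative sketch of the odds form of Bayes' theorem; the function names are hypothetical.

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' theorem: posterior odds = prior odds x LR."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of a proposition into a probability."""
    return odds / (1 + odds)


lr = 10_000_000  # corresponds to an RMP of 1 in 10 million

scenarios = {
    "prior odds 1 to 1,000": 1 / 1000,
    "prior odds 1 to 1": 1.0,
}

for label, prior in scenarios.items():
    post = posterior_odds(prior, lr)
    print(
        f"{label}: posterior odds about {post:,.0f} to 1, "
        f"probability {odds_to_probability(post):.7f}"
    )
```

The same LR yields different posterior odds because the prior differs, which is why the forensic scientist reports the LR and leaves the prior to the fact-finder.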

Courts in most jurisdictions do not ask forensic scientists to state the prior probability, and scientists should not volunteer one: the prior is the province of the jury or judge, who must assess it from all the other evidence. The forensic scientist's job is to provide a correctly stated LR and explain what it means. The Bayesian framework is the logical structure within which the jury then operates, even if the jury is not told to think in terms of Bayes' theorem. The Court of Appeal of England and Wales addressed this in R v Adams [1996] and subsequent cases, cautioning against presenting Bayes' theorem to juries as a formal calculation and reinforcing that experts should not usurp the fact-finding function by combining the LR with a prior to give a final probability.

The prosecutor's fallacy and the defence fallacy

The prosecutor's fallacy occurs when the conditional probability P(evidence | defendant not the source) is presented or understood as P(defendant not the source | evidence). In practice, an expert witness says something like: 'The chance of this DNA profile belonging to someone other than the defendant is one in a billion.' This implies the chance the defendant is innocent is one in a billion, which is not what the RMP says. The RMP says the profile is rare in the population; it does not account for the size of the suspect pool, the strength of other evidence, or the possibility of transfer.

The fallacy has been identified in courts across jurisdictions. In the UK, the case of R v Deen (1993) resulted in a conviction being quashed after the prosecution expert incorrectly stated that the DNA evidence meant there was only a one-in-three-million chance the defendant was not the rapist. In the US, the NRC II report (1996) explicitly addressed the fallacy and called for clear, standardised language in expert testimony. In Australia, the Frequentist Interpretation Committee of the International Association for Forensic Sciences has produced guidance on correct terminology.

The defence fallacy runs in the opposite direction. Defence counsel argues: the RMP is 1 in 10 million, but there are 1.4 billion people in India (or 330 million in the United States), so about 140 people in India (or about 33 in the United States) would be expected to match, meaning the evidence proves nothing. This argument misrepresents the relevant population. The alternative suspect pool is not the entire country; it is the set of people who could realistically have deposited the sample at that time and place. In most cases that pool is very much smaller, and restricting the calculation to it typically preserves substantial evidential weight.
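The arithmetic behind the defence argument, and the effect of narrowing the pool, can be sketched as follows. The model is a deliberate simplification and an assumption of this example: every person in the pool, including the defendant, is treated as unrelated and equally likely a priori to be the source. The function names and the pool of 10,000 are hypothetical.

```python
def expected_matches(population: float, rmp: float) -> float:
    """Expected number of matching people among `population` unrelated people."""
    return population * rmp


def source_probability(pool_size: float, rmp: float) -> float:
    """P(defendant is the source | match) under a uniform prior over the pool.

    Prior odds are 1 to (pool_size - 1) and LR = 1 / RMP, so the
    posterior odds are 1 / ((pool_size - 1) * rmp).
    """
    return 1 / (1 + (pool_size - 1) * rmp)


rmp = 1e-7  # 1 in 10 million

print(f"Expected matches, India (1.4 billion): {expected_matches(1.4e9, rmp):.0f}")
print(f"Expected matches, US (330 million):    {expected_matches(330e6, rmp):.0f}")

# Whole country as the pool versus a hypothetical realistic pool.
print(f"P(source), pool = 1.4 billion: {source_probability(1.4e9, rmp):.4f}")
print(f"P(source), pool = 10,000:      {source_probability(10_000, rmp):.4f}")
```

Treating the whole of India as the pool gives a posterior probability below 1%, whereas a pool of 10,000 gives roughly 99.9%. The evidence is identical in both cases; only the assumed pool of alternative sources has changed.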

| Fallacy | What is claimed | Why it is wrong | Correct framing |
|---|---|---|---|
| Prosecutor's fallacy | RMP = probability the defendant is innocent | Confuses P(E given not source) with P(not source given E); ignores the prior probability | RMP measures how rare the profile is, not the posterior probability; the jury weighs it against the other evidence |
| Defence fallacy | Many people match, so the evidence is worthless | Uses the whole population, not the realistic suspect pool; ignores that any unrelated person who matches must also have had access and opportunity | The LR quantifies evidential weight given a realistic alternative; even a small LR may still be material |
| Cold-hit problem | The RMP calculated before the search describes the weight of a database hit | Searching a million profiles creates many opportunities for a coincidental match; Bayesian updating is required | Adjust the prior to reflect the size of the database searched; report both pre- and post-search probabilities |

RMP in evaluative reporting and court

Evaluative reporting guidelines from several jurisdictions now require forensic scientists to express their conclusions as likelihood ratios on a named verbal scale rather than as bare RMP figures. The example verbal scale in the ENFSI (European Network of Forensic Science Institutes) guideline runs from 'weak' or 'limited support' (LR up to 10) through 'moderate', 'moderately strong', 'strong', 'very strong', to 'extremely strong support' (LR greater than 1 million). The Forensic Science Regulator in England and Wales endorses the ENFSI approach. The Association of Forensic Science Providers (AFSP) in the UK has issued standards requiring that expert reports state the propositions clearly, report the LR, translate it into a verbal equivalent, and not state a conclusion about guilt or innocence.

In India, forensic DNA evidence is admissible under the Bharatiya Sakshya Adhiniyam 2023 (which replaced the Indian Evidence Act 1872), with expert opinion governed by Section 39, the successor to Section 45 of the 1872 Act. The statute does not mandate a specific reporting format for probabilistic evidence, but case law from the Supreme Court of India has emphasised that DNA evidence, while powerful, must be evaluated alongside other evidence and cannot substitute for proof of identity beyond reasonable doubt on its own. Courts in the United States similarly apply the Daubert standard (or the older Frye standard in some states) to evaluate whether the probabilistic methods used to calculate RMP are sufficiently reliable and generally accepted. The Federal Rules of Evidence govern admissibility in federal courts, which have repeatedly examined the adequacy of population databases and the product rule.

The cold-hit problem is a specific application of RMP that arises when a crime-scene profile is searched against a large database and a match is found. If a database of one million profiles is searched and the RMP is 1 in 100,000, the expected number of coincidental matches in the database is 10. The post-search probability that any specific hit is a genuine match is substantially lower than the pre-search RMP implies. Correct analysis requires adjusting the prior probability to reflect the database search, an adjustment that has been accepted in courts in England, the United States, and Australia but is not yet uniformly applied. See Numbers in Forensic Conclusions for how probabilistic numbers are communicated in final reports.
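The database figures above can be verified directly. This sketch assumes that every profile in the database belongs to an unrelated person who is not the true source, and that matches occur independently; the function names are hypothetical.

```python
import math


def expected_coincidental_hits(database_size: int, rmp: float) -> float:
    """Expected number of coincidental matches in a database of non-sources."""
    return database_size * rmp


def prob_at_least_one_coincidental_hit(database_size: int, rmp: float) -> float:
    """1 - (1 - rmp) ** database_size, computed in a numerically stable way."""
    return -math.expm1(database_size * math.log1p(-rmp))


n = 1_000_000  # profiles searched
rmp = 1e-5     # 1 in 100,000

print(f"Expected coincidental hits: {expected_coincidental_hits(n, rmp):.1f}")
print(
    "P(at least one coincidental hit): "
    f"{prob_at_least_one_coincidental_hit(n, rmp):.5f}"
)
```

With these numbers a coincidental hit is almost certain even if the true source is not in the database, which is why a cold hit on its own needs to be weighed differently from a match to a suspect identified by other means.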

Check your understanding

A forensic biologist reports a DNA RMP of 1 in 50 million. A prosecutor then tells the jury that there is only a 1-in-50-million chance the defendant is innocent. What error has the prosecutor made?

Key Takeaways

  • Random match probability is a population frequency estimate: it states how rare the observed evidence characteristic is in the reference population, not the probability the defendant is the source.
  • RMP for STR profiles is calculated using the product rule with a theta correction for population sub-structure; the reference population must be chosen to reflect the realistic pool of alternative contributors.
  • The likelihood ratio (LR = 1/RMP under simple assumptions) is the correct measure of evidential weight; it answers how many times more probable the evidence is under the prosecution hypothesis than the defence hypothesis.
  • The prosecutor's fallacy treats RMP as the probability of innocence; the defence fallacy dismisses the evidence by using the whole population rather than the realistic suspect pool; both misrepresent what the RMP measures.
  • Bayesian reasoning shows that converting an LR into a posterior probability of guilt requires a prior probability, which is the province of the fact-finder, not the forensic scientist; expert reports should state the LR and its verbal equivalent, not a conclusion about guilt.
What is random match probability?
Random match probability (RMP) is the probability that a randomly selected, unrelated individual from the relevant population would by chance share the same evidence characteristics as those observed in the crime-scene sample. A very small RMP means the matching profile is rare in the population, but it does not by itself prove the defendant is the source.
How is random match probability calculated for DNA profiles?
For a multi-locus STR DNA profile, RMP is calculated by applying the product rule: the frequency of each allele at each locus is estimated from a population database, the genotype frequency at each locus is computed from those allele frequencies (2 × p(a) × p(b) for a heterozygote, p(a)^2 for a homozygote, with a correction for population sub-structure, typically theta or FST), and the per-locus genotype frequencies are then multiplied across all loci. The result is the estimated frequency of the complete profile in the reference population.
What is the prosecutor's fallacy?
The prosecutor's fallacy is the error of treating the random match probability as the probability that the defendant is innocent, or equivalently, as the probability that the match is coincidental given all the evidence in the case. RMP is a population frequency, not a posterior probability. Correctly interpreting RMP requires placing it within a Bayesian framework that accounts for the prior probability that the defendant was the source and for all other evidence.
What is the difference between RMP and a likelihood ratio?
RMP is a single probability: the chance a random person matches the evidence. A likelihood ratio (LR) expresses how much more probable the evidence is under one hypothesis (the defendant is the source) than under a competing hypothesis (an unknown person is the source). The LR directly addresses the evidential weight, whereas RMP alone does not compare alternative explanations. Under simple assumptions, LR = 1/RMP, but the LR framework handles mixed profiles, close relatives, and sub-population effects more rigorously.
Does a very small random match probability prove guilt?
No. A very small RMP means the profile is rare in the population, which is strong evidence that the defendant is the source if the profile was reliably recovered and there is no plausible innocent explanation for its presence. But rarity in the population is only one factor. Courts in many jurisdictions, including England and Wales, the United States, and India under the Bharatiya Sakshya Adhiniyam 2023, require the trier of fact to weigh all evidence together, not to treat a single statistic as conclusive.
