Computing Likelihood Ratios: Worked Examples

Q: What does a likelihood ratio of 1000 mean in practice?

An LR of 1000 means the observed evidence is 1000 times more probable if the prosecution proposition is true than if the defence proposition is true. It does not mean the defendant is 1000 times more likely to be guilty; that posterior probability depends on the prior odds, which are set by the court, not the forensic scientist. The LR is an input to Bayes' theorem, not a verdict.

Q: Why can an LR be less than 1, and what does that mean?

An LR below 1 means the observed evidence is more probable under the defence proposition than under the prosecution proposition. For example, an LR of 0.01 means the findings are 100 times more likely if the defendant is not the source. Such an LR supports the defence, and a scientist who calculates it must report it honestly even though the direction is unexpected. This is a core requirement of evaluative reporting.

Q: How is the denominator of a DNA LR estimated?

The denominator is the probability of observing the crime-scene DNA profile if the true contributor is an unrelated, randomly chosen person from the relevant population. It is estimated using allele frequencies from a reference database, typically assembled from population surveys. For a multi-locus profile, the frequencies at each locus are combined using the product rule, with corrections for population substructure where appropriate.

Q: What is a kernel density estimate and why is it used in glass LR calculations?

A kernel density estimate (KDE) is a smooth, continuous probability density function fitted to a set of discrete observations. In glass refractive index calculations, the numerator distribution is built from replicate measurements of the crime-scene glass, and the denominator distribution is built from a reference database of glass samples. KDEs are preferred over histograms because they avoid arbitrary binning and produce stable density values even in the tails of the distribution.

Q: Can LRs from different forensic disciplines be multiplied together?

Yes, if the evidence types are conditionally independent given the propositions, their LRs can be multiplied to give a combined LR. For example, if glass and fibre evidence are treated as independent, the product of their individual LRs gives the LR for the combination. In practice, full independence is rarely guaranteed, so combining LRs requires careful evaluation of the dependence structure of the evidence.

The likelihood ratio (LR) is the principal tool for expressing the strength of forensic evidence: it compares the probability of the observed findings under the prosecution proposition to their probability under the defence proposition. This topic walks through LR calculations for DNA profiles, glass refractive index measurements, and speaker comparison, showing how numerator and denominator probabilities are estimated and how LR values above and below 1 direct the inference.

Last updated: 24 Jun 2026

The likelihood ratio (LR) quantifies how much a piece of forensic evidence shifts the probability between two competing propositions. It is defined as the probability of the observed evidence given the prosecution proposition (Hp) divided by the probability of the same evidence given the defence proposition (Hd). When the LR exceeds 1, the evidence is more probable under Hp and therefore supports the prosecution case; when it falls below 1, the evidence supports the defence. The LR is not a verdict: it is an input that courts combine with prior probabilities using Bayes' theorem to reach posterior probabilities of guilt or innocence.

Forensic disciplines each have their own methods for estimating the numerator and denominator of an LR. In DNA profiling, allele frequency databases and the product rule provide the denominator. In glass comparisons, kernel density estimates fitted to refractive index measurements supply both components. In speaker comparison, automatic systems output log-LR scores calibrated against a background population of voices. The mathematics is the same across disciplines; what differs is the data source and the model used to convert measurements into probabilities.

Courts in the United Kingdom, the Netherlands, Australia, and New Zealand have formally accepted LR-based evaluative reporting as the standard framework for communicating the strength of forensic evidence. The framework has been endorsed by the European Network of Forensic Science Institutes (ENFSI) and the UK Forensic Science Regulator. In the United States, adoption has been slower, but the PCAST 2016 report on forensic science and the National Commission on Forensic Science have both recommended a move toward probabilistic, LR-based expressions of evidential weight.

By the end of this topic you will be able to:

State the definition of the likelihood ratio and identify the proposition assigned to the numerator and the proposition assigned to the denominator.
Calculate a DNA profile LR using allele frequencies and the product rule, and explain what the random match probability represents.
Describe how kernel density estimates are used to compute the numerator and denominator probabilities in a glass refractive index LR.
Explain how automatic speaker comparison systems produce log-LR scores and how those scores are calibrated against a background population.
Interpret LR values above 1, below 1, and equal to 1 in terms of evidential direction and strength, and identify the correct role of the LR in a Bayesian inference chain.

Key terms

Likelihood ratio (LR): The ratio of P(evidence | Hp) to P(evidence | Hd). It measures how much more probable the observed evidence is under the prosecution proposition than under the defence proposition. LR > 1 supports Hp; LR < 1 supports Hd; LR = 1 means the evidence does not discriminate between the propositions.
Prosecution proposition (Hp): The proposition advanced by the prosecution, typically asserting that the defendant is the source of the questioned material (e.g., 'the crime-scene DNA came from the defendant'). It occupies the numerator of the LR.
Defence proposition (Hd): The alternative proposition, typically asserting that someone else is the source (e.g., 'the crime-scene DNA came from an unknown, unrelated person'). It occupies the denominator of the LR.
Random match probability (RMP): The probability that a randomly chosen, unrelated individual from the relevant population would share the crime-scene DNA profile. Under source-level propositions the RMP is the denominator of the LR; when the numerator under Hp is 1 (i.e., the defendant being the true contributor would certainly produce the observed profile), the LR equals 1/RMP.
Kernel density estimate (KDE): A non-parametric smooth probability density function fitted to a set of measurements by placing a kernel (usually Gaussian) over each data point and summing them. Used in glass and other trace evidence LR models to convert a set of reference measurements into a continuous density that can be evaluated at any measurement value.
Log-LR (log likelihood ratio): The natural or base-10 logarithm of the LR. Log-LRs are additive for independent evidence types, making them convenient for combining across disciplines. Positive log-LR supports Hp; negative supports Hd. Speaker comparison systems typically output log-LR scores directly.

The LR framework: structure and logic

Every LR calculation begins by fixing two propositions at the same level of the hierarchy. Source-level propositions address who or what left the trace ('the defendant is the source' versus 'an unknown person is the source'). Activity-level propositions address what happened ('the defendant handled the object' versus 'the defendant never touched it'). Mixing levels in a single LR is an error: the numerator and denominator must address the same question.

The general LR formula is: LR = P(E | Hp) / P(E | Hd), where E is the complete set of observations. In simple cases, P(E | Hp) = 1 because if the defendant truly is the source, we would certainly observe the match. The LR then reduces to 1 / P(E | Hd), so estimating the denominator is the core calculation. In more complex cases, such as mixtures or partial profiles, the numerator may itself be a probability less than 1.
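The role of the LR as an input to Bayes' theorem, rather than a verdict, can be illustrated with a minimal sketch. The function name and the two prior probabilities below are hypothetical, chosen only to show that the same LR leads to very different posterior probabilities depending on the prior.

```python
def posterior_probability(prior_probability: float, lr: float) -> float:
    """Update a prior probability of Hp with an LR using the odds form of Bayes' theorem."""
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must be strictly between 0 and 1")
    if lr <= 0.0:
        raise ValueError("lr must be positive")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * lr  # posterior odds = prior odds x LR
    return posterior_odds / (1.0 + posterior_odds)


# The same LR of 1,000 combined with two hypothetical priors.
for prior in (0.5, 0.0001):
    print(f"prior = {prior}: posterior = {posterior_probability(prior, 1000.0):.4f}")
```

With a prior of 0.5 the posterior is close to 1, whereas with a prior of 0.0001 it stays below 0.1, which is why an LR of 1,000 cannot be read as a probability of guilt.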

The verbal scale for LR values translates numerical outputs into courtroom language. ENFSI and UK Forensic Science Regulator guidance use similar bands: an LR from 1 to 10 is 'weak support', 10 to 100 is 'moderate support', 100 to 1000 is 'moderately strong support', 1000 to 10,000 is 'strong support', and above 10,000 is 'very strong support' for Hp. Values below 1 mirror the scale in favour of Hd: an LR of 0.1 (a factor of 10 in favour of Hd) is 'weak support for Hd'. The verbal scale is guidance, not a legal threshold.
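The bands described above can be sketched as a simple lookup. The band limits are those stated in the text; the function name and the choice to treat each upper limit as inclusive (so that an LR of exactly 100 is 'moderate') are assumptions made for illustration, not part of any published guideline.

```python
import math

# (upper limit of the band, verbal label), using the bands stated in the text.
BANDS = [
    (10.0, "weak support"),
    (100.0, "moderate support"),
    (1000.0, "moderately strong support"),
    (10000.0, "strong support"),
    (math.inf, "very strong support"),
]


def verbal_scale(lr: float) -> str:
    """Translate an LR into a verbal expression of direction and strength."""
    if lr <= 0.0:
        raise ValueError("lr must be positive")
    if lr == 1.0:
        return "no support for either proposition"
    favoured = "Hp" if lr > 1.0 else "Hd"
    magnitude = lr if lr > 1.0 else 1.0 / lr  # invert LRs below 1
    for upper, label in BANDS:
        if magnitude <= upper:  # assumption: upper limit is inclusive
            return f"{label} for {favoured}"
    raise ValueError("lr is not a finite number")


for value in (28.0, 0.5, 4.7e17):
    print(value, "->", verbal_scale(value))
```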

DNA profile LR: the product rule and allele frequencies

A standard DNA profile consists of genotype observations at multiple short tandem repeat (STR) loci. At each locus, the analyst observes two alleles (or one allele if the person is homozygous). Under Hp (defendant is the source), the probability of observing a matching profile is 1. Under Hd (an unknown, unrelated person is the source), the denominator is the probability that a randomly chosen person from the relevant population has the same genotype at every locus.

For a heterozygous genotype with alleles a and b at a single locus, the genotype frequency under the Hardy-Weinberg equilibrium assumption is 2 x p(a) x p(b), where p(a) and p(b) are the allele frequencies in the reference population database. For a homozygous genotype with allele a, the frequency is p(a)^2, sometimes corrected upward using the theta (substructure) correction: p(a)^2 + p(a)(1 - p(a)) x theta. The product rule then multiplies frequencies across loci to give the random match probability (RMP). The LR = 1 / RMP.

Locus	Defendant genotype	Allele frequencies	Genotype frequency
D3S1358	15, 17	p(15)=0.22, p(17)=0.19	2 x 0.22 x 0.19 = 0.0836
vWA	16, 18	p(16)=0.26, p(18)=0.14	2 x 0.26 x 0.14 = 0.0728
FGA	22, 24	p(22)=0.13, p(24)=0.09	2 x 0.13 x 0.09 = 0.0234

Using the three loci in the table as a simplified example: RMP = 0.0836 x 0.0728 x 0.0234 = approximately 1.42 x 10^-4, or about 1 in 7,000. The LR = 1 / 0.000142 = approximately 7,000. Modern 15-to-20-locus profiles produce RMPs typically in the range of 1 in 10^15 to 10^20, giving LRs that are astronomically large. The random match probability concept topic covers the interpretation and misinterpretation of these figures in detail.
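The three-locus calculation above can be reproduced with a short sketch. The allele frequencies are the ones in the table; the function names are hypothetical, and the theta parameter is included only to show where the homozygote substructure correction described earlier would enter (it is not used for these heterozygous genotypes).

```python
from math import prod


def heterozygote_frequency(p_a: float, p_b: float) -> float:
    """Hardy-Weinberg genotype frequency for two different alleles: 2 x p(a) x p(b)."""
    return 2.0 * p_a * p_b


def homozygote_frequency(p_a: float, theta: float = 0.0) -> float:
    """Homozygote frequency with the substructure correction: p^2 + p(1 - p) x theta."""
    return p_a ** 2 + p_a * (1.0 - p_a) * theta


# Allele frequencies from the table (all three genotypes are heterozygous).
allele_frequencies = {
    "D3S1358": (0.22, 0.19),
    "vWA": (0.26, 0.14),
    "FGA": (0.13, 0.09),
}

genotype_frequencies = {
    locus: heterozygote_frequency(p_a, p_b)
    for locus, (p_a, p_b) in allele_frequencies.items()
}

rmp = prod(genotype_frequencies.values())  # product rule across loci
lr = 1.0 / rmp                             # valid when P(E | Hp) = 1

print(f"RMP = {rmp:.3e}")
print(f"LR  = {lr:,.0f}")
```

The result agrees with the figures in the text: an RMP of roughly 1.42 x 10^-4 and an LR of roughly 7,000.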

Glass refractive index LR: kernel density estimation

Glass fragments are a common form of trace evidence: a broken window leaves shards that may transfer to a suspect's clothing. The refractive index (RI) of glass, typically measured by the oil-immersion temperature-variation method on an instrument such as GRIM3, is a physical property that varies between glass sources. The LR question is: do the questioned fragments recovered from the suspect and the control glass from the crime-scene window share a common source?

Under Hp, the questioned fragments came from the crime-scene window, so their RI measurements should be consistent with the RI of the control glass from that window. Under Hd, the fragments came from another, unknown glass source, so their RI distribution should reflect the general population of glass sources in the environment. The LR is therefore: LR = P(RI measurements | same source) / P(RI measurements | different source).

The numerator is computed from a kernel density estimate (KDE) fitted to replicate RI measurements of the control glass. Each replicate measurement from the questioned sample is evaluated at the corresponding point on that KDE density curve, giving a probability density value. The denominator is computed from a KDE fitted to a reference database of RI measurements from many different glass sources, representing what an analyst would expect to observe if the fragments came from a random glass pane. Multiplying (or summing in log space) across the questioned fragments gives the combined LR.
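A minimal sketch of the ratio of two kernel density values follows. It uses a hand-written Gaussian KDE from the standard library rather than any forensic software package, and every number in it (the RI values, the tiny ten-entry 'database', and both bandwidths) is hypothetical and chosen for illustration only. Casework tools select bandwidths from the data and work with far larger reference sets.

```python
import math


def gaussian_kde_density(x: float, data: list[float], bandwidth: float) -> float:
    """Average of Gaussian kernels centred on each data point, evaluated at x."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(norm * math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
    return total / len(data)


# Hypothetical replicate RI measurements of the control glass.
control_ri = [1.51832, 1.51834, 1.51835, 1.51833, 1.51836]

# Hypothetical stand-in for a reference database of different glass sources.
database_ri = [1.5150, 1.5162, 1.5171, 1.5180, 1.5183,
               1.5185, 1.5192, 1.5200, 1.5213, 1.5225]


def glass_lr(questioned_ri: float) -> float:
    """LR = density under the control-glass KDE / density under the database KDE."""
    numerator = gaussian_kde_density(questioned_ri, control_ri, bandwidth=0.00003)
    denominator = gaussian_kde_density(questioned_ri, database_ri, bandwidth=0.0008)
    return numerator / denominator


print(f"LR for a fragment close to the control glass: {glass_lr(1.51835):.1f}")
print(f"LR for a fragment far from the control glass: {glass_lr(1.52250):.3g}")
```

A fragment whose RI sits inside the control distribution gives an LR above 1, while one far from it gives an LR below 1, since the numerator density collapses while the denominator does not.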

The Curran, Triggs, Buckleton, and Weir approach (published in 2000 and subsequently refined) formalised this KDE method and it is now implemented in software tools including the R package 'forensic'. The UK Home Office Forensic Science Service database, later maintained by the Forensic Science Regulator, contains tens of thousands of float glass RI measurements and provides the denominator distribution for cases in England and Wales. Equivalent databases exist in the Netherlands (Netherlands Forensic Institute) and Australia (AFP). LRs for glass RI comparisons in casework typically range from low values (a few tens) when the RI of the questioned glass is common in the reference population, to very high values (millions) when it falls in a thin tail of the distribution.

Speaker comparison LR: automatic systems and calibration

Forensic speaker comparison addresses the question of whether a known speaker (the suspect, whose voice is recorded in a police interview) is the same person as an unknown speaker (captured on a crime recording, a surveillance tape, or an intercepted call). The propositions are typically: Hp = 'the suspect spoke the questioned recording'; Hd = 'a different person, drawn from the relevant population of speakers, spoke the questioned recording'.

Automatic speaker recognition (ASR) systems extract feature vectors from the recordings, typically mel-frequency cepstral coefficients (MFCCs) representing the spectral envelope of speech, and compare the suspect and questioned recordings using a statistical model. The most widely used current architecture is the probabilistic linear discriminant analysis (PLDA) model applied to speaker embeddings (i-vectors or x-vectors). The system outputs a score: a high score indicates acoustic similarity, a low score indicates dissimilarity.

That raw score is not yet an LR. It must be calibrated: the score is mapped to a log-LR by evaluating it against a background population of same-speaker and different-speaker score distributions derived from a reference corpus. If the score falls where same-speaker pairs typically cluster, the log-LR is positive and large; if it falls where different-speaker pairs cluster, the log-LR is negative. Calibration is performed using logistic regression or the Pool Adjacent Violators (PAV) algorithm. The ENFSI Speaker Identification Working Group has published guidelines specifying validation requirements: a calibrated system must demonstrate adequate performance on a representative validation set before its outputs are used in court.
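The idea of mapping a raw score to a log-LR can be sketched with a deliberately simpler technique than the logistic regression or PAV calibration named above: fitting one Gaussian to same-speaker scores and one to different-speaker scores, then taking the ratio of the two densities at the observed score. This is a substitute chosen for brevity, not the method of any particular system, and the score lists and function names are hypothetical.

```python
import math
from statistics import mean, stdev


def log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Natural log of the Gaussian density at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def fit_score_model(same_scores: list[float], diff_scores: list[float]):
    """Fit a Gaussian to each set of background scores (assumed model)."""
    return (mean(same_scores), stdev(same_scores)), (mean(diff_scores), stdev(diff_scores))


def score_to_log10_lr(score: float, model) -> float:
    """log10 LR = log10 of (same-speaker density / different-speaker density)."""
    (mu_s, sd_s), (mu_d, sd_d) = model
    ln_lr = log_normal_pdf(score, mu_s, sd_s) - log_normal_pdf(score, mu_d, sd_d)
    return ln_lr / math.log(10.0)


# Hypothetical raw scores from a background corpus with known ground truth.
same_speaker_scores = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5]
different_speaker_scores = [-1.9, -0.7, -2.4, 0.3, -1.2, -1.6]

model = fit_score_model(same_speaker_scores, different_speaker_scores)
for raw_score in (3.0, 1.0, -1.5):
    print(f"score {raw_score:+.1f} -> log10 LR {score_to_log10_lr(raw_score, model):+.2f}")
```

Scores that fall among the same-speaker cluster give a positive log-LR and scores among the different-speaker cluster give a negative one. With only a handful of background scores the tails are poorly estimated, which is one reason validated systems rely on large reference corpora.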

LR values above and below 1: direction and magnitude

An LR of exactly 1 means the evidence is equally probable under both propositions and provides no discriminating information. This is a neutral result, not an absence of evidence. An LR of 1,000 means the evidence is 1,000 times more probable under Hp than Hd. An LR of 0.001 means the evidence is 1,000 times more probable under Hd than Hp: the finding actively supports the defence.

LR value	Log10(LR)	Direction	Verbal scale (ENFSI)
10,000	+4	Supports Hp	Strong support for Hp
100	+2	Supports Hp	Moderate support for Hp
2	+0.3	Supports Hp	Weak support for Hp
1	0	Neutral	Does not support either proposition
0.5	-0.3	Supports Hd	Weak support for Hd
0.01	-2	Supports Hd	Moderate support for Hd

A common error is to report only LRs above 1 and treat LRs below 1 as non-results. This is a form of selective reporting that distorts the picture for the court. ENFSI guidelines, the UK Forensic Science Regulator's Codes of Practice and Conduct, and the ILAC G19:08/2014 guidelines on forensic science laboratories all require that LR calculations be reported regardless of their direction. A scientist who discovers the evidence supports the defence has an obligation to report that finding.

The precision of an LR estimate also matters. Point estimates of LR are rarely exact: they depend on database choices, model assumptions, and the variability of the measurements. Some laboratories report confidence intervals or sensitivity analyses alongside the central LR value. The Bayesian credible interval approach and the bootstrapped confidence interval are both used in practice. A reported LR of 1,000 with a 95% confidence interval of 200 to 5,000 communicates a different level of certainty than a point estimate without bounds.

Validation, error rates, and LR reliability

An LR model must be validated before it is used in casework. Validation tests the model on a known dataset: for DNA, this means computing LRs for profiles where ground truth is known (same-source and different-source pairs) and checking that the model assigns high LRs to same-source pairs and low LRs to different-source pairs. The Tippett plot, which graphs the cumulative distribution of LRs for same-source and different-source pairs on the same axes, is the standard validation visualisation. A model with good discriminating power shows the two curves separating cleanly, with few same-source LRs below 1 and few different-source LRs above 1.
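The data behind a Tippett plot can be sketched as follows. The LR values are hypothetical validation results invented for illustration, and the convention used here (proportion of LRs at or above each log10 threshold) is one common choice; some authors plot the complementary proportion instead.

```python
import math


def tippett_curve(lrs: list[float], log10_thresholds: list[float]) -> list[float]:
    """Proportion of LRs whose log10 value is at or above each threshold."""
    logs = [math.log10(lr) for lr in lrs]
    return [sum(1 for v in logs if v >= t) / len(logs) for t in log10_thresholds]


# Hypothetical LRs from pairs with known ground truth.
same_source_lrs = [350.0, 1200.0, 45.0, 0.6, 9000.0, 80.0]
different_source_lrs = [0.002, 0.05, 0.4, 3.0, 0.01, 0.0008]
thresholds = [-3, -2, -1, 0, 1, 2, 3]

same_curve = tippett_curve(same_source_lrs, thresholds)
diff_curve = tippett_curve(different_source_lrs, thresholds)

# Rates of misleading evidence: LRs pointing the wrong way.
same_misleading = sum(1 for lr in same_source_lrs if lr < 1.0) / len(same_source_lrs)
diff_misleading = sum(1 for lr in different_source_lrs if lr > 1.0) / len(different_source_lrs)

for t, s, d in zip(thresholds, same_curve, diff_curve):
    print(f"log10 LR >= {t:+d}: same-source {s:.2f}, different-source {d:.2f}")
print(f"misleading: same-source {same_misleading:.2f}, different-source {diff_misleading:.2f}")
```

Plotting the two curves against the thresholds gives the Tippett plot; the gap between them at log10 LR = 0 reflects how rarely the system produces evidence that points the wrong way.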

Empirical cross-entropy (ECE) is a standard metric for evaluating LR system performance. It measures how much information is lost (relative to a perfect system) when the model's log-LR outputs are used as Bayesian evidence. Lower ECE is better. The Cllr (log likelihood ratio cost), widely used in the speaker comparison literature, is closely related: it corresponds to the ECE evaluated at equal prior odds. Both measure the combination of discriminating power and calibration: a model that discriminates well but is miscalibrated still has poor Cllr.
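As a minimal sketch, the Cllr can be computed from validation LRs using its usual definition: half the sum of the mean of log2(1 + 1/LR) over same-source pairs and the mean of log2(1 + LR) over different-source pairs. The LR lists below are hypothetical and serve only to show how the metric behaves.

```python
import math


def cllr(same_source_lrs: list[float], different_source_lrs: list[float]) -> float:
    """Log likelihood ratio cost: lower is better, 1.0 matches an uninformative system."""
    same_term = sum(math.log2(1.0 + 1.0 / lr) for lr in same_source_lrs) / len(same_source_lrs)
    diff_term = sum(math.log2(1.0 + lr) for lr in different_source_lrs) / len(different_source_lrs)
    return 0.5 * (same_term + diff_term)


# Hypothetical validation outputs.
good_system = cllr([1000.0, 500.0, 2000.0], [0.001, 0.002, 0.0005])
uninformative = cllr([1.0, 1.0], [1.0, 1.0])        # always reports LR = 1
overconfident = cllr([1e6, 1e6, 1e-6], [1e-6, 1e-6, 1e6])  # rare but extreme errors

print(f"good system:    {good_system:.4f}")
print(f"uninformative:  {uninformative:.4f}")
print(f"overconfident:  {overconfident:.4f}")
```

A system that always reports an LR of 1 scores exactly 1.0, so a value above 1 (as in the overconfident case, where a few strongly misleading LRs are heavily penalised) means the outputs are worse than reporting nothing.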

Courts in multiple jurisdictions have scrutinised LR models. The New Zealand Court of Appeal in R v Pengelly (1992) addressed the admissibility of statistical DNA evidence. The UK Court of Appeal in R v Deen (1994) identified the prosecutor's fallacy in a DNA case and clarified the distinction between the RMP and the probability of guilt. In India, under the Bharatiya Sakshya Adhiniyam 2023 (which replaced the Indian Evidence Act 1872), expert opinion is admissible under Section 39 when the subject requires specialised knowledge; the LR model and its validation evidence would typically be disclosed to the defence as part of the expert's report. Equivalent disclosure obligations exist under the US Federal Rules of Evidence Rule 702, and under the EU directive on minimum standards for criminal proceedings.

Worked example

Three LR calculations from the same case file

This case involves glass, DNA, and voice evidence. Each evidence type yields a separate LR; the final step shows what combining them requires.

A suspect is arrested in connection with a burglary. Glass fragments are recovered from his jacket, a mixed DNA profile is found on the door handle, and an audio recording of a threatening call to the victim exists. Three separate LR calculations are performed.

Glass LR. The crime-scene window glass (control) has a mean RI of 1.51834 with a standard deviation of 0.00003 across 10 replicate measurements. Twelve fragments from the suspect's jacket have mean RI 1.51836. The KDE numerator, evaluated at 1.51836 using the control distribution, gives a density of 22.4. The KDE denominator, evaluated at 1.51836 using the Home Office float-glass database (n = 14,000 samples), gives a density of 0.8. LR = 22.4 / 0.8 = 28. Verbal scale: moderate support for Hp (the jacket glass and window glass share a common source).
DNA LR. The mixed profile on the door handle contains two contributors. After mixture deconvolution, the major contributor's profile matches the suspect at all 15 STR loci profiled. The RMP for the major contributor profile, calculated against the UK Caucasian database (FSS, 2018), is 1 in 4.7 x 10^17, i.e., approximately 2.1 x 10^-18. Under Hp (suspect is the major contributor), P(E | Hp) is taken as 1, treating the deconvolved major profile as a single-source profile. LR = 1 / (2.1 x 10^-18) = approximately 4.7 x 10^17. Verbal scale: very strong support for Hp.
Speaker comparison LR. The threatening call recording (40 seconds, mobile network, reasonable quality) and the suspect's police interview recording (12 minutes, studio quality) are compared using a PLDA x-vector system validated on a UK English background corpus. The system outputs a raw score of +3.2. After PAV calibration against the background corpus, this maps to a log10-LR of +1.8, equivalent to LR = 63. Verbal scale: moderate support for Hp. The scientist notes that the short questioned recording limits precision: the 95% confidence interval spans log10-LR 0.9 to 2.7 (LR approximately 8 to 500), which runs from weak to moderately strong support.
Combining the LRs. If the three evidence types are conditionally independent given the propositions, the combined log10-LR = log10(28) + log10(4.7 x 10^17) + log10(63) = 1.45 + 17.67 + 1.80 = 20.92, i.e., a combined LR of approximately 8 x 10^20. In practice, the scientist would first need to argue that the three findings are sufficiently independent given the propositions, and that each LR bears on the same overarching question, before combining them. The DNA LR alone already dominates this set; the combination adds little inferential weight beyond what the DNA establishes.
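The arithmetic of the combination step can be checked with a short sketch. It uses the three LR values from this worked example and assumes conditional independence, which, as noted, would need to be justified in a real case; the variable names are hypothetical.

```python
import math

# LRs from the worked example.
lrs = {"glass": 28.0, "dna": 4.7e17, "speaker": 63.0}

# Under conditional independence, LRs multiply, so log-LRs add.
log10_lrs = {name: math.log10(value) for name, value in lrs.items()}
combined_log10_lr = sum(log10_lrs.values())
combined_lr = 10.0 ** combined_log10_lr

for name, value in log10_lrs.items():
    share = value / combined_log10_lr
    print(f"{name:8s} log10 LR = {value:6.2f} ({share:.0%} of the combined log10 LR)")
print(f"combined log10 LR = {combined_log10_lr:.2f}, combined LR = {combined_lr:.1e}")
```

The DNA term accounts for well over four fifths of the combined log10-LR, which is the numerical basis for saying that the glass and voice evidence add comparatively little here.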

Check your understanding


A scientist calculates an LR of 0.04 for a fibre comparison. What does this mean?

Key Takeaways

The likelihood ratio is defined as P(E | Hp) / P(E | Hd). Values above 1 support the prosecution proposition; values below 1 support the defence proposition; the direction of the result must always be reported honestly.
DNA profile LRs are computed using allele frequency databases and the product rule. The LR equals 1 / RMP when the numerator under Hp is 1, as it is for single-source profiles where the defendant's genotype matches the crime-scene profile at all loci.
Glass refractive index LRs use kernel density estimates: the numerator KDE is fitted to replicate measurements of the control glass, and the denominator KDE is fitted to a reference database of glass sources. The LR is the ratio of the two density values evaluated at the questioned measurement.
Speaker comparison systems output raw acoustic similarity scores that must be calibrated against a background population of same-speaker and different-speaker pairs to convert them into log-LR values. Short or noisy recordings reduce precision and compress the LR toward 1.
LR models must be validated before casework use. The Tippett plot, empirical cross-entropy (ECE), and the closely related Cllr are standard validation tools. The model, the database, and the validation evidence would typically be disclosed to the defence as part of the expert's report.

