Computing Likelihood Ratios: Worked Examples

The likelihood ratio (LR) is the principal tool for expressing the strength of forensic evidence: it compares the probability of the observed findings under the prosecution proposition to their probability under the defence proposition. This topic walks through LR calculations for DNA profiles, glass refractive index measurements, and speaker comparison, showing how numerator and denominator probabilities are estimated and how LR values above and below 1 direct the inference.

The likelihood ratio (LR) quantifies how much a piece of forensic evidence shifts the balance between two competing propositions. It is defined as the probability of the observed evidence given the prosecution proposition (Hp) divided by the probability of the same evidence given the defence proposition (Hd). When the LR exceeds 1, the evidence is more probable under Hp and therefore supports the prosecution proposition; when it falls below 1, the evidence supports the defence proposition. The LR is not a verdict: it is an input that the court combines with prior odds, through Bayes' theorem, to reach posterior odds on the propositions.
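
The role of the LR as an input to Bayes' theorem can be sketched in a few lines of Python. This is a minimal illustration of the odds form of the theorem (posterior odds = LR x prior odds). The function names and the prior odds of 1 to 10,000 are hypothetical and chosen only for illustration.

```python
def posterior_odds(prior_odds: float, lr: float) -> float:
    """Odds form of Bayes' theorem: posterior odds = LR x prior odds."""
    if prior_odds < 0 or lr < 0:
        raise ValueError("odds and likelihood ratios cannot be negative")
    return lr * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of Hp into a probability of Hp."""
    return odds / (1.0 + odds)


# Hypothetical figures for illustration only: prior odds of 1 to 10,000
# and an LR of 1,000. Neither number comes from a real case.
prior = 1 / 10_000
post = posterior_odds(prior, 1_000)

print(f"posterior odds = {post:.3f}")
print(f"posterior probability of Hp = {odds_to_probability(post):.3f}")
```

With these invented numbers the posterior probability stays below 10% despite an LR of 1,000, which is why the LR alone cannot be read as a probability that a proposition is true.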

Forensic disciplines each have their own methods for estimating the numerator and denominator of an LR. In DNA profiling, allele frequency databases and the product rule provide the denominator. In glass comparisons, kernel density estimates fitted to refractive index measurements supply both components. In speaker comparison, automatic systems output log-LR scores calibrated against a background population of voices. The mathematics is the same across disciplines; what differs is the data source and the model used to convert measurements into probabilities.

Courts in the United Kingdom, the Netherlands, Australia, and New Zealand have formally accepted LR-based evaluative reporting as the standard framework for communicating the strength of forensic evidence. The framework has been endorsed by the European Network of Forensic Science Institutes (ENFSI) and the UK Forensic Science Regulator. In the United States, adoption has been slower, but the PCAST 2016 report on forensic science and the National Commission on Forensic Science have both recommended a move toward probabilistic, LR-based expressions of evidential weight.

By the end of this topic you will be able to:

  • State the definition of the likelihood ratio and identify the proposition assigned to the numerator and the proposition assigned to the denominator.
  • Calculate a DNA profile LR using allele frequencies and the product rule, and explain what the random match probability represents.
  • Describe how kernel density estimates are used to compute the numerator and denominator probabilities in a glass refractive index LR.
  • Explain how automatic speaker comparison systems produce log-LR scores and how those scores are calibrated against a background population.
  • Interpret LR values above 1, below 1, and equal to 1 in terms of evidential direction and strength, and identify the correct role of the LR in a Bayesian inference chain.
Key terms
Likelihood ratio (LR)
The ratio of P(evidence | Hp) to P(evidence | Hd). It measures how much more probable the observed evidence is under the prosecution proposition than under the defence proposition. LR > 1 supports Hp; LR < 1 supports Hd; LR = 1 means the evidence does not discriminate between the propositions.
Prosecution proposition (Hp)
The proposition advanced by the prosecution, typically asserting that the defendant is the source of the questioned material (e.g., 'the crime-scene DNA came from the defendant'). It occupies the numerator of the LR.
Defence proposition (Hd)
The alternative proposition, typically asserting that someone else is the source (e.g., 'the crime-scene DNA came from an unknown, unrelated person'). It occupies the denominator of the LR.
Random match probability (RMP)
The probability that a randomly chosen, unrelated individual from the relevant population would share the crime-scene DNA profile. The RMP is the denominator when the numerator under Hp is 1 (i.e., the defendant being the true contributor would certainly produce the observed profile). The LR in this simple case equals 1/RMP.
Kernel density estimate (KDE)
A non-parametric smooth probability density function fitted to a set of measurements by placing a kernel (usually Gaussian) over each data point and summing them. Used in glass and other trace evidence LR models to convert a set of reference measurements into a continuous density that can be evaluated at any measurement value.
Log-LR (log likelihood ratio)
The natural or base-10 logarithm of the LR. Log-LRs are additive for independent evidence types, making them convenient for combining across disciplines. Positive log-LR supports Hp; negative supports Hd. Speaker comparison systems typically output log-LR scores directly.
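
The additivity of log-LRs can be checked numerically. The sketch below uses two invented LR values and assumes the two evidence types are conditionally independent given the propositions, which has to be argued case by case.

```python
import math

# Hypothetical LRs for two evidence types that are ASSUMED to be
# conditionally independent given the propositions.
lr_glass = 50.0
lr_fibre = 20.0

log_glass = math.log10(lr_glass)
log_fibre = math.log10(lr_fibre)

combined_log_lr = log_glass + log_fibre  # add in log space
combined_lr = 10 ** combined_log_lr      # convert back to the LR scale

print(f"log10 LRs: {log_glass:.3f} + {log_fibre:.3f} = {combined_log_lr:.3f}")
print(f"combined LR = {combined_lr:.0f}")
```

Adding the base-10 logs gives the same result as multiplying the LRs directly; working in log space simply keeps very large or very small values manageable.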

The LR framework: structure and logic

Every LR calculation begins by fixing two propositions at the same level of the hierarchy. Source-level propositions address who or what left the trace ('the defendant is the source' versus 'an unknown person is the source'). Activity-level propositions address what happened ('the defendant handled the object' versus 'the defendant never touched it'). Mixing levels in a single LR is an error: the numerator and denominator must address the same question.

The general LR formula is: LR = P(E | Hp) / P(E | Hd), where E is the complete set of observations. In simple cases, P(E | Hp) = 1 because if the defendant truly is the source, we would certainly observe the match. The LR then reduces to 1 / P(E | Hd), so estimating the denominator is the core calculation. In more complex cases, such as mixtures or partial profiles, the numerator may itself be a probability less than 1.

The verbal scale translates numerical LR values into courtroom language. ENFSI guidance and the UK Forensic Science Regulator's guidance use similar bands: an LR from 1 to 10 is 'weak support', 10 to 100 is 'moderate support', 100 to 1000 is 'moderately strong support', 1000 to 10,000 is 'strong support', and above 10,000 is 'very strong support' for Hp. Values below 1 mirror these bands in favour of Hd: an LR of 0.1 (equivalent to an LR of 10 in favour of Hd) is 'weak support for Hd'. The verbal scale is guidance, not a legal threshold.
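
The bands above can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and the choice of which band owns an exact boundary value (10, 100 and so on) is an assumption, since the text does not specify it.

```python
def verbal_scale(lr: float) -> str:
    """Map an LR to the verbal bands quoted in the text above."""
    if lr <= 0:
        raise ValueError("an LR must be positive")
    if lr == 1:
        return "no support for either proposition"
    # For LR < 1, invert so the same bands describe support for Hd.
    favoured = "Hp" if lr > 1 else "Hd"
    strength = lr if lr > 1 else 1.0 / lr
    # Assumption of this sketch: a boundary value belongs to the lower band.
    bands = [
        (10, "weak"),
        (100, "moderate"),
        (1_000, "moderately strong"),
        (10_000, "strong"),
    ]
    for upper, label in bands:
        if strength <= upper:
            return f"{label} support for {favoured}"
    return f"very strong support for {favoured}"


for value in (7_000, 0.5, 1, 50_000):
    print(value, "->", verbal_scale(value))
```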

DNA profile LR: the product rule and allele frequencies

A standard DNA profile consists of genotype observations at multiple short tandem repeat (STR) loci. At each locus, the analyst observes two alleles (or one allele if the person is homozygous). Under Hp (defendant is the source), the probability of observing a matching profile is 1. Under Hd (an unknown, unrelated person is the source), the denominator is the probability that a randomly chosen person from the relevant population has the same genotype at every locus.

For a heterozygous genotype with alleles a and b at a single locus, the genotype frequency under the Hardy-Weinberg equilibrium assumption is 2 x p(a) x p(b), where p(a) and p(b) are the allele frequencies in the reference population database. For a homozygous genotype with allele a, the frequency is p(a)^2, sometimes corrected upward using the theta (substructure) correction: p(a)^2 + p(a)(1 - p(a)) x theta. The product rule then multiplies frequencies across loci to give the random match probability (RMP). The LR = 1 / RMP.
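
The single-locus formulas can be written as one helper. This is a minimal sketch: the function name and signature are hypothetical, and theta is applied only to the homozygous case because that is the only correction stated above.

```python
from typing import Optional


def genotype_frequency(p_a: float, p_b: Optional[float] = None,
                       theta: float = 0.0) -> float:
    """Expected genotype frequency at one locus.

    Pass only p_a for a homozygote (a, a); pass p_a and p_b for a
    heterozygote (a, b). theta is the substructure correction.
    """
    if p_b is None:
        # Homozygote: p(a)^2 + p(a) * (1 - p(a)) * theta
        return p_a ** 2 + p_a * (1.0 - p_a) * theta
    # Heterozygote under Hardy-Weinberg: 2 * p(a) * p(b)
    return 2.0 * p_a * p_b


print(genotype_frequency(0.22, 0.19))        # heterozygote
print(genotype_frequency(0.20))              # homozygote, no correction
print(genotype_frequency(0.20, theta=0.03))  # homozygote, theta = 0.03
```

The theta value of 0.03 is an illustrative choice, not a recommendation for any particular population.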

Locus | Defendant genotype | Allele frequencies | Genotype frequency
D3S1358 | 15, 17 | p(15) = 0.22, p(17) = 0.19 | 2 x 0.22 x 0.19 = 0.0836
vWA | 16, 18 | p(16) = 0.26, p(18) = 0.14 | 2 x 0.26 x 0.14 = 0.0728
FGA | 22, 24 | p(22) = 0.13, p(24) = 0.09 | 2 x 0.13 x 0.09 = 0.0234

Using the three loci in the table as a simplified example: RMP = 0.0836 x 0.0728 x 0.0234 = approximately 1.42 x 10^-4, or about 1 in 7,000. The LR = 1 / 0.000142 = approximately 7,000. Modern 15-to-20-locus profiles produce RMPs typically in the range of 1 in 10^15 to 10^20, giving LRs that are astronomically large. The random match probability concept topic covers the interpretation and misinterpretation of these figures in detail.
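
The three-locus arithmetic can be reproduced as follows. The allele frequencies are the ones given in the worked example; the variable names are hypothetical, and the calculation assumes independence between loci and P(E | Hp) = 1.

```python
import math

# Allele frequencies from the three-locus worked example.
profile = {
    "D3S1358": (0.22, 0.19),
    "vWA": (0.26, 0.14),
    "FGA": (0.13, 0.09),
}

# Heterozygote frequency 2 * p(a) * p(b) at each locus (Hardy-Weinberg).
genotype_freqs = {locus: 2 * pa * pb for locus, (pa, pb) in profile.items()}

# Product rule: multiply across loci, assuming the loci are independent.
rmp = math.prod(genotype_freqs.values())
lr = 1.0 / rmp  # valid only when the numerator P(E | Hp) is 1

print(f"RMP = {rmp:.3e}")
print(f"LR  = {lr:.0f}")
```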

Glass refractive index LR: kernel density estimation

Glass fragments are a common form of trace evidence: a broken window leaves shards that may transfer to a suspect's clothing. The refractive index (RI) of glass, measured by a method such as temperature-controlled immersion microscopy (GRIM3), is a physical property that varies between glass sources. The LR question is: do the questioned fragments recovered from the defendant and the control glass from the crime-scene window share a common source?

Under Hp, the questioned fragments came from the crime-scene window, so their RI measurements should be consistent with the RI of the control glass from that window. Under Hd, the fragments came from another, unknown glass source, so their RI distribution should reflect the general population of glass sources in the environment. The LR is therefore: LR = P(RI measurements | same source) / P(RI measurements | different source).

The numerator is computed from a kernel density estimate (KDE) fitted to replicate RI measurements of the control glass. Each replicate measurement from the questioned sample is evaluated at the corresponding point on that KDE density curve, giving a probability density value. The denominator is computed from a KDE fitted to a reference database of RI measurements from many different glass sources, representing what an analyst would expect to observe if the fragments came from a random glass pane. Multiplying (or summing in log space) across the questioned fragments gives the combined LR.
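
The two-KDE calculation can be sketched with a hand-written Gaussian kernel estimator. Everything numeric below is invented for illustration: the RI values, the size of the reference set, and the hand-picked bandwidths are assumptions, and casework models choose bandwidths and handle within-source variation far more carefully.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Invented data, not casework measurements.
control_ri = [1.51820, 1.51822, 1.51819, 1.51821, 1.51823]  # control replicates
reference_ri = [1.5150, 1.5158, 1.5163, 1.5170, 1.5175, 1.5181,
                1.5183, 1.5190, 1.5196, 1.5205, 1.5212, 1.5220]  # one per source
questioned_ri = [1.51821, 1.51820]  # recovered fragments

# Bandwidths are hand-picked assumptions for this sketch.
numerator = gaussian_kde(control_ri, bandwidth=0.00002)
denominator = gaussian_kde(reference_ri, bandwidth=0.0005)

# Sum per-fragment log10 LRs (equivalent to multiplying the LRs).
log10_lr = sum(
    math.log10(numerator(x)) - math.log10(denominator(x))
    for x in questioned_ri
)

print(f"combined log10 LR = {log10_lr:.2f}")
```

A questioned value far from the control replicates can drive the numerator density to zero in floating point, so a production implementation would need to guard the logarithm; this sketch does not.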

The Curran, Triggs, Buckleton, and Weir approach (published in 2000 and subsequently refined) formalised this KDE method and it is now implemented in software tools including the R package 'forensic'. The UK Home Office Forensic Science Service database, later maintained by the Forensic Science Regulator, contains tens of thousands of float glass RI measurements and provides the denominator distribution for cases in England and Wales. Equivalent databases exist in the Netherlands (Netherlands Forensic Institute) and Australia (AFP). LRs for glass RI comparisons in casework typically range from low values (a few tens) when the RI of the questioned glass is common in the reference population, to very high values (millions) when it falls in a thin tail of the distribution.

Speaker comparison LR: automatic systems and calibration

Forensic speaker comparison addresses the question of whether a known speaker (the suspect, whose voice is recorded in a police interview) is the same person as an unknown speaker (captured on a crime recording, a surveillance tape, or an intercepted call). The propositions are typically: Hp = 'the suspect spoke the questioned recording'; Hd = 'a different person, drawn from the relevant population of speakers, spoke the questioned recording'.

Automatic speaker recognition (ASR) systems extract feature vectors from the recordings, typically mel-frequency cepstral coefficients (MFCCs) representing the spectral envelope of speech, and compare the suspect and questioned recordings using a statistical model. The most widely used current architecture is the probabilistic linear discriminant analysis (PLDA) model applied to speaker embeddings (i-vectors or x-vectors). The system outputs a score: a high score indicates acoustic similarity, a low score indicates dissimilarity.

That raw score is not yet an LR. It must be calibrated: the score is mapped to a log-LR by comparing it with the distributions of same-speaker and different-speaker scores obtained from a reference corpus that represents the background population. If the score falls where same-speaker pairs typically cluster, the log-LR is positive and large; if it falls where different-speaker pairs cluster, the log-LR is negative. Calibration is performed using logistic regression or the Pool Adjacent Violators (PAV) algorithm. The ENFSI Forensic Speech and Audio Analysis Working Group has published guidelines specifying validation requirements: a calibrated system must demonstrate adequate performance on a representative validation set before its outputs are used in court.
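
Logistic-regression calibration can be illustrated with a small, dependency-free sketch. The scores are invented, the function names are hypothetical, and plain gradient descent stands in for whatever optimiser a real system uses; operational systems also typically weight the two classes, which is omitted here.

```python
import math


def fit_logistic_calibration(ss_scores, ds_scores, steps=5000, rate=0.1):
    """Fit ln(LR) = a * score + b by logistic regression (gradient descent)."""
    data = [(s, 1.0) for s in ss_scores] + [(s, 0.0) for s in ds_scores]
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in data:
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))  # P(same speaker | score)
            grad_a += (p - y) * s
            grad_b += p - y
        a -= rate * grad_a / len(data)
        b -= rate * grad_b / len(data)
    # The fitted logit is a posterior log-odds under the training proportions;
    # subtracting the training log prior odds leaves the natural-log LR.
    b -= math.log(len(ss_scores) / len(ds_scores))
    return a, b


def score_to_ln_lr(score, a, b):
    """Map a raw comparison score to a natural-log likelihood ratio."""
    return a * score + b


# Invented raw scores from pairs with known ground truth. The two sets
# overlap slightly so that the fit stays finite.
same_speaker = [2.1, 1.8, 2.5, 0.1, 3.0, 0.9]
diff_speaker = [-1.5, -0.8, -2.2, 0.6, -1.1, -0.2]

a, b = fit_logistic_calibration(same_speaker, diff_speaker)
for score in (3.0, 0.0, -2.0):
    ln_lr = score_to_ln_lr(score, a, b)
    print(f"score {score:+.1f} -> log10 LR {ln_lr / math.log(10):+.2f}")
```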

LR values above and below 1: direction and magnitude

An LR of exactly 1 means the evidence is equally probable under both propositions and provides no discriminating information. This is a neutral result, not an absence of evidence. An LR of 1,000 means the evidence is 1,000 times more probable under Hp than Hd. An LR of 0.001 means the evidence is 1,000 times more probable under Hd than Hp: the finding actively supports the defence.

LR value | Log10(LR) | Direction | Verbal scale (ENFSI)
10,000 | +4 | Supports Hp | Strong support for Hp
100 | +2 | Supports Hp | Moderate support for Hp
2 | +0.3 | Supports Hp | Weak support for Hp
1 | 0 | Neutral | Does not support either proposition
0.5 | -0.3 | Supports Hd | Weak support for Hd
0.01 | -2 | Supports Hd | Moderate support for Hd

A common error is to report only LRs above 1 and treat LRs below 1 as non-results. This is a form of selective reporting that distorts the picture for the court. ENFSI guidelines, the UK Forensic Science Regulator's Codes of Practice and Conduct, and the ILAC G19:08/2014 guidelines on forensic science laboratories all require that LR calculations be reported regardless of their direction. A scientist who discovers the evidence supports the defence has an obligation to report that finding.

The precision of an LR estimate also matters. Point estimates of LR are rarely exact: they depend on database choices, model assumptions, and the variability of the measurements. Some laboratories report confidence intervals or sensitivity analyses alongside the central LR value. The Bayesian credible interval approach and the bootstrapped confidence interval are both used in practice. A reported LR of 1,000 with a 95% confidence interval of 200 to 5,000 communicates a different level of certainty than a point estimate without bounds.
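
One way to attach bounds to an LR is a percentile bootstrap over the allele database. The sketch below is illustrative only: it uses a single locus, an invented sample of 200 alleles, and hypothetical function names, and it is not a description of any laboratory's procedure.

```python
import random


def heterozygote_lr(p_a, p_b):
    """Single-locus LR = 1 / (2 * p(a) * p(b)), assuming P(E | Hp) = 1."""
    return 1.0 / (2.0 * p_a * p_b)


def bootstrap_lr_interval(alleles, a, b, n_boot=2000, seed=0):
    """95% percentile bootstrap interval for the single-locus LR."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(alleles)
    lrs = []
    for _ in range(n_boot):
        resample = rng.choices(alleles, k=n)  # sample with replacement
        p_a = resample.count(a) / n
        p_b = resample.count(b) / n
        if p_a > 0 and p_b > 0:  # skip resamples where an allele vanished
            lrs.append(heterozygote_lr(p_a, p_b))
    lrs.sort()
    lower = lrs[int(0.025 * len(lrs))]
    upper = lrs[int(0.975 * len(lrs)) - 1]
    return lower, upper


# Invented database sample: 200 alleles with p(15) = 0.22 and p(17) = 0.19.
alleles = ["15"] * 44 + ["17"] * 38 + ["other"] * 118

point_lr = heterozygote_lr(44 / 200, 38 / 200)
lower, upper = bootstrap_lr_interval(alleles, "15", "17")
print(f"LR = {point_lr:.1f}, 95% bootstrap interval {lower:.1f} to {upper:.1f}")
```

The width of the interval here reflects database sampling variation only; it says nothing about model assumptions or measurement variability, which would need separate sensitivity analyses.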

Validation, error rates, and LR reliability

An LR model must be validated before it is used in casework. Validation tests the model on a dataset with known ground truth: for DNA, this means computing LRs for same-source and different-source pairs of profiles and checking that the model assigns high LRs to same-source pairs and low LRs to different-source pairs. The Tippett plot, which graphs the cumulative distribution of LRs for same-source and different-source pairs on the same axes, is the standard validation visualisation. A well-performing model shows the two curves separating cleanly, with few same-source LRs below 1 and few different-source LRs above 1.
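
The numbers behind a Tippett plot are simple cumulative proportions. The sketch below computes them for two invented sets of validation LRs; the function name and the threshold grid are hypothetical, and the plotting step is left out.

```python
import math


def tippett_curve(lrs, thresholds):
    """Proportion of LRs whose log10 value is at or above each threshold."""
    logs = [math.log10(v) for v in lrs]
    return [sum(1 for v in logs if v >= t) / len(logs) for t in thresholds]


# Invented validation LRs with known ground truth.
same_source = [1200.0, 350.0, 80.0, 15.0, 0.6]
diff_source = [0.002, 0.03, 0.2, 0.9, 4.0]
thresholds = [-3, -2, -1, 0, 1, 2, 3]  # log10(LR) grid

ss_curve = tippett_curve(same_source, thresholds)
ds_curve = tippett_curve(diff_source, thresholds)

for t, ss, ds in zip(thresholds, ss_curve, ds_curve):
    print(f"log10 LR >= {t:+d}: same-source {ss:.1f}, different-source {ds:.1f}")

# Rates of misleading evidence: LRs pointing the wrong way.
ss_misleading = sum(1 for v in same_source if v < 1) / len(same_source)
ds_misleading = sum(1 for v in diff_source if v > 1) / len(diff_source)
print(f"misleading: same-source {ss_misleading:.1f}, "
      f"different-source {ds_misleading:.1f}")
```

Plotting the two curves against the threshold grid gives the Tippett plot; the gap between them at log10 LR = 0 is one quick visual check of how often the system points the wrong way.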

Empirical cross-entropy (ECE) is a standard metric for evaluating LR system performance. It measures how much information is lost (relative to a perfect system) when the model's LR outputs are used to update prior odds, and it is usually plotted across a range of prior odds. Lower ECE is better. The Cllr (log-likelihood-ratio cost), widely used in the speaker comparison literature, is the single-number summary obtained by evaluating ECE at prior odds of 1. Both measure the combination of discriminating power and calibration: a model that discriminates well but is miscalibrated still has poor Cllr.
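
Cllr can be computed directly from validation LRs using its usual definition: the average of the mean of log2(1 + 1/LR) over same-source pairs and the mean of log2(1 + LR) over different-source pairs. The sketch below uses invented LR values and a hypothetical function name.

```python
import math


def cllr(same_source_lrs, diff_source_lrs):
    """Log-likelihood-ratio cost; 0 is perfect, 1 matches an uninformative system."""
    ss_cost = sum(math.log2(1.0 + 1.0 / lr) for lr in same_source_lrs)
    ds_cost = sum(math.log2(1.0 + lr) for lr in diff_source_lrs)
    return 0.5 * (ss_cost / len(same_source_lrs)
                  + ds_cost / len(diff_source_lrs))


# Invented validation LRs with known ground truth.
print(cllr([1.0, 1.0], [1.0, 1.0]))                  # always reports LR = 1
print(cllr([1000.0, 500.0], [0.001, 0.002]))         # strong and correct
print(cllr([0.01, 0.02], [100.0, 50.0]))             # strong but misleading
```

A system that always reports LR = 1 scores exactly 1, so values above 1 indicate outputs that are worse than saying nothing.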

Courts in multiple jurisdictions have scrutinised LR models. The New Zealand Court of Appeal in R v Pengelly (1992) addressed the admissibility of statistical DNA evidence. The UK Court of Appeal in R v Deen (1994) identified the prosecutor's fallacy in a DNA case and clarified the distinction between the RMP and the probability of guilt. In India, under the Bharatiya Sakshya Adhiniyam 2023 (which replaced the Indian Evidence Act 1872), expert opinion is admissible under Section 39 when the subject requires specialised knowledge; the LR model and its validation evidence would typically be disclosed to the defence as part of the expert's report. Equivalent disclosure obligations exist under the US Federal Rules of Evidence Rule 702, and under the EU directive on minimum standards for criminal proceedings.

Key Takeaways

  • The likelihood ratio is defined as P(E | Hp) / P(E | Hd). Values above 1 support the prosecution proposition; values below 1 support the defence proposition; the direction of the result must always be reported honestly.
  • DNA profile LRs are computed using allele frequency databases and the product rule. The LR equals 1 / RMP when the numerator under Hp is 1, as it is for single-source profiles where the defendant's genotype matches the crime-scene profile at all loci.
  • Glass refractive index LRs use kernel density estimates: the numerator KDE is fitted to replicate measurements of the control glass, and the denominator KDE is fitted to a reference database of glass sources. The LR is the ratio of the two density values evaluated at the questioned measurement.
  • Speaker comparison systems output raw acoustic similarity scores that must be calibrated against a background population of same-speaker and different-speaker pairs to convert them into log-LR values. Short or noisy recordings reduce precision and compress the LR toward 1.
  • LR models must be validated before casework use. The Tippett plot and empirical cross-entropy (summarised as a single number by Cllr) are standard validation tools. Disclosure of the model, the database, and the validation evidence is required under the rules of evidence in all major common law and civil law jurisdictions.
What does a likelihood ratio of 1000 mean in practice?
An LR of 1000 means the observed evidence is 1000 times more probable if the prosecution proposition is true than if the defence proposition is true. It does not mean the defendant is 1000 times more likely to be guilty; that posterior probability depends on the prior odds, which are set by the court, not the forensic scientist. The LR is an input to Bayes' theorem, not a verdict.
Why can an LR be less than 1, and what does that mean?
An LR below 1 means the observed evidence is more probable under the defence proposition than under the prosecution proposition. For example, an LR of 0.01 means the findings are 100 times more likely if the defendant is not the source. Such an LR supports the defence, and a scientist who calculates it must report it honestly even though the direction is unexpected. This is a core requirement of evaluative reporting.
How is the denominator of a DNA LR estimated?
The denominator is the probability of observing the crime-scene DNA profile if the true contributor is an unrelated, randomly chosen person from the relevant population. It is estimated using allele frequencies from a reference database, typically assembled from population surveys. For a multi-locus profile, the frequencies at each locus are combined using the product rule, with corrections for population substructure where appropriate.
What is a kernel density estimate and why is it used in glass LR calculations?
A kernel density estimate (KDE) is a smooth, continuous probability density function fitted to a set of discrete observations. In glass refractive index calculations, the numerator distribution is built from replicate measurements of the crime-scene glass, and the denominator distribution is built from a reference database of glass samples. KDEs are preferred over histograms because they avoid arbitrary binning and produce stable density values even in the tails of the distribution.
Can LRs from different forensic disciplines be multiplied together?
Yes, if the evidence types are conditionally independent given the propositions, their LRs can be multiplied to give a combined LR. For example, if glass and fibre evidence are treated as independent, the product of their individual LRs gives the LR for the combination. In practice, full independence is rarely guaranteed, so combining LRs requires careful evaluation of the dependence structure of the evidence.
