Verbal Equivalents of the Likelihood Ratio
Verbal probability scales translate numerical likelihood ratio values into standardised phrases such as 'moderate support' or 'very strong support' so that courts and juries can interpret forensic evidence evaluations. This topic covers the major institutional scales, the rationale for verbal translation, and the ongoing debate about which scale, if any, should be adopted universally.
A verbal equivalent of the likelihood ratio (LR) is a standardised phrase that maps a numerical LR value or range to a plain-language descriptor such as 'limited support', 'strong support', or 'very strong support'. Forensic scientists who evaluate evidence under the Bayesian framework calculate an LR expressing how much more probable the observed findings are under the prosecution hypothesis than the defence hypothesis. That number must then be communicated to judges and juries who may have no statistical training. Verbal scales provided by institutions such as the European Network of Forensic Science Institutes (ENFSI), the UK Forensic Science Regulator (FSR), and the Association of Forensic Science Providers (AFSP) in Australia offer a structured way to make that translation. Each scale divides the LR continuum into labelled bands, assigning a phrase to each band.
The case for verbal scales rests on a communication problem: a forensic scientist who tells a court 'the LR is 10,000' without further explanation leaves the factfinder with a number they cannot contextualise. A scientist who says 'the findings provide very strong support for the prosecution hypothesis' is more easily understood, but only if the phrase is grounded in a published scale with known LR boundaries, if the scale has been disclosed, and if the limitations of the underlying data are stated. Verbal scales are tools for communication, not substitutes for scientific rigour.
The debate around verbal scales is active and unresolved. Critics argue that phrases such as 'strong support' are interpreted differently by different readers, that scale boundaries are arbitrary, and that presenting a verbal phrase without the numerical value hides information the court needs. Defenders argue that courts require accessible language, that consistent terminology across practitioners reduces ad hoc variation, and that the alternative of presenting raw LR numbers also causes misinterpretation. Regulators and professional bodies in different countries have landed in different places, producing a field of competing scales that themselves create comparability problems.
By the end of this topic you will be able to:
- Explain why verbal probability scales exist and identify the communication problem they are designed to solve.
- Compare the ENFSI, UK FSR, and AFSP verbal LR scales by structure, LR boundaries, and wording choices.
- Identify the main misinterpretation risks associated with verbal LR phrases and explain how each risk arises.
- Describe the main arguments for and against universal adoption of a single verbal LR scale.
- Apply a published verbal scale to a given LR value and state the disclosure obligations that accompany a verbal evaluative conclusion.
- Likelihood ratio (LR): The ratio of the probability of the observed evidence given the prosecution hypothesis to the probability of the same evidence given the defence hypothesis. An LR greater than 1 supports the prosecution hypothesis; an LR less than 1 supports the defence hypothesis; an LR of 1 is neutral.
- Verbal scale: A published table that assigns a plain-language phrase to each numerical range of LR values, allowing evaluative conclusions to be communicated in court without requiring the factfinder to interpret a bare number. Also called a verbal probability scale or verbal equivalence scale.
- ENFSI Guideline for Evaluative Reporting: A 2015 document from the European Network of Forensic Science Institutes that sets out best practice for Bayesian evaluative reporting, including a recommended seven-category verbal LR scale. It is the most widely cited reference in European forensic science practice.
- UK Forensic Science Regulator (FSR): The statutory body in England and Wales responsible for setting and monitoring quality standards in forensic science. The FSR has published codes of practice and guidance on evaluative reporting, including positions on verbal LR scales used by accredited laboratories.
- Evaluative conclusion: A forensic opinion that addresses the weight of the evidence, typically expressed as an LR or a verbal equivalent. Distinct from a finding (what was observed) or an identification (a categorical claim). The evaluative conclusion places the observed findings in a probabilistic framework comparing competing hypotheses.
- Source-level hypothesis: A proposition about who or what produced a trace, for example 'this glass came from the broken window at the crime scene' versus 'this glass came from some other source'. Verbal LR scales must be read in the context of the hypothesis pair they address; the same verbal phrase may correspond to very different evidential weight at source level versus activity level.
Why verbal scales exist: the communication problem
Forensic scientists operating within the Bayesian evaluative framework produce LR values that can span many orders of magnitude. A DNA profile comparison in a population with a well-studied allele frequency database may yield an LR of 10^12 or higher. A comparison of glass fragments where the analyst has limited reference data may yield an LR of 30. Presenting both of these as bare numbers in a court report creates two distinct problems: the very large number may be misread as a 'probability of guilt' rather than a weight-of-evidence ratio, and the smaller number may be dismissed as unimpressive without context. Neither problem is resolved by simply printing the number.
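The ratio and its scale can be written compactly. The notation below is an assumption introduced for illustration (E for the observed findings, H_p and H_d for the prosecution and defence hypotheses); it restates the definition given earlier and shows the two example values on a log10 scale.

```latex
\mathrm{LR} = \frac{P(E \mid H_p)}{P(E \mid H_d)}
\qquad
\log_{10}\!\left(10^{12}\right) = 12
\qquad
\log_{10}(30) \approx 1.48
```

On the log10 scale the DNA example sits roughly ten and a half units above the glass example, which is one way of seeing why a single bare number gives a factfinder little sense of where a result falls.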
Verbal scales address the contextualisation problem. By assigning 'very strong support' to LR values above a defined threshold, the scale situates the number in a qualitative framework that courts can act on. The scale communicates: this result falls in a range that the forensic community considers to constitute substantial evidential weight, as distinct from results that are merely marginal. Without such a framework, individual scientists invent their own language, which introduces inconsistency between laboratories, between disciplines, and between cases.
The communication problem is not unique to any jurisdiction. Courts in the United States, the United Kingdom, Australia, Germany, the Netherlands, and elsewhere have all encountered expert testimony based on LR reasoning and have had to decide how much weight to give it. In the Netherlands, evaluative reporting and verbal scales have been used in serious crime cases since at least the 1990s. In Australia, the AFSP scale became a reference point for practitioners across multiple forensic disciplines. In the United States, the 2009 National Academy of Sciences report 'Strengthening Forensic Science in the United States' identified communication of uncertainty as a systemic weakness, which contributed to later interest in verbal scales as a partial remedy.
The major institutional scales
Three scales dominate current professional literature. They differ in the number of categories and in the phrases assigned to each band. The comparison below shows how each LR range is labelled under each scheme.
| LR range (approx.) | ENFSI (2015) | UK FSR (Coulson et al.) | AFSP Australia |
|---|---|---|---|
| 1 to 10 | Limited support | Weak support | Limited support |
| 10 to 100 | Moderate support | Moderate support | Moderate support |
| 100 to 1,000 | Moderately strong support | Moderately strong support | Strong support |
| 1,000 to 10,000 | Strong support | Strong support | Strong support |
| 10,000 to 1,000,000 | Very strong support | Very strong support | Very strong support |
| Above 1,000,000 | Extremely strong support | Extremely strong support | Extremely strong support |
The table is a simplified representation. The ENFSI guideline places its LR boundaries at powers of ten (1, 10, 100, 1,000, 10,000, 1,000,000), so each band spans one unit of log10 LR except the band from 10,000 to 1,000,000, which spans two. It also includes a highest category sometimes rendered as 'conclusive support' or omitted as inappropriate. The UK FSR guidance acknowledges the ENFSI scale but notes that practitioners must be consistent within an accredited laboratory system and must state clearly which scale they are using. The AFSP scale, originally published in the context of Australian forensic biology practice, uses a similar logarithmic structure but collapses two ENFSI mid-range categories into one, reducing the total number of bands.
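A minimal sketch of the table as a lookup, assuming the simplified bands shown above rather than the full text of any guideline. The names `BOUNDARIES`, `SCALES`, and `verbal_label` are hypothetical, and assigning a value that sits exactly on a boundary to the lower band is an assumption, since the table's ranges share their edges.

```python
from bisect import bisect_left

# Band edges taken from the simplified comparison table (LR > 1 only).
BOUNDARIES = [10, 100, 1_000, 10_000, 1_000_000]

# One label per band, in ascending order of LR, as shown in the table.
SCALES = {
    "ENFSI": ["Limited support", "Moderate support", "Moderately strong support",
              "Strong support", "Very strong support", "Extremely strong support"],
    "UK FSR": ["Weak support", "Moderate support", "Moderately strong support",
               "Strong support", "Very strong support", "Extremely strong support"],
    "AFSP": ["Limited support", "Moderate support", "Strong support",
             "Strong support", "Very strong support", "Extremely strong support"],
}


def verbal_label(lr: float, scale: str) -> str:
    """Return the table's phrase for an LR above 1 under the named scale."""
    if lr <= 1:
        raise ValueError("this sketch only covers LR values above 1")
    # bisect_left puts an LR equal to a boundary into the lower band
    # (an assumption; the table does not say which band owns the edge).
    return SCALES[scale][bisect_left(BOUNDARIES, lr)]


if __name__ == "__main__":
    for name in SCALES:
        print(name, "->", verbal_label(500, name))
```

Run on an LR of 500, the sketch returns 'Moderately strong support' for the ENFSI and UK FSR columns and 'Strong support' for the AFSP column, which is the mid-range difference the table shows.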
The rationale for verbal translation
Defenders of verbal scales offer several arguments. First, consistency: if all practitioners in a jurisdiction use the same scale, then 'strong support' means the same thing from laboratory to laboratory and from case to case. A defence expert can challenge the LR calculation without also having to fight a battle over what the words mean. Second, accessibility: factfinders who are not scientists can use verbal phrases as anchors for their deliberations in a way that bare numbers do not allow. Third, precedent: courts in several jurisdictions have accepted verbal evaluative conclusions under the same frameworks used for years in disciplines such as firearms and questioned documents, even before explicit LR frameworks were articulated.
The scale choice also encodes a philosophical position about the purpose of forensic evidence communication. Presenting both the numerical LR and the verbal phrase acknowledges that the number carries information the court needs, while the phrase assists in contextualising it. Presenting only the verbal phrase without the underlying number is a more contested approach: it withholds information and makes the opinion appear more categorical than the underlying probability calculation supports.
Several professional bodies, including the ENFSI working group on evaluative reporting and the Forensic Science International editorial board, have recommended that practitioners present both the LR value (or its log10) and the corresponding verbal label, disclose the scale they are using, describe the data underpinning the LR, and state the hypothesis pair the evaluation addresses. This combination treats the verbal phrase as a communication aid rather than a replacement for transparency.
Risks of misinterpretation
Verbal scales introduce specific misinterpretation risks that practitioners must address in their reports and testimony. The most frequently documented risks are described below.
- Source probability conflation. 'Strong support for the prosecution hypothesis' can be read as 'there is a strong probability that the defendant is guilty'. The LR does not state this. It states how much more probable the evidence is under one hypothesis than under the other. Converting that ratio into a probability for either hypothesis requires prior odds, and assessing those is the court's domain, not the scientist's.
- Scale opacity. When only the verbal phrase appears in a report without the underlying number and scale citation, the court cannot assess whether the label is appropriate. An LR of 1,001 and an LR of 9,999 both attract 'strong support' under the ENFSI scale, yet their numerical difference may matter for the specific case.
- Phrase strength inflation. Research by Martire and colleagues (2013, Australia) and Koppen and Saks (1996, US) found that lay readers interpret verbal probability phrases at higher numerical values than scientists intend. A phrase the scientist treats as corresponding to LR 100 to 1,000 may be read by a juror as implying near-certainty.
- Cross-scale inconsistency. If two experts in the same trial use different scales, their verbal conclusions are not directly comparable. 'Moderately strong support' under ENFSI corresponds to LR 100 to 1,000; the AFSP scale has no equivalent category. Courts have no reliable way to reconcile this without the underlying numbers.
- Hypothesis pair ambiguity. An LR and its verbal equivalent mean nothing without knowing what hypotheses were compared. 'Strong support for the prosecution hypothesis' where the prosecution hypothesis is 'this hair came from the defendant' is very different from the same phrase where the prosecution hypothesis is 'the defendant was present at the scene'.
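The first risk in the list can be made concrete with the odds form of Bayes' theorem (posterior odds = prior odds × LR). The sketch below is illustrative only: the function name and the prior values are assumptions chosen to show the arithmetic, not figures from any case or guideline.

```python
def posterior_probability(prior_probability: float, lr: float) -> float:
    """Update a prior probability with an LR using the odds form of Bayes' theorem."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    lr = 1_000  # the same evidential weight in every row
    # Hypothetical priors, from even odds down to one in a million.
    for prior in (0.5, 0.01, 0.000001):
        print(f"prior={prior:g}  posterior={posterior_probability(prior, lr):.4f}")
```

The same LR of 1,000 leaves the posterior probability close to 1 when the prior is even, and still far below 1 when the prior is one in a million. The verbal phrase describes the LR only; it says nothing about which of these situations the court is in.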
Ongoing debates about standardisation
The existence of multiple competing scales is itself a problem. A practitioner in London using the UK FSR guidance and a practitioner in Amsterdam using the ENFSI scale will attach different verbal labels to the same LR value in some regions of the scale. A practitioner in Sydney using the AFSP scale produces conclusions that are not directly comparable to either. This is precisely the inconsistency that verbal scales are supposed to prevent; it has simply moved to a higher level, between institutions rather than between individual practitioners.
Two broad camps have formed in the literature. The harmonisation camp argues that all forensic science bodies should agree on a single scale, most likely a version of the ENFSI structure, and require accredited laboratories to use it. The transparency camp argues that verbal scales should be deprecated in favour of always presenting the LR numerically, with plain-language explanation of what an LR is, rather than allowing verbal shorthand that can conceal the underlying uncertainty. A third position, sometimes described as a hybrid approach, recommends presenting both the numerical LR and a verbal label from a disclosed scale, treating the verbal phrase as commentary on the number rather than a replacement for it.
Research by Biedermann, Taroni, and colleagues has argued that the choice of scale boundaries is not scientifically derived: there is no principled reason why 'moderate support' should end at LR 100 rather than LR 50 or LR 200. The boundaries reflect convention and convenience, not a calibrated relationship between numerical weight and verbal meaning. This critique does not necessarily invalidate verbal scales as communication tools, but it does challenge the claim that a particular scale is 'correct'. Practitioners should treat scale choice as a disclosure item, not a technical finding.
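The point about convention can be illustrated with a toy two-band scale in which only the cutoff moves. The function name, the LR of 120, and the candidate cutoffs are assumptions for illustration; the cutoffs of 50, 100, and 200 echo the values mentioned in the critique.

```python
def label_with_cutoff(lr: float, cutoff: float) -> str:
    """Toy scale: 'moderate support' up to the cutoff, 'moderately strong support' above it."""
    return "moderate support" if lr <= cutoff else "moderately strong support"


lr = 120  # hypothetical LR, unchanged across all three conventions
labels = {cutoff: label_with_cutoff(lr, cutoff) for cutoff in (50, 100, 200)}

if __name__ == "__main__":
    for cutoff, label in labels.items():
        print(f"cutoff at LR {cutoff}: {label}")
```

The evidence and the calculation are identical in each case; only the convention changes, and with a cutoff of 200 the label drops to 'moderate support'. This is why the scale in use belongs among the disclosed items.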
The debate has practical consequences in criminal proceedings. In India, the Bharatiya Sakshya Adhiniyam 2023 (which replaced the Indian Evidence Act 1872) continues to require that expert opinion be clearly explained and that the basis for the opinion be disclosed, creating a framework in which an undisclosed verbal scale would be problematic. In England and Wales, Criminal Procedure Rule 19.4 requires experts to state the basis and extent of their knowledge. In the United States, Daubert v. Merrell Dow Pharmaceuticals (1993) and subsequent case law require that scientific testimony rest on sufficient facts or data and be the product of reliable principles. None of these frameworks prohibits verbal scales, but all of them demand that the scale, its basis, and its limitations be put before the court.
Applying a verbal scale in practice
A practitioner applying a verbal scale must complete several steps before the verbal phrase can appear in a report. First, the hypothesis pair must be defined precisely and stated in the report. The verbal conclusion is meaningless without it. Second, the LR must be calculated using data appropriate to the relevant population: for a source-level glass comparison, the reference database must reflect the glass distribution in the population of interest, not a generic international dataset if the case involves a specific regional source. Third, the LR value, or its log10, must be recorded in the report along with the verbal label.
The report must identify the scale by name and, where possible, by citation. 'Using the ENFSI seven-category verbal scale (ENFSI 2015, p. 16), an LR of 8,000 falls in the range corresponding to strong support for the prosecution hypothesis' is a complete verbal conclusion, provided the hypothesis pair is stated alongside it. 'The evidence strongly supports the prosecution case' is not: it states a conclusion without any of the information needed to evaluate it.
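The disclosure items described above can be sketched as a small record that refuses to produce a statement while any item is missing. The class name, field names, wording template, and the example LR of 5,000 are all hypothetical; the hypothesis pair is the glass example used earlier in this topic, and the label follows the simplified comparison table.

```python
import math
from dataclasses import dataclass


@dataclass
class EvaluativeConclusion:
    prosecution_hypothesis: str
    defence_hypothesis: str
    lr: float
    verbal_label: str
    scale_citation: str

    def statement(self) -> str:
        """Build the report sentence, or fail if a disclosure item is blank."""
        missing = [name for name, value in vars(self).items() if value in ("", None)]
        if missing:
            raise ValueError(f"incomplete conclusion, missing: {', '.join(missing)}")
        return (
            f"Hypotheses compared: '{self.prosecution_hypothesis}' versus "
            f"'{self.defence_hypothesis}'. LR = {self.lr:,.0f} "
            f"(log10 = {math.log10(self.lr):.1f}). Under {self.scale_citation}, "
            f"this corresponds to {self.verbal_label} for the first hypothesis."
        )


example = EvaluativeConclusion(
    prosecution_hypothesis="this glass came from the broken window at the crime scene",
    defence_hypothesis="this glass came from some other source",
    lr=5_000,  # hypothetical value
    verbal_label="strong support",
    scale_citation="the ENFSI (2015) verbal scale",
)

if __name__ == "__main__":
    print(example.statement())
```

Leaving the scale citation or either hypothesis blank raises an error instead of producing a sentence, which mirrors the rule that a verbal phrase on its own is not a complete conclusion.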
Testimony in court adds further obligations. The practitioner should be prepared to explain what an LR is, what the verbal scale is, why those boundaries were chosen, what data the LR is based on, what assumptions were made, and what the limits of the conclusion are. Cross-examination on verbal scales often focuses on the arbitrariness of the category boundaries and on whether the underlying data are sufficient. A practitioner who has only learned the verbal label and not the reasoning behind it will not withstand that examination.
Accreditation standards in many jurisdictions now require laboratories to document their reporting conventions, including the verbal scale in use, as part of their quality management system. The UK Forensic Science Regulator's Codes of Practice and Conduct, the ILAC G19 guidelines for forensic science laboratories, and ISO/IEC 17025:2017 (which covers competence of testing and calibration laboratories) all create frameworks in which ad hoc verbal language is less defensible than a disclosed, documented scale.
A forensic scientist states in a report: 'The DNA evidence very strongly supports the prosecution case.' What critical information is missing from this conclusion?
Key Takeaways
- Verbal LR scales translate numerical likelihood ratio ranges into standardised phrases to help courts interpret evaluative forensic conclusions without requiring statistical literacy, but the phrases must always be accompanied by the underlying LR value, the hypothesis pair, and the scale citation.
- The three most widely cited scales are the ENFSI (2015) seven-category scale, the UK FSR guidance, and the AFSP Australia scale; all use logarithmic LR boundaries but differ in the number of categories and some boundary values, making cross-scale comparison unreliable without the underlying numbers.
- Key misinterpretation risks include source probability conflation (reading 'strong support for the prosecution hypothesis' as 'high probability of guilt'), scale opacity (omitting the LR value and scale from the report), phrase strength inflation (lay readers interpreting verbal phrases at higher certainty than intended), and hypothesis pair ambiguity.
- Scale boundaries are convention-based, not scientifically derived: there is no principled reason why one category ends at LR 1,000 rather than LR 500, and practitioners should treat scale choice as a disclosure obligation rather than a technical determination.
- Accreditation standards including ISO/IEC 17025:2017, the UK FSR Codes of Practice, and ILAC G19 require laboratories to document their reporting conventions, including verbal scale choice, making undisclosed or ad hoc verbal language increasingly difficult to defend in court.