Verbal Equivalents of the Likelihood Ratio

Verbal probability scales translate numerical likelihood ratio values into standardised phrases such as 'moderate support' or 'very strong support' so that courts and juries can interpret forensic evidence evaluations. This topic covers the major institutional scales, the rationale for verbal translation, and the ongoing debate about which scale, if any, should be adopted universally.

Last updated: 24 Jun 2026

A verbal equivalent of the likelihood ratio (LR) is a standardised phrase that maps a numerical LR value or range to a plain-language descriptor such as 'limited support', 'strong support', or 'very strong support'. Forensic scientists who evaluate evidence under the Bayesian framework calculate an LR expressing how much more probable the observed findings are under the prosecution hypothesis than the defence hypothesis. That number must then be communicated to judges and juries who may have no statistical training. Verbal scales provided by institutions such as the European Network of Forensic Science Institutes (ENFSI), the UK Forensic Science Regulator (FSR), and the Association of Forensic Science Providers (AFSP) in Australia offer a structured way to make that translation. Each scale divides the LR continuum into labelled bands, assigning a phrase to each band.

The case for verbal scales rests on a communication problem: a forensic scientist who tells a court 'the LR is 10,000' without further explanation leaves the factfinder with a number they cannot contextualise. A scientist who says 'the findings provide very strong support for the prosecution hypothesis' is more easily understood, but only if the phrase is grounded in a published scale with known LR boundaries, if the scale has been disclosed, and if the limitations of the underlying data are stated. Verbal scales are tools for communication, not substitutes for scientific rigour.

The debate around verbal scales is active and unresolved. Critics argue that phrases such as 'strong support' are interpreted differently by different readers, that scale boundaries are arbitrary, and that presenting a verbal phrase without the numerical value hides information the court needs. Defenders argue that courts require accessible language, that consistent terminology across practitioners reduces ad hoc variation, and that the alternative of presenting raw LR numbers also causes misinterpretation. Regulators and professional bodies in different countries have landed in different places, producing a field of competing scales that themselves create comparability problems.

By the end of this topic you will be able to:

Explain why verbal probability scales exist and identify the communication problem they are designed to solve.
Compare the ENFSI, UK FSR, and AFSP verbal LR scales by structure, LR boundaries, and wording choices.
Identify the main misinterpretation risks associated with verbal LR phrases and explain how each risk arises.
Describe the main arguments for and against universal adoption of a single verbal LR scale.
Apply a published verbal scale to a given LR value and state the disclosure obligations that accompany a verbal evaluative conclusion.

Key terms

Likelihood ratio (LR): The ratio of the probability of the observed evidence given the prosecution hypothesis to the probability of the same evidence given the defence hypothesis. An LR greater than 1 supports the prosecution hypothesis; an LR less than 1 supports the defence hypothesis; an LR of 1 is neutral.
Verbal scale: A published table that assigns a plain-language phrase to each numerical range of LR values, allowing evaluative conclusions to be communicated in court without requiring the factfinder to interpret a bare number. Also called a verbal probability scale or verbal equivalence scale.
ENFSI Guideline for Evaluative Reporting: A 2015 document from the European Network of Forensic Science Institutes that sets out best practice for Bayesian evaluative reporting, including a recommended seven-category verbal LR scale. It is the most widely cited reference in European forensic science practice.
UK Forensic Science Regulator (FSR): The statutory body in England and Wales responsible for setting and monitoring quality standards in forensic science. The FSR has published codes of practice and guidance on evaluative reporting, including positions on verbal LR scales used by accredited laboratories.
Evaluative conclusion: A forensic opinion that addresses the weight of the evidence, typically expressed as an LR or a verbal equivalent. Distinct from a finding (what was observed) or an identification (a categorical claim). The evaluative conclusion places the observed findings in a probabilistic framework comparing competing hypotheses.
Source-level hypothesis: A proposition about who or what produced a trace, for example 'this glass came from the broken window at the crime scene' versus 'this glass came from some other source'. Verbal LR scales must be read in the context of the hypothesis pair they address; the same verbal phrase may correspond to very different evidential weight at source level versus activity level.
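
The likelihood ratio defined in the key terms above can also be written symbolically. The notation here is an assumption made for illustration: E stands for the observed findings, H_1 for the prosecution hypothesis, and H_2 for the defence hypothesis. The second expression is the log10 form that some scales and reports use.

```latex
\[
  \mathrm{LR} \;=\; \frac{P(E \mid H_1)}{P(E \mid H_2)},
  \qquad
  \log_{10}\mathrm{LR} \;=\; \log_{10} P(E \mid H_1) \;-\; \log_{10} P(E \mid H_2)
\]
```

On this reading, LR > 1 (positive log10 LR) supports H_1, LR < 1 (negative log10 LR) supports H_2, and LR = 1 (log10 LR = 0) is neutral.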

Why verbal scales exist: the communication problem

Forensic scientists operating within the Bayesian evaluative framework produce LR values that can span many orders of magnitude. A DNA profile comparison in a population with a well-studied allele frequency database may yield an LR of 10^12 or higher. A comparison of glass fragments where the analyst has limited reference data may yield an LR of 30. Presenting both of these as bare numbers in a court report creates two distinct problems: the very large number may be misread as a 'probability of guilt' rather than a weight-of-evidence ratio, and the smaller number may be dismissed as unimpressive without context. Neither problem is resolved by simply printing the number.

Verbal scales address the contextualisation problem. By assigning 'very strong support' to LR values above a defined threshold, the scale situates the number in a qualitative framework that courts can act on. The scale communicates: this result falls in a range that the forensic community considers to constitute substantial evidential weight, as distinct from results that are merely marginal. Without such a framework, individual scientists invent their own language, which introduces inconsistency between laboratories, between disciplines, and between cases.

The communication problem is not unique to any jurisdiction. Courts in the United States, the United Kingdom, Australia, Germany, the Netherlands, and elsewhere have all encountered expert testimony based on LR reasoning and have had to decide how much weight to give it. In the Netherlands, evaluative reporting and verbal scales have been used in serious crime cases since at least the 1990s. In Australia, the AFSP scale became a reference point for practitioners across multiple forensic disciplines. In the United States, the 2009 National Academy of Sciences report 'Strengthening Forensic Science in the United States' identified communication of uncertainty as a systemic weakness, which contributed to later interest in verbal scales as a partial remedy.

The major institutional scales

Three scales dominate current professional literature. They differ in the number of categories and in the phrases assigned to each band. The comparison below shows how each LR range is labelled under each scheme; reading across a row shows how the same LR value would be described by each body.

LR range (approx.)	ENFSI (2015)	UK FSR (Coulson et al.)	AFSP Australia
1 to 10	Limited support	Weak support	Limited support
10 to 100	Moderate support	Moderate support	Moderate support
100 to 1,000	Moderately strong support	Moderately strong support	Strong support
1,000 to 10,000	Strong support	Strong support	Strong support
10,000 to 1,000,000	Very strong support	Very strong support	Very strong support
Above 1,000,000	Extremely strong support	Extremely strong support	Extremely strong support

The table is a simplified representation. The ENFSI guideline places its band boundaries at powers of ten (LR = 1, 10, 100, 1,000, 10,000, 1,000,000; equivalently log10 LR = 0, 1, 2, 3, 4, 6) and includes a highest category sometimes rendered as 'conclusive support' or omitted as inappropriate. The UK FSR guidance acknowledges the ENFSI scale but notes that practitioners must be consistent within an accredited laboratory system and must state clearly which scale they are using. The AFSP scale, originally published in the context of Australian forensic biology practice, uses a similar logarithmic structure but collapses two ENFSI mid-range categories into one, reducing the total number of bands.
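
The lookup in the simplified table can be sketched in a few lines of Python. This is only an illustration of the table as printed here, not an official implementation of any scale. The names `BAND_EDGES`, `SCALE_LABELS`, and `verbal_label` are hypothetical, and the treatment of values that fall exactly on a boundary (assigned to the higher band) is an assumption, since the table does not say which band owns an edge value.

```python
from bisect import bisect_right

# Upper edges of the bands in the simplified table (LR values, not log10).
BAND_EDGES = [10, 100, 1_000, 10_000, 1_000_000]

# Labels per scale, lowest band first, copied from the simplified table.
SCALE_LABELS = {
    "ENFSI": ["Limited support", "Moderate support", "Moderately strong support",
              "Strong support", "Very strong support", "Extremely strong support"],
    "UK FSR": ["Weak support", "Moderate support", "Moderately strong support",
               "Strong support", "Very strong support", "Extremely strong support"],
    "AFSP": ["Limited support", "Moderate support", "Strong support",
             "Strong support", "Very strong support", "Extremely strong support"],
}


def verbal_label(lr: float, scale: str) -> str:
    """Return the table's phrase for an LR above 1 under the named scale."""
    if lr <= 1:
        # Direction handling (LR below 1) is deliberately left out of this sketch.
        raise ValueError("this sketch only handles LR values above 1")
    # ASSUMPTION: an LR exactly on an edge is placed in the higher band.
    band = bisect_right(BAND_EDGES, lr)
    return SCALE_LABELS[scale][band]


if __name__ == "__main__":
    for scale in SCALE_LABELS:
        print(f"{scale}: LR 500 -> {verbal_label(500, scale)}")
```

Running the loop for LR 500 shows the mid-range difference in the table: two scales return 'Moderately strong support' while the AFSP column returns 'Strong support'.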

The rationale for verbal translation

Defenders of verbal scales offer several arguments. First, consistency: if all practitioners in a jurisdiction use the same scale, then 'strong support' means the same thing from laboratory to laboratory and from case to case. A defence expert can challenge the LR calculation without also having to fight a battle over what the words mean. Second, accessibility: factfinders who are not scientists can use verbal phrases as anchors for their deliberations in a way that bare numbers do not allow. Third, precedent: courts in several jurisdictions accepted verbal evaluative conclusions in disciplines such as firearms and questioned documents for years before explicit LR frameworks were articulated.

The scale choice also encodes a philosophical position about the purpose of forensic evidence communication. Presenting both the numerical LR and the verbal phrase acknowledges that the number carries information the court needs, while the phrase assists in contextualising it. Presenting only the verbal phrase without the underlying number is a more contested approach: it withholds information and makes the opinion appear more categorical than the underlying probability calculation supports.

Several professional bodies, including the ENFSI working group on evaluative reporting and the Forensic Science International editorial board, have recommended that practitioners present both the LR value (or its log10) and the corresponding verbal label, disclose the scale they are using, describe the data underpinning the LR, and state the hypothesis pair the evaluation addresses. This combination treats the verbal phrase as a communication aid rather than a replacement for transparency.

Risks of misinterpretation

Verbal scales introduce specific misinterpretation risks that practitioners must address in their reports and testimony. The most frequently documented risks are described below.

Source probability conflation. 'Strong support for the prosecution hypothesis' can be read as 'there is a strong probability that the defendant is guilty'. The LR does not state this. It states how much more probable the evidence is under one hypothesis than another. The prior probability of guilt is the court's domain, not the scientist's.
Scale opacity. When only the verbal phrase appears in a report without the underlying number and scale citation, the court cannot assess whether the label is appropriate. An LR of 1,001 and an LR of 9,999 both attract 'strong support' under the ENFSI scale, yet their numerical difference may matter for the specific case.
Phrase strength inflation. Research by Martire and colleagues (2013, Australia) and Koppen and Saks (1996, US) found that lay readers interpret verbal probability phrases at higher numerical values than scientists intend. A phrase the scientist treats as corresponding to LR 100 to 1,000 may be read by a juror as implying near-certainty.
Cross-scale inconsistency. If two experts in the same trial use different scales, their verbal conclusions are not directly comparable. 'Moderately strong support' under ENFSI corresponds to LR 100 to 1,000; the AFSP scale has no equivalent category. Courts have no reliable way to reconcile this without the underlying numbers.
Hypothesis pair ambiguity. An LR and its verbal equivalent mean nothing without knowing what hypotheses were compared. 'Strong support for the prosecution hypothesis' where the prosecution hypothesis is 'this hair came from the defendant' is very different from the same phrase where the prosecution hypothesis is 'the defendant was present at the scene'.
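
The first risk in the list, source probability conflation, can be made concrete with the odds form of Bayes' theorem (posterior odds = LR × prior odds). The sketch below is illustrative only: the function name `posterior_probability` is hypothetical, and the prior values are arbitrary numbers chosen to show the effect, not priors a scientist would assign, since the prior is the court's domain.

```python
def posterior_probability(prior_probability: float, lr: float) -> float:
    """Combine a prior probability for H1 with an LR using the odds form of Bayes' theorem."""
    if not 0 < prior_probability < 1:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    if lr <= 0:
        raise ValueError("LR must be positive")
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    lr = 3_400  # falls in the 'strong support' band of the table above
    # ASSUMPTION: these priors are purely illustrative.
    for prior in (0.0001, 0.01, 0.5):
        post = posterior_probability(prior, lr)
        print(f"prior {prior:<6} -> posterior probability of H1 {post:.3f}")
```

The same LR, and therefore the same verbal phrase, leads to very different posterior probabilities depending on the prior, which is why 'strong support' cannot be read as 'a strong probability that the hypothesis is true'.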

Ongoing debates about standardisation

The existence of multiple competing scales is itself a problem. A practitioner in London using the UK FSR guidance and a practitioner in Amsterdam using the ENFSI scale will attach different verbal labels to the same LR value in some regions of the scale. A practitioner in Sydney using the AFSP scale produces conclusions that are not directly comparable to either. This is precisely the inconsistency that verbal scales are supposed to prevent; it has simply moved to a higher level, between institutions rather than between individual practitioners.

Two broad camps have formed in the literature. The harmonisation camp argues that all forensic science bodies should agree on a single scale, most likely a version of the ENFSI structure, and require accredited laboratories to use it. The transparency camp argues that verbal scales should be deprecated in favour of always presenting the LR numerically, with plain-language explanation of what an LR is, rather than allowing verbal shorthand that can conceal the underlying uncertainty. A third position, sometimes described as a hybrid approach, recommends presenting both the numerical LR and a verbal label from a disclosed scale, treating the verbal phrase as commentary on the number rather than a replacement for it.

Research by Biedermann, Taroni, and colleagues has argued that the choice of scale boundaries is not scientifically derived: there is no principled reason why 'moderate support' should end at LR 100 rather than LR 50 or LR 200. The boundaries reflect convention and convenience, not a calibrated relationship between numerical weight and verbal meaning. This critique does not necessarily invalidate verbal scales as communication tools, but it does challenge the claim that a particular scale is 'correct'. Practitioners should treat scale choice as a disclosure item, not a technical finding.
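
The point about conventional boundaries can be illustrated with a toy calculation. The sketch below is not any published scale: `toy_label` is a hypothetical two-band function whose split point is adjustable, used only to show that the label attached to a fixed LR depends on where the boundary is drawn.

```python
def toy_label(lr: float, moderate_upper: float) -> str:
    """Label an LR between 10 and 1,000 on a two-band toy scale.

    moderate_upper is the adjustable LR at which 'moderate support' ends.
    """
    if not 10 <= lr < 1_000:
        raise ValueError("toy scale only covers 10 <= LR < 1,000")
    if not 10 < moderate_upper < 1_000:
        raise ValueError("boundary must lie strictly between 10 and 1,000")
    if lr < moderate_upper:
        return "moderate support"
    return "moderately strong support"


if __name__ == "__main__":
    lr = 120
    # The three candidate boundaries mentioned in the text: 50, 100, 200.
    for boundary in (50, 100, 200):
        print(f"boundary at LR {boundary}: LR {lr} -> {toy_label(lr, boundary)}")
```

With the boundary at 50 or 100, an LR of 120 is 'moderately strong support'; with the boundary at 200 it is 'moderate support'. The evidence has not changed, only the convention.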

The debate has practical consequences in criminal proceedings. In India, the Bharatiya Sakshya Adhiniyam 2023 (which replaced the Indian Evidence Act 1872) continues to require that expert opinion be clearly explained and that the basis for the opinion be disclosed, creating a framework in which an undisclosed verbal scale would be problematic. In England and Wales, Criminal Procedure Rule 19.4 requires experts to state the basis and extent of their knowledge. In the United States, Daubert v. Merrell Dow (1993), subsequent case law, and Federal Rule of Evidence 702 require that expert testimony rest on sufficient facts or data and be the product of reliable principles and methods. None of these frameworks prohibits verbal scales, but all of them demand that the scale, its basis, and its limitations be put before the court.

Applying a verbal scale in practice

A practitioner applying a verbal scale must complete several steps before the verbal phrase can appear in a report. First, the hypothesis pair must be defined precisely and stated in the report. The verbal conclusion is meaningless without it. Second, the LR must be calculated using data appropriate to the relevant population: for a source-level glass comparison, the reference database must reflect the glass distribution in the population of interest, not a generic international dataset if the case involves a specific regional source. Third, the LR value, or its log10, must be recorded in the report along with the verbal label.

The report must identify the scale by name and, where possible, by citation. 'Using the ENFSI seven-category verbal scale (ENFSI 2015, p. 16), an LR of 8,000 falls in the range corresponding to strong support for the prosecution hypothesis' is a complete verbal conclusion, provided the hypothesis pair is stated alongside it. 'The evidence strongly supports the prosecution case' is not: it states a conclusion without any of the information needed to evaluate it.
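
One way to keep these disclosure items together is to refuse to produce a conclusion sentence unless all of them are present. The sketch below is a hypothetical helper, not a reporting standard: the class name `VerbalConclusion`, its fields, the sentence template, and the example values (LR 2,500, a hair comparison) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbalConclusion:
    lr: float            # numerical LR, always reported alongside the phrase
    label: str           # verbal phrase taken from the disclosed scale
    scale_citation: str  # name and citation of the scale used
    h1: str              # prosecution proposition
    h2: str              # defence proposition

    def sentence(self) -> str:
        """Build a report sentence, failing if any disclosure item is blank."""
        for name in ("label", "scale_citation", "h1", "h2"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing disclosure item: {name}")
        if self.lr <= 0:
            raise ValueError("LR must be positive")
        return (
            f"Using {self.scale_citation}, an LR of approximately {self.lr:,.0f} "
            f"corresponds to {self.label} for the proposition that {self.h1} "
            f"rather than the proposition that {self.h2}."
        )


if __name__ == "__main__":
    conclusion = VerbalConclusion(
        lr=2_500,
        label="strong support",
        scale_citation="the ENFSI verbal scale (ENFSI 2015)",
        h1="the hair came from the defendant",
        h2="the hair came from some other person",
    )
    print(conclusion.sentence())
```

The design choice is simply that the phrase, the number, the scale, and the hypothesis pair travel together, so the bare statement 'the evidence strongly supports the prosecution case' cannot be generated.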

Testimony in court adds further obligations. The practitioner should be prepared to explain what an LR is, what the verbal scale is, why those boundaries were chosen, what data the LR is based on, what assumptions were made, and what the limits of the conclusion are. Cross-examination on verbal scales often focuses on the arbitrariness of the category boundaries and on whether the underlying data are sufficient. A practitioner who has only learned the verbal label and not the reasoning behind it will not withstand that examination.

Accreditation standards in many jurisdictions now require laboratories to document their reporting conventions, including the verbal scale in use, as part of their quality management system. The UK Forensic Science Regulator's Codes of Practice and Conduct, the ILAC G19 guidelines for forensic science laboratories, and ISO/IEC 17025:2017 (which covers competence of testing and calibration laboratories) all create frameworks in which ad hoc verbal language is less defensible than a disclosed, documented scale.

Worked example

Applying the ENFSI scale to a glass comparison LR

A forensic scientist has evaluated glass fragments recovered from a suspect's clothing against glass from a broken window at a burglary scene. Follow the steps from LR calculation to verbal report conclusion.

The scientist has calculated an LR of 3,400 using refractive index measurements and a relevant reference database. The hypothesis pair is: H1 (prosecution) = 'the glass on the suspect's clothing came from the broken window'; H2 (defence) = 'the glass on the suspect's clothing came from some other source'. The scientist now must translate this LR into a verbal conclusion using the ENFSI scale.

Locate the LR on the scale. LR = 3,400 is between 1,000 and 10,000 (log10 LR = 3.53, between 3 and 4 on the log scale). Under the ENFSI 2015 seven-category scale, this range is labelled 'strong support'.
Draft the verbal conclusion. The scientist writes: 'The findings provide strong support for the proposition that the glass on the suspect's clothing came from the broken window rather than from some other source (LR approximately 3,400; ENFSI verbal scale category: strong support, corresponding to LR values between 1,000 and 10,000).'
State the basis. The report discloses the reference database used (e.g. the GRIM database or an in-house population study), the number of reference samples, the statistical method (e.g. kernel density estimation), and the assumed population from which the control glass came.
State the limitations. The report notes that the LR does not account for the possibility of transfer from a secondary source, that the reference database has a finite size that introduces uncertainty in the tails of the distribution, and that the evaluation addresses source only, not activity.
Verify the hypothesis direction. Before finalising, the scientist confirms the verbal phrase is applied in the correct direction: LR above 1 supports H1 (prosecution hypothesis), so 'strong support' applies to H1. If the LR had been 1/3,400, the phrase would be 'strong support for the defence hypothesis'.
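
The location and direction steps of the worked example can be checked with a short sketch. It uses the ENFSI column of the simplified table in this topic; the names `ENFSI_BANDS` and `evaluate`, the rule that a boundary value belongs to the higher band, and the wording returned for an LR of exactly 1 are assumptions made for illustration.

```python
import math

# (lower LR bound, label) pairs from the ENFSI column of the simplified table,
# highest band first. ASSUMPTION: a value on a boundary joins the higher band.
ENFSI_BANDS = [
    (1_000_000, "extremely strong support"),
    (10_000, "very strong support"),
    (1_000, "strong support"),
    (100, "moderately strong support"),
    (10, "moderate support"),
    (1, "limited support"),
]


def evaluate(lr: float) -> tuple[str, str]:
    """Return (verbal label, supported hypothesis) for a positive LR."""
    if lr <= 0:
        raise ValueError("LR must be positive")
    if lr == 1:
        return ("neutral", "neither")
    # An LR below 1 supports H2 with strength 1 / LR.
    supported = "H1" if lr > 1 else "H2"
    magnitude = lr if lr > 1 else 1 / lr
    for lower, label in ENFSI_BANDS:
        if magnitude >= lower:
            return (label, supported)
    raise AssertionError("unreachable: magnitude is always above 1 here")


if __name__ == "__main__":
    lr = 3_400
    print(f"log10 LR = {math.log10(lr):.2f}")  # 3.53, between 3 and 4
    print(evaluate(lr))                        # ('strong support', 'H1')
    print(evaluate(1 / lr))                    # ('strong support', 'H2')
```

The reciprocal case mirrors the final step of the example: an LR of 1/3,400 attracts the same phrase, applied to the defence hypothesis.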

Check your understanding


A forensic scientist states in a report: 'The DNA evidence very strongly supports the prosecution case.' What critical information is missing from this conclusion?

Key Takeaways

Verbal LR scales translate numerical likelihood ratio ranges into standardised phrases to help courts interpret evaluative forensic conclusions without requiring statistical literacy, but the phrases must always be accompanied by the underlying LR value, the hypothesis pair, and the scale citation.
The three most widely cited scales are the ENFSI (2015) seven-category scale, the UK FSR guidance, and the AFSP Australia scale; all use logarithmic LR boundaries but differ in the number of categories and some boundary values, making cross-scale comparison unreliable without the underlying numbers.
Key misinterpretation risks include source probability conflation (reading 'strong support for the prosecution hypothesis' as 'high probability of guilt'), scale opacity (omitting the LR value and scale from the report), phrase strength inflation (lay readers interpreting verbal phrases at higher certainty than intended), and hypothesis pair ambiguity.
Scale boundaries are convention-based, not scientifically derived: there is no principled reason why one category ends at LR 1,000 rather than LR 500, and practitioners should treat scale choice as a disclosure obligation rather than a technical determination.
Accreditation standards including ISO/IEC 17025:2017, the UK FSR Codes of Practice, and ILAC G19 require laboratories to document their reporting conventions, including verbal scale choice, making undisclosed or ad hoc verbal language increasingly difficult to defend in court.

What is a verbal equivalent of the likelihood ratio?

A verbal equivalent is a standardised phrase, such as 'moderate support' or 'strong support', assigned to a numerical range of likelihood ratio (LR) values. Scales published by bodies such as ENFSI and the UK Forensic Science Regulator allow forensic scientists to communicate the weight of evidence in plain language, helping courts interpret evaluative conclusions without requiring numerical literacy.

Why do different institutions use different verbal LR scales?

No single scale has been adopted as a universal standard. ENFSI, the UK Forensic Science Regulator, AFSP Australia, and other bodies have each developed scales based on their own operational experience and research. The scales differ in the number of categories, the LR thresholds that divide them, and the wording used, which is itself a source of ongoing debate about consistency and potential misinterpretation.

What are the main risks of using verbal LR scales?

Key risks include: the verbal phrase may convey more or less certainty than the underlying number implies; jurors may interpret phrases such as 'strong support' as claiming certainty rather than a probability ratio; different practitioners using different scales produce non-comparable conclusions; and scale labels may be misread as expressing the probability of a proposition rather than the weight of evidence for it.

What does the ENFSI verbal LR scale look like?

The ENFSI Guideline for Evaluative Reporting (2015) recommends a seven-category scale. LR values below 1 support the defence hypothesis, and values above 1 support the prosecution hypothesis, with categories running from 'limited support' (LR 1 to 10) through 'moderate', 'moderately strong', 'strong', 'very strong', and 'extremely strong' support, to a highest category sometimes rendered as 'conclusive support' or omitted as inappropriate. The exact boundaries vary between disciplines.

Can a court reject a forensic opinion expressed using a verbal LR scale?

Courts in multiple jurisdictions have scrutinised evaluative reports that use verbal LR scales, sometimes finding them unclear or insufficiently explained. In England and Wales, the Criminal Procedure Rules require expert evidence to state the range of opinion and the basis for it. A scientist who presents a verbal LR phrase without explaining the underlying scale, the data it rests on, and its limitations may face challenge under those rules or their equivalents elsewhere.
