Conclusion Scales: SWGDOC, ENFSI and Courtroom Language

The conclusion-scale debate that defines how every examiner's opinion lands in court: the SWGDOC nine-point scale (identification, strong probability, probability, indications, no conclusion, indications did not, probably did not, strong probability did not, elimination), the ENFSI 'Standard for Formulation of Evaluative Forensic Science Expert Opinions' and the likelihood-ratio framework gaining ground in Europe, the courtroom-language translation problem (how a 'strong probability' opinion is heard by a jury), and the PCAST 2016 call for population-frequency anchoring that the field is still working through.

Last updated: 19 Jun 2026

Forensic document examiners use structured conclusion scales to express opinions about authorship in a form that courts can evaluate. The two dominant frameworks are the SWGDOC nine-point verbal scale, which runs from "identification" through "no conclusion" to "elimination" without numerical anchors, and the ENFSI evaluative standard, which requires examiners to reason under two explicit competing hypotheses and express the result on a verbal likelihood-ratio scale anchored to stated numerical intervals. A persistent communication problem sits alongside both: empirical research consistently shows that jurors interpret verbal probability phrases such as "strong probability" as expressing far higher certainty than the scales intend, a gap no conclusion framework has fully resolved. The PCAST 2016 report identified the absence of validated population-frequency data as the specific limitation preventing the "identification" claim from being numerically defended.

How a forensic examiner expresses their opinion in court is as consequential as the examination itself. The conclusion scale chosen carries epistemological commitments about what forensic evidence can establish, and it creates communication risks that courtrooms do not consistently manage. This gap between scientific intent and jury perception has generated sustained controversy across questioned document examination over the past two decades.

Key takeaways

The SWGDOC nine-point scale runs from Identification (position 1) through No Conclusion (position 5, a legitimate finding) to Elimination (position 9); the six graded probability positions between them (2-4 and 6-8) carry no numerical anchors.
The ENFSI "Standard for Formulation of Evaluative Forensic Science Expert Opinions" (2015, revised 2016) uses a likelihood-ratio verbal scale anchored to stated numerical intervals (e.g., "very strong support" corresponds to LR 1,000-10,000).
PCAST 2016 found handwriting examination had foundational validity but that "validity as applied" was not established due to insufficient large-scale black-box examiner studies measuring error rates under realistic conditions.
Empirical studies consistently show jurors interpret "strong probability" as near-certainty (85-99%), well above the sub-identification confidence the SWGDOC scale intends the phrase to convey.
Indian CFSL laboratories broadly follow a SWGDOC-style graduated verbal scale; the Bharatiya Sakshya Adhiniyam 2023 imposes no specific conclusion format, leaving the choice to institutional SOP.

Three frameworks are in active use internationally. In North America, the nine-point SWGDOC (Scientific Working Group for Forensic Document Examination) scale, inherited from the traditions of the American Board of Forensic Document Examiners (ABFDE) and the American Society of Questioned Document Examiners (ASQDE), organises opinions from "identification" at one pole to "elimination" at the other, with seven intermediate positions that describe degrees of probability without quantifying them. In Europe, the ENFSI (European Network of Forensic Science Institutes) "Standard for Formulation of Evaluative Forensic Science Expert Opinions" (2015, revised 2016) defines a different framework built on the likelihood-ratio approach, in which the examiner asks how much more likely the observations are under one hypothesis than another and expresses the result on a verbal equivalence scale anchored to stated numerical intervals. In India, the Forensic Science Laboratories affiliated with CFSL New Delhi and CFSL Hyderabad largely follow the SWGDOC-style graduated verbal scale, though the Bharatiya Sakshya Adhiniyam 2023 expert-witness provisions impose no specific conclusion scale format, leaving the choice to institutional SOP.

The conclusion scale debate is inseparable from the admissibility and testimony framework that courts impose: the topic on expert-witness testimony and cognitive bias mitigation covers how the scales are presented and challenged under Daubert, CrimPR Part 19, and the Bharatiya Sakshya Adhiniyam (BSA) 2023. The same likelihood-ratio debate runs through the topic on fingerprint individualization statistics, where the post-NAS debate over population-frequency anchoring mirrors the PCAST 2016 challenge to handwriting identification claims. Laboratory accreditation requirements that govern how conclusions are reported are detailed in the topic on ISO 17025, NABL, ASCLD-LAB and proficiency testing.

Underneath the framework debate lies a communication problem that no conclusion scale has fully solved: the words examiners choose are heard differently by juries than they were intended by the authors of the scales.

By the end of this topic you will be able to:

Distinguish between the nine positions of the SWGDOC conclusion scale, state what each position claims epistemologically, and identify the conditions under which each is appropriate.
Explain the ENFSI likelihood-ratio evaluative framework, including how competing hypotheses are framed and how the verbal LR scale maps to numerical intervals.
Describe the courtroom-language translation problem, citing the empirical finding on juror interpretation of 'strong probability', and outline the responses proposed by laboratories and regulators.
State the specific findings of the PCAST 2016 report on handwriting examination and explain what 'foundational validity' versus 'validity as applied' means for conclusion-scale claims.
Construct a compliant forensic document examination report structure across SWGDOC, ENFSI, and CFSL frameworks, identifying the required sections and how each conclusion scale changes the evaluation section.

The SWGDOC Nine-Point Scale

The Scientific Working Group for Forensic Document Examination produced its nine-point scale as a consensus standard to harmonise the conclusion language that US and Canadian document examiners had used inconsistently for decades. The scale runs from positive identification to definitive elimination, with seven intermediate positions in between. Its nine positions, stated in the order they appear on the positive side through to the negative side, are: identification; strong probability; probability; indications; no conclusion; indications did not; probably did not; strong probability did not; and elimination.

"Identification" means the examiner has found individualising agreement in all significant habitual writing characteristics between the questioned writing and the known standard, with no significant unexplained differences, to a degree that supports the conclusion that they share a common source. The examiner is stating that, to the degree supportable by the methodology and the quality of the material examined, the same writer produced both. Critically, "identification" does not mean mathematical certainty: no forensic handwriting identification system has published population-frequency data sufficient to assign a numerical error rate to this opinion. The PCAST 2016 report identified this gap explicitly and recommended that large-scale black-box proficiency studies be conducted before the identification claim is made at the most certain end of any scale.

The intermediate positions require careful definition. "Probability" (position 3) and "strong probability" (position 2) both indicate that significant agreement supports a common source, but not to the degree that would support a definitive identification. The examiner typically reaches these positions when the known standards are limited in quantity, the questioned writing is brief, or natural variation in the writer's hand creates ambiguity. "Indications" (position 4) signals that some agreement is present but that the evidence is insufficient to reach even a probable conclusion.

"No conclusion" (position 5, the scale's midpoint) is a finding, not a failure. It means that the evidence before the examiner is equally consistent with either a common source or different sources. It may reflect degraded writing quality, insufficient exemplars, or a combination of similarities and differences that genuinely balance.

The negative positions mirror the positive side. "Elimination" (position 9) is the most definitive negative finding and carries the same evidentiary weight as identification. An examiner should only reach elimination when they have found fundamental differences in writing habits that cannot be attributed to disguise, natural variation, or changed writing conditions.
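The ordering and symmetry of the nine positions can be written down as a small data structure. The following is a minimal sketch, not part of any SWGDOC standard: the class name `SwgdocConclusion` and the helper functions `mirror` and `direction` are hypothetical names chosen for illustration, and the position numbers follow the order in which the scale is listed above.

```python
from enum import IntEnum


class SwgdocConclusion(IntEnum):
    """The nine positions, numbered from the positive pole to the negative pole."""
    IDENTIFICATION = 1
    STRONG_PROBABILITY = 2
    PROBABILITY = 3
    INDICATIONS = 4
    NO_CONCLUSION = 5
    INDICATIONS_DID_NOT = 6
    PROBABLY_DID_NOT = 7
    STRONG_PROBABILITY_DID_NOT = 8
    ELIMINATION = 9


def mirror(conclusion: SwgdocConclusion) -> SwgdocConclusion:
    """Return the position of equal strength on the opposite side of the midpoint."""
    return SwgdocConclusion(10 - conclusion)


def direction(conclusion: SwgdocConclusion) -> str:
    """Say which proposition the position leans toward (illustrative wording)."""
    if conclusion < SwgdocConclusion.NO_CONCLUSION:
        return "common source"
    if conclusion > SwgdocConclusion.NO_CONCLUSION:
        return "different sources"
    return "neither"


print(mirror(SwgdocConclusion.STRONG_PROBABILITY).name)
print(direction(SwgdocConclusion.INDICATIONS))
```

Treating the scale as an ordered enumeration makes the mirror relationship explicit, but it does not add numerical meaning to the positions: the integers are ranks, not probabilities.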

The SWGDOC nine-point conclusion scale from identification (positive pole) through no conclusion (midpoint) to elimination (negative pole). Positions 2-4 and 6-8 are intermediate probability statements that require explicit qualification in a report.

The ENFSI Likelihood-Ratio Framework

The European Network of Forensic Science Institutes published its "Standard for Formulation of Evaluative Forensic Science Expert Opinions" in 2015, with a revised edition in 2016, as a framework intended to replace verbal probability scales with a more formally structured approach. The ENFSI standard does not require examiners to compute a numerical LR in all cases: it requires them to reason explicitly about two competing hypotheses and to express the direction and approximate magnitude of the support their findings provide for one hypothesis versus the other.

The likelihood ratio (LR) is defined as the probability of the observed evidence given the prosecution hypothesis (Hp) divided by the probability of the same evidence given the defence hypothesis (Hd). In a handwriting context, Hp might be "the questioned letter was written by the defendant" and Hd might be "the questioned letter was written by some other person drawn from a relevant population." An LR greater than one supports Hp; an LR less than one supports Hd; an LR equal to one provides no support for either hypothesis.
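Written as a formula, with E standing for the examiner's observations, the definition above reads as follows. The numbers in the second line are purely illustrative assumptions, not values drawn from any handwriting study.

```latex
\mathrm{LR} = \frac{P(E \mid H_p)}{P(E \mid H_d)}

% Illustrative values only: suppose the observations would be expected
% with probability 0.8 if the defendant wrote the letter, and with
% probability 0.004 if some other writer from the relevant population did.
\mathrm{LR} = \frac{0.8}{0.004} = 200
```

An LR of 200 is greater than one, so in this hypothetical case the observations would support Hp over Hd.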

Because numerical LR values are rarely computed from validated population databases in the handwriting domain (this is precisely what PCAST 2016 noted as missing), the ENFSI framework uses a verbal equivalence scale that maps approximate LR ranges to standard phrases. The ENFSI verbal scale, from strongest support for Hp downward, runs: "extremely strong support" (LR greater than 10,000); "very strong support" (1,000 to 10,000); "strong support" (100 to 1,000); "moderate support" (10 to 100); "weak support" (1 to 10); and the mirror image for Hd support. The scale is anchored to numerical intervals that are stated explicitly in the report, so the reader understands the examiner's approximate numerical intent even when the number itself is not computed.
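The mapping from an approximate LR to a verbal phrase can be sketched as a simple lookup. This is a hedged illustration rather than an official ENFSI implementation: the function name `verbal_equivalent` and the table `BANDS` are hypothetical, the band edges at 100 and above follow the intervals quoted above, and the way the range between 1 and 100 is divided is treated here as an assumption. Values below one are handled by inverting the LR, reflecting the mirror-image treatment of Hd support.

```python
import math

# (lower bound, phrase), checked from the strongest band downward.
BANDS = [
    (10_000, "extremely strong support"),
    (1_000, "very strong support"),
    (100, "strong support"),
    (10, "moderate support"),  # assumption: lower bands are illustrative
    (1, "weak support"),       # assumption: lower bands are illustrative
]


def verbal_equivalent(lr: float) -> str:
    """Translate an approximate likelihood ratio into a verbal phrase."""
    if math.isnan(lr) or lr <= 0:
        raise ValueError("a likelihood ratio must be a positive number")
    if lr == 1:
        return "no support for either hypothesis"
    favoured = "Hp" if lr > 1 else "Hd"
    # Mirror image: an LR of 1/200 supports Hd as strongly as 200 supports Hp.
    magnitude = lr if lr > 1 else 1 / lr
    for lower_bound, phrase in BANDS:
        if magnitude > lower_bound:
            return f"{phrase} for {favoured}"
    raise AssertionError("unreachable: magnitude is always greater than 1 here")


print(verbal_equivalent(200))
print(verbal_equivalent(1 / 200))
```

In handwriting casework the input number would itself be an examiner's estimate rather than a computed value, so a lookup like this only standardises the wording; it does not make the underlying judgment more precise.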

The LR approach has been more fully operationalised in forensic voice comparison (e.g. at Netherlands Forensic Institute and University College London's Phonetics Research Laboratory) and in DNA mixture interpretation than in handwriting, where the absence of validated population-frequency databases for writing habits makes quantitative LR computation difficult. The ENFSI Forensic Document Examiners Working Group has published best practice manuals and is actively working toward the validation studies that would support more formal LR use in the discipline.

ENFSI verbal LR scale: positions run from 'extremely strong support for Hp' (LR above 10,000) through 'neutral' (LR = 1) to the mirror-image categories supporting Hd (LR below 1), each anchored to a stated numerical interval that must appear in the report.

The Courtroom-Language Translation Problem

The courtroom-language translation problem has been studied empirically in several contexts. Work by Friedman and colleagues (2001), by Nordby (2002), and more recently research commissioned by the Law Commission of England and Wales (2011 expert evidence review) has consistently found that jurors and even legal professionals assign probability values to forensic verbal expressions that differ substantially from the values the authors of those scales intended.

In a representative experimental design, mock jurors are presented with verbal conclusion statements drawn from standard forensic-science scales and asked to assign a numerical probability (0 to 100 per cent) to the underlying claim. The phrase "strong probability" consistently receives estimates in the 85 to 99 per cent range from juror cohorts, a range that overlaps substantially with the probability range most people associate with near-certainty. The SWGDOC scale does not define "strong probability" numerically; the phrase is intended to convey something well below certainty. The gap between intent and reception is the translation problem.

Several responses to this problem have been proposed. Some laboratories have moved to language that explicitly anchors the opinion: "The observations are approximately X times more likely if the questioned and known writings share a common author than if they do not" (the ENFSI-style formulation). Others have added numerical qualifier paragraphs to reports, explaining that "strong probability" corresponds to the examiner's estimate that, based on the strength of the agreement, the probability of a common source is substantially higher than chance but does not reach the identification threshold. The UK Forensic Science Regulator's guidance on evaluative reporting (2020) recommends the LR-style approach for new methods, while acknowledging that the handwriting database required to fully implement it does not yet exist.

In US federal courts, the Daubert analysis applied to document-examination testimony after 2000 has forced several examiners to defend the empirical basis for their conclusion language under cross-examination. The most sustained attack came in a series of cases following the 1999 Kumho Tire decision, in which document examiners were asked to identify the error rate for their opinions. Because the SWGDOC scale carries no numerical error-rate anchors, some courts admitted opinions only at the "probability" or "strong probability" level rather than "identification," on the grounds that the identification claim exceeded what the validated literature would support.

In India, the relevant cases are less documented in the published scientific literature, but the Allahabad High Court and the Delhi High Court have admitted CFSL document examination reports using the graduated verbal scale without, to date, subjecting the numerical underpinning to a Daubert-equivalent challenge. The Bharatiya Sakshya Adhiniyam 2023 does not specify a conclusion scale format, leaving the matter to CFSL and state FSL SOPs.

PCAST 2016 and the Population-Frequency Gap

The President's Council of Advisors on Science and Technology published "Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods" in September 2016. The report evaluated feature-comparison forensic disciplines against a two-part standard: "foundational validity," meaning the method itself has been shown by empirical study to be repeatable, reproducible and accurate, and "validity as applied," meaning practitioners demonstrably achieve that performance under realistic casework conditions.

PCAST found that handwriting examination had "foundational validity established" based on the Srihari 2002 CEDAR study and related work, but that "validity as applied" was not established because there were insufficient well-designed studies measuring the error rates of practicing examiners under realistic casework conditions. The report called specifically for large-scale black-box studies using realistic materials and examiner populations.

For conclusion scales, the PCAST critique has a specific implication. The identification claim, sitting at the positive pole of the SWGDOC scale, implicitly asserts that the observations are so characteristic of the writer that no other writer could have produced them. This is a population-frequency claim: it requires knowledge of how often these specific feature combinations appear in the relevant writing population. Without population-frequency databases comparable to those developed for DNA (CODIS database allele frequencies) or latent fingerprints (NIST ELFT accuracy studies), the identification claim rests on the examiner's subjective sense of feature rarity rather than on measured frequency.

The OSAC (Organization of Scientific Area Committees for Forensic Science) Documents Subcommittee has responded to the PCAST critique by prioritising the development of validation study protocols and by revising practice standards to require more explicit documentation of the basis for conclusions. The NIST-sponsored 2020 report "Forensic Handwriting Examination and Human Factors" surveyed the existing literature and identified specific gaps still to be addressed, including the need for studies with larger examiner samples and more diverse questioned-document materials.

Applying Conclusion Scales: Three Jurisdiction Comparisons

In the United States, the American Board of Forensic Document Examiners (ABFDE) and the ASQDE have both endorsed the nine-point SWGDOC scale as the standard for members. ABFDE certification requires examiners to demonstrate familiarity with the scale and with the conditions under which each conclusion is appropriate. The FBI Questioned Documents Unit and the US Secret Service Forensic Services Division use SWGDOC-based conclusion language in their reports. Federal court reports typically include a conclusion section that states the examiner's finding in SWGDOC terms, followed by an explanatory paragraph that describes the features that support the conclusion. Under Daubert challenges, examiners have increasingly added language that acknowledges the absence of numerical error-rate data.

In the United Kingdom, the Forensic Science Regulator's Codes of Practice and Conduct (updated through 2021 and 2023) require that forensic reports include a "statement of findings" and an "evaluative opinion" where the two are distinguishable. The evaluative opinion should, where possible, be expressed in the LR framework or with an explicit statement of the hypotheses under evaluation. UK handwriting experts accredited through the Forensic Science Regulator's framework are increasingly expected to frame conclusions in ENFSI-compatible language, particularly since UKAS (United Kingdom Accreditation Service) has aligned its accreditation criteria with ISO 17025, which requires documented uncertainty estimation for quantitative methods. Several UK document examiners now append an explicit "verbal scale equivalence" statement to LR-framed conclusions, helping non-specialist readers translate the opinion.

In India, CFSL New Delhi's document examination division uses a graduated verbal scale that maps broadly onto the SWGDOC structure, with typical conclusion categories including "the questioned writing is that of X" (equivalent to identification), "the questioned writing was in all probability written by X" (approximately SWGDOC strong probability), and "no definite opinion can be expressed" (equivalent to SWGDOC no conclusion). State FSL reports follow similar conventions, though without uniform national standardisation. The NABL T-126 accreditation criteria for forensic science laboratories do not mandate a specific conclusion scale; they require that reports be technically accurate and that uncertainty be addressed where relevant. The Bharatiya Sakshya Adhiniyam 2023 carries forward the Indian Evidence Act's expert-witness provisions, which require the expert to state the basis for their opinion but do not mandate a particular conclusion format.

Writing the Report: Practical Guidance Across Scales

A forensic document examination report must do several things simultaneously. It must identify the materials examined. It must describe the examination process, including the instruments used, the illumination conditions, the exemplars compared, and the methodology applied. It must state the findings, meaning what the examiner observed in both the questioned and the known writings, including similarities and differences. It must state the conclusion, in a form compatible with the conclusion scale the examiner is using. And it must qualify the conclusion, explaining any limitations imposed by the quality of the material, the amount of comparable text, or the availability of exemplars.

The failure mode in underqualified reports is common and consequential. An examiner who states "I have compared the questioned signature against twenty-three known signatures from the defendant and conclude it is the defendant's signature" without addressing the quality of the comparison material, the presence of natural variation in the defendant's signing habit, or the degree of difference detected alongside the agreement, has produced a conclusion that opposing counsel can challenge on any of those unaddressed grounds. The resulting cross-examination often damages not just the conclusion but the examiner's credibility.

Good report structure follows the examination sequence and makes each step transparent. The SWGDOC practice standards and the ENFSI Best Practice Manual for Handwriting Examination both include model report structures. Both recommend that the examiner's notes (from which the report is derived) be retained and available for disclosure, and both treat the conclusion as a derivative of the documented examination, not a standalone judgment.

Material identification
List every document examined: reference number, number of pages, condition, date received, and chain of custody. Identify which items are the questioned documents and which are the known standards.
Examination description
Describe the instruments used (VSC model, magnification, illumination modes), the examination sequence (analysis of the questioned writing before comparison with the known), and any limitations encountered (poor photocopy quality, insufficient comparable text, altered writing conditions).
Findings
State what was observed: the specific writing features examined, the agreements noted, the differences noted, and whether any differences require explanation. Both agreements and differences must be documented.
Evaluation
Explain how the balance of agreements and differences supports the conclusion reached, addressing the alternatives. For LR-framed reports, state the hypotheses evaluated and the direction of support.
Conclusion
State the conclusion in the appropriate scale language. For SWGDOC, name the position and its plain-English meaning. For ENFSI/LR, state the verbal equivalence and approximate LR interval. Qualify the conclusion explicitly if limitations reduced the strength of the opinion.
Qualifications and limitations
Identify any factor that limited the examination or reduced the examiner's confidence: degraded original, photocopy-only comparison, insufficient exemplar quantity, dissimilar writing conditions between questioned and known.
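The six sections above can be treated as a completeness checklist before a report is signed. The following is a minimal sketch under stated assumptions: the class `ExaminationReport` and the function `missing_sections` are hypothetical names, the field names simply restate the section headings listed above, and a real laboratory template would carry far more structure than free-text fields.

```python
from dataclasses import dataclass, fields


@dataclass
class ExaminationReport:
    """One free-text field per report section, in examination order."""
    material_identification: str = ""
    examination_description: str = ""
    findings: str = ""
    evaluation: str = ""
    conclusion: str = ""
    qualifications_and_limitations: str = ""


def missing_sections(report: ExaminationReport) -> list[str]:
    """List the sections that are still empty or contain only whitespace."""
    return [f.name for f in fields(report) if not getattr(report, f.name).strip()]


# A draft that states a conclusion but documents nothing else.
draft = ExaminationReport(conclusion="Strong probability (SWGDOC position 2)")
print(missing_sections(draft))
```

A check like this catches only omitted sections. It cannot tell whether the findings actually support the conclusion, which remains a matter for technical review.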

Framework	Anchor concept	Number of positions	Numerical grounding	Primary jurisdictions
SWGDOC nine-point	Verbal probability relative to identification / elimination	9	None specified; PCAST identifies this as a gap	US, Canada, India (broadly)
ENFSI verbal LR scale	Likelihood ratio under two stated hypotheses	7 (with numerical LR intervals)	LR intervals stated (often estimated, not computed)	UK, Netherlands, Germany, EU labs
ABFDE / ASQDE scale	Effectively SWGDOC; endorsed by professional bodies	9	None; same PCAST critique applies	US (ABFDE and ASQDE members)
CFSL graduated verbal	Qualitative match / probable match / no conclusion	Typically 4-5	None	India (CFSL New Delhi, CFSL Hyderabad, state FSLs)

Key terms

SWGDOC nine-point scale: The conclusion scale developed by the Scientific Working Group for Forensic Document Examination, running from 'identification' (most positive) through 'no conclusion' (midpoint) to 'elimination' (most negative), with seven intermediate positions between the two poles.
Identification (SWGDOC): The most positive conclusion on the SWGDOC scale, meaning the examiner has found individualising agreement in all significant habitual writing characteristics, with no significant unexplained differences, sufficient to conclude common authorship.
Elimination (SWGDOC): The most definitive negative conclusion on the SWGDOC scale, meaning fundamental differences in writing habits that cannot be attributed to disguise, natural variation, or changed conditions conclusively exclude common authorship.
Likelihood ratio (LR): The ratio of the probability of the observed evidence given one hypothesis (typically prosecution hypothesis) to the probability of the same evidence given the alternative hypothesis (typically defence). LR greater than 1 supports the prosecution hypothesis.
ENFSI evaluative standard: The European Network of Forensic Science Institutes 'Standard for Formulation of Evaluative Forensic Science Expert Opinions' (2015/2016), which requires examiners to reason explicitly about two competing hypotheses and to express their opinion as support for one hypothesis relative to the other, using a verbal LR scale anchored to numerical intervals.
Courtroom-language translation problem: The empirically documented gap between what forensic verbal conclusion terms (such as 'strong probability') are intended to convey by examiners and what those same phrases are understood to mean by juries and legal professionals.
PCAST 2016 report: The President's Council of Advisors on Science and Technology 2016 report 'Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods,' which found that handwriting examination had foundational validity but that 'validity as applied' was not established due to insufficient black-box examiner studies.
Foundational validity and validity as applied: PCAST's two-part standard. Foundational validity requires that the method be shown by empirical study to be repeatable, reproducible and accurate; validity as applied requires that examiners demonstrably achieve that performance under realistic conditions, not just laboratory demonstration of the underlying science.
Population-frequency anchoring: The practice of grounding a conclusion about feature rarity in empirically measured frequency data for those features in the relevant writing population, analogous to allele frequency tables in DNA analysis. Currently absent for most handwriting features.
No conclusion (SWGDOC position 5): The midpoint of the SWGDOC scale, indicating that the evidence is equally consistent with a common source or different sources. It is a considered finding, not a failure to examine, and arises legitimately from limited materials or balanced evidence.

Practice

An examiner compares a questioned ransom note against court-ordered request writings from a suspect. The note shows consistent slant and similar letter proportions, but the exemplar quantity is limited to one page and several features show unexplained differences. Which SWGDOC conclusion position is most appropriate?

Worked example

Will dispute, translating a CFSL 'in all probability' opinion into court-usable evidence

The CFSL report says 'in all probability.' The defence asks what that actually means numerically. The examiner has to answer.

Scene: A Delhi High Court probate dispute. The CFSL New Delhi report states: "the disputed signature on the 2019 will was in all probability written by the testator." Defence counsel applies to exclude or limit the opinion, arguing the phrase is scientifically unanchored.

Step 1 (Mapping the phrase): The examiner gives evidence that "in all probability" corresponds to SWGDOC position 2 (Strong Probability), meaning significant agreement in individualising features with no unexplained differences, but not reaching the identification threshold because the exemplars are limited (only six course-of-business signatures available, all from 2015-2016, three to four years before the will).

Step 2 (ENFSI equivalence): The examiner explains that under the ENFSI verbal LR scale, the same findings would be expressed as "strong support" for the hypothesis that the testator signed, corresponding approximately to LR 100-1,000. The examiner acknowledges this is a professional estimate, not a computed ratio from population data, because validated Indian signature frequency databases do not exist for this specific combination of features.

Step 3 (Translation risk): The examiner is asked whether the court might understand "in all probability" as meaning 95 per cent certainty. The examiner addresses this directly, citing the courtroom-language translation literature: the phrase is intended to convey something below identification, not near-certainty. The limiting factor, an exemplar gap of three to four years and a small exemplar set, is the primary reason the conclusion does not reach identification.

Conclusion: The court admits the opinion but directs that the examiner's limiting paragraph be read together with the conclusion. The defence expert's counter-opinion uses ENFSI language but reaches the same approximate LR range. Both experts converge on "strong support" or its SWGDOC equivalent. The court finds the opinion useful and properly qualified.

Can a forensic document examiner decline to give a conclusion and report only observations?

Yes, and in some cases this is the professionally correct course. If material quality is insufficient, the exemplar base is inadequate, or the balance of agreements and differences genuinely does not support any scale position, the examiner states SWGDOC No Conclusion (position 5) and explains why. Examiners are not obligated to force a positive or negative conclusion because the instructing party wants one. No Conclusion with a full explanation is a legitimate and important finding, not a failure.

Does the ENFSI likelihood-ratio framework require the examiner to compute an actual number?

Not always. The ENFSI standard requires reasoning in terms of two competing hypotheses and expressing the direction and approximate magnitude of support. Where validated population databases exist (DNA mixture interpretation, forensic voice comparison), a numerical LR can be computed. In handwriting examination, where those databases are not yet available, the examiner estimates a verbal LR category based on assessed feature rarity, with explicit acknowledgment that the LR is an estimate, not a computed value.

Is the SWGDOC nine-point scale still in use now that SWGDOC has been absorbed into OSAC?

SWGDOC ceased independent activity when OSAC (Organization of Scientific Area Committees for Forensic Science) was established under NIST in 2014. The OSAC Documents Subcommittee is now the standards body for US forensic document examination and continues to draw on the SWGDOC framework. The nine-point scale remains in common use in the US and Canada. Whether OSAC will revise it to incorporate LR-style language or population-frequency anchoring in response to PCAST 2016 is an active question in the standards pipeline.

How do laboratory accreditation requirements affect how examiners word their conclusions?

In England and Wales, the FSR Codes of Practice (updated through 2023, now with statutory authority under the Forensic Science Regulator Act 2021) require accredited providers to express evaluative opinions in a form that states the hypotheses under evaluation, moving toward ENFSI-compatible language. UKAS accreditation against ISO 17025 is the operational mechanism. In India, NABL T-126 does not mandate a specific conclusion scale format. The full accreditation framework is covered under quality systems: ISO 17025, NABL, ASCLD-LAB and proficiency testing.
