SCAN Statement Analysis: Claims, Methods, and the Scientific Critique

Scientific Content Analysis (SCAN) claims that deceptive statements have detectable linguistic patterns, but empirical testing consistently finds its accuracy at or below chance, raising serious questions about its continued use in investigations.

Last updated: 19 Jun 2026

Scientific Content Analysis (SCAN), developed by Avinoam Sapir, is a proprietary statement-analysis method that claims to identify deception in written accounts through approximately fifteen linguistic indicators, including pronoun shifts, hedging language, missing time, and extraneous information. Controlled empirical studies testing SCAN against independently verified ground truth have consistently found its accuracy at or below chance, with no peer-reviewed study demonstrating above-chance performance at a statistically significant level. SCAN has not been admitted as expert evidence under any rigorous reliability standard, including the US Daubert framework, yet it continues to be used as an investigative tool, where it can still shape which suspects are interviewed and how.

Avinoam Sapir's premise was that lying is cognitively harder than telling the truth, and that extra effort leaves traces in the words a person writes. Where a truthful writer draws on genuine memory, a deceptive writer must simultaneously manage what they claim to remember, what they claim not to remember, and how to make a fabricated account feel real. Sapir argued that this extra cognitive load produces systematic, detectable differences in language, and he formalised those differences into a training programme and consultancy. Tens of thousands of investigators have since been trained in Scientific Content Analysis.

The appeal is straightforward. Most of what investigators do before an arrest is work with language: witness statements, tip-offs, suspect accounts, denial letters. A systematic method for reading those documents and identifying deception would be practically useful. The question forensic linguists ask is whether SCAN is that method, or whether it is a structured way of imposing a narrative on a text that empirical scrutiny has consistently failed to validate.

This topic covers what SCAN claims, the specific indicators it proposes and the reasoning behind them, and the empirical evidence accumulated since controlled testing began in the 1990s. The research verdict is clear and consistent. A secondary question concerns why the technique survives in practice, and what that persistence reveals about how investigative methods are evaluated in real-world contexts.

By the end of this topic you will be able to:

Describe the core claims and proposed indicators of SCAN (Scientific Content Analysis) and the cognitive reasoning Sapir uses to justify them.
Explain what controlled empirical studies found when SCAN was tested against ground-truth-verified statements, and why those results undermine the method's validity.
Distinguish between linguistic description of a text and validated deception detection, and apply that distinction to evaluate a SCAN analyst's conclusion.
Identify the structural and institutional factors that allow a technique with no demonstrated accuracy to persist in investigative practice.
Assess a reported pronoun shift or hedging instance against the full range of mundane and cultural explanations before drawing any inference about veracity.

Key terms

SCAN (Scientific Content Analysis): A proprietary statement-analysis method developed by Avinoam Sapir that claims to identify deceptive content in written statements through analysis of linguistic features including pronoun use, missing time, and unexplained information changes.
Pronoun shift: A SCAN indicator based on the claim that deceptive writers drop first-person singular pronouns, or shift to another form of reference such as 'we', when describing events they are fabricating, because they are not drawing on genuine first-person memory.
Lack of conviction: A SCAN category covering hedging phrases like 'I think', 'I believe', 'I don't remember', which Sapir proposes signal deception because a person certain of the truth would state it without qualification.
Extraneous information: Information in a statement that the SCAN analyst judges to be beyond the stated scope of the question. SCAN treats such additions as potentially significant, on the claimed grounds that people include irrelevant detail to distract or to over-establish innocence.
Ground truth: In deception-detection research, independently verified knowledge of whether a statement was truthful or deceptive, established through confession, DNA, or other objective means, against which a method's accuracy can be measured.
Base rate problem: The statistical challenge facing any deception-detection method: if only a minority of statements examined are actually deceptive, even a method that is slightly better than chance can produce large numbers of false positives in practice.
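
The base rate problem is easier to see with a little arithmetic. The sketch below is illustrative only: the function name and all three figures (a 10 percent base rate, and a method that is right 60 percent of the time on both deceptive and truthful statements) are assumptions chosen for the example, not values reported for SCAN or any other method.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Return the share of flagged statements that are actually deceptive."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
BASE_RATE = 0.10      # 1 in 10 statements examined is deceptive
SENSITIVITY = 0.60    # share of deceptive statements correctly flagged
SPECIFICITY = 0.60    # share of truthful statements correctly cleared

ppv = positive_predictive_value(BASE_RATE, SENSITIVITY, SPECIFICITY)
print(f"Share of flags that are correct: {ppv:.1%}")
```

Under these assumed figures, every 1,000 statements would produce 60 correctly flagged deceptive statements alongside 360 wrongly flagged truthful ones, so only about one flag in seven would point at an actual lie.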

What SCAN claims and how it works

SCAN analysis starts by having the subject write a statement in their own words, without guidance or prompting, describing the event in question. The analyst then reads the statement looking for approximately fifteen proposed indicators. Sapir's reasoning is broadly cognitive: a person telling the truth produces a statement organised around what they actually remember. A person constructing a false account must manage what they claim to remember, what they claim not to remember, and how to make a fabricated account feel genuine, all at once. This extra cognitive load, Sapir argues, leaves traces in the text.

The pronoun shift indicator is central to the method. SCAN predicts that deceptive writers avoid first-person singular pronouns ('I') when describing the events they are lying about, and may shift to 'we' or drop the subject altogether ('Then went to the car'). The reasoning is that using 'I' claims personal agency and first-person experience: a liar supposedly avoids it at the precise moment of fabrication. Similarly, 'lack of conviction' language (hedges, memory disclaimers) is treated as suspicious because an honest person claiming a clear memory would not hedge.

Key SCAN indicators and their claimed cognitive basis.

The empirical critique: what controlled studies found

Aldert Vrij, a professor of applied social psychology at the University of Portsmouth, led the most sustained empirical programme testing SCAN. In a series of studies in the 1990s and 2000s, Vrij and colleagues had participants produce truthful and deceptive accounts, then gave the statements to SCAN-trained analysts, untrained raters, and sometimes to automated analysis. Results across studies converged: SCAN analysts performed at chance (50 percent accuracy in a two-option true/false task) and sometimes below it.
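
What "at chance" means in a two-option task can be checked with an exact binomial calculation. The following is a minimal sketch, not the analysis used in any of the studies described here: the function name is hypothetical and the figure of 54 correct judgements out of 100 is invented purely for illustration.

```python
from math import comb


def p_value_above_chance(correct, total):
    """One-sided exact binomial p-value against 50 percent guessing.

    Returns the probability of getting at least `correct` judgements right
    out of `total` true/false calls by guessing alone.
    """
    favourable = sum(comb(total, k) for k in range(correct, total + 1))
    return favourable / 2 ** total


# Hypothetical result: 54 correct calls out of 100 statements.
p = p_value_above_chance(54, 100)
print(f"p = {p:.3f}")
```

With these made-up numbers the p-value sits well above the conventional 0.05 threshold, so an accuracy of 54 percent on 100 statements would not count as evidence of better-than-chance performance.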

A 2016 study by Bogaard, Meijer, Vrij, and Merckelbach published in Frontiers in Psychology directly tested whether the SCAN indicators appeared more often in deceptive statements than truthful ones. Most did not. Pronoun shifts and hedging language appeared with similar frequency in truthful accounts. The indicator that came closest to significance was 'missing time', but even this showed only a weak and unstable correlation. A 2020 systematic review by ten Brinke and colleagues found no peer-reviewed study demonstrating above-chance SCAN accuracy.

The theoretical foundations are equally weak. SCAN assumes that deceptive and truthful writers consistently differ in their use of specific features. But linguistic variation is driven by many factors: education, first language, topic familiarity, emotional state, cultural norms around assertion and hedging. A writer who habitually hedges for politeness, or who genuinely does not remember a peripheral detail, will produce 'lack of conviction' language that SCAN flags as deceptive. The signal SCAN is looking for, if it exists at all, is swamped by the noise of ordinary linguistic variation.

Why SCAN survives in practice

SCAN's continued use in investigative practice, particularly in US law enforcement and in Dutch and Israeli police contexts, represents a genuine puzzle given the strength of the empirical evidence against it. Several factors explain the persistence.

Proprietary structure: the SCAN course is taught and licensed by Sapir's organisation. Academic critics do not have access to the full claimed methodology. This insulates the technique from peer review.
Selection bias in practitioner experience: investigators remember and recount cases where a SCAN flag preceded a confession or breakthrough. Cases where SCAN flagged nothing significant, or flagged the wrong person, are rarely documented or shared.
Confirmation bias in application: once an analyst has identified a SCAN 'hit', investigators may pursue the flagged person more intensively, increasing the likelihood of eventually obtaining corroborating evidence or a confession, which retrospectively validates the original flag.
Absence of a feedback mechanism: in most investigative contexts there is no systematic comparison of SCAN outputs against eventual trial outcomes, so the failure rate is invisible to the practitioners using it.

The self-reinforcing feedback loop that sustains SCAN in practice: an initial flag triggers intensive investigation, which inflates the apparent hit rate while the failure rate stays invisible.
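
The loop can be illustrated with a small expected-value calculation. This is a toy model under stated assumptions, not an estimate of real investigative outcomes: the function name and every figure (500 flagged suspects, a 30 percent guilt rate, an 80 percent chance that intensive pursuit confirms a guilty suspect, and 10 percent of unconfirmed flags being written up) are hypothetical. The flag itself is modelled as carrying no information about guilt.

```python
def apparent_hit_rate(flagged, guilt_rate, confirm_if_pursued, miss_recorded):
    """Expected hit rate as it would appear in practitioners' case records.

    The flag is modelled as uninformative: flagged suspects are guilty at
    the same rate as everyone else.
    """
    guilty_flagged = flagged * guilt_rate
    hits = guilty_flagged * confirm_if_pursued   # flags later "confirmed"
    misses = flagged - hits                      # flags never confirmed
    recorded_misses = misses * miss_recorded     # only some misses are documented
    return hits / (hits + recorded_misses)


# All figures are hypothetical.
documented = apparent_hit_rate(flagged=500, guilt_rate=0.30,
                               confirm_if_pursued=0.80, miss_recorded=0.10)
complete = apparent_hit_rate(flagged=500, guilt_rate=0.30,
                             confirm_if_pursued=0.80, miss_recorded=1.0)
print(f"Hit rate in documented cases: {documented:.0%}")
print(f"Hit rate with every flag recorded: {complete:.0%}")
```

Under these assumptions the documented record would suggest a hit rate of roughly 76 percent, while a complete record of the same flags would show 24 percent, even though the flag in this model tells the investigator nothing.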

Linguistic description versus validated deception detection

The critical conceptual divide between forensic linguists and SCAN practitioners is the distinction between linguistic description and validated deception detection. A linguist can describe a statement: this text uses fewer first-person pronouns than a typical account; this passage contains a higher-than-average density of hedging language; this section is organised differently from the surrounding narrative. Those observations can be accurate and useful.

What a linguist cannot do, on the basis of those observations alone, is conclude that the writer is lying. Moving from description to detection requires a validated empirical relationship between the described feature and ground-truth deception, and that relationship has not been established for any of SCAN's indicators. This is not a theoretical objection peculiar to linguistics. It is the same standard applied to fingerprint comparisons, DNA mixture interpretation, and bite-mark analysis: show your error rate, and demonstrate that your conclusions are grounded in a reproducible, testable method.

Worked example

Analysing a SCAN 'pronoun shift' flag

The same linguistic feature, three very different explanations.

A written statement contains the following passage. The suspect is describing their movements on the night in question.

I got home around nine. Had something to eat, watched TV. Then we went out again around eleven.

A SCAN analyst would flag the shift from 'I' to 'we' as potentially significant: the subject began with first-person singular and shifted to plural at the point describing going back out, suggesting possible concealment of who they were with or what they were doing during that second outing.

Deception interpretation (SCAN): the shift to 'we' signals the subject is distancing themselves from sole responsibility for the later outing, or concealing a companion.
Mundane linguistic explanation: the subject was alone earlier in the evening and left with a flatmate or partner later; 'we' is simply accurate. The subject did not explain the transition because it felt obvious to them.
Cultural/stylistic explanation: in many dialects and registers, shifting between 'I' and 'we' is a normal grammatical feature of informal narrative, especially when moving between individual action and shared action. It does not signal deception; it signals a change in social context.

The SCAN analyst is choosing the deception interpretation and treating the other two as less likely, with no evidential basis for that preference. A forensic linguist describing the same passage would note the pronoun shift, would identify the three possible explanations, would note that the mundane explanations are at least as plausible as the deception one, and would conclude that the feature alone gives no basis for a conclusion about veracity.
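
The descriptive half of that work can be sketched in a few lines. The example below is an assumption-laden illustration, not a SCAN procedure or a validated tool: the function name, the pronoun lists, and the simple sentence splitting are all hypothetical choices. It counts first-person singular and plural pronouns per sentence in the passage above and deliberately stops there, because the counts describe the text and say nothing about veracity.

```python
import re

PASSAGE = ("I got home around nine. Had something to eat, watched TV. "
           "Then we went out again around eleven.")

# Illustrative word lists; a real descriptive study would justify its own.
FIRST_SINGULAR = {"i", "me", "my", "mine", "myself"}
FIRST_PLURAL = {"we", "us", "our", "ours", "ourselves"}


def describe_pronouns(text):
    """Return per-sentence counts of first-person singular and plural pronouns."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    rows = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        rows.append({
            "sentence": sentence,
            "first_singular": sum(t in FIRST_SINGULAR for t in tokens),
            "first_plural": sum(t in FIRST_PLURAL for t in tokens),
        })
    return rows


rows = describe_pronouns(PASSAGE)
for row in rows:
    print(row["first_singular"], row["first_plural"], row["sentence"])
```

The output records one 'I' in the first sentence, no subject pronoun in the second, and one 'we' in the third. That is a description of the passage; choosing between the three explanations listed above would require evidence the text itself does not supply.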

Check your understanding

What is the central empirical finding from controlled studies of SCAN accuracy?

Key Takeaways

SCAN, developed by Avinoam Sapir, claims that deceptive written statements contain detectable linguistic features including pronoun shifts, hedging, missing time, and extraneous information.
Controlled empirical studies with known ground truth consistently find SCAN accuracy at or below chance, with no peer-reviewed study demonstrating above-chance performance.
SCAN continues in investigative practice because of its proprietary structure, selection and confirmation bias among practitioners, and the absence of systematic feedback about its failure rate.
The distinction between linguistic description (what a text does) and validated deception detection (what the features prove) is the critical conceptual boundary SCAN fails to respect.
SCAN has not been admitted as expert evidence under any rigorous reliability standard; its use remains investigative rather than evidential, but it can still shape who is investigated and how.

What is SCAN and who developed it?

Scientific Content Analysis was developed by Avinoam Sapir, an Israeli police investigator who later established the Laboratory for Scientific Interrogation in Phoenix, Arizona. Sapir argues that deceptive written statements produce characteristic linguistic patterns that a trained analyst can detect. The method is sold as a proprietary training course and has been widely adopted by police services, particularly in the United States and the Netherlands.

What are the main SCAN indicators of deception?

SCAN identifies about fifteen proposed indicators including: pronoun shifts (switching from 'I' to 'we' or dropping pronouns when describing contested events); lack of conviction language (phrases like 'I don't remember'); information judged to be extraneous; missing time in the account; and changes in language describing relationships. Each is claimed to signal concealment or fabrication.

What does the scientific research say about SCAN's accuracy?

Studies by Aldert Vrij and others testing SCAN accuracy against independently verified ground truth found the method performing at or below chance. Trained SCAN analysts did no better than untrained raters. A systematic review found no peer-reviewed study demonstrating above-chance accuracy at a statistically significant level.

If SCAN does not work, why do practitioners still use it?

Several reasons: the training produces confident pattern-recognition that feels useful; selection bias means practitioners remember successful cases; the proprietary structure prevents peer review; and there is no systematic comparison of SCAN outputs against trial outcomes, so the failure rate stays invisible to the practitioners using it.
