When courts dispute what a contract clause or statute actually means in ordinary language, forensic linguists offer corpus-based empirical evidence about word usage. This topic covers the methods, landmark cases, and the limits of what linguistic expert evidence can and cannot decide.
Most legal disputes ultimately turn on language. A contract says one thing; one party reads it one way, the other reads it differently. A statute uses a phrase; a prosecutor applies it broadly, a defence attorney argues for a narrow reading. Courts have been resolving these disputes for centuries using nothing more than a judge's intuition about ordinary meaning, occasionally supplemented by a dictionary. In the last two decades, a growing number of courts in the United States, the UK, Canada, and Australia have started asking whether there is a more rigorous, empirically grounded method.
The answer emerging from both academia and practice is corpus linguistics: the use of large searchable databases of real-world language use to document how a disputed word or phrase was actually deployed in comparable contexts at the relevant time. If the question is what 'vehicle' meant to an ordinary speaker in 1956 when a statute was enacted, a corpus of 1950s American English usage can tell you how the word was distributed across contexts, what it most often modified and was modified by, and whether public transit buses were routinely included in the category or excluded from it.
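As a rough illustration of what "what it most often modified and was modified by" means in practice, the sketch below counts the words that appear near a target term. The function name `collocates`, the window size, and the three sample sentences are all invented for illustration; a real analysis would run against a curated historical corpus, not a hand-written list.

```python
import re
from collections import Counter

def collocates(texts, node, window=4):
    """Count the words found within `window` tokens either side of `node`."""
    counts = Counter()
    for text in texts:
        # Crude tokeniser: lowercase alphabetic strings (assumption for the sketch).
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == node:
                lo = max(0, i - window)
                neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
                counts.update(neighbours)
    return counts

# Invented sample sentences, not drawn from any real corpus.
sample = [
    "The vehicle was parked on the street overnight.",
    "A motor vehicle must be registered with the state.",
    "The bus is a public vehicle operated by the city.",
]

counts = collocates(sample, "vehicle", window=2)
print(counts.most_common(5))
```

On real data the interesting output is the ranking: which neighbours recur often enough to show what kinds of things speakers of the period routinely called a 'vehicle'.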
This topic covers the linguistic methods available for contract and statutory disputes, landmark cases where they have been applied, the New Originalism framework that has made corpus linguistics a constitutional-law methodology, and the limits that responsible expert witnesses acknowledge. The linguist's contribution here is empirical; the legal conclusion is always for the court.
A missing comma, a sentence that parses two ways, and a multimillion-dollar outcome.
The Oxford comma case, formally O'Connor v. Oakhurst Dairy, decided by the First Circuit Court of Appeals in 2017, is the best-known recent example of how punctuation and syntactic structure create genuine legal ambiguity. Maine's overtime statute exempted from overtime protection 'the canning, processing, preserving, freezing, drying, marketing, storing, packing for shipment or distribution of' three categories of goods: agricultural produce, meat and fish products, and perishable foods.
The question was whether the final item exempted 'packing for shipment or distribution' (one activity: packing, done for either shipment or distribution) or 'packing for shipment' and separately 'distribution' (two activities). The absence of a serial comma before 'or distribution' left both parsings grammatically plausible. Delivery drivers who distributed but did not pack argued the exemption did not apply to pure distributors; the dairy argued it applied to both activities.
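The court sided with the drivers, but not on grammar alone. The drivers pointed out that every other exempt activity in the list is a gerund ('canning', 'processing', 'storing'), whereas 'distribution' is a plain noun like 'shipment', which suggests that both are objects of 'packing for'. The dairy replied that Maine's legislative drafting manual discourages the serial comma, so its absence proved nothing. Finding the text ambiguous, the court fell back on the rule that Maine's wage-and-hour laws are construed liberally in favour of employees. The case settled for approximately $5 million. A single punctuation mark had structured the entire dispute.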
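The two competing parsings can be written out as two different lists of exempt activities. This is only a toy model of the structural ambiguity (the names `READING_ONE_ACTIVITY`, `READING_TWO_ACTIVITIES`, and `is_exempt` are invented here), and it says nothing about which reading is legally correct.

```python
# Activities that both sides agreed were on the list.
SHARED = [
    "canning", "processing", "preserving", "freezing",
    "drying", "marketing", "storing",
]

# Drivers' reading: one final activity, packing (for shipment or for distribution).
READING_ONE_ACTIVITY = SHARED + ["packing for shipment or distribution"]

# Dairy's reading: two final activities, packing for shipment, and distribution.
READING_TWO_ACTIVITIES = SHARED + ["packing for shipment", "distribution"]

def is_exempt(activity, reading):
    """True if `activity` appears as its own item under the given reading."""
    return activity in reading

for label, reading in [("drivers", READING_ONE_ACTIVITY),
                       ("dairy", READING_TWO_ACTIVITIES)]:
    print(label, "reading exempts distribution:", is_exempt("distribution", reading))
```

The same string of words yields a list of eight items under one reading and nine under the other, and only the nine-item list contains 'distribution' as a standalone exempt activity.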
From academic proposal to state supreme court majority opinions in fifteen years.
The systematic application of corpus linguistics to statutory interpretation in US courts is associated primarily with Stephen Mouritsen and Thomas R. Lee, a former Brigham Young University law professor who went on to sit on the Utah Supreme Court. Mouritsen's 2010 BYU Law Review note 'The Dictionary Is Not a Fortress' and the pair's 2018 Yale Law Journal article 'Judging Ordinary Meaning' provided both the methodological framework and early case studies. Their approach is simple in outline: take the disputed statutory term, construct a corpus of text from roughly the same time as the statute's enactment, run a concordance analysis, and report what the distribution of usage shows.
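The concordance step in that workflow can be sketched in a few lines. This is a minimal keyword-in-context (KWIC) illustration, not the procedure of any particular study: the function `kwic`, the context width, and the sample sentences are assumptions made for the example.

```python
import re

def kwic(texts, term, width=30):
    """Return (doc_id, left context, match, right context) for each hit of `term`."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for doc_id, text in enumerate(texts):
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-justify the left context so the keyword lines up in a column.
            lines.append((doc_id, left.rjust(width), m.group(0), right))
    return lines

# Invented sample documents standing in for a period-appropriate corpus.
sample = [
    "He did discharge the firearm twice.",
    "The discharge of a weapon is prohibited.",
    "No shots were fired.",
]

lines = kwic(sample, "discharge")
for doc_id, left, hit, right in lines:
    print(f"{doc_id:>3} {left} [{hit}] {right}")
```

Each output line is then read and coded by a human analyst for the sense in which the term is used; the code only retrieves and aligns the evidence.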
Lockhart v. United States (2016) at the US Supreme Court provided an early high-profile example of the kind of dispute at stake, though the case was resolved without corpus linguistics. Justice Sotomayor's majority and Justice Kagan's dissent disagreed over whether the modifier 'involving a minor or ward' in a federal sentencing provision applied to all three prior offences listed before it or only to the last. Neither justice used a formal corpus; both relied on interpretive canons and intuitions about how ordinary readers attach a trailing modifier to a list, which is the kind of question that usage data can inform.
Several state supreme courts have gone further. In the Utah Supreme Court, Justice Lee's concurrence in State v. Rasabout (2015) used corpus analysis to interpret the 'discharge' of a firearm, although the majority declined to rely on it. Michigan, Wisconsin, and Georgia state courts have cited corpus evidence in majority opinions. The US Supreme Court's decision in Niz-Chavez v. Garland (2021) saw Justice Gorsuch writing majority language that closely tracked corpus-based arguments about 'a notice' versus 'notice' in immigration law, though without formally calling it corpus linguistics.
Contracts are the densest concentration of language disputes outside constitutional law.
Contract interpretation disputes arise when the parties have different readings of what their agreement says. The primary legal rule in common-law contracts is the plain-meaning rule: courts first ask what the text says in its ordinary sense, and admit extrinsic evidence of intent only when the text is ambiguous. This makes ordinary meaning a genuine legal threshold, which is exactly where a linguist's analysis is most directly applicable.
Lawrence Solan's work on legal language, particularly 'The Language of Judges' (1993) and 'The Language of Statutes' (2010), documents recurring patterns through which drafted language, contractual language included, produces interpretive disputes: undefined terms that seem obvious but carry industry-specific meanings, sentences that are syntactically complete but referentially ambiguous (pronouns with multiple plausible antecedents), and scope clauses that use terms of degree ('material', 'substantial', 'reasonable') that are clear in core cases but vague at the boundaries.
If meaning is fixed at enactment, historical corpora become constitutional evidence.
The New Originalism, as developed by constitutional scholars over the past three decades, holds that the correct interpretation of a constitutional or statutory provision is the one an ordinary, competent speaker of English would have understood it to have at the time of adoption. This is a claim about historical public meaning, not about subjective legislative intent, and it converts a linguistic question (what did this word mean in 1789, or 1868, or 1954?) into an empirical one that corpus linguistics is suited to address.
The Corpus of Founding Era American English (COFEA), developed at BYU specifically for legal scholars, contains approximately 130 million words of American English text from the late 18th century. Analysing a constitutional term in COFEA gives a frequency distribution of its usage across legal, political, commercial, and everyday writing of the founding era. Justice Amy Coney Barrett, before her appointment to the Supreme Court, wrote about the methodological promise and limitations of such corpora in academic work.
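A frequency distribution across text types is only meaningful once counts are normalised, because the sub-corpora differ in size. The sketch below shows the usual per-million-words calculation. All figures are hypothetical placeholders, not COFEA data, and the register names and the helper `per_million` are invented for the example.

```python
def per_million(hits, corpus_size):
    """Normalised frequency: occurrences per million words of running text."""
    return hits * 1_000_000 / corpus_size

# Hypothetical (hits, sub-corpus size in words) pairs, NOT real corpus figures.
registers = {
    "legal": (120, 2_000_000),
    "political": (90, 3_000_000),
    "everyday": (30, 500_000),
}

rates = {name: per_million(hits, size) for name, (hits, size) in registers.items()}

for name, rate in rates.items():
    hits, _ = registers[name]
    print(f"{name:<10} raw hits = {hits:>4}   per million words = {rate:.1f}")
```

In this invented example the 'everyday' register has the fewest raw hits but the same normalised rate as the 'legal' register, which is why an expert report would normally give both the raw counts and the sub-corpus sizes.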
The limitations are real and the method's leading practitioners acknowledge them. Historical corpora are uneven in their coverage of different text types, social registers, and geographic regions. The question of what an 'ordinary' speaker meant may have had no single answer in a linguistically diverse society. And the most contested constitutional questions often involve terms whose ordinary meaning in the founding era is precisely what both sides dispute, so the corpus data rarely resolves the legal question cleanly.
The forensic linguist supplies evidence; the legal conclusion belongs to the court.
A responsible forensic linguist in a statutory or contract interpretation case maintains clear boundaries. The expert's remit is the empirical question: how was this term used? What does the syntactic structure of this clause support as its most natural reading? In which register and how frequently was this phrase deployed in comparable contexts? These are questions linguistics can address with the tools of the trade.
The legal questions are different. What did the legislature intend? Which party should win under the applicable interpretive canons? Is the technical or the ordinary meaning controlling given the context of this statute? These are legal and normative questions. A linguist who testifies to those questions has stepped outside expert competence. Courts have excluded or discounted corpus-based expert evidence when the expert presented frequency data as if it resolved the legal question rather than contributing evidence for the court to weigh.
From instruction to conclusion: the five-step analytical sequence.
Forensic linguists engaged in statutory or contract interpretation work generally follow a sequence that produces a defensible, transparent record of the analysis, one that opposing experts can scrutinise and courts can evaluate.