Contract Disputes and Statutory Interpretation: The Linguist as Expert

When courts dispute what a contract clause or statute actually means in ordinary language, forensic linguists offer corpus-based empirical evidence about word usage. This topic covers the methods, landmark cases, and the limits of what linguistic expert evidence can and cannot decide.

Last updated: 19 Jun 2026

When courts dispute what a contract clause or statute means in ordinary language, forensic linguists provide corpus-based empirical evidence about how the disputed word or phrase was actually used by comparable speakers at the relevant time. The core method is concordance analysis: retrieving thousands of real-world instances of the term from a temporally and register-appropriate corpus, coding each instance by sense, and reporting the frequency distribution. This evidence informs the court's ordinary-meaning analysis but does not resolve the legal question, which remains for the judge.

Most legal disputes ultimately turn on language. A contract says one thing; one party reads it one way, the other reads it differently. A statute uses a phrase; a prosecutor applies it broadly, a defence attorney argues for a narrow reading. Courts have been resolving these disputes for centuries using nothing more than a judge's intuition about ordinary meaning, occasionally supplemented by a dictionary. In the last two decades, a growing number of courts in the United States, the UK, Canada, and Australia have started asking whether there is a more rigorous, empirically grounded method.

The answer emerging from both academia and practice is corpus linguistics: the use of large searchable databases of real-world language use to document how a disputed word or phrase was actually deployed in comparable contexts at the relevant time. If the question is what 'vehicle' meant to an ordinary speaker in 1956 when a statute was enacted, a corpus of 1950s American English usage can tell you how the word was distributed across contexts, what it most often modified and was modified by, and whether public transit buses were routinely included in the category or excluded from it.
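One way to make "what it most often modified and was modified by" concrete is a collocate count. The Python sketch below counts the words that occur within a fixed window on either side of the target term. It assumes the corpus is a plain list of strings and uses a deliberately naive tokenizer. The function names and the three sample sentences are hypothetical, and a window count only approximates grammatical modification; it is not a syntactic parse.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-cased word tokens; a deliberately crude tokenizer for illustration."""
    return re.findall(r"[a-z]+", text.lower())


def collocates(texts, node, window=2):
    """Count words occurring up to `window` tokens left and right of `node`.

    Left-hand collocates of a noun roughly approximate its premodifiers
    ('motor vehicle'); this is a window count, not a syntactic parse.
    """
    left, right = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == node:
                left.update(tokens[max(0, i - window):i])
                right.update(tokens[i + 1:i + 1 + window])
    return left, right


# Hypothetical mini-corpus; a real study would use thousands of period texts.
texts = [
    "No motor vehicle shall be left standing on the street.",
    "The omnibus is a public vehicle for hire.",
    "Every vehicle drawn by horses must carry a lamp.",
]
left, right = collocates(texts, "vehicle")
print(left.most_common(3))
print(right.most_common(3))
```

In a real analysis the collocate lists would be compared across time periods or registers, and the interesting signal is which neighbours recur, not any single count.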

This topic covers the linguistic methods available for contract and statutory disputes, landmark cases where they have been applied, the New Originalism framework that has made corpus linguistics a constitutional-law methodology, and the limits that responsible expert witnesses acknowledge. The linguist's contribution here is empirical; the legal conclusion is always for the court.

By the end of this topic you will be able to:

Explain how syntactic ambiguity in a contract clause or statute creates genuine legal disputes and what forensic linguistic analysis adds beyond simply noting that two readings are possible.
Describe the corpus-linguistics methodology applied to statutory interpretation, including corpus selection, concordance analysis, and the scope of permissible expert conclusions.
Identify the claims that corpus evidence can and cannot support, distinguishing empirical meaning-distribution findings from normative legal conclusions about intent or interpretive canons.
Explain the New Originalism framework and why it makes historical corpora relevant as constitutional evidence, including the acknowledged limitations of that approach.
Apply the five-step analytical framework for forensic linguistic analysis of disputed legal text, from precise question definition through appropriately scoped conclusion.

Key terms

Corpus linguistics: The study of language through systematic analysis of large bodies of real-world text (corpora). Forensic corpus linguistics uses these databases to answer empirical questions about word meaning, frequency, and context, usually to establish what a term meant to ordinary speakers at a particular time.
Ordinary meaning: The meaning a reasonable person with ordinary linguistic competence would assign to a word or phrase. It is the default canon of statutory interpretation in most common-law jurisdictions: courts begin with ordinary meaning and depart from it only if the term is technical or ambiguous, or if ordinary meaning would produce an absurd result.
Syntactic ambiguity: Ambiguity arising from the grammatical structure of a sentence rather than from any individual word's meaning. The Oxford comma case is a famous example: the list structure of the sentence created two plausible parsings with different legal consequences. Syntactic analysis can often show which parsing is more strongly supported by the surrounding text.
New Originalism: A US constitutional and statutory interpretation theory associated with Justice Antonin Scalia and with scholars including Randy Barnett and Jack Balkin, holding that courts should apply the original public meaning of a constitutional or statutory provision, as it was understood at the time of enactment. Corpus linguistics is proposed as the empirical tool for establishing that historical meaning.
Corpus jurisprudence: The practice of using corpus analysis as evidence in statutory or constitutional interpretation. Associated in the US with Brigham Young University's Law and Corpus Linguistics project (Lee and Mouritsen). Several US state supreme courts and circuit courts have adopted corpus evidence in opinions; the US Supreme Court has engaged with it cautiously.
Concordance: A list of all occurrences of a target word in a corpus, shown in their surrounding context (KWIC: keyword in context). Concordance analysis lets a linguist assess the most common uses of a disputed term, what it typically modifies, and whether a claimed usage pattern exists or is statistically rare.
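The KWIC display described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the corpus is a list of plain strings; the function names `kwic` and `format_kwic` and the two sample sentences are hypothetical. It matches whole words only, so inflected forms such as 'vehicles' would need separate searches or a lemmatised corpus.

```python
import re


def kwic(texts, term, width=30):
    """Return a (left context, keyword, right context) tuple for every match of term.

    Matches whole words only and ignores case; inflected forms such as
    'vehicles' are not matched, so a real study would search each form
    (or a lemma) separately.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            hits.append((left, m.group(0), right))
    return hits


def format_kwic(hits, width=30):
    """Right-align the left context so the keyword lines up in one column."""
    return "\n".join(
        f"{left.rstrip():>{width}} [{kw}] {right.lstrip()}" for left, kw, right in hits
    )


# Hypothetical two-sentence corpus, for illustration only.
corpus = [
    "A vehicle may not enter the park.",
    "The bicycle is a vehicle of pleasure.",
]
print(format_kwic(kwic(corpus, "vehicle")))
```

Aligning the keyword in a single column is what lets a coder scan hundreds of lines quickly and assign each one a sense.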

How syntactic ambiguity drives legal disputes

The Oxford comma case, formally O'Connor v. Oakhurst Dairy, decided by the First Circuit Court of Appeals in 2017, is the best-known recent example of how punctuation and syntactic structure create genuine legal ambiguity. Maine's overtime law exempted 'the canning, processing, preserving, freezing, drying, marketing, storing, packing for shipment or distribution of' three categories of goods: agricultural produce, meat and fish products, and perishable foods.

The question was whether the final item exempted 'packing for shipment or distribution' as one activity (packing, performed for either purpose) or 'packing for shipment' and separately 'distribution' as two distinct activities. The absence of a serial comma before 'or distribution' left both parsings grammatically plausible. Delivery drivers who distributed but did not pack argued the exemption did not cover pure distributors; the dairy argued it covered both.

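The court sided with the drivers, though not on grammar alone. The drivers pointed out that every other exempt activity in the list is a gerund ('canning', 'processing', 'storing'), while 'distribution' is a noun, which suggests it is the object of 'for' rather than a standalone activity. The dairy countered that Maine's legislative drafting manual instructs drafters not to use the serial comma, so its absence proved nothing. Finding the text genuinely ambiguous, the First Circuit resolved the ambiguity in the drivers' favour under Maine's rule that ambiguities in wage-and-hour law are construed liberally to serve the law's remedial purpose. The case later settled for approximately $5 million. A single punctuation mark had structured the entire dispute.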
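The structural difference between the two parsings can be made explicit by writing out the exempt activities each one produces. The Python sketch below is a toy encoding, not how a court or linguist would formally analyse the clause; the labels 'drivers' and 'dairy' and the function names are hypothetical shorthand for the two parties' readings.

```python
# Exempt activities listed before the disputed final item (all gerunds).
LISTED = ["canning", "processing", "preserving", "freezing",
          "drying", "marketing", "storing"]


def exempt_activities(reading):
    """Exempt activities under one parsing of 'packing for shipment or distribution'.

    'drivers' and 'dairy' are illustrative labels for the two parties' readings.
    """
    if reading == "drivers":
        # One activity (packing), whether done for shipment or for distribution.
        return LISTED + ["packing for shipment", "packing for distribution"]
    if reading == "dairy":
        # Two activities: packing for shipment, and distribution itself.
        return LISTED + ["packing for shipment", "distribution"]
    raise ValueError(f"unknown reading: {reading!r}")


def is_exempt(activity, reading):
    return activity in exempt_activities(reading)


for reading in ("drivers", "dairy"):
    print(reading, is_exempt("distribution", reading))
# drivers False
# dairy True
```

The whole dispute reduces to the last line of each branch: whether 'distribution' enters the list as an activity in its own right or only as the purpose of packing.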

Corpus linguistics in statutory interpretation: the US development

The systematic application of corpus linguistics to statutory interpretation in US courts is associated primarily with the work of law professor Stephen Mouritsen at Brigham Young University, whose 2012 article 'Hard Cases and Hard Data: Assessing Corpus Linguistics as an Empirical Path to Plain Meaning' (Columbia Science and Technology Law Review) and subsequent work with Thomas R. Lee provided both the methodological framework and early case studies. Their approach is simple in outline: take the disputed statutory term, construct a corpus of text from roughly the same time as the statute's enactment, run a concordance analysis, and report what the distribution of usage shows.

Lockhart v. United States (2016) showed the US Supreme Court arguing about ordinary usage in a way corpus analysis could test, though neither opinion relied on corpus data. The question was whether the modifier 'involving a minor or ward' in a federal sentencing enhancement applied to all three listed categories of prior state sex offences ('aggravated sexual abuse, sexual abuse, or abusive sexual conduct') or only to the last. Justice Sotomayor's majority applied the rule of the last antecedent, so that the modifier qualified only 'abusive sexual conduct'; Justice Kagan's dissent argued from everyday examples that ordinary speakers would apply it to the whole list. The exchange made visible both the promise and the complexity of usage-based argument: each side was, in effect, making an empirical claim about how such lists are ordinarily read, which is exactly the kind of claim a corpus analysis is designed to check.

Several state supreme courts have gone further. In State v. Rasabout (2015), Associate Chief Justice Thomas Lee of the Utah Supreme Court advocated corpus analysis for statutory interpretation in a notable concurrence, though the majority declined to adopt the methodology in that case. State courts in Michigan, Wisconsin, and Georgia have cited corpus evidence in majority opinions. At the federal level, Justice Gorsuch's majority opinion in Niz-Chavez v. Garland (2021) closely tracked corpus-based arguments about 'a notice' versus 'notice' in immigration law, without formally calling the approach corpus linguistics.

Corpus linguistics in US courts: from academic proposal to Supreme Court engagement.

Contract disputes: what linguistic analysis contributes

Contract interpretation disputes arise when the parties hold different readings of their agreement. Under the plain-meaning rule that governs common-law contract interpretation, courts begin with what the text says in its ordinary sense and admit extrinsic evidence of intent only when the text is ambiguous. Linguistic analysis is most useful at exactly that threshold, where the court must decide whether the text is ambiguous at all.

Lawrence Solan's work on contract and statutory language, particularly in 'The Language of Judges' (1993) and 'The Language of Statutes: Laws and Their Interpretation' (2010), documents the recurring patterns through which contractual language produces interpretive disputes: undefined terms that seem obvious but carry industry-specific meanings, sentences that are syntactically complete but referentially ambiguous (pronouns with multiple plausible antecedents), and scope clauses that use terms of degree ('material', 'substantial', 'reasonable') whose meaning is clear in central cases but vague at the boundaries.

Referential ambiguity: 'The company shall provide notice to the client at its registered address.' Whose registered address? The company's or the client's? Syntactic analysis and corpus comparison of similar clauses in comparable contracts can show which reading is conventional for the contract type.
Term-of-degree disputes: 'The contractor shall complete the work within a reasonable time.' Corpus analysis of 'reasonable time' in comparable construction contracts can show how the phrase is actually used, for example which deadlines, milestones, or project types it co-occurs with, giving courts an empirical baseline rather than an abstract appeal to what is 'reasonable'.
Industry register versus ordinary meaning: A term that has a specific technical meaning in an industry may have been used by one party in that technical sense and by the other in its everyday sense. Expert evidence on register distribution, showing that in a particular industry corpus the term is used in the technical sense in 90 percent of instances, informs the ordinary-meaning analysis without deciding it.
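A register comparison like the one in the last example amounts to tabulating coded senses separately for each sub-corpus. The Python sketch below assumes each concordance line has already been hand-coded as a (register, sense) pair; the function name, the register and sense labels, and the counts are all invented for illustration, chosen only to mirror the 90 percent figure above.

```python
from collections import Counter, defaultdict


def sense_distribution(coded):
    """Proportion of each coded sense within each register (sub-corpus).

    `coded` is an iterable of (register, sense) pairs, one per concordance line.
    """
    counts = defaultdict(Counter)
    for register, sense in coded:
        counts[register][sense] += 1
    return {
        register: {sense: n / sum(c.values()) for sense, n in c.items()}
        for register, c in counts.items()
    }


# Hypothetical coded concordance: 10 lines from an industry corpus and
# 10 from a general corpus. Invented numbers, for illustration only.
coded = ([("industry", "technical")] * 9 + [("industry", "everyday")]
         + [("general", "technical")] * 2 + [("general", "everyday")] * 8)

for register, senses in sense_distribution(coded).items():
    print(register, {s: f"{p:.0%}" for s, p in senses.items()})
```

Reporting the two registers side by side, rather than pooling them, is what lets the court see that the term's dominant sense depends on who was writing.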

The New Originalism and historical corpus methods

The New Originalism, as developed by constitutional scholars over the past three decades, holds that the correct interpretation of a constitutional or statutory provision is the one an ordinary, competent speaker of English would have understood it to have at the time of adoption. This is a claim about historical public meaning, not about subjective legislative intent, and it turns an interpretive question (what did this word mean in 1789, or 1868, or 1954?) into an empirical one that corpus linguistics is suited to address.

The Corpus of Founding Era American English (COFEA), developed at BYU specifically for legal scholars, contains approximately 138 million words of American English text from the late 18th century. Analysing a constitutional term in COFEA gives a frequency distribution of its usage across legal, political, commercial, and everyday writing of the founding era. Justice Amy Coney Barrett, before her appointment to the Supreme Court, wrote about the methodological promise and limitations of such corpora in academic work.

The limitations are real and the method's leading practitioners acknowledge them. Historical corpora are uneven in their coverage of different text types, social registers, and geographic regions. The question of what an 'ordinary' speaker meant may have had no single answer in a linguistically diverse society. And the most contested constitutional questions often involve terms whose ordinary meaning in the founding era is precisely what both sides dispute, so the corpus data rarely resolves the legal question cleanly.

Limits and responsible expert testimony

A responsible forensic linguist in a statutory or contract interpretation case maintains clear boundaries. The expert's remit is the empirical question: how was this term used? What does the syntactic structure of this clause support as its most natural reading? In which register and how frequently was this phrase deployed in comparable contexts? These are questions linguistics can address with the tools of the trade.

The legal questions are different. What did the legislature intend? Which party should win under the applicable interpretive canons? Is the technical or the ordinary meaning controlling given the context of this statute? These are legal and normative questions. A linguist who testifies to those questions has stepped outside expert competence. Courts have excluded or discounted corpus-based expert evidence when the expert presented frequency data as if it resolved the legal question rather than contributing evidence for the court to weigh.

What linguistic expert evidence can show: frequency and distribution of a term's usage; which parsing is more syntactically natural; whether a claimed technical meaning is well-attested in the relevant register; whether a term had a different ordinary meaning at the time of drafting than it does today.
What it cannot show: legislative or drafting intent; which interpretation the law should adopt; whether extrinsic evidence of purpose overrides the textual analysis; the legal consequence of the meaning finding.

A framework for the linguistic analysis of disputed legal text

Forensic linguists engaged in statutory or contract interpretation work generally follow a sequence that produces a defensible, transparent record of the analysis, one that opposing experts can scrutinise and courts can evaluate.

Define the question precisely
State the specific disputed meaning precisely: is the question about a single word, a phrase, or the syntactic structure of a clause? Narrow questions produce more useful corpus evidence than broad ones.
Select and describe the corpus
Choose a corpus that is temporally, geographically, and register-appropriate to the text. Document the corpus's composition, size, and known limitations. Historical texts require historical corpora; industry contracts require an industry corpus where possible.
Run the concordance and code the results
Retrieve all instances of the disputed term or phrase and code each instance by the sense it reflects. Report inter-coder reliability if the coding required judgment. Publish the coding criteria so they can be examined.
Report frequency and context
Present the distribution: what proportion of instances reflect sense A versus sense B? What collocates and contexts are associated with each sense? Acknowledge instances that are ambiguous or that do not fit either proposed sense cleanly.
State the conclusion within the appropriate scope
Report what the corpus evidence shows about ordinary meaning distribution and stop there. Do not infer intent or decide the legal question. Make clear what the expert has measured and what lies beyond the measurement.

Five-step forensic linguistic analysis framework for interpretation disputes.
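The inter-coder reliability called for in the coding step is often reported as Cohen's kappa, which corrects raw agreement between two coders for the agreement expected by chance. The Python sketch below is one minimal implementation, assuming two coders each assigned exactly one sense label to the same concordance lines; the function name and the sample labels are hypothetical, and other reliability measures exist.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each gave one sense label per concordance line."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty set of lines")
    n = len(coder_a)
    # Proportion of lines on which the two coders chose the same sense.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's own label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        # Both coders used one identical label throughout; kappa is formally
        # undefined (0/0), so this sketch reports perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for 10 concordance lines; the coders disagree on line 6.
coder_a = ["transport"] * 6 + ["other"] * 4
coder_b = ["transport"] * 5 + ["other"] * 5
print(round(cohens_kappa(coder_a, coder_b), 2))
```

Here raw agreement is 90 percent but chance agreement is 50 percent, so kappa comes out at 0.8. Reporting kappa rather than raw agreement is what lets opposing experts judge how much the coding depended on individual judgment.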

Worked example

The 'vehicle' question in a 19th-century public park ordinance

Classic Hart-Fuller hypothetical, now analysed with a corpus.

H.L.A. Hart and Lon Fuller's famous 1958 debate used the hypothetical of a statute banning 'vehicles' from a public park: does it cover cars (yes), bicycles (probably), baby carriages (probably not), a WWII tank placed as a memorial (debatable)? The debate illustrated how ordinary meaning analysis works. Corpus linguistics gives this hypothetical an empirical dimension.

Corpus selection: The ordinance was enacted in 1895. A corpus of 1890s American newspaper and municipal record text is assembled using available digitised archives. The target term is 'vehicle'.
Concordance: 2,847 instances of 'vehicle' are retrieved. Coding shows 94 percent refer to horse-drawn carriages, early automobiles, and wheeled conveyances used in commerce or personal transport. Six percent refer to bicycles (in contexts debating whether they require separate regulation). No instance refers to a baby carriage or a memorial installation.
Register analysis: In the municipal regulation sub-corpus specifically, 'vehicle' is used exclusively in transportation and road-use contexts. The corpus finds no municipal document using 'vehicle' to describe a stationary ornamental object.
Conclusion: The corpus evidence supports the finding that, in 1895 ordinary and municipal usage, 'vehicle' referred to wheeled conveyances for active transportation, not stationary objects. Whether the memorial tank is a vehicle in the statute's sense is still a legal judgment, but the corpus provides an empirical anchor: the term's centre-of-gravity usage does not include stationary installations.
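The reporting step of this worked example can be sketched in Python. The sense counts below are hypothetical, chosen only so that they reproduce the example's rounded figures (94 percent and 6 percent of 2,847 instances); the function name and sense labels are also illustrative. The sketch deliberately stops at description: senses a party proposes but that the coders never found are listed as unattested, and nothing in the code decides whether the memorial is a vehicle.

```python
def sense_report(counts, proposed_senses):
    """Summarise coded concordance counts as percentages of all instances.

    Senses a party proposes but the coders never assigned are reported as
    unattested rather than silently omitted.
    """
    total = sum(counts.values())
    lines = [f"Total instances: {total:,}"]
    for sense, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"{sense}: {n:,} ({n / total:.1%})")
    for sense in proposed_senses:
        if counts.get(sense, 0) == 0:
            lines.append(f"{sense}: 0 (unattested in this corpus)")
    return lines


# Hypothetical counts consistent with the worked example's rounded percentages.
counts = {"wheeled conveyance": 2676, "bicycle": 171}
proposed = ["baby carriage", "stationary memorial"]
print("\n".join(sense_report(counts, proposed)))
```

Listing the zero counts explicitly matters: absence of a usage is itself a finding, and the expert should show that it was looked for.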

The Hart-Fuller debate was about legal theory. This analysis is about empirical linguistics. Both are needed: the corpus tells the court what ordinary speakers meant; the legal analysis decides which meaning controls and why. The expert witness supplies the first; the judge decides the second.


Key Takeaways

Forensic linguists enter contract and statutory disputes at the point where the ordinary meaning of a disputed word, phrase, or syntactic structure is contested, providing empirical evidence about usage distribution rather than relying on judicial intuition.
The Oxford comma case showed that punctuation and syntactic structure create genuine, legally consequential ambiguity; syntactic analysis by an expert can show which parsing is more naturally supported by the text.
The New Originalism has made corpus linguistics a growing methodology in US constitutional and statutory interpretation, with courts at state and federal levels beginning to cite corpus evidence in opinions, though the US Supreme Court has engaged cautiously.
Corpus frequency data tells courts how a term was ordinarily used but cannot establish legislative intent, resolve cases where frequency data is balanced between readings, or decide which legal interpretation should prevail.
Responsible expert testimony in this area requires a transparent five-step analysis: defining the question, selecting an appropriate corpus, coding concordance results, reporting distributions, and stating conclusions within the expert's scope.

What is corpus linguistics and how is it used in legal disputes?

Corpus linguistics is the study of language through large, searchable databases of real-world text. In legal disputes, it is used to document the ordinary meaning of a disputed word or phrase at the time a contract was signed or a statute was enacted, by showing how the term was actually used in comparable contexts across thousands of documents.

What is the famous Oxford comma legal case?

O'Connor v. Oakhurst Dairy (First Circuit, 2017) turned on the absence of an Oxford comma in a Maine overtime exemption statute. The court found the resulting syntactic ambiguity genuine and read the exemption more narrowly than the dairy had argued; the case later settled for approximately $5 million in overtime pay. The case became a widely cited illustration of how punctuation and syntax produce genuine legal ambiguity.

What is the New Originalism and how does corpus linguistics fit into it?

The New Originalism is a US interpretation theory holding that courts should apply the original public meaning of a text at the time of enactment. Corpus linguistics, particularly using historical corpora like COFEA, is proposed as a rigorous empirical method for establishing that historical public meaning, replacing or supplementing a judge's intuition.

What are the limits of corpus linguistics in legal interpretation?

Corpus analysis tells courts about ordinary usage frequency but cannot establish legislative intent, capture a technical meaning that diverges from ordinary use unless a specialised corpus of the relevant register is consulted, resolve cases where the corpus data is ambiguous or balanced between two readings, or answer the normative legal question of which meaning should govern. It is evidence input, not a decision procedure.

Can a forensic linguist testify about what a contract clause means?

Yes, with qualifications. A forensic linguist can testify about ordinary meaning, common usage patterns in the relevant register, syntactic ambiguity, and internal consistency. The linguist cannot decide which party is right as a legal matter, since contract interpretation also involves extrinsic evidence of intent and trade usage outside the linguist's expertise.
