Corpus linguistics
Definition
The study of language through large, principled collections of texts or transcripts. In forensic work, corpus methods are used to count how often specific features appear in a suspected author's known writing versus in disputed texts, and to assess whether particular formulations are typical or rare.
- Core method
- Systematic counting of language patterns across large collections of real texts, used to answer empirical questions about meaning, frequency, and authorship.
- Forensic applications
- Authorship comparison, contract interpretation, trademark disputes, and establishing what terms meant to ordinary speakers at a particular time.
Common questions
How do forensic linguists use corpus linguistics to compare writing?
They assemble a large corpus of a suspected author's known writing, then count how often specific words, phrases, and patterns appear in it. They compare these frequencies with those in the disputed or anonymous texts to assess whether the same person likely wrote both. If a formulation is common in the disputed text but rare in the known corpus, that counts as evidence against common authorship.
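The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a forensic method: the tokenizer is deliberately crude, the two texts are invented toy examples, and the helper names (`tokenize`, `rate_per_thousand`) are hypothetical. Normalizing to a rate per 1,000 tokens is one common way to compare texts of different lengths.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def rate_per_thousand(tokens, feature):
    """Occurrences of `feature` per 1,000 tokens; 0.0 for an empty token list."""
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[feature] / len(tokens)

# Hypothetical toy texts; real casework would use far larger samples.
known = "I shall call you later. I shall not forget. We shall see."
disputed = "I will call you later. I will not forget. We will see."

known_tokens = tokenize(known)
disputed_tokens = tokenize(disputed)

# Print each feature's normalized rate in the known and disputed texts.
for feature in ("shall", "will"):
    print(
        feature,
        round(rate_per_thousand(known_tokens, feature), 1),
        round(rate_per_thousand(disputed_tokens, feature), 1),
    )
```

In this toy case "shall" is frequent in the known text and absent from the disputed one. With samples this small such a difference means little on its own; an analyst would want many features and much more text before drawing any conclusion.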
Why is corpus linguistics useful in contract or trademark disputes?
Contract and trademark cases often turn on what a word or phrase actually means to ordinary people, not what a dictionary says. Forensic linguists use large, real-world text databases to show how speakers and consumers actually used that word in practice at a specific time. This empirical evidence is often more persuasive than static definitions.
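One way to inspect how a word is used in practice is a keyword-in-context (KWIC) concordance, which lists each occurrence with its neighbouring words. The sketch below is a simplified illustration: the function name `kwic`, the `width` parameter, and the sample sentences are all hypothetical, and the context window ignores sentence boundaries.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, keyword, right) tuples with up to `width` words of context per side."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text; a real analysis would draw on a dated reference corpus.
sample = (
    "Pass me a tissue. She bought a box of tissue paper. "
    "The tissue brand was unclear."
)

# Right-align the left context so the keyword column lines up.
for left, kw, right in kwic(sample, "tissue"):
    print(f"{left:>25} [{kw}] {right}")
```

Reading down the aligned keyword column lets an analyst see at a glance which words typically surround the term, for example whether it appears as a common noun or next to "brand".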
What makes corpus linguistics different from just reading documents?
Corpus linguistics is systematic and quantitative. Instead of forming impressions from a handful of texts, linguists analyze thousands of real examples, count patterns, and measure frequencies. This rigor reduces guesswork and produces evidence that is better able to withstand scrutiny in court.
Related terms
- Concordance
- A list of all occurrences of a target word in a corpus, shown in their surrounding context (KWIC: keyword in context). Concordance...
- Corpus jurisprudence
- The practice of using corpus analysis as evidence in statutory or constitutional interpretation. Associated in the US with Brigham Young University's Law...
- Descriptiveness
- The characteristic of a mark that directly describes a feature, quality, or ingredient of the goods or services. Descriptive marks receive little...
- Entrapment analysis
- The examination of recorded conversations between undercover law-enforcement agents and suspects to assess whether the agent's language induced the suspect to commit...
- Genericness
- The condition in which a brand name has become the common noun for a class of goods or services in public usage,...
- IAFL
- The International Association of Forensic Linguists, founded in 1992, is the main professional body for the field. It publishes the journal Language...
- Likelihood of confusion
- The central test in trademark infringement: whether a reasonably attentive consumer would be confused about the commercial source of goods or services...
- Linguistic profiling
- Broadly, any use of linguistic features to infer a speaker's identity or background. In its pejorative sense, the discriminatory use of accent...
- New Originalism
- A US constitutional and statutory interpretation theory associated with scholars including Antonin Scalia, Randy Barnett, and Jack Balkin, holding that courts should...
- Ordinary meaning
- The meaning a reasonable person with ordinary linguistic competence would assign to a word or phrase. It is the default canon of...
- Phonological similarity
- The degree to which two marks sound alike when pronounced, assessed through minimal pair analysis, prosodic structure (syllable count and stress), and...
- Secondary meaning
- The association that consumers make between a mark and a specific source, acquired over time through extensive use, even if the mark...
Explained in these topics
- Contract Disputes and Statutory Interpretation: The Linguist as Expert
- The study of language through systematic analysis of large bodies of real-world text (corpora). Forensic corpus linguistics uses these databases to answer empi...
- History and Landmark Cases: Svartvik, Evans, and Coulthard
- The study of language through large, principled collections of texts or transcripts. In forensic work, corpus methods are used to count how often specific feat...
- Trademark and Brand-Name Disputes: Linguistic Evidence
- The empirical study of language through large, structured databases of natural-language text, used in trademark cases to assess how consumers actually use a wo...