Comparison corpus
Definition
The body of known writings from a candidate author used to characterise their stylistic profile. In the Ramsey case, the comparison corpora for the candidates were small, which expanded the range of plausible matches and made any confident attribution methodologically unsound.
- Also called
- Reference corpus or known writing samples
- Key limitation
- Small corpora expand the range of plausible matches and weaken conclusions
Common questions
What's a comparison corpus in authorship analysis?
A comparison corpus is a collection of known writing samples from someone whose authorship you're investigating. These samples show how they typically write, including their word choices, sentence structure, and style quirks. The quality and size of this collection directly affect how reliable your attribution conclusions can be.
Why does the size of a comparison corpus matter?
A small corpus leaves too much room for error. When the corpus is limited, many different authors could plausibly match the writing in question, making confident attribution methodologically unsound. A larger, more representative collection of the candidate's writing reduces false matches and strengthens conclusions.
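The effect of corpus size can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not a forensic method and not drawn from any case material: it assumes a made-up "author" who uses the word "the" in 6% of tokens, and the helper names `relative_frequency` and `frequency_spread` are hypothetical. It measures how much the estimated rate of that one word varies across random samples of a given size.

```python
import random
import statistics


def relative_frequency(tokens, word):
    """Share of tokens that are exactly `word`."""
    return sum(1 for t in tokens if t == word) / len(tokens)


def frequency_spread(tokens, word, sample_size, trials=200, seed=0):
    """Standard deviation of `word`'s relative frequency across
    `trials` random samples (drawn with replacement) of `sample_size` tokens."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = [
        relative_frequency(rng.choices(tokens, k=sample_size), word)
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


# Synthetic token pool (an assumption for illustration only):
# 6% of tokens are "the", the rest are a single filler token.
tokens = ["the"] * 600 + ["other"] * 9400

small = frequency_spread(tokens, "the", sample_size=50)
large = frequency_spread(tokens, "the", sample_size=2000)

print(f"spread with 50 tokens:   {small:.4f}")
print(f"spread with 2000 tokens: {large:.4f}")
```

The spread for the 50-token samples comes out several times larger than for the 2000-token samples. A wider spread means a wider band of rates that are consistent with the same writer, which is one way to picture why a small corpus lets more candidates plausibly match.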
What makes a comparison corpus reliable for attribution?
The corpus needs to be authentic, sizable, and comparable to the document being analyzed. If the corpus samples come from similar contexts and time periods as the target document, and if there's enough material to establish clear patterns, the attribution becomes more trustworthy.
Related terms
- Authenticity analysis
- The preliminary question: is the note what it claims to be (an external demand by an unknown party) or is it fabricated?...
- Context bias
- A cognitive distortion in which an analyst's conclusions are influenced by prior exposure to investigative assumptions, suspect profiles, or investigator expectations. One...
- Fabrication marker
- A feature in a purported ransom note that is inconsistent with what genuine ransom communications contain, suggesting the note was produced by...
- Idiolect
- The language variety specific to an individual, comprising their characteristic vocabulary, syntactic preferences, spelling habits, punctuation patterns, and discourse-level style. Authorship attribution...
- Length anomaly
- The statistically unusual length of the Ramsey ransom note relative to the population of genuine ransom notes. Length is itself a feature,...
- Questioned document
- Any document whose authorship, authenticity, or content is in dispute. In forensic linguistics, the Ramsey ransom note is a canonical example: its...
- Ransom note
- A written or recorded communication from a person who has taken something (a person, property, or information) and is demanding compensation or...
- Register
- The variety of language associated with a particular situation, task, or relationship. Register varies along dimensions of formality, technicality, and interactional mode....
- Register mixing
- The use of vocabulary and structures from incompatible social or professional contexts within a single text. The Ramsey note moves from corporate-bureaucratic...
Explained in these topics
- The JonBenet Ramsey Ransom Note: Authorship in a High-Profile Case
- The body of known writings from a candidate author used to characterise their stylistic profile. In the Ramsey case, the comparison corpora for the candidates...
- Ransom Note Analysis: Features, Authenticity, and Attribution
- A collection of authentic writing samples from a candidate author, used as the reference base for authorship attribution. The quality, volume, and comparabilit...