Disputed Authorship in Literature and History

Disputed authorship in literary and historical texts ranges from the founding scientific test case of the Federalist Papers to contemporary unmasking of pseudonymous novelists. This topic traces the development of stylometry through canonical studies, examines the ethics of attribution without consent, and defines the different standards of evidence required for scholarly publication versus court.

Last updated: 19 Jun 2026

Disputed authorship analysis applies stylometric and computational methods to assign anonymous or contested texts to a specific author. The field's empirical foundation was established by Mosteller and Wallace's 1964 Bayesian study of the Federalist Papers, which attributed all twelve Hamilton-Madison disputed essays to Madison using function-word frequencies. The same toolkit is now applied to living pseudonymous authors and to courtroom evidence, but those contexts impose distinct ethical obligations and different evidentiary standards that academic attribution does not require.

Disputed authorship is one of the oldest problems in textual scholarship. The questions are consistent across centuries: whether a classical text attributed to Plato was genuinely his, whether a play in the Shakespeare canon has a second hand, whether a novel published under a pen name was written by someone already well known. What changed in the second half of the twentieth century was the availability of statistical tools, which transformed attribution from an argument based on reading impressions into a reproducible calculation with a quantified confidence level.

The canonical starting point is the Federalist Papers. Mosteller and Wallace's 1964 study of function-word frequencies in the disputed essays by Hamilton and Madison remains one of the most influential papers in the history of stylometry, not because the attribution was surprising (Madison was already favoured by historians) but because it demonstrated that a statistical method could match the historical consensus and do so with a quantified confidence level. That proof-of-concept opened a research programme that has been running ever since.

But the more the tools improved, the sharper the ethical questions became. Attributing a manuscript whose author has been dead for 150 years is different from identifying a living novelist who chose a pseudonym for good reasons. And scholarly publication is a different arena from a courtroom, with different standards of evidence, different consequences for error, and different power relationships between the analyst and the subject. This topic covers all three dimensions: the history of the method, the contemporary ethics, and the evidentiary standards that separate publishable scholarship from legally admissible proof.

By the end of this topic you will be able to:

Explain why the Federalist Papers are the canonical benchmark for stylometry and what Mosteller and Wallace's 1964 study demonstrated.
Describe the methodological constraints that make Shakespeare co-authorship detection more tractable than whole-play alternative-candidate attribution.
Distinguish the ethical considerations that apply to attributing living pseudonymous authors from those that apply to historical manuscript attribution.
Compare the evidentiary standards required for academic publication of an attribution claim with those required for court admissibility under the Daubert framework.
Apply the convergence principle to evaluate the relative strength of single-feature versus multi-method stylometric results.

Key terms

Federalist Papers: A collection of 85 essays written by Alexander Hamilton, James Madison, and John Jay under the pseudonym 'Publius' in 1787-1788, advocating for ratification of the U.S. Constitution. Twelve papers were disputed between Hamilton and Madison, making them the canonical test set for stylometry.
Mosteller-Wallace study: The 1964 statistical analysis by Frederick Mosteller and David Wallace that attributed all twelve disputed Federalist Papers to Madison using function-word frequency distributions. The study is the founding empirical demonstration of function-word-based stylometry.
Pseudonymous attribution: The identification of the real author behind a pen name or anonymous publication. When the attributed author is living and has chosen anonymity deliberately, the act raises consent and privacy questions that scholarly attribution of historical texts does not.
Daubert standard: The U.S. federal standard for admissibility of expert testimony, derived from Daubert v. Merrell Dow Pharmaceuticals (1993). It requires that the method be testable, have a known error rate, be peer-reviewed, and be generally accepted in the relevant scientific community.
Literary attribution: The scholarly practice of assigning an anonymous or disputed text to a specific author, typically in a historical context. Differs from forensic attribution in its standards of evidence, its open publication, and its typically non-adversarial setting.

The Federalist Papers: the founding test case

The Federalist Papers were published between October 1787 and August 1788 as a series of newspaper essays arguing for ratification of the U.S. Constitution. All eighty-five appeared under the pseudonym 'Publius'. The authors were identified as Hamilton, Madison, and Jay long before the statistical work, but the authorship of twelve specific papers was disputed between Hamilton and Madison for generations, based on conflicting lists left by both men.

Mosteller and Wallace approached the problem as a Bayesian inference exercise. They selected a set of common function words, computed the frequency distributions for each word across papers of known authorship, and used those distributions to calculate posterior probabilities for the disputed papers. Every one of the twelve disputed papers pointed to Madison, with odds in his favour ranging from very strong to overwhelming. The core analysis appeared in 1963 in the Journal of the American Statistical Association (volume 58, pages 275-309), with the full study following in book form in 1964, and it has been replicated and extended by dozens of subsequent researchers.
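The Bayesian logic can be illustrated with a much-simplified sketch. It assumes each function word's count follows an independent Poisson distribution whose rate depends on the author; this is an assumption made for illustration, not a reconstruction of Mosteller and Wallace's actual model, which was considerably more elaborate. The function names and the per-word rates below are hypothetical and are not figures from the study.

```python
import math
import re
from collections import Counter

def count_words(text, words):
    """Count each marker word in the text; also return the total token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {w: counts[w] for w in words}, len(tokens)

def poisson_log_pmf(k, lam):
    """Log-probability of observing k events under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def posterior_log_odds(text, rates_a, rates_b, prior_log_odds=0.0):
    """Log-odds that author A rather than author B wrote the text.

    rates_a and rates_b map each marker word to an assumed per-token rate
    (must be > 0), estimated beforehand from papers of known authorship.
    """
    counts, n_tokens = count_words(text, rates_a)
    if n_tokens == 0:
        raise ValueError("text contains no tokens")
    total = prior_log_odds
    for word, k in counts.items():
        total += poisson_log_pmf(k, rates_a[word] * n_tokens)
        total -= poisson_log_pmf(k, rates_b[word] * n_tokens)
    return total

# Hypothetical rates, chosen only to make the example readable.
RATES_A = {"upon": 0.003, "whilst": 0.0001}
RATES_B = {"upon": 0.0002, "whilst": 0.0005}

sample = "Upon the whole, it depends upon the people."
# Positive values favour author A, negative values favour author B.
print(posterior_log_odds(sample, RATES_A, RATES_B))
```

Working in log-odds keeps the evidence from each word additive, which is why a handful of strongly discriminating words can dominate the result. A sample this short is used only to keep the example readable; real estimates need far more text.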

What makes the Federalist Papers ideal as a test set is not just the historical documentation of authorship but the specific character of the task. Hamilton and Madison were both educated, prolific, contemporary authors writing in the same genre about the same subject. This controls for topic, period, and genre variation: any stylistic difference detected must reflect individual habit rather than subject matter or era. Few real-world attribution problems have such clean conditions.

Mosteller-Wallace approach: frequency distributions of function words distinguish Hamilton from Madison.

Shakespeare authorship and its methodological limits

No authorship question has generated more stylometric effort than whether the plays attributed to William Shakespeare were written by him. Numerous candidates have been proposed, including Francis Bacon, Christopher Marlowe, and the Earl of Oxford. Stylometric analysis has been brought to bear repeatedly, with results generally supporting Shakespeare's authorship of most of the canonical plays while finding credible evidence of co-authorship with John Fletcher in several late works.

The methodological difficulties are significant. The surviving Shakespeare corpus is small (approximately 900,000 words across the plays and poems), and all of it is in a single genre written over a short period. The comparison corpora for alternative candidates vary enormously in size and genre. The writing is often collaborative in ways that are impossible to disentangle cleanly. And the proposed non-Shakespearean candidates wrote in styles that do not match the plays nearly as well as Shakespeare's known style does, a result that consistently emerges from rigorous analysis.

Elena Ferrante, Robert Galbraith, and the ethics of living-author attribution

In July 2013, the Sunday Times published a report identifying J.K. Rowling as the author of The Cuckoo's Calling, a crime novel published under the name Robert Galbraith. The identification drew on stylometric analyses by two scholars. Patrick Juola of Duquesne University was the first analyst the Sunday Times contacted, and his findings led the newspaper to confront Rowling; Peter Millican of Oxford University conducted a second, independent analysis that corroborated Juola's results. Rowling confirmed the attribution the same day. The book immediately became a bestseller. The attribution was accurate, the method was valid, and Rowling's pseudonymity ended roughly three months after the novel's release.

In 2016, the Italian journalist Claudio Gatti identified the pseudonymous novelist Elena Ferrante as Anita Raja, a translator, using a combination of financial records and stylometric analysis. Ferrante had maintained her anonymity for over twenty years and had explained in interviews that it was central to how she wanted her work to be received. The attribution was contested by some stylometrists, who argued the methodology was insufficiently rigorous, and Ferrante herself declined to confirm it.

Case	Attribution method	Subject's consent	Outcome and controversy
Federalist Papers (Hamilton/Madison, 1964)	Function-word Bayesian analysis	N/A (both subjects deceased, 19th c.)	Widely accepted; replicated many times
Robert Galbraith = J.K. Rowling (2013)	Function-word stylometry and PCA	Not sought; author confirmed after publication	Accurate; Rowling's chosen pseudonymity destroyed
Elena Ferrante = Anita Raja (2016)	Financial records + stylometric analysis	Not sought; attribution denied/unconfirmed	Methodologically contested; significant ethical criticism

The ethical debate turns on competing values that do not resolve neatly. One side: stylometric analysis is research; researchers have the right to publish valid findings; authorship is not a privacy-protected attribute in most legal systems. The other side: living authors who choose pseudonymity have reasons, and those reasons may include safety, creative freedom, and the desire to have their work judged on its own terms. Stripping that anonymity without consent harms real people. Both positions have serious advocates in the field, and no professional consensus has resolved the question.

Scholarly versus legal standards of attribution evidence

A stylometric result that is publishable in a peer-reviewed linguistics journal may fall well short of what a court requires. The distinctions matter because the same methodology is deployed in both contexts and the gap between them is not always made explicit to the audiences consuming the results.

Academic publication standard
A valid method, a reasonable corpus, replication across multiple features or methods, and peer review. Error rates are desirable but not always compulsory. Open-set validation is rare in literary stylometry; most studies are closed-set.
Expert witness report standard
All of the above, plus: a known error rate specific to the case conditions (text length, candidate number, genre), interpretability for a lay jury, compliance with admissibility standards, and a statement of scope that distinguishes what the evidence can and cannot support.
Courtroom admissibility standard (Daubert, U.S.)
The method must be testable, have a known error rate, be peer-reviewed, and be generally accepted in the relevant scientific community. The court applies these criteria at a pre-trial hearing; methods that fail are excluded before the jury hears them.

The practical implication for a forensic linguist is that stylometric evidence that would be accepted at a humanities conference may not survive a Daubert challenge. It also cuts the other way: a deliberately conservative result, stating only that the questioned text is 'more consistent with Candidate A than with the other candidates at a 91 per cent accuracy level', is less dramatic than a flat attribution claim but far more defensible.

Three evidentiary tiers for stylometric attribution: each column lists the requirements that must be satisfied, showing what courts demand beyond scholarly publication.

What counts as sufficient evidence in each context

The clearest way to think about this is to ask what it would take to falsify the claim. For an academic attribution, the answer is a better-validated competing method that produces a different result. A forensic attribution has to clear a higher bar before it is offered at all: a known error rate that bounds the probability of a false attribution, replication by at least one independent analyst, and a comparison corpus large enough to yield stable estimates. The burden is higher in court because the consequence of error is higher.
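One way to make "a known error rate that bounds the probability of a false attribution" concrete is to report an interval around the error rate observed in validation, rather than the point estimate alone. The sketch below uses the Wilson score interval, a standard choice for binomial proportions. Its use here is an illustrative assumption, not a method prescribed by any court, and the trial counts are hypothetical.

```python
import math

def wilson_interval(errors, trials, z=1.96):
    """Wilson score interval for an error rate (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = errors / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical validation run: 24 wrong attributions in 100 held-out trials.
low, high = wilson_interval(24, 100)
print(f"observed error 24%, plausible range {low:.1%} to {high:.1%}")
```

The upper end of the interval is the figure that matters when stating scope: with few validation trials it can sit well above the observed rate, which is itself an argument for a larger comparison corpus.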

Several real forensic attributions have been rejected by courts precisely because the expert could not state a reliable error rate, had a comparison corpus that was too small, or could not explain the method in terms a jury could follow. These rejections are not failures of stylometry as a discipline; they are the correct functioning of admissibility standards. A method that cannot be explained or validated should not influence a criminal verdict, regardless of how impressive it looks to a specialist audience.

Worked example

Applying the Mosteller-Wallace method to a modern dispute

How the 1964 approach translates to a contemporary anonymous editorial.

In 2018, the New York Times published an anonymous op-ed presented as the work of a senior Trump administration official and describing internal resistance to the President's decisions. The publication prompted immediate stylometric speculation about the author's identity. The candidate pool looks closed (senior administration officials), but the problem is effectively open-set: the true author belongs to a large population, and only some of its members have enough attributed writing to be tested.

Corpus construction: collect publicly attributed writing from candidate officials. The op-ed was 960 words, so reference corpora need to be substantially larger for stable estimates.
Feature selection: function words (the Mosteller-Wallace set of about 70 grammatical words) and character n-grams, chosen before examining the op-ed to avoid feature-selection bias.
Delta analysis: the op-ed vector is compared to each candidate's corpus centroid. Two candidates appear closer than the others, but the 960-word length means confidence intervals are wide.
Cross-validation: leave-one-out CV on 960-word segments of each candidate corpus. At this text length, correct attribution falls to 76 per cent, meaning the 24 per cent error rate must be stated prominently.
Report: the op-ed is most consistent with Candidate X among those tested. The 24 per cent error rate at this text length and candidate-set size means this result should be treated as suggestive rather than conclusive, and should not be published as a definitive identification.
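The Delta and cross-validation steps above can be sketched as follows. This is a minimal illustration that assumes a Burrows-style Delta (mean absolute difference of z-scored relative frequencies) and nearest-centroid attribution; the source says only "Delta", so that choice, the function names, and the tokenisation are all assumptions. Segments are expected to be pre-cut to the length of the questioned text.

```python
import re
import statistics
from collections import Counter

def rel_freqs(text, features):
    """Relative frequency of each feature word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [counts[f] / n for f in features]

def centroid(vectors):
    """Per-feature mean of a list of vectors."""
    return [statistics.fmean(col) for col in zip(*vectors)]

def attribute(questioned, segments_by_author, features):
    """Return (closest_author, {author: delta}) using nearest-centroid Delta."""
    vecs = {a: [rel_freqs(s, features) for s in segs]
            for a, segs in segments_by_author.items()}
    pooled = [v for vs in vecs.values() for v in vs]
    means = centroid(pooled)
    # A feature with zero variance gets sd 1.0 so it contributes nothing.
    sds = [statistics.pstdev(col) or 1.0 for col in zip(*pooled)]

    def z(vec):
        return [(x - m) / s for x, m, s in zip(vec, means, sds)]

    q = z(rel_freqs(questioned, features))
    scores = {}
    for author, vs in vecs.items():
        c = z(centroid(vs))
        scores[author] = sum(abs(a - b) for a, b in zip(q, c)) / len(features)
    return min(scores, key=scores.get), scores

def leave_one_out_accuracy(segments_by_author, features):
    """Hold out each known segment in turn and check whether it is re-attributed correctly."""
    correct = total = 0
    for author, segs in segments_by_author.items():
        for i, held_out in enumerate(segs):
            rest = dict(segments_by_author)
            rest[author] = segs[:i] + segs[i + 1:]
            if not rest[author]:
                continue  # cannot test an author with a single segment
            guess, _ = attribute(held_out, rest, features)
            correct += guess == author
            total += 1
    return correct / total if total else float("nan")
```

One minus the value returned by `leave_one_out_accuracy` is the error rate that the report step has to state. The feature list should be passed in unchanged from a choice made before the questioned text was examined.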

The op-ed was eventually attributed to Miles Taylor in 2020, when he self-identified. Post-hoc stylometric analysis did find features consistent with his writing, but this is exactly the kind of confirmation bias risk that makes pre-identification blind analysis so important. The Mosteller-Wallace method is sound. Its application requires the same discipline about error rates and scope as any other forensic method.


Key Takeaways

The Federalist Papers, with their documented ground truth, are the canonical benchmark for stylometry: Mosteller and Wallace's 1964 function-word study attributed all twelve disputed papers to Madison and established the empirical foundation of the field.
Shakespeare authorship debates are methodologically tractable at the co-authorship level (rolling Delta detects style zones within plays) but the attribution-to-an-alternative-candidate question has not been resolved because no proposed alternative's style matches as well as Shakespeare's known writing does.
Attributing living pseudonymous authors raises consent and privacy questions absent from historical cases: the cases of Elena Ferrante and Robert Galbraith (J.K. Rowling) have made these questions central to professional ethics in the field.
Academic publication and legal admissibility require different evidentiary standards: courts additionally require an interpretable method, a known error rate for the specific case conditions, and compliance with jurisdiction-specific admissibility criteria such as Daubert.
The convergence principle is as important in literary stylometry as in forensic casework: results that hold across multiple independent methods are more credible than results from a single feature or a single classifier.

Why are the Federalist Papers the classic test case for stylometry?

The Federalist Papers are ideal as a stylometric test set because the authorship of most papers is historically documented. Eighty-five essays were written under the pseudonym 'Publius' by Alexander Hamilton, James Madison, and John Jay in 1787-1788. Twelve papers were disputed between Hamilton and Madison. Mosteller and Wallace's 1964 statistical analysis attributed all twelve disputed papers to Madison using function-word frequencies, and this result has been repeatedly replicated. Because the ground truth is documented, the case lets researchers measure how well a method actually performs.

What did Mosteller and Wallace's 1964 study demonstrate?

Frederick Mosteller and David Wallace showed that the disputed Federalist Papers could be attributed to Madison with high statistical confidence using the relative frequencies of common grammatical words such as 'by', 'from', 'to', and 'upon'. Madison and Hamilton had consistent and significantly different frequency profiles for these words across their undisputed papers. This was one of the first rigorous statistical demonstrations that function-word frequencies carry a stable authorial signal, predating modern computational stylometry by decades.

Is it ethical for researchers to attribute pseudonymous living authors without their consent?

This is a live ethical debate in the field. The attribution of Elena Ferrante (2016) and the identification of J.K. Rowling as Robert Galbraith (2013) were both conducted and published without the authors' consent. Both authors had chosen pseudonymity deliberately. Critics argue that researchers should obtain consent or withhold living-author attributions regardless of methodological accuracy. Defenders argue that the analysis itself is scholarly work and that the right to publish research findings is separate from the subjects' preferences. No consensus position has been reached in the field.

What is the difference between evidence sufficient for academic publication and evidence sufficient for a court?

Academic publication of an attribution claim requires peer review and methodological soundness: a well-constructed feature set, a result that replicates across methods, and ideally a validated error rate. Courts require all of that plus interpretability (the method must be explainable to a lay jury), a known error rate for the specific case conditions, and typically compliance with jurisdiction-specific admissibility standards such as Daubert in the U.S. Forensic attribution also faces the practical constraint that it must address the open-set problem, while most academic stylometry studies operate in closed sets.
