Spectrographic Voiceprint History and Its Modern Rejection

The cautionary tale that defines what voice identification cannot be: Lawrence Kersta's 1962 introduction of the spectrographic voiceprint at Bell Labs and its promotion through the 1970s by the International Association of Voice Identification (IAVI); the 1979 NAS report 'On the Theory and Practice of Voice Identification', which concluded that the method had no scientific foundation for individualisation claims; the FBI's withdrawal of spectrographic voice identification testimony in 1989; the residual use of spectrographic analysis as a visualisation tool rather than an identification method; and the lessons the field absorbed about premature claims of forensic individualisation.

Last updated: 19 Jun 2026

The spectrographic voiceprint method, introduced by Lawrence Kersta at Bell Telephone Laboratories in 1962, claimed that speech patterns plotted on a sound spectrogram were as unique as fingerprints. The 1979 National Academy of Sciences report found no scientific foundation for that individualisation claim, identifying the absence of standardised decision criteria, independent proficiency testing, and field-representative validation. The FBI formally withdrew from spectrographic voice identification testimony in 1989. Modern forensic speaker comparison replaced visual spectrogram matching with a Bayesian likelihood ratio framework that explicitly quantifies uncertainty rather than asserting categorical identification.

Key takeaways

Kersta's original Bell Labs studies used controlled studio recordings and small, matched speaker pools; these conditions did not represent real forensic casework.
The 1979 NAS panel concluded that spectrographic voice identification lacked validated error rates, standardised decision criteria, and independent proficiency testing.
Certification by the inventor's own organisation (IAVI) was circular validation: it did not constitute independent proof of accuracy.
Institutional inertia kept spectrographic evidence in US courts for a full decade after the NAS verdict; Daubert (1993) gave courts the tools to exclude it definitively.
Modern forensic speaker comparison uses likelihood ratios anchored to acoustic measurements, not visual spectrogram matching, addressing the core NAS objection.

In 1962, Lawrence Kersta, an acoustical engineer at Bell Telephone Laboratories in Murray Hill, New Jersey, published a brief paper in Nature claiming that the frequency patterns of an individual's speech, plotted on a sound spectrogram, were as unique as fingerprints. He coined the word "voiceprint" and, within a decade, had founded the International Association of Voice Identification (IAVI) and was testifying in criminal courts across the United States and Canada that a trained analyst could match a suspect's voice to a crime-scene recording reliably enough to support a conviction. The claim borrowed the authority of fingerprint evidence, and the admissibility framework built around it, without equivalent validation.

Fingerprint individualisation had been in courtrooms for half a century by then; tool-mark and hair comparison evidence were expanding; the forensic community was accustomed to the idea that physical traces carried enough individuality to identify a specific person. The spectrogram produced a visual pattern a jury could inspect on a chart. Kersta's early studies, conducted on Bell Labs recordings under controlled studio conditions, appeared to show error rates in the 0 to 6 per cent range, and law enforcement agencies and prosecutors recognised the potential investigative value.

Over the next two decades, independent research systematically tested the claims behind voiceprint identification. The 1979 National Academy of Sciences report on voice identification concluded that the scientific foundation was absent. The FBI eventually withdrew from spectrographic voice identification testimony in 1989. The episode remains the clearest historical case study in what happens when a forensic technique is deployed in court before adequate validation, and it shaped the principles that now govern the introduction of any new forensic science discipline.

By the end of this topic you will be able to:

Explain what a sound spectrogram measures and why Lawrence Kersta's 1962 voiceprint claim was considered credible at the time.
Identify the four specific deficiencies the 1979 NAS panel found in spectrographic voice identification, and distinguish the rejected individualisation claim from legitimate spectrogram use.
Describe the institutional factors that delayed the FBI's withdrawal from spectrographic testimony for a decade after the NAS verdict.
Explain how the Bayesian likelihood ratio framework differs epistemologically from the categorical match/no-match conclusion of the voiceprint era.
Apply the lessons of the voiceprint episode to evaluate premature individualisation claims in other forensic disciplines.

Lawrence Kersta, Bell Labs and the Birth of Voiceprint

The sound spectrograph, the instrument at the centre of the voiceprint controversy, was developed at Bell Laboratories during World War II as a tool for military signals intelligence. It plots the frequency content of a speech signal over time, with frequency on the vertical axis, time on the horizontal axis, and intensity represented by the darkness of the trace. A typical forensic spectrogram shows roughly 0 to 8,000 Hz over the duration of a spoken phrase, resolved into the formant bands that correspond to the resonances of the vocal tract.
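The quantity a spectrogram displays can be sketched digitally with a short-time Fourier transform. The example below is a minimal NumPy sketch, not a model of the analogue filter-bank instruments used in the voiceprint era; the function name `spectrogram_db`, the 512-sample Hann window, the 128-sample hop and the synthetic two-tone test signal are all illustrative assumptions. Sampling at 16 kHz gives the 0 to 8,000 Hz range mentioned above.

```python
import numpy as np

def spectrogram_db(signal, fs, frame_len=512, hop=128):
    """Short-time Fourier transform magnitude in dB (illustrative sketch).

    Returns (freqs, times, spec) where spec[i, j] is the level of frequency
    bin i in frame j, i.e. what a printed spectrogram shows as trace darkness.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[j * hop : j * hop + frame_len] * window for j in range(n_frames)],
        axis=1,
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=0))  # (frame_len // 2 + 1, n_frames)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)    # 0 .. fs / 2 in Hz
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs  # frame centres in s
    spec = 20 * np.log10(magnitude + 1e-12)           # small offset avoids log(0)
    return freqs, times, spec

# Hypothetical one-second test signal: a strong 700 Hz and a weaker 1200 Hz component
fs = 16000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
freqs, times, spec = spectrogram_db(sig, fs)
peak_hz = freqs[np.argmax(spec.mean(axis=1))]  # frequency of the darkest band
```

The darkest horizontal band in `spec` should fall within one frequency bin (31.25 Hz here) of the 700 Hz component. Note what the sketch does not do: nothing in it decides whether two such images come from the same speaker, which is exactly the step the voiceprint method left to visual judgment.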

Kersta's 1962 Nature paper, "Voiceprint Identification," presented results from two studies:

Trained listeners matched spectrograms of the same word spoken by each of nine speakers, achieving near-perfect accuracy.
Trained listeners attempted identification from spectrograms of connected speech samples drawn from a pool of 123 male speakers, again achieving high accuracy.

The critical conditions of both studies received insufficient scrutiny: trained Bell Labs personnel, matched studio-recorded samples, and a small controlled pool. These conditions did not represent real forensic casework.

By the mid-1960s, Kersta was testifying in criminal courts. A 1966 California case, People v. King, was among the earliest, though the trial court treated the evidence cautiously. By the early 1970s, courts in the US, Canada, and the UK were hearing spectrographic voice identification evidence with increasing regularity. In several cases, defendants were convicted partly on voiceprint evidence presented by IAVI-trained examiners who had completed Kersta's own training programme. The circular structure of the validation, where the technique's validity was supported primarily by the testimony of examiners trained by its inventor, was a structural flaw that would later feature prominently in the NAS critique.

Spectrogram schematic: frequency bands (formants F1-F3) plotted against time for a short speech segment; the voiceprint claim held that these patterns were person-specific enough for identification.

Courtroom Expansion and Early Scientific Resistance in the 1970s

Through the 1970s, voiceprint testimony spread to homicide cases, kidnapping prosecutions, and organised crime investigations in the United States and to criminal proceedings in the United Kingdom and Australia. A spectrogram chart, labelled and annotated by an expert, appeared to offer the jury tangible scientific evidence rather than opinion.

Several independent research groups began reporting results that contradicted Kersta's original accuracy claims:

Bolt, Cooper, David, Denes, Pickett, and Stevens (Journal of the Acoustical Society of America, 1970): a critical review by a group of speech scientists concluding that the available data did not establish the reliability claimed for courtroom use, particularly when recordings came from different conditions (telephone vs. studio, whisper vs. normal voice, delayed vs. contemporaneous samples).
Tosi and colleagues (Michigan State University, 1972, funded by the Law Enforcement Assistance Administration): identification error rates of 6 to 13 per cent among trained examiners using 250 male speakers, substantially higher than the near-zero rates claimed for courtroom use.

The Michigan State study was read ambiguously. Some interpreted it as validating the technique (accuracy was above chance). Critics countered that a 13 per cent error rate, applied to an identification that could result in imprisonment and with no defined numerical threshold for a conclusion, was ethically indefensible. Legal challenges multiplied.

Jurisdiction responses diverged:

United Kingdom: courts generally treated spectrographic voice identification more sceptically than US courts throughout this period.
Canada: R. v. Trudel (1975) admitted the evidence; subsequent Canadian case law moved toward requiring corroborating identification before convictions could rest significantly on voiceprint.
United States: the FBI Technical Development Unit began evaluating voiceprint for operational use in the late 1960s. By the mid-1970s, FBI examiners were testifying in federal criminal cases. The Bureau invested substantial resources in training a cadre of voiceprint examiners, a decision that would create institutional inertia when the scientific foundation was later more formally questioned.

The 1979 NAS Report: No Scientific Foundation

In 1978, the National Academy of Sciences convened a committee at the request of the Law Enforcement Assistance Administration to evaluate the scientific basis for forensic speaker identification. The committee comprised acousticians, phoneticians, engineers, and statisticians. Its report, "On the Theory and Practice of Voice Identification," was published in 1979 and constitutes one of the most significant evaluations of a forensic technique ever conducted.

The NAS committee's core finding was unambiguous: spectrographic voice identification had not been validated to a standard that would support its use as reliable evidence of speaker identity in criminal cases. The committee identified four specific deficiencies:

Unrepresentative research conditions. Supporting studies used controlled, matched samples and restricted speaker pools. Real forensic casework involves telephonic recordings, background noise, temporal gaps, and recording distortion.
No standardised decision criteria. The technique had no defined boundary between a match and a non-match. Conclusions were entirely the examiner's subjective judgment.
No independent proficiency testing. The accuracy claimed by IAVI-trained examiners had not been verified by any party independent of the training organisation.
Unacceptably high error rates. The existing research evidence, reviewed in full, showed error rates substantially higher than those acceptable for a forensic identification technique.

The committee was careful to note that spectrograms have legitimate uses as analytical tools for phonetic analysis. The rejected claim was the specific one of forensic individualisation: the assertion that a given voice, heard on a recording, came from and only from one specific person, to a degree of certainty suitable for a criminal conviction.

The report prompted immediate legal challenges to pending and past convictions based on voiceprint evidence. In the US, several state courts re-examined the admissibility of the technique. California's Supreme Court had already held, in People v. Kelly (1976), that general scientific acceptance of voiceprint evidence had not been shown; federal circuits diverged on admissibility under the Frye standard of general scientific acceptance. The NAS report made it substantially harder to establish that "general acceptance" existed, though it did not produce an immediate uniform ban.

Voiceprint-era claims (Kersta, IAVI)	NAS 1979 findings	Implication for courts
Error rates near 0% for trained examiners	Error rates of 6-13% in controlled research; higher in field conditions	Claimed accuracy does not survive independent replication
Visual spectrogram comparison is objective	No standardised decision criteria; conclusions are subjective examiner judgment	No verifiable threshold for a match or non-match conclusion
Training by IAVI certifies competency	No independent proficiency testing; validation circular (inventor trains examiners)	Certification does not prove operationally validated accuracy
Studio research generalises to casework	Field conditions (telephone noise, temporal gap, disguise) degrade accuracy substantially	Research conditions were unrepresentative of actual forensic use

The FBI's 1989 Withdrawal and the Long Tail of Spectrographic Evidence

The National Academy of Sciences report did not produce an immediate uniform withdrawal. The FBI continued to conduct and testify to spectrographic voice identification through the 1980s. Several factors contributed to this persistence. The FBI had trained a cadre of examiners with years of operational experience; those examiners continued to believe in the technique's reliability based on their own casework. Prosecutors and law enforcement agencies that had used voiceprint evidence in convictions had institutional and legal interests in the technique's continued validity. And the courts' Frye standard, which asked whether a technique was generally accepted within the relevant scientific community rather than whether it had been validated to a defined scientific standard, left room for argument that the IAVI community of practitioners itself constituted a relevant reference group.

The turning point came internally within the FBI. A series of internal proficiency tests and examiner studies conducted in the late 1980s produced results consistent with the NAS critique: error rates that were substantially higher in blind testing than in open testing, and a lack of reproducibility between different examiners examining the same samples. In 1989, the FBI formally withdrew spectrographic voice identification from its testimony repertoire. FBI examiners would no longer offer voiceprint conclusions in federal court.

The withdrawal did not immediately resolve the admissibility question in state courts. Some state and local law enforcement agencies continued to use voiceprint examiners trained in the IAVI tradition through the 1990s. After Daubert (1993) required trial courts to act as gatekeepers for scientific evidence, weighing factors including error rate, peer review, and general acceptance, defendants had stronger tools to challenge spectrographic evidence. By the early 2000s, spectrographic individualisation was effectively excluded from courts in the United States, Canada, the United Kingdom, and Australia. The Daubert framework's insistence on measured error rates proved particularly hostile to a technique that had never produced a validated, peer-reviewed error-rate figure suitable for forensic use.

In Europe, the parallel development of the Bayesian likelihood ratio framework for speaker comparison (described in the modern automated speaker recognition topic) provided courts with a scientifically defensible alternative methodology whose uncertainty was explicitly quantified, making the limitations of the old spectrographic approach even more apparent in comparison.

What the Field Absorbed: Premature Individualisation Claims and Their Cost

The spectrographic voiceprint controversy produced four lessons that now operate as guardrails across forensic science.

Lesson 1: Visual pattern similarity is not scientific individualisation. Matching patterns in two spectrograms, two tool marks, or two handwriting samples tells an observer only that the patterns are similar. Without a baseline frequency (how often equally similar patterns appear in different-source comparisons), a similarity observation cannot support a probability statement about source identity. Forensic science now requires error rates from research involving known-source and known-different-source pairs under casework-representative conditions. This is exactly the demand that the 2009 NAS critique of fingerprint individualization made three decades later for friction-ridge analysis.
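Lesson 1 is, at bottom, arithmetic: an error rate exists only relative to a stated decision rule and a set of ground-truth comparisons. The sketch below assumes a hypothetical numeric similarity score for each comparison and a "match if score >= threshold" rule; the function name and the scores are invented for illustration and do not come from any study.

```python
import numpy as np

def error_rates(same_source_scores, different_source_scores, threshold):
    """Empirical error rates for the rule 'declare a match if score >= threshold'.

    Returns (false_match_rate, false_non_match_rate):
    - false match: a known different-source pair scored at or above threshold
    - false non-match: a known same-source pair scored below threshold
    """
    same = np.asarray(same_source_scores, dtype=float)
    diff = np.asarray(different_source_scores, dtype=float)
    false_match_rate = float(np.mean(diff >= threshold))
    false_non_match_rate = float(np.mean(same < threshold))
    return false_match_rate, false_non_match_rate

# Invented similarity scores, for illustration only
same_pairs = [0.91, 0.85, 0.78, 0.88, 0.62, 0.95]
diff_pairs = [0.40, 0.72, 0.55, 0.81, 0.30, 0.66, 0.48, 0.59]

for threshold in (0.75, 0.85):
    fmr, fnmr = error_rates(same_pairs, diff_pairs, threshold)
    print(f"threshold {threshold}: false match {fmr:.3f}, false non-match {fnmr:.3f}")
```

Raising the threshold from 0.75 to 0.85 in this toy data removes the one false match but doubles the false non-matches (from 1 of 6 to 2 of 6). Without a declared threshold, as in voiceprint practice, neither number can be stated at all.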

Lesson 2: Certification by the technique's inventors is not validation. IAVI certification meant an examiner had completed Kersta's course and met Kersta's criteria. It did not mean conclusions had been tested against ground truth by an independent party. Modern forensic accreditation frameworks, including ILAC-G19 and the ISO/IEC 17025 standard applied to forensic laboratories, require proficiency testing by bodies independent of the training organisation.

Lesson 3: Institutional constituencies resist evidence-based revision. The decade-long gap between the 1979 NAS report and the 1989 FBI withdrawal was not primarily a scientific disagreement. It was an institutional one. The UK's Forensic Science Regulator now publishes Codes of Practice requiring laboratories to demonstrate that every casework method has been validated against an appropriate scientific standard before use, not after it has accumulated a decade of courtroom history.

Lesson 4: Court gatekeeping requires statutory or procedural teeth. In India, the Bharatiya Sakshya Adhiniyam 2023 (Section 79, expert evidence) preserves the court's role in evaluating the scientific basis of expert testimony, parallel to Daubert's gatekeeping function in the United States and the gatekeeper role established in R v. Bonython (1984) in Australia. Any new forensic technique deployed in Indian courts is, in principle, subject to the same scientific-validity scrutiny that eventually excluded voiceprint from Western courts. In practice, the development of robust validation and proficiency-testing infrastructure in India's state and central forensic science laboratories remains a work in progress.

From Voiceprint to Likelihood Ratio: The Disciplinary Recovery

The collapse of the spectrographic voiceprint paradigm created space for an alternative framework that had been developing in the acoustic phonetics research community since the early 1980s. The Bayesian likelihood ratio approach, which asks "how much more likely is the observed acoustic evidence if the recordings came from the same speaker than if they came from different speakers," offers a conclusion that is explicitly conditional and quantified rather than categorical. An examiner who reports a likelihood ratio of 100 is saying that the evidence is 100 times more likely under the same-speaker hypothesis than under the different-speaker hypothesis, not that the voices are identical, and not that no other person could have produced those patterns.
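A deliberately simplified version of that calculation: one acoustic feature, Gaussian models for the suspect's within-speaker variation and for the relevant population, and the LR as the ratio of the two densities at the questioned value. The feature (a vowel's mean F2 in Hz), all parameter values and the function names are hypothetical; casework systems use many features, population databases and calibration, none of which is modelled here.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def likelihood_ratio(x, suspect_mean, within_sd, population_mean, population_sd):
    """LR = p(x | same speaker) / p(x | different speaker) for one feature.

    Numerator: the suspect's own variability (within-speaker model).
    Denominator: how typical x is among other speakers (population model).
    """
    same = normal_pdf(x, suspect_mean, within_sd)
    different = normal_pdf(x, population_mean, population_sd)
    return same / different

# Hypothetical values for one feature, e.g. mean F2 of a vowel in Hz
lr = likelihood_ratio(1500, suspect_mean=1510, within_sd=40,
                      population_mean=1400, population_sd=150)
lr_far = likelihood_ratio(1300, suspect_mean=1510, within_sd=40,
                          population_mean=1400, population_sd=150)
print(f"LR = {lr:.2f} (log10 LR = {math.log10(lr):.2f}); far value LR = {lr_far:.2e}")
```

With these made-up numbers the LR is about 4.5, which is weak support for the same-speaker hypothesis; a questioned value of 1,300 Hz yields an LR far below 1, supporting the different-speaker hypothesis. Either way the output is a strength-of-evidence statement, never a categorical "match".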

This epistemological shift was formalised in European forensic science organisations during the 1990s and 2000s. The European Network of Forensic Science Institutes (ENFSI), through its Speaker Identification Working Group, developed guidance documents recommending likelihood ratio reporting as the standard framework for forensic speaker comparison conclusions. The ENFSI Best Practice Manual for the Methodology of Forensic Speaker Comparison (2021, updated 2022) codifies this requirement (the topic is covered fully in the companion article on modern automated speaker recognition and the ENFSI BPM).

In the United Kingdom, the courts moved incrementally. R v. Robb (1991) admitted auditory phonetic evidence on the understanding that the expert was offering an informed opinion, not a scientific individualisation. Subsequent case law, particularly in the Crown Court, has required experts to quantify their uncertainty or at minimum to acknowledge the probabilistic nature of their conclusions. The progression from Robb to the current Crown Prosecution Service guidance reflects the same movement from "matching voice patterns" to "quantified likelihood ratio" that occurred in DNA evidence between the late 1980s and mid-1990s.

In the United States, the post-Daubert landscape has been more uneven. The National Commission on Forensic Science's 2016 report on speaker identification flagged the absence of a large-scale, population-representative database for likelihood ratio calibration as a continuing gap. NIST's Speaker Recognition Evaluation series provides rigorous benchmarking of automated systems (the topic of the companion article), but the translation from automated system performance to the testimony of a human forensic phonetician remains contested.

The voiceprint episode thus closed with the discipline of forensic speaker comparison not abandoned but reformed: more cautious about what conclusions it claims, more explicit about uncertainty, and governed by a proficiency-testing and accreditation infrastructure that the IAVI era never possessed.

1962-1970s: Expansion without validation
Kersta introduces voiceprint at Bell Labs; IAVI founded; testimony spreads to US, Canadian, and UK courts before independent replication of accuracy claims.
1970s: Independent research raises doubts
Bolt et al. (1970) challenge the reliability claims and Tosi et al. (1972) report error rates of 6-13% in an independent study; legal challenges increase; courts begin dividing on admissibility.
1979: NAS report condemns the technique
National Academy of Sciences panel finds no scientific foundation for forensic individualisation; identifies absence of standardised criteria, independent proficiency testing, and field-representative validation research.
1979-1989: Institutional resistance delays withdrawal
FBI and state forensic agencies continue voiceprint testimony; Frye standard provides cover; a decade passes between scientific verdict and institutional response.
1989: FBI withdraws spectrographic testimony
Internal proficiency tests confirm NAS findings; FBI formally ends voiceprint conclusions; post-Daubert (1993) courts develop stronger gatekeeping tools.
1990s-2000s: Bayesian framework replaces categorical claim
ENFSI speaker identification working group develops likelihood ratio guidance; UK courts move from R v. Robb's expert opinion to quantified LR conclusions; modern automated systems benchmarked via NIST SRE.

Voiceprint era: examiner asserts categorical match with no stated error rate; modern LR framework: examiner reports a ratio with explicit hypothesis, population baseline, and quantified uncertainty.

Key terms

Sound spectrograph: An instrument that plots the frequency content of a speech signal over time, with frequency on the vertical axis, time on the horizontal, and acoustic energy represented by trace darkness. Developed at Bell Labs in the 1940s; the visual output is a spectrogram.
Voiceprint: Lawrence Kersta's 1962 term for the claim that the spectrographic pattern of an individual's speech is unique enough to serve as a forensic identifier analogous to a fingerprint. The term is now largely abandoned in scientific literature because the uniqueness claim was not validated.
IAVI (International Association of Voice Identification): Organisation founded by Kersta in the late 1960s to train and certify forensic voice identification examiners in the spectrographic comparison method. Certification by IAVI was found by the 1979 NAS panel not to constitute independent validation.
NAS 1979 report: The National Academy of Sciences report 'On the Theory and Practice of Voice Identification,' which concluded that spectrographic voice identification lacked the scientific foundation to support forensic individualisation claims, including standardised criteria, independent proficiency testing, and validated error rates.
Formant: A resonance peak in the acoustic spectrum of speech, produced by the shape of the vocal tract. The first three formants (F1, F2, F3) carry most of the information distinguishing vowel sounds and differ between speakers to some degree.
Frye standard: The US legal test for admissibility of scientific evidence, originating from Frye v. United States (1923), which required that a technique be 'generally accepted' within its relevant scientific community. Replaced in federal courts by the Daubert standard (1993), under which judges also weigh factors such as testability, peer review, and known error rates in assessing methodological reliability.
Daubert standard: The US Supreme Court's 1993 standard for admissibility of expert scientific testimony, requiring federal courts to act as gatekeepers by evaluating whether the method is testable, has been peer-reviewed, has known error rates, and is generally accepted. Made it substantially harder to admit voiceprint evidence.
Likelihood ratio (LR): In forensic speaker comparison, the ratio of the probability of the observed acoustic evidence given that the two recordings came from the same speaker to the probability of the same evidence given that they came from different speakers. The framework that replaced the categorical match/no-match conclusion of the voiceprint era.
Proficiency testing: Blind or semi-blind testing of an examiner's conclusions against known ground truth by a body independent of the training organisation. Required under ISO/IEC 17025 and ILAC-G19 for accredited forensic laboratories; its absence in the IAVI certification scheme was a core critique in the 1979 NAS report.
ENFSI BPM: The European Network of Forensic Science Institutes Best Practice Manual for Forensic Comparison of Speech (2015, revised 2022), which codifies the likelihood ratio framework as the required reporting standard for forensic speaker comparison in ENFSI member laboratories.

Worked example

Court Rejection of Voiceprint Evidence After a 1972 US Murder Conviction, and Its Legacy

The voiceprint identification that helped convict a defendant in 1972 was challenged as its scientific basis eroded, illustrating how a forensic discipline's courtroom currency can collapse when the research catches up with the claims.

Scene: A 1972 US murder trial in Michigan. The prosecution calls Lawrence Kersta, the Bell Labs scientist who coined the term "voiceprint" and whose 1962 paper had claimed speaker identification accuracy comparable to fingerprints. Kersta testifies that spectrographic analysis of recorded threatening calls matches the defendant's known voice sample. The jury convicts. The conviction is affirmed on appeal.

Step 1 (the foundational claim): Kersta's methodology involved visual matching of spectrogram patterns: an analyst compared two spectrograms by eye and declared them identical if the visual patterns matched across key frequency bands. The claim was that each person's vocal tract geometry produced a unique spectrographic signature, analogous to a fingerprint. This claim was never subjected to a controlled blind proficiency test prior to courtroom use.

Step 2 (the research response): From the 1970s onward, controlled studies systematically undermined the voiceprint claim. Hazen (1973) found that trained voiceprint examiners working under Kersta's method had error rates of 10 to 30 per cent on closed-set tests. Tosi et al. (1972) also reported error rates well above those claimed in court. The National Academy of Sciences report "On the Theory and Practice of Voice Identification" (1979) concluded that the method lacked a scientific foundation for the individualisation claims being made in court.

Step 3 (legacy in admissibility): By the 1980s and 1990s, courts in New York, California, and other jurisdictions that had previously admitted voiceprint evidence began excluding it under Frye (general acceptance had eroded) and later under Daubert (known error rate was unacceptably high). The method's rejection was judicially complete in the US federal system by the mid-1990s. It stands as the clearest case in forensic science history of a discipline achieving widespread court acceptance before its foundational validity was tested, and then losing that acceptance as the research caught up.

Conclusion: The voiceprint rejection is directly relevant to contemporary forensic science debates: the NAS 2009 report explicitly cited the voiceprint trajectory as a warning about other forensic disciplines where categorical claims were made without error-rate validation. Modern forensic speaker recognition, using LR-based evaluative reporting and NIST SRE-calibrated systems, is explicitly designed to avoid the voiceprint pattern by anchoring every opinion to a documented, validated methodology with a known performance profile.

Can a spectrogram still be used as legitimate forensic evidence today?

Yes, as an acoustic measurement tool rather than as a visual pattern-matching identifier. The spectrogram is standard in forensic phonetics. Modern examiners use it to measure formant frequencies, fundamental frequency, voice quality features, and consonant transitions as part of a multi-feature acoustic analysis. These measurements are then incorporated into a likelihood ratio calculation. The rejected claim was the specific one of forensic individualisation by visual spectrogram matching. Using a spectrogram to measure F2 transitions and incorporating that measurement into an LR is scientifically legitimate; eyeballing two spectrograms and declaring the voices identical is not.
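To make "measure formant frequencies" concrete, the sketch below estimates formants by linear prediction (LPC): fit an all-pole model with the autocorrelation method and the Levinson-Durbin recursion, then convert the pole angles to Hz. This is one common textbook approach, not necessarily the one any given laboratory uses. To keep it checkable it runs on a synthetic signal built from two known resonances (700 Hz and 1,200 Hz, 100 Hz bandwidth) rather than on speech; the helper names, LPC order and signal parameters are assumptions. Real analyses work on short frames, typically with pre-emphasis and a higher model order.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns a = [1, a1, ..., a_order] for A(z) = 1 + a1*z^-1 + ... + a_order*z^-order.
    """
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        # Update a1..a_{i-1} from the previous stage's coefficients
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formants_from_lpc(a, fs):
    """Convert complex LPC poles in the upper half-plane to frequencies in Hz."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

def synthetic_resonances(freqs_hz, bandwidth_hz, fs, n, seed=0):
    """Hypothetical test signal: white noise through an all-pole resonator filter."""
    rng = np.random.default_rng(seed)
    radius = np.exp(-np.pi * bandwidth_hz / fs)
    poles = []
    for f in freqs_hz:
        theta = 2 * np.pi * f / fs
        poles += [radius * np.exp(1j * theta), radius * np.exp(-1j * theta)]
    a = np.real(np.poly(poles))  # [1, a1, ..., a4]
    x = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(n):
        # Difference equation: y[t] = x[t] - sum_j a_j * y[t - j]
        past = sum(a[j] * y[t - j] for j in range(1, len(a)) if t - j >= 0)
        y[t] = x[t] - past
    return y

fs = 8000
signal = synthetic_resonances([700, 1200], bandwidth_hz=100, fs=fs, n=fs)
frame = signal * np.hamming(len(signal))
estimated = formants_from_lpc(lpc_coefficients(frame, order=4), fs)
print(estimated)  # should lie close to 700 Hz and 1200 Hz
```

Measurements like these, rather than a visual impression of the whole image, are what feed the likelihood ratio calculation described above.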

Why did the FBI keep using voiceprint evidence for a decade after the 1979 NAS report condemned it?

The decade-long gap between the 1979 NAS verdict and the 1989 FBI withdrawal was primarily institutional rather than scientific. The Bureau had trained a cadre of voiceprint examiners with years of operational experience who believed in the technique. Prosecutors and law enforcement agencies had institutional interests in its continued validity. The Frye 'general acceptance' standard left room to argue that the IAVI practitioner community itself constituted the relevant accepting reference group. The turning point came internally: late-1980s FBI proficiency tests produced results consistent with the NAS critique, including error rates substantially higher in blind testing than in open testing. The FBI formally withdrew in 1989; Daubert (1993) then gave courts much stronger tools to exclude residual claims.

How does Indian law treat forensic voice identification evidence today?

The Bharatiya Sakshya Adhiniyam 2023 (Section 79) governs expert evidence in India, requiring courts to assess the basis of an expert's opinion. Indian courts have admitted voice identification evidence in phone-tap cases, but admissibility and weight are determined case by case without the equivalent of a Daubert gatekeeping standard. The Supreme Court's rulings on intercepted telephone evidence (People's Union for Civil Liberties v. Union of India, 1997, and subsequent judgments) address interception legality more than the scientific validity of voice identification methodology. This contrasts with Australia's post-Bonython (1984) framework and the UK's post-Robb trajectory. For the modern probabilistic framework now used by ENFSI member countries, see the companion topic on modern automated speaker recognition.
