Neural TTS cloning

Definition

A text-to-speech system that adapts to a target speaker using a short enrollment recording, generating new utterances in that speaker's voice from text input. The input is text; the output is fully synthesised speech. Also called voice cloning or speaker-adaptive TTS.

Related terms

Anti-spoofing countermeasure (CM)
A classifier, also called a CM system, trained to output a score indicating the probability that a given audio segment is genuine...
ASVspoof
A recurring evaluation campaign and dataset series that benchmarks anti-spoofing countermeasures against corpora of genuine and spoofed utterances. Editions in 2015, 2017,...
Equal error rate (EER)
The point on a classifier's detection error tradeoff curve where the false accept rate equals the false reject rate. Lower EER indicates...
Tandem detection cost function (t-DCF)
The primary evaluation metric in ASVspoof from 2019 onward. It measures the cost of errors when a countermeasure is integrated with an...
Voice conversion
A signal-processing or deep-learning technique that transforms the vocal characteristics of a source speaker's utterance to match a target speaker, while preserving...
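The equal error rate entry above can be made concrete with a small sketch. It assumes each utterance gets a single countermeasure score where higher means "more likely genuine", and that a spoofed utterance scoring at or above the threshold counts as a false accept. The function name `equal_error_rate` and the sample scores are hypothetical, chosen only for illustration. With a finite set of scores the two error rates rarely match exactly, so the sketch picks the threshold where they are closest and reports their average.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumed convention: higher score = more likely genuine, and an
    utterance is accepted when its score is >= the threshold.
    Returns (eer, threshold).
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None  # (gap between the two rates, eer estimate, threshold)
    for t in thresholds:
        # False accept rate: share of spoofed utterances accepted as genuine.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False reject rate: share of genuine utterances rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores for illustration only; not taken from any dataset.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, spoof)
print(eer, threshold)
```

On these made-up scores, one genuine utterance falls below the chosen threshold and one spoofed utterance reaches it, so both error rates are one in four. Evaluation toolkits usually interpolate along the detection error tradeoff curve instead of picking the nearest observed threshold, so their values can differ slightly from this estimate.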

Explained in

  • Voice Conversion and Cloning Detection
