Neural TTS cloning

Definition

A text-to-speech system that adapts to a target speaker using a short enrollment recording, generating new utterances in that speaker's voice from text input. The input is text; the output is fully synthesised speech. Also called voice cloning or speaker-adaptive TTS.

Related terms

Anti-spoofing countermeasure (CM): A classifier, also called a CM system, trained to output a score indicating the probability that a given audio segment is genuine...
ASVspoof: A recurring evaluation campaign and dataset series that benchmarks anti-spoofing countermeasures against corpora of genuine and spoofed utterances. Editions in 2015, 2017,...
Equal error rate (EER): The point on a classifier's detection error tradeoff curve where the false accept rate equals the false reject rate. Lower EER indicates...
Tandem detection cost function (t-DCF): The primary evaluation metric in ASVspoof from 2019 onward. It measures the cost of errors when a countermeasure is integrated with an...
Voice conversion: A signal-processing or deep-learning technique that transforms the vocal characteristics of a source speaker's utterance to match a target speaker, while preserving...
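
The EER definition above can be made concrete with a short threshold sweep. The following is a minimal sketch, not the ASVspoof reference scoring code. It assumes that higher countermeasure scores mean "more likely genuine" and that a segment is accepted as genuine when its score is at or above the threshold. The function name `compute_eer` is hypothetical. It tries every observed score as a threshold and picks the one where the false accept rate (spoofs accepted) and the false reject rate (genuine segments rejected) are closest.

```python
import numpy as np

def compute_eer(genuine_scores, spoof_scores):
    """Approximate the equal error rate (EER) from countermeasure scores.

    Hypothetical helper. Assumes higher scores mean "more likely genuine"
    and that a segment is accepted when score >= threshold.
    Returns (eer, threshold).
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False accept rate: fraction of spoofed segments accepted as genuine.
    far = np.array([np.mean(spoof >= t) for t in thresholds])
    # False reject rate: fraction of genuine segments rejected.
    frr = np.array([np.mean(genuine < t) for t in thresholds])
    # The EER is where the two rates meet; with finite data they rarely match
    # exactly, so take the closest point and average the two rates there.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

eer, thr = compute_eer([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
print(eer, thr)
```

In this toy example, one of four spoofed segments scores above one of four genuine segments. The two error rates therefore meet at 25% when the threshold is 0.6. Evaluation toolkits usually interpolate along the detection error tradeoff curve instead of snapping to an observed score, so their values can differ slightly on small score sets.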

Explained in

Voice Conversion and Cloning Detection
