Voice Conversion

Definition

A signal-processing or deep-learning technique that transforms the vocal characteristics of a source speaker's utterance to match a target speaker, while preserving the linguistic content. The input is real speech; the output is modified real speech with a different perceived identity.

Related terms

Anti-Spoofing Countermeasure (CM): A classifier, also called a CM system, trained to output a score indicating the probability that a given audio segment is genuine...
ASVspoof: A recurring evaluation campaign and dataset series that benchmarks anti-spoofing countermeasures against corpora of genuine and spoofed utterances. Editions in 2015, 2017,...
Diffusion Model: A generative neural network architecture (Ho et al., 2020; Stable Diffusion, Rombach et al., 2022) that learns to reverse a noise-addition process...
Encoder-Decoder: A neural architecture where an encoder compresses an input into a compact latent representation and a decoder reconstructs an output image from...
Equal Error Rate (EER): The point on a classifier's detection error tradeoff curve where the false accept rate equals the false reject rate. Lower EER indicates...
GAN: Generative Adversarial Network. A framework with two neural networks, a generator that creates synthetic data and a discriminator that tries to distinguish...
Latent Space: The compressed, lower-dimensional representation of data learned by a neural network's internal layers. Generative models sample from or navigate this space to...
NeRF (Neural Radiance Field): A neural representation that encodes a 3-D scene as a continuous volumetric function, allowing novel viewpoints to be rendered. In talking-head systems,...
Neural TTS Cloning: A text-to-speech system that adapts to a target speaker using a short enrollment recording, generating new utterances in that speaker's voice from...
Tandem Detection Cost Function (T-DCF): The primary evaluation metric in ASVspoof from 2019 onward. It measures the cost of errors when a countermeasure is integrated with an...
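
The Equal Error Rate entry above can be made concrete with a small sketch. The function below is illustrative only: the name `equal_error_rate`, the score lists, and the convention that a higher score means "more likely genuine" are assumptions, not part of any ASVspoof tooling. It sweeps each observed score as a threshold and reports the error rate at the threshold where the false accept and false reject rates are closest.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumption: higher scores mean "more likely genuine".
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False accept: a spoofed utterance scores at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False reject: a genuine utterance scores below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # With finite data the two rates rarely match exactly,
            # so report their midpoint at the closest threshold.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical countermeasure scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
spoofed = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, spoofed))  # 0.25
```

With these made-up scores, one spoofed utterance is accepted and one genuine utterance is rejected at a threshold of 0.6, so both error rates are 25%. Evaluation toolkits typically interpolate along the detection error tradeoff curve instead of taking a midpoint, so results on real data may differ slightly from this sketch.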

Explained in these topics

Deepfake Generation: GANs, Diffusion, and Face-Swap Pipelines
Transforming the timbre and identity of one speaker's voice to match another while preserving the linguistic content. Used in voice-cloning attacks to imperson...

Voice Conversion and Cloning Detection
A signal-processing or deep-learning technique that transforms the vocal characteristics of a source speaker's utterance to match a target speaker, while prese...
