Multimedia Forensics: Voice Spectrography and MFCC Features
Questions: 30
Duration: 30 min
Faculty-reviewed: 0
Updated: 26 May 2026
This set drills the acoustic and computational foundations of forensic voice examination as tested in UGC-NET Forensic Science Paper II, Unit VIII. Wide-band spectrography (analysis filter bandwidth 300 Hz) resolves the formant bars F1 through F4 clearly but smears individual pitch harmonics, because the filter is wider than the harmonic spacing of most adult voices. Narrow-band spectrography (45 Hz bandwidth) is narrower than that spacing, so it resolves individual harmonics and tracks the fundamental frequency F0, but it blurs formant structure. Choosing the right bandwidth for a given evidential question is a routine decision in a forensic audio unit.
The pitch-tracking algorithms covered are autocorrelation, cepstral peak picking, and the YIN algorithm; each has a different error mode when the voice is creaky or whispery. Formant analysis maps the resonant frequencies of the vocal tract: F1 is inversely related to vowel height (close vowels such as /i/ and /u/ have a low F1), and F2 falls as the vowel moves from front to back, so each speaker's vowels occupy a characteristic region of the F1-F2 plane (the vowel space).
Mel-frequency cepstral coefficient (MFCC) extraction follows the canonical pipeline: a pre-emphasis filter (coefficient 0.97) boosts high frequencies; the signal is divided into 20 to 40 ms frames with 50 percent overlap; a Hamming window reduces spectral leakage; the FFT converts each frame to the frequency domain; a Mel-scale filterbank maps the spectrum to perceptually spaced frequency bands; log compression mimics the ear's compressive response to sound intensity; and the DCT decorrelates the log filterbank energies, of which the first 13 cepstral coefficients are conventionally retained. The Mel scale (Stevens, Volkmann, and Newman 1937) is a perceptual pitch scale on which equal distances correspond to equal perceived pitch intervals; it is roughly linear below about 1 kHz and roughly logarithmic above. Delta and delta-delta coefficients append first- and second-order temporal derivatives of the cepstral coefficients, capturing spectral dynamics such as formant transitions, which also reflect speaking rate.
Linear predictive coding (LPC) models speech production as a source-filter system, in which an excitation source drives an all-pole filter representing the vocal tract. The filter order governs how many formant peaks the model can represent: each resonance needs one complex-conjugate pair of poles, so an order-p model captures at most p/2 formants. Voice activity detection (VAD) removes silence frames before feature extraction. Praat and HTK are used for phonetic segmentation and alignment, anchoring acoustic measurements to specific phones.
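The 300 Hz and 45 Hz bandwidths translate into analysis-window lengths on a digital spectrogram. The sketch below assumes a Hamming window, whose 3 dB bandwidth is commonly tabulated at about 1.30 DFT bins (roughly 1.30 × sr / n Hz for an n-sample window), and a 16 kHz sample rate. The helper names are hypothetical, and the second function measures the bandwidth numerically as a cross-check instead of trusting the constant.

```python
import numpy as np

# Assumption: Hamming window with a 3 dB bandwidth of about 1.30 DFT bins.
HAMMING_3DB_BINS = 1.30


def window_length_for_bandwidth(bandwidth_hz, sr):
    """Samples needed for a Hamming window with the given 3 dB bandwidth."""
    return int(round(HAMMING_3DB_BINS * sr / bandwidth_hz))


def measured_3db_bandwidth_hz(n, sr, pad=64):
    """Measure the window's 3 dB bandwidth from a heavily zero-padded FFT."""
    spectrum = np.abs(np.fft.rfft(np.hamming(n), n * pad))
    spectrum /= spectrum[0]                              # 0 dB at DC
    first_below = int(np.argmax(spectrum < 1.0 / np.sqrt(2.0)))
    return 2.0 * first_below * sr / (n * pad)           # main lobe is symmetric


sr = 16000
for label, bw in [("wide-band", 300.0), ("narrow-band", 45.0)]:
    n = window_length_for_bandwidth(bw, sr)
    print(f"{label}: {n} samples = {1000 * n / sr:.1f} ms, "
          f"measured 3 dB bandwidth = {measured_3db_bandwidth_hz(n, sr):.0f} Hz")
```

Under these assumptions the wide-band setting needs about 69 samples (4.3 ms) and the narrow-band setting about 462 samples (28.9 ms). The short window cannot separate harmonics spaced one F0 apart (typically around 100 to 250 Hz in adult speech) but follows individual glottal pulses in time; the long window does the reverse.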
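Autocorrelation, the first of the pitch trackers listed, can be sketched for a single frame: the lag of the strongest autocorrelation peak within a plausible F0 range is taken as the period. The 60 to 400 Hz search range, the 0.3 voicing threshold, and the function name are illustrative assumptions. The threshold is what lets the sketch return None for whispered frames, which have no periodicity; in creaky voice, irregular or alternating periods can instead push the peak to a multiple of the true period, giving an octave (halving) error.

```python
import numpy as np


def autocorr_f0(frame, sr, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Estimate F0 (Hz) of one frame by autocorrelation peak picking.

    Returns None when the normalized peak falls below voicing_threshold,
    e.g. for whispered (aperiodic) frames. Defaults are illustrative only.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()                                     # remove DC offset
    r = np.correlate(x, x, mode="full")[len(x) - 1:]     # lags 0..len-1
    if r[0] <= 0:
        return None                                      # silent frame
    lag_min = max(1, int(sr / fmax))                     # shortest period
    lag_max = min(int(sr / fmin), len(x) - 1)            # longest period
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    if r[lag] / r[0] < voicing_threshold:
        return None                                      # treat as unvoiced
    return sr / lag
```

For a 40 ms, 200 Hz sine sampled at 16 kHz the peak falls at a lag of 80 samples, so the function returns 200.0.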
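The MFCC pipeline described above can be sketched in NumPy, with one comment per stage. Assumptions to flag: the mel conversion uses the widely used analytic approximation 2595 · log10(1 + f/700) rather than the 1937 listening-test data; 25 ms frames at 50 percent overlap, a 512-point FFT, 26 triangular filters, and an orthonormal DCT-II are illustrative choices; and the function names are hypothetical. Production toolkits differ in details such as liftering or replacing c0 with frame log-energy.

```python
import numpy as np


def hz_to_mel(f_hz):
    # One widely used analytic mel formula (an assumption; other fits exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):                    # rising slope
            fbank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):                   # falling slope
            fbank[i, k] = (right - k) / (right - centre)
    return fbank


def dct_ii_ortho(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sr, frame_ms=25.0, overlap=0.5, n_fft=512,
         n_filters=26, n_ceps=13, preemph=0.97):
    x = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis: y[n] = x[n] - 0.97 * x[n-1] boosts high frequencies.
    x = np.append(x[0], x[1:] - preemph * x[:-1])
    # 2. Framing: hop = (1 - overlap) * frame length; pad very short input.
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop = int(round(frame_len * (1.0 - overlap)))
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    # 3. Hamming window on every frame to reduce spectral leakage.
    frames = x[idx] * np.hamming(frame_len)
    # 4. FFT to a power spectrum (frames longer than n_fft would be truncated).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 5. Mel filterbank energies, one row per frame.
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    # 6. Log compression; the floor avoids log(0) on digital silence.
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 7. DCT decorrelates the log energies; keep the first n_ceps coefficients.
    return dct_ii_ortho(log_energies, n_ceps)
```

With these settings, one second of audio at 16 kHz yields 79 frames of 13 coefficients.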
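Delta and delta-delta coefficients are usually computed by a regression over neighboring frames rather than a plain frame-to-frame difference, which smooths the estimate. The sketch assumes the regression form with a window of N = 2 frames on each side and edge frames repeated at the boundaries; applying it twice gives delta-delta, and stacking 13 static, 13 delta, and 13 delta-delta coefficients yields the familiar 39-dimensional vector. The function names are hypothetical.

```python
import numpy as np


def deltas(feats, N=2):
    """Regression-based time derivative of a (frames x coeffs) matrix.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames repeated beyond the edges.
    """
    c = np.asarray(feats, dtype=float)
    T = c.shape[0]
    padded = np.pad(c, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(c)
    for n in range(1, N + 1):
        # padded[t + N + n] is c_{t+n}; padded[t + N - n] is c_{t-n}
        d += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return d / denom


def add_dynamics(ceps, N=2):
    """Append delta and delta-delta columns (13 static -> 39 features)."""
    d = deltas(ceps, N)
    return np.hstack([ceps, d, deltas(d, N)])
```

For a coefficient that rises by exactly 1 per frame, the interior deltas equal 1 and the interior delta-deltas are 0, which is a quick sanity check on the slicing.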
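The link between LPC order and formant count can be checked directly. The sketch assumes the autocorrelation method with the Levinson-Durbin recursion, then reads resonance frequencies from the angles of the complex poles of 1/A(z); since each resonance uses one conjugate pole pair, an order-p model yields at most p/2 of them. A commonly quoted rule of thumb sets p to about the sample rate in kHz plus 2. Unlike a real formant tracker, this sketch does not discard broad, low-amplitude poles, and all function names, including the demo, are hypothetical.

```python
import numpy as np


def lpc(x, order):
    """LPC coefficients a (a[0] = 1) by the autocorrelation method and the
    Levinson-Durbin recursion; returns (a, prediction_error)."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        refl = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + refl * a_prev[i - j]
        a[i] = refl
        err *= 1.0 - refl * refl
    return a, err


def lpc_resonances_hz(a, sr):
    """Frequencies of the complex pole pairs of 1/A(z): at most order/2."""
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]          # one pole per conjugate pair
    return np.sort(np.angle(poles) * sr / (2.0 * np.pi))


def demo_single_resonance(freq_hz=500.0, sr=8000, radius=0.95, n=16000, seed=1):
    """Hypothetical check: white noise through one two-pole resonance."""
    theta = 2.0 * np.pi * freq_hz / sr
    e = np.random.default_rng(seed).standard_normal(n)
    y = np.zeros(n)
    for t in range(n):
        y[t] = e[t]
        if t >= 1:
            y[t] += 2.0 * radius * np.cos(theta) * y[t - 1]
        if t >= 2:
            y[t] -= radius ** 2 * y[t - 2]
    a, _ = lpc(y, order=2)
    return lpc_resonances_hz(a, sr)
```

Fitting an order-2 model to noise shaped by a single 500 Hz resonance recovers one peak near 500 Hz; an order-2 model has no room for a second formant, which is the order constraint in miniature.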
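Voice activity detection can be as simple as an energy threshold. This sketch is an assumption rather than any standard detector: it keeps frames whose energy lies within 30 dB of the loudest frame, and the function name and defaults are hypothetical. Forensic recordings with background noise usually need a more robust method.

```python
import numpy as np


def energy_vad(signal, sr, frame_ms=25.0, hop_ms=12.5, floor_db=-30.0):
    """Mark frames whose energy is within |floor_db| dB of the loudest frame.

    A deliberately simple energy-threshold detector; the -30 dB floor is an
    illustrative choice, not a forensic standard.
    """
    x = np.asarray(signal, dtype=float)
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop = int(round(sr * hop_ms / 1000.0))
    n_frames = 1 + max(len(x) - frame_len, 0) // hop
    energy = np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    energy_db = 10.0 * np.log10(energy + 1e-12)      # floor avoids log(0)
    return energy_db > energy_db.max() + floor_db
```

If features are computed with the same frame and hop lengths, the returned boolean mask can index the feature rows directly to drop silence before MFCC analysis.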
Aimed at UGC-NET Forensic Science Paper II aspirants covering Unit VIII, NFSU MSc students in multimedia forensics, CFSL and state FSL audio-forensics trainees, and candidates preparing for IAFPA-aligned competency assessments. Questions on cohort selection and reliability draw on work from CDAC speech-research groups and on the guidelines of the ENFSI Forensic Speech and Audio Analysis Working Group.
Work through each question before checking its explanation, and revisit every wrong answer against the cited references (Rose; Hollien; Maher; Rabiner and Schafer). Allow 30 minutes.
Questions are written and edited by the ForensicSpot team and cited from peer-reviewed forensic textbooks, official syllabi and primary case law. Each one is verified before publishing. Detailed explanations are shown after you submit, so the test stays a real test.