Microphone and Codec Identification in Audio Forensics

Microphones and audio codec implementations leave measurable signatures in recordings through frequency response curves, noise floors, and quantisation artefacts. This topic explains how spectral fingerprinting and bitstream analysis allow forensic examiners to link a recording to a device class or to detect recompression inconsistent with claimed recording conditions.

Last updated: 24 Jun 2026

Microphone and codec identification is the branch of audio forensics concerned with linking a recording to a physical device class or detecting post-recording processing that contradicts a file's claimed history. Every microphone introduces a characteristic frequency response curve, a self-noise floor, and mechanical resonances determined by its capsule type and housing. Every lossy audio codec leaves quantisation artefacts at predictable spectral positions that depend on the codec family, the bitrate setting, and the encoder implementation. When these signatures are inconsistent within a single recording, or inconsistent with the device a witness claims was used, that inconsistency is forensically significant. The two principal analytical methods are spectral fingerprinting, which examines the overall frequency distribution and noise profile, and bitstream analysis, which inspects the encoded data structure for evidence of recompression or format conversion.

Audio recordings are submitted as evidence in criminal proceedings, civil disputes, and regulatory investigations worldwide. Their probative value depends on whether they are authentic originals, or whether they have been edited, recompressed, or fabricated. Device identification contributes to authentication by establishing whether the claimed recording conditions are consistent with the recording's technical signatures. A voice memo recorded on a consumer smartphone should carry the frequency response of that device's MEMS microphone and the quantisation characteristics of the AAC encoder built into its operating system. A recording that lacks those signatures, or that carries signatures from a different device class, requires explanation.

The field has expanded significantly as audio evidence has diversified. Courtrooms in the United States, United Kingdom, India, and across the European Union now regularly receive recordings from smartphones, IP telephony systems, smart speakers, body-worn cameras, and surveillance systems, each with its own technical signature set. The rise of AI voice synthesis adds a further question: whether the recording is from a real microphone at all, or is a generated artefact. Microphone and codec analysis is therefore both a classic source-attribution technique and an active area of research responding to synthetic media threats.

By the end of this topic you will be able to:

Explain how microphone capsule type and housing geometry produce measurable frequency response signatures that can be used to classify a recording's source device.
Describe the spectral artefacts that lossy codecs introduce and explain how bitrate-dependent quantisation signatures are used to detect double compression.
Distinguish between encoder-family identification and bitrate identification in bitstream analysis, and explain what each can and cannot prove about a recording's history.
Describe how microphone signature analysis contributes to detecting AI-generated or voice-cloned audio alongside other authentication methods.
Summarise the documentation and disclosure obligations that apply when presenting microphone and codec identification findings in court across different jurisdictions.

Key terms

Spectral fingerprint: The characteristic frequency response profile of a recording device, including roll-off curves, resonance peaks, and self-noise level. Comparing a recording's spectral fingerprint against reference databases can link it to a device class.
Self-noise (noise floor): The electrical noise inherent in a microphone capsule and its circuit, usually quoted as an A-weighted equivalent noise level in dB(A) SPL. Different microphone types have characteristic noise floors: MEMS microphones used in smartphones typically have a signal-to-noise ratio of about 58 to 68 dB(A) (referenced to 94 dB SPL), which corresponds to a self-noise of roughly 26 to 36 dB(A); high-quality condenser microphones achieve below 20 dB(A).
Double compression: The condition in which a lossy-encoded audio file has been decoded and re-encoded, leaving two overlapping sets of quantisation artefacts. The inner artefact layer reflects the first encoding pass and is forensically detectable even after the second encoding.
Quantisation noise: The distortion introduced when a continuous audio signal is rounded to discrete digital values. Lossy codecs concentrate quantisation noise at predictable frequency bands determined by the codec's psychoacoustic model and the target bitrate.
MEMS microphone: A Micro-Electro-Mechanical Systems microphone, the capsule type used in virtually all modern smartphones and tablets. MEMS microphones have a characteristically flat midrange response with a defined roll-off below about 100 Hz and above 16 kHz, and a self-noise of roughly 26 to 36 dB(A) SPL (a signal-to-noise ratio of about 58 to 68 dB(A)).
Psychoacoustic masking model: The perceptual model used by lossy codecs such as MP3 and AAC to decide which frequency components can be discarded or coarsely quantised without audible loss. The specific masking thresholds and sub-band structures differ by codec family, leaving identifiable spectral patterns in the encoded file.

Microphone physics and frequency response signatures

A microphone converts acoustic pressure variations into an electrical signal. The fidelity and coloration of that conversion depend on the capsule design, the diaphragm material and size, the polar pattern, and the housing geometry. These physical parameters are not arbitrary: they are engineering constraints that produce predictable and measurable frequency response characteristics.

MEMS microphones, standard in smartphones and laptops, use a microscale silicon diaphragm over a fixed backplate. They produce a characteristically flat response from around 100 Hz to 10 kHz, with a gentle high-frequency roll-off and a steeper low-frequency roll-off below 100 Hz. Their self-noise is relatively high compared with studio condenser microphones: typical smartphone MEMS capsules have a signal-to-noise ratio of about 58 to 68 dB(A), corresponding to a self-noise of roughly 26 to 36 dB(A) SPL. Electret condenser microphones, used in many conference systems and consumer recorders, share the backplate structure but use a charged polymer diaphragm; their response extends higher and their self-noise is lower. Dynamic microphones, common in broadcast and body-worn surveillance units, use electromagnetic induction; they have a narrower high-frequency response and, in their common directional (cardioid) form, a proximity effect that boosts low frequencies at close range.

Forensic examiners build reference frequency response profiles from recordings made under controlled conditions, then compare the average spectral shape of silent passages and low-signal segments in the questioned recording against those device reference databases. The comparison is statistical rather than absolute: noise floor measurements are averaged over multiple frames, and roll-off slopes are fitted to reference curves. The result is a probability-weighted assignment to a device class, not a serial-number-level identification. Class attribution is nevertheless often sufficient for forensic purposes: establishing that a recording's microphone characteristics are inconsistent with an iPhone 14 rear microphone but consistent with a lapel condenser microphone can contradict a witness's account of how the recording was made.
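The comparison step can be sketched in Python with NumPy. This is a simplified illustration, not a validated classifier: it assumes the quietest frames approximate the silent passages, summarises them as a mean power spectrum in dB, and ranks hypothetical reference profiles (sampled on the same frequency grid) by Pearson correlation of spectral shape. The function names, frame size, and 10 % quiet-frame fraction are illustrative choices; casework would rely on properly built reference databases and calibrated scoring.

```python
import numpy as np

def noise_floor_profile(x, sr, frame=2048, quiet_fraction=0.1):
    """Mean power spectrum (dB) of the quietest frames, used here as a proxy
    for the silent passages where microphone self-noise dominates."""
    n_frames = len(x) // frame
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    energy = np.sum(frames ** 2, axis=1)
    k = max(1, int(n_frames * quiet_fraction))
    quiet = frames[np.argsort(energy)[:k]]               # k lowest-energy frames
    spectra = np.abs(np.fft.rfft(quiet * np.hanning(frame), axis=1)) ** 2
    profile_db = 10 * np.log10(spectra.mean(axis=0) + 1e-20)
    freqs = np.fft.rfftfreq(frame, d=1 / sr)
    return freqs, profile_db

def rank_device_classes(profile_db, references):
    """Rank reference profiles (dB, same frequency grid) by Pearson correlation.
    Correlation compares spectral shape, so the overall recording level drops out."""
    scores = {name: float(np.corrcoef(profile_db, ref_db)[0, 1])
              for name, ref_db in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical usage: `refs` maps class names to reference profiles built from
# controlled recordings on the same grid as `freqs`.
# freqs, profile = noise_floor_profile(samples, 44100)
# ranking = rank_device_classes(profile, refs)
```

Because correlation ignores absolute level, the output is a ranking of classes rather than an identification, which matches the class-level nature of the attribution described above.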

Codec artefacts and bitrate-dependent signatures

Lossy audio codecs compress data by applying a psychoacoustic masking model: they identify frequency components that are perceptually masked by louder nearby components and either discard them or quantise them coarsely. The specific sub-band structures, masking thresholds, and quantisation step sizes differ by codec family (MP3, AAC, Ogg Vorbis, Opus, WMA) and by the target bitrate within each family. These differences leave measurable spectral signatures.

Codec	Sub-band structure	Typical spectral cut-off (128 kbps unless noted)	Common forensic indicator
MP3 (MPEG-1 Layer III)	32 polyphase sub-bands + MDCT	~16 kHz	Spectral holes at MDCT block boundaries; frame sync headers
AAC-LC	1024-line MDCT filterbank	~18 kHz	Consistent spectral envelope shaping; no sub-band aliasing
Ogg Vorbis	MDCT, variable block size	~18 kHz	Variable frame sizes; floor curve artefacts
Opus	SILK + CELT hybrid	Full band (20 kHz) at 32 kbps and above	Distinct frame structure; SILK mode visible at low bitrates
WMA Standard	MDCT, modified block sizes	~16 kHz	Microsoft-specific quantiser rounding signatures

The most forensically useful signature is the spectral ceiling: the highest frequency present in a recording. A 128 kbps MP3 file typically cuts off sharply at around 16 kHz because the encoder applies a low-pass filter at that bitrate, discarding the high-frequency components its psychoacoustic model can least afford to code. If a file is presented as an original 44.1 kHz WAV recording but its spectrum shows an abrupt cut-off at 16 kHz, its audio has most likely been decoded from a lossy encoding such as MP3 at some point, even though it is now packaged in a lossless format. This is a simple and strong indicator of intermediate lossy compression, provided the examiner rules out other causes of a hard band limit, such as resampling from a 32 kHz source (whose Nyquist frequency is 16 kHz) or deliberate low-pass filtering.
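A minimal sketch of a spectral ceiling estimate, assuming the ceiling can be defined as the highest frequency whose long-term average power lies within a chosen margin of the spectrum's peak. The 60 dB margin and the frame size are illustrative assumptions, not forensic standards, and the result only flags a band limit: it cannot say by itself whether the limit came from a lossy encoder, from resampling, or from filtering.

```python
import numpy as np

def spectral_ceiling(x, sr, frame=4096, drop_db=60.0):
    """Highest frequency (Hz) whose long-term average power lies within
    `drop_db` of the spectrum's peak; above it the spectrum is treated as empty."""
    n_frames = len(x) // frame
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    power = np.mean(np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1)) ** 2, axis=0)
    power_db = 10 * np.log10(power + 1e-20)
    freqs = np.fft.rfftfreq(frame, d=1 / sr)
    occupied = np.nonzero(power_db > power_db.max() - drop_db)[0]
    return float(freqs[occupied[-1]])
```

On a questioned file, a result near 16 kHz for audio declared as 44.1 kHz would prompt the follow-up checks described in this section rather than settle the question on its own.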

Within a codec family, different bitrate settings produce different cut-off frequencies and different quantisation noise distributions. A 64 kbps MP3 cuts off below 14 kHz; a 192 kbps MP3 may extend to 18 kHz or above. The bitrate can therefore be estimated from the spectral ceiling, though encoders using variable bitrate (VBR) settings produce a distribution of cut-off frequencies across the file. The encoder implementation also matters: different MP3 encoders such as LAME and FhG, or Apple's iTunes AAC encoder compared with other AAC encoders, produce subtly different quantisation patterns even at the same nominal bitrate. Encoder identification from these patterns has been demonstrated in research settings and is beginning to appear in casework.
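To see whether the ceiling varies across a file, the same measurement can be repeated on consecutive segments. The sketch below assumes one-second segments and the same illustrative 60 dB margin. A stable ceiling is what a constant-bitrate encoding would typically show; a wide spread is consistent with the VBR behaviour described above, or with material joined from differently encoded sources. Mapping a ceiling to a specific bitrate depends on the encoder and is deliberately not attempted here.

```python
import numpy as np

def segment_ceilings(x, sr, seg_seconds=1.0, frame=2048, drop_db=60.0):
    """Spectral ceiling (Hz) of each consecutive segment of `x`, using the
    same 'within drop_db of the peak' rule on the segment's average spectrum."""
    seg = int(seg_seconds * sr)
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1 / sr)
    ceilings = []
    for start in range(0, len(x) - seg + 1, seg):
        chunk = x[start:start + seg]
        n_frames = len(chunk) // frame
        frames = chunk[: n_frames * frame].reshape(n_frames, frame)
        power = np.mean(np.abs(np.fft.rfft(frames * window, axis=1)) ** 2, axis=0)
        power_db = 10 * np.log10(power + 1e-20)
        occupied = np.nonzero(power_db > power_db.max() - drop_db)[0]
        ceilings.append(float(freqs[occupied[-1]]))
    return np.array(ceilings)
```

Summaries such as `np.median(c)` and `np.ptp(c)` of the returned array `c` give the typical ceiling and its spread across the file.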

Detecting double compression and recompression

Double compression is the condition in which a file has been encoded with a lossy codec, decoded to PCM, and then re-encoded, either with the same codec at a different bitrate or with a different codec. Each encoding pass leaves its own set of quantisation artefacts. When the second pass encodes data that already contains the first pass's artefacts, the second encoder's masking model interacts with the residual distortion from the first pass, producing a characteristic interference pattern.

The detection method depends on the codec combination. For MP3-to-MP3 double compression, the most reliable indicator is the distribution of quantisation noise across the MP3 frame boundaries: a singly-encoded file shows consistent noise statistics across frames, while a doubly-encoded file shows a periodic variation that corresponds to the first encoder's frame structure being partially preserved through the second encoding pass. This is detectable even when the second encoding is at a higher bitrate than the first, a scenario sometimes called up-encoding that is used to disguise the compression history of an edited file.
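The idea of a periodic variation locked to the first encoder's frame grid can be illustrated with a simple folding test. This is a toy sketch, not a published double-compression detector: it uses a crude high-pass residual as a stand-in for quantisation noise, computes the log-energy of short blocks, averages those values by position within a hypothesised 1152-sample frame (the MPEG-1 Layer III frame length), and compares the spread of the averages with what chance alone would produce. Folding over every position within the frame means the unknown alignment of the first encoder's frames does not matter. Validated methods work on decoded MDCT coefficient statistics instead.

```python
import numpy as np

MP3_FRAME_SAMPLES = 1152  # samples per MPEG-1 Layer III frame

def frame_periodicity_score(x, period=MP3_FRAME_SAMPLES, block=64):
    """Fold short-block log-energies of a high-pass residual modulo `period`.
    Returns roughly 1 when there is no frame-locked pattern and values well
    above 1 when the statistic varies periodically with the frame grid."""
    residual = np.diff(x)                               # crude high-pass filter
    n_blocks = len(residual) // block
    blocks = residual[: n_blocks * block].reshape(n_blocks, block)
    stat = np.log10(np.mean(blocks ** 2, axis=1) + 1e-20)
    bins = period // block                              # period must be a multiple of block
    phase = ((np.arange(n_blocks) * block) % period) // block
    bin_means = np.array([stat[phase == p].mean() for p in range(bins)])
    counts = np.bincount(phase, minlength=bins)
    chance_var = stat.var() / counts.mean()             # variance of a bin mean under no periodicity
    return float(bin_means.var() / chance_var)
```

A score near 1 means no detectable frame-locked pattern; a score far above 1 would justify closer analysis, but any decision threshold would have to be established on reference material.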

Cross-codec double compression, such as AAC followed by MP3, is detectable through the combined spectral signatures. The AAC spectral envelope shaping will be present in the residual, and the MP3 MDCT block structure will be superimposed on it. These combined artefacts are more complex to interpret but remain detectable with spectral analysis software such as Adobe Audition's spectral display or Sonic Visualiser with an appropriate plugin set. Other tools serve related tasks: Praat for pitch-related analysis and hex editors for container-level bitstream inspection.

Bitstream analysis: container and header inspection

Beyond spectral analysis of the decoded audio, the encoded bitstream itself carries metadata that can be examined directly. An MP3 file is a sequence of frames, each beginning with a header that specifies the bitrate, sample rate, channel mode, and flags such as copyright and original/copy status; many encoders also write an information tag (for example a Xing or LAME tag) in the first frame. Inconsistencies between headers in the same file, or between the headers and the container-level metadata, indicate that the file has been assembled from multiple sources or has been partially re-encoded. Frame-to-frame bitrate changes are the exception: they are normal in VBR files and must be interpreted in that context.
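Reading MP3 frame headers is straightforward because the bit layout is fixed by the MPEG-1 audio specification. The sketch below handles MPEG-1 Layer III only (other versions and layers use different tables), assumes the byte buffer starts at a frame boundary with any ID3v2 tag already skipped, and stops at the first bytes that do not form a valid header. It uses the Layer III frame length of 144 × bitrate / sample rate bytes, rounded down, plus one byte when the padding bit is set. The function names are illustrative.

```python
# Tables for MPEG-1 Layer III only; other MPEG versions and layers differ.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]

def parse_mpeg1_layer3_headers(data: bytes, start: int = 0):
    """Walk consecutive frames from `start` (assumed to be a frame boundary)
    and return (offset, bitrate_kbps, sample_rate_hz, channel_mode) per frame."""
    frames, pos = [], start
    while pos + 4 <= len(data):
        h = int.from_bytes(data[pos:pos + 4], "big")
        is_sync = ((h >> 21) & 0x7FF) == 0x7FF
        is_mpeg1_l3 = ((h >> 19) & 0b11) == 0b11 and ((h >> 17) & 0b11) == 0b01
        if not (is_sync and is_mpeg1_l3):
            break                                        # not an MPEG-1 Layer III header
        bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
        rate = SAMPLE_RATES_HZ[(h >> 10) & 0b11]
        if bitrate is None or rate is None:
            break                                        # free-format or invalid field
        padding = (h >> 9) & 1
        mode = CHANNEL_MODES[(h >> 6) & 0b11]
        frames.append((pos, bitrate, rate, mode))
        pos += 144 * bitrate * 1000 // rate + padding    # frame length in bytes
    return frames

def header_summary(frames):
    """Distinct values per field. More than one sample rate or channel mode is
    anomalous; several bitrates are normal in VBR files."""
    return {
        "sample_rates_hz": sorted({f[2] for f in frames}),
        "channel_modes": sorted({f[3] for f in frames}),
        "bitrates_kbps": sorted({f[1] for f in frames}),
    }
```

A change of sample rate or channel mode between frames is the kind of inconsistency described above; a change of bitrate alone is expected in a VBR file and needs further context.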

AAC files in the M4A or MP4 container carry an ESDS (Elementary Stream Descriptor) atom that records the decoder configuration, including the audio object type, sampling rate, and channel configuration; an encoder name string, where present, is normally stored separately in the file's metadata. WAV and AIFF files carry format chunks specifying sample rate, bit depth, and channel count. These container-level fields are sometimes modified by editing software independently of the audio data, producing a mismatch that is itself forensically informative: a WAV file claiming a 44.1 kHz sample rate but containing audio with a 16 kHz spectral ceiling has been band-limited at some earlier stage, so its audio was not captured at 44.1 kHz with its full bandwidth intact.
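The WAV example above can be checked mechanically by reading the declared sample rate from the format chunk and measuring the spectral ceiling of the samples. This sketch uses Python's standard `wave` module, so it assumes 16-bit integer PCM, analyses only the first channel, and reuses an illustrative 60 dB margin. The 0.8 × Nyquist cut-off for flagging a band-limited file is an arbitrary value chosen for illustration, not a forensic standard.

```python
import io
import wave
import numpy as np

def wav_header_vs_content(wav_bytes: bytes, frame=4096, drop_db=60.0):
    """Compare the sample rate declared in a WAV file's format chunk with the
    spectral ceiling of its first channel. Handles 16-bit integer PCM only."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        sr, width, channels = w.getframerate(), w.getsampwidth(), w.getnchannels()
        raw = w.readframes(w.getnframes())
    if width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    x = np.frombuffer(raw, dtype="<i2").reshape(-1, channels)[:, 0].astype(float)
    n_frames = len(x) // frame
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    power = np.mean(np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1)) ** 2, axis=0)
    power_db = 10 * np.log10(power + 1e-20)
    occupied = np.nonzero(power_db > power_db.max() - drop_db)[0]
    ceiling = float(np.fft.rfftfreq(frame, d=1 / sr)[occupied[-1]])
    return {
        "declared_rate_hz": sr,
        "ceiling_hz": ceiling,
        # 0.8 x Nyquist is an arbitrary illustrative cut-off, not a standard
        "band_limited": ceiling < 0.8 * (sr / 2),
    }
```

Passing the file as bytes fits a workflow in which the evidence file is read once (for example, after it has been hashed) and every analysis works on that same in-memory copy.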

ID3 tags in MP3 files and XMP or EXIF metadata in other containers can carry recording date, device model, and encoder version strings. These fields are user-writable and are therefore unreliable as standalone evidence, but they can be cross-checked against the technical signatures in the audio data. A file whose ID3 tag claims to be recorded by a specific smartphone model but whose microphone signature is inconsistent with that model's known characteristics presents an evidentiary contradiction that warrants investigation. Metadata analysis for audio files follows the same chain-of-custody principles as for image files, as covered in the authentication and integrity fundamentals for multimedia evidence.
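Cross-checking starts with reading what the tag actually claims. The sketch below extracts text frames from an ID3v2.3 or v2.4 tag, such as TSSE (software and settings used for encoding) or the recording-date frames (TYER in v2.3, TDRC in v2.4), so they can be set against the technical findings. It is deliberately minimal: it ignores extended headers, unsynchronisation, and compressed frames, and it does not split user-defined TXXX frames into description and value. The decoded strings should be treated as claims, not facts.

```python
ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}

def syncsafe(b: bytes) -> int:
    """Decode a 4-byte ID3v2 'syncsafe' integer (7 significant bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]

def read_id3v2_text_frames(data: bytes) -> dict:
    """Return text frames (IDs starting with 'T') from an ID3v2.3/2.4 tag at
    the start of `data`. Extended headers, unsynchronisation and compressed
    frames are not handled; TXXX frames are not split into description/value."""
    if data[:3] != b"ID3" or data[3] not in (3, 4):
        return {}
    major = data[3]
    end = 10 + syncsafe(data[6:10])                 # tag size excludes the 10-byte header
    pos, frames = 10, {}
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id[0] == 0:                        # zero padding: no more frames
            break
        size_field = data[pos + 4:pos + 8]
        # v2.4 frame sizes are syncsafe; v2.3 sizes are plain big-endian
        size = syncsafe(size_field) if major == 4 else int.from_bytes(size_field, "big")
        body = data[pos + 10:pos + 10 + size]
        if frame_id[:1] == b"T" and body:
            encoding = ENCODINGS.get(body[0], "latin-1")
            text = body[1:].decode(encoding, errors="replace").rstrip("\x00")
            frames[frame_id.decode("ascii", errors="replace")] = text
        pos += 10 + size
    return frames
```

An encoder string recovered this way can then be compared with the encoder family suggested by the bitstream and spectral analysis; a disagreement is a contradiction to investigate, not proof of tampering.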

Microphone analysis and AI-generated audio detection

AI voice synthesis systems, including text-to-speech engines and voice conversion models, generate audio from neural network outputs rather than from a physical acoustic source. Early synthesis systems produced audio with unrealistic spectral flatness above 8 kHz, background noise levels that stayed constant instead of varying as they would in a real room, and a complete absence of the microphone self-noise that all physical recordings carry. These absences were reliable detection features.
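Spectral flatness, the ratio of the geometric mean to the arithmetic mean of the power spectrum, is one standard way to quantify the flatness above 8 kHz mentioned here. The sketch computes it on the frame-averaged spectrum within a chosen band: values near 1 indicate a flat, noise-like band, and values near 0 a band dominated by peaks or steep slopes. It returns a number only; what counts as unrealistically flat for a given device class and codec would have to be established from reference recordings, and the frame size is an illustrative assumption.

```python
import numpy as np

def band_spectral_flatness(x, sr, f_lo=8000.0, f_hi=None, frame=1024):
    """Spectral flatness (geometric mean / arithmetic mean of power) of the
    frame-averaged spectrum between f_lo and f_hi (default: Nyquist)."""
    f_hi = sr / 2 if f_hi is None else f_hi
    n_frames = len(x) // frame
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    power = np.mean(np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1)) ** 2, axis=0)
    freqs = np.fft.rfftfreq(frame, d=1 / sr)
    band = power[(freqs >= f_lo) & (freqs <= f_hi)] + 1e-20
    return float(np.exp(np.mean(np.log(band))) / np.mean(band))
```

Averaging the spectrum over frames before taking the ratio keeps random bin-to-bin fluctuation from pulling the flatness of genuinely flat noise well below 1.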

Current synthesis systems are more sophisticated. Some apply convolution reverb to simulate room acoustics, add synthesised microphone noise at levels calibrated to match specific device classes, and encode through real codec pipelines to produce files that carry plausible compression signatures. Detection therefore requires looking for inconsistencies rather than mere absences: for instance, microphone self-noise that is statistically too consistent across frames (real microphone noise is a random process and shows natural variation), room acoustics that do not match the reported recording location, or pitch periodicity patterns that are subtly different from natural phonation.
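One way to make "too consistent across frames" measurable is to compare how much the energy of a noise-only segment varies from frame to frame with the variation a genuinely random process would show. For ideal white Gaussian noise, the energy of an N-sample frame has a coefficient of variation of √(2/N); the sketch reports the observed coefficient of variation divided by that value. It assumes the input has already been trimmed to noise-only passages. Because real microphone noise is usually coloured and slightly non-stationary, natural recordings tend to give ratios at or above 1, so a ratio well below 1 is the suspicious direction. The function name and frame size are illustrative.

```python
import numpy as np

def noise_energy_consistency(noise, frame=1024):
    """Observed frame-to-frame coefficient of variation of noise energy,
    divided by the value expected for ideal white Gaussian noise, sqrt(2/frame).
    ~1: natural-looking variation; << 1: noise more uniform than a random
    process should be; >> 1: non-stationary noise."""
    n_frames = len(noise) // frame
    energies = np.sum(noise[: n_frames * frame].reshape(n_frames, frame) ** 2, axis=1)
    cv = energies.std(ddof=1) / energies.mean()
    return float(cv / np.sqrt(2.0 / frame))
```

For instance, a noise bed made by repeating one frame-length clip gives identical frame energies and a ratio close to 0, whereas freshly generated Gaussian noise gives a ratio close to 1.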

Voice cloning, in which a synthesis model is conditioned on a target speaker's voice, presents an additional layer of complexity. The prosody and spectral content of the cloned voice may closely match the genuine speaker, making perceptual detection unreliable. Microphone and codec analysis contributes to detection by identifying whether the recording's physical signatures are consistent with a genuine recording by that speaker in the claimed context. The topic of voice conversion and cloning detection covers the signal-level classifiers designed specifically for that task.

Evidential standards and court presentation

Audio authentication findings are admissible in courts across multiple jurisdictions, but the standards for admissibility and presentation differ. In the United States, Federal Rule of Evidence 702 requires that expert testimony rest on sufficient facts or data, be the product of reliable principles and methods, and reflect a reliable application of those methods to the facts of the case; the reliability inquiry follows the Daubert standard (Daubert v Merrell Dow Pharmaceuticals 1993), extended to technical expertise by Kumho Tire v Carmichael 1999. The Audio Engineering Society's AES27 recommended practice covers the management of recorded audio material intended for forensic examination. The Scientific Working Group on Digital Evidence (SWGDE) publishes technical guidelines that courts have cited in assessing reliability.

In the United Kingdom, the Forensic Science Regulator's Codes of Practice and Conduct require accredited laboratories to follow documented procedures and to disclose the uncertainty in their findings. In India, Section 63 of the Bharatiya Sakshya Adhiniyam 2023 requires a certificate of authenticity for electronic records tendered as evidence; the certificate must identify the device, the method of production, and the person certifying the record. Across the European Union, ISO/IEC 27037 provides the international standard for digital evidence handling and is increasingly referenced in court decisions concerning digital evidence. These are not isolated requirements: the core obligation in every jurisdiction is to document the chain of custody, the analytical method, the tools used, and the basis for each conclusion, so that another qualified examiner could reproduce the analysis and evaluate the findings.

Expert reports should state the limitations of device-class identification clearly. Spectral fingerprinting places a recording within a class of microphones; it does not establish that the recording came from a specific device. Double compression detection establishes that intermediate encoding occurred; it does not establish when, by whom, or for what purpose. These distinctions matter for how findings are framed in testimony. Overstating the specificity of a class-level attribution is a common error that opposing experts and careful judges will identify.

Worked example

Tracing a recording's compression history from submitted file to court report

A disputed WhatsApp voice message is submitted as evidence in a fraud case. The claimant says it is an unedited original recording. The defence requests authentication analysis.

The submitted file is an .opus file, the format used by WhatsApp for voice messages. The following steps trace the analysis from initial examination to court-ready findings.

Container inspection. The file header is examined in a hex editor and with MediaInfo. The Ogg container reports a single Opus audio stream at approximately 32 kbps with a 48 kHz sample rate, and an encoder string consistent with WhatsApp's libopus build. No anomalies are found in the container metadata at this stage.
Spectral ceiling check. The decoded audio is loaded into Sonic Visualiser. The spectrogram shows a consistent spectral ceiling at approximately 16 kHz rather than the 20 kHz ceiling of Opus fullband mode (Opus does not code content above 20 kHz, even though it operates at 48 kHz). At 32 kbps the Opus codec would normally preserve content up to that 20 kHz limit, depending on the signal. A hard cut at 16 kHz therefore indicates that the audio fed into the Opus encoder was already band-limited, most likely by a prior encoding step.
Intermediate codec identification. The 16 kHz cut-off corresponds to the typical spectral ceiling of a 128 kbps MP3 encoding. The quantisation noise pattern below 16 kHz shows the periodic frame-boundary variation characteristic of MP3 MDCT blocks. The finding is that the audio was encoded as MP3 at approximately 128 kbps before being converted to Opus.
Microphone signature check. The noise floor in silent passages is measured. The spectral shape of the noise is flat from 100 Hz to 14 kHz with a high-frequency roll-off, and its level is consistent with a MEMS microphone with a signal-to-noise ratio of around 62 dB(A) (a self-noise of roughly 32 dB(A) SPL). This is consistent with a smartphone microphone, which is expected for a WhatsApp voice message. No inconsistency is found at this stage.
Conclusion and report. The file is a valid Opus file whose audio is consistent with a MEMS-microphone source such as a smartphone. However, the audio inside the Opus stream was not recorded directly into Opus: it was previously encoded as MP3 at approximately 128 kbps. WhatsApp records directly to Opus, so the MP3 intermediate step is not part of the normal WhatsApp workflow. This indicates the audio was processed outside WhatsApp before being packaged into the submitted .opus file. The report states the finding, the method, the tools (MediaInfo version, Sonic Visualiser version, plugin set), and the limitation that the analysis cannot determine when the intermediate MP3 encoding occurred or by whom.
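Steps 2 and 3 of the example can be expressed as a small decision helper. It uses the approximate 128 kbps ceilings from the codec table earlier in this topic and the 20 kHz Opus fullband limit; the 500 Hz tolerance, the table subset, and the function name are assumptions for illustration. What it makes explicit is that a ceiling narrows the field without settling it.

```python
# Approximate 128 kbps ceilings (Hz) from the codec table in this topic;
# real ceilings vary with encoder, settings and signal.
TYPICAL_CEILINGS_128K = {"MP3": 16000, "AAC-LC": 18000, "Ogg Vorbis": 18000, "WMA Standard": 16000}
OPUS_FULLBAND_HZ = 20000

def prior_codec_candidates(measured_ceiling_hz,
                           container_codec_ceiling_hz=OPUS_FULLBAND_HZ,
                           tolerance_hz=500):
    """If content stops well below what the file's own codec can carry, list
    codecs whose typical 128 kbps ceiling matches the measured ceiling."""
    if measured_ceiling_hz >= container_codec_ceiling_hz - tolerance_hz:
        return []  # the ceiling alone gives no evidence of prior band-limiting
    return sorted(name for name, ceiling in TYPICAL_CEILINGS_128K.items()
                  if abs(ceiling - measured_ceiling_hz) <= tolerance_hz)
```

With a measured ceiling of 16 kHz the helper returns both MP3 and WMA Standard, because both have a typical 128 kbps ceiling near 16 kHz in the table. That ambiguity is why step 3 relies on the MP3 frame-boundary pattern rather than on the ceiling alone.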

Check your understanding


A recording is submitted as an original 44.1 kHz WAV file. Its spectrogram shows a hard spectral cut-off at approximately 16 kHz. What does this most likely indicate?

Key Takeaways

Every microphone type, including MEMS capsules in smartphones, electret condensers, and dynamic microphones, imparts a characteristic frequency response curve and self-noise floor that can be measured and compared against reference databases to classify a recording's source device.
Lossy codecs leave bitrate-dependent spectral artefacts, most visibly a hard spectral ceiling at the codec's cut-off frequency. Once other causes such as resampling from 32 kHz have been ruled out, a WAV file showing a hard 16 kHz ceiling has most likely passed through an intermediate lossy encoding, such as 128 kbps MP3, even if the current container is lossless.
Double compression is detectable through the periodic variation in quantisation noise statistics at frame boundaries. Up-encoding to a higher bitrate does not erase the first compression pass's spectral ceiling or frame-level artefacts.
AI-generated audio may lack physical microphone self-noise or carry synthesised noise with unrealistically consistent statistics. Microphone analysis is one element of a broader authentication workflow for synthetic media, complementing dedicated voice-conversion detectors.
Court reports must state that spectral fingerprinting identifies device class, not individual devices, and that double compression detection establishes that intermediate encoding occurred without establishing who performed it or when. Admissibility and documentation requirements differ by jurisdiction: Federal Rule of Evidence 702 and the Daubert standard in the US (with SWGDE guidelines and AES27 commonly referenced), the FSR Codes in the UK, Section 63 of the Bharatiya Sakshya Adhiniyam 2023 in India, and ISO/IEC 27037 across the EU.

What is spectral fingerprinting in audio forensics?

Spectral fingerprinting is the analysis of a recording's frequency response profile to identify characteristics that are consistent with a particular microphone class or device type. Every microphone capsule design, diaphragm size, and housing geometry imparts a distinctive roll-off curve, resonance peak pattern, and noise floor. Comparing those characteristics against a reference database can link an audio recording to a specific class of device, supporting or contradicting claims about its origin.

What is double compression and why does it matter in audio authentication?

Double compression occurs when an audio file has been encoded with a lossy codec, decoded, and then re-encoded. Each lossy encoding cycle leaves quantisation artefacts at the codec's characteristic frequencies. When a file carries signatures of two encoding passes, such as spectral holes at MP3 frame boundaries beneath an AAC layer, the examiner can infer that the file has been processed after its claimed original recording. Double compression is a common consequence of editing or forgery, but innocent transcoding (for example by a sharing platform) can also produce it, so it calls for an explanation rather than proving tampering on its own.

How do codec bitrate signatures help detect tampering?

Lossy codecs like MP3 and AAC apply psychoacoustic models that reduce data at predictable frequency bands depending on the target bitrate. The resulting spectral gaps and quantisation noise patterns are characteristic of each bitrate setting and encoder implementation. If a recording carries internal evidence of a lower-bitrate encoding pass beneath its current higher-bitrate encoding, that discrepancy indicates the file has been re-encoded, which may point to tampering or to a misrepresentation of the file's history.

Can microphone characteristics distinguish a genuine recording from an AI-generated one?

Partially. Genuine recordings made with a physical microphone carry the device's frequency response curve, self-noise, and room acoustics. Many AI-generated speech synthesis systems produce audio that lacks these physical signatures or applies them inconsistently. An examiner can look for the absence of expected microphone noise profiles, unrealistic spectral flatness in high frequencies, or acoustic environments that are physically inconsistent. However, some synthesis systems now add simulated room noise and microphone coloration, so microphone analysis is one element of a broader authentication workflow rather than a standalone test.

What standards govern presentation of audio authentication evidence in court?

There is no single global standard. In the United States, admissibility of expert evidence is governed by Federal Rule of Evidence 702 and the Daubert standard, while the Scientific Working Group on Digital Evidence (SWGDE) and the Audio Engineering Society's AES27 recommended practice provide technical guidance. In the United Kingdom, the Forensic Science Regulator's Codes of Practice and Conduct require documented methodology and peer review. Under India's Bharatiya Sakshya Adhiniyam 2023, electronic records require a certificate of authenticity under Section 63. EU courts increasingly reference ISO/IEC 27037 for digital evidence handling. In all jurisdictions the examiner must document the tools used, the chain of custody, and the basis for each finding.
