Video and audio files are two-layer structures: a container that organises tracks and timing, and a codec that compresses the actual media data. The interaction between these layers creates characteristic artefacts that forensic examiners use to detect re-encoding, reconstruct timelines, and identify source devices.
Every video file you encounter as evidence is a nested structure: an outer container that organises multiple tracks and encodes timing, and inner codec streams that compress the actual image and sound data. Understanding this two-layer structure is not just academic. It is what lets a forensic examiner tell the difference between a video that has been re-encoded (and therefore may have been manipulated) and one that has only been rewrapped (and is forensically equivalent to the original), or detect the I-frame irregularities that mark where an edit was made.
Inter-frame compression is the other key idea. Video codecs like H.264 do not store each frame as a complete image. Most frames store only the differences from their neighbours, so decoding a single frame requires knowledge of the frames around it. This Group of Pictures (GOP) structure is efficient for storage and transmission, and it matters for forensic analysis: tampering with one frame can corrupt many around it, and the pattern of frame types is an encoder fingerprint.
Audio forensics adds another layer. Lossless formats like WAV preserve the actual sample values; lossy formats like MP3 and AAC apply psychoacoustic models that discard information permanently. Each codec leaves characteristic compression artefacts, and a recording that claims to be original but shows the spectral holes of MP3 encoding is telling a story about its history. This topic builds the structural vocabulary an examiner needs before touching any video or audio evidence.
The container is the file's filing system, not its content.
A container format defines a byte-level specification for how multiple streams of data (video, audio, subtitles, chapters, metadata) are interleaved and indexed within a single file. The container handles the administrative structure: which byte range contains which track, at what time offset, and in what order samples should be presented. It is analogous to a ZIP archive that carries separate files but also encodes the relationship between them.
| Container | Extension | Governing spec | Common use context |
|---|---|---|---|
| ISO Base Media File Format | .mp4 / .m4v / .m4a | ISO/IEC 14496-12 | Phones, tablets, streaming, web; most common container in current casework |
| Matroska | .mkv / .mka / .webm | Matroska spec (IETF, RFC 9559) | Open-source community, long-term archiving, flexible codec support |
| QuickTime | .mov | Apple QTFF spec | Apple devices; structurally similar to MP4, shares atom/box hierarchy |
| AVI | .avi | Microsoft RIFF/AVI spec | Legacy Windows recordings, older CCTV systems |
| MXF | .mxf | SMPTE ST 377-1 | Broadcast, professional production, digital cinema |
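To make the container/codec separation concrete, the sketch below parses JSON of the kind a probing tool reports (for example, output from a command such as `ffprobe -v error -show_format -show_streams -of json <file>`) and reports the container name separately from the codec of each track. The sample JSON is fabricated for illustration and the function name `summarise_probe` is hypothetical; only the `format_name`, `codec_type`, and `codec_name` fields are assumed to be present.

```python
import json

# Illustrative ffprobe-style JSON (fabricated sample, not from a real case file).
SAMPLE_PROBE = """
{
  "streams": [
    {"index": 0, "codec_type": "video", "codec_name": "hevc"},
    {"index": 1, "codec_type": "audio", "codec_name": "aac"}
  ],
  "format": {"filename": "clip.mp4", "format_name": "mov,mp4,m4a,3gp,3g2,mj2"}
}
"""


def summarise_probe(probe_json: str) -> dict:
    """Separate the container name from the per-track codec names."""
    probe = json.loads(probe_json)
    return {
        # The container is a property of the file as a whole.
        "container": probe.get("format", {}).get("format_name"),
        # Each track carries its own codec, independent of the container.
        "codecs": [
            (stream.get("codec_type"), stream.get("codec_name"))
            for stream in probe.get("streams", [])
        ],
    }


summary = summarise_probe(SAMPLE_PROBE)
print("container:", summary["container"])
for codec_type, codec_name in summary["codecs"]:
    print(codec_type, "->", codec_name)
```

The file extension plays no part in the result: the container and codec names come from the file's own structure, which is why an examiner reads them from a probe rather than from the filename.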
For a forensic examiner, the container structure is the first place to look. MP4 and MOV files are built from a hierarchy of typed units called atoms (QuickTime) or boxes (ISO BMFF). Key boxes include moov (the index of all track metadata and sample locations), mdat (the actual media data), and ftyp (the brand identifier that specifies which variant of the spec the file targets). The position of moov relative to mdat in the file reveals whether the file was optimised for streaming (moov first) or captured in a single pass (moov last), which is a useful indicator of the recording workflow.
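The box layout described above can be walked with a few lines of Python. This is a minimal sketch, assuming the standard ISO BMFF box header (a 4-byte big-endian size followed by a 4-byte type, with size 1 signalling a 64-bit size and size 0 meaning "to end of file"). The helper names and the synthetic test bytes are hypothetical, and the sketch only lists top-level boxes; it does not descend into `moov`.

```python
import struct


def list_top_level_boxes(data: bytes) -> list:
    """Return (type, offset, size) for each top-level box in an ISO BMFF byte string."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the file.
            size = len(data) - offset
        if size < header:
            break  # malformed header; stop rather than loop forever
        boxes.append((box_type.decode("latin-1"), offset, size))
        offset += size
    return boxes


def moov_before_mdat(boxes):
    """True if moov precedes mdat, False if it follows, None if either is missing."""
    order = [box_type for box_type, _, _ in boxes if box_type in ("moov", "mdat")]
    if "moov" not in order or "mdat" not in order:
        return None
    return order.index("moov") < order.index("mdat")


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a synthetic box for the demonstration below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic file: ftyp, then mdat, then moov (the "moov last" layout).
synthetic = (
    make_box(b"ftyp", b"isom\x00\x00\x02\x00isom")
    + make_box(b"mdat", b"\x00" * 32)
    + make_box(b"moov", b"\x00" * 16)
)
boxes = list_top_level_boxes(synthetic)
print(boxes)
print("moov before mdat:", moov_before_mdat(boxes))
```

For real evidence files, reading only the headers and seeking past each payload would avoid loading a multi-gigabyte `mdat` into memory; the in-memory version here is kept short for clarity.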
Most video frames do not actually exist as complete images.
H.264 (Advanced Video Coding, AVC) is the most widely deployed video codec in forensic casework. It compresses video using a combination of intra-frame compression (applied within a single frame) and inter-frame compression (exploiting similarities between adjacent frames). Understanding both is necessary to assess what information survives compression and what does not.
Intra-frame compression in H.264 divides each frame into macroblocks (16×16 pixel regions) and predicts each macroblock from its neighbours within the same frame, then encodes the prediction residual with a transform (a 4×4 or 8×8 integer transform derived from the DCT), quantisation, and entropy coding. Inter-frame compression predicts a macroblock from a nearby block in a reference frame, encoding only the motion vector and the residual.
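The idea of "motion vector plus residual" can be illustrated with a toy one-dimensional block search. This is a deliberately simplified sketch, not how an H.264 encoder is implemented: real encoders search in two dimensions, use sub-pixel precision and variable block sizes, and choose among several reference frames. The function name and the sample pixel rows are hypothetical.

```python
def best_motion_vector(reference, current, start, size, search=3):
    """Find the offset into the reference row that best predicts a block of the
    current row, using the sum of absolute differences (SAD) as the cost."""
    block = current[start:start + size]
    best = None
    for mv in range(-search, search + 1):
        ref_start = start + mv
        if ref_start < 0 or ref_start + size > len(reference):
            continue  # candidate block would fall outside the reference row
        candidate = reference[ref_start:ref_start + size]
        sad = sum(abs(a - b) for a, b in zip(block, candidate))
        if best is None or sad < best[1]:
            residual = [a - b for a, b in zip(block, candidate)]
            best = (mv, sad, residual)
    return best


# A bright feature moves two pixels to the right and brightens slightly.
reference = [10, 10, 10, 50, 80, 50, 10, 10, 10, 10, 10, 10]
current = [10, 10, 10, 10, 10, 50, 81, 50, 10, 10, 10, 10]

mv, sad, residual = best_motion_vector(reference, current, start=5, size=3)
print("motion vector:", mv)
print("residual:", residual)
```

Instead of storing the three pixel values of the block, the encoder in this toy model would store the offset and a residual that is mostly zeros, which compresses far better after transform and quantisation.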
H.265 (HEVC) extends H.264's approach with larger coding units (up to 64×64), improved intra prediction modes, and better parallel processing. For forensics, HEVC produces fewer blockiness artefacts at equivalent bitrates, which can make some manipulation detectors that rely on visual quality less reliable at high compression. VP9 and AV1, used in web streaming, follow similar inter-frame principles and were designed as royalty-free formats.
MPEG-2, though older (1995), remains common in broadcast footage, standard-definition CCTV, and DVD-sourced material. Its fixed macroblock structure (16×16 macroblocks, no recursive partitioning) produces a more visible blocky appearance at high compression, which can be useful for estimating the original recording bitrate or detecting resampling.
The audio codec is a forensic timeline of every re-encoding the recording has been through.
Audio evidence appears most commonly in four formats: WAV (PCM), FLAC, MP3, and AAC. The distinction between lossless and lossy matters more acutely for audio than for video because forensic audio tasks often depend on fine spectral features. Speaker identification, gunshot detection, and background noise analysis all require the original frequency content, not a perceptually filtered approximation.
| Format | Compression type | Typical use | Forensic implication |
|---|---|---|---|
| WAV (PCM) | None (uncompressed) | Studio recording, phone call logs, court-ordered intercepts | Gold standard; all spectral information intact |
| FLAC | Lossless | Archival, audiophile; also some CCTV systems | Bit-exact reconstruction; metadata includes encoding software |
| MP3 (MPEG-1 Audio Layer III) | Lossy, perceptual coding | Consumer music, voice memos, older phones | Pre-echo, spectral holes from masking, frequency cutoff at bitrate-dependent ceiling |
| AAC (Advanced Audio Coding) | Lossy, perceptual coding | Modern phones, streaming, iOS recordings | Similar artefacts to MP3 but better at low bitrates; HE-AAC variants typically used at low bitrates |
MP3 encoding splits the signal into subbands with a polyphase filterbank followed by a modified discrete cosine transform (MDCT), models the audibility of each frequency component against the current masking threshold, and discards or coarsely quantises components below the threshold. Encoders also typically low-pass the input, which produces a spectral ceiling: at 128 kbps, many encoders discard content above roughly 16 kHz. A recording that claims to be uncompressed PCM but shows a flat spectral noise floor up to 16 kHz and almost no energy above it has very likely passed through a lossy encoder such as MP3 at some point, regardless of what its file header says.
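A spectral ceiling of this kind can be located by finding the highest frequency that still carries meaningful energy. The following is a minimal sketch using a naive discrete Fourier transform from the standard library only; the function names, the -60 dB threshold, and the synthetic three-tone signal are all assumptions chosen for illustration. Casework would normally use a windowed long-term average spectrum computed with a numerical library over the whole recording.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short test signals)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def estimate_cutoff_hz(samples, sample_rate, floor_db=-60.0):
    """Highest frequency whose magnitude is within floor_db of the spectral peak."""
    mags = magnitude_spectrum(samples)
    peak = max(mags)
    if peak == 0:
        return 0.0
    threshold = peak * 10 ** (floor_db / 20)
    highest_bin = max(k for k, m in enumerate(mags) if m >= threshold)
    return highest_bin * sample_rate / len(samples)


# Synthetic signal: tones at 1 kHz, 8 kHz and 15.5 kHz, nothing above.
SAMPLE_RATE = 48000
N = 480  # gives 100 Hz per bin, so each tone falls exactly on a bin
signal = [
    math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 8000 * t / SAMPLE_RATE)
    + 0.1 * math.sin(2 * math.pi * 15500 * t / SAMPLE_RATE)
    for t in range(N)
]

cutoff = estimate_cutoff_hz(signal, SAMPLE_RATE)
print("estimated ceiling (Hz):", cutoff)
```

A ceiling well below the Nyquist frequency is consistent with earlier lossy encoding but is not proof of it on its own: a band-limited microphone, a telephone channel, or a deliberate low-pass filter can produce a similar picture, so the finding should be weighed alongside other indicators.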
One operation leaves a new generation of artefacts; the other does not.
Rewrapping moves compressed bitstream data from one container to another without touching the codec layer. A Matroska MKV file containing an H.264 stream rewrapped into an MP4 container produces a file whose video data is bit-for-bit identical to the original. The codec artefacts, PRNU-equivalent noise patterns, and encoder fingerprints are all preserved. This operation is common in legitimate workflows: a video editor may rewrap for compatibility without re-encoding the content.
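One way to check the "bit-for-bit identical" claim is to hash the compressed packet payloads extracted from each file and compare the digests. The sketch below assumes the packets have already been demuxed into byte strings by some other tool; the function name and the packet contents are hypothetical placeholders, not real H.264 data.

```python
import hashlib


def stream_digest(packets) -> str:
    """SHA-256 over the concatenated packet payloads of one elementary stream."""
    digest = hashlib.sha256()
    for payload in packets:
        digest.update(payload)
    return digest.hexdigest()


# Placeholder payloads standing in for demuxed video packets.
original_packets = [b"\x00\x00\x00\x01\x65aaaa", b"\x00\x00\x00\x01\x41bbbb"]
rewrapped_packets = list(original_packets)  # same payloads, different container
reencoded_packets = [b"\x00\x00\x00\x01\x65aaab", b"\x00\x00\x00\x01\x41bbbb"]

same_after_rewrap = stream_digest(original_packets) == stream_digest(rewrapped_packets)
same_after_reencode = stream_digest(original_packets) == stream_digest(reencoded_packets)
print("rewrap preserves stream:", same_after_rewrap)
print("re-encode preserves stream:", same_after_reencode)
```

One caveat when applying this idea: some containers frame the same coded data differently (for example, start codes versus length prefixes around NAL units), so payloads may need to be normalised to a common framing before the hashes are comparable.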
Re-encoding is different. It decodes the compressed bitstream back to raw frames (or samples), then re-applies the codec at some quality setting. Every re-encoding round introduces new quantisation error and potentially new artefacts. The original encoder fingerprint is diluted. If a manipulated region was introduced during re-encoding, the entire video may show fresh artefacts that obscure what was there before. Detecting whether a video has been re-encoded at least once is therefore one of the first questions a forensic video examiner asks.
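The trace left by a second quantisation pass can be shown with a toy scalar quantiser. This is a simplified sketch under stated assumptions: coefficients are plain integers, the step sizes 5 and 3 are arbitrary, and rounding is round-half-up. Real codecs quantise transformed blocks with per-frequency step sizes, but the same effect appears: values that were already snapped to one grid land unevenly on a second grid, leaving periodic empty bins in the histogram.

```python
import math
from collections import Counter


def quantise(value, step):
    """Map a value to its quantisation level (round half up)."""
    return math.floor(value / step + 0.5)


def level_histogram(coeffs, steps):
    """Quantise and dequantise with each step in turn; count the final levels."""
    histogram = Counter()
    for coeff in coeffs:
        value = coeff
        for step in steps:
            level = quantise(value, step)
            value = level * step  # dequantise before the next generation
        histogram[level] += 1
    return histogram


coeffs = range(-30, 31)
single = level_histogram(coeffs, [3])      # encoded once with step 3
double = level_histogram(coeffs, [5, 3])   # encoded with step 5, then step 3

# Levels that a single encode can produce but the double encode never does.
empty_bins = sorted(level for level in single if level not in double)
print("levels after one encode:", sorted(single))
print("empty bins after two encodes:", empty_bins)
```

The once-encoded data fills every level, while the twice-encoded data skips levels at regular intervals. Gaps and peaks of this kind in coefficient histograms are one of the statistical cues that a stream carries more than one quantisation history.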
Detecting re-encoding uses several signals. A sudden change in the bitrate profile, I-frame density, or quantisation parameter (QP) curve mid-file is a candidate marker. The presence of encoder-specific metadata from a different tool (e.g., an x264 encoding tag in a file claimed to be from a specific camera model) is another. Comparing the DCT coefficient statistics across the file can reveal regions with distinct quantisation histories.
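The I-frame density signal can be sketched as a check on GOP lengths. The example assumes the frame types have already been extracted as a string of `I`, `P`, and `B` characters (for instance from a probing tool's per-frame picture-type field); the function names and the sample sequence are hypothetical.

```python
from collections import Counter


def i_frame_positions(frame_types: str) -> list:
    """Indices of I-frames in a string such as 'IPPBPPI...'."""
    return [i for i, frame_type in enumerate(frame_types) if frame_type == "I"]


def irregular_gops(frame_types: str) -> list:
    """Return (start_frame, length) for every closed GOP whose length differs
    from the most common GOP length in the file."""
    positions = i_frame_positions(frame_types)
    lengths = [b - a for a, b in zip(positions, positions[1:])]
    if not lengths:
        return []
    typical = Counter(lengths).most_common(1)[0][0]
    return [
        (start, length)
        for start, length in zip(positions, lengths)
        if length != typical
    ]


# Regular GOPs of 6 frames, with one short GOP of 3 frames in the middle.
frames = "IPPPPP" * 3 + "IPP" + "IPPPPP" * 2
print("I-frame positions:", i_frame_positions(frames))
print("irregular GOPs:", irregular_gops(frames))
```

An irregular GOP is only a candidate marker. Many encoders insert extra I-frames at scene changes, so a flagged position is a place to look more closely, not a finding of editing in itself.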
Every encoder makes choices that are not strictly required. Those choices are the fingerprint.
Encoders have degrees of freedom beyond what the codec standard requires: GOP size, reference frame count, quantisation parameter curves, rate control mode, in-loop filter strength, and the specific entropy coding tables chosen. Different manufacturers and software applications make different choices, and those choices leave patterns in the bitstream. Forensic analysis of the video bitstream can infer the encoder in much the same way that EXIF metadata identifies a camera model.
A video file has extension .mp4 but ffprobe reports the video codec as H.265 (HEVC). What does this tell you about the container and codec relationship?