Detecting Frame Deletion and Insertion in Video
Removing or inserting frames breaks the temporal continuity of inter-frame compression and leaves measurable artifacts in motion vectors, GOP structure, and container timestamps. This topic explains how forensic examiners parse these artifacts to identify and localise tampering in digital video files.
Frame deletion and insertion detection is the forensic discipline of identifying whether individual frames have been removed from or added to a digital video file to alter what the recording shows. Modern video codecs such as H.264 and H.265 do not store every frame as a complete image. They store most frames as compressed differences from a reference frame, so removing or inserting a single frame breaks the reference chain and leaves measurable artifacts. Examiners look for these artifacts in three places: the container's timestamp sequence, the codec's Group of Pictures (GOP) structure, and the inter-frame residuals produced when predicted and actual frame data no longer agree.
The practical motivation for this analysis is significant. Video evidence from surveillance cameras, body-worn cameras, and mobile devices is routinely submitted in criminal and civil proceedings. A recording that has had frames removed to conceal an event, or frames inserted to fabricate one, can change the outcome of a case. Courts in multiple jurisdictions now expect authentication evidence: UK guidance under the College of Policing Digital Media Advisor framework, US Federal Rules of Evidence Rule 901(b)(9), the Bharatiya Sakshya Adhiniyam 2023 in India, and the EU's electronic evidence frameworks all require that digital evidence be shown to be what it purports to be before it is given weight.
Frame-level tampering detection sits within the broader discipline of video authentication, alongside double-compression analysis, source-camera matching, and metadata verification. It is distinct from enhancement work, which is concerned with improving visibility, not with establishing integrity. An examiner asked to authenticate a video must document the baseline codec parameters, extract and plot the GOP structure, compute inter-frame differences, and compare measured values against the statistical baseline for unmodified recordings from that camera or codec. The output is a reproducible, documented finding that can be cross-examined.
By the end of this topic you will be able to:
- Explain how inter-frame compression makes frame deletion and insertion detectable at the codec level.
- Describe how to extract and interpret GOP structure and identify anomalous GOP lengths that indicate tampering.
- Use container timestamp analysis to locate gaps or duplications consistent with frame deletion or insertion.
- Compute inter-frame residuals and explain why elevated residuals at a specific frame boundary are forensically significant.
- Explain what re-encoding does to frame-tampering artifacts and how double-compression analysis compensates.
- Group of Pictures (GOP)
- The repeating unit of inter-frame compressed video. A GOP begins with an I-frame (a complete image) followed by a fixed or near-fixed number of P-frames and B-frames. Tampering changes the frame count within at least one GOP, producing an anomalous GOP length.
- I-frame (Intra-coded frame)
- A self-contained frame that stores a complete image, independent of other frames. It is used as a reference for the P-frames and B-frames that follow it within the same GOP.
- P-frame (Predictive frame)
- A frame encoded as the difference from the preceding reference frame. Deleting any frame that a P-frame references produces large, anomalous residuals in that P-frame.
- Presentation Timestamp (PTS)
- A value stored in the container for each packet indicating when that frame should be displayed. In an unmodified constant-frame-rate recording, PTS values taken in presentation order increase by a fixed interval set by the camera's frame rate. A deleted frame leaves a gap; an inserted frame disrupts the expected interval.
- Inter-frame residual
- The pixel-level difference between a predicted frame and the actual frame after decoding. In unmodified video, residuals at scene cuts are elevated but follow a characteristic pattern. At a deletion or insertion point, residuals are elevated in a spatially incoherent way that differs from a natural scene cut.
- Motion vector consistency
- In unmodified video, motion vectors across consecutive frames track the same objects and follow smooth trajectories. A deleted or inserted frame breaks the motion field: vectors jump discontinuously in direction or magnitude at the tamper boundary.
How inter-frame compression creates a forensic record
Raw video from a camera sensor produces a new complete image for every frame. At 1080p and 30 frames per second, an hour of uncompressed video requires roughly 560 GB at 20 bits per pixel (10-bit 4:2:2 sampling), or about 670 GB as 8-bit RGB. Practical storage and transmission require compression. Codecs such as H.264 (AVC) and H.265 (HEVC) reduce this by storing only the first frame of each GOP as a complete image (the I-frame) and representing every subsequent frame as the difference from its reference frame.
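The storage figure depends on the pixel format, which the paragraph above does not fix. The sketch below works through the arithmetic for three common formats; the function name and the list of formats are illustrative choices, not part of any standard tool.

```python
def uncompressed_bytes(width, height, fps, seconds, bits_per_pixel):
    """Raw video size in bytes, ignoring audio and container overhead."""
    return width * height * bits_per_pixel * fps * seconds // 8


# Average bits per pixel for each sampling format (illustrative selection).
formats = [("8-bit 4:2:0", 12), ("10-bit 4:2:2", 20), ("8-bit RGB", 24)]

for label, bpp in formats:
    size_gb = uncompressed_bytes(1920, 1080, 30, 3600, bpp) / 1e9
    print(f"{label}: {size_gb:.0f} GB per hour")
```

One hour of 1080p at 30 fps therefore ranges from about 336 GB to about 672 GB depending on the format assumed.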
A P-frame contains motion vectors describing how blocks of pixels moved from the reference frame, and a residual describing any prediction error. A B-frame uses two references: a frame before it and a frame after it. When the codec decodes a P-frame or B-frame, it reconstructs the image by applying the motion vectors to the reference frame and adding the residual. If the reference frame is missing because it was deleted, the decoder either fails or produces visually corrupted output, and the stored residual values become forensically anomalous.
This dependency structure is what makes frame deletion traceable. The tamperer cannot simply remove a frame from the bitstream without affecting all frames that depend on it in the same GOP. They must either accept the resulting artifacts, re-encode the clip (which leaves double-compression evidence), or carefully reconstruct the codec's state after the deletion point, which requires specialised tools and leaves its own artifacts.
GOP structure analysis
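The first step in a frame-tampering examination is to extract the frame type sequence from the bitstream and compute GOP lengths throughout the recording. FFprobe can output the picture type and timing of every frame and packet; MediaInfo and ExifTool report codec parameters and container metadata and are useful for cross-checking the declared settings. For a camera recording at a fixed GOP length (a common setting in CCTV encoders), every complete GOP in an unmodified file has the same length, although the final GOP is often cut short where the recording stopped. Encoders that insert I-frames at scene changes produce variable GOP lengths, so the baseline has to be established for the specific device. With a fixed-GOP baseline in place, a single anomalous GOP that is shorter or longer than the rest is a primary indicator of frame deletion or insertion at that point.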
| Observation | Likely cause | Follow-up analysis |
|---|---|---|
| One GOP shorter than baseline | Frame(s) deleted within that GOP | Check PTS gap at same location; compute inter-frame residuals across the boundary |
| One GOP longer than baseline | Frame(s) inserted within that GOP, or I-frame suppressed | Check PTS for duplicate or out-of-order values; inspect motion vectors |
| Multiple successive anomalous GOPs | Segment substitution or re-encoding of a clip segment | Double-compression analysis on the anomalous segment |
| I-frame appearing outside expected position | GOP header rewritten after splice | Compare container-level I-frame positions against bitstream-level frame types |
| All GOPs normal, but PTS gap exists | Frame deleted between GOPs at an I-frame boundary | Timestamp delta analysis; check for missing I-frame at expected position |
Examiners document the expected GOP length from the codec parameters stored in the file header or from analysis of the first 60 seconds of the recording. They then plot GOP lengths across the entire file and flag any that deviate by more than one frame. The position of the anomalous GOP, expressed as a byte offset and a timecode, is recorded in the examination report.
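The GOP-length check described above can be sketched in a few lines of Python. This is a minimal illustration, not a validated forensic tool. It assumes the frame types have already been exported as a sequence of `I`, `P`, and `B` characters, for example with `ffprobe -v error -select_streams v:0 -show_entries frame=pict_type -of csv=p=0 input.mp4`, where `input.mp4` is a placeholder. It also assumes that every I-frame starts a new GOP and that the most common GOP length is the baseline. The function names and the `tolerance` parameter are hypothetical.

```python
from collections import Counter


def gop_lengths(frame_types):
    """Split a frame-type sequence into GOPs, each starting at an I-frame.

    Returns (start_frame_index, length) pairs. Frames before the first
    I-frame are ignored.
    """
    starts = [i for i, t in enumerate(frame_types) if t == "I"]
    bounds = starts + [len(frame_types)]
    return [(s, e - s) for s, e in zip(bounds, bounds[1:])]


def flag_anomalous_gops(frame_types, tolerance=1):
    """Return (baseline_length, [(gop_index, start_frame, length), ...]).

    The baseline is assumed to be the most common GOP length. GOPs that
    differ from it by more than `tolerance` frames are flagged.
    """
    gops = gop_lengths(frame_types)
    if not gops:
        return None, []
    baseline = Counter(n for _, n in gops).most_common(1)[0][0]
    flagged = [
        (idx, start, n)
        for idx, (start, n) in enumerate(gops)
        if abs(n - baseline) > tolerance
    ]
    return baseline, flagged


# Synthetic sequence: 30-frame GOPs, with the 15th GOP cut to 22 frames.
normal = "I" + "P" * 29
short = "I" + "P" * 21
sequence = normal * 14 + short + normal * 5

baseline, flagged = flag_anomalous_gops(sequence)
print(baseline, flagged)  # 30 [(14, 420, 22)]
```

GOP indices here are zero-based, so index 14 is the fifteenth GOP. A truncated final GOP would also be flagged by this sketch and would need to be excluded or explained separately.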
Container timestamp analysis
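Container formats such as MP4 and MKV provide, directly or through sample-timing tables, a presentation timestamp (PTS) and a decoding timestamp (DTS) for each packet. AVI carries no explicit per-frame timestamps, so timing there is inferred from the frame index and the nominal frame rate. In a constant-frame-rate camera recording at 30 frames per second with a 90 kHz time base, successive PTS values, taken in presentation order, should differ by exactly 3000 (90000 divided by 30). A deviation from this interval indicates that the file has been altered, that the camera dropped frames or had a fault, or that the device records at a variable frame rate, so the cause has to be established from the other indicators.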
Frame deletion leaves a gap: instead of PTS values 0, 3000, 6000, 9000, the sequence might read 0, 3000, 9000, 12000, showing that the frame at PTS 6000 is missing. Frame insertion can produce duplicated PTS values or a compressed interval where two frames occupy the space that one would normally occupy. FFprobe with the -show_packets flag outputs PTS and DTS for every packet. A simple script computing the first difference of PTS values and flagging deviations from the expected interval localises tampering to the exact packet position.
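A first-difference script of the kind described above might look like the following sketch. It assumes the PTS values have already been exported as integers, for example with `ffprobe -v error -select_streams v:0 -show_entries packet=pts -of csv=p=0 input.mp4`, where `input.mp4` is a placeholder, and that the recording is constant frame rate. Packets are stored in decode order, so the values are sorted into presentation order first. The function name and the anomaly labels are hypothetical.

```python
def flag_pts_anomalies(pts_values, expected_delta):
    """Return (previous_pts, current_pts, delta, kind) for every interval
    that differs from the expected PTS delta."""
    ordered = sorted(pts_values)  # decode order -> presentation order
    anomalies = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta == expected_delta:
            continue
        if delta == 0:
            kind = "duplicate"
        elif delta > expected_delta:
            kind = "gap"
        else:
            kind = "short interval"
        anomalies.append((prev, cur, delta, kind))
    return anomalies


# 90 kHz time base at 30 fps gives an expected delta of 3000.
expected = 90000 // 30

print(flag_pts_anomalies([0, 3000, 9000, 12000], expected))
# [(3000, 9000, 6000, 'gap')]
```

The flagged PTS pair can then be mapped back to a packet position and timecode for the examination report. Frame rates such as 29.97 fps do not divide a 90 kHz time base into the same integer as 30 fps (the interval is 3003), so the expected delta should be taken from the file itself, not assumed.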
A sophisticated tamperer may rebuild the timestamp sequence after deletion to eliminate the gap. In that case, PTS values look continuous, but the inter-frame content is now inconsistent with the stated frame rate: the video appears to skip or jump at the tamper point. Comparing the measured motion between frames against the motion expected at the stated frame rate can detect this. A deleted second of 30 fps content replaced with re-timed timestamps will have the visual motion of skipped frames, which diverges from the motion field measured elsewhere in the recording.
Inter-frame residual and motion vector analysis
Inter-frame residuals measure how well each P-frame or B-frame was predicted from its reference. In normal video, residuals are highest at natural scene cuts and lowest during slow, predictable motion. The spatial pattern of high residuals at a scene cut is coherent: the entire frame is unpredictable because the reference frame shows a different scene entirely. At a deletion point, the residual pattern is incoherent: the prediction partially succeeds for some regions and fails sharply for others, because the deleted frame was part of a continuous scene, not a cut.
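The coherent versus incoherent distinction can be illustrated with a simple proxy. The sketch below does not read codec residuals from the bitstream. As a stand-in, it computes the mean absolute difference between two consecutive decoded grayscale frames per block and reports the fraction of blocks that changed strongly. The block size, the threshold, and both function names are illustrative assumptions, and the frames are plain nested lists so that the example has no dependencies.

```python
def block_mad(prev, cur, block=8):
    """Mean absolute difference per block between two equal-sized
    grayscale frames given as 2D lists of integers."""
    height, width = len(cur), len(cur[0])
    rows = []
    for by in range(0, height - block + 1, block):
        row = []
        for bx in range(0, width - block + 1, block):
            total = sum(
                abs(cur[y][x] - prev[y][x])
                for y in range(by, by + block)
                for x in range(bx, bx + block)
            )
            row.append(total / (block * block))
        rows.append(row)
    return rows


def changed_fraction(prev, cur, threshold=10.0, block=8):
    """Fraction of blocks whose mean absolute difference exceeds threshold."""
    values = [v for row in block_mad(prev, cur, block) for v in row]
    return sum(v > threshold for v in values) / len(values)


# Synthetic 16x16 frames: a flat reference, a full-frame change,
# and a change confined to the left half of the frame.
reference = [[50] * 16 for _ in range(16)]
full_change = [[200] * 16 for _ in range(16)]
partial_change = [[200] * 8 + [50] * 8 for _ in range(16)]

print(changed_fraction(reference, full_change))     # 1.0
print(changed_fraction(reference, partial_change))  # 0.5
```

A fraction close to 1 is consistent with the whole-frame change of a scene cut. An intermediate fraction only marks a frame boundary for closer inspection, since ordinary motion also changes part of the frame; it becomes meaningful when it coincides with a GOP or timestamp anomaly.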
Motion vector consistency analysis looks at the direction and magnitude of motion vectors across consecutive frames. In unmodified video, an object moving across the frame produces a sequence of vectors that trace a smooth path. Deleting a frame creates a discontinuity: the motion measured across the tamper boundary is two or three times the magnitude expected from the surrounding frames, or changes direction abruptly. This is distinct from natural fast motion because fast motion produces large vectors consistently across several frames, not a single anomalous jump surrounded by normal values.
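The "single anomalous jump surrounded by normal values" criterion can be sketched as a comparison against the median of neighbouring frames. The example assumes a per-frame mean motion magnitude has already been computed by some other means (from exported motion vectors or from optical flow). The `ratio` and `window` values and the function name are illustrative assumptions, not established thresholds.

```python
from statistics import median


def isolated_spikes(magnitudes, ratio=1.8, window=3):
    """Indices where the motion magnitude is at least `ratio` times the
    median of the `window` frames on each side."""
    values = list(magnitudes)
    hits = []
    for i in range(window, len(values) - window):
        neighbours = values[i - window:i] + values[i + 1:i + 1 + window]
        base = median(neighbours)
        if base > 0 and values[i] >= ratio * base:
            hits.append(i)
    return hits


# One frame with roughly doubled motion, surrounded by steady motion.
deletion_like = [4.0, 4.1, 3.9, 4.0, 8.2, 4.0, 4.1, 3.9, 4.0]
# Sustained fast motion across several frames.
fast_motion = [4, 4, 4, 9, 9, 9, 9, 9, 9, 4, 4, 4]

print(isolated_spikes(deletion_like))  # [4]
print(isolated_spikes(fast_motion))    # []
```

Using the median of both neighbourhoods means a sustained burst of fast motion raises the local baseline and is not flagged, while a one-frame jump is.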
When the video has been re-encoded
Re-encoding is the tamperer's most effective countermeasure. Transcoding the edited clip through a new encoding pass replaces the original bitstream with a fresh one, eliminating broken motion-prediction chains, rebuilding GOP structure, and resetting timestamps. A re-encoded clip can pass basic GOP and timestamp checks.
Re-encoding does not, however, eliminate all evidence. Double-compression analysis, covered in detail in the Video Double-Compression Analysis topic, exploits the statistical signature left in the DCT coefficient distribution. When a clip is encoded twice, the quantisation grid of the first encode imposes structure on the coefficients that survives the second encode. Histograms of DCT coefficients in a double-compressed clip show characteristic periodic peaks and dips, tied to the ratio of the two quantisation steps, that are absent in single-encoded material.
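The double-quantisation effect can be demonstrated with a toy model. The sketch below applies plain uniform rounding to synthetic integer "coefficients". Real H.264 and H.265 encoders use an integer transform with dead-zone quantisation and rate control, so this shows only the principle and is not a detector. The step sizes 5 and 3 and all function names are arbitrary illustrative choices.

```python
import math


def quantise(value, step):
    """Integer quantisation level: value / step rounded to nearest."""
    return math.floor(value / step + 0.5)


def level_histogram(coeffs, steps):
    """Apply successive quantise/dequantise passes with the given steps
    (at least one) and return a histogram of the final levels."""
    hist = {}
    for c in coeffs:
        value = c
        for step in steps:
            level = quantise(value, step)
            value = level * step  # dequantise before the next pass
        hist[level] = hist.get(level, 0) + 1
    return hist


def empty_bins(hist):
    """Levels inside the occupied range that never occur."""
    return [k for k in range(min(hist), max(hist) + 1) if k not in hist]


coeffs = range(0, 61)  # synthetic coefficient magnitudes

single = level_histogram(coeffs, [3])     # encoded once, step 3
double = level_histogram(coeffs, [5, 3])  # step 5, then step 3

print(empty_bins(single))  # []
print(empty_bins(double))  # [1, 4, 6, 9, 11, 14, 16, 19]
```

After the first pass every value is a multiple of 5, so the second pass can only land on a subset of its levels. The resulting periodic holes are the kind of structure that double-compression analysis looks for, in practice as peaks and dips instead of perfectly empty bins.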
If the tamperer re-encodes only a segment of the clip and rejoins it to the original, the splice point between single-encoded and double-encoded segments becomes detectable. The statistical signature of the DCT coefficients changes at the join point. Segment-level analysis, computing the DCT histogram for each GOP or sliding window rather than for the whole file, localises the re-encoded segment and therefore the region of potential tampering.
Chain of custody, tools, and court presentation
Before any analysis, the examiner acquires a forensic copy of the original file, records the SHA-256 hash, and stores the original in write-protected form. All analysis is conducted on the working copy. This discipline applies globally: the UK College of Policing Digital Media Advisor framework, the US SWGDE guidelines, and the Bharatiya Sakshya Adhiniyam 2023 in India all require that the integrity of the original exhibit be demonstrable. Working from the original without a hash-verified copy undermines the chain of custody, because the examiner can no longer show that the exhibit was unchanged by the analysis. See Chain of Custody for Digital Media for the procedural requirements.
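The hashing step can be sketched with Python's standard `hashlib` module. The function names, the chunk size, and the file paths are placeholders. A laboratory's own validated procedure governs the actual acquisition, and this only illustrates computing and comparing the digests.

```python
import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """SHA-256 hex digest of a file, read in chunks so that large
    video files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches_original(original_path, working_path):
    """True when the working copy is bit-for-bit identical to the original."""
    return sha256_file(original_path) == sha256_file(working_path)
```

In use, the digest of the original is recorded at acquisition, and `copy_matches_original` is run again before and after analysis so that the report can state that the exhibit and the working copy were unchanged.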
Common tools used in frame-tampering analysis include FFprobe for packet-level timestamp and frame-type extraction, ExifTool for container metadata, MediaInfo for codec parameter verification, and custom Python scripts using OpenCV or the FFmpeg API for inter-frame difference computation. Commercial tools such as Cognitech VideoInvestigator and Amped FIVE include automated GOP and timestamp analysis modules. The examiner must document which tool version was used, because tool updates can affect output format.
Examination reports for court presentation must be reproducible: any competent examiner using the same tools on the same file should arrive at the same measurements. Reports state the baseline GOP length, the measured GOP lengths at each anomalous position, the expected and actual PTS delta at each flagged location, and any inter-frame residual values above the threshold established from the unmodified portions of the file. Conclusions are expressed as findings (what was measured) and interpretations (what those measurements indicate), kept as separate sections to aid cross-examination.
In an H.264 video with a fixed GOP length of 30 frames, an examiner finds GOP number 15 contains only 22 frames while all others contain 30. What does this most directly indicate?
Key Takeaways
- Inter-frame compression makes frame deletion and insertion traceable: removing a frame breaks the motion-prediction reference chain, producing anomalous residuals, disrupted GOP lengths, and timestamp gaps that can be independently measured.
- GOP structure analysis is the primary detection step: extract frame types from the bitstream, compute GOP lengths, and flag any GOP that deviates from the baseline established by the rest of the recording.
- Container timestamp analysis provides a second independent indicator: compute first differences of PTS values and identify gaps or duplications at the expected frame-interval resolution.
- Inter-frame residuals and motion vector discontinuities at the same location as GOP and timestamp anomalies strengthen the finding; a spatially incoherent residual spike differs from the coherent pattern of a natural scene cut.
- Re-encoding can eliminate some direct artifacts but introduces double-compression evidence in the DCT coefficient distribution; segment-level analysis can localise which portion of a clip was re-encoded and therefore suspect.
Why does deleting a video frame produce detectable artifacts?
What is a GOP and why does it matter for tampering detection?
Can frame-level tampering be detected even when the video is re-encoded?
What role do container timestamps play in frame-tampering analysis?
How is frame-tampering evidence presented in court?