Detecting Frame Deletion and Insertion in Video

Removing or inserting frames breaks the temporal continuity of inter-frame compression and leaves measurable artifacts in motion vectors, GOP structure, and container timestamps. This topic explains how forensic examiners parse these artifacts to identify and localise tampering in digital video files.

Last updated: 24 Jun 2026

Frame deletion and insertion detection is the forensic discipline of identifying whether individual frames have been removed from or added to a digital video file to alter what the recording shows. Modern video codecs such as H.264 and H.265 do not store every frame as a complete image. They store most frames as compressed differences from a reference frame, so removing or inserting a single frame breaks the reference chain and leaves measurable artifacts. Examiners look for these artifacts in three places: the container's timestamp sequence, the codec's Group of Pictures (GOP) structure, and the inter-frame residuals produced when predicted and actual frame data no longer agree.

The practical motivation for this analysis is significant. Video evidence from surveillance cameras, body-worn cameras, and mobile devices is routinely submitted in criminal and civil proceedings. A recording that has had frames removed to conceal an event, or frames inserted to fabricate one, can change the outcome of a case. Courts in multiple jurisdictions now expect authentication evidence: UK guidance under the College of Policing Digital Media Advisor framework, US Federal Rules of Evidence Rule 901(b)(9), the Bharatiya Sakshya Adhiniyam 2023 in India, and the EU's electronic evidence frameworks all require that digital evidence be shown to be what it purports to be before it is given weight.

Frame-level tampering detection sits within the broader discipline of video authentication, alongside double-compression analysis, source-camera matching, and metadata verification. It is distinct from enhancement work, which is concerned with improving visibility, not with establishing integrity. An examiner asked to authenticate a video must document the baseline codec parameters, extract and plot the GOP structure, compute inter-frame differences, and compare measured values against the statistical baseline for unmodified recordings from that camera or codec. The output is a reproducible, documented finding that can be cross-examined.

By the end of this topic you will be able to:

Explain how inter-frame compression makes frame deletion and insertion detectable at the codec level.
Describe how to extract and interpret GOP structure and identify anomalous GOP lengths that indicate tampering.
Use container timestamp analysis to locate gaps or duplications consistent with frame deletion or insertion.
Compute inter-frame residuals and explain why elevated residuals at a specific frame boundary are forensically significant.
Explain what re-encoding does to frame-tampering artifacts and how double-compression analysis compensates.

Key terms

Group of Pictures (GOP): The repeating unit of inter-frame compressed video. A GOP begins with an I-frame (a complete image) followed by a fixed or near-fixed number of P-frames and B-frames. Deleting or inserting frames anywhere other than at whole-GOP boundaries changes the frame count within at least one GOP, producing an anomalous GOP length.
I-frame (Intra-coded frame): A self-contained frame that stores a complete image, independent of other frames. It is used as a reference for the P-frames and B-frames that follow it within the same GOP.
P-frame (Predictive frame): A frame encoded as motion vectors and a residual relative to a preceding reference frame. Deleting a frame that a P-frame references leaves that P-frame predicted from the wrong image, so its residual becomes anomalously large (if the clip is re-encoded) or its decoded output is corrupted (if it is not).
Presentation Timestamp (PTS): A value stored in the container for each packet indicating when that frame should be displayed. In an unmodified constant-frame-rate recording, PTS values taken in presentation order increase at a fixed interval set by the frame rate. A deleted frame leaves a gap; an inserted frame disrupts the expected interval.
Inter-frame residual: The difference between a frame and its motion-compensated prediction from the reference frame(s), which the encoder stores alongside the motion vectors. In unmodified video, residuals at scene cuts are elevated but follow a characteristic pattern. At a deletion or insertion point, residuals are elevated in a spatially incoherent way that differs from a natural scene cut.
Motion vector consistency: In unmodified video, motion vectors across consecutive frames track the same objects and follow smooth trajectories. A deleted or inserted frame breaks the motion field: vectors jump discontinuously in direction or magnitude at the tamper boundary.

How inter-frame compression creates a forensic record

Raw video from a camera sensor produces a new complete image for every frame. At 1080p and 30 frames per second, an hour of uncompressed video occupies roughly 336 GB at the 8-bit 4:2:0 sampling typical of consumer cameras, and about 560 GB at 10-bit 4:2:2. Practical storage and transmission therefore depend on compression. Codecs such as H.264 (AVC) and H.265 (HEVC) reduce this by storing only the first frame of each GOP as a complete image (the I-frame) and representing every subsequent frame as the difference from its reference frame.
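The hourly figure depends heavily on bit depth and chroma sampling. A quick check of the arithmetic, assuming decimal gigabytes, no container overhead, and the sampling formats listed (the function name is ours, for illustration):

```python
def uncompressed_bytes(width, height, fps, seconds, bit_depth, samples_per_pixel):
    """Raw size in bytes of video stored with no compression.

    samples_per_pixel: 1.5 for 4:2:0 chroma subsampling, 2.0 for 4:2:2,
    3.0 for 4:4:4 or RGB.
    """
    bits_per_frame = width * height * samples_per_pixel * bit_depth
    return bits_per_frame / 8 * fps * seconds


HOUR = 3600
for label, depth, spp in [("8-bit 4:2:0", 8, 1.5),
                          ("10-bit 4:2:2", 10, 2.0),
                          ("8-bit RGB", 8, 3.0)]:
    gb = uncompressed_bytes(1920, 1080, 30, HOUR, depth, spp) / 1e9
    print(f"{label}: {gb:.0f} GB per hour")
# 8-bit 4:2:0: 336 GB per hour
# 10-bit 4:2:2: 560 GB per hour
# 8-bit RGB: 672 GB per hour
```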

A P-frame contains motion vectors describing how blocks of pixels moved from the reference frame, and a residual describing any prediction error. A B-frame uses two references, typically a frame before it and a frame after it. When the codec decodes a P-frame or B-frame, it reconstructs the image by applying the motion vectors to the reference frame and adding the residual. If the reference frame is missing because it was deleted, the decoder either fails or applies the stored motion vectors and residual to whichever frame now occupies the reference position, producing corrupted output that can propagate until the next I-frame.
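A toy model makes the dependency concrete. The sketch below is not a real codec: it assumes a single global motion vector per frame, uses numpy's circular `np.roll` as the motion-compensation step, and the helpers `encode_p_frame` and `decode_p_frame` are hypothetical. It shows that the stored motion vector and residual reproduce frame 2 exactly only when they are applied to the reference they were computed against.

```python
import numpy as np


def encode_p_frame(reference, current, mv):
    """Toy P-frame encoder: predict `current` by shifting `reference` by one
    global motion vector mv = (dy, dx), then store the prediction error."""
    prediction = np.roll(reference, shift=mv, axis=(0, 1))
    return mv, current - prediction


def decode_p_frame(reference, mv, residual):
    """Toy decoder: apply the stored motion vector to whatever reference it
    is given, then add the stored residual."""
    return np.roll(reference, shift=mv, axis=(0, 1)) + residual


rng = np.random.default_rng(0)
scene = rng.integers(0, 256, size=(32, 32)).astype(np.int32)
# Three frames of a scene panning 2 pixels right per frame, plus sensor noise.
frames = [np.roll(scene, shift=2 * t, axis=1) + rng.integers(-3, 4, size=scene.shape)
          for t in range(3)]

# Frame 2 is encoded against frame 1.
mv, residual = encode_p_frame(frames[1], frames[2], (0, 2))

intact = decode_p_frame(frames[1], mv, residual)          # reference present
after_deletion = decode_p_frame(frames[0], mv, residual)  # frame 1 deleted: frame 0 used instead

print("error with intact reference:", np.abs(intact - frames[2]).mean())
print("error after deleting frame 1:", np.abs(after_deletion - frames[2]).mean())
```

The first error is zero; the second is large, because the residual was computed for a different reference image.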

This dependency structure is what makes frame deletion traceable. The tamperer cannot simply remove a frame from the bitstream without affecting all frames that depend on it in the same GOP; only removing complete closed GOPs avoids breaking the reference chain, and that still leaves a timestamp gap unless the timestamps are rebuilt. Otherwise they must accept the resulting artifacts, re-encode the clip (which leaves double-compression evidence), or carefully reconstruct the codec's state after the deletion point, which requires specialised tools and leaves its own artifacts.

GOP structure analysis

The first step in a frame-tampering examination is to extract the frame type sequence from the bitstream and compute GOP lengths throughout the recording. FFprobe can output the frame type and timestamps of every frame and packet; MediaInfo and ExifTool report container and codec metadata, such as the nominal frame rate and encoder settings, against which those measurements are checked. For a camera recording at a fixed GOP length (a common setting in CCTV encoders), every GOP in an unmodified file has the same length, apart from the final GOP, which is usually cut short when recording stops. A single anomalous GOP that is shorter or longer than the rest is the primary indicator of frame deletion or insertion at that point.
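A minimal sketch of the extraction step, assuming `ffprobe` is on the PATH and that each I-frame starts a new GOP (encoders that insert extra I-frames at scene changes, or open-GOP streams with non-IDR I-frames, need more care). The options `-show_entries frame=pict_type -of csv=p=0` ask ffprobe for one picture type per decoded frame; the parser keeps only the first comma-separated field in case a build appends extra columns. The function names are illustrative.

```python
import subprocess


def read_picture_types(path):
    """Run ffprobe and return one picture type ("I", "P", "B") per decoded
    frame of the first video stream, in output (presentation) order.
    Requires ffprobe on PATH; this decodes the whole stream, so it is slow."""
    cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
           "-show_entries", "frame=pict_type", "-of", "csv=p=0", path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [line.split(",")[0].strip() for line in out.splitlines() if line.strip()]


def gop_lengths(pict_types):
    """Split the picture-type sequence at every I-frame and return GOP lengths.
    Frames before the first I-frame, if any, form a leading partial GOP."""
    lengths, count = [], 0
    for pict_type in pict_types:
        if pict_type == "I" and count:
            lengths.append(count)
            count = 0
        count += 1
    if count:
        lengths.append(count)
    return lengths


print(gop_lengths("IPPPP" "IPPPP" "IPP" "IPPPP"))  # [5, 5, 3, 5]
```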

Observation	Likely cause	Follow-up analysis
One GOP shorter than baseline	Frame(s) deleted within that GOP	Check PTS gap at same location; compute inter-frame residuals across the boundary
One GOP longer than baseline	Frame(s) inserted within that GOP, or the I-frame that should have started the next GOP deleted (two GOPs merged)	Check PTS for duplicate or out-of-order values; inspect motion vectors
Multiple successive anomalous GOPs	Segment substitution or re-encoding of a clip segment	Double-compression analysis on the anomalous segment
I-frame appearing outside expected position	Splice point where an inserted or rejoined segment begins with its own I-frame	Compare container-level I-frame positions against bitstream-level frame types; check whether the encoder inserts extra I-frames at scene changes
All GOPs normal, but PTS gap exists	One or more complete GOPs deleted at I-frame boundaries, or a recorder pause	Timestamp delta analysis; check whether the gap equals a whole number of GOP durations

Examiners establish the expected GOP length from encoder metadata where the file records it (the H.264 sequence parameter set itself carries no GOP length) or from the I-frame spacing in an unaffected stretch of the recording, such as the first 60 seconds. They then plot GOP lengths across the entire file and flag any that deviate by more than one frame. The position of each anomalous GOP, expressed as a byte offset and a timecode, is recorded in the examination report.
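The flagging rule can be sketched as follows. The one-frame tolerance comes from the text; using the most common GOP length as the default baseline, and skipping the final GOP, are our assumptions. The function reports each flagged GOP's starting frame index, which the examiner would then convert to a timecode and byte offset from the ffprobe output.

```python
from collections import Counter


def flag_anomalous_gops(lengths, tolerance=1, baseline=None):
    """Flag GOPs whose length differs from the baseline by more than
    `tolerance` frames. Returns (gop_number, start_frame, length, deviation)
    with 1-based GOP numbers. The final GOP is not flagged, because a
    recording usually stops part-way through a GOP; inspect it separately.
    If no baseline is given, the most common GOP length is used."""
    if not lengths:
        return []
    body = lengths[:-1] or lengths
    if baseline is None:
        baseline = Counter(body).most_common(1)[0][0]
    flagged, start = [], 0
    for number, length in enumerate(body, start=1):
        deviation = length - baseline
        if abs(deviation) > tolerance:
            flagged.append((number, start, length, deviation))
        start += length
    return flagged


lengths = [50] * 19 + [38] + [50] * 25
print(flag_anomalous_gops(lengths))  # [(20, 950, 38, -12)]
```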

Container timestamp analysis

Container formats such as MP4 and MKV record timing for every packet, from which a presentation timestamp (PTS) and a decoding timestamp (DTS) can be read; AVI instead implies timing from a fixed frame rate and the frame index. In a constant-frame-rate camera recording at 30 frames per second with a 90 kHz time base, successive PTS values, taken in presentation order, should differ by exactly 3000 (90000 divided by 30). Any deviation from this interval indicates that the file has been altered or that the camera had a fault.

Frame deletion leaves a gap: instead of PTS values 0, 3000, 6000, 9000, the sequence might read 0, 3000, 9000, 12000, showing that the frame at PTS 6000 is missing. Frame insertion can produce duplicated PTS values or a compressed interval where two frames occupy the space that one would normally occupy. FFprobe with the -show_packets flag outputs PTS and DTS for every packet. A simple script computing the first difference of PTS values and flagging deviations from the expected interval localises tampering to the exact packet position.
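A sketch of such a script, assuming `ffprobe` is installed. The options `-show_entries packet=pts -of csv=p=0` print one PTS per packet in the stream's time base. PTS values are sorted first because packets are stored in decoding order, which differs from presentation order when B-frames are present. The `tolerance` parameter is our addition for containers with coarse time bases (MKV commonly uses milliseconds, so 30 fps alternates between 33 and 34 ticks); the function names are illustrative.

```python
import subprocess


def read_packet_pts(path):
    """Return the PTS of every packet in the first video stream, in stream
    time-base units, skipping packets whose PTS is reported as N/A."""
    cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
           "-show_entries", "packet=pts", "-of", "csv=p=0", path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    values = []
    for line in out.splitlines():
        field = line.split(",")[0].strip()
        if field and field != "N/A":
            values.append(int(field))
    return values


def pts_anomalies(pts_values, expected_delta, tolerance=0):
    """Sort PTS into presentation order and report each interval that departs
    from `expected_delta` by more than `tolerance`.
    Returns (index, delta, implied_missing_frames); a negative count means
    more frames than the interval allows (duplicated or inserted frames)."""
    pts = sorted(pts_values)
    anomalies = []
    for i in range(len(pts) - 1):
        delta = pts[i + 1] - pts[i]
        if abs(delta - expected_delta) > tolerance:
            anomalies.append((i, delta, delta / expected_delta - 1))
    return anomalies


# 30 fps with a 90 kHz time base: expected interval 90000 / 30 = 3000
print(pts_anomalies([0, 3000, 9000, 12000], 3000))  # [(1, 6000, 1.0)]
print(pts_anomalies([0, 3000, 3000, 6000], 3000))   # [(1, 0, -1.0)]
```

Each tuple gives the index of the frame before the anomaly, the measured interval, and the implied number of missing frames.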

A sophisticated tamperer may rebuild the timestamp sequence after deletion to eliminate the gap. In that case, PTS values look continuous, but the inter-frame content is now inconsistent with the stated frame rate: the video appears to skip or jump at the tamper point. Comparing the motion measured between frames against the motion expected at the stated frame rate can detect this. If one second (30 frames) of 30 fps footage is deleted and the timestamps are re-timed to close the gap, the two frames either side of the splice differ by 31 frame intervals' worth of motion, far more than the frame-to-frame motion measured elsewhere in the recording.
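One way to measure inter-frame motion for this comparison is phase correlation, which estimates the global translation between two frames. The sketch swaps in that technique as a simple illustration: it assumes a scene that pans uniformly with circular wrap-around, which real footage does not, and `global_shift` is a hypothetical helper (OpenCV's `cv2.phaseCorrelate` implements the same idea). With ten frames removed and the timestamps re-timed, the splice shows up as a single 11-pixel step among 1-pixel steps.

```python
import numpy as np


def global_shift(prev, curr, eps=1e-12):
    """Estimate the global translation (dy, dx) that maps `prev` onto `curr`
    by phase correlation. Shifts are treated as circular."""
    cross = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    surface = np.fft.ifft2(cross / (np.abs(cross) + eps)).real
    dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
    h, w = surface.shape
    # Peaks past the midpoint correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


rng = np.random.default_rng(0)
scene = rng.random((64, 64))
original = [np.roll(scene, shift=t, axis=1) for t in range(40)]  # 1 px per frame pan
tampered = original[:20] + original[30:]  # frames 20-29 removed, timestamps re-timed

steps = [abs(global_shift(a, b)[1]) for a, b in zip(tampered, tampered[1:])]
print(steps)  # 1 everywhere except 11 at the splice
```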

Inter-frame residual and motion vector analysis

Inter-frame residuals measure how well each P-frame or B-frame was predicted from its reference. In normal video, residuals are highest at natural scene cuts and lowest during slow, predictable motion. The spatial pattern of high residuals at a scene cut is coherent: the entire frame is unpredictable because the reference frame shows a different scene entirely. At a deletion point, the residual pattern is incoherent: the prediction partially succeeds for some regions and fails sharply for others, because the deleted frame was part of a continuous scene, not a cut.
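The coherence distinction can be approximated by dividing the frame difference into tiles and measuring how much of the frame changes. This is one possible proxy rather than an established standard: the 8-pixel tile size, the threshold of 20 grey levels, and the helper names are assumptions. It uses the difference between successive decoded frames, the same quantity as the worked example later in this topic, rather than the codec's stored residual.

```python
import numpy as np


def block_difference_map(prev, curr, block=8):
    """Mean absolute difference of each block x block tile between two
    greyscale frames. Frame dimensions must be multiples of `block`."""
    diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
    h, w = diff.shape
    return diff.reshape(h // block, block, w // block, block).mean(axis=(1, 3))


def high_difference_fraction(prev, curr, threshold, block=8):
    """Share of tiles whose difference exceeds `threshold`: close to 1 when the
    whole frame changes (scene cut), lower when only some regions fail."""
    return float((block_difference_map(prev, curr, block) > threshold).mean())


rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(64, 64))
scene_cut = rng.integers(0, 256, size=(64, 64))             # entirely new content
partial = frame.copy()
partial[8:24, 16:40] = rng.integers(0, 256, size=(16, 24))  # one region no longer predictable

print(high_difference_fraction(frame, scene_cut, threshold=20))
print(high_difference_fraction(frame, partial, threshold=20))
```

The scene cut changes essentially every tile, while the partial failure changes only the six tiles covering the altered region.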

Motion vector consistency analysis looks at the direction and magnitude of motion vectors across consecutive frames. In unmodified video, an object moving across the frame produces a sequence of vectors that trace a smooth path. Deleting frames creates a discontinuity: the motion vector at the tamper boundary is a multiple of the magnitude expected from the surrounding frames (about twice for one deleted frame and three times for two, if the object moves at constant speed), or changes direction abruptly. This is distinct from natural fast motion, which produces large vectors consistently across several frames rather than a single anomalous jump surrounded by normal values.
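The distinction between an isolated jump and sustained fast motion can be expressed as a local test on a per-frame motion-magnitude series, for example the mean motion-vector length of each frame. In this sketch the ratio of 2 reflects the twofold jump expected when a single frame is deleted; the window size, the use of a median, and the function name are assumptions, and direction changes are not checked.

```python
import numpy as np


def isolated_motion_spikes(magnitudes, window=3, ratio=2.0):
    """Return indices whose motion magnitude is at least `ratio` times the
    median of up to `window` neighbours on each side. A single jump between
    normal values is flagged; a sustained run of fast motion is not, because
    its neighbours are large too."""
    m = np.asarray(magnitudes, dtype=float)
    flagged = []
    for i in range(len(m)):
        neighbours = np.concatenate([m[max(0, i - window):i], m[i + 1:i + 1 + window]])
        if neighbours.size == 0:
            continue
        baseline = np.median(neighbours)
        if baseline > 0 and m[i] >= ratio * baseline:
            flagged.append(i)
    return flagged


splice = [1.0, 1.1, 0.9, 1.0, 2.9, 1.0, 1.05, 0.95]  # one jump
fast_pan = [1, 1, 1, 5, 5, 5, 5, 5, 1, 1]            # sustained fast motion
print(isolated_motion_spikes(splice))    # [4]
print(isolated_motion_spikes(fast_pan))  # []
```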

When the video has been re-encoded

Re-encoding is the tamperer's most effective countermeasure. Transcoding the edited clip through a new encoding pass replaces the original bitstream with a fresh one, eliminating broken motion-prediction chains, rebuilding GOP structure, and resetting timestamps. A re-encoded clip can pass basic GOP and timestamp checks.

Re-encoding does not, however, eliminate all evidence. Double-compression analysis, covered in detail in the Video Double-Compression Analysis topic, exploits the statistical signature left in the DCT coefficient distribution. When a clip is encoded twice, the quantisation grid of the first encode imposes structure on the coefficients that survives the second encode. Histograms of quantised DCT coefficients in a double-compressed clip show periodic dips or empty bins (or periodic peaks, depending on the ratio of the two quantisation step sizes) that are absent in single-encoded material.

If the tamperer re-encodes only a segment of the clip and rejoins it to the original, the splice point between single-encoded and double-encoded segments becomes detectable. The statistical signature of the DCT coefficients changes at the join point. Segment-level analysis, computing the DCT histogram for each GOP or sliding window rather than for the whole file, localises the re-encoded segment and therefore the region of potential tampering.
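A simplified numerical illustration of both ideas, under stated assumptions: each segment is modelled as one stream of Laplacian-distributed transform coefficients (real H.264 and H.265 use integer transforms and per-block quantisation parameters, so a practical tool groups coefficients by frequency and QP), the quantisation steps 3 and 2 are arbitrary, and the "depleted bin" score with its 0.2 threshold is hypothetical. Segments 3 to 5 are quantised twice, coarse then fine, which leaves empty histogram bins; when the second step is coarser than the first, the artifact appears as periodic peaks instead.

```python
import numpy as np


def quantise(coeffs, step):
    """Uniform scalar quantisation to integer levels (numpy rounds half to even)."""
    return np.round(coeffs / step)


def depleted_bins(levels, max_level=12, frac=0.2):
    """Count histogram bins 1..max_level holding fewer than `frac` times the
    mean of their two neighbours: the gaps left by double quantisation."""
    hist = np.bincount(np.abs(levels).astype(int), minlength=max_level + 2)
    count = 0
    for k in range(1, max_level + 1):
        neighbour_mean = (hist[k - 1] + hist[k + 1]) / 2
        if neighbour_mean > 0 and hist[k] < frac * neighbour_mean:
            count += 1
    return count


rng = np.random.default_rng(0)
segments = []
for seg in range(8):
    coeffs = rng.laplace(scale=10.0, size=20_000)  # stand-in for one AC coefficient
    if 3 <= seg <= 5:
        # encoded at step 3, decoded, then re-encoded at step 2
        levels = quantise(quantise(coeffs, 3) * 3, 2)
    else:
        levels = quantise(coeffs, 2)  # single encode at step 2
    segments.append(levels)

scores = [depleted_bins(levels) for levels in segments]
print(scores)  # segments 3-5 stand out; the others score 0
```

Scoring each segment separately, rather than pooling the whole file, is what localises the re-encoded stretch.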

Chain of custody, tools, and court presentation

Before any analysis, the examiner acquires a forensic copy of the original file, records the SHA-256 hash, and stores the original in write-protected form. All analysis is conducted on the working copy. This discipline applies globally: the UK College of Policing Digital Media Advisor framework, the US SWGDE guidelines, and the Bharatiya Sakshya Adhiniyam 2023 in India all require that the integrity of the original exhibit be demonstrable. Working from the original without a hash-verified copy invalidates the chain of custody. See Chain of Custody for Digital Media for the procedural requirements.
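The hash-and-verify step can be scripted with Python's standard library. A minimal sketch: the file is read in 1 MiB chunks so that large exhibits need not fit in memory, and the helper names are illustrative. Write protection of the original (for example, write-blocked or read-only media) is outside the scope of this code.

```python
import hashlib
import shutil


def sha256_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large videos need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_verified_working_copy(original, working):
    """Copy the exhibit and confirm the copy's hash matches the original's.
    Returns the hash to record in the examination notes."""
    original_hash = sha256_file(original)
    shutil.copy2(original, working)
    if sha256_file(working) != original_hash:
        raise RuntimeError(f"hash mismatch: {working} differs from {original}")
    return original_hash
```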

Common tools used in frame-tampering analysis include FFprobe for packet-level timestamp and frame-type extraction, ExifTool for container metadata, MediaInfo for codec parameter verification, and custom Python scripts using OpenCV or the FFmpeg API for inter-frame difference computation. Commercial tools such as Cognitech VideoInvestigator and Amped FIVE include automated GOP and timestamp analysis modules. The examiner must document which tool version was used, because tool updates can affect output format.

Examination reports for court presentation must be reproducible: any competent examiner using the same tools on the same file should arrive at the same measurements. Reports state the baseline GOP length, the measured GOP lengths at each anomalous position, the expected and actual PTS delta at each flagged location, and any inter-frame residual values above the threshold established from the unmodified portions of the file. Conclusions are expressed as findings (what was measured) and interpretations (what those measurements indicate), kept as separate sections to aid cross-examination.

Worked example

Analysing a CCTV clip submitted as evidence

A 90-second CCTV clip is submitted in a criminal case. Defence counsel alleges that 10 seconds of footage was deleted.

The file is an MP4 container with an H.264 video track at 25 frames per second. The GOP length is 50 frames (2 seconds per GOP), established from the regular I-frame spacing in the unaffected portions of the file, since the H.264 sequence parameter set does not record a GOP length. Total frames expected for 90 seconds: 2250. The examination proceeds as follows.

Hash acquisition. SHA-256 hash of the original file is recorded. A working copy is made. The working copy hash is verified to match.
Packet extraction. FFprobe is run with -show_packets -select_streams v. The output lists every packet with its PTS, DTS, and size. Total frame count: 2238, which is 12 fewer than the 2250 expected. A 10-second deletion would have removed 250 frames.
GOP structure extraction. FFprobe with -show_frames lists the frame type (I, P, B) of each frame. GOPs 1 to 19 each contain exactly 50 frames. GOP 20, which opens with the I-frame at timecode 00:00:38.00, contains only 38 frames, so it holds 1.52 seconds of footage instead of 2.00. GOPs 21 onward return to 50 frames.
Timestamp analysis. PTS values are extracted, sorted into presentation order, and first differences computed. All differences are 3600 (the 90 kHz time base divided by 25 fps) except between the frame at position 950 and the one that follows it, where the difference jumps to 46800: 13 frame intervals, meaning 12 frames are missing. The gap begins at timecode 00:00:38.00, immediately after the I-frame that opens the anomalous GOP.
Inter-frame residual analysis. Mean absolute difference between successive decoded frames is computed for the full clip. Values range from 0.3 to 4.1 across normal frames and natural scene variations. Between frame 950 and the next frame, the mean absolute difference is 18.7, well above every other value in the clip, including the maximum of 7.2 at the one genuine scene cut at timecode 00:00:15.00. The spatial distribution of the difference at frame 950 is incoherent: high-difference regions are scattered patches, whereas at the genuine scene cut the entire frame changes.
Conclusion. The file shows an anomalous GOP beginning at timecode 00:00:38.00 that is 12 frames shorter than all other GOPs in the recording, a PTS discontinuity equivalent to 12 missing frames at the same location, and an inter-frame difference spike at the same boundary that is inconsistent with a natural scene cut. These three independent findings, converging on the same timecode, are consistent with the deletion of approximately 12 frames (0.48 seconds) from the video at that point. Double-compression analysis found no re-encoding artifacts, consistent with the frames having been removed without re-encoding.
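The frame-count, GOP and timestamp arithmetic can be checked on a synthetic reconstruction. This is not the exhibit: it assumes only P-frames between I-frames, a 90 kHz time base, and that the 12 frames following the I-frame at index 950 were removed without re-timing, which is the scenario the conclusion describes. Note that 12 missing frames appear as one PTS interval spanning 13 frame periods.

```python
import numpy as np

FPS, TIME_BASE, GOP = 25, 90_000, 50
DELTA = TIME_BASE // FPS  # 3600 ticks per frame

# Synthetic stand-in: 90 s of frames, an I-frame every 50 frames,
# with the 12 frames after the I-frame at index 950 removed.
n_expected = 90 * FPS  # 2250
kept = np.array([i for i in range(n_expected) if not 951 <= i <= 962])
pts = kept * DELTA
types = ["I" if i % GOP == 0 else "P" for i in kept]

# Frame count
print(len(kept), "frames;", n_expected - len(kept), "missing")  # 2238 frames; 12 missing

# GOP lengths, split at each I-frame (1-based GOP numbers)
starts = [k for k, t in enumerate(types) if t == "I"] + [len(types)]
gop_lengths = np.diff(starts)
print({n + 1: int(length) for n, length in enumerate(gop_lengths) if length != GOP})  # {20: 38}

# PTS first differences
deltas = np.diff(pts)
pos = int(np.flatnonzero(deltas != DELTA)[0])
print(pos, int(deltas[pos]), int(deltas[pos] // DELTA - 1), pts[pos] / TIME_BASE)
# 950 46800 12 38.0
```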

Check your understanding


In an H.264 video with a fixed GOP length of 30 frames, an examiner finds GOP number 15 contains only 22 frames while all others contain 30. What does this most directly indicate?

Key Takeaways

Inter-frame compression makes frame deletion and insertion traceable: removing a frame breaks the motion-prediction reference chain, producing anomalous residuals, disrupted GOP lengths, and timestamp gaps that can be independently measured.
GOP structure analysis is the primary detection step: extract frame types from the bitstream, compute GOP lengths, and flag any GOP that deviates from the baseline established by the rest of the recording.
Container timestamp analysis provides a second independent indicator: compute first differences of PTS values and identify gaps or duplications at the expected frame-interval resolution.
Inter-frame residuals and motion vector discontinuities at the same location as GOP and timestamp anomalies strengthen the finding; a spatially incoherent residual spike differs from the coherent pattern of a natural scene cut.
Re-encoding can eliminate some direct artifacts but introduces double-compression evidence in the DCT coefficient distribution; segment-level analysis can localise which portion of a clip was re-encoded and therefore suspect.

Why does deleting a video frame produce detectable artifacts?

Modern video codecs compress frames by storing only the differences from a reference frame rather than full images. When a frame is deleted, the reference chain breaks: frames that were predicted from it are decoded against the wrong reference, so their motion vectors and residuals no longer fit the image they are applied to. The result is anomalous residuals, irregular GOP lengths, and discontinuous presentation timestamps that examiners can measure.

What is a GOP and why does it matter for tampering detection?

A Group of Pictures (GOP) is the repeating sequence of an I-frame followed by P-frames and B-frames that a codec uses to organise inter-frame compression. Each GOP normally has a fixed or near-fixed length. Frame deletion or insertion, unless whole GOPs are removed, changes the number of frames in at least one GOP, producing an anomalous GOP length that stands out against the baseline pattern of the rest of the recording.

Can frame-level tampering be detected even when the video is re-encoded?

Re-encoding destroys some direct artifacts but creates a different signature: double-compression artifacts. The statistical distribution of DCT coefficients in a re-encoded clip differs from a single-encode baseline. Tools that analyse quantisation residuals and coefficient histograms can identify re-compression even when the container timestamps have been rebuilt.

What role do container timestamps play in frame-tampering analysis?

Container formats such as MP4 and MKV record timing for each packet, from which a presentation timestamp (PTS) and a decoding timestamp (DTS) can be read. In an unmodified constant-frame-rate recording, PTS values in presentation order advance by a fixed interval set by the frame rate. A deleted frame leaves a gap in the timestamp sequence; an inserted frame can duplicate or disrupt the expected interval. Timestamp analysis is therefore one of the first steps in a video integrity examination.

How is frame-tampering evidence presented in court?

Examiners document the methodology, tools used, hash values of the original and working copies, and the specific anomalies found. Reports reference measurable, reproducible findings such as anomalous GOP lengths at specific byte offsets or timestamp discontinuities at specific timecodes. Under frameworks such as the US Federal Rules of Evidence, UK CCTV guidance, or the Bharatiya Sakshya Adhiniyam 2023, the examiner must be able to demonstrate that findings are reproducible and that the chain of custody was maintained.
