Chapter 06

Video Analysis

CCTV is now the most prolific evidence source in urban crime investigation. Modern forensic video analysis spans frame extraction from compressed codecs, photogrammetric measurement, camera identification via PRNU sensor noise, authentication against tampering, and increasingly the detection of synthetic / deepfake video.

6.1 H.264 / H.265 Frame Types

Modern CCTV uses H.264 (AVC) or H.265 (HEVC) compression with three frame types:

I-frames (Intra-coded) — full image data, compressed independently. Largest size, highest quality per frame. Appear at GOP boundaries (every 0.5–2 sec).
P-frames (Predicted) — predicted from previous I or P. Motion vectors + residual. Smaller, lower quality.
B-frames (Bi-directional) — predicted from past + future. Smallest, lowest quality.

For forensic analysis, extract I-frames wherever possible. Export in a lossless format (PNG, TIFF); never re-encode to JPEG.
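
One way to locate I-frames before export is to list the picture type of every frame. The sketch below assumes FFmpeg's ffprobe is installed and on the PATH; the helper names are illustrative and not part of any standard tool. The parsing helpers can be tried on a pasted sample without ffprobe being present.

```python
import subprocess

def parse_pict_types(ffprobe_output):
    """Turn one-type-per-line ffprobe output into a list such as ['I', 'P', 'B']."""
    types = []
    for line in ffprobe_output.splitlines():
        token = line.strip().strip(",")  # some builds append a trailing comma
        if token:
            types.append(token)
    return types

def i_frame_indices(pict_types):
    """Zero-based positions of I-frames, in the order ffprobe reported the frames."""
    return [i for i, t in enumerate(pict_types) if t == "I"]

def ffprobe_pict_types(video_path):
    """Ask ffprobe for the picture type of every frame in the first video stream."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "frame=pict_type",
        "-of", "csv=p=0",
        video_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_pict_types(result.stdout)

# Demonstration on a hand-written sample (no video file needed).
sample_output = "I\nP\nB\nP\nI\nP\n"
sample_types = parse_pict_types(sample_output)
print(i_frame_indices(sample_types))
```

Run this against a working copy of the exhibit, never the original, and record the resulting indices in the examination notes.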

6.2 Frame-Rate Verification

DVR / NVR frame rates often deviate from nominal due to CPU load, storage bottlenecks, or motion-detection-driven variable rate. Verify the actual frame rate by counting frames during a known-duration event (a clock visible in frame, a passing pedestrian's footstep cadence, a synced reference clock placed in view at seizure time).
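
The check is simple arithmetic, sketched below. The frame count, duration and nominal rate are hypothetical figures chosen for illustration.

```python
def measured_fps(frame_count, duration_seconds):
    """Actual frame rate over an interval of known duration."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return frame_count / duration_seconds

def deviation_percent(measured, nominal):
    """Signed deviation of the measured rate from the nominal rate, in percent."""
    return 100.0 * (measured - nominal) / nominal

# Hypothetical case: 1,380 frames counted while a clock visible in frame
# advances by 60 seconds, on a recorder configured for 25 fps.
fps = measured_fps(1380, 60.0)
dev = deviation_percent(fps, 25.0)
print(f"measured {fps:.2f} fps, deviation {dev:+.1f}% from nominal")
```

Any speed or timing estimate derived from the footage should use the measured rate for that interval, not the nominal one.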

6.3 DVR Clock Skew

Most consumer DVRs lack NTP synchronisation; system clocks drift. Defensible procedure at seizure:

Photograph the DVR display showing system time alongside a GPS-synchronised reference (e.g., the investigating officer's mobile phone)
Document the offset in the chain-of-custody log: e.g., "DVR system time +47 minutes vs GPS reference"
Never alter the DVR clock — that breaks chain of custody
Apply the offset in all reporting: actual event time = displayed time − 47 minutes
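
The offset correction above can be sketched with the standard library. The timestamps are hypothetical, and the sketch assumes the offset measured at seizure also applied at the time of the event, which is an assumption to state in the report.

```python
from datetime import datetime

def dvr_offset(dvr_displayed, reference):
    """DVR time minus reference time; a positive offset means the DVR runs fast."""
    return dvr_displayed - reference

def corrected_time(displayed, offset):
    """Actual time = displayed time minus the documented offset."""
    return displayed - offset

# Hypothetical readings photographed together at seizure.
dvr_at_seizure = datetime(2024, 3, 5, 15, 17, 0)
ref_at_seizure = datetime(2024, 3, 5, 14, 30, 0)
offset = dvr_offset(dvr_at_seizure, ref_at_seizure)  # 47 minutes fast

# Hypothetical timestamp burned into the footage of interest.
event_displayed = datetime(2024, 3, 4, 22, 10, 30)
event_actual = corrected_time(event_displayed, offset)
print(offset, event_actual)
```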

6.4 Photogrammetric Height Estimation

Height from Image (Pinhole Projection)

H_actual = h_image × distance / focal_length

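Here h_image is the subject's height on the sensor and focal_length is expressed in the same units (for example, millimetres), so H_actual comes out in the units of distance. With scene reference markers and lens calibration, the typical 95% CI is ±3–7 cm for a well-calibrated single-camera setup.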
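
A worked instance of the pinhole formula, as a minimal sketch. The pixel pitch, pixel height, distance and focal length are hypothetical values; the conversion from pixels to millimetres on the sensor is an added step that assumes square pixels and a frame at the sensor's native resolution.

```python
def image_height_mm(height_px, pixel_pitch_mm):
    """Subject height on the sensor, from its height in pixels."""
    return height_px * pixel_pitch_mm

def height_from_image(h_image_mm, distance_m, focal_length_mm):
    """Pinhole projection: H_actual = h_image * distance / focal_length."""
    return h_image_mm * distance_m / focal_length_mm

# Hypothetical measurement: subject spans 340 px, pixel pitch 0.0025 mm,
# subject stands 8.0 m from the camera, lens focal length 4.0 mm.
h_img = image_height_mm(340, 0.0025)          # about 0.85 mm on the sensor
H_actual = height_from_image(h_img, 8.0, 4.0)  # about 1.70 m
print(f"estimated height: {H_actual:.2f} m")
```

The result is only as good as the distance and focal-length inputs, which is why scene reference markers and lens calibration matter.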

6.5 PRNU: Camera Sensor Fingerprinting

Photo-Response Non-Uniformity (PRNU) is the sensor-level fingerprint of a camera. Each CMOS / CCD pixel responds slightly differently to light because of microscopic manufacturing variations (silicon doping, thin-film thickness, gain calibration). The variation manifests as a multiplicative noise pattern specific to that sensor — a fingerprint that differs even between cameras of the same model.

Cross-correlation of an unknown video's noise residual with a suspect-camera reference PRNU gives a similarity score. High correlation is consistent with the unknown video having been made on the suspect camera.
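
The comparison can be illustrated on synthetic data. This is a sketch under stated assumptions, not a casework method: it uses NumPy, a crude 3×3 mean filter in place of the wavelet denoisers normally used for PRNU work, flat synthetic scenes, and an exaggerated fingerprint strength. All function names are illustrative.

```python
import numpy as np

def box_blur(img, k=3):
    """Crude k x k mean filter, standing in for a proper denoiser."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def noise_residual(img):
    """Image minus its denoised version: what is left is mostly noise."""
    img = np.asarray(img, dtype=np.float64)
    return img - box_blur(img)

def reference_pattern(frames):
    """Average the residuals of many frames so random noise cancels out."""
    return np.mean([noise_residual(f) for f in frames], axis=0)

def ncc(a, b):
    """Normalised cross-correlation at zero shift, in the range [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def simulate_frame(rng, fingerprint, brightness=128.0, noise_sigma=1.0):
    """Synthetic flat scene with multiplicative PRNU plus random noise."""
    scene = np.full(fingerprint.shape, brightness)
    return scene * (1.0 + fingerprint) + rng.normal(0.0, noise_sigma, fingerprint.shape)

rng = np.random.default_rng(0)
k_suspect = rng.normal(0.0, 0.02, (64, 64))  # synthetic fingerprint, suspect camera
k_other = rng.normal(0.0, 0.02, (64, 64))    # synthetic fingerprint, another camera

reference = reference_pattern([simulate_frame(rng, k_suspect) for _ in range(20)])
score_same = ncc(noise_residual(simulate_frame(rng, k_suspect)), reference)
score_other = ncc(noise_residual(simulate_frame(rng, k_other)), reference)
print(f"same camera: {score_same:.3f}, other camera: {score_other:.3f}")
```

In this toy setup the same-camera score is far higher than the other-camera score. Real footage gives much weaker correlations because of scene content, compression and stabilisation, so decision thresholds have to be validated empirically.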

6.6 Lens Distortion

Wide-angle CCTV lenses introduce barrel distortion — straight lines bow outward, with stronger bending toward the frame edges. At frame edges, distortion can shift positions by 5–20% of the frame width — metres of error in real-world coordinates.

Calibrate with a known target (checkerboard, dot grid) at multiple poses; fit a Brown-Conrady distortion model (for example, using Zhang's calibration method); undistort each frame before geometric measurement.
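
The undistortion step can be sketched for the radial part of the Brown-Conrady model. This is a simplified illustration using NumPy: tangential terms are omitted, coordinates are normalised with the origin at the principal point, and the coefficient k1 = -0.2 is a hypothetical value rather than a calibration result.

```python
import numpy as np

def distort(points, k1, k2=0.0):
    """Apply radial distortion: x_d = x_u * (1 + k1*r^2 + k2*r^4)."""
    r2 = np.sum(points ** 2, axis=1, keepdims=True)
    return points * (1.0 + k1 * r2 + k2 * r2 ** 2)

def undistort(points_d, k1, k2=0.0, iterations=30):
    """Invert the radial model by fixed-point iteration (no closed form exists)."""
    pts = points_d.copy()
    for _ in range(iterations):
        r2 = np.sum(pts ** 2, axis=1, keepdims=True)
        pts = points_d / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return pts

# Ideal (undistorted) normalised image points, from centre to corner.
ideal = np.array([[0.0, 0.0], [0.25, 0.1], [0.5, 0.5], [-0.4, 0.3]])

# Negative k1 pulls points toward the centre, which is barrel distortion.
observed = distort(ideal, k1=-0.2)
recovered = undistort(observed, k1=-0.2)
print(np.max(np.abs(recovered - ideal)))
```

The centre point is unaffected while the corner point moves the most, matching the observation that errors grow toward the frame edges.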

6.7 Super-Resolution: Legitimate vs Hallucination

Legitimate techniques:

Multi-frame SR (combining adjacent frames where motion provides sub-pixel shifts)
Lens-distortion correction
Contrast / sharpness enhancement within the limits of the source signal
Model-based deconvolution with known motion-blur or lens PSF

Not legitimate without disclosure: AI hallucination — modern deep-learning SR (ESRGAN, Real-ESRGAN) trained on large datasets invents plausible-looking content. The output looks high-resolution but the new pixels are invented, not recovered. The court must be told if AI methods were used.
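
To illustrate why multi-frame SR recovers detail instead of inventing it, the sketch below interleaves four low-resolution frames onto a grid twice as fine. It is heavily idealised: the sub-pixel shifts are assumed to be known exactly (half a low-resolution pixel), and there is no blur, noise or compression. Real multi-frame SR must estimate the shifts and handle all three. The function names are illustrative.

```python
import numpy as np

def make_low_res_frames(high_res):
    """Simulate four captures, each offset by half a low-res pixel."""
    return {(dy, dx): high_res[dy::2, dx::2] for dy in (0, 1) for dx in (0, 1)}

def shift_and_add(frames, shape):
    """Place every low-res sample at its true position on the fine grid."""
    out = np.zeros(shape, dtype=np.float64)
    for (dy, dx), frame in frames.items():
        out[dy::2, dx::2] = frame
    return out

rng = np.random.default_rng(1)
truth = rng.uniform(0, 255, (8, 8))          # synthetic high-resolution scene
frames = make_low_res_frames(truth)          # four 4x4 observations
restored = shift_and_add(frames, truth.shape)
print(np.array_equal(restored, truth))
```

Every output pixel here is a measured sample from one of the input frames. A learned single-image upscaler has no such samples to draw on, which is why its output must be disclosed as generated.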

6.8 Deepfake Detection (Multi-Modal)

Pixel artefacts — boundary inconsistencies between face and background, blur halos, colour-temperature mismatch
Temporal inconsistencies — eye-blink rate, micro-expression flow, head-pose dynamics
Physiological signals — rPPG (remote photoplethysmography) extracts cardiac-pulse colour fluctuations from facial regions; deepfakes typically lack a coherent rPPG signal
Compression-domain artefacts — double-compression signatures in DCT histograms
Lighting consistency — shadow direction, colour temperature, ambient bounce
ML classifiers — neural networks trained on real-vs-deepfake pairs (DFDC, FaceForensics++, Celeb-DF)

Detection accuracy on deepfakes produced by methods represented in the training data is roughly 95% or higher; it degrades on novel generation methods (the open-set generalisation gap).
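
The rPPG cue can be illustrated with a spectral check on the mean green-channel value of a face region over time. This is a sketch on synthetic traces using NumPy, not a detector: face tracking and region extraction are assumed to have been done already, and the 0.7–4.0 Hz search band and the peak-power ratio are illustrative choices.

```python
import numpy as np

def pulse_estimate(green_means, fps, lo_hz=0.7, hi_hz=4.0):
    """Return (dominant frequency in bpm, share of in-band power at that peak)."""
    x = np.asarray(green_means, dtype=np.float64)
    x = x - x.mean()                                  # remove the DC level
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)        # plausible heart rates
    band_power = power[band]
    peak_hz = freqs[band][np.argmax(band_power)]
    return peak_hz * 60.0, float(band_power.max() / band_power.sum())

fps = 25
t = np.arange(250) / fps                              # 10 seconds of frames
rng = np.random.default_rng(2)

# Synthetic "live" trace: a faint 1.2 Hz (72 bpm) pulse plus sensor noise.
live = 120.0 + 0.5 * np.sin(2 * np.pi * 1.2 * t) + rng.normal(0.0, 0.2, t.size)
# Synthetic trace with no coherent pulse: noise only.
flat = 120.0 + rng.normal(0.0, 0.2, t.size)

live_bpm, live_ratio = pulse_estimate(live, fps)
flat_bpm, flat_ratio = pulse_estimate(flat, fps)
print(f"live: {live_bpm:.0f} bpm, peak share {live_ratio:.2f}")
print(f"flat: peak share {flat_ratio:.2f}")
```

A strong, narrow peak at a plausible heart rate is consistent with genuine footage; a weak or absent peak is only one indicator and should be weighed alongside the other cues listed above.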

Memory hooks · Chapter 6

I-frame = full image, best quality. Extract I-frames for analysis.
DVR clock: photograph + reference clock at seizure; document offset; never reset.
Photogrammetric height: H = h × d / f.
PRNU: sensor noise fingerprint = camera-instance identifier.
Lens distortion: wide-angle = barrel; calibrate with checkerboard.
Super-resolution: multi-frame legitimate; AI hallucination requires disclosure.
Deepfake detection: pixel + temporal + rPPG + compression + ML.
