Chapter 06
Video Analysis
CCTV is now the most prolific evidence source in urban crime investigation. Modern forensic video analysis spans frame extraction from compressed codecs, photogrammetric measurement, camera identification via PRNU sensor noise, authentication against tampering, and increasingly the detection of synthetic / deepfake video.
6.1 H.264 / H.265 Frame Types
Modern CCTV uses H.264 (AVC) or H.265 (HEVC) compression with three frame types:
- I-frames (intra-coded): complete images, compressed without reference to any other frame. Largest in size and usually the highest quality per frame. Each group of pictures (GOP) starts with one, typically every 0.5–2 sec.
- P-frames (predicted): encoded as motion vectors plus a residual relative to a previous I- or P-frame. Smaller than I-frames and typically lower in quality.
- B-frames (bi-directional): predicted from both earlier and later frames. Usually the smallest and the lowest in quality.
For forensic analysis, extract I-frames wherever possible. Export them in a lossless format (PNG, TIFF); never re-encode as JPEG, which adds a further generation of lossy compression.
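One way to locate the I-frames before export is to ask FFmpeg's `ffprobe` tool for per-frame picture types and keep the timestamps marked `I`. The sketch below is illustrative only: the helper names are hypothetical, it assumes `ffprobe` is installed, and the `pts_time` field name can differ between FFmpeg versions, so check the JSON your build actually emits.

```python
import json
import subprocess

def build_ffprobe_cmd(video_path):
    """Command asking ffprobe for the picture type and timestamp of every video frame."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",                     # first video stream only
        "-show_entries", "frame=pict_type,pts_time",  # field names assumed; may vary by version
        "-of", "json",
        video_path,
    ]

def iframe_timestamps(ffprobe_json):
    """Return the timestamps (seconds) of frames that ffprobe labelled as I-frames."""
    frames = json.loads(ffprobe_json).get("frames", [])
    return [
        float(frame["pts_time"])
        for frame in frames
        if frame.get("pict_type") == "I" and "pts_time" in frame
    ]

def list_iframes(video_path):
    """Run ffprobe on a working copy of the exhibit and list its I-frame timestamps."""
    result = subprocess.run(
        build_ffprobe_cmd(video_path),
        capture_output=True, text=True, check=True,
    )
    return iframe_timestamps(result.stdout)
```

Run this on a verified working copy, never on the original exhibit, and record the tool version used so the extraction can be repeated.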
6.2 Frame-Rate Verification
DVR / NVR frame rates often deviate from nominal due to CPU load, storage bottlenecks, or motion-detection-driven variable rate. Verify the actual frame rate by counting frames during a known-duration event (a clock visible in frame, a passing pedestrian's footstep cadence, a synced reference clock placed in view at seizure time).
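The check described above is simple arithmetic, sketched here with hypothetical function names and made-up numbers. It assumes you count frame-to-frame intervals (one fewer than the number of frames spanning the event) over a duration known from an independent reference.

```python
def measured_fps(frame_intervals, duration_s):
    """Actual frame rate: frame-to-frame intervals divided by the known duration."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return frame_intervals / duration_s

def fps_deviation_pct(measured, nominal):
    """Signed deviation of the measured rate from the nominal rate, in percent."""
    if nominal <= 0:
        raise ValueError("nominal must be positive")
    return 100.0 * (measured - nominal) / nominal

# Illustrative values: 1380 intervals while a visible clock advances 60 s,
# on a recorder configured for a nominal 25 fps.
actual = measured_fps(1380, 60.0)
print(actual)                           # 23.0
print(fps_deviation_pct(actual, 25.0))  # -8.0
```

A deviation of this size matters for any speed or timing estimate, because elapsed time computed from the nominal rate would be off by the same percentage.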
6.3 DVR Clock Skew
Most consumer DVRs lack NTP synchronisation; system clocks drift. Defensible procedure at seizure:
- Photograph the DVR display showing system time alongside a GPS-synchronised reference (e.g., the IO's mobile phone)
- Document the offset in the chain-of-custody log: e.g., "DVR system time +47 minutes vs GPS reference"
- Never alter the DVR clock — that breaks chain of custody
- Apply the offset in all reporting: actual event time = displayed time − 47 minutes
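Applying the documented offset can be sketched with Python's standard `datetime` module. The function name and the example date are hypothetical. The offset is defined here as DVR time minus reference time, so a DVR running 47 minutes fast has an offset of +47 minutes.

```python
from datetime import datetime, timedelta

def corrected_time(displayed, dvr_offset):
    """Convert a DVR-displayed timestamp to reference time.

    dvr_offset is (DVR time - reference time): positive when the DVR runs fast.
    """
    return displayed - dvr_offset

# Illustrative values matching the "+47 minutes" example in the log entry.
offset = timedelta(minutes=47)
displayed = datetime(2024, 1, 15, 22, 47, 0)  # hypothetical on-screen timestamp
print(corrected_time(displayed, offset))      # 2024-01-15 22:00:00
```

Keeping the sign convention explicit in the log and in the code avoids the common mistake of adding the offset when it should be subtracted.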
6.4 Photogrammetric Height Estimation
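The chapter summary gives the relation H = h × d / f. Under a simple pinhole-camera reading (an assumption here, not stated in the source), H is the real-world height, h is the subject's height in the image, d is the camera-to-subject distance, and f is the focal length, with h and f in the same units (both in pixels or both in millimetres). The sketch below uses hypothetical names and illustrative numbers. It also assumes the subject stands upright, roughly parallel to the image plane, in a frame already corrected for lens distortion.

```python
def estimate_height_m(image_height, distance_m, focal_length):
    """Pinhole estimate H = h * d / f.

    image_height and focal_length must share units (e.g. both in pixels);
    distance_m is in metres, so the result is in metres.
    """
    if focal_length <= 0:
        raise ValueError("focal_length must be positive")
    return image_height * distance_m / focal_length

# Illustrative values: subject spans 340 px, stands 5.0 m from the camera,
# and the calibrated focal length is 1000 px.
print(estimate_height_m(340, 5.0, 1000))  # 1.7
```

Uncertainty in d and f carries straight through to H, so any reported height should come with an error range.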
6.5 PRNU: Camera Sensor Fingerprinting
Photo-Response Non-Uniformity (PRNU) is the sensor-level fingerprint of a camera. Each CMOS / CCD pixel responds slightly differently to light because of microscopic manufacturing variations (silicon doping, thin-film thickness, gain calibration). The variation manifests as a multiplicative noise pattern specific to that sensor — a fingerprint that differs even between cameras of the same model.
Cross-correlation of an unknown video's noise residual with a suspect-camera reference PRNU gives a similarity score. High correlation is consistent with the unknown video having been made on the suspect camera.
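The similarity score can be illustrated with a plain normalised correlation between two noise residuals, given as flattened lists of equal length. This is a minimal sketch with a hypothetical function name and toy values. It does not show how the residuals are obtained (in practice, by subtracting a denoised version of each frame), and casework tools generally use more robust statistics than this single number.

```python
import math

def normalised_correlation(residual_a, residual_b):
    """Zero-mean normalised correlation of two equal-length residuals, in [-1, 1]."""
    if len(residual_a) != len(residual_b) or not residual_a:
        raise ValueError("residuals must be non-empty and of equal length")
    mean_a = sum(residual_a) / len(residual_a)
    mean_b = sum(residual_b) / len(residual_b)
    dev_a = [a - mean_a for a in residual_a]
    dev_b = [b - mean_b for b in residual_b]
    numerator = sum(a * b for a, b in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(a * a for a in dev_a) * sum(b * b for b in dev_b))
    if denominator == 0.0:
        return 0.0  # a flat residual carries no fingerprint information
    return numerator / denominator

# Toy values only: a reference pattern and a noisy observation of it.
reference = [0.10, -0.20, 0.30, -0.10, 0.05, -0.15]
questioned = [0.12, -0.18, 0.25, -0.12, 0.02, -0.10]
score = normalised_correlation(reference, questioned)
```

Scores near zero are what unrelated sensors would be expected to produce. Any decision threshold has to be validated on known matching and non-matching cameras.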
6.6 Lens Distortion
Wide-angle CCTV lenses introduce barrel distortion: straight lines bow outward, and the bending grows stronger toward the frame edges. There, distortion can shift positions by 5–20% of the frame width, which can translate into metres of error in real-world coordinates.
Calibrate with a known target (checkerboard, dot grid) imaged at multiple poses, for example using Zhang's planar-target method; fit a distortion model such as Brown-Conrady; then undistort each frame before any geometric measurement.
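To show what fitting and inverting a distortion model involves, here is a minimal sketch of the radial part of the Brown-Conrady model in normalised image coordinates. The function names and the coefficients k1 and k2 are illustrative; real values come from calibration, and the tangential terms of the full model are omitted. The inverse has no closed form, so the sketch uses a simple fixed-point iteration, which is adequate for moderate distortion.

```python
def distort(x, y, k1, k2):
    """Apply radial distortion to an ideal (undistorted) normalised point."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor

def undistort(xd, yd, k1, k2, iterations=20):
    """Recover the ideal point from a distorted one by fixed-point iteration."""
    x, y = xd, yd  # start from the distorted position
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y

# Illustrative coefficients: a negative k1 pulls points toward the centre (barrel).
k1, k2 = -0.2, 0.02
xd, yd = distort(0.5, 0.3, k1, k2)
x, y = undistort(xd, yd, k1, k2)  # should land back near (0.5, 0.3)
```

In practice the same correction is applied to every pixel of every frame, and the calibration images and fitted coefficients are retained so the correction can be reproduced.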
6.7 Super-Resolution: Legitimate vs Hallucination
Legitimate techniques:
- Multi-frame SR (combining adjacent frames where motion provides sub-pixel shifts)
- Lens-distortion correction
- Contrast / sharpness enhancement within the limits of the source signal
- Model-based deconvolution with known motion-blur or lens PSF
Not legitimate without disclosure: AI hallucination. Modern deep-learning SR models (ESRGAN, Real-ESRGAN) are trained on large datasets and invent plausible-looking content. The output looks high-resolution, but the new pixels are invented, not recovered. The court must be told if AI methods were used.
6.8 Deepfake Detection (Multi-Modal)
- Pixel artefacts — boundary inconsistencies between face and background, blur halos, colour-temperature mismatch
- Temporal inconsistencies — eye-blink rate, micro-expression flow, head-pose dynamics
- Physiological signals — rPPG (remote photoplethysmography) extracts cardiac-pulse colour fluctuations from facial regions; deepfakes typically lack a coherent rPPG signal
- Compression-domain artefacts — double-compression signatures in DCT histograms
- Lighting consistency — shadow direction, colour temperature, ambient bounce
- ML classifiers — neural networks trained on real-vs-deepfake pairs (DFDC, FaceForensics++, Celeb-DF)
Detection accuracy on contemporaneous deepfakes is roughly 95% or higher, but it degrades on novel generation methods that the detector was not trained on (the open-set generalisation gap).
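The physiological-signal check listed above can be illustrated in its simplest form: take the mean green-channel value of a facial region in each frame and look for a dominant frequency in the plausible heart-rate band. This is a rough sketch with a hypothetical function name. The 0.7–4.0 Hz band (42–240 beats per minute) is an assumption, and face tracking, detrending and motion compensation are left out.

```python
import math

def dominant_pulse_bpm(green_means, fps, lo_hz=0.7, hi_hz=4.0):
    """Strongest frequency of the per-frame green-channel means, in beats per minute.

    Uses a plain discrete Fourier transform restricted to the assumed
    heart-rate band [lo_hz, hi_hz]. Returns 0.0 if no bin falls in the band.
    """
    n = len(green_means)
    if n < 2 or fps <= 0:
        raise ValueError("need at least two samples and a positive fps")
    mean = sum(green_means) / n
    signal = [g - mean for g in green_means]  # remove the constant brightness level
    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if not (lo_hz <= freq <= hi_hz):
            continue
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(signal))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0

# Synthetic check: 10 s at 30 fps with a faint 1.2 Hz (72 bpm) fluctuation.
samples = [100.0 + 0.5 * math.sin(2 * math.pi * 1.2 * i / 30.0) for i in range(300)]
bpm = dominant_pulse_bpm(samples, fps=30.0)
```

A single spectral peak is not evidence of authenticity on its own. The point of the check is whether a stable pulse-like signal appears consistently across facial regions and over time.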
- I-frame = full image, best quality. Extract I-frames for analysis.
- DVR clock: photograph + reference clock at seizure; document offset; never reset.
- Photogrammetric height: H = h × d / f.
- PRNU: sensor noise fingerprint = camera-instance identifier.
- Lens distortion: wide-angle = barrel; calibrate with checkerboard.
- Super-resolution: multi-frame legitimate; AI hallucination requires disclosure.
- Deepfake detection: pixel + temporal + rPPG + compression + ML.