Copy-move and splicing are the two most common structural forgeries in digital photographs. This topic covers the block-matching and keypoint-based algorithms for detecting duplicated regions, illumination and chromatic aberration inconsistency for detecting spliced content, and the deep-learning detectors that now supplement classical methods.
Copy-move and splicing are the two manipulation classes that forensic image analysts encounter most often. Copy-move steals from within: a region of the image is duplicated and pasted elsewhere in the same frame, hiding an object or padding a background. Splicing steals from outside: content from a separate photograph is imported and blended in. The two classes require different detection strategies, but both leave structural traces that careful analysis can uncover.
The detection challenge is substantial. Copy-move forgeries are hard precisely because the pasted region came from the same photographic event and therefore has the same noise level, the same colour temperature, and the same compression history as the surrounding image. Splicing is in principle easier because the inserted content comes from a different source and carries foreign statistics, but a skilled forger can match brightness, colour, and noise to make the join invisible.
This topic covers the two main algorithmic traditions for copy-move detection (block-matching and keypoint-based matching with SIFT or SURF), the geometric tools for handling rotated or scaled duplicates, and three passive methods for splicing detection: illumination-direction inconsistency, chromatic aberration inconsistency, and noise-level inconsistency. It closes with deep-learning detectors (Noiseprint, ManTra-Net) and an honest account of where they succeed and where they fail.
Two identical patches in different parts of the image cannot both be original.
The foundational insight is simple: if a region was copied from one part of an image and pasted elsewhere, both the source region and the destination region now exist in the image simultaneously. They are near-identical patches in different locations. Exhaustive block-matching finds them by dividing the image into overlapping blocks and comparing every block to every other block.
In practice, comparing every pair of blocks directly is computationally expensive on high-resolution images. The standard approach represents each block as a compact feature vector, typically the quantised DCT coefficients of the block, and then lexicographically sorts all feature vectors. Duplicate or near-duplicate blocks appear as adjacent entries after sorting, reducing the comparison cost from O(n^2) to O(n log n). A match is accepted only if the Euclidean distance between feature vectors falls below a threshold and the two blocks are spatially separated by more than a minimum distance, filtering out trivially similar neighbours in smooth backgrounds.
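The sort-then-compare idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the 8x8 block size, the quantisation step `q`, the number of retained coefficients `keep`, and both thresholds are hypothetical choices made for the example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def find_duplicate_blocks(img, block=8, q=4.0, keep=4, min_shift=16, max_dist=0.0):
    """Return pairs ((y1, x1), (y2, x2)) of block origins with matching features.

    All parameter defaults are illustrative assumptions, not published values.
    """
    d = dct_matrix(block)
    h, w = img.shape
    feats, pos = [], []
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            coeffs = d @ img[y:y + block, x:x + block] @ d.T
            # Keep the low-frequency corner of the DCT and quantise it.
            feats.append(np.round(coeffs[:keep, :keep].ravel() / q))
            pos.append((y, x))
    feats = np.array(feats)
    pos = np.array(pos)
    # np.lexsort treats the LAST key as primary, so reverse the feature
    # columns to make column 0 the primary sort key.
    order = np.lexsort(feats.T[::-1])
    pairs = []
    # After sorting, only adjacent rows need to be compared.
    for a, b in zip(order[:-1], order[1:]):
        close = np.linalg.norm(feats[a] - feats[b]) <= max_dist
        far = np.linalg.norm(pos[a] - pos[b]) >= min_shift
        if close and far:
            pairs.append((tuple(pos[a]), tuple(pos[b])))
    return pairs
```

The explicit Python loops keep the sketch readable but are slow on large images; a practical tool would vectorise the feature extraction. Only adjacent rows are compared here, so near-duplicates that land a few rows apart after sorting are missed; comparing each row with several following rows is a common remedy.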
Post-processing applies a spatial coherence filter: individual isolated block matches are likely noise; a cluster of nearby matched block pairs pointing in the same direction is strong evidence of a copied region. The cluster defines the boundaries of both the source and the destination of the copy.
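One simple way to approximate this coherence filter is to let every matched pair vote for its shift vector and keep only pairs whose shift collects enough votes. The sketch below assumes matches are given as pairs of `(y, x)` block origins; the function name and the `min_votes` threshold are hypothetical. It checks only that pairs share a direction and distance, not that they are spatially adjacent, so it is a simplification of the filter described above.

```python
from collections import Counter

def coherent_matches(pairs, min_votes=10):
    """Keep only block pairs whose shift vector is shared by many other pairs."""
    def shift(pair):
        (y1, x1), (y2, x2) = pair
        dy, dx = y2 - y1, x2 - x1
        # Normalise the sign so (a -> b) and (b -> a) vote for the same shift.
        return (dy, dx) if (dy, dx) >= (0, 0) else (-dy, -dx)

    votes = Counter(shift(p) for p in pairs)
    return [p for p in pairs if votes[shift(p)] >= min_votes]
```

The surviving pairs can then be split into their first and second members to outline the two regions involved in the copy.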
Block-matching breaks when the attacker rotates the copy. SIFT does not.
A forger who copies a region and then rotates or scales it before pasting defeats straightforward block-matching because the resulting blocks are no longer pixel-identical. SIFT (proposed by David Lowe in 1999), or the faster SURF detector and descriptor, is the standard solution. Instead of comparing fixed rectangular blocks, SIFT detects interest points at multiple scales, computes a gradient-based descriptor that is invariant to scale and rotation, and then matches descriptors across the image. Reflection is a separate case: standard SIFT descriptors are not mirror-invariant, so a flipped copy typically requires additionally matching against descriptors computed from a mirrored version of the image.
When two spatially separated regions of the same image yield a cluster of matched SIFT keypoints, the matched keypoint pairs define a geometric relationship between the two regions. Estimating an affine transform (rotation, scaling, shear, translation) from the keypoint correspondences, typically using RANSAC (random sample consensus) to reject outliers, both confirms that the match is geometrically consistent and quantifies the transformation. A rotation of 45 degrees with a scale factor of 0.8, verified by many inlier keypoint pairs, is strong evidence of copy-move with geometric transformation.
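The estimation step can be sketched without any feature-detection library by assuming the matched keypoint coordinates are already available as two arrays. The function names, iteration count, and 2-pixel inlier tolerance below are assumptions for illustration; a production tool would more likely call an established computer-vision library for both matching and robust fitting.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A such that dst ~ A @ [x, y, 1]."""
    X = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return sol.T

def ransac_affine(src, dst, iters=500, tol=2.0, seed=0):
    """Robustly fit an affine map from src to dst; returns (A, inlier_mask)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        # Three correspondences determine an affine transform.
        idx = rng.choice(len(src), size=3, replace=False)
        A = fit_affine(src[idx], dst[idx])
        inliers = np.linalg.norm(X @ A.T - dst, axis=1) < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("no geometrically consistent match found")
    # Refit on all inliers of the best hypothesis.
    return fit_affine(src[best], dst[best]), best

def rotation_and_scale(A):
    """Rotation (degrees) and scale, assuming A is close to a similarity transform."""
    angle = np.degrees(np.arctan2(A[1, 0], A[0, 0]))
    scale = np.sqrt(abs(np.linalg.det(A[:, :2])))
    return angle, scale
```

The inlier count returned alongside the transform is the quantity an analyst would report: a handful of inliers is weak support, while dozens of inliers agreeing on one rotation and scale is much harder to explain by chance.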
Keypoint methods struggle in regions with little texture. A copied region of sky or smooth wall produces few stable keypoints, so matching fails even when the copy is obvious visually. Hybrid detectors that apply block-matching in smooth regions and SIFT in textured regions address this, and several published methods combine the two in a pipeline.
Light in a real scene has a direction. A composite from two scenes can have two directions.
Farid and Kee published their illumination-direction method in 2010. The key observation is that convex objects (spheres, human faces, curved surfaces) reflect specular highlights whose position encodes the direction of the dominant light source. By fitting a photometric model to the specular highlight pattern on each face or object in the image, the analyst can estimate the 3D light direction for each region and check whether those directions are consistent with a single illumination environment.
A genuine photograph has a consistent lighting environment, so all faces should yield light-direction estimates that agree within the expected variance. A composite image that places a face lit from a high left source next to a background lit from a low right source will show directional vectors that are irreconcilable. The method is most reliable on images with clear specular highlights on multiple approximately spherical objects, which makes it well-suited to portraits and group photographs but less useful for landscapes or documents.
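Once a light direction has been estimated for each face or object, the consistency check itself is simple vector arithmetic. The sketch below covers only that final step and assumes the per-region direction vectors have already been produced by some estimator; the function name is hypothetical, and any tolerance an analyst applies to the result would have to be justified from the estimator's own measured variance.

```python
import numpy as np

def max_angular_disagreement(directions):
    """Largest pairwise angle (degrees) between estimated light directions."""
    d = np.asarray(directions, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # normalise to unit vectors
    cos = np.clip(d @ d.T, -1.0, 1.0)                 # guard arccos against rounding
    return float(np.degrees(np.arccos(cos)).max())
```

For example, directions of `(-1, 1, 1)` and `(-0.9, 1.1, 1)` differ by only a few degrees, whereas adding a region estimated at `(1, -1, 1)` pushes the maximum disagreement past 100 degrees.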
| Method | What it detects | Best suited to | Key limitation |
|---|---|---|---|
| Illumination direction (Farid-Kee) | Physically inconsistent light directions across regions | Portraits, faces, group photos | Requires specular highlights on convex surfaces |
| Chromatic aberration | Lens-fingerprint inconsistency at splice boundaries | Images from single-camera sources | Requires high-contrast edges; varies with zoom/aperture |
| Noise-level inconsistency | Different sensor noise in different regions | Most image types | Sensitive to in-camera processing differences |
| Block-matching | Copied regions from same image | All image types | Fails on rotated/scaled copies unless combined with SIFT |
Every lens bends light imperfectly. That imperfection is a fingerprint.
Chromatic aberration arises because glass refracts different wavelengths of visible light at slightly different angles. In a camera lens, this means the red, green, and blue channels are imaged at slightly different magnifications, so at high-contrast edges the three channels are spatially displaced from each other, often by a fraction of a pixel. This lateral chromatic aberration is characteristic of the specific lens design: the displacement is close to zero at the optical centre, grows towards the corners, and varies in magnitude with focal length and aperture.
Forgery detection based on chromatic aberration works as follows. Estimate the aberration model for the host image's lens by fitting the lateral displacement between colour channels across many edges in the non-suspect regions. Then check whether the high-contrast edges within the suspect region conform to the same model. If the suspect region was photographed with a different lens, its aberration pattern will be inconsistent with the fitted host model. If the forger applied lens-correction software to the pasted region before inserting it, the chromatic aberration will be reduced or eliminated, which is itself an anomaly in an uncorrected host image.
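A minimal sketch of the fit-then-check procedure, assuming the simplest lateral aberration model in which one colour channel is a slightly magnified copy of another about a known centre, so that the displacement at point p is alpha * (p - centre). The function names are hypothetical, and the sketch assumes the per-edge channel displacements have already been measured and that the optical centre coincides with a known point such as the image centre.

```python
import numpy as np

def fit_ca_scale(points, displacements, centre):
    """Least-squares alpha for the model displacement = alpha * (point - centre)."""
    r = np.asarray(points, dtype=float) - centre
    d = np.asarray(displacements, dtype=float)
    return float((r * d).sum() / (r * r).sum())

def ca_residuals(points, displacements, centre, alpha):
    """Per-edge distance (pixels) between measured and predicted displacement."""
    r = np.asarray(points, dtype=float) - centre
    d = np.asarray(displacements, dtype=float)
    return np.linalg.norm(d - alpha * r, axis=1)
```

In use, `fit_ca_scale` would be run on edges from the non-suspect regions only, and `ca_residuals` would then be evaluated on edges inside the suspect region with the fitted `alpha`. Residuals far above those seen in the host regions indicate that the suspect edges do not follow the host lens model.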
Classical methods are interpretable but manual. CNNs are automatic but opaque.
The last decade has seen deep-learning methods move from promising to mainstream in image forensics, with Noiseprint (Cozzolino and Verdoliva, 2019) and ManTra-Net (Wu et al., 2019) among the most studied.
Noiseprint is a CNN trained in a Siamese configuration to suppress image content and retain only the camera-model fingerprint in the noise residual. Each camera model leaves characteristic traces in that residual, mainly from its in-camera processing pipeline (demosaicing, denoising, compression). This differs from photo-response non-uniformity (PRNU), the device-specific pattern caused by manufacturing variations in an individual sensor. Noiseprint extracts the model-level fingerprint as a dense feature map. In an authentic image from a single camera, the fingerprint is spatially consistent. Where a region was spliced in from an image taken with a different camera model, or composited from another source, the fingerprint is inconsistent with its surroundings. A clustering step on the fingerprint map localises the forgery boundary.
ManTra-Net approaches the problem differently. It chains two sub-networks: a feature extractor learns manipulation traces by classifying 385 types of image manipulation, and a local anomaly detector compares local features to their neighbourhood statistics to flag regions that are locally inconsistent. ManTra-Net outputs a pixel-level anomaly map rather than a binary forgery label.
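The idea of scoring each location against its neighbourhood statistics can be illustrated with a plain z-score on a single-channel feature map. This is a loose analogy under stated assumptions, not the network's learned anomaly detector: the function names, the square window, and its radius are hypothetical, and the input is assumed to be any 2D array of per-pixel features.

```python
import numpy as np

def box_mean(a, radius):
    """Mean over a (2r+1) x (2r+1) window, using edge padding and an integral image."""
    k = 2 * radius + 1
    p = np.pad(a, radius, mode="edge")
    s = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    return (s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]) / (k * k)

def local_anomaly_map(feat, radius=8, eps=1e-6):
    """Absolute z-score of each value against its local window statistics."""
    mean = box_mean(feat, radius)
    var = np.maximum(box_mean(feat * feat, radius) - mean ** 2, 0.0)
    return np.abs(feat - mean) / np.sqrt(var + eps)
```

A fixed window has an obvious weakness: a manipulated region larger than the window looks normal from its own interior. Comparing against several window sizes, or against whole-image statistics, is one way to reduce that blind spot.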
No single detector closes the case. Convergence of independent methods does.
The field consensus, reflected in SWGIT guidance and in the academic literature, is that no single detection method is sufficient for a reliable forensic conclusion. Each method has its own failure modes: block-matching misses rotated copies, SIFT fails in smooth regions, illumination analysis requires specific scene geometry, and deep-learning methods falter on out-of-distribution images. The appropriate strategy is to run multiple independent methods and report the level of convergence.
Documenting the methods, their versions, the parameter settings used, and the known limits of each is the baseline expectation for casework in any jurisdiction that applies reliability standards to expert evidence.
Why does block-matching use lexicographic sorting of feature vectors rather than direct pairwise comparison?