Copy-Move and Splicing Detection

Copy-move and splicing are the two most common structural forgeries in digital photographs. This topic covers the block-matching and keypoint-based algorithms for detecting duplicated regions, illumination and chromatic aberration inconsistency for detecting spliced content, and the deep-learning detectors that now supplement classical methods.

Last updated: 10 Jun 2026

Copy-move and splicing are the two manipulation classes that forensic image analysts encounter most often. Copy-move steals from within: a region of the image is duplicated and pasted elsewhere in the same frame, hiding an object or padding a background. Splicing steals from outside: content from a separate photograph is imported and blended in. The two classes require different detection strategies, but both leave structural traces that careful analysis can uncover.

The detection challenge is substantial. Copy-move forgeries are hard precisely because the pasted region came from the same photographic event and therefore has the same noise level, the same colour temperature, and the same compression history as the surrounding image. Splicing is in principle easier because the inserted content comes from a different source and carries foreign statistics, but a skilled forger can match brightness, colour, and noise to make the join invisible.

This topic covers the two main algorithmic traditions for copy-move detection (block-matching and keypoint-based SIFT/SURF), the geometric tools for handling rotated or scaled duplicates, and three passive methods for splicing detection: illumination-direction inconsistency, chromatic aberration inconsistency, and noise-level inconsistency. It closes with deep-learning detectors (Noiseprint, MantraNet) and an honest account of where they succeed and where they fail.

Key terms

Copy-move forgery: A manipulation that copies a region from within the same image and pastes it elsewhere, typically to conceal or duplicate content.
Splicing: A manipulation that inserts content from a different source image into the target, producing a composite with regions of different photographic origin.
SIFT (Scale-Invariant Feature Transform): A keypoint detection and description algorithm that extracts local feature descriptors invariant to scale, rotation, and partial illumination change, enabling matching of corresponding regions across geometric transformations.
Affine transform estimation: The recovery of the geometric transformation (rotation, scaling, shear) relating a copied region to its source, using matched keypoint pairs. Geometrically consistent matching across a region confirms that the same content appears in two places.
Chromatic aberration: Colour fringing at high-contrast edges caused by differential refraction of light wavelengths by a lens. The pattern is lens-specific and inconsistent across a splice boundary where content from a different lens is inserted.
Noiseprint: A CNN-based forensic tool that extracts a camera-model fingerprint from the noise residual of an image; regions where the fingerprint is inconsistent with the dominant pattern are flagged as potentially manipulated.

Block-matching for copy-move detection

Two identical patches in different parts of the image cannot both be original.

The foundational insight is simple: if a region was copied from one part of an image and pasted elsewhere, both the source region and the destination region now exist in the image simultaneously. They are near-identical patches in different locations. Exhaustive block-matching finds them by dividing the image into overlapping blocks and comparing every block to every other block.

In practice, comparing every pair of blocks directly is computationally expensive on high-resolution images. The standard approach represents each block as a compact feature vector, typically the quantised DCT coefficients of the block, and then lexicographically sorts all feature vectors. Duplicate or near-duplicate blocks appear as adjacent entries after sorting, so each vector needs to be compared only with its neighbours in the sorted list. This reduces the comparison cost from O(n^2) to O(n log n), where n is the number of blocks. A match is accepted only if the Euclidean distance between feature vectors falls below a threshold and the two blocks are spatially separated by more than a minimum distance, filtering out trivially similar neighbours in smooth backgrounds.
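The sort-then-compare idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the 8x8 block size, the quantisation step `q`, the choice of the nine lowest-frequency coefficients as the feature, and both thresholds are hypothetical values picked for a small synthetic image.

```python
from collections import Counter

import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def find_duplicate_blocks(img, block=8, q=4.0, max_feat_dist=0.5, min_shift=8):
    """Return ((y, x), (y, x)) pairs of near-identical, well-separated blocks."""
    d = dct_matrix(block)
    h, w = img.shape
    feats, coords = [], []
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            coeffs = d @ img[y:y + block, x:x + block] @ d.T
            # Feature: 3x3 low-frequency corner of the DCT, quantised by q.
            feats.append(np.round(coeffs[:3, :3].ravel() / q))
            coords.append((y, x))
    feats = np.array(feats)
    coords = np.array(coords)

    # np.lexsort treats the LAST key as primary, so reverse the columns
    # to sort by the first coefficient first.
    order = np.lexsort(feats.T[::-1])

    pairs = []
    for a, b in zip(order[:-1], order[1:]):  # compare sorted neighbours only
        if np.linalg.norm(feats[a] - feats[b]) > max_feat_dist:
            continue
        if np.linalg.norm(coords[a] - coords[b]) < min_shift:
            continue  # too close together: probably a smooth neighbourhood
        pairs.append(((int(coords[a][0]), int(coords[a][1])),
                      (int(coords[b][0]), int(coords[b][1]))))
    return pairs


# Synthetic test image: random texture with one 16x16 region copied elsewhere.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(48, 48)).astype(float)
img[28:44, 26:42] = img[4:20, 4:20]

pairs = find_duplicate_blocks(img)
# Absolute (dy, dx) offset of each matched pair; the copy dominates the votes.
shifts = Counter((abs(b[0] - a[0]), abs(b[1] - a[1])) for a, b in pairs)
print(shifts.most_common(1))
```

Comparing only immediate neighbours in the sorted list is the simplest variant; comparing each vector with several following entries is a natural extension when compression noise perturbs the quantised features.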

Block-matching pipeline for copy-move detection.

Post-processing applies a spatial coherence filter: individual isolated block matches are likely noise; a cluster of nearby matched block pairs pointing in the same direction is strong evidence of a copied region. The cluster defines the boundaries of both the source and the destination of the copy.
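A spatial coherence filter of this kind can be sketched as a vote over displacement vectors. The sketch below assumes matches arrive as pairs of (y, x) block positions and requires exactly equal displacements; the function name and the `min_count` threshold are hypothetical, and a practical filter would usually allow a small tolerance on the displacement.

```python
from collections import Counter


def coherent_matches(pairs, min_count=10):
    """Keep only matches whose displacement is shared by many other matches."""

    def shift(pair):
        (y0, x0), (y1, x1) = pair
        dy, dx = y1 - y0, x1 - x0
        # Normalise the sign so that A->B and B->A vote for the same vector.
        return (dy, dx) if (dy, dx) >= (0, 0) else (-dy, -dx)

    votes = Counter(shift(p) for p in pairs)
    return [p for p in pairs if votes[shift(p)] >= min_count]


# Twelve matches that share one displacement, plus two isolated matches.
pairs = [((y, 500), (y, 320)) for y in range(12)]
pairs += [((5, 5), (90, 40)), ((7, 300), (200, 10))]

kept = coherent_matches(pairs)
print(len(kept), "of", len(pairs), "matches survive")
```

The isolated matches receive a single vote each and are discarded, while the twelve matches that agree on one displacement are kept as a cluster.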

Keypoint-based detection and affine-transform estimation

Block-matching breaks when the attacker rotates the copy. SIFT does not.

A forger who copies a region and then rotates or scales it before pasting defeats straightforward block-matching because the resulting blocks are no longer pixel-identical. SIFT (proposed by David Lowe in 1999, with SURF as a faster variant) is the standard solution. Instead of comparing fixed rectangular blocks, SIFT detects interest points at multiple scales, computes a gradient-based descriptor that is invariant to scale and rotation, and then matches descriptors across the image. Reflected copies need extra handling, because the standard SIFT descriptor is not invariant to mirroring; one workaround is to match against descriptors computed from a flipped copy of the image as well.

When two spatially separated regions of the same image yield a cluster of matched SIFT keypoints, the matched keypoint pairs define a geometric relationship between the two regions. Estimating an affine transform (rotation, uniform scaling, shear, translation) from the keypoint correspondences, typically using RANSAC (random sample consensus) to reject outliers, both confirms that the match is geometrically consistent and quantifies the transformation. A rotation of 45 degrees with a scale factor of 0.8, verified by many inlier keypoint pairs, is strong evidence of copy-move with geometric transformation.
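The estimation step can be sketched with NumPy alone, assuming the keypoint correspondences have already been produced by a matcher. Everything here is illustrative: the function names, the iteration count, the 2-pixel inlier tolerance, and the synthetic correspondences (30 true matches under a 45 degree rotation with scale 0.8, plus 5 random outliers) are assumptions, not values from any particular tool.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix [A | t] mapping src points to dst."""
    a = np.hstack([src, np.ones((len(src), 1))])  # n x 3 design matrix
    m, *_ = np.linalg.lstsq(a, dst, rcond=None)   # 3 x 2 solution
    return m.T


def apply_affine(m, pts):
    return pts @ m[:, :2].T + m[:, 2]


def ransac_affine(src, dst, n_iter=200, tol=2.0, seed=0):
    """Fit on random 3-point samples, keep the model with most inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)
        m = fit_affine(src[idx], dst[idx])
        inliers = np.linalg.norm(apply_affine(m, src) - dst, axis=1) < tol
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on all inliers of the best sample for the final estimate.
    return fit_affine(src[best], dst[best]), best


# Synthetic correspondences: rotate by 45 degrees, scale by 0.8, translate.
rng = np.random.default_rng(1)
theta = np.deg2rad(45.0)
true_lin = 0.8 * np.array([[np.cos(theta), -np.sin(theta)],
                           [np.sin(theta), np.cos(theta)]])
src = rng.uniform(0, 200, size=(35, 2))
dst = src @ true_lin.T + np.array([300.0, 50.0])
dst += rng.normal(0, 0.2, size=dst.shape)       # localisation noise
dst[30:] = rng.uniform(0, 500, size=(5, 2))     # five false matches

model, inliers = ransac_affine(src, dst)
rotation_deg = np.degrees(np.arctan2(model[1, 0], model[0, 0]))
scale = np.sqrt(abs(np.linalg.det(model[:, :2])))
print(f"rotation={rotation_deg:.1f} deg, scale={scale:.2f}, "
      f"inliers={inliers.sum()}/{len(src)}")
```

Reading rotation and scale off the linear part this way is only meaningful when the recovered transform is close to a rotation with uniform scaling; a strongly sheared estimate would need a fuller decomposition.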

Keypoint methods struggle in regions with little texture. A copied region of sky or smooth wall produces few stable keypoints, so matching fails even when the copy is obvious visually. Hybrid detectors that apply block-matching in smooth regions and SIFT in textured regions address this, and several published methods combine the two in a pipeline.

Splicing detection: illumination-direction inconsistency

Light in a real scene has a direction. A composite from two scenes can have two directions.

Farid and Kee published their illumination-direction method in 2010. The key observation is that convex objects (spheres, human faces, curved surfaces) reflect specular highlights whose position encodes the direction of the dominant light source. By fitting a photometric model to the specular highlight pattern on each face or object in the image, the analyst can estimate the 3D light direction for each region and check whether those directions are consistent with a single illumination environment.

A genuine photograph has a consistent lighting environment, so all faces should yield light-direction estimates that agree within the expected variance. A composite image that places a face lit from a high left source next to a background lit from a low right source will show directional vectors that are irreconcilable. The method is most reliable on images with clear specular highlights on multiple approximately spherical objects, which makes it well-suited to portraits and group photographs but less useful for landscapes or documents.
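The consistency check itself reduces to comparing angles between per-object direction estimates. The sketch below assumes those estimates already exist as 3D vectors (obtaining them is the hard part and is not shown); the object names, the vectors, and the 20 degree tolerance are hypothetical values chosen for illustration.

```python
import math
from itertools import combinations


def angle_deg(u, v):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard rounding
    return math.degrees(math.acos(cos))


def inconsistent_pairs(estimates, max_angle=20.0):
    """Return (name_a, name_b, angle) for pairs that disagree by > max_angle."""
    flagged = []
    for (name_a, u), (name_b, v) in combinations(estimates.items(), 2):
        angle = angle_deg(u, v)
        if angle > max_angle:
            flagged.append((name_a, name_b, angle))
    return flagged


# Hypothetical light-direction estimates for three faces in one image.
estimates = {
    "face_A": (-0.50, 0.70, 0.50),
    "face_B": (-0.45, 0.72, 0.52),
    "face_C": (0.60, -0.30, 0.74),
}

flagged = inconsistent_pairs(estimates)
for name_a, name_b, angle in flagged:
    print(f"{name_a} vs {name_b}: {angle:.0f} degrees apart")
```

In casework the tolerance would have to reflect the measured uncertainty of each estimate rather than a fixed number.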

Method	What it detects	Best suited to	Key limitation
Illumination direction (Farid-Kee)	Physically inconsistent light directions across regions	Portraits, faces, group photos	Requires specular highlights on convex surfaces
Chromatic aberration	Lens-fingerprint inconsistency at splice boundaries	Images from single-camera sources	Requires high-contrast edges; varies with zoom/aperture
Noise-level inconsistency	Different sensor noise in different regions	Most image types	Sensitive to in-camera processing differences
Block-matching	Copied regions from same image	All image types	Fails on rotated/scaled copies unless combined with SIFT

Splicing detection: chromatic aberration inconsistency

Every lens bends light imperfectly. That imperfection is a fingerprint.

Chromatic aberration arises because glass refracts different wavelengths of visible light at slightly different angles. In a camera lens, this means that the red, green, and blue components of the image are focused at slightly different distances (longitudinal aberration) and at slightly different magnifications (lateral aberration). The lateral form is the one exploited forensically: at high-contrast edges the three channels are spatially displaced from each other by a fraction of a pixel, and the displacement is stronger at the corners than at the centre. The displacement pattern is characteristic of the specific lens design, and its magnitude varies with focal length and aperture.

Forgery detection based on chromatic aberration works as follows. Estimate the aberration model for the host image's lens by fitting the lateral displacement between colour channels across many edges in the non-suspect regions. Then check whether the high-contrast edges within the suspect region conform to the same model. If the suspect region was photographed with a different lens, its aberration pattern will be inconsistent with the fitted host model. If the forger applied lens-correction software to the pasted region before inserting it, the chromatic aberration will be reduced or eliminated, which is itself an anomaly in an uncorrected host image.
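The fit-then-check procedure can be sketched with a deliberately simple model. The assumptions are loud ones: the red-to-green displacement at each edge point has already been measured, the optical centre coincides with the image centre, and the lateral aberration is a pure radial scaling with a single coefficient `alpha`. The function names, coordinates, and the 0.2-pixel tolerance are hypothetical.

```python
import math


def fit_alpha(points, shifts, centre):
    """Least-squares alpha for the radial model shift = alpha * (p - centre)."""
    num = den = 0.0
    for (x, y), (dx, dy) in zip(points, shifts):
        rx, ry = x - centre[0], y - centre[1]
        num += dx * rx + dy * ry
        den += rx * rx + ry * ry
    return num / den


def deviates(points, shifts, centre, alpha, tol=0.2):
    """True where the measured shift is more than tol pixels off the model."""
    flags = []
    for (x, y), (dx, dy) in zip(points, shifts):
        ex, ey = alpha * (x - centre[0]), alpha * (y - centre[1])
        flags.append(math.hypot(dx - ex, dy - ey) > tol)
    return flags


centre = (2000.0, 1500.0)  # assumed optical centre of a 4000x3000 image


def model_shift(point, alpha):
    return (alpha * (point[0] - centre[0]), alpha * (point[1] - centre[1]))


# Edge points in non-suspect regions, consistent with one lens.
host_pts = [(100, 100), (3900, 200), (3800, 2900),
            (150, 2800), (2000, 300), (1000, 1500)]
host_shifts = [model_shift(p, 0.0005) for p in host_pts]

# Edge points in the suspect region, simulated with a different lens.
suspect_pts = [(3000, 2200), (3200, 2400)]
suspect_shifts = [model_shift(p, -0.0003) for p in suspect_pts]

alpha = fit_alpha(host_pts, host_shifts, centre)
host_flags = deviates(host_pts, host_shifts, centre, alpha)
suspect_flags = deviates(suspect_pts, suspect_shifts, centre, alpha)
print(f"alpha={alpha:.4f}", host_flags, suspect_flags)
```

A suspect region whose aberration has been removed by lens-correction software would also be flagged by the same check, since near-zero measured shifts deviate from the fitted host model wherever the model predicts a visible displacement.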

Chromatic aberration displacement grows from image centre to corners.

Deep-learning detectors: Noiseprint and MantraNet

Classical methods are interpretable but manual. CNNs are automatic but opaque.

The last decade has seen deep-learning methods move from promising to mainstream in image forensics, with Noiseprint (Cozzolino and Verdoliva, 2019) and MantraNet (Wu et al., 2019) among the most studied.

Noiseprint is a Siamese network trained to suppress image content and retain only the camera-model fingerprint in the noise residual. The fingerprint reflects artefacts of the in-camera processing pipeline, such as demosaicing and compression, that are shared by cameras of the same model. It should not be confused with sensor pattern noise such as PRNU, which arises from manufacturing variations in the sensor and is specific to an individual device. Noiseprint extracts the model fingerprint as a dense feature map. In an authentic image from a single camera, the fingerprint is spatially consistent. Where a region was spliced in from an image taken with a different camera model, the fingerprint is inconsistent with its surroundings. A clustering step on the fingerprint map localises the forgery boundary.

MantraNet approaches the problem differently. It uses two sub-networks in sequence: a feature extractor learns manipulation traces from 385 types of image manipulation, and a local anomaly detector compares local features to their neighbourhood statistics to flag regions that are locally inconsistent. MantraNet outputs a pixel-level anomaly map rather than a binary forgery label.
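The idea of comparing a local feature with its neighbourhood statistics can be illustrated with a plain z-score map. This is not MantraNet: there is no learned feature extractor, and the single-channel synthetic feature map, the function name, and the window radius are all assumptions made to show the principle only.

```python
import numpy as np


def local_zscore(feat, radius=7, eps=1e-6):
    """Z-score of each value against the mean and std of its local window."""
    h, w = feat.shape
    z = np.zeros((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            win = feat[max(0, y - radius):y + radius + 1,
                       max(0, x - radius):x + radius + 1]
            z[y, x] = (feat[y, x] - win.mean()) / (win.std() + eps)
    return z


# Synthetic feature map: homogeneous background with one shifted 4x4 patch.
rng = np.random.default_rng(0)
feat = rng.normal(0.0, 0.5, size=(40, 40))
feat[18:22, 18:22] += 5.0

z = local_zscore(feat)
patch = np.zeros(feat.shape, dtype=bool)
patch[18:22, 18:22] = True
inside = z[patch].mean()            # mean score inside the altered patch
outside = np.abs(z[~patch]).mean()  # mean absolute score elsewhere
print(f"inside={inside:.2f}, outside={outside:.2f}")
```

The window has to be larger than the altered region, otherwise the region becomes its own neighbourhood and the score collapses towards zero; that sensitivity to window size is one reason a single fixed window is only a toy version of the idea.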

Combining methods for reliable conclusions

No single detector closes the case. Convergence of independent methods does.

The field consensus, reflected in SWGIT guidance and in the academic literature, is that no single detection method is sufficient for a reliable forensic conclusion. Each method has its own failure modes: block-matching misses rotated copies, SIFT fails in smooth regions, illumination analysis requires specific scene geometry, and deep-learning methods falter on out-of-distribution images. The appropriate strategy is to run multiple independent methods and report the level of convergence.

Convergent findings (multiple methods flag the same region consistently) provide strong evidence of manipulation and allow the analyst to characterise the type of forgery.
Divergent findings (one method flags, others do not) require investigation of whether the flagging method is producing a false positive before any conclusion is drawn.
Null findings (no method flags anything) should be reported as 'no detectable manipulation by methods A, B, and C' rather than 'authentic', as all methods have blind spots.
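The three reporting categories above can be sketched as a small helper. It assumes each method's result has already been reduced to a yes/no flag for the same region, and it treats two or more agreeing methods as convergent; the function name, the method labels, and the wording of the returned strings are hypothetical.

```python
def summarise(findings):
    """Map {method name: flagged?} to a (category, statement) pair."""
    flagged = sorted(name for name, hit in findings.items() if hit)
    clear = sorted(name for name, hit in findings.items() if not hit)

    if not flagged:
        # Never report 'authentic': every method has blind spots.
        return "null", "no detectable manipulation by methods: " + ", ".join(clear)
    if len(flagged) == 1:
        return "divergent", (
            "flagged only by " + flagged[0]
            + "; investigate a possible false positive before concluding"
        )
    statement = "flagged by " + ", ".join(flagged)
    if clear:
        statement += "; not flagged by " + ", ".join(clear)
    return "convergent", statement


report = summarise({"block-matching": True, "SIFT": True, "noise-level": True})
single = summarise({"block-matching": False, "SIFT": False, "noise-level": True})
nothing = summarise({"block-matching": False, "SIFT": False})
print(report, single, nothing, sep="\n")
```

Keeping the dissenting methods in the convergent statement preserves the information a reviewer needs to judge how strong the agreement really is.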

Documenting the methods, their versions, the parameter settings used, and the known limits of each is the baseline expectation for casework in any jurisdiction that applies reliability standards to expert evidence.

Worked example

Detecting a copy-move concealment in a construction-site photograph

A copied patch of scaffolding over a collapse. Three methods converge on it.

An insurance investigator suspects that a construction-site photograph submitted to support a liability claim has been altered to conceal a scaffolding failure. A region of collapsed framework appears to have been replaced with what looks like an undamaged continuation of the structure. The analyst receives a 12-megapixel JPEG at quality 88.

Block-matching. Dividing the image into 32x32 pixel overlapping blocks and applying DCT-based feature matching produces a cluster of 47 matched block pairs. The cluster links a region in the upper-right scaffolding area to an almost-identical region 180 pixels to the left. The spatial coherence filter confirms the cluster: all matched pairs have displacement vectors within 15 degrees of each other.
SIFT keypoint matching. The scaffolding region is texturally rich (metal tubes, bolts, shadows), generating plentiful keypoints. SIFT finds 34 matched keypoint pairs between the two regions. RANSAC estimates a near-identity transform (translation of 180 pixels, rotation of 0.3 degrees, scale 1.0). The inlier count is 31 of 34 pairs, which is high.
Noise-level check. The noise variance in the suspect region is slightly elevated relative to the rest of the scaffolding, consistent with a re-save at a different quality. JPEG ghost analysis shows the suspect region's minimum error at quality 91 versus quality 88 for the surrounding image.
Finding. Three independent methods converge on the same region. The evidence is consistent with a region of the image being copied from a nearby area and pasted over the site of the structural failure. The translation distance and structural geometry are consistent with the copied region being an undamaged portion of the same scaffold frame, shifted without rotation or scaling and placed to hide damage.

Check your understanding

Why does block-matching use lexicographic sorting of feature vectors rather than direct pairwise comparison?

Key Takeaways

Block-matching detects copy-move by finding near-identical spatially-separated patches via sorted DCT feature vectors; post-processing with spatial coherence filtering removes false positives.
Keypoint-based methods (SIFT, SURF) handle rotated and scaled duplicates by matching scale- and rotation-invariant descriptors and estimating an affine transform with RANSAC.
Splicing can be detected through illumination-direction inconsistency across faces or objects, chromatic aberration fingerprint mismatch at boundaries, and noise-level inconsistency between regions.
CNN-based detectors (Noiseprint, MantraNet) automate forgery localisation but generalise imperfectly to out-of-distribution images and should be used with expert interpretation, not as black-box verdicts.
Reliable forensic conclusions require convergence across multiple independent methods; a single method flagging one region is a hypothesis, not a finding.

How does block-matching detect copy-move forgery?

The image is divided into overlapping blocks, each block is represented by a feature vector, and the vectors are lexicographically sorted. Near-identical blocks end up adjacent in the sorted list, so duplicated regions can be found by comparing neighbouring entries. Spatial coherence filtering removes false positives, and the remaining cluster of matched pairs marks the source and destination regions.

Why are keypoint-based methods used for copy-move detection?

Block-matching struggles when the copied region has been rotated or scaled. SIFT and SURF extract scale-invariant, rotation-invariant descriptors from interest points, allowing matching across these geometric transformations. An affine transform can then be estimated from matched keypoint pairs to verify geometric consistency. Reflected copies need additional handling, because the standard descriptors are not invariant to mirroring.

How does illumination-direction inconsistency reveal image splicing?

In a real photograph, specular highlights on convex surfaces encode the light direction. By comparing estimated light directions across different faces or objects, the analyst can detect physically impossible illumination inconsistencies that reveal compositing from images taken under different lighting.

What is chromatic aberration and how can it detect splicing?

Chromatic aberration is colour fringing at high-contrast edges caused by a lens refracting wavelengths differently. The pattern is lens-specific. When a region from a different lens is pasted in, its aberration pattern at the splice boundary is inconsistent with the host lens model, revealing the splice.

What are Noiseprint and MantraNet, and what are their limitations?

Noiseprint extracts a camera-model fingerprint from the noise residual; inconsistencies reveal manipulated regions. MantraNet detects local anomalies in a learned feature space trained on 385 manipulation types. Both can fail on images heavily processed by social media pipelines or generated by AI tools that produce no traditional manipulation artefacts.
