PDF Metadata Forensics and Document Image Authentication

The PDF and image-authentication side of modern document casework: the PDF object model (the document catalogue, the trailer, the cross-reference table, incremental updates, the producer and creator metadata fields), the digital-signature stack on signed PDFs and what its absence or break means, document image authentication (error level analysis or ELA, copy-move detection, JPEG ghost analysis, lighting and shadow consistency, EXIF metadata reconciliation), and the limits of these techniques against modern image-editing pipelines.

Last updated: 19 Jun 2026

PDF documents store provenance evidence at three independent layers: the object graph (cross-reference table, information dictionary, XMP stream), the cryptographic signature chain (ByteRange coverage, certificate validity, timestamp authority), and the pixel statistics of embedded images (JPEG compression history, EXIF metadata, lighting geometry). A forger who edits any of these layers leaves characteristic traces that a trained examiner can recover using structural analysis tools, signature verification, and image authentication techniques including Error Level Analysis, JPEG ghost analysis, and copy-move detection. No single technique is definitive; reliable conclusions require corroboration across at least two independent channels.

A PDF is a structured object graph, not a flat image. Taken together, its objects record provenance: the creating application, the modification date, and whether any bytes were appended after the original save. A forger who edits content, alters a date, or splices in a signature image therefore risks leaving traceable evidence at three levels: the PDF metadata, the byte-level signature chain, and the pixel statistics of embedded images.

Key takeaways

Multiple cross-reference (xref) sections in a PDF indicate incremental updates, each recoverable as a separate forensically significant editing layer.
Discrepancies between the information dictionary and the XMP metadata stream reveal post-creation metadata editing.
A PDF digital signature's ByteRange entry records exactly which bytes were hashed; content outside that range is not covered, even if the signature itself is mathematically valid.
Error Level Analysis (ELA), developed by Neal Krawetz (2007), detects regions of different JPEG compression history but requires content-aware interpretation because dense text naturally shows higher error levels.
ENFSI Digital Imaging Working Group and SWGDE (Scientific Working Group on Digital Evidence) guidance classify ELA as a screening tool; definitive conclusions require corroboration from a second technique such as JPEG ghost analysis or copy-move detection.

The cryptographic signature layer is examined in detail under e-signatures: Aadhaar e-Sign, eIDAS, and ESIGN/UETA, where the PKI framework, certificate chain verification, and jurisdiction-specific validity requirements are fully developed.

Document image authentication sits alongside PDF structural analysis as the second major pillar of digital document forensics. When an examiner receives a scan, a photograph of a page, or a JPEG exported from a PDF, the question shifts from object-graph analysis to image physics: does the error pattern encoded in the JPEG compression artefacts match what would be expected from a single, unedited acquisition? Are there regions of the image that show different compression histories, suggesting a patch-and-resave operation? Does the EXIF metadata embedded by the camera or scanner match the lighting geometry visible in the image itself?

Courts in multiple jurisdictions now receive expert evidence on both classes of analysis. In the UK, CCTV and document image authentication expert reports are submitted under Criminal Procedure Rules Part 19. In India, digital evidence admissibility operates under the Bharatiya Sakshya Adhiniyam (BSA) 2023 sections 61 to 63 (electronic and digital records), with the forensic examiner's report conforming to MHA-CFSL technical guidelines. In the United States, Federal Rule of Evidence 702 (Daubert standard) governs the admissibility of image-authentication expert testimony, and several reported decisions now address ELA and EXIF evidence specifically. Examiners working in any of these jurisdictions must understand the science behind both pillars before forming or reporting conclusions.

By the end of this topic you will be able to:

Reconstruct the editing history of a PDF by parsing cross-reference table sections and comparing information dictionary fields against the XMP metadata stream.
Verify a PDF digital signature by checking hash validity, certificate chain integrity, revocation status, and ByteRange coverage of the full document.
Apply Error Level Analysis (ELA) to a JPEG image, correctly interpreting texture-driven variation as distinct from manipulation-driven anomalies.
Distinguish the forensic signals produced by copy-move detection, JPEG ghost analysis, and lighting-consistency analysis, and identify which technique is appropriate for a given suspected manipulation type.
Recognise the limits of classical image authentication tools against content-aware fill and AI-generated inpainting, and explain why ensemble methods are now required for reliable conclusions.

The PDF Object Model: Catalogue, Trailer, and Cross-Reference Table

A PDF file consists of a sequence of objects (integers, strings, arrays, dictionaries, streams, and references between them) followed by a cross-reference table (xref table) that maps each object's number to its byte offset in the file, and a trailer dictionary that points to the document catalogue and the previous xref section if one exists.

The document catalogue is the root dictionary of the PDF's logical structure. It links to the page tree, the named destinations, the outlines (bookmarks), and the interactive form dictionary. The document information dictionary is referenced separately, from the Info entry of the trailer. The information dictionary is where the metadata most relevant to provenance lives: the Creator field (the application that originally authored the document, often a word processor or design tool), the Producer field (the application that converted the content to PDF format, often a PDF printer driver or export library), the CreationDate field, and the ModDate (last modification date). These fields are plain text strings; they can be set to any value by the creating application and can be edited with a hex editor or a library such as Python's pypdf. They are therefore unreliable as a sole provenance indicator. The forensic examiner's task is to cross-check these fields against everything else in the file.

The cross-reference table records every object. A PDF written in a single pass normally has a single xref section. Whenever a PDF is edited and resaved incrementally, a new xref section is appended to the file without overwriting the original objects. This is the PDF incremental update mechanism, and it is the most important structural feature for forensic analysis. A document that shows two or more xref sections has usually been changed after the original save, with two routine exceptions the examiner must rule out: linearised ("fast web view") files carry two cross-reference sections by design, and applying a digital signature is itself written as an incremental update. The examiner can read each incremental update in isolation to reconstruct exactly which objects were changed or added at each stage, and tools such as QPDF (open source), pdf-parser.py (Didier Stevens, freely available), and PDF Examiner (Malware Tracker) expose this structure directly.
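The appended-revision structure can be surfaced with a few lines of Python before reaching for a full parser. The following is a minimal sketch, not a substitute for QPDF or pdf-parser.py: the function name find_revisions is hypothetical, and it assumes each saved revision ends with the usual startxref, offset, %%EOF sequence. It will also report two revisions for a linearised file, and it can be misled if those keywords happen to appear inside uncompressed stream data.

```python
import re

# Each saved revision normally ends with: startxref <offset> %%EOF
REVISION_END = re.compile(rb"startxref\s+(\d+)\s+%%EOF")

def find_revisions(pdf_bytes):
    """Return one record per startxref/%%EOF pair found in the raw bytes."""
    revisions = []
    for match in REVISION_END.finditer(pdf_bytes):
        revisions.append({
            # Byte offset at which this revision's xref section (or xref stream) starts
            "xref_offset": int(match.group(1)),
            # Length of the file as it stood when this revision was saved
            "revision_end": match.end(),
        })
    return revisions

def extract_revision(pdf_bytes, revisions, index):
    """Slice out the file as it existed after revision `index` (0 = original save)."""
    return pdf_bytes[: revisions[index]["revision_end"]]
```

Slicing the file at each revision_end yields the document as it stood after that save, which can then be opened and compared against the later revisions.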

PDF structural layers: the original save produces a body and xref table; each incremental update appends a new body section and xref, with the trailer pointing backward through the chain. A forger's edit is visible as an additional xref layer.

XMP metadata (Extensible Metadata Platform, ISO 16684) is a second metadata channel, embedded in most modern PDFs as an XML metadata stream referenced from the document catalogue's Metadata entry. XMP duplicates the information dictionary fields (Creator, Producer, dates) in XML format. Discrepancies between the information dictionary values and the XMP stream values indicate post-creation metadata editing: a forger who edits the information dictionary via a simple library may not know to update the XMP stream simultaneously, or vice versa. Comparing both is a standard first step in PDF provenance analysis.
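The comparison is complicated by the fact that the two channels use different date syntaxes: the information dictionary uses the PDF date string form (D:YYYYMMDDHHmmSS followed by an offset), while XMP uses ISO 8601. A minimal sketch of the reconciliation follows. It assumes the fields have already been extracted into two plain dictionaries, the function names are hypothetical, and it treats a date with no offset as UTC, which is an assumption rather than something the file states.

```python
import re
from datetime import datetime, timedelta, timezone

PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?"
)
# Information dictionary key -> corresponding XMP property
FIELD_MAP = {
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}

def parse_pdf_date(value):
    m = PDF_DATE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    defaults = (0, 1, 1, 0, 0, 0)  # missing month/day default to 1, time parts to 0
    parts = [int(g) if g else d for g, d in zip(m.groups()[:6], defaults)]
    offset = timedelta(0)  # assumption: no offset given means UTC
    if m.group(8):
        offset = timedelta(hours=int(m.group(9)), minutes=int(m.group(10) or 0))
        if m.group(8) == "-":
            offset = -offset
    return datetime(*parts, tzinfo=timezone(offset))

def parse_xmp_date(value):
    dt = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)

def metadata_mismatches(info, xmp):
    """Return the information dictionary keys whose XMP counterpart disagrees."""
    mismatches = []
    for info_key, xmp_key in FIELD_MAP.items():
        if info_key not in info or xmp_key not in xmp:
            continue  # a missing field is reported separately, not as a mismatch
        if info_key.endswith("Date"):
            same = parse_pdf_date(info[info_key]) == parse_xmp_date(xmp[xmp_key])
        else:
            same = info[info_key].strip() == xmp[xmp_key].strip()
        if not same:
            mismatches.append(info_key)
    return mismatches
```

Comparing parsed, timezone-aware values rather than raw strings avoids false alarms when the two channels express the same instant with different offsets.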

Font embedding records tell the examiner which fonts were used to render each text object. A document claimed to have been created by one word processor may carry embedded font descriptors that are inconsistent with that application's font stack for the claimed platform and version. A sworn affidavit claimed to have been typed in Microsoft Word 2003 on Windows XP that embeds a Core Text macOS TrueType descriptor is internally inconsistent.

Digital-Signature Forensics: What Absence or a Broken Chain Means

PDF digital signatures are governed by the PDF specification (ISO 32000-2) and implemented by most PDF creation tools via PKCS#7 / CMS (Cryptographic Message Syntax) containers. When a PDF is digitally signed, the signing application computes a cryptographic hash (SHA-256 or SHA-512 in modern implementations) over a defined byte range of the file, then wraps that hash in a signed PKCS#7 structure that includes the signer's X.509 certificate and the timestamp from a trusted Timestamp Authority (TSA). The signature dictionary is embedded back into the PDF, and the byte range that was hashed is recorded in the signature dictionary as the ByteRange entry.

The ByteRange entry specifies which portions of the file the signature covers, expressed as four integers: offset and length for the first covered region, and offset and length for the second covered region. The gap between the two regions is the signature container itself, which cannot hash itself. This architecture is important forensically: any bytes appended to the file after the original signing are outside the ByteRange and are therefore not covered by the signature. A valid signature in a document that has received incremental updates after signing does not cover those updates. Adobe Acrobat Reader visually flags this: it distinguishes between "signature is valid and covers the entire document" and "signature is valid but does not cover all content; modifications may have been made."
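The coverage check reduces to simple arithmetic on the four integers. The sketch below is illustrative only: the function names are hypothetical, the ByteRange is assumed to have been read from the signature dictionary already, and the digest it computes would still have to be compared against the message digest inside the CMS container, which is not attempted here.

```python
import hashlib

def check_byte_range(byte_range, file_size):
    """Summarise which parts of the file a signature's ByteRange covers."""
    start1, len1, start2, len2 = byte_range
    gap_start = start1 + len1          # first byte of the signature container
    covered_end = start2 + len2        # first byte after the second covered region
    return {
        "gap": (gap_start, start2),    # the excluded signature container
        "uncovered_tail": file_size - covered_end,  # bytes appended after signing
        "covers_whole_file": (
            start1 == 0 and start2 >= gap_start and covered_end == file_size
        ),
    }

def signed_digest(pdf_bytes, byte_range, algorithm="sha256"):
    """Hash exactly the two regions named in the ByteRange."""
    start1, len1, start2, len2 = byte_range
    digest = hashlib.new(algorithm)
    digest.update(pdf_bytes[start1:start1 + len1])
    digest.update(pdf_bytes[start2:start2 + len2])
    return digest.hexdigest()
```

Because signed_digest reads only the two named regions, it returns the same value before and after bytes are appended to the file. That is why a signature can remain mathematically valid over a document that has since been changed, and why uncovered_tail has to be checked separately.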

The examiner's workflow on a signed PDF proceeds in three steps: verify the mathematical validity of the signature (the hash matches); verify the certificate chain (the signer's certificate was issued by a trusted CA, is within its validity period, and was not revoked at the time of signing per OCSP or CRL); and verify coverage (the ByteRange covers all bytes except the signature container). In the UK, the Forensic Science Regulator's Legal Guidance for Digital Evidence (2020 edition) addresses signature verification as a component of electronic document examination. In India, the IT Act 2000 section 3 (as amended) and the Controller of Certifying Authorities (CCA) framework establish the legal validity of digital signatures, and the CCA's technical guidelines specify SHA-256 minimum hash lengths for signatures after April 2018. EU courts adjudicating under eIDAS Regulation 910/2014 recognise a qualified electronic signature as having the legal effect of a handwritten signature, and the TSA timestamp is treated as evidence of the signing time under ETSI EN 319 422.

Error Level Analysis: Reading the Compression History of an Image

JPEG compression is lossy: each time a JPEG is saved, the image is divided into 8x8 pixel blocks and a discrete cosine transform (DCT) is applied to each block, followed by quantisation (dividing the DCT coefficients by a quantisation matrix and rounding to integers). The quantisation step discards information, and the degree of information loss depends on the quality factor (1 to 100 in most implementations, where 100 is near-lossless and 1 is maximally lossy). When a JPEG is resaved, the existing quantised DCT coefficients are dequantised, an inverse DCT is applied to reconstruct pixel values (now with quantisation error), and the DCT-quantise cycle is applied again. The resulting image accumulates quantisation error across each save.

Error Level Analysis (ELA) exploits this accumulation. The technique was published by Neal Krawetz in 2007 and operates as follows: take the image under examination, resave it at a known quality level (typically 95 per cent), and compute the per-pixel difference between the original and the resaved version. This difference image is the error level image. In an unmodified JPEG that has been compressed only once at approximately the claimed quality level, the error pattern will be relatively uniform across the image. Regions that have been pasted in from a different image, or that have been edited and locally resaved, will show a different error level than the surrounding area, because they carry a different compression history.

Interpreting ELA output requires care. Several conditions produce localised variation in a genuinely unmodified image: regions of uniform colour (large flat skies, blank white sections of a form) converge to near-zero error quickly because there is little high-frequency DCT information to preserve, while regions of fine texture (hair, fabric, printed text) retain higher error levels through multiple saves. An ELA result therefore requires interpretation in the context of the image content, and an examiner who reports a bright ELA region as "evidence of manipulation" without accounting for texture content is applying the technique incorrectly. Published guidance from the ENFSI Digital Imaging Working Group and the US Scientific Working Group on Digital Evidence (SWGDE) notes that ELA is a screening tool, not a definitive indicator of manipulation.
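The resave-and-difference principle can be illustrated on a single 8x8 block without any imaging library. The following is a simplified model, not a JPEG codec: it uses one uniform quantisation step for every DCT coefficient (a real encoder uses a quality-dependent matrix), it ignores colour subsampling, and the names save_block and error_level are hypothetical.

```python
import math

N = 8
# Orthonormal 8x8 DCT-II basis matrix
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]
CT = [list(col) for col in zip(*C)]  # transpose of C

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def save_block(pixels, step):
    """Simulate one lossy save: DCT, uniform quantisation, inverse DCT, 8-bit rounding."""
    coeffs = matmul(matmul(C, pixels), CT)
    quantised = [[step * round(c / step) for c in row] for row in coeffs]
    restored = matmul(matmul(CT, quantised), C)
    return [[min(255, max(0, round(v))) for v in row] for row in restored]

def error_level(pixels, step):
    """Mean absolute per-pixel change caused by resaving the block at `step`."""
    resaved = save_block(pixels, step)
    total = sum(abs(a - b) for ra, rb in zip(pixels, resaved) for a, b in zip(ra, rb))
    return total / (N * N)

# A textured block that has never been compressed (hypothetical pixel values)
textured = [[128 + ((37 * x + 91 * y) % 41) - 20 for x in range(N)] for y in range(N)]
saved_once = save_block(textured, step=10)

print(error_level(saved_once, step=10))  # 0.0: coefficients already sit on the step-10 grid
print(error_level(textured, step=10))    # above zero: this block has no prior quantisation
```

In this model a block that has already been saved at the test step changes not at all when resaved, while a block with a different history changes measurably. A real ELA map is the same comparison made for every block of the image at once, which is why a pasted region with a different history can stand out, and why texture alone can also raise the level.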

Copy-Move Detection, JPEG Ghost Analysis, and Lighting Consistency

Copy-move forgery occurs when a region of an image is copied and pasted onto another region within the same image to conceal something: a date stamp obscured by cloning a portion of the background, a signature field covered by a region from the page margins, a person removed from a photograph and replaced with duplicated background pixels. Copy-move detection algorithms search for block-level or keypoint-level similarities within the image.

Block-based methods divide the image into overlapping tiles (typically 16x16 or 32x32 pixels), compute a feature vector for each tile (DCT coefficients, PCA-reduced pixel values, or a robust hash), and search for near-duplicate tile pairs. Because the copied region shares the same pixel statistics as its source, duplicate tile pairs stand out against the rest of the image. Keypoint-based methods (SIFT, SURF, ORB) extract local feature descriptors at interest points across the image and match descriptor pairs; post-filtering by geometric consistency (RANSAC) distinguishes true copy-move matches from false positives. Tools used in this area include FotoForensics (browser-based, Krawetz) and academic implementations in Matlab and Python used in research labs at UNISA (Italy), Stony Brook University (US), and the DFR Lab (Digital Forensic Research Laboratory, Atlantic Council, international).
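The block-based idea can be sketched in its simplest form, exact tile matching, using only the standard library. This is a deliberately naive illustration: it uses the raw pixel values of each tile as the feature, so it only finds bit-identical copies, whereas practical detectors use robust features that survive recompression. The function name, the 4x4 tile size, and the synthetic test image are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

def find_duplicate_shifts(image, tile=4, min_shift=8):
    """Count the (dx, dy) displacement between every pair of identical tiles."""
    height, width = len(image), len(image[0])
    seen = defaultdict(list)  # tile contents -> list of top-left positions
    for y in range(height - tile + 1):
        for x in range(width - tile + 1):
            key = tuple(tuple(image[y + j][x:x + tile]) for j in range(tile))
            seen[key].append((x, y))
    shifts = Counter()
    for positions in seen.values():
        for i, (x1, y1) in enumerate(positions):
            for x2, y2 in positions[i + 1:]:
                dx, dy = x2 - x1, y2 - y1
                # Ignore near-neighbours, which match trivially in flat regions
                if abs(dx) + abs(dy) >= min_shift:
                    shifts[(dx, dy)] += 1
    return shifts

# Synthetic 32x32 greyscale image with an 8x8 patch cloned from (2, 2) to (20, 18)
rng = random.Random(0)
image = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
for j in range(8):
    for i in range(8):
        image[18 + j][20 + i] = image[2 + j][2 + i]

shifts = find_duplicate_shifts(image)
print(shifts.most_common(1))  # the dominant shift is (18, 16), shared by 25 tile pairs
```

A genuine clone shows up as many tile pairs sharing one displacement vector, which is the same consistency test that RANSAC applies to keypoint matches.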

JPEG ghost analysis (Farid, 2009) detects double-compressed regions. When a JPEG patch from a source image is pasted into a target JPEG and the composite is saved, the pasted region has been compressed at two different quality levels: the source JPEG's compression level and the final save's compression level. Resaving the composite at a range of quality values and computing the pixel-level difference between each resaved version and the composite produces a ghost signal at the quality level matching the source image's original compression. The ghost appears as a region of anomalously low difference (the patch is "at home" at the source quality) against a background of higher difference (the rest of the image, compressed at a different original quality level). This technique is independent of ELA and provides a second channel of forensic signal that corroborates or contradicts the ELA finding.
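The ghost can be demonstrated on bare coefficients before any pixels are involved. The sketch below is a coefficient-domain toy model with hypothetical values: a spread of coefficients is quantised coarsely (step 23, standing in for the lower-quality source), then more finely (step 17, standing in for the final save), and the result is requantised at every step from 1 to 30. It assumes the source was compressed more coarsely than the final save, which is the case this sketch is built to show.

```python
def quantise(values, step):
    """Round each value to the nearest multiple of `step`."""
    return [step * round(v / step) for v in values]

def ghost_curve(coeffs, steps):
    """Sum of squared differences between coeffs and their requantised version."""
    return {
        q: sum((c - r) ** 2 for c, r in zip(coeffs, quantise(coeffs, q)))
        for q in steps
    }

# Hypothetical history of a pasted patch: source step 23, final save step 17
raw = range(-5000, 5001)
pasted = quantise(quantise(raw, 23), 17)

curve = ghost_curve(pasted, range(1, 31))
local_minima = [
    q for q in range(2, 30)
    if curve[q] < curve[q - 1] and curve[q] < curve[q + 1]
]
print(curve[17])           # 0: the step used for the final save
print(23 in local_minima)  # True: a second dip at the step of the earlier, coarser save
```

The curve falls to zero at the final save's step and dips again at the step used for the earlier save. In an image, only the pasted region carries that second dip, so plotting the difference per region at the matching quality level makes the patch visible.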

Lighting and shadow consistency analysis applies geometric photogrammetry rather than compression statistics. If a spliced element was photographed under a different light source geometry than the background scene, the direction and softness of cast shadows will be inconsistent. A 3D reconstruction of the light source position from the shadows cast by objects in the background can be compared with the light source implied by the shadows or specular highlights on the spliced element. Farid's group at Dartmouth (US) and the Intelligent Systems Group at the University of Amsterdam (Netherlands) have published extensively on shadow-geometry and specular-geometry inconsistency as manipulation indicators.

Technique	What it detects	Underlying signal	Key limitation
Error Level Analysis (ELA)	Regions with different compression history	Per-pixel DCT quantisation error differential	Texture variation mimics manipulation signal; requires content-aware interpretation
Copy-move detection	Duplicated regions within the same image	Block similarity or keypoint descriptor matching	Scaling or rotation of copied region degrades detection; post-processing can break block alignment
JPEG ghost analysis	Double-compressed patches from a different source image	Per-quality-level pixel difference minimum	Requires original and target to be JPEG; ineffective on PNG or TIFF composites
Lighting/shadow consistency	Spliced elements from a different lighting environment	3D geometry of shadow direction and specular highlights	Requires visible cast shadows or specular surfaces; computationally intensive

Four image authentication techniques mapped by manipulation type: ELA targets compression-history gaps; copy-move targets duplicated regions; JPEG ghost targets double-compressed patches; lighting analysis targets geometry-inconsistent splices. No single technique is definitive; corroboration across two channels is required.

EXIF Metadata Reconciliation: Camera, Time, and GPS

EXIF (Exchangeable Image File Format, JEIDA/JEITA, now formalised in the CIPA DC-008 standard) is the metadata standard embedded by cameras and scanners in JPEG and TIFF files. An unmodified camera original carries a rich metadata block: camera make and model, firmware version, lens focal length, aperture, shutter speed, ISO sensitivity, white balance setting, flash status, creation timestamp (from the camera's internal clock), GPS coordinates if the device has a location sensor, and the software that last processed the image (relevant when the image passed through a RAW-to-JPEG conversion pipeline).

EXIF metadata is not cryptographically protected in the base EXIF specification. It can be edited by any hex editor, by ExifTool (Phil Harvey, widely used by both legitimate photographers and document fraudsters), or by any image editing application that writes its own metadata on save. The forensic value of EXIF metadata lies not in its tamper-resistance but in the consistency of multiple fields with each other and with the image content: a claimed outdoor photograph at noon in July in London showing EXIF data indicating the camera's built-in flash fired and an ISO of 3200 is self-inconsistent. A document claimed to have been scanned on a Canon MF644Cdw in 2018 carrying an EXIF Software string that first appeared in 2021 is internally anachronistic.

Several cross-checks are standard practice in EXIF reconciliation. First, camera clock drift: every digital camera's clock runs slightly fast or slow, and some cameras ship with the clock unset (defaulting to 2000-01-01 or similar). A sequence of photographs that claim to document a scene in chronological order should show monotonically increasing timestamps with intervals consistent with what the claimed sequence required. Gaps or reversals in the sequence are forensically significant. Second, GPS-timestamp reconciliation: the GPS block records a timestamp in UTC taken from the satellite signal (GPSDateStamp and GPSTimeStamp), while the main EXIF block records local time from the camera clock (DateTimeOriginal). Once the time-zone offset is accounted for, the two should agree to within the expected clock drift; a larger discrepancy suggests that the GPS metadata or the local timestamp has been altered. Third, the thumbnail-main image consistency check: EXIF thumbnails are generated by the camera at capture and embedded in the EXIF block. If the main image has been cropped, rotated, or edited, the thumbnail may retain the original uncropped framing, providing an independent reference image of what the camera originally captured.
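The GPS-timestamp reconciliation is a short calculation once the tag values have been read out. The sketch below assumes the values have already been extracted (for example with ExifTool) and that the file carries an OffsetTimeOriginal tag; where that tag is absent, the camera's time zone has to be established some other way. The function name and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def gps_clock_discrepancy(date_time_original, offset_time_original,
                          gps_date_stamp, gps_time_stamp):
    """Return camera-clock time minus GPS time as a timedelta.

    date_time_original   -- e.g. "2021:11:03 14:22:31" (camera local time)
    offset_time_original -- e.g. "+05:30"
    gps_date_stamp       -- e.g. "2021:11:03" (UTC)
    gps_time_stamp       -- (hours, minutes, seconds) in UTC
    """
    sign = -1 if offset_time_original.startswith("-") else 1
    hours, minutes = offset_time_original[1:].split(":")
    offset = sign * timedelta(hours=int(hours), minutes=int(minutes))
    local = datetime.strptime(date_time_original, "%Y:%m:%d %H:%M:%S")
    local = local.replace(tzinfo=timezone(offset))

    h, m, s = gps_time_stamp
    gps = datetime.strptime(gps_date_stamp, "%Y:%m:%d").replace(tzinfo=timezone.utc)
    gps += timedelta(hours=h, minutes=m, seconds=s)
    return local - gps

drift = gps_clock_discrepancy("2021:11:03 14:22:31", "+05:30", "2021:11:03", (8, 52, 10))
print(drift)  # 0:00:21, i.e. the camera clock is 21 seconds ahead of GPS time
```

A difference of seconds or a few minutes is ordinary drift; a difference of hours that is not a whole time-zone offset, or of days, calls for an explanation.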

In Indian forensic casework, EXIF analysis is conducted under the CFSL (Central Forensic Science Laboratory) technical guidelines for digital image examination, which reference the SWGDE guidelines as the international standard. In the UK, the Digital Imaging and CCTV Analysis Faculty of the Forensic Science Regulator has published technical guidance on image metadata analysis as part of its Digital Forensics suite. US federal examiners follow SWGDE (Scientific Working Group on Digital Evidence) and NIST guidelines, with case-specific methodology challenged under Daubert and assessed against the four Daubert criteria (testing, error rate, peer review, general acceptance). The admissibility gatekeeping process (the Daubert hearing structure, the role of error rates and PCAST-style validation demands, and the examiner's obligations under FRE 702) is covered under expert witness testimony and cognitive bias mitigation.

Limits of Image Authentication Against Modern Editing Pipelines

Error Level Analysis was developed against the editing pipelines of its era, when forgers used JPEG-based copy-paste in applications such as Adobe Photoshop 7 or GIMP 2.x. Modern content-aware fill in Adobe Photoshop (Neural Filters, Content-Aware Scale, Generative Fill from the 2023 Adobe Firefly integration) and equivalent tools in DxO PhotoLab, Lightroom, and DALL-E-integrated pipelines do not necessarily leave the compression-history differences that ELA detects. Content-aware fill synthesises new pixel values rather than copying an existing region, so there is no copied region with a different compression history to detect.

Copy-move detection assumes the manipulated region was copied from elsewhere in the same image without substantial scaling or rotation. Modern compositing workflows frequently use perspective correction, warping, and colour grading that break the block-level similarity assumptions underlying naive copy-move algorithms. Keypoint-based methods handle moderate rotation and scaling but fail under large affine transforms or when the forger applies deliberate noise to break feature matching.

AI-generated textures (Stable Diffusion inpainting, DALL-E inpainting, Midjourney Outpainting) synthesise photorealistic pixel values from a text prompt or a mask, producing regions with no prior JPEG compression history and no copied source. ELA typically shows these regions as having low error levels (consistent with a single high-quality save), not elevated error levels, which is the opposite of what a naive interpretation would flag as suspicious. Examiners must understand that low ELA error in a specific region could indicate a synthesised (AI-generated) patch rather than a pristine original. The broader challenge posed by GAN-generated signatures, diffusion-model document synthesis, and the forensic toolkit now deployed against AI-fabricated materials is the subject of AI-generated documents, deepfake signatures, and image manipulation detection.

The response from the forensic science community has been to move toward ensemble methods: combining ELA, copy-move detection, JPEG ghost, PRNU camera fingerprint analysis, and neural network-based detection (trained on large compositing datasets) into a joint decision framework rather than relying on any single technique. Research groups at the University of Erlangen-Nuremberg (Germany), UC Berkeley (US), and the Alan Turing Institute (UK) are active in this space, and several publications from 2022 to 2025 report detection accuracy above 90 per cent on held-out composite datasets for ensemble methods, though accuracy degrades significantly against adversarially constructed forgeries designed to defeat specific detectors.

A combined examination of a questioned PDF proceeds in seven steps.

Step 1 (Structural analysis of the PDF): Extract the PDF object model using pdf-parser.py or QPDF. Count xref sections. Compare information dictionary fields against the XMP stream. Document any incremental updates and their byte ranges.

Step 2 (Signature verification): Identify all signature dictionaries. Verify the mathematical validity of each hash. Verify the certificate chain and revocation status at the signing date. Check whether the ByteRange covers the entire document or only a subset.

Step 3 (Image extraction): Extract all embedded JPEG and PNG images from the PDF using pdfimages (for example pdfimages -all, which writes JPEG streams in their native format) or a forensic extraction tool. Preserve the original compressed bitstream; do not allow the extraction tool to decompress and recompress.

Step 4 (EXIF metadata reconciliation): Read EXIF metadata using ExifTool in verbose mode. Cross-check camera make and model, Software, timestamps, and GPS coordinates against each other and against the claimed provenance.

Step 5 (ELA and compression analysis): Run ELA on each extracted JPEG at quality 95. Interpret anomalous regions in the context of image content (texture vs flat regions). Apply JPEG ghost analysis at quality steps 60-95 to detect double-compressed patches.

Step 6 (Copy-move detection): Run block-based and keypoint-based copy-move detection on each image. Apply RANSAC post-filtering to enforce geometric consistency. Flag duplicate-region pairs for spatial analysis.

Step 7 (Report and limitation declaration): Document each technique applied, the result obtained, and the specific limitations of each technique for the image type under examination. Do not report ELA or copy-move results as definitive without corroborating evidence from a second channel.

Key terms

Cross-reference table (xref): The index structure at the end of a PDF file that maps each object number to its byte offset. Multiple xref sections indicate incremental updates, each corresponding to a post-original edit of the document.
Incremental update: The PDF mechanism by which an editing application appends new or modified objects to the end of an existing file without overwriting the original bytes. Each update leaves a forensically recoverable layer that can be examined separately.
ByteRange: A four-integer entry in a PDF digital signature dictionary specifying which byte ranges of the file were hashed when the signature was created. Content outside the ByteRange is not covered by the signature.
Error Level Analysis (ELA): An image authentication technique developed by Neal Krawetz (2007) that detects regions of different compression history by computing the per-pixel difference between an image and a resaved version at a known quality level.
JPEG ghost analysis: A technique (Farid, 2009) that identifies double-compressed patches by resaving a composite JPEG at multiple quality levels and identifying the quality at which the difference between the composite and the resaved version is minimised for a localised region.
Copy-move detection: A class of image forensics algorithms that search for duplicated regions within a single image, indicating a patch-and-conceal manipulation. Implemented via block feature matching or keypoint descriptor matching followed by geometric consistency filtering.
EXIF (Exchangeable Image File Format): A metadata standard (CIPA DC-008) embedded by cameras and scanners in JPEG and TIFF files, recording device identity, capture settings, timestamps, and GPS coordinates. Not cryptographically protected; forensic value lies in internal consistency checks.
XMP metadata: Extensible Metadata Platform (ISO 16684), an XML-based metadata stream embedded in many PDF and image files alongside the information dictionary. Discrepancies between XMP and information dictionary fields indicate post-creation metadata editing.
PRNU fingerprint: Photo Response Non-Uniformity: the unique pattern of pixel-level sensitivity variations in a camera sensor, which acts as a device fingerprint. Used to link an image to a specific camera and to detect regions of an image captured by a different sensor.
Content-aware fill: An image editing technique (Photoshop, GIMP, and AI-generative implementations) that synthesises plausible pixel values to replace a masked region, producing manipulations that may not carry the block-level compression artefacts detectable by ELA or copy-move methods.

Practice


A PDF document shows three separate xref (cross-reference) sections when parsed with a forensic tool. What does this most reliably indicate?

Worked example

Gurugram startup dispute, PDF cross-reference stream reveals term sheet was edited 14 months after the claimed signing date

The term sheet was dated November 2021. The XMP ModifyDate said January 2023. The incremental update agreed with the metadata, not with the claimed signing date.

Scene: A Gurugram arbitration panel hears a dispute between a startup founder and an early investor. The investor produces a term sheet in PDF format as the primary evidence of agreed equity percentages. The founder alleges the percentages were altered after the original agreement was reached.

Step 1 (ExifTool metadata extraction): ExifTool is run against the PDF. The XMP metadata block records CreateDate as 2021-11-03T14:22:31+05:30 and ModifyDate as 2023-01-17T09:44:12+05:30. The Producer field identifies the creating application as Microsoft Word for Microsoft 365 (version 16.0.15330, released November 2021). The metadata conflict is immediately apparent: the document was purportedly agreed in November 2021, but the ModifyDate is over 14 months later.

Step 2 (PDF structure analysis, cross-reference stream): The PDF is opened in a hex editor and parsed for cross-reference sections. The file contains two cross-reference streams: the original (xref stream 1) covering objects 1 to 47, and an incremental update (xref stream 2) covering objects 12 and 31. Object 12 is the page content stream containing the equity percentage figures. Object 31 is the document information dictionary. The incremental update is internally dated 2023-01-17, confirming a post-hoc modification to the content stream containing the equity percentages.

Step 3 (Font subsetting fingerprint): The original content stream (objects in xref 1) uses a TrueType subset of Calibri with glyph IDs consistent with a Word 2021 export. Object 12 (the incremental update content stream) uses a different Calibri subset with additional glyph IDs not present in the original export. This is consistent with the replacement text being typed in a later editing session with a different glyph set, even though both sessions used Calibri at the same point size.

Step 4 (Digital signature absence): The document carries no digital signature. Had the parties signed with Aadhaar e-Sign or a DSC-based PAdES signature at the claimed November 2021 date, the incremental update would have fallen outside the signature's ByteRange (triggering a modified-after-signing warning on validation) and would have been provably post-dated relative to the signature timestamp. The absence of any signature removes this protection.

Conclusion: The examiner reports that the PDF was modified via an incremental update on 2023-01-17, specifically altering the page content stream containing the equity percentage figures. The original document structure is dated 2021-11-03. The modification postdates the claimed agreement date by over 14 months. The arbitration panel gives the investor's term sheet no weight as evidence of the November 2021 agreement.

Can a forger defeat PDF metadata forensics by creating a new PDF from scratch?

Creating a new PDF removes the incremental-update layer, but other channels remain. Font embedding fingerprints differ across authoring tools, and font subset hash values can be matched to known application versions. The PDF rendering engine leaves characteristic artefacts in the page content stream. Embedded images carry independent compression and EXIF evidence. A convincing forgery must also replicate the formatting conventions, paper stock, and printing characteristics of genuine documents from the claimed issuer, not just pass structural inspection.

What does ELA detect that visual inspection at high zoom cannot?

Visual inspection detects large-scale compositing artefacts: colour mismatches, hard paste edges, obvious cloning. ELA detects sub-visual differences in the image's JPEG compression history. A high-quality composite where the forger matched colour and texture precisely may be visually indistinguishable from an original, while the DCT error pattern still diverges between the pasted region and the background. ELA and visual inspection are complementary: each detects manipulations the other can miss.

Does a broken or absent digital signature always mean a PDF was forged?

No. A signature can fail verification for reasons unrelated to forgery: the certificate expired, the OCSP server is offline, the PDF library had a documented ByteRange-calculation bug, or the TSA certificate was later revoked. The examiner must distinguish a mathematically invalid signature (hash mismatch, strong evidence of post-signing modification) from a certificate-chain failure (which has its own set of explanations). Each failure mode is investigated and reported separately. For the full PKI verification workflow see [e-signatures: Aadhaar e-Sign, eIDAS, and ESIGN/UETA](/topics/questioned-document/e-signatures-aadhaar-esign-eidas-and-esign-ueta).

How does AI-generated inpainting defeat classical image authentication tools?

Diffusion-model inpainting (Adobe Firefly Generative Fill, Stable Diffusion) synthesises new pixel values rather than copying an existing region. No copy-move detection finds a match. No JPEG ghost reveals a double-compressed patch. ELA shows low error because the region has one compression history, identical to the rest of the final save. The broader implications for casework are covered under [AI-generated documents, deepfake signatures, and image manipulation detection](/topics/questioned-document/ai-generated-documents-deepfake-signatures-and-image-manipulation-detection).
