Image File Format and Integrity Checks
Structural analysis of image containers such as JPEG, PNG, and TIFF can reveal inconsistencies introduced by post-capture editing or format conversion. This topic covers how examiners inspect file headers, chunk structures, and embedded data to detect signs of tampering.
Every digital image file is a container: a structured sequence of bytes that declares its format, encodes pixel data, and stores ancillary information such as camera settings, timestamps, and software history. Image file format integrity checks examine that structure to determine whether the file is consistent with an unmodified original or whether it shows evidence of re-saving, format conversion, or deliberate alteration. Examiners inspect file headers, internal segment or chunk sequences, embedded metadata blocks, and the relationship between declared properties and actual file content. A JPEG whose quantisation tables do not match those of the camera model named in its EXIF header, or a PNG containing a chunk type never produced by any camera firmware, carries structural evidence of post-capture intervention.
Format-level analysis is the first structural layer of image authentication. It does not require access to the original capture device and can be applied to any file in the examiner's possession. The analysis complements pixel-level techniques such as noise inconsistency checks and copy-move detection by providing a different class of evidence: not what the image shows, but how the file was written. Courts in the United States, the United Kingdom, the European Union, and India all treat file provenance evidence as relevant to authenticity; the Bharatiya Sakshya Adhiniyam 2023 (which replaced the Indian Evidence Act 1872) retains the requirement that electronically produced evidence be shown to have been produced by a computer operating properly and without opportunity for tampering, making format integrity documentation directly relevant to admissibility.
Three image formats account for the vast majority of forensic casework: JPEG (used by nearly all consumer cameras and smartphones), PNG (common in screenshots, edited images, and web graphics), and TIFF (used in professional and law-enforcement imaging workflows). Each format has a distinct internal structure, and each leaves a characteristic signature when it is re-encoded, converted, or modified. Understanding those signatures allows an examiner to reconstruct part of an image's processing history from the file itself, before any pixel-level analysis begins.
By the end of this topic you will be able to:
- Describe the internal structure of JPEG, PNG, and TIFF files and identify the segments or chunks that carry integrity-relevant information.
- Explain how JPEG double-compression and quantisation table analysis detect re-saving outside the original capture device.
- Identify anomalous PNG chunk types, sequences, and CRC mismatches that indicate post-capture modification.
- Interpret EXIF and XMP metadata fields as provenance claims and test their consistency against the file's structural properties.
- Apply format integrity findings within a broader authentication report, describing what the evidence establishes and what it does not.
- File signature (magic bytes)
- The fixed byte sequence at the start of a file that identifies its format. JPEG files begin with FF D8 FF; PNG files begin with 89 50 4E 47 0D 0A 1A 0A. A file whose extension does not match its magic bytes has been mislabelled, which itself requires explanation.
- JPEG quantisation table
- A matrix of 64 values embedded in a JPEG file that controls the precision with which each frequency component of the image is encoded. Camera manufacturers and software applications use distinct quantisation table sets; matching a file's tables against a known-camera database is a core technique for identifying the encoding device or application.
- PNG chunk
- The basic structural unit of a PNG file. Each chunk consists of a four-byte length field, a four-character type code, a data payload, and a CRC-32 checksum computed over the type code and payload. The critical chunks (IHDR, PLTE where the colour type requires a palette, IDAT, and IEND) must appear in a defined order; ancillary chunks are optional. Unexpected or malformed chunks may indicate post-processing and call for explanation.
- EXIF (Exchangeable Image File Format)
- A metadata standard embedded in JPEG and TIFF files by capture devices. EXIF fields record camera make and model, lens data, exposure parameters, GPS location, and capture and modification timestamps. EXIF is writable by editing software, so its contents are a provenance claim to be verified, not a trusted record.
- XMP (Extensible Metadata Platform)
- An Adobe-defined metadata format stored as an XML packet embedded in image files. XMP carries a software history field (xmpMM:History) that editing applications append to when they process a file. A populated XMP history block is direct structural evidence of post-capture software processing.
- Thumbnail inconsistency
- Many cameras embed a reduced-resolution preview image inside the EXIF header of a JPEG. If the main image has been cropped or altered, the embedded thumbnail may retain the original framing. A mismatch between the thumbnail and the main image is a structural indicator of post-capture editing.
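The file-signature check described in the glossary above can be sketched in a few lines of Python. This is a minimal illustration, not a complete format sniffer: the function names and the extension lists are assumptions made for the example, and the two TIFF signatures (49 49 2A 00 and 4D 4D 00 2A) are taken from the TIFF specification, not from the text above.

```python
from pathlib import PurePath

# Leading byte signatures for the three formats discussed in this topic.
SIGNATURES = {
    "jpeg": [b"\xff\xd8\xff"],
    "png": [b"\x89PNG\r\n\x1a\n"],
    "tiff": [b"II*\x00", b"MM\x00*"],  # little-endian and big-endian TIFF
}

# Extensions conventionally used for each format (illustrative, incomplete).
EXTENSIONS = {
    "jpeg": {".jpg", ".jpeg", ".jpe"},
    "png": {".png"},
    "tiff": {".tif", ".tiff"},
}


def sniff_format(head: bytes):
    """Return the format name implied by the leading bytes, or None."""
    for name, prefixes in SIGNATURES.items():
        if any(head.startswith(p) for p in prefixes):
            return name
    return None


def extension_matches(filename: str, head: bytes) -> bool:
    """True when the extension agrees with the signature-derived format."""
    fmt = sniff_format(head)
    if fmt is None:
        return False
    return PurePath(filename).suffix.lower() in EXTENSIONS[fmt]
```

A `False` result from `extension_matches` only flags the mismatch; whether it reflects a careless rename or something deliberate is a question for the rest of the examination.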
JPEG internal structure and integrity markers
A JPEG file is a sequence of markers, each two bytes long (FF followed by a type byte). Most markers are followed by a length-prefixed data segment; a few, including SOI and EOI, stand alone. The file begins with SOI (Start of Image, FF D8) and ends with EOI (End of Image, FF D9). Between them, the examiner finds the APP0 or APP1 marker (which carries JFIF or EXIF metadata), one or more DQT markers (Define Quantisation Table), a SOF marker (Start of Frame, which records image dimensions and encoding parameters), DHT markers (Define Huffman Table), and one or more SOS markers (Start of Scan, which precede the compressed image data).
Forensic analysis of a JPEG begins with a marker audit: are the markers present in a sequence consistent with camera-generated output? Cameras typically produce a fixed sequence. Editing software may reorder markers, introduce new ones (for example, an ICC colour profile marker APP2), or omit markers that the camera included. A JPEG that contains both a JFIF header (APP0) and full EXIF data (APP1) deserves attention: the Exif specification expects APP1 to follow SOI directly, and camera firmware typically writes only the EXIF segment, so the combination often indicates that an image editor or converter rewrote the file header. Some devices and applications do write both, so the finding should be checked against reference files from the claimed source.
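A marker audit starts by listing the markers in file order. The sketch below is one way to do that with the Python standard library; `list_markers` is a name chosen for this example, and it stops at the first SOS marker because the entropy-coded data that follows needs a different scanning rule.

```python
import struct

# Markers that stand alone (no length field): SOI, EOI, TEM and RST0-RST7.
STANDALONE = {0xD8, 0xD9, 0x01} | set(range(0xD0, 0xD8))


def list_markers(data: bytes):
    """Return [(offset, marker_byte, declared_length)] up to and including SOS.

    declared_length counts its own two bytes, as in the JPEG format;
    it is reported as 0 for standalone markers.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    found = []
    pos = 0
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in STANDALONE:
            found.append((pos, marker, 0))
            pos += 2
            continue
        if pos + 4 > len(data):
            raise ValueError("truncated segment header")
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        found.append((pos, marker, length))
        if marker == 0xDA:          # SOS: compressed scan data follows
            break
        pos += 2 + length
    return found
```

Comparing the resulting marker-byte sequence (for example `[0xD8, 0xE1, 0xDB, ...]`) against sequences taken from reference files of the claimed camera is then a simple list comparison.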
The DQT segments are the most forensically valuable part of the JPEG structure. Each camera model uses a specific quantisation table set, derived from the manufacturer's quality settings. Researchers have assembled databases of camera-specific quantisation tables. When the tables in a questioned image match a known camera model but the EXIF claims a different model, or when the tables match Photoshop's defaults rather than any camera, the file's claimed origin is contradicted by its own structure. This analysis does not require the original camera, only the file and a reference database.
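Extracting the tables for comparison is mechanical. The following sketch reads every DQT segment before the first SOS and compares the result with a reference set. The reference dictionary is hypothetical (a real examination would use a curated database), the scan loop is deliberately compact and does not handle fill bytes or standalone markers, and the values are kept in the zigzag order in which the file stores them.

```python
import struct


def parse_dqt_payload(payload: bytes) -> dict:
    """Decode one DQT payload into {table_id: 64 values in zigzag order}."""
    tables = {}
    pos = 0
    while pos < len(payload):
        precision, table_id = payload[pos] >> 4, payload[pos] & 0x0F
        pos += 1
        if precision == 0:                      # 8-bit entries
            values = tuple(payload[pos:pos + 64])
            pos += 64
        else:                                   # 16-bit big-endian entries
            values = struct.unpack(">64H", payload[pos:pos + 128])
            pos += 128
        if len(values) != 64:
            raise ValueError("truncated quantisation table")
        tables[table_id] = values
    return tables


def extract_dqt(data: bytes) -> dict:
    """Collect every quantisation table that appears before the first SOS."""
    tables = {}
    pos = 2                                     # skip SOI
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xDB:                      # DQT
            tables.update(parse_dqt_payload(data[pos + 4:pos + 2 + length]))
        if marker == 0xDA:                      # SOS
            break
        pos += 2 + length
    return tables


def match_reference(tables: dict, reference: dict) -> list:
    """Names of reference entries whose table set equals the file's tables."""
    return [name for name, ref in reference.items() if ref == tables]
```

An empty list from `match_reference` means only that the tables are not in the reference set supplied; it says nothing about sources the database does not cover.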
JPEG double-compression and re-save detection
JPEG compression divides an image into 8x8-pixel blocks, applies a Discrete Cosine Transform (DCT) to each block, and then quantises the resulting frequency coefficients using the quantisation table. Each coefficient is rounded to the nearest multiple of the corresponding quantisation value. When a JPEG is decoded, altered, and re-encoded at a different quality setting, the DCT coefficients go through two rounds of quantisation. The statistical distribution of DCT coefficients in a doubly-compressed JPEG differs from that of a singly-compressed one in a measurable way: the histogram of coefficients develops periodic peaks and gaps whose spacing depends on the relationship between the first-round and second-round quantisation steps.
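The effect of two rounds of quantisation can be seen on a toy example. The sketch below is not a detector: it pushes a uniform run of integer values (standing in for the DCT coefficients at one frequency) through one or two quantisation steps and reports which histogram bins end up empty. The function names and the step values are chosen for illustration.

```python
from collections import Counter


def quantise(value: int, step: int) -> int:
    """Quantisation index: value/step rounded to nearest, ties away from zero."""
    sign = -1 if value < 0 else 1
    return sign * ((2 * abs(value) + step) // (2 * step))


def double_quantise(values, q1: int, q2: int):
    """Quantise with step q1, dequantise, then quantise again with step q2."""
    return [quantise(quantise(v, q1) * q1, q2) for v in values]


def empty_bins(indices, lo: int, hi: int):
    """Bins in [lo, hi] that no coefficient landed in."""
    seen = Counter(indices)
    return [b for b in range(lo, hi + 1) if seen[b] == 0]


values = range(-60, 61)
single = [quantise(v, 2) for v in values]
double = double_quantise(values, q1=3, q2=2)

print(empty_bins(single, 0, 12))   # []
print(empty_bins(double, 0, 12))   # [1, 4, 7, 10]
```

With a first step of 3 and a second step of 2, every third bin is empty; swapping the steps produces periodic over-filled bins instead. Real coefficient histograms are noisier, which is why practical methods look for periodicity statistically instead of counting empty bins.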
This phenomenon is the basis of double JPEG compression analysis, a well-established forensic method. Histogram-based tools detect the periodicity in the coefficient histogram caused by prior quantisation. A related approach, JPEG ghost analysis, recompresses the questioned image at a range of quality settings and looks for regions where the difference from the questioned image reaches a local minimum, which points to the quality at which that region was previously compressed. If the entire image shows uniform double-compression artefacts, the most likely explanation is a workflow re-save (for example, the image was opened and resaved at a lower quality). If only a region of the image shows double-compression artefacts while the surrounding area does not, that region may have been inserted from a singly-compressed source, or vice versa.
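The recompression idea can be illustrated in the coefficient domain. The sketch below is a simplification under stated assumptions: practical JPEG ghost analysis works on pixel differences after re-saving the whole image at a range of quality settings, whereas this example requantises a single list of integer coefficients with trial steps and picks the step that changes them least. All names are hypothetical.

```python
def requantise(value: int, step: int) -> int:
    """Round to the nearest multiple of step (ties away from zero)."""
    sign = -1 if value < 0 else 1
    return sign * ((2 * abs(value) + step) // (2 * step)) * step


def recompression_error(coeffs, step: int) -> int:
    """Sum of squared differences after requantising with a trial step."""
    return sum((c - requantise(c, step)) ** 2 for c in coeffs)


def estimate_prior_step(coeffs, candidates):
    """Trial step with the smallest recompression error (first one on ties)."""
    return min(candidates, key=lambda q: recompression_error(coeffs, q))


# Coefficients that were previously quantised with step 7.
coeffs = [requantise(v, 7) for v in range(-100, 101)]
print(estimate_prior_step(coeffs, range(2, 13)))   # 7
```

Divisors of the true step also give zero error, so a candidate range that includes them is ambiguous; the example avoids this by using a prime step and excluding a trial step of 1.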
| Feature | Single compression | Double compression at same quality | Double compression at different quality |
|---|---|---|---|
| DCT coefficient histogram | Smooth, approximately Laplacian | Smooth (same table cancels out) | Periodic dips at first-round step multiples |
| Blocking artefacts | Uniform 8x8 pattern | Same pattern | Shifted or misaligned 8x8 blocks in some regions |
| Quantisation tables in file | Match original camera | Match original camera | May match editing software defaults |
| Thumbnail consistency | Thumbnail matches main image | Thumbnail matches main image | Thumbnail may retain pre-edit version |
Regional double-compression analysis is more complex than whole-image analysis. The boundary between a doubly-compressed inserted region and a singly-compressed background is not always sharp, because JPEG block boundaries do not align with object edges. Examiners use this analysis to flag areas for further investigation, not to assert the exact boundary of an inserted element. Pixel-level methods such as copy-move and splicing detection are used in conjunction to narrow the location and nature of any manipulation.
PNG chunk structure and anomaly detection
A PNG file begins with an eight-byte signature (89 50 4E 47 0D 0A 1A 0A) followed by a sequence of chunks. The structure is strictly defined by the PNG specification (ISO/IEC 15948:2004). The first chunk must always be IHDR (Image Header), which declares image dimensions, bit depth, colour type, and interlacing method. The last chunk must always be IEND (Image End). Between them, one or more IDAT chunks carry the compressed pixel data, and indexed-colour images also require a PLTE (palette) chunk before the first IDAT. IHDR, PLTE, IDAT, and IEND are the critical chunks; all other chunks are ancillary and optional.
Forensic chunk analysis checks for: duplicate critical chunks (two IHDR chunks indicate deliberate or accidental header manipulation); chunks appearing in a sequence that violates the PNG specification; ancillary chunks not produced by any camera firmware (such as tEXt chunks containing software names or comment fields populated by editing tools); and CRC mismatches, which indicate that a chunk's data changed after the checksum was computed. A CRC mismatch on an IDAT chunk is particularly significant because it means the compressed pixel data no longer matches what the encoder wrote. Storage or transmission corruption can produce the same result, so the examiner must rule that out before attributing the mismatch to deliberate alteration.
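Several of these checks can be sketched with the standard library alone. The example below walks the chunk sequence, recomputes each CRC-32 with `zlib.crc32` (the PNG CRC covers the type code and payload, not the length field), and reports a small set of findings. `read_chunks` and `audit` are names chosen for this illustration, and the audit covers only a subset of the ordering rules in the specification.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_chunks(data: bytes):
    """Return [(type, payload, crc_ok)] for each chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("PNG signature not found")
    chunks = []
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        payload = data[pos + 8:pos + 8 + length]
        stored = data[pos + 8 + length:pos + 12 + length]
        if len(payload) != length or len(stored) != 4:
            raise ValueError("truncated chunk")
        (stored_crc,) = struct.unpack(">I", stored)
        crc_ok = zlib.crc32(ctype + payload) == stored_crc
        chunks.append((ctype.decode("latin-1"), payload, crc_ok))
        pos += 12 + length
    return chunks


def audit(chunks):
    """List structural findings; an empty list means none of these checks fired."""
    types = [t for t, _, _ in chunks]
    findings = []
    if not types or types[0] != "IHDR":
        findings.append("first chunk is not IHDR")
    if types.count("IHDR") > 1:
        findings.append("duplicate IHDR")
    if not types or types[-1] != "IEND":
        findings.append("last chunk is not IEND")
    if "IDAT" not in types:
        findings.append("no IDAT chunk")
    findings += [f"CRC mismatch in {t}" for t, _, ok in chunks if not ok]
    return findings
```

An empty findings list means only that these particular checks passed. A file rewritten by standards-compliant software will have valid CRCs throughout.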
The tEXt and iTXt chunks carry human-readable text and can record the software that created or modified the file. A tEXt chunk with the keyword 'Software' and value 'Adobe Photoshop 26.0' is structural evidence that Photoshop wrote the file. This does not prove manipulation of the image content, because legitimate workflows process images in editing software, but it does contradict a claim that the file is an unmodified camera original. The examiner's task is to determine whether the declared processing history is consistent with the claimed context.
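Reading the tEXt entries is straightforward because the payload is a keyword, a null byte, and Latin-1 text. The sketch below collects them into a dictionary. It is a simplified example: it does not verify CRCs, does not decode iTXt or compressed zTXt chunks, and keeps only the last value when a keyword repeats. `text_entries` is a hypothetical helper name.

```python
import struct


def text_entries(png: bytes) -> dict:
    """Map tEXt keywords to their values (Latin-1 text)."""
    entries = {}
    pos = 8                                   # skip the eight-byte signature
    while pos + 12 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        payload = png[pos + 8:pos + 8 + length]
        if ctype == b"tEXt" and b"\x00" in payload:
            keyword, _, text = payload.partition(b"\x00")
            entries[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length                    # length + type + payload + CRC
    return entries
```

For the file described above, `text_entries(data).get("Software")` would return `'Adobe Photoshop 26.0'`. The value is still a claim written by whatever produced the file, so it is weighed alongside the other structural findings.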
TIFF structure and professional imaging workflows
TIFF (Tagged Image File Format; the TIFF/EP variant is standardised as ISO 12234-2, and the general form follows the TIFF 6.0 specification) is the preferred format in many law-enforcement and forensic imaging workflows because it supports lossless compression and carries extensive metadata. A TIFF file consists of an Image File Header, one or more Image File Directories (IFDs), and the associated image data. Each IFD is a list of tagged data fields; the tags carry both technical metadata (image dimensions, photometric interpretation, compression type) and provenance metadata (make, model, software, date/time).
Forensic TIFF analysis focuses on the tag inventory. A camera-generated TIFF contains a predictable set of tags. Tags absent in the original but present in the questioned file, or tags whose values are inconsistent with the declared camera (for example, a tag claiming Professional Photo Suite as the writing software in a file claimed to be a direct camera output), indicate post-capture processing. Sub-IFDs, which are nested directory structures used to store thumbnail images or alternate resolutions, are also checked for consistency with the main image.
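A tag inventory can be produced by following the IFD chain from the header. The sketch below handles classic TIFF only (magic number 42, not BigTIFF), follows the main IFD chain without descending into Sub-IFDs, and returns tag IDs without decoding values. The function names are chosen for this example, and the expected-tag set passed to `unexpected_tags` would in practice come from reference files of the claimed device.

```python
import struct

# A few baseline TIFF tag IDs relevant to provenance (not an exhaustive list).
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 259: "Compression",
             271: "Make", 272: "Model", 305: "Software", 306: "DateTime"}


def list_ifd_tags(data: bytes):
    """Return one list of tag IDs per IFD in the main chain, in file order."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("no TIFF byte-order mark")
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF header")
    ifds, seen = [], set()
    while offset and offset not in seen:      # guard against looping offsets
        seen.add(offset)
        (count,) = struct.unpack(order + "H", data[offset:offset + 2])
        tags = []
        for i in range(count):
            start = offset + 2 + 12 * i       # each entry is 12 bytes
            tag, _type, _n, _value = struct.unpack(
                order + "HHI4s", data[start:start + 12])
            tags.append(tag)
        ifds.append(tags)
        end = offset + 2 + 12 * count
        (offset,) = struct.unpack(order + "I", data[end:end + 4])
    return ifds


def unexpected_tags(tags, expected):
    """Tag IDs present in the file but absent from the expected set."""
    return sorted(set(tags) - set(expected))
```

Tag 305 (Software) appearing in `unexpected_tags` for a file claimed to be direct camera output is the kind of finding the paragraph above describes.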
TIFF supports multiple compression methods within a single file, including no compression, LZW, PackBits, and CCITT Group 4 for bilevel images. The declared compression tag must match the actual data encoding. A file declaring LZW compression but containing data that does not decompress correctly under LZW indicates corruption or deliberate alteration of the compression tag to hide the actual encoding method.
EXIF and XMP metadata as provenance evidence
EXIF metadata is written by the capture device at the moment of capture and is intended to record the conditions under which the image was taken. The DateTimeOriginal field records when the shutter was pressed; the Make and Model fields record the camera; Software records the firmware version or, if the file has been processed, the editing application. Examiners treat the full EXIF block as a set of provenance claims and test each claim for internal consistency and consistency with the file's structural properties.
Three inconsistency patterns are most significant in practice. First, the EXIF Software field names an editing application while the file's quantisation tables match a specific camera model: this is consistent with a workflow where the image was opened but not re-saved at a different quality. Second, the EXIF Software field names a camera firmware version but the quantisation tables match Photoshop defaults: this is consistent with an attempt to restore camera-like EXIF after editing. Third, the EXIF DateTimeOriginal is earlier than the file's filesystem creation timestamp: this is expected for scanned or transferred images but requires explanation in claimed originals.
XMP metadata extends the provenance record further. The xmpMM:History element is an ordered sequence of editing events, each recording the action, the software instance identifier, the date, and the parameters of the action. When this field is populated, it documents the processing history in the file's own structure. The chain of custody for digital media, as discussed in Chain of Custody for Digital Media, requires that any processing applied to an image after acquisition be documented; XMP history is one form of that documentation when present in submitted evidence.
GPS coordinates embedded in EXIF (the GPSLatitude, GPSLongitude, and GPSAltitude tags, when the device recorded them) are a verifiable provenance claim. If the stated location is inconsistent with the scene depicted, or if the GPS timestamp, which is recorded in UTC, differs from the DateTimeOriginal by more than a few seconds after allowing for the local time-zone offset, the coordinates require explanation. They may indicate a device with an incorrect GPS lock, a file transplanted from another device, or deliberate metadata editing.
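The timestamp comparison has one trap: the EXIF GPS time is UTC, while DateTimeOriginal is normally the camera's local clock time. The sketch below assumes the file also carries an OffsetTimeOriginal value (added in EXIF 2.31) or that the examiner supplies the offset from other evidence. It takes the already-extracted field values as strings and numbers; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_offset(text: str) -> timezone:
    """Parse an EXIF OffsetTimeOriginal string such as '+05:30'."""
    sign = -1 if text[0] == "-" else 1
    hours, minutes = int(text[1:3]), int(text[4:6])
    return timezone(sign * timedelta(hours=hours, minutes=minutes))


def gps_clock_skew(date_time_original: str, offset: str,
                   gps_date: str, gps_time: tuple) -> float:
    """Seconds by which DateTimeOriginal is ahead of the GPS fix.

    gps_time is (hours, minutes, seconds) from GPSTimeStamp, in UTC;
    gps_date is GPSDateStamp in 'YYYY:MM:DD' form.
    """
    local = datetime.strptime(date_time_original, "%Y:%m:%d %H:%M:%S")
    local = local.replace(tzinfo=parse_offset(offset))
    day = datetime.strptime(gps_date, "%Y:%m:%d").replace(tzinfo=timezone.utc)
    h, m, s = gps_time
    gps = day + timedelta(hours=h, minutes=m, seconds=s)
    return (local - gps).total_seconds()


skew = gps_clock_skew("2024:03:10 14:30:05", "+05:30",
                      "2024:03:10", (9, 0, 3))
print(skew)   # 2.0
```

Ignoring the offset in this example would report a difference of five and a half hours for a file that is in fact consistent to within two seconds.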
Reporting format integrity findings in legal proceedings
Format integrity analysis produces findings about the processing history of a file, not about the truth or falsity of what the image depicts. An examiner who finds that a JPEG was re-saved by Photoshop can say that the file is not consistent with an unmodified camera original; they cannot say from format analysis alone what, if anything, was changed in the image content. Reports should state precisely what the structural evidence shows, what interpretations are supported, and what further analysis would be required to characterise any manipulation.
Legal frameworks in several jurisdictions have addressed the admissibility of digital image evidence. In the United States, Federal Rule of Evidence 901(b)(9) allows authentication of a process or system through evidence describing the process and showing that it produces an accurate result; forensic format analysis is one such process description. In the United Kingdom, the ACPO Good Practice Guide for Digital Evidence (superseded by the NPCC guidelines but still referenced in courts) requires that any process applied to digital evidence be documented and that the examiner be able to explain what they did and why. The Bharatiya Sakshya Adhiniyam 2023 in India, sections 61-65, addresses electronic records and requires a certificate from the responsible person that the computer producing the record was operating properly; format analysis supports that certification by characterising the file's provenance.
Expert reports in this area typically follow a structure of: tools used and their versions, each structural element examined and its findings, any anomalies detected and their significance, the overall consistency assessment, and the limitations of the analysis. The EU General Data Protection Regulation 2016/679 and the UK Data Protection Act 2018 also bear on the handling of images that contain personal data (particularly biometric data), requiring that forensic processing of such images be covered by a legitimate legal basis.
A file has the extension .png but begins with the bytes FF D8 FF E0. What does this indicate?
Key Takeaways
- Image file format integrity checks examine a file's internal structure (headers, markers or chunks, metadata blocks, and data segments) to determine whether the file is consistent with an unmodified camera original or shows evidence of post-capture processing.
- JPEG quantisation tables are the most reliable structural indicator of the software that last compressed the file; mismatches between the tables and the declared camera model contradict a camera-original claim without requiring access to the original device.
- PNG chunk analysis checks for non-standard chunk sequences, unexpected ancillary chunks naming editing software, and CRC mismatches; a CRC mismatch on an IDAT chunk is strong evidence of post-write modification of the pixel data.
- EXIF and XMP metadata fields are provenance claims to be tested, not trusted records; the xmpMM:History element, when populated, is direct structural evidence of post-capture editing, and thumbnail inconsistency is evidence of cropping or content substitution.
- Format integrity findings establish what processing history a file carries; they do not by themselves establish what image content was altered, so they are used alongside pixel-level methods such as noise analysis and copy-move detection to form a complete authentication opinion.