ResAIKit
Research Integrity Toolkit
I7 | Image forensics | Generic Forensics | Layer 1 (Deterministic)

JPEG Ghost

Recompresses a JPEG across a range of qualities and watches where each region's difference bottoms out. A region previously saved at a given quality reaches its minimum near that quality, a "ghost," so a block whose ghost quality differs from the rest of the image is consistent with a splice from a differently-compressed source. It works directly on the decoded pixels of a JPEG and uses no learned model.

Technical description

I7 is a deterministic, generator-agnostic screen for inconsistent JPEG compression history, the trace of splicing a region from a differently-compressed image. It implements Farid's JPEG ghost method: when a JPEG is recompressed at quality q, a region that was previously compressed at quality Q changes least when q is near Q, so the recompression difference, plotted against q, dips to a minimum at the region's own prior quality. A singly-compressed image dips at one quality everywhere; a spliced region dips at a different quality and stands out as a ghost. The indicator estimates each block's ghost quality across a quality sweep and flags textured blocks whose ghost quality departs from the image's consensus. It runs only on JPEG inputs, requires at least 64 by 64 pixels, and produces a 0 to 5 score.

How it works

The indicator runs deterministically at Layer 1, and only when the input is a JPEG (by file format or the JFIF signature). The image is recompressed at each quality in the sweep 50, 55, ..., 95. For quality q the per-pixel difference d_q(x, y) = mean over the RGB channels of |original - recompressed_q| is formed, its global mean is recorded, and its mean over each 16 by 16 block is stored, giving every block a curve of difference against quality.
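The sweep described above can be sketched in a few lines. This is a minimal illustration, not I7's actual code: it assumes Pillow and NumPy are available, the function name ghost_curves is hypothetical, the encoder settings other than quality are left at Pillow's defaults, and partial blocks at the right and bottom edges are simply dropped, a detail the description does not specify.

```python
import io

import numpy as np
from PIL import Image

QUALITIES = list(range(50, 100, 5))  # the sweep 50, 55, ..., 95
BLOCK = 16                           # block side in pixels


def ghost_curves(img, qualities=QUALITIES, block=BLOCK):
    """Return (global_means, block_curves) for a PIL image.

    global_means has shape (len(qualities),).
    block_curves has shape (len(qualities), rows, cols): one mean
    difference per block and per quality.
    """
    rgb = img.convert("RGB")
    original = np.asarray(rgb, dtype=np.float64)
    height, width = original.shape[:2]
    # Assumption: only whole blocks are analysed; edge remainders are dropped.
    rows, cols = height // block, width // block

    global_means, block_curves = [], []
    for q in qualities:
        # Recompress in memory at quality q and decode again.
        buf = io.BytesIO()
        rgb.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        recompressed = np.asarray(Image.open(buf).convert("RGB"), dtype=np.float64)

        # d_q(x, y): absolute difference averaged over the RGB channels.
        d = np.abs(original - recompressed).mean(axis=2)
        global_means.append(d.mean())

        # Mean of d_q over each block.
        tiles = d[: rows * block, : cols * block].reshape(rows, block, cols, block)
        block_curves.append(tiles.mean(axis=(1, 3)))

    return np.array(global_means), np.stack(block_curves)
```

Called on an image opened with Image.open, the second return value gives each block's curve as block_curves[:, row, col]. Quality numbers are encoder-specific, so a sketch like this locates ghosts on the quality scale of the encoder it uses, which may not match the scale of the software that produced the file.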

For each block the ghost quality is the quality at which its curve is minimised. Only textured blocks are used, those whose curve spread (maximum minus minimum over the sweep) exceeds 0.5, because a flat block changes little at any quality and has no reliable minimum. Across the textured blocks the consensus ghost quality is the mode of their individual ghost qualities, and a block is anomalous when its ghost quality differs from that mode by more than 10 quality units. The score is min(5.0, 25 x (anomalous blocks / textured blocks)), so a uniform compression history scores zero and a sizeable region with a different history scores high. The estimated quality reported is the quality that minimises the global difference. The metadata records the estimated and consensus qualities, the per-quality global differences, the coefficient of variation of the block differences at the estimated quality, the textured-block count, and the anomalous-block count.
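The scoring rule can be written out as a short sketch using only the standard library. The function name ghost_score and its parameter names are hypothetical, and two behaviours are assumptions the description leaves open: ties for the mode are broken in favour of the quality seen first, and an image with no textured blocks scores zero.

```python
from collections import Counter


def ghost_score(block_curves, qualities, min_spread=0.5, max_dev=10, gain=25.0):
    """Score per-block difference curves (one list of values per block).

    Returns (score, consensus_quality, textured_count, anomalous_count).
    """
    ghost_qualities = []
    for curve in block_curves:
        # Keep only textured blocks: curve spread must exceed min_spread.
        if max(curve) - min(curve) <= min_spread:
            continue
        # Ghost quality: the quality at which the block's curve is minimised.
        ghost_qualities.append(qualities[curve.index(min(curve))])

    textured = len(ghost_qualities)
    if textured == 0:
        return 0.0, None, 0, 0  # assumption: nothing to judge, score zero

    # Consensus is the mode; ties go to the first value seen (assumption).
    consensus = Counter(ghost_qualities).most_common(1)[0][0]
    anomalous = sum(abs(q - consensus) > max_dev for q in ghost_qualities)
    score = min(5.0, gain * anomalous / textured)
    return score, consensus, textured, anomalous


# Synthetic example: 8 blocks ghosting at 85, 1 at 60, and 1 flat block.
qualities = list(range(50, 100, 5))


def v_curve(q0):
    return [abs(q - q0) / 5 for q in qualities]


curves = [v_curve(85)] * 8 + [v_curve(60)] + [[1.0] * len(qualities)]
score, consensus, textured, anomalous = ghost_score(curves, qualities)
print(round(score, 2), consensus, textured, anomalous)  # 2.78 85 9 1
```

Because of the factor 25, the score saturates at 5.0 once 20% of the textured blocks are anomalous; a score of 2 corresponds to 8% and a score of 4 to 16%.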

Score thresholds

Score     Meaning
0 to 1    All textured regions share one ghost quality, a single uniform compression history.
2 to 3    A region of moderate size ghosts at a different quality than the rest.
4 to 5    A large region with a different compression history. Consistent with a splice from another JPEG.

Why this matters

Farid's ghost method is a foundational JPEG forensic technique. Its insight is that recompressing a JPEG at the quality it already carries barely changes it, so sweeping the quality and watching where the difference bottoms out localises regions whose prior quality differs from the host, which is exactly the signature of a splice from another JPEG [1]. The broader study of JPEG artifacts confirms that the compression history is a rich and localisable cue: block-grained analyses of the quantisation traces detect and map tampered regions by the mismatch between a block's artifacts and the global compression [2], and learned models that read the discrete-cosine-transform coefficients directly now localise manipulation from the same compression evidence [3]. Reading the compression history is one of the strongest pixel-level forensic signals because it is a physical consequence of how the file was built rather than a property of its content [4]. I7 keeps the deterministic, interpretable form of the ghost test and hardens it by estimating the ghost quality per block and requiring texture, so that flat regions and the image's own quality do not masquerade as a manipulation.

Limitations

The ghost test needs the right conditions. It works only on JPEGs and only when the spliced region's prior quality falls within the sweep and differs enough from the host to separate; a splice from a source compressed at nearly the same quality leaves no ghost, and re-saving the whole composite at a low quality can erase the prior traces. Textured content is required, since a flat region has no minimum to locate, so the flat-block filter that prevents false positives also blinds the test inside smooth areas. The block grid bounds the spatial resolution. The thresholds are directional rather than exact, and unusual but authentic content can produce block-to-block quality variation. Single-quality Error Level Analysis, noise consistency, copy-move, and frequency fingerprints live in sibling indicators, so I7 stays on the multi-quality ghost.

Theoretical background

I7 rests on the near-idempotence of JPEG compression at its own quality. JPEG quantises the discrete cosine transform of each 8 by 8 block; recompressing at the same quantisation reproduces nearly the same coefficients (rounding and clipping in the pixel domain prevent an exact match), so the recompression difference is small, while recompressing at a finer or coarser quantisation moves the coefficients more and raises the difference. Plotted against quality, the difference of a region therefore has a minimum at the region's own prior quality, the ghost. A splice carries the quantisation of its source, so its ghost sits at a different quality from the host's, and estimating the ghost quality per block turns this into a localisation. Restricting to textured blocks encodes the prior that only blocks with high-frequency content carry a measurable quantisation signature. The whole test is a property of the file's compression history rather than a learned fingerprint, which keeps it interpretable and complementary to the other pixel screens.
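The quantisation argument can be checked on a toy example that works on bare coefficients instead of a real JPEG. In this sketch the coefficient range, the prior step of 17, and the test range of 10 to 25 are all illustrative assumptions; a real encoder uses a table of steps per frequency, derived from the quality setting, and the function name requantise_error is hypothetical.

```python
def requantise_error(coeffs, prior_step, test_step):
    """Total absolute change caused by a second quantisation.

    The coefficients are first quantised with prior_step (the earlier save),
    then quantised again with test_step (the recompression under test).
    """
    once = [prior_step * round(c / prior_step) for c in coeffs]
    twice = [test_step * round(c1 / test_step) for c1 in once]
    return sum(abs(a - b) for a, b in zip(once, twice))


coeffs = range(-100, 101)  # hypothetical DCT coefficient values
prior_step = 17            # assumed quantisation step of the earlier save

# Sweep the second quantisation step and find where the change bottoms out.
errors = {s: requantise_error(coeffs, prior_step, s) for s in range(10, 26)}
ghost_step = min(errors, key=errors.get)
print(ghost_step, errors[ghost_step])  # 17 0
```

Values already on the 17-step grid are left untouched by a second pass at step 17, while every other step in the sweep moves at least some of them. That dip at the prior step is the ghost; in a real image the minimum is small but not exactly zero, and it appears on the quality axis instead of the step axis.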

References

  1. Farid H. Exposing Digital Forgeries from JPEG Ghosts. IEEE Transactions on Information Forensics and Security. 2009;4(1):154-160. DOI: 10.1109/TIFS.2008.2012215
  2. Bianchi T, Piva A. Image Forgery Localization via Block-Grained Analysis of JPEG Artifacts. IEEE Transactions on Information Forensics and Security. 2012;7(3):1003-1017. DOI: 10.1109/TIFS.2012.2187516
  3. Kwon MJ, Nam SH, Yu IJ, Lee HK, Kim C. Learning JPEG Compression Artifacts for Image Manipulation Detection and Localization. International Journal of Computer Vision. 2022. arXiv:2108.12947. https://arxiv.org/abs/2108.12947
  4. Verdoliva L. Media Forensics and DeepFakes: An Overview. IEEE Journal of Selected Topics in Signal Processing. 2020;14(5):910-932. arXiv:2001.06564. https://arxiv.org/abs/2001.06564