ResAIKit
Research Integrity Toolkit
I6 | Image forensics | Generic Forensics | Layer 1 (Deterministic)

Clone Detection

Finds regions within an image that have been copied and pasted from other parts of the same image, a common manipulation technique used to duplicate or hide objects.

Technical description

Detects copy-move forgery (a region duplicated within the same image) using two signals.

Block hashing tiles the grayscale image into 16x16 blocks at stride 8 and skips flat blocks (pixel variance below 50), which match trivially and cause false positives. Each surviving block is hashed by its quantised 8x8 low-frequency DCT coefficients. For every pair of same-hash blocks that lie more than 48 px apart, the canonical shift vector between them is tallied; only shifts supported by at least 8 pairs count, which enforces the shift-consensus criterion of Fridrich and colleagues.

ORB self-matching extracts up to 1500 ORB keypoints, matches their descriptors by Hamming distance, keeps pairs with distance below 50 that lie more than 64 px apart, and clusters them by translation (at least 4 pairs per cluster).

The final score is the maximum of the two signals, capped at 5.0.
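The block-hashing signal can be illustrated with a minimal NumPy sketch. It is not the toolkit's implementation: the function names, the quantisation step (`QUANT_STEP`), and the brute-force pair enumeration are assumptions made for illustration, while the block size, stride, variance cutoff, distance, and support thresholds are taken from the description above.

```python
from collections import Counter, defaultdict
from itertools import combinations

import numpy as np

BLOCK, STRIDE = 16, 8     # block size and stride from the description
MIN_VARIANCE = 50.0       # flat-block cutoff from the description
MIN_DISTANCE = 48.0       # same-hash pairs must be farther apart than this (px)
MIN_PAIRS = 8             # support needed for a shift to count
QUANT_STEP = 16.0         # assumption: the actual quantisation step is not stated


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m


def consistent_shifts(gray: np.ndarray) -> dict:
    """Map each canonical shift (dy, dx) with enough support to its pair count."""
    d = dct_matrix(BLOCK)
    buckets = defaultdict(list)
    h, w = gray.shape
    for y in range(0, h - BLOCK + 1, STRIDE):
        for x in range(0, w - BLOCK + 1, STRIDE):
            block = gray[y:y + BLOCK, x:x + BLOCK].astype(np.float64)
            if block.var() < MIN_VARIANCE:
                continue  # flat blocks match trivially, so skip them
            coeffs = (d @ block @ d.T)[:8, :8]  # 8x8 low-frequency corner
            key = np.round(coeffs / QUANT_STEP).astype(np.int32).tobytes()
            buckets[key].append((y, x))

    votes = Counter()
    for positions in buckets.values():
        for (y1, x1), (y2, x2) in combinations(positions, 2):
            dy, dx = y2 - y1, x2 - x1
            if np.hypot(dy, dx) <= MIN_DISTANCE:
                continue  # too close: likely overlapping or adjacent blocks
            if dy < 0 or (dy == 0 and dx < 0):
                dy, dx = -dy, -dx  # canonical sign so A->B and B->A agree
            votes[(dy, dx)] += 1
    return {shift: n for shift, n in votes.items() if n >= MIN_PAIRS}
```

Calling `consistent_shifts` on a grayscale `uint8` array returns an empty dictionary when no shift reaches the support threshold. A production version would likely replace the per-bucket pair enumeration with sorting, since a very large bucket makes it quadratic.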

How it works

Layer 1 (deterministic). Runs DCT block hashing with a flat-block filter and a shift-vector consensus, and ORB keypoint self-matching with translation clustering, then takes the larger score. Reports the total and non-flat block counts, the distinct-hash count, the consistent block-pair count, the ORB cluster count, a combined clone-pair count, and whether the image was downsampled.
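The translation-clustering step of the ORB signal might look roughly like the sketch below. It takes keypoint coordinates and 32-byte binary descriptors as plain arrays (in practice these could come from OpenCV's ORB implementation) and is not the toolkit's code: the function name, the nearest-match rule, and the translation bin width (`BIN`) are assumptions, while the Hamming, separation, and cluster-size thresholds come from the technical description.

```python
from collections import Counter

import numpy as np

MAX_HAMMING = 50        # keep matches below this descriptor distance
MIN_SEPARATION = 64.0   # matched keypoints must be farther apart than this (px)
MIN_CLUSTER = 4         # pairs needed for a translation cluster
BIN = 8                 # assumption: translation bin width in pixels
BIG = 10**6             # sentinel distance for excluded pairs


def translation_clusters(points: np.ndarray, descriptors: np.ndarray) -> dict:
    """Cluster descriptor self-matches by binned translation.

    points: (N, 2) float array of (x, y); descriptors: (N, 32) uint8 array.
    Returns {(dx_bin, dy_bin): pair count} for clusters with enough pairs.
    """
    bits = np.unpackbits(descriptors, axis=1).astype(np.int32)
    # Hamming distance = bits set in one descriptor but not in the other.
    hamming = bits @ (1 - bits).T + (1 - bits) @ bits.T

    diff = points[:, None, :] - points[None, :, :]
    spatial = np.hypot(diff[..., 0], diff[..., 1])
    hamming[spatial <= MIN_SEPARATION] = BIG  # also removes self-matches

    # Assumed rule: each keypoint keeps its single best far-away match.
    best = hamming.argmin(axis=1)
    pairs = {
        (min(i, int(j)), max(i, int(j)))
        for i, j in enumerate(best)
        if hamming[i, j] < MAX_HAMMING
    }

    votes = Counter()
    for i, j in pairs:
        dx, dy = points[j] - points[i]
        if dy < 0 or (dy == 0 and dx < 0):
            dx, dy = -dx, -dy  # canonical sign for the translation
        votes[(int(np.rint(dx / BIN)), int(np.rint(dy / BIN)))] += 1
    return {t: n for t, n in votes.items() if n >= MIN_CLUSTER}
```

Fixed-width binning is the simplest way to group translations; a translation that falls near a bin edge can split across two bins, so a real implementation may prefer a tolerance-based clustering instead.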

Why this matters

Copy-move is one of the oldest and most common image manipulations. Fridrich and colleagues introduced block matching, which confirms a forgery only when many matched block pairs share one shift vector; this separates a genuinely copied region from chance matches. The keypoint family (Amerini and colleagues, using SIFT) matches distinctive features and clusters them, recovering copies that were rotated, scaled, or re-compressed; this indicator uses ORB for that role. The systematic evaluation by Christlein and colleagues established the practice of suppressing matches in smooth, low-entropy regions to control false positives, which this indicator follows.

Score thresholds

0-1: No duplicated region: matches absent or scattered without a consistent shift
2-3: A duplicated region of moderate size, by block hashing or ORB clustering
4-5: A large or strongly supported duplicated region, consistent with copy-move forgery

Limitations

The signals have complementary blind spots. Block hashing finds near-exact copies but is weakened by strong rotation or scaling; ORB recovers many such cases but needs texture, so a copy taken from or pasted into a smooth area yields too few keypoints. The flat-region filter that prevents false positives in sky and background also hides a copy made within such a region. Heavy compression or noise can break the exact hashes. A scene with genuine repeating texture (a tiled floor, a repeating facade) can produce consistent matches that are not forgeries. Cross-image splicing, compression history, and noise consistency are covered by sibling indicators.

References

  1. Fridrich J, Soukal D, Lukáš J. (2003). Detection of Copy-Move Forgery in Digital Images. Digital Forensic Research Workshop (DFRWS)
  2. Amerini I, Ballan L, Caldelli R, Del Bimbo A, Serra G. (2011). A SIFT-Based Forensic Method for Copy-Move Attack Detection and Transformation Recovery. IEEE Transactions on Information Forensics and Security 6(3):1099-1110
  3. Rublee E, Rabaud V, Konolige K, Bradski G. (2011). ORB: An Efficient Alternative to SIFT or SURF. IEEE ICCV 2011, pp. 2564-2571
  4. Christlein V, Riess C, Jordan J, Riess C, Angelopoulou E. (2012). An Evaluation of Popular Copy-Move Forgery Detection Approaches. IEEE Transactions on Information Forensics and Security 7(6):1841-1854