Clone Detection
Finds regions copied and pasted from elsewhere in the same image, the technique used to duplicate or hide an object. It combines exact block matching, which requires many copied blocks to share one shift vector, with ORB keypoint self-matching, which survives brightness changes and re-compression. Flat regions are excluded so that plain skies and backgrounds do not register as clones. It works on the pixels alone, with no model.
Technical description
I6 is a deterministic, generator-agnostic screen for copy-move forgery, in which part of an image is duplicated to cover or repeat content. Because the copied region shares the source's exact pixels (or close to them), the manipulation leaves two recoverable traces: many small blocks that are identical to blocks elsewhere, and feature keypoints that match other keypoints, in both cases displaced by the same translation. I6 runs the classic block-hashing method of Fridrich and colleagues, hardened against its two failure modes (flat regions and incidental matches) by discarding textureless blocks and requiring a consistent shift vector, alongside an ORB keypoint self-match that tolerates photometric edits and JPEG re-saves. Each signal produces a score from 0 to 5, and the larger of the two is taken. The image must be at least 64 by 64 pixels, and very large images are downsampled to a 1024-pixel long side.
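The size rules above can be sketched as a small helper. This is only an illustration: the function name `working_size`, the rounding, and the choice to return `None` for an undersized image are assumptions, not the indicator's actual interface.

```python
def working_size(width, height, min_side=64, max_long_side=1024):
    """Return (new_width, new_height, downsampled), or None if the image is too small.

    Hypothetical helper: min_side and max_long_side mirror the 64-pixel minimum
    and the 1024-pixel long side stated in the text.
    """
    if width < min_side or height < min_side:
        return None  # assumed behaviour: the indicator cannot run on this image
    long_side = max(width, height)
    if long_side <= max_long_side:
        return width, height, False
    # Scale both sides by the same factor so the aspect ratio is preserved.
    scale = max_long_side / long_side
    return max(1, round(width * scale)), max(1, round(height * scale)), True
```

The returned flag corresponds to the "was downsampled" entry that the indicator records in its metadata.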
How it works
The indicator runs deterministically at Layer 1 on the grayscale image.
The block-hashing signal tiles the image into 16 by 16 blocks at stride 8. A block whose pixel variance is below 50 is treated as flat or textureless and is skipped, because flat blocks match each other trivially and are the dominant source of false positives. For each surviving block the two-dimensional discrete cosine transform (DCT) is taken, its 8 by 8 low-frequency sub-block is quantised by dividing by 8 and rounding, and the result is used as a hash. Blocks with the same hash are candidate copies. A real copy-move duplicates a whole region, so its block pairs all share one translation: for every pair of same-hash blocks more than 48 pixels apart, the shift (dx, dy) is computed, canonicalised so that a copy and its source map to the same bucket, and tallied in a shift histogram. Only shifts supported by at least 8 pairs are kept, and the block-hashing pair count is the total number of pairs in those dominant shifts. The score is min(5.0, 0.1 × pairs), so the minimum of 8 consistent pairs scores 0.8 and 50 or more pairs saturate the score at 5.0.
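The block-hashing steps can be sketched in NumPy as follows. This is a minimal sketch rather than the indicator's code: it assumes an 8-bit grayscale array, an orthonormal DCT-II (the text does not name a normalisation), Euclidean distance for the 48-pixel test, and sign-flipping as the canonicalisation. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np

BLOCK, STRIDE, LOW = 16, 8, 8          # block size, stride, low-frequency sub-block
MIN_VARIANCE, QUANT = 50.0, 8.0        # flat-block cutoff, quantisation divisor
MIN_DISTANCE, MIN_PAIRS = 48.0, 8      # minimum separation, minimum pairs per shift


def dct_matrix(n):
    """Orthonormal DCT-II basis C, so that C @ block @ C.T is the 2-D DCT (assumed normalisation)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def count_consistent_pairs(gray):
    """Count same-hash block pairs that fall into a shift supported by at least MIN_PAIRS pairs."""
    c = dct_matrix(BLOCK)
    buckets = defaultdict(list)
    height, width = gray.shape
    for y in range(0, height - BLOCK + 1, STRIDE):
        for x in range(0, width - BLOCK + 1, STRIDE):
            block = gray[y:y + BLOCK, x:x + BLOCK].astype(np.float64)
            if block.var() < MIN_VARIANCE:
                continue  # flat block: skipped to avoid trivial matches
            coeffs = (c @ block @ c.T)[:LOW, :LOW]
            key = np.round(coeffs / QUANT).astype(np.int32).tobytes()
            buckets[key].append((x, y))

    shifts = defaultdict(int)
    for positions in buckets.values():
        for (x1, y1), (x2, y2) in combinations(positions, 2):
            dx, dy = x2 - x1, y2 - y1
            if np.hypot(dx, dy) <= MIN_DISTANCE:
                continue  # too close: likely overlap or local self-similarity
            if dx < 0 or (dx == 0 and dy < 0):
                dx, dy = -dx, -dy  # assumed canonical form: source and copy share a bucket
            shifts[(dx, dy)] += 1
    return sum(n for n in shifts.values() if n >= MIN_PAIRS)


def block_hash_score(pairs):
    """Score formula from the text: min(5.0, 0.1 x pairs)."""
    return min(5.0, 0.1 * pairs)
```

On a random-noise test image, pasting a 48 by 48 patch at a grid-aligned offset yields 25 fully contained block pairs sharing one shift, which this sketch scores at 2.5. A paste that is not aligned to the stride-8 grid would produce fewer exact hash collisions in this simplified form.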
The ORB signal extracts up to 1500 ORB keypoints and matches every descriptor against all the others by Hamming distance (k nearest neighbours with k = 3, skipping the trivial self-match). A pair is kept when its descriptor distance is below 50 and its two keypoints are more than 64 pixels apart. The kept pairs are clustered by their translation vector within a 30-pixel radius, and a cluster of at least 4 pairs is treated as a copied region. The ORB score grows with the number of clusters and the pairs within them.
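A possible shape for the ORB signal is sketched below. The OpenCV calls (`cv2.ORB_create`, `cv2.BFMatcher` with `cv2.NORM_HAMMING`, `knnMatch`) are real, but their use here is an assumption about how the indicator is built. The greedy seed-based clustering, the de-duplication of mirrored matches, and every function name are hypothetical. The ORB score formula is not given in the text, so none is shown.

```python
import math

MAX_DESCRIPTOR_DISTANCE = 50   # keep pairs with Hamming distance below this
MIN_KEYPOINT_DISTANCE = 64.0   # keep pairs whose keypoints are farther apart than this
CLUSTER_RADIUS = 30.0          # translation vectors within this radius share a cluster
MIN_CLUSTER_PAIRS = 4          # smaller clusters are discarded


def orb_self_matches(gray, max_keypoints=1500, k=3):
    """Match ORB descriptors against themselves; returns (point_a, point_b, distance) tuples."""
    import cv2  # imported lazily so the helpers below work without OpenCV installed

    orb = cv2.ORB_create(nfeatures=max_keypoints)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    if descriptors is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = []
    for candidates in matcher.knnMatch(descriptors, descriptors, k=k):
        for m in candidates:
            if m.queryIdx == m.trainIdx:
                continue  # trivial self-match
            matches.append((keypoints[m.queryIdx].pt, keypoints[m.trainIdx].pt, m.distance))
    return matches


def filter_pairs(matches):
    """Keep pairs that are similar in descriptor space but far apart in the image."""
    kept = set()
    for (x1, y1), (x2, y2), distance in matches:
        if distance >= MAX_DESCRIPTOR_DISTANCE:
            continue
        if math.hypot(x2 - x1, y2 - y1) <= MIN_KEYPOINT_DISTANCE:
            continue
        # Sorting the endpoints merges the a->b and b->a matches into one pair
        # and gives every translation vector a consistent direction (assumption).
        a, b = sorted([(x1, y1), (x2, y2)])
        kept.add((a, b))
    return sorted(kept)


def cluster_by_translation(pairs):
    """Greedy clustering: a pair joins the first cluster whose seed vector is within the radius."""
    clusters = []
    for a, b in pairs:
        dx, dy = b[0] - a[0], b[1] - a[1]
        for cluster in clusters:
            sx, sy = cluster["seed"]
            if math.hypot(dx - sx, dy - sy) <= CLUSTER_RADIUS:
                cluster["pairs"].append((a, b))
                break
        else:
            clusters.append({"seed": (dx, dy), "pairs": [(a, b)]})
    return [c for c in clusters if len(c["pairs"]) >= MIN_CLUSTER_PAIRS]
```

A typical call chain would be `cluster_by_translation(filter_pairs(orb_self_matches(gray)))`. Seeding each cluster with its first vector keeps the sketch short; a production version might use a running centroid or a density-based method instead.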
The final score is the maximum of the two signals. The metadata records the total and non-flat block counts, the number of distinct hashes, the consistent block-hashing pair count, the ORB cluster count, a backward-compatible combined clone-pair count, and whether the image was downsampled.
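The combination step and the metadata record could look like the sketch below. The field names are hypothetical labels for the items listed above, the clamp to the 0 to 5 range is an added safeguard, and how the combined clone-pair count is derived is not stated in the text, so it is simply stored as given.

```python
from dataclasses import asdict, dataclass


@dataclass
class CloneMetadata:
    """Hypothetical container for the metadata items listed in the text."""
    total_blocks: int
    non_flat_blocks: int
    distinct_hashes: int
    consistent_pairs: int   # block-hashing pairs in dominant shifts
    orb_clusters: int
    clone_pairs: int        # backward-compatible combined count (derivation not specified)
    downsampled: bool


def final_score(block_score, orb_score):
    """Take the larger of the two signal scores, kept inside the 0 to 5 range."""
    return max(0.0, min(5.0, max(block_score, orb_score)))
```

For example, `final_score(2.5, 1.0)` returns 2.5, and `asdict(CloneMetadata(...))` gives a plain dictionary that can be serialised alongside the score.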
Score thresholds
| Score | Meaning |
|---|---|
| 0 to 1 | No duplicated region: matches are absent or scattered without a consistent shift. |
| 2 to 3 | A duplicated region of moderate size, by block hashing or ORB clustering. |
| 4 to 5 | A large or strongly supported duplicated region. Consistent with copy-move forgery. |
Why this matters
Copy-move is one of the oldest and most common image manipulations, and detecting it has a mature toolbox. Fridrich and colleagues introduced the block-matching approach: slide a window over the image, represent each block compactly, sort to find identical blocks, and confirm a forgery only when many matched pairs share the same shift vector. That last condition is what separates a real copied region from chance matches [1]. The keypoint family, introduced for this task by Amerini and colleagues with SIFT, matches distinctive features against each other and clusters them, which recovers copies that have been rotated, scaled, or re-compressed, where exact block matching fails [2]. I6 uses ORB, an efficient binary keypoint and descriptor, for that role [3]. A systematic evaluation of copy-move methods established the practices this indicator follows, including the need to suppress matches in smooth, low-entropy regions to control false positives [4]. By pairing the two complementary signals and enforcing both a flat-region filter and a shift-vector consensus, I6 detects pixel-exact copies and photometrically edited ones while staying quiet on plain backgrounds.
Limitations
The two signals have complementary blind spots, and some cases defeat both. The block-hashing signal finds near-exact copies but is weakened by strong rotation or scaling of the pasted region, where the copied blocks no longer hash like their sources; the ORB signal recovers many such cases but needs texture, so a copy pasted into or out of a smooth area yields too few keypoints to cluster. The flat-region filter that prevents false positives on skies and backgrounds also makes a copy within such a region invisible. Heavy uniform compression or noise can break the exact block hashes. The thresholds are indicative rather than exact, and a scene with genuine near-duplicate texture, such as a tiled floor or a repeating facade, can produce consistent matches that are not forgeries. Splicing from a different image, compression-history analysis, and noise consistency are handled by sibling indicators, so I6 is confined to within-image duplication.
Theoretical background
I6 rests on the defining property of copy-move: the duplicate and its source are the same content at two places, related by a single translation. Two independent representations expose this. A compact, quantised DCT signature makes near-identical blocks collide in a hash table, and the requirement that the colliding pairs agree on one shift vector turns a set of incidental collisions into evidence of a coherent copied region; this criterion is what gives the method its specificity. A binary keypoint descriptor captures the same duplication at salient points and, being largely insensitive to brightness changes and robust to small geometric change, recovers copies that have been edited after pasting; clustering the keypoint matches by translation applies the same shift-consensus logic in feature space. Suppressing flat blocks encodes the prior that a textureless region carries no individuating content to copy. All of this is a property of the pixels and their geometry rather than a learned fingerprint, which keeps the screen interpretable and independent of any generator.
References
1. Fridrich J, Soukal D, Lukáš J. Detection of Copy-Move Forgery in Digital Images. In: Proceedings of the Digital Forensic Research Workshop (DFRWS). 2003. https://www.ws.binghamton.edu/fridrich/Research/copymove.pdf
2. Amerini I, Ballan L, Caldelli R, Del Bimbo A, Serra G. A SIFT-Based Forensic Method for Copy-Move Attack Detection and Transformation Recovery. IEEE Transactions on Information Forensics and Security. 2011;6(3):1099-1110. DOI: 10.1109/TIFS.2011.2129512
3. Rublee E, Rabaud V, Konolige K, Bradski G. ORB: An Efficient Alternative to SIFT or SURF. In: IEEE International Conference on Computer Vision (ICCV). 2011. p. 2564-2571. DOI: 10.1109/ICCV.2011.6126544
4. Christlein V, Riess C, Jordan J, Riess C, Angelopoulou E. An Evaluation of Popular Copy-Move Forgery Detection Approaches. IEEE Transactions on Information Forensics and Security. 2012;7(6):1841-1854. DOI: 10.1109/TIFS.2012.2218597