ResAIKit
Research Integrity Toolkit
M2 | Image forensics | Microscopy | Layer 1 (Deterministic)

Panel Overlap

Detects when the same field of view is reused across panels of a microscopy figure to stand in for different experimental conditions. The image is split into quadrants and halves, and every pair is compared twice: by matching ORB keypoints and verifying their geometry with a RANSAC homography, and by masked normalized cross-correlation on the content pixels. Printed labels and headers are masked out before matching, so panels are flagged for shared image content, not for sharing the same caption. It works on the pixels alone, with no model.

Technical description

M2 is a deterministic, generator-agnostic screen for intra-figure region reuse, the most common form of image manipulation found in the biomedical literature. A genuine multi-panel figure shows distinct fields of view, one per condition; a fabricated one reuses a single capture, sometimes shifted, rotated, flipped, or rescaled, to manufacture an apparently independent result. M2 partitions the image into its four quadrants and into top, bottom, left, and right halves, then compares the six quadrant pairs and the two half pairs with two complementary detectors. The first matches ORB keypoints between two regions and fits a RANSAC homography, accepting the pair only when a large fraction of matches are geometrically consistent and the recovered transform is effectively affine, which is what a genuine copy-move duplication produces. The second computes the normalized cross-correlation of the two regions over the pixels that both regions agree are image content rather than text. Either detector can raise a pair. The image must be at least 128 by 128 pixels, or the indicator returns a zero score and records that it was skipped.
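The partition and the eight comparisons can be sketched as follows. This is a minimal, dependency-free illustration rather than the toolkit's own code: the function names `region_boxes` and `region_pairs`, the `(x0, y0, x1, y1)` box convention, and the split at the integer midpoint are all assumptions made for the example.

```python
from itertools import combinations

MIN_SIDE = 128  # minimum width and height stated in the description


def region_boxes(width, height):
    """Return named (x0, y0, x1, y1) boxes for the four quadrants and four halves.

    Splitting at the integer midpoint is an assumption of this sketch.
    Returns None when the image is too small to be screened.
    """
    if width < MIN_SIDE or height < MIN_SIDE:
        return None  # the caller would report a zero score and record the skip
    mx, my = width // 2, height // 2
    return {
        "top_left": (0, 0, mx, my),
        "top_right": (mx, 0, width, my),
        "bottom_left": (0, my, mx, height),
        "bottom_right": (mx, my, width, height),
        "top": (0, 0, width, my),
        "bottom": (0, my, width, height),
        "left": (0, 0, mx, height),
        "right": (mx, 0, width, height),
    }


def region_pairs():
    """Six quadrant pairs plus the two half pairs: eight comparisons in total."""
    quadrants = ["top_left", "top_right", "bottom_left", "bottom_right"]
    pairs = list(combinations(quadrants, 2))
    pairs += [("top", "bottom"), ("left", "right")]
    return pairs
```

With odd dimensions the two sides of a split differ by one pixel, so a dense comparison such as the correlation detector would need the regions cropped or resized to a common shape first.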

How it works

Each compared region is masked before matching. A text mask is built by binarising with Otsu's threshold, finding connected components, keeping those whose size and aspect ratio match printed characters, and dilating them along rows into a text exclusion zone; the complement of that zone is the content mask available for matching. This stops panels that share only an identical label, such as the same antibody name printed over every blot, from matching on the text.
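The masking idea can be illustrated with a simplified sketch in plain Python. It covers only the Otsu binarisation and the row-wise dilation; the connected-component filter on size and aspect ratio is omitted, so every dark pixel is treated as text here. The function names and the dilation radius are hypothetical, and the dark-text-on-light-background polarity is taken from the limitations noted for this indicator.

```python
def otsu_threshold(gray):
    """Otsu threshold for a 2-D list of 8-bit grey values.

    Returns the level t that maximises the between-class variance of
    the two classes {value <= t} and {value > t}.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count of the dark class so far
    sum0 = 0    # intensity sum of the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def dilate_rows(mask, radius):
    """Spread True pixels left and right by `radius` columns (row-wise dilation)."""
    out = []
    for row in mask:
        n = len(row)
        new_row = [False] * n
        for x, on in enumerate(row):
            if on:
                for k in range(max(0, x - radius), min(n, x + radius + 1)):
                    new_row[k] = True
        out.append(new_row)
    return out


def content_mask(gray, radius=2):
    """True where a pixel counts as image content, False inside the text zone.

    Simplification: every dark pixel is treated as text. The described
    pipeline first filters connected components by size and aspect ratio.
    """
    t = otsu_threshold(gray)
    text = [[v <= t for v in row] for row in gray]
    text_zone = dilate_rows(text, radius)
    return [[not z for z in row] for row in text_zone]
```

Dilating only along rows widens the exclusion zone in the reading direction, which joins the characters of a printed word into one strip without removing much content above or below it.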

ORB keypoints and binary descriptors are detected inside the content mask of each region. Descriptors are matched with a brute-force Hamming matcher under Lowe's ratio test, keeping a match when its nearest-neighbour distance is below 0.75 times the second-nearest. If at least ten matches survive, a homography is fit with RANSAC at a reprojection threshold of five pixels, and the inlier ratio is

inlier_ratio = inliers / good_matches.
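The matching step and the ratio above can be sketched without any vision library. In this illustration descriptors are stored as Python integers standing in for ORB's binary strings, and the function names are hypothetical; keypoint detection and the RANSAC fit itself are not shown.

```python
def hamming(a, b):
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def ratio_test_matches(query, train, ratio=0.75):
    """Brute-force Hamming matching with Lowe's ratio test.

    Returns (query_index, train_index) pairs whose nearest distance is
    below `ratio` times the second-nearest distance.
    """
    good = []
    if len(train) < 2:
        return good  # the ratio test needs two neighbours to compare
    for qi, q in enumerate(query):
        ranked = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        (d1, ti), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            good.append((qi, ti))
    return good


def inlier_ratio(n_inliers, n_good_matches):
    """inlier_ratio = inliers / good_matches, or 0.0 when there are no matches."""
    return n_inliers / n_good_matches if n_good_matches else 0.0
```

A query descriptor whose two closest candidates are equally far away is dropped as ambiguous, which is the behaviour the ratio test is meant to provide before any geometry is fit.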

A pair is accepted on geometry when the inlier ratio is at least 0.5 with at least ten inliers and the homography is effectively affine. The affine test reads the projective row of the homography, normalised so that h33 = 1, and requires the perspective influence across the region,

influence = |h31| * width + |h32| * height,

to stay below 0.1. A genuine duplication is a planar affine map (translation, rotation, flip, and uniform or mild anisotropic scale), so its perspective terms are near zero; a large perspective influence means RANSAC fit a projective transform to coincidental matches rather than recovering a real copy, and the pair is rejected. The bounding boxes of the inlier keypoints, padded and mapped back to full-image coordinates, locate the matched region in each panel.
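The geometric acceptance rule described above can be written as a short check. This is a sketch under stated assumptions: the homography is a 3x3 nested list, the function names are hypothetical, and a homography with h33 equal to zero is treated as unusable, which the description does not specify.

```python
def perspective_influence(H, width, height):
    """|h31| * width + |h32| * height after scaling H so that h33 = 1."""
    h33 = H[2][2]
    if h33 == 0:
        return float("inf")  # assumption: a degenerate fit is never affine
    h31 = H[2][0] / h33
    h32 = H[2][1] / h33
    return abs(h31) * width + abs(h32) * height


def accept_on_geometry(n_inliers, n_good, H, width, height,
                       min_ratio=0.5, min_inliers=10, max_influence=0.1):
    """Apply the three geometric conditions: ratio, inlier count, affinity."""
    if n_good == 0:
        return False
    ratio = n_inliers / n_good
    return (ratio >= min_ratio
            and n_inliers >= min_inliers
            and perspective_influence(H, width, height) < max_influence)
```

For example, a 90-degree rotation with a translation has a projective row of (0, 0, 1) and an influence of zero, while a projective term of 0.001 across a 200-pixel-wide region already gives an influence of 0.2 and is rejected.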

In parallel, the masked normalized cross-correlation is computed over the pixels that both content masks accept. With f and g the two mean-subtracted, content-masked regions,

NCC = sum(f * g) / sqrt(sum(f^2) * sum(g^2)),

which is one for an exact copy and near zero for unrelated content. A pair is accepted when either detector passes its own test: the geometry test above, with an inlier ratio of at least 0.5, or an NCC that exceeds the correlation threshold of 0.85.
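A minimal sketch of the masked correlation, in plain Python on nested lists, is given here for illustration. It assumes the two regions and their masks have the same shape, and that the means are taken over the jointly accepted pixels only; the function name and the choice to return 0.0 for an empty or constant region are assumptions of the sketch.

```python
from math import sqrt


def masked_ncc(a, b, mask_a, mask_b):
    """NCC of two equally sized 2-D regions over pixels both masks accept."""
    xs, ys = [], []
    for row_a, row_b, row_ma, row_mb in zip(a, b, mask_a, mask_b):
        for va, vb, ma, mb in zip(row_a, row_b, row_ma, row_mb):
            if ma and mb:  # keep only pixels that are content in both regions
                xs.append(va)
                ys.append(vb)
    if not xs:
        return 0.0  # no shared content pixels
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    f = [x - mean_x for x in xs]
    g = [y - mean_y for y in ys]
    denom = sqrt(sum(v * v for v in f) * sum(v * v for v in g))
    if denom == 0:
        return 0.0  # a constant region has no structure to correlate
    return sum(p * q for p, q in zip(f, g)) / denom
```

Because both regions are mean-subtracted and the result is normalised, a copy whose brightness and contrast were adjusted linearly still correlates at one, and a pixel that is text in either region does not contribute at all.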

The score follows the number of accepted pairs. With no accepted pair the score is zero. One pair scores 2.0 + 2 * max_ncc, capped at 5.0; because the NCC cannot exceed one, a single near-exact duplicate scores close to 4.0. More than one pair scores 3.0 + 0.5 * n_pairs, capped at 5.0. Each accepted pair yields two findings, one on the source region and one on the duplicate, marked critical above a similarity of 0.95 and warning otherwise. A colour-coded difference overlay in the metadata renders the two panels beside their false-colour absolute difference. The metadata also records the number of pairs compared, the number of accepted overlap pairs, and the maximum NCC.
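The scoring rule can be transcribed directly from the formulas stated above. The function names are hypothetical, and the sketch assumes the severity cut-off is a strict comparison, as "above 0.95" suggests.

```python
def overlap_score(n_pairs, max_ncc):
    """Score from the number of accepted pairs and the maximum NCC."""
    if n_pairs <= 0:
        return 0.0
    if n_pairs == 1:
        return min(5.0, 2.0 + 2.0 * max_ncc)
    return min(5.0, 3.0 + 0.5 * n_pairs)


def finding_severity(similarity):
    """Critical above a similarity of 0.95, warning otherwise."""
    return "critical" if similarity > 0.95 else "warning"
```

Under these formulas two accepted pairs score 4.0 regardless of the NCC, and the 5.0 cap is reached from four accepted pairs onward.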

Score thresholds

Score     Meaning
0 to 1    No region reuse detected across quadrants or halves.
2 to 3    One overlapping pair, or several weak matches. A similar specimen can produce moderate similarity.
4 to 5    One near-exact duplicate, or multiple overlapping pairs. Consistent with a field of view reused across conditions.

Why this matters

Reusing one image to represent several experiments is among the most prevalent integrity problems in the published record: a visual screen of more than twenty thousand papers found inappropriately duplicated images in 3.8 percent of them, with at least half of those showing features suggestive of deliberate manipulation [1]. Detecting that reuse from the pixels is the copy-move forgery problem, and the keypoint-and-geometry pipeline M2 uses is the approach the forensics literature established for it: SIFT-class features are matched and a geometric transform is fit to confirm that one region is a transformed copy of another, which also recovers how the copy was placed [2]. Comparative evaluation of copy-move detectors confirms that keypoint methods with geometric verification are robust and efficient against the rotation, scaling, and compression that manipulators apply [3]. M2 builds this from ORB, the fast rotation-invariant binary feature designed as an efficient alternative to SIFT and SURF [4], and RANSAC, the robust estimator that fits a model while rejecting the outlier matches that coincidental texture produces [5], with Lowe's ratio test screening ambiguous matches before fitting [6]. Restricting the accepted transform to an affine map encodes the fact that a duplication is a planar copy, not a perspective reprojection.

Limitations

The screen reasons at the scale of quadrants and halves, so a small duplicated insert that does not align with those partitions can be missed, and a duplication that spans a quadrant boundary is split across comparisons. ORB struggles on low-texture content such as a uniform fluorescence field, where too few keypoints survive for the geometry test, leaving only the correlation detector. Genuinely similar specimens, periodic structures, and tiling patterns can raise the correlation or produce consistent matches without manipulation. The text mask assumes dark printed characters on a lighter background and can miss inverted or stylised labels. The affine restriction is deliberate and will not flag a region that has been strongly perspective-warped before pasting. General block-based copy-move anywhere in the image, regardless of panel layout, is the job of the clone-detection indicator I6, while M2 specialises in panel-to-panel field-of-view reuse with geometric and correlation confirmation; the two overlap by design and corroborate each other. Depth-of-field and noise-based composite cues live in sibling indicators, so M2 stays on region correspondence.

Theoretical background

M2 rests on the geometry of copying. When a region is duplicated, the copy and its source are related by a planar transform: the pixels are the same up to a rotation, a reflection, a scale, and a translation, and possibly a mild anisotropic stretch, but not a perspective change, because no new viewpoint is involved. Two independent measurements expose that relationship. Local features give a sparse set of corresponding points whose consistency under a single homography is strong evidence of a shared origin, and RANSAC makes that test robust by finding the largest subset of matches that one transform explains while discarding the rest. Normalized cross-correlation gives a dense, photometric measurement that is invariant to global brightness and contrast and is decisive when the copy is unrotated and unscaled, the case where keypoints are least distinctive. Masking the text first removes the one source of spurious agreement that has nothing to do with the imaged specimen. Reading the two detectors together, and requiring the recovered geometry to be affine, lets M2 separate a true duplication from the coincidental similarity of two genuine captures.

References

  1. Bik EM, Casadevall A, Fang FC. The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications. mBio. 2016;7(3):e00809-16. DOI: 10.1128/mBio.00809-16
  2. Amerini I, Ballan L, Caldelli R, Del Bimbo A, Serra G. A SIFT-Based Forensic Method for Copy-Move Attack Detection and Transformation Recovery. IEEE Transactions on Information Forensics and Security. 2011;6(3):1099-1110. DOI: 10.1109/TIFS.2011.2129512
  3. Christlein V, Riess C, Jordan J, Riess C, Angelopoulou E. An Evaluation of Popular Copy-Move Forgery Detection Approaches. IEEE Transactions on Information Forensics and Security. 2012;7(6):1841-1854. DOI: 10.1109/TIFS.2012.2218597
  4. Rublee E, Rabaud V, Konolige K, Bradski G. ORB: An efficient alternative to SIFT or SURF. Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV). 2011:2564-2571. DOI: 10.1109/ICCV.2011.6126544
  5. Fischler MA, Bolles RC. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM. 1981;24(6):381-395. DOI: 10.1145/358669.358692
  6. Lowe DG. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision. 2004;60(2):91-110. DOI: 10.1023/B:VISI.0000029664.99615.94