ResAIKit
Research Integrity Toolkit
W1 | Image forensics | Western Blot | Layer 1 (Deterministic)

Duplicate Bands

Finds protein bands in a western blot that are copies of one another. Reusing a single band to stand in for different lanes or experiments is a common fabrication, so the indicator detects every band, compares each pair by structural similarity across flips and rotations, and flags pairs that match too closely. It compares only bands of compatible size, because a copied band keeps its dimensions, which avoids false matches from stretching dissimilar bands to a common shape. It works on the pixels alone, with no model.

Technical description

W1 is a deterministic, generator-agnostic screen for band-level duplication in western blots. A blot presents protein bands in lanes, and a frequent manipulation is to copy one band into another lane, or from another figure, to manufacture a result. W1 detects the bands by adaptive thresholding and contour analysis, extracts a small patch around each, and compares every pair of patches with the structural similarity index. To catch copies reused in a transformed pose, the index is evaluated over four orientations of one patch: the original, a horizontal flip, a ninety-degree rotation, and a one-hundred-eighty-degree rotation. A pair whose best structural similarity exceeds 0.85 is counted as a duplicate, and the number of duplicate pairs, together with how far a single match exceeds the threshold, sets the score. Before comparing, the indicator requires the two bands to have compatible dimensions, since a genuine copy preserves size; this keeps the comparison meaningful and avoids spurious matches. The image must be at least 32 pixels on each side and contain at least two detected bands; otherwise the indicator returns a zero score and records what it found.

How it works

Bands are detected on the grayscale image with an inverted adaptive Gaussian threshold and morphological closing, and the external contours above a minimum area become band regions, sorted top to bottom and left to right. A patch is extracted around each band with a small margin and converted to grayscale.
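The bookkeeping after contour detection can be sketched in a few lines. This is an illustrative sketch only, not the toolkit's code: the names `order_bands` and `patch_bounds`, the `min_area` and `margin` defaults, and the use of bounding-box area in place of contour area are all assumptions made for the example. It takes boxes that a thresholding and contour step has already produced, drops the small ones, orders them top to bottom and left to right, and grows each box by a margin clamped to the image.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def order_bands(boxes: List[Box], min_area: int = 50) -> List[Box]:
    """Drop regions below min_area, then sort top to bottom, left to right.

    Assumption: area is taken from the bounding box, and min_area=50 is
    an illustrative value, not one stated in the text.
    """
    kept = [b for b in boxes if b[2] * b[3] >= min_area]
    return sorted(kept, key=lambda b: (b[1], b[0]))


def patch_bounds(box: Box, image_w: int, image_h: int, margin: int = 2) -> Box:
    """Grow a band box by `margin` pixels per side, clamped to the image."""
    x, y, w, h = box
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(image_w, x + w + margin), min(image_h, y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)


if __name__ == "__main__":
    detected = [(40, 10, 20, 6), (5, 10, 20, 6), (5, 30, 3, 3), (5, 2, 20, 6)]
    for band in order_bands(detected):
        print(band, "->", patch_bounds(band, image_w=64, image_h=64))
```

Sorting on the exact `(y, x)` pair is the simplest reading of "top to bottom and left to right"; a real implementation might first group bands into rows so that small vertical jitter does not reorder a lane.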

Every pair of bands is then considered. A size-compatibility gate first checks that the two bands match in width and height to within thirty percent, or match with width and height swapped to allow a ninety-degree-rotated copy; pairs that fail are skipped, because a copy keeps its dimensions and forcing dissimilar bands to a common size before comparison would manufacture similarity. For a compatible pair, the second patch is taken in its four orientations, each resized to the first patch's shape, and the structural similarity index is computed with an odd window of up to seven pixels; the best value over the orientations is the pair's similarity. A similarity above 0.85 records a duplicate pair.
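The pairwise step described above can be sketched in pure Python, under several stated assumptions. The thirty-percent tolerance is measured here relative to the larger of the two dimensions, resizing uses nearest-neighbour sampling, and the similarity is a single global structural similarity statistic over the whole patch rather than the windowed index with an odd window of up to seven pixels that the text describes. All function names are hypothetical, and the constants 0.01 and 0.03 are the customary defaults from the structural similarity paper [2].

```python
from typing import List

Patch = List[List[float]]  # grayscale rows, values in 0..255


def size_compatible(w1: int, h1: int, w2: int, h2: int, tol: float = 0.30) -> bool:
    """True if sizes agree within tol, directly or with width/height swapped."""
    def close(a: int, b: int) -> bool:
        # Assumption: tolerance is relative to the larger dimension.
        return abs(a - b) <= tol * max(a, b)
    return (close(w1, w2) and close(h1, h2)) or (close(w1, h2) and close(h1, w2))


def orientations(p: Patch) -> List[Patch]:
    """Original, horizontal flip, 90-degree (clockwise) and 180-degree rotations."""
    flipped = [row[::-1] for row in p]
    rot90 = [list(r) for r in zip(*p[::-1])]
    rot180 = [row[::-1] for row in p[::-1]]
    return [p, flipped, rot90, rot180]


def resize_nearest(p: Patch, h: int, w: int) -> Patch:
    """Nearest-neighbour resize to h rows by w columns."""
    src_h, src_w = len(p), len(p[0])
    return [[p[r * src_h // h][c * src_w // w] for c in range(w)] for r in range(h)]


def global_ssim(a: Patch, b: Patch, data_range: float = 255.0) -> float:
    """Structural similarity from whole-patch statistics (one global window)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) * (x - mx) for x in xs) / n
    vy = sum((y - my) * (y - my) for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


def best_similarity(a: Patch, b: Patch) -> float:
    """Best similarity of a against the four orientations of b, resized to a."""
    h, w = len(a), len(a[0])
    return max(global_ssim(a, resize_nearest(o, h, w)) for o in orientations(b))
```

In use, a pair would be scored only when `size_compatible` passes, and recorded as a duplicate when `best_similarity` exceeds 0.85. Because the sketch uses one global window, its values will not match a windowed implementation exactly.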

The score follows the count: no pairs score 0; a single pair scores 1.5 plus ten times its excess over the threshold, capped at 5.0, so a near-perfect match approaches the maximum; and more than one pair scores 2.5 plus half a point per pair, capped at 5.0. Each duplicate becomes a finding marking the two band locations and the similarity, critical above 0.95 and warning otherwise. The metadata records the band count, the duplicate-pair count, and the maximum similarity.
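The scoring rule is simple enough to write out directly. The sketch below follows the paragraph literally; the function names `w1_score` and `severity` are hypothetical, and the input is assumed to be the list of best similarities for the size-compatible pairs. Read this way, a single pair can reach at most 1.5 + 10 × (1.0 − 0.85) = 3.0, because structural similarity cannot exceed 1, so the 5.0 cap only takes effect in the multi-pair branch, from five pairs upward.

```python
from typing import List

THRESHOLD = 0.85  # a pair above this similarity is a duplicate
CRITICAL = 0.95   # a duplicate above this similarity is a critical finding


def w1_score(similarities: List[float]) -> float:
    """Score from the best similarity of each compared pair (literal reading)."""
    duplicates = [s for s in similarities if s > THRESHOLD]
    if not duplicates:
        return 0.0
    if len(duplicates) == 1:
        # 1.5 plus ten times the excess over the threshold, capped at 5.0.
        return min(5.0, 1.5 + 10.0 * (duplicates[0] - THRESHOLD))
    # More than one pair: 2.5 plus half a point per pair, capped at 5.0.
    return min(5.0, 2.5 + 0.5 * len(duplicates))


def severity(similarity: float) -> str:
    """Finding severity for one duplicate pair."""
    return "critical" if similarity > CRITICAL else "warning"


if __name__ == "__main__":
    for sims in ([0.80], [0.95], [0.90, 0.90], [0.90] * 6):
        print(sims, "->", round(w1_score(sims), 2))
```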

Score thresholds

Score     Meaning
0 to 1    No bands are duplicates of one another.
2 to 3    One band pair matches closely, a possible reuse or two genuinely similar bands.
4 to 5    A near-perfect band match, or several duplicate pairs. Consistent with a band copied to fabricate lanes.

Why this matters

Band reuse is one of the most documented forms of figure manipulation in the life sciences. The survey of biomedical figures by Bik and colleagues screened more than 20,000 papers and found inappropriate image duplication, including bands and panels copied within and between blots, in about 3.8% of them [1]. Detecting it requires a similarity measure that tracks human perception of whether two bands are the same. The structural similarity index was designed for that purpose: it compares local luminance, contrast, and structure rather than raw pixel differences [2]. The ethical guidelines for scientific images make the standard explicit: splicing or duplicating bands to represent independent results is misconduct, and detection tools are needed because the eye is easily fooled by repositioned copies [3]. The size-compatibility requirement reflects how copying actually works: a pasted band retains its dimensions, so insisting on matching size before declaring a duplicate keeps the screen specific. By comparing every band pair across orientations, W1 catches the reuse that drives much of the duplication problem.

Limitations

Band detection is the first dependency: faint bands, smears, or overlapping lanes can be missed or merged, and what is not detected cannot be compared. Structural similarity behaves poorly on near-uniform patches, so a clean, low-texture band carries little structure to match and a genuine copy of such a band can be missed, while a textured band copy is caught reliably. The size gate prevents matching dissimilar bands but will skip a copy that was strongly rescaled before pasting. Resizing each orientation to a common shape introduces interpolation that can slightly raise or lower the similarity. The four orientations cover flips and right-angle rotations but not arbitrary small rotations. The thresholds are directional rather than exact. Generic copy-move anywhere in an image is handled by the clone indicator I6, and panel-level field-of-view reuse by the microscopy indicator M2; W1 specialises in band-to-band duplication within a blot.

Theoretical background

W1 rests on the improbability that two independently produced bands are pixel-similar. A western blot band is the visible trace of a unique electrophoretic separation and detection, so its exact shape, intensity profile, and noise are an effectively random fingerprint; two bands from genuinely different samples share the general look of a band but not the fine structure. Copying breaks this, because a pasted band reproduces the source's structure exactly up to a simple geometric transform and any light retouching, so its structural similarity to the source is far higher than chance allows. The structural similarity index captures this by decomposing the comparison into luminance, contrast, and structural correlation terms, the last of which is the normalized covariance that distinguishes a true copy from a coincidentally similar shape. Restricting the comparison to size-compatible bands encodes the geometry of copying, since a paste preserves dimensions, and evaluating several orientations covers the simple transforms a manipulator applies to disguise reuse. Reading the best similarity over those orientations turns the uniqueness of a real band into a test of whether it was produced once or copied.
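For reference, the decomposition mentioned above can be written out in the notation of the structural similarity paper [2]. Here x and y are the two patches (or local windows within them), μ denotes a mean, σ a standard deviation, σ_xy the covariance, and C1, C2, C3 are small stabilising constants; the symbols are those of the paper, not of the W1 implementation.

```latex
\[
l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad
c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad
s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}
\]
\[
\mathrm{SSIM}(x,y) = l(x,y)\,c(x,y)\,s(x,y)
= \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
\quad \text{when } C_3 = C_2/2 .
\]
```

The structure term s is the normalized covariance: for a pasted copy, σ_xy approaches σ_x σ_y and s approaches 1, whereas two independent bands with a similar outline but unrelated fine texture give a lower covariance and therefore a lower s.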

References

  1. Bik EM, Casadevall A, Fang FC. The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications. mBio. 2016;7(3):e00809-16. DOI: 10.1128/mBio.00809-16
  2. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing. 2004;13(4):600-612. DOI: 10.1109/TIP.2003.819861
  3. Cromey DW. Avoiding twisted pixels: ethical guidelines for the appropriate use and manipulation of scientific digital images. Science and Engineering Ethics. 2010;16(4):639-667. DOI: 10.1007/s11948-010-9201-y