ResAIKit
Research Integrity Toolkit
D8 · Statistical analysis · Fabrication Detection · Layer 1 (Deterministic)

Identical SDs

Looks for standard deviations that repeat across groups or variables when they should not. Two independently measured groups almost never produce exactly the same standard deviation, and two variables on different scales should not either, so reused or near-identical spread values point to copied or invented numbers. The indicator compares the standard deviations of the reported triplets, flags suspicious repetition within a source and across differently named variables, and checks that reported standard errors are consistent with the standard deviations and sample sizes.

Technical description

A deterministic screen for the reuse of dispersion values. Three checks run on the (mean, SD, n) triplets. Within-source: triplets are grouped by source and, for every pair, the ratio of the smaller SD to the larger is computed; a ratio in [0.99, 1.01] (in practice at least 0.99, since the ratio cannot exceed 1) is counted as essentially identical, with more than three such pairs a strong flag and one to three a milder one. Cross-variable: pairs with different labels yet near-identical SDs are flagged, since variables on different scales should not share a spread; labels are normalised (lower-cased, trimmed) before comparison so that a variable written with different capitalisation is not mistaken for two, and exact-duplicate statistics (same mean, SD, n) are skipped as one value captured twice. Standard-error: the expected SE = SD / sqrt(n) is recomputed for each triplet, and a reported SE that matches none of the computed values within the rounding tolerance is flagged. The contributions are summed and capped at 5.0.
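The two SD-reuse checks described above can be sketched as follows. This is a minimal illustration, not the toolkit's own code: the `Triplet` class, its field names, and the function names are hypothetical, and running the cross-variable check over all triplets (rather than per source) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Triplet:
    source: str  # hypothetical field: table or section the triplet came from
    label: str   # variable name as written in the source
    mean: float
    sd: float
    n: int


def near_identical(sd_a: float, sd_b: float, band: float = 0.01) -> bool:
    """Smaller-to-larger SD ratio of at least 1 - band (0.99 by default)."""
    return min(sd_a, sd_b) / max(sd_a, sd_b) >= 1.0 - band


def within_source_pairs(triplets: list[Triplet]) -> int:
    """Count near-identical SD pairs among triplets sharing a source."""
    by_source = defaultdict(list)
    for t in triplets:
        if t.sd > 0:  # non-positive SDs are skipped
            by_source[t.source].append(t)
    return sum(
        near_identical(a.sd, b.sd)
        for group in by_source.values()
        for a, b in combinations(group, 2)
    )


def cross_variable_pairs(triplets: list[Triplet]) -> int:
    """Count differently labelled pairs that share a near-identical SD."""
    usable = [t for t in triplets if t.sd > 0]
    count = 0
    for a, b in combinations(usable, 2):
        if a.label.strip().lower() == b.label.strip().lower():
            continue  # same variable after label normalisation
        if (a.mean, a.sd, a.n) == (b.mean, b.sd, b.n):
            continue  # exact duplicate: one value captured twice
        if near_identical(a.sd, b.sd):
            count += 1
    return count


# Invented example data, for illustration only.
example = [
    Triplet("T1", "Weight", 70.2, 4.31, 30),
    Triplet("T1", "weight ", 68.9, 4.31, 30),
    Triplet("T1", "Height", 171.0, 4.30, 30),
    Triplet("T2", "Age", 35.0, 9.8, 30),
]
```

With this invented data, `within_source_pairs(example)` counts three pairs in source T1, while `cross_variable_pairs(example)` counts two, because "Weight" and "weight " normalise to the same label and are not compared with each other.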

How it works

Layer 1 (deterministic): triplets with non-positive SD are skipped. The within-source check adds 4.0 for more than three near-identical SD pairs within a source, 2.0 for one to three. The cross-variable check adds 2.0 when any differently labelled pair (after label normalisation, excluding exact-duplicate statistics) shares a near-identical SD. The standard-error check, using any SEs supplied in the context, adds 1.0 when one or more fail to reconstruct SD/sqrt(n) to the reported precision (tolerance = half a unit in the reported SE's last decimal plus the SD's rounding over sqrt(n), a granularity-aware GRIM-family bound replacing the flat ten percent). Total capped at 5.0. Metadata records sd_pairs_checked, identical_pairs, cross_variable_pairs, se_mismatches, and se_reconstructions.
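A sketch of the granularity-aware standard-error tolerance and the capped score, under stated assumptions: values are passed as the strings that were reported so the number of decimals can be read off, a reported SE counts as a mismatch only when it reconstructs from none of the triplets, and the within-source contribution is decided from a single `identical_pairs` count. All function names are hypothetical.

```python
import math
from decimal import Decimal


def decimals(reported: str) -> int:
    """Number of decimal places in a value as it was reported, e.g. '4.31' -> 2."""
    return max(0, -Decimal(reported).as_tuple().exponent)


def se_reconstructs(sd: str, n: int, se: str) -> bool:
    """Check a reported SE against SD / sqrt(n) to the reported precision."""
    root_n = math.sqrt(n)
    expected = float(sd) / root_n
    # Half a unit in the SE's last decimal, plus the SD's rounding over sqrt(n).
    tolerance = 0.5 * 10 ** -decimals(se) + 0.5 * 10 ** -decimals(sd) / root_n
    return abs(float(se) - expected) <= tolerance


def count_se_mismatches(triplets: list[tuple[str, int]], reported_ses: list[str]) -> int:
    """Count reported SEs that reconstruct from none of the (SD, n) pairs."""
    usable = [(sd, n) for sd, n in triplets if float(sd) > 0 and n > 0]
    return sum(
        not any(se_reconstructs(sd, n, se) for sd, n in usable)
        for se in reported_ses
    )


def score(identical_pairs: int, cross_variable_pairs: int, se_mismatches: int) -> float:
    """Sum the three contributions and cap the total at 5.0."""
    total = 0.0
    if identical_pairs > 3:
        total += 4.0
    elif identical_pairs >= 1:
        total += 2.0
    if cross_variable_pairs > 0:
        total += 2.0
    if se_mismatches > 0:
        total += 1.0
    return min(total, 5.0)
```

For an SD reported as 4.31 with n = 30, the expected SE is about 0.787, so a reported SE of 0.79 reconstructs while 0.85 does not, even though 0.85 lies within ten percent of the expected value.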

Why this matters

Standard deviations summarise the scatter of a particular sample, so two independent groups producing the same SD to the reported precision is improbable, and the improbability compounds with each match. Simonsohn detected fabrication in published work from the implausible similarity of SDs across conditions. Carlisle's re-analyses repeatedly identified fabricated trials by the reuse of summary statistics including duplicated SDs, and Al-Marzouki and colleagues used variance structure to discriminate genuine from invented data. The cross-variable check captures a stronger impossibility: two variables on different scales (a weight and a height) have no reason to share an SD, so an exact match is hard to explain except by copying. The standard-error check adds a test of internal consistency, since a reported SE must equal the SD divided by the square root of n, up to rounding.

Score thresholds

0
Standard deviations vary naturally across groups and variables.
2-3
A small number of near-identical standard deviations, or a standard-error inconsistency.
4-5
Many repeated standard deviations, or identical spread shared across different-scale variables.

Limitations

Works on the reported triplets, so it depends on means, SDs, and sample sizes being extracted correctly, and can only test SEs supplied to it. The one percent near-identical band treats two genuinely different SDs that round to the same reported value as identical, so coarse rounding can create apparent matches, especially when many triplets make the pair count grow quadratically. The cross-variable premise can fail legitimately when two differently named variables share a scale (two subscales of one instrument), so a cross-variable flag should prompt inspection rather than be read as proof of copying. Label normalisation reduces but does not eliminate the risk of treating one variable as two. The thresholds and the one percent band are directional: they indicate where to look and are not calibrated probabilities. The text-level identical-SD sub-check on reported tables is part of S13, and exact value duplication in tables is S14.
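The rounding and pair-count limitations can be illustrated with a small sketch. The SD values are invented for illustration and the helper name `sd_ratio` is hypothetical.

```python
import math


def sd_ratio(sd_a: float, sd_b: float) -> float:
    """Ratio of the smaller SD to the larger."""
    return min(sd_a, sd_b) / max(sd_a, sd_b)


# Two genuinely different SDs that are both reported as 1.2 at one decimal.
true_sds = (1.16, 1.24)
reported_sds = tuple(round(sd, 1) for sd in true_sds)

true_ratio = sd_ratio(*true_sds)          # about 0.935, outside the band
reported_ratio = sd_ratio(*reported_sds)  # 1.0, inside the band

# Number of SD pairs compared for k triplets in one source: k * (k - 1) / 2.
pair_counts = {k: math.comb(k, 2) for k in (5, 10, 20, 40)}
```

Here the underlying values differ by almost seven percent yet look identical once rounded, and a source with 40 triplets already yields 780 pairs in which such coincidences can occur.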

References

  1. Simonsohn U. (2013). Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science 24(10):1875-1888
  2. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
  3. Al-Marzouki S, Evans S, Marshall T, Roberts I. (2005). Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 331(7511):267-270
  4. George SL, Buyse M. (2015). Data fraud in clinical trials. Clinical Investigation 5(2):161-173
  5. Carlisle JB. (2021). False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia 76(4):472-479
  6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
  7. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
  8. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380
  9. Brown NJL, Heathers JAJ. (2017). The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science 8(4):363-369