ResAIKit
Research Integrity Toolkit
G7-img | Image forensics | Chart Analysis | Layer 1 (Deterministic)

Too-Perfect Data

Tests whether data points in a chart follow suspiciously perfect statistical distributions, such as perfectly bell-shaped curves or unrealistically smooth trends that real data rarely achieves.

Technical description

Extracts every number from the figure by OCR and separates plotted data values from axis tick labels. A number positioned left of the detected y-axis line or below the detected x-axis line (with a 12% margin fallback) is treated as an axis label and excluded, because ticks are equidistant by construction. Five distributional tests then run on the remaining data values: Shapiro-Wilk normality (p > 0.99 adds 1.5), skewness (|skew| < 0.01 adds 1.0), excess kurtosis (|kurtosis| < 0.1 adds 0.5), outlier absence by the IQR rule (zero outliers with n > 30 adds 1.0), and coefficient of variation (CV < 0.02 with mean > 1 adds 1.0). At least ten data values must remain after axis labels are removed. The score is capped at 5.0; because the five contributions sum to exactly 5.0, the cap is reached only when every test fires.
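The scoring rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the toolkit's actual code: the function names `too_perfect_score` and `scipy_shapiro_p` are hypothetical, skewness and kurtosis use population moments, the IQR rule uses 1.5 x IQR fences with linearly interpolated quartiles, the CV uses the sample standard deviation, and constant data skips the shape tests. The source specifies only the thresholds and point values.

```python
import statistics
from typing import Callable, Optional, Sequence


def scipy_shapiro_p(values: Sequence[float]) -> float:
    """Shapiro-Wilk p-value via SciPy (imported lazily; requires SciPy)."""
    from scipy.stats import shapiro
    return float(shapiro(values)[1])


def too_perfect_score(
    values: Sequence[float],
    normality_p: Callable[[Sequence[float]], float] = scipy_shapiro_p,
) -> Optional[float]:
    """Sum the five cleanliness signals; None means 'not scored'."""
    n = len(values)
    if n < 10:
        return None  # fewer than ten data values: not scored

    mean = statistics.fmean(values)
    pop_sd = statistics.pstdev(values)
    score = 0.0

    # Assumption: shape tests are skipped for constant data (zero variance).
    if pop_sd > 0:
        if normality_p(values) > 0.99:
            score += 1.5
        z = [(v - mean) / pop_sd for v in values]
        skew = statistics.fmean([t ** 3 for t in z])
        excess_kurtosis = statistics.fmean([t ** 4 for t in z]) - 3.0
        if abs(skew) < 0.01:
            score += 1.0
        if abs(excess_kurtosis) < 0.1:
            score += 0.5

    # IQR rule (assumed 1.5 x IQR fences around the quartiles).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    outliers = sum(1 for v in values if v < q1 - fence or v > q3 + fence)
    if outliers == 0 and n > 30:
        score += 1.0

    # Coefficient of variation (assumed sample SD divided by the mean).
    if mean > 1 and statistics.stdev(values) / mean < 0.02:
        score += 1.0

    return min(score, 5.0)
```

The normality test is passed in as a callable so the remaining logic can be exercised without SciPy; calling `too_perfect_score(values)` with no second argument uses `scipy.stats.shapiro`.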

How it works

Layer 1 (deterministic). OCR-reads all numbers, removes axis tick labels by position, and on the remaining data values computes a Shapiro-Wilk normality p-value, skewness, excess kurtosis, the IQR-rule outlier count, and the coefficient of variation. Each suspiciously clean result adds points; the points are summed, capped at 5.0, and reported alongside the number of data values used and the number of axis labels excluded.
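The positional removal of axis labels might look roughly like the sketch below. The `OcrNumber` type and `split_axis_labels` function are hypothetical names, coordinates are assumed to be text-box centres in image pixels with y growing downward, and the 12% margin fallback is interpreted here as "treat the outer 12% of the image on the left and bottom as the axis region when no axis line was detected". The source does not spell out that interpretation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OcrNumber:
    value: float
    x: float  # horizontal centre of the text box, pixels from the left edge
    y: float  # vertical centre of the text box, pixels from the top edge


def split_axis_labels(
    numbers: List[OcrNumber],
    width: float,
    height: float,
    y_axis_x: Optional[float] = None,  # x position of the detected y-axis line
    x_axis_y: Optional[float] = None,  # y position of the detected x-axis line
    margin: float = 0.12,              # fallback margin when a line is missing
) -> Tuple[List[OcrNumber], List[OcrNumber]]:
    """Return (data_values, axis_labels) split purely by position."""
    # Assumed fallback: outer 12% of the image stands in for a missing axis line.
    left_limit = y_axis_x if y_axis_x is not None else margin * width
    bottom_limit = x_axis_y if x_axis_y is not None else (1 - margin) * height

    data: List[OcrNumber] = []
    labels: List[OcrNumber] = []
    for num in numbers:
        # Left of the y-axis or below the x-axis (larger y is lower in the image).
        if num.x < left_limit or num.y > bottom_limit:
            labels.append(num)
        else:
            data.append(num)
    return data, labels
```

The lengths of the two returned lists correspond to the data and excluded-label counts that accompany the score.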

Why this matters

Real measurements carry sampling noise, occasional outliers, and asymmetry; fabricated data tends toward suspicious cleanliness. Forensic statistics has a strong record of detecting fabrication from the shape of the data alone: values that are inconsistent with random sampling, excessively similar to one another, or unexpectedly uniform, lacking natural variation. G7 tests for too-perfect normality, symmetry, peakedness, lack of outliers, and tightness. Crucially, it tests only the plotted data, not the axis grid: the grid is a uniform coordinate scale rather than a sample, and including it would manufacture the very cleanliness being screened for.
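To see why the axis grid has to be excluded, consider a set of evenly spaced tick labels. The tick values below are made up for illustration, and the helper names are hypothetical; the skewness uses population moments and the outlier count uses 1.5 x IQR fences, both assumptions rather than details given in the source.

```python
import statistics

# Hypothetical y-axis tick labels: equidistant by construction.
ticks = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]


def skewness(values):
    """Population skewness: mean of cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def iqr_outlier_count(values):
    """Count values outside the 1.5 x IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    return sum(1 for v in values if v < q1 - fence or v > q3 + fence)


tick_skew = skewness(ticks)              # symmetric grid, so skewness vanishes
tick_outliers = iqr_outlier_count(ticks)  # no tick lies outside the fences
print(f"skewness={tick_skew:.3f}, outliers={tick_outliers}")
```

A perfectly symmetric, outlier-free grid like this would satisfy the |skew| < 0.01 condition on its own, so mixing tick labels into the data values would push the score up regardless of what the plotted data look like.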

Score thresholds

0-1: The plotted values carry normal sampling messiness: some skew, some spread, the occasional outlier.
2-3: One or two cleanliness signals: near-perfect symmetry, suspicious normality, or an unusually tight spread.
4-5: Several signals together: perfectly normal, symmetric, outlier-free, and tightly clustered, consistent with fabricated data.

Limitations

Needs at least ten plotted data values read by OCR after axis labels are removed, so charts that print only a few values, or only bars and an axis, are not scored. The axis-label split is positional and assumes a left y-axis and a bottom x-axis; numbers in titles or legends can be misclassified. Thresholds are strict so that ordinary data does not trip them, which means fabrication that keeps some noise passes undetected. A chart plotting several different series together can look messier than any one series, which can mask a too-clean series. Digit-level fabrication, mean-and-SD plausibility (GRIM/SPRITE), and p-value distributions are covered by sibling chart indicators.

References

  1. Simonsohn U. (2013). Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science 24(10):1875-1888
  2. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
  3. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology
  4. Luo J, Li Z, Wang J, Lin CY. (2021). ChartOCR: Data Extraction from Charts Images via a Deep Hybrid Framework. IEEE/CVF WACV 2021