ResAIKit
Research Integrity Toolkit
T9 | Image forensics | Table Analysis | Layer 2 (Contextual)

Textbook Data (Table)

Flags data that matches statistical theory too closely to be real. Genuine measurements scatter: they are rarely perfectly normal, rarely equal in spread across variables, and rarely land exactly on the round effect sizes textbooks use. Data that is too good is a classic fabrication signal, the same one that first raised suspicion about Mendel's implausibly perfect genetics data.

Technical description

Extracts the table grid by OCR, proceeds only if the table holds statistical data, and applies three checks to numeric columns with at least 10 values. Excessive normality: flags when every column returns a Shapiro-Wilk p-value above 0.99. Spread uniformity: flags when the coefficient of variation of the column standard deviations falls below 0.05. Textbook effect sizes: flags any column pair whose Cohen's d, computed with the pooled standard deviation, falls within 0.01 of a conventional benchmark (0.2, 0.5, 0.8). The number of flags sets the score: no flags scores 0, one flag 2.0, two flags 3.5, and three flags 4.5.
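The three checks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the toolkit's actual code: the function names, the flag labels, the use of the sample standard deviation, and the handling of constant columns are all hypothetical. The Shapiro-Wilk test is passed in as a callable (`normality_p`) so the sketch itself needs only the standard library.

```python
import math
from itertools import combinations
from statistics import mean, stdev
from typing import Callable, List, Sequence

MIN_VALUES = 10               # columns need at least 10 values (from the text)
BENCHMARKS = (0.2, 0.5, 0.8)  # conventional small, medium, large effects


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d for two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    if pooled_var == 0:
        return math.nan  # assumption: undefined when both columns are constant
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def textbook_flags(
    columns: Sequence[Sequence[float]],
    normality_p: Callable[[Sequence[float]], float],
) -> List[str]:
    """Return the names of the checks that fire (flag names are hypothetical)."""
    cols = [c for c in columns if len(c) >= MIN_VALUES]
    flags: List[str] = []
    if not cols:
        return flags

    # Excessive normality: every column has a normality p-value above 0.99.
    if all(normality_p(c) > 0.99 for c in cols):
        flags.append("excessive_normality")

    # Spread uniformity: coefficient of variation of the column SDs below 0.05.
    sds = [stdev(c) for c in cols]
    if len(sds) >= 2 and mean(sds) > 0 and stdev(sds) / mean(sds) < 0.05:
        flags.append("uniform_spread")

    # Textbook effect sizes: any pair with |d| within 0.01 of a benchmark.
    for a, b in combinations(cols, 2):
        d = abs(cohens_d(a, b))
        if any(abs(d - t) <= 0.01 for t in BENCHMARKS):
            flags.append("textbook_effect_size")
            break

    return flags
```

If SciPy is available, one plausible choice for `normality_p` is `lambda xs: scipy.stats.shapiro(xs).pvalue`. Whether the toolkit compares the signed or the absolute value of d, and whether it uses the sample or the population standard deviation for the spread check, is not stated in this entry, so both choices above are assumptions.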

How it works

Layer 2 (statistical): collects numeric columns and tests for excessive normality (all columns Shapiro-Wilk p above 0.99), uniform spread (coefficient of variation of column SDs below 0.05), and textbook effect sizes (a column pair's Cohen's d within 0.01 of 0.2, 0.5, or 0.8). Each check that fires is a flag; the count sets the score, escalating to error severity when all three fire.
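The mapping from flags to score is small enough to write out directly. The sketch below assumes the flags arrive as a list of names; the function name and the boolean `is_error` return value are hypothetical, and severity levels below error are not described in this entry, so they are left out.

```python
from typing import Iterable, Tuple

# Score per number of fired checks, as stated in the technical description.
SCORE_BY_FLAG_COUNT = {0: 0.0, 1: 2.0, 2: 3.5, 3: 4.5}


def score_flags(flags: Iterable[str]) -> Tuple[float, bool]:
    """Return (score, is_error) for the set of checks that fired."""
    n = len(set(flags))  # each check counts at most once
    is_error = n == 3    # error severity only when all three checks fire
    return SCORE_BY_FLAG_COUNT[n], is_error
```

For example, `score_flags(["uniform_spread"])` returns `(2.0, False)`, while passing all three flag names returns `(4.5, True)`.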

Why this matters

The oldest documented case of suspected fabrication turned on data fitting theory too well: Fisher found Mendel's ratios matched prediction so closely that the chi-square was far smaller than chance allows. Real samples carry noise, so they deviate from perfect normality, differ in spread, and land on arbitrary effect sizes rather than round benchmarks. Data that is implausibly clean and regular is a hallmark of invention, and a fabricator reaching for a plausible result naturally writes down a conventional small, medium, or large effect.

Score thresholds

0-1: The data shows the natural deviation from theory expected of real measurements
2-3: One or two textbook-perfect properties, possibly coincidence or a clean dataset
4-5: Excessive normality, uniform spreads, and benchmark effect sizes together, consistent with data fabricated to look textbook-correct

Limitations

Fitting theory closely is not always fabrication: large, clean, well-controlled datasets can be genuinely near-normal, and standardized instruments produce comparable spreads, so each flag is only suggestive and the higher score bands require more than one. The Shapiro-Wilk threshold of 0.99 is strict, which keeps false positives low but misses subtler over-fitting. The effect-size check compares all column pairs, including unrelated ones, so a coincidental benchmark match grows more likely with more columns, and a single such flag should be read cautiously. The test depends on accurate OCR and on enough values per column (at least 10). The statistical-data gate skips mostly-text tables. The granularity tests T2 to T4 address a different impossibility.
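To see how quickly the effect-size check accumulates comparisons, count the column pairs: a table with k eligible columns yields k(k-1)/2 pairs. The helper below is a hypothetical illustration of that arithmetic only; it does not estimate the actual false-positive rate, which depends on how the data are distributed.

```python
from math import comb


def pair_comparisons(n_columns: int) -> int:
    """Number of column pairs the effect-size check would compare."""
    return comb(n_columns, 2)


for k in (2, 5, 10, 20):
    print(k, pair_comparisons(k))
# prints:
# 2 1
# 5 10
# 10 45
# 20 190
```

Each pair is a separate chance to land within 0.01 of one of three benchmarks, which is why a lone effect-size flag on a wide table carries less weight than the same flag on a two-column table.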

References

  1. Fisher RA. (1936). Has Mendel's work been rediscovered? Annals of Science 1(2):115-137
  2. Simonsohn U. (2013). Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science 24(10):1875-1888
  3. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952