ResAIKit
Research Integrity Toolkit
T1 · Image forensics · Table Analysis · Layer 1 (Deterministic)

Arithmetic Consistency

Checks whether the numbers in a table extracted from an image actually add up: totals equal the sum of their components, a totals row matches its column sums, percentage columns sum to about one hundred, and subgroup sample sizes sum to the reported total. It runs only on tables that hold statistical data, not text or layout tables.

Technical description

Extracts the table grid by OCR and, after a statistical-data gate (at least two numeric data cells and at least 40 percent of data cells numeric), runs four arithmetic checks with tolerances:

Row sums: a column headed total/sum/overall must equal the sum of its component cells, excluding the label column and other total or percentage columns, within 1 percent.
Column sums: a final totals row must match the sum of the data rows within 1 percent.
Percentages: two or more percent columns must sum per row to 100 within 1.5 points.
Sample sizes: subgroup N columns must sum to a total N column within 1 percent.

Each violation is a flag.
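The gate and two of the four checks can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the toolkit's actual code: every function name is hypothetical, "data cells" is taken to mean body cells outside the label column, percent columns are assumed to be recognisable by "%" or "percent" in the header, and the 1 percent tolerance is assumed to be relative to the reported total.

```python
import re

# Assumption: a cell is numeric if it is a plain number, optionally with
# thousands separators and a trailing percent sign.
NUMERIC = re.compile(r"^-?\d+(?:\.\d+)?%?$")
TOTAL_WORDS = ("total", "sum", "overall")  # header keywords named in the text


def is_numeric(cell: str) -> bool:
    return bool(NUMERIC.match(cell.strip().replace(",", "")))


def to_number(cell: str) -> float:
    return float(cell.strip().replace(",", "").rstrip("%"))


def passes_gate(data_cells, min_numeric=2, min_fraction=0.40):
    """Statistical-data gate: at least 2 numeric cells and at least 40% numeric."""
    if not data_cells:
        return False
    numeric = sum(is_numeric(c) for c in data_cells)
    return numeric >= min_numeric and numeric / len(data_cells) >= min_fraction


def percent_columns(header):
    # Assumption: percent columns carry "%" or "percent" in their header.
    return [i for i, h in enumerate(header)
            if "%" in h or "percent" in h.lower()]


def check_row_sums(header, rows, rel_tol=0.01):
    """Flag rows whose total column differs from the sum of its components."""
    total_cols = [i for i, h in enumerate(header)
                  if h.strip().lower() in TOTAL_WORDS]
    skip = set(total_cols) | set(percent_columns(header))
    # Column 0 is assumed to be the label column.
    parts = [i for i in range(1, len(header)) if i not in skip]
    flags = []
    if not parts:
        return flags
    for row in rows:
        for t in total_cols:
            cells = [row[i] for i in parts] + [row[t]]
            if not all(is_numeric(c) for c in cells):
                continue  # unreadable or empty cell: skip rather than guess
            reported = to_number(row[t])
            computed = sum(to_number(row[i]) for i in parts)
            if abs(reported - computed) > rel_tol * abs(reported):
                flags.append(f"{row[0]}: reported {reported:g}, "
                             f"components sum to {computed:g}")
    return flags


def check_percent_rows(header, rows, abs_tol=1.5):
    """Flag rows whose percent columns do not sum to 100 within abs_tol points."""
    cols = percent_columns(header)
    flags = []
    if len(cols) < 2:
        return flags
    for row in rows:
        if not all(is_numeric(row[i]) for i in cols):
            continue
        total = sum(to_number(row[i]) for i in cols)
        if abs(total - 100.0) > abs_tol:
            flags.append(f"{row[0]}: percentages sum to {total:g}")
    return flags
```

The column-sum and sample-size checks would follow the same pattern as `check_row_sums`, applied to a final totals row and to N columns respectively.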

How it works

Layer 1 (deterministic): reads the table by OCR, confirms it holds statistical data, then checks row totals against component sums, a totals row against column sums, percentage columns against 100, and subgroup Ns against a total N, each within a small tolerance. The flag count sets the score: no flags scores 0, one flag scores 2.0, two flags score 3.0, and three or more score 4.5. Each flag becomes a finding that names the inconsistency.
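The mapping from flag count to score is a simple step function. The sketch below uses the values stated above; the function names and the shape of the result dictionary are hypothetical, chosen only for illustration.

```python
def score_from_flags(n_flags: int) -> float:
    """Map the number of arithmetic flags to the indicator score."""
    if n_flags <= 0:
        return 0.0
    if n_flags == 1:
        return 2.0
    if n_flags == 2:
        return 3.0
    return 4.5  # three or more flags


def build_result(flags: list[str]) -> dict:
    """Bundle the score with one finding per flag (hypothetical result shape)."""
    return {
        "score": score_from_flags(len(flags)),
        "findings": [f"Arithmetic inconsistency: {flag}" for flag in flags],
    }
```

Because the score depends only on the count, three unrelated rounding slips and three deliberate edits receive the same 4.5, so the findings text is what a reviewer would use to tell them apart.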

Why this matters

Numbers that do not add up are a reliable and often overlooked sign of fabricated or erroneous data. A consistent table satisfies exact identities (a total equals the sum of its parts, the percentages of a partition sum to 100, subgroup sizes sum to the sample size) because its figures describe one coherent dataset. Fabrication and careless editing break these identities when a cell is changed without updating the totals that depend on it, so direct arithmetic verification can catch manipulations that more elaborate statistical tests miss.

Score thresholds

0-1: The table's totals, percentages, and sample sizes are internally consistent
2-3: One or two arithmetic inconsistencies, possibly rounding, a transcription slip, or a real error
4-5: Three or more inconsistencies, consistent with fabricated or carelessly altered table data

Limitations

The check depends on OCR, so a misread digit, merged cell, or multi-line header can create or hide an inconsistency, and the tolerances let small genuine errors pass. Total, percentage, and sample-size columns are found by header keywords, so an unlabelled or non-English total is missed and a mis-identified total can be falsely flagged. The statistical-data gate skips mostly-text tables rather than checking them. Percentages that legitimately exceed 100 (overlapping categories, multiple-response items) are a known source of false positives. Granularity and distributional checks are handled by indicators T2, T3, and T4; T1 covers only plain arithmetic.
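Three of these limitations can be seen in a few lines of arithmetic. The helpers below are hypothetical and assume the 1 percent tolerance is taken relative to the reported total; the numbers are invented for illustration only.

```python
def within_relative(reported: float, computed: float, rel_tol: float = 0.01) -> bool:
    """True when reported and computed totals agree within the relative tolerance."""
    return abs(reported - computed) <= rel_tol * abs(reported)


def percent_row_ok(values: list[float], abs_tol: float = 1.5) -> bool:
    """True when a row of percentages sums to 100 within abs_tol points."""
    return abs(sum(values) - 100.0) <= abs_tol


# A genuine error of 8 on a total of 1000 is inside the 1 percent tolerance.
small_error_passes = within_relative(reported=1000, computed=992)

# A multiple-response item (respondents may tick several options) sums to 145.
multi_response_flagged = not percent_row_ok([62, 48, 35])

# OCR reads a correct total of 130 as 180, so a consistent row is flagged.
ocr_misread_flagged = not within_relative(reported=180, computed=50 + 80)
```

All three variables evaluate to `True`: the first case is a missed error, the other two are false positives.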

References

  1. Nuijten MB, Hartgerink CHJ, van Assen MALM, Epskamp S, Wicherts JM. (2016). The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods 48(4):1205-1226
  2. Brown NJL, Heathers JAJ. (2017). The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science 8(4):363-369
  3. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952