ResAIKit
Research Integrity Toolkit
R2 · Statistical analysis · Methodological Coherence · Layer 1 (Deterministic)

Sample Size Consistency

Checks whether the sample size is reported consistently across the abstract, methods, results, and tables, catching discrepancies that suggest data manipulation or sloppy reporting.

Technical description

R2 checks that the reported sample size is consistent across an article's sections and tables. It extracts candidate N values from each section using patterns for the common phrasings (N equals a number; a number followed by participants, patients, or subjects; a stated sample size; an enrolled count) and from any table column headed N, sample size, or count. Within each section it treats the largest value as the reported total, on the assumption that smaller values are per-arm or subgroup counts, and compares totals between the abstract, methods, and results, and between each section and the tables. A drift above twenty percent counts as a serious inconsistency and a smaller non-zero drift as a minor one. Separately, an analysed count that exceeds the enrolled count is flagged as impossible. R2 also reconciles the participant flow: the enrolled count minus the reported losses (exclusions, withdrawals, dropouts) should equal the analysed count, and a flow that does not add up is flagged.
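The extraction step described above can be sketched as follows. This is a minimal illustration, not the toolkit's actual code: the regular expressions, the names `N_PATTERNS`, `extract_n_values`, `section_total`, and `table_values`, and the dictionary shape used for tables are all assumptions made for the example.

```python
import re

# Hypothetical patterns for the phrasings named in the text; the real
# expressions used by R2 are not given in this entry.
N_PATTERNS = [
    re.compile(r"\bN\s*=\s*(\d[\d,]*)", re.IGNORECASE),
    re.compile(r"\b(\d[\d,]*)\s+(?:participants|patients|subjects)\b", re.IGNORECASE),
    re.compile(r"\bsample size (?:of|was|is)\s+(\d[\d,]*)", re.IGNORECASE),
    re.compile(r"\benrolled\s+(\d[\d,]*)", re.IGNORECASE),
]

# Table headers treated as sample-size columns (compared case-insensitively).
TABLE_HEADERS = {"n", "sample size", "count"}


def extract_n_values(text):
    """Return the set of candidate N values found in one section's text."""
    values = set()
    for pattern in N_PATTERNS:
        for match in pattern.finditer(text):
            # Drop thousands separators (and any trailing comma) before converting.
            values.add(int(match.group(1).replace(",", "")))
    return values


def section_total(text):
    """Treat the largest candidate as the section total; None if nothing matched."""
    values = extract_n_values(text)
    return max(values) if values else None


def table_values(columns):
    """Collect cells from columns headed N, sample size, or count.

    `columns` maps a header string to a list of integer cells; this table
    shape is assumed for the example.
    """
    values = set()
    for header, cells in columns.items():
        if header.strip().lower() in TABLE_HEADERS:
            values.update(cells)
    return values


abstract = "We enrolled 120 participants (N = 120), of whom 60 patients received the drug."
print(extract_n_values(abstract))
print(section_total(abstract))
```

In this example the per-arm count of 60 is captured alongside the total of 120, and taking the maximum selects 120 as the section total.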

How it works

Layer 1 (deterministic): patterns extract a set of N values per section, and the largest is taken as the section total. For the pairs abstract-methods, abstract-results, methods-results, and each section against the tables, the drift between totals is the absolute difference divided by the larger value. A drift above twenty percent adds an error finding and raises the score to at least 4.0; a smaller positive drift adds a warning and raises the score to at least 2.0; equal totals are skipped. Enrollment sections (abstract, methods, study design) and analysis sections (results, findings) are pooled, and any analysed N exceeding the maximum enrolled N adds an error and raises the score to at least 4.0. The score is the maximum across checks, capped at 5.0. The check returns zero when there are no sections, no N values, or only undivided full text. When both an enrolled total and an analysed total are present and the analysed total does not exceed the enrolled total, the attrition rate (enrolled minus analysed, divided by the enrolled total) is reported as a participant-flow diagnostic. The check also extracts the reported losses and verifies that enrolled minus losses equals analysed; a flow that does not reconcile adds a warning, and the total lost and the reconciliation flag are recorded in the metadata.
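The scoring rules above can be sketched in a few functions. This is an illustrative reading of the description, not the toolkit's implementation: the function names (`drift`, `score_drift`, `impossible_counts`, `participant_flow`), the tuple shape of a finding, and the returned dictionary keys are hypothetical.

```python
from itertools import combinations


def drift(a, b):
    """Absolute difference between two totals, divided by the larger one."""
    return abs(a - b) / max(a, b)


def score_drift(totals, threshold=0.20):
    """Compare every pair of totals and return (findings, score).

    `totals` maps a source name ("abstract", "methods", "results", "tables")
    to its total N, or None when nothing was extracted for that source.
    """
    findings, score = [], 0.0
    present = {name: n for name, n in totals.items() if n}
    for (name_a, a), (name_b, b) in combinations(present.items(), 2):
        if a == b:
            continue  # equal totals are skipped
        d = drift(a, b)
        if d > threshold:
            findings.append(("error", name_a, name_b, round(d, 3)))
            score = max(score, 4.0)
        else:
            findings.append(("warning", name_a, name_b, round(d, 3)))
            score = max(score, 2.0)
    return findings, min(score, 5.0)  # capped at 5.0


def impossible_counts(enrolled_values, analysed_values):
    """Return analysed N values that exceed the maximum enrolled N."""
    if not enrolled_values or not analysed_values:
        return []
    max_enrolled = max(enrolled_values)
    return sorted(n for n in analysed_values if n > max_enrolled)


def participant_flow(enrolled, analysed, losses):
    """Attrition rate and reconciliation flag for one participant flow."""
    total_lost = sum(losses)
    return {
        # Attrition is only reported when analysed does not exceed enrolled.
        "attrition_rate": (enrolled - analysed) / enrolled if analysed <= enrolled else None,
        "total_lost": total_lost,
        "reconciles": enrolled - total_lost == analysed,
    }


totals = {"abstract": 120, "methods": 120, "results": 90, "tables": 118}
findings, score = score_drift(totals)
print(score, findings)
print(participant_flow(enrolled=120, analysed=90, losses=[10, 15]))
```

With these inputs the abstract and results totals differ by 30 out of 120 (a drift of 0.25, above the threshold), so the score is raised to 4.0, while the tables total of 118 against 120 yields only a warning. The flow example does not reconcile, because 120 minus 25 reported losses leaves 95 rather than the 90 analysed.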

Why this matters

Accounting consistently for every participant is a core reporting requirement and a sensitive integrity signal. CONSORT requires reconciling the numbers enrolled, allocated, and analysed: an unexplained shift in the denominator between sections obscures attrition and can hide selective exclusion, and a changing denominator undermines every statistic computed from it. Within-paper numerical inconsistencies are known to be pervasive, which motivates a mechanical cross-section check of the most basic number a study reports. An analysed count larger than the enrolled count is the strongest form of inconsistency, because it is arithmetically impossible.

Score thresholds

0: Sample-size totals agree across sections and tables, or none could be extracted
2: A minor drift in the reported total between sections or tables
4-5: A drift above twenty percent, or an analysed count exceeding the enrolled count

Limitations

Extraction is pattern-based, so an N in an unanticipated phrasing is missed and a number matching a pattern but not a sample size (an event count, a dosage followed by patients) can be captured in error. Taking the largest value as the total fails when a section reports only subgroup counts or when the largest number is not the sample size. Legitimate attrition produces a real enrolled-versus-analysed difference, reported as a minor drift rather than an error, so a flag prompts checking the participant flow rather than proving misconduct. The check needs the document segmented into sections; undivided text is skipped. Whether reported statistics are consistent with the sample size is the domain of the granularity indicators; R2 focuses on the agreement of the sample size itself across the paper.

References

  1. Schulz KF, Altman DG, Moher D (CONSORT Group). (2010). CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ
  2. Strasak AM, Zaman Q, Pfeiffer KP, Goebel G, Ulmer H. (2007). Statistical errors in medical research: a review of common pitfalls. Swiss Medical Weekly
  3. Nuijten MB, Hartgerink CHJ, van Assen MALM, Epskamp S, Wicherts JM. (2016). The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods
  4. Bordewijk EM, Wang R, Askie LM, et al. (2020). Data integrity of 35 randomised controlled trials in women's health. European Journal of Obstetrics & Gynecology and Reproductive Biology 249:72-83
  5. Mansournia MA, Collins GS, Nielsen RO, Nazemipour M, Jewell NP, Altman DG, Campbell MJ. (2021). CHecklist for statistical Assessment of Medical Papers: the CHAMP statement. British Journal of Sports Medicine 55(18):1002-1003
  6. Parker L, Boughton S, Lawrence R, Bero L. (2022). Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology 151:1-17
  7. Carlisle JB. (2021). False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia 76(4):472-479
  8. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
  9. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380