DEBIT Test
Checks binary proportions reported in the text for basic mathematical validity. A proportion written as a count over a total, such as 15 out of 30, is impossible if the count is negative, is not a whole number, or exceeds the total. The indicator finds these count-over-total expressions, filters out look-alikes such as dates, grant and regulation numbers, blood pressures, and ratios, and flags any remaining proportion that no real count could produce.
Technical description
S6 works in the spirit of the DEBIT (DEscriptive BInary Test) of Heathers and Brown, which checks that the mean m, standard deviation, and sample size N of a binary variable are mutually consistent: for a 0-or-1 variable the sample SD is fixed by the mean and N, SD = sqrt(N/(N-1) * m * (1-m)). From the text, S6 extracts proportions written as a fraction k/n or as a phrase k of n or k out of n, and verifies that k is a non-negative integer no greater than n. A count exceeding its total, a negative count, or a non-integer count is impossible for a binary outcome and is flagged. Look-alikes are excluded first: calendar dates, grant and funding identifiers, EU regulation and directive numbers, fractions whose components fall in the year range 1900 to 2100, fractions near legal-document keywords, and clinical or ratio contexts in which a first number larger than the second is normal. These contexts are blood pressure (a fraction followed by mmHg or near systolic/diastolic/BP), Snellen visual acuity (near vision/acuity, where 20/15 is valid), and explicit ratios (near the bare word odds, or near hazard ratio, aspect ratio, or ratio). In addition, any pre-extracted (mean, SD, N) triplet whose label explicitly marks a binary or dichotomous variable is checked against the full DEBIT relation: a reported SD is flagged when no integer count of ones reproduces it together with the reported mean, each at its stated precision.
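The mean-and-SD relation above can be sketched in a few lines. This is a minimal illustration, not the indicator's own code: the function names debit_consistent and within_rounding are hypothetical, a single decimals argument stands in for the stated precision of both mean and SD, and a reported value is assumed to match when the computed value lies within half a unit of the last reported decimal place.

```python
import math


def within_rounding(computed: float, reported: float, decimals: int) -> bool:
    """True if `computed` could round to `reported` at the given precision."""
    half_unit = 0.5 * 10 ** (-decimals)
    # Small tolerance guards against floating-point noise at the boundary.
    return abs(computed - reported) <= half_unit + 1e-12


def debit_consistent(mean: float, sd: float, n: int, decimals: int = 2) -> bool:
    """Hypothetical sketch: does any integer count of ones reproduce mean and SD?"""
    if n < 2:
        return False  # the sample SD is undefined for fewer than two observations
    for ones in range(n + 1):
        m = ones / n
        s = math.sqrt(n / (n - 1) * m * (1 - m))  # SD implied by the mean and N
        if within_rounding(m, mean, decimals) and within_rounding(s, sd, decimals):
            return True
    return False
```

For example, with N = 20 and a reported mean of 0.30, the only compatible count is 6 ones, which implies an SD of about 0.470; a reported SD of 0.47 passes and a reported SD of 0.52 does not.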
How it works
Layer 1 (deterministic): the text is scanned for the fraction form k/n and the phrase forms k of n and k out of n. Fractions are screened against the date, grant, regulation, year, legal-keyword, blood-pressure, Snellen, and ratio exclusions (the ratio set includes the bare word odds, so odds written as 3/1 are not flagged). Survivors are passed to the shared debit_test, which returns false when k is negative, non-integer, or greater than n. Each (mean, SD, N) triplet whose label explicitly marks a binary or dichotomous variable is also passed to debit_mean_sd_test(mean, sd, n), which returns false when no integer count of ones reproduces both the reported mean and SD at their stated precision. The number of failures maps to the score: zero failures give 0.0, exactly one gives 3.0, and two or more give 4.5 (capped at 5.0). Each failure is an error-severity finding naming the offending proportion; metadata records proportions found, total failures, binary_triplets_checked, and binary_sd_failures.
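The scan-screen-check pipeline can be illustrated as follows. This is a rough sketch under stated assumptions, not the indicator's implementation: the regular expressions, the 30-character context window, and the names count_is_valid and impossible_proportions are all hypothetical, and only the blood-pressure, Snellen, ratio, and year-range exclusions are shown (dates, grant identifiers, regulation numbers, and legal keywords are omitted for brevity).

```python
import re

# Hypothetical patterns; the indicator's real expressions are not given in the text.
FRACTION = re.compile(r"(?<![\w./-])(-?\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)(?![\w/]|\.\d)")
PHRASE = re.compile(r"(?<![\w.-])(-?\d+(?:\.\d+)?) (?:out of|of) (\d+(?:\.\d+)?)\b")
EXCLUDE = re.compile(r"mmhg|systolic|diastolic|\bbp\b|vision|acuity|odds|ratio",
                     re.IGNORECASE)
WINDOW = 30  # assumed context window, in characters


def count_is_valid(k: float, n: float) -> bool:
    """A count must be a non-negative whole number no greater than its total."""
    return k >= 0 and float(k).is_integer() and k <= n


def impossible_proportions(text: str) -> list[str]:
    """Return the count-over-total expressions that no real count could produce."""
    flagged = []
    for pattern, is_fraction in ((FRACTION, True), (PHRASE, False)):
        for match in pattern.finditer(text):
            k, n = float(match.group(1)), float(match.group(2))
            if is_fraction:
                # Only the fraction form is screened against look-alike contexts.
                context = text[max(0, match.start() - WINDOW): match.end() + WINDOW]
                looks_like_year = 1900 <= k <= 2100 or 1900 <= n <= 2100
                if EXCLUDE.search(context) or looks_like_year:
                    continue
            if not count_is_valid(k, n):
                flagged.append(match.group(0))
    return flagged
```

Under these assumptions, "Responders were 35/30 in the treated arm." yields one flag, while "Blood pressure was 120/80 mmHg." and "Overall, 12 of 40 patients relapsed." yield none.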
Why this matters
A count that exceeds its own total is a hard impossibility, not a matter of degree, and is one of the clearest single-number signs that a result was fabricated or mis-transcribed. The DEBIT framework of Heathers and Brown showed that binary summary statistics carry strong internal constraints that fabricators routinely violate, because the mean, standard deviation, and counts of a 0-or-1 variable are tightly linked and hard to fake jointly. The same arithmetic logic powers the GRIM granularity test on means, and forensic re-analysis of clinical trials treats impossible summary statistics, including counts that cannot fit their denominators, as a primary signal of fabrication. Suppressing the common false-positive classes keeps a flag interpretable as a genuine numerical impossibility.
Score thresholds
- 0: Every detected binary proportion is valid, with the count between zero and its total.
- 3: Exactly one impossible proportion, for example a count larger than its stated total.
- 4-5: Two or more impossible proportions, consistent with fabricated or mis-transcribed counts.
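As a small sketch of the mapping from failure count to score, using the values stated in How it works (0.0, 3.0, and 4.5 with a 5.0 cap); the function name debit_score is hypothetical:

```python
def debit_score(failures: int) -> float:
    """Map the number of impossible proportions to the indicator score."""
    if failures <= 0:
        return 0.0
    if failures == 1:
        return 3.0
    return min(4.5, 5.0)  # two or more failures; the score never exceeds the cap
```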
Limitations
S6 tests the count-validity part of the descriptive binary idea and the full mean-and-standard-deviation consistency form of DEBIT. The latter fires only when a (mean, SD, N) triplet's label explicitly marks the variable as binary or dichotomous, because the mean-and-SD test is sound only for a known-binary variable; a continuous proportion must not be treated as a 0-or-1 variable. A binary variable whose label lacks an explicit marker is therefore left untested by the mean-and-SD form. The count check depends on proportions being written as an explicit count over a total in the text; a proportion given only as a percentage or split across a table is handled by the arithmetic and table indicators. The exclusion lists are heuristic, so an unusual date, identifier, or ratio can slip through, and a genuinely impossible proportion written immediately next to a blood-pressure or ratio term can be suppressed; the context windows are deliberately short to limit this. The granularity tests on means and standard deviations are S3, S4, and S5, and table arithmetic consistency is S1.
References
- Heathers JAJ, Brown NJL. (2019). Using Statistics from Binary Variables to Detect Data Anomalies, Even Possibly Fraudulent Research. Psychology Research and Applications 1(4)
- Brown NJL, Heathers JAJ. (2017). The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology. Social Psychological and Personality Science 8(4):363-369
- Jung L. (2025). scrutiny: Error Detection in Science (R package version 0.6.1). Comprehensive R Archive Network (CRAN)
- Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380
- Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
- Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
- Hunter KE, Aberoumand M, Libesman S, et al. (2024). The Individual Participant Data Integrity Tool for assessing the integrity of randomised trials. Research Synthesis Methods 15(6):917-939
- Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512