Benford Test
Tests whether the leading digits of the numbers in a chart follow the pattern expected of naturally occurring data (Benford's Law). Fabricated numbers often have too many values starting with 5, 6, or 7, because people tend to spread leading digits too evenly instead of favoring small ones.
Technical description
OCR-extracts the numbers from the chart and removes axis tick labels: a number left of the detected y-axis line or below the x-axis line (with a 12% margin fallback) is an assigned scale value, not measured data, and does not obey Benford. On the remaining positive values, tests the first significant digit against Benford's Law, under which digit d occurs with probability log10(1 + 1/d). Runs only when at least twenty data values remain and the max/min ratio is at least ten. Grades the mean absolute deviation (MAD) from Benford on Nigrini's bands: below 0.006 is close conformity (score 0.0), below 0.012 acceptable (1.0), below 0.015 marginal (2.5), and 0.015 or above nonconforming (4.0). Adds 1.0 when a chi-squared test has p < 0.01. The score is capped at 5.0.
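The grading described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the indicator's actual implementation: the names `BENFORD`, `first_digit`, and `benford_score` are hypothetical, the input is assumed to be the data values that remain after tick removal, and the chi-squared p-value uses the closed-form upper tail for 8 degrees of freedom (nine digit classes minus one) so that no external library is needed.

```python
import math
from collections import Counter

# Expected first-digit probabilities under Benford's Law: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """First significant digit of a positive number (0.0042 -> 4)."""
    # Scientific notation always starts with the leading nonzero digit.
    return int(f"{x:.16e}"[0])


def benford_score(values):
    """Return (score, mad, p_value), or None when the guards fail.

    Hypothetical helper: mirrors the thresholds stated in the text.
    """
    data = [v for v in values if v > 0]
    if len(data) < 20 or max(data) / min(data) < 10:
        return None  # too few values, or range narrower than one decade

    n = len(data)
    counts = Counter(first_digit(v) for v in data)

    # Mean absolute deviation between observed and expected proportions.
    mad = sum(abs(counts.get(d, 0) / n - p) for d, p in BENFORD.items()) / 9

    # Pearson chi-squared statistic on counts, 8 degrees of freedom.
    chi2 = sum((counts.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in BENFORD.items())
    h = chi2 / 2
    # Upper-tail probability; this closed form holds for 8 degrees of freedom.
    p_value = math.exp(-h) * (1 + h + h**2 / 2 + h**3 / 6)

    # Nigrini's first-digit MAD bands mapped to the scores in the text.
    if mad < 0.006:
        score = 0.0
    elif mad < 0.012:
        score = 1.0
    elif mad < 0.015:
        score = 2.5
    else:
        score = 4.0
    if p_value < 0.01:
        score += 1.0  # chi-squared bonus
    return min(score, 5.0), mad, p_value
```

For example, `benford_score([5, 50, 500] * 10)` grades thirty values that all begin with 5, while a list of only three values returns `None` because the twenty-value guard fails.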
How it works
Layer 1 (deterministic). OCR-reads all numbers, removes axis tick labels by position, and on the remaining positive data values (at least twenty, spanning an order of magnitude) compares the first-digit distribution to Benford via the mean absolute deviation and a chi-squared test. Grades the MAD on Nigrini's bands, adds a chi-squared bonus, caps at 5.0, and reports the MAD, chi-squared p-value, and per-digit distribution.
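The positional tick-label filter might look like the sketch below. Everything in it is an assumption made for illustration: the `OcrNumber` record, the `drop_axis_ticks` function, the image-coordinate convention (origin at the top left, y growing downward), and in particular the reading of the "12% margin fallback" as "when an axis line is not detected, treat the outer 12% of the image on that side as the axis-label region". The source does not spell out these details.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OcrNumber:
    """Hypothetical OCR token: a parsed number and its centre in pixels."""
    value: float
    x: float  # horizontal centre, origin at the left edge
    y: float  # vertical centre, origin at the top edge


def drop_axis_ticks(
    tokens: List[OcrNumber],
    width: float,
    height: float,
    y_axis_x: Optional[float] = None,  # x position of the detected y-axis line
    x_axis_y: Optional[float] = None,  # y position of the detected x-axis line
    margin: float = 0.12,              # assumed meaning of the 12% fallback
) -> List[OcrNumber]:
    """Keep only numbers inside the plot area, i.e. likely data values."""
    # Fall back to a fixed margin when an axis line was not detected.
    left = y_axis_x if y_axis_x is not None else margin * width
    bottom = x_axis_y if x_axis_y is not None else (1 - margin) * height
    # Anything left of the y-axis or below the x-axis is treated as a tick.
    return [t for t in tokens if t.x > left and t.y < bottom]
```

The surviving tokens' positive values would then go on to the sample-size and range guards before any digit counting happens.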
Why this matters
In many naturally occurring datasets the leading digit is far from uniform: about 30% of values begin with 1 and only about 5% with 9. Data from a multiplicative process spanning several orders of magnitude inherits this pattern, while some fabricated numbers, distributed too evenly by a human, deviate measurably. Nigrini turned Benford's Law into a practical fraud-detection method with mean-absolute-deviation conformity bands, used here to grade a chart's plotted numbers. The test is meaningful only on measured data spanning a wide range, which is why axis ticks are removed and a range guard is applied.
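A small simulation illustrates the contrast. It is a sketch under simplifying assumptions: "natural" data is modelled as log-uniform over four orders of magnitude, which follows Benford's Law, and "too even" data is modelled as values whose leading digit is drawn uniformly from 1 to 9. Real fabricated data is rarely this clean, and the helper name `leading_digit_shares` is hypothetical.

```python
import math
import random
from collections import Counter


def leading_digit_shares(values):
    """Share of values starting with each digit 1..9."""
    counts = Counter(int(f"{v:.16e}"[0]) for v in values)
    n = len(values)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


rng = random.Random(0)  # fixed seed so the run is repeatable

# Multiplicative-style data: log-uniform over four orders of magnitude.
natural = [10 ** rng.uniform(0, 4) for _ in range(100_000)]

# "Too even" data: every leading digit is equally likely.
even = [rng.randint(1, 9) * 10 ** rng.randint(0, 3) for _ in range(100_000)]

natural_shares = leading_digit_shares(natural)
even_shares = leading_digit_shares(even)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}  Benford {expected:.3f}  "
          f"natural {natural_shares[d]:.3f}  even {even_shares[d]:.3f}")
```

In the log-uniform sample the share of values starting with 1 should land close to log10(2), about 0.301, whereas the evenly spread sample sits near 1/9 for every digit.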
Score thresholds
- 0-1: The leading digits follow Benford closely or acceptably, consistent with natural data
- 2-3: Marginal conformity; the distribution departs from Benford but not decisively
- 4-5: Nonconformity, reinforced at the top of the range by a significant chi-squared result
Limitations
A screen, not proof, with well-documented limits. The test needs a large sample for power. At the twenty-value floor that a chart can realistically reach, it is underpowered, and the ordinary chi-squared test performs poorly on very small samples, so passing is weak evidence of authenticity. The law does not apply to narrow-range, assigned, or bounded numbers (percentages, years), and the range guard only partly screens these out. First-digit conformity is also a weak fabrication signal: fabricators often reproduce the first-digit decline while failing on later digits. Most ordinary bar and line charts do not provide twenty data values spanning an order of magnitude, so the test often does not run. Second-digit, last-digit, and mean-and-SD plausibility checks live in sibling indicators.
References
- Benford F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society 78(4):551-572
- Nigrini MJ. (2012). Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. John Wiley & Sons
- Diekmann A. (2007). Not the First Digit! Using Benford's Law to Detect Fraudulent Scientific Data. Journal of Applied Statistics 34(3):321-329
- Cerasa A. (2022). Testing for Benford's Law in very small samples: Simulation study and a new test proposal. PLOS ONE 17(7):e0271969