ResAIKit
Research Integrity Toolkit
G9-img · Image forensics · Chart Analysis · Layer 1 (Deterministic)

Benford Test

Tests whether the leading digits of the plotted numbers in a chart follow Benford's Law, the logarithmic distribution in which about 30% of natural values begin with a 1 and only about 5% begin with a 9. Axis tick labels are removed first, because they are assigned, rounded values that do not obey Benford. A significant deviation can indicate fabricated numbers. It works from optical character recognition (OCR) of the plotted numbers, with no model.

Technical description

G9 is a deterministic screen that applies Benford's Law to the numbers a chart plots. Benford's Law states that in many naturally occurring datasets the first significant digit d appears with probability log10(1 + 1/d), so 1 is the most common leading digit and 9 the least. Data that arises from a natural or semi-random process spanning several orders of magnitude tends to follow this distribution; certain fabricated numbers do not. The indicator extracts the numbers by OCR, removes the axis tick labels (which are assigned, rounded scale values rather than measured data and would distort the test), keeps the positive values, and compares their first-digit distribution to Benford using the mean absolute deviation (MAD) and a chi-squared test. The deviation maps to a 0 to 5 score. It requires the image to be at least 32 by 32 pixels, at least twenty data values after axis labels are removed, and a value range spanning at least one order of magnitude.
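As a quick illustration of the formula above, the expected first-digit probabilities can be tabulated directly. This is a minimal sketch using only the Python standard library; the name BENFORD is chosen here for illustration and is not taken from the toolkit.

```python
import math

# Benford first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9.
# BENFORD is an illustrative name, not an identifier from the toolkit.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in BENFORD.items():
    print(f"{d}: {p:.3f}")
```

The nine probabilities sum to 1 because the terms log10((d + 1)/d) telescope to log10(10).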

How it works

The indicator runs deterministically at Layer 1 using extract_numbers (OCR) and detect_axes.

The first step removes axis tick labels. A number is treated as an axis tick label when its bounding-box center (x + w/2, y + h/2) lies to the left of the detected y-axis line or below the detected x-axis line, with the fallback tests center_x <= 0.12 · W and center_y >= 0.88 · H, where W and H are the image width and height. Such numbers are excluded, because axis ticks are assigned scale values that do not obey Benford's Law.
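The filter described above might look roughly like the following sketch. NumberBox, is_axis_label, and the parameter names are hypothetical, and the code assumes two things the text does not spell out: that the fallback fractions are used only when no axis line was detected, and that image coordinates grow downward, so "below" means a larger y.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumberBox:
    """One OCR-detected number and its bounding box (hypothetical structure)."""
    value: float
    x: float
    y: float
    w: float
    h: float


def is_axis_label(box: NumberBox, W: float, H: float,
                  y_axis_x: Optional[float] = None,
                  x_axis_y: Optional[float] = None) -> bool:
    """Return True if the box center falls in the axis-label region."""
    cx = box.x + box.w / 2
    cy = box.y + box.h / 2
    # Left of the detected y-axis, or (assumed) the 12% fallback margin.
    left = cx < y_axis_x if y_axis_x is not None else cx <= 0.12 * W
    # Below the detected x-axis (larger y), or (assumed) the 88% fallback line.
    below = cy > x_axis_y if x_axis_y is not None else cy >= 0.88 * H
    return left or below


boxes = [NumberBox(50, 20, 300, 40, 20), NumberBox(3.7, 500, 300, 40, 20)]
data = [b.value for b in boxes if not is_axis_label(b, 1000, 800, 100, 700)]
print(data)
```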

The test then runs only when at least twenty positive data values remain and the range spans at least one order of magnitude, v_max / v_min >= 10; otherwise the indicator returns zero and records the reason.
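A guard of this kind could be sketched as below. The function name, return shape, and reason strings are illustrative assumptions; only the two thresholds (twenty positive values, v_max / v_min >= 10) come from the description above.

```python
def check_preconditions(values, min_count=20, min_ratio=10.0):
    """Return (ok, reason) for the sample-size and range preconditions."""
    positives = [v for v in values if v > 0]  # only positive values are tested
    if len(positives) < min_count:
        return False, f"only {len(positives)} positive data values (need {min_count})"
    if max(positives) / min(positives) < min_ratio:
        return False, "value range spans less than one order of magnitude"
    return True, "ok"


print(check_preconditions([5, 6] * 10))
print(check_preconditions(list(range(1, 21))))
```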

For each value the first significant digit D in {1, ..., 9} is read from its scientific notation. Let O_d be the observed count of leading digit d, n = Σ_{d=1}^{9} O_d the total, and P(d) = log10(1 + 1/d) the Benford probability, so that the observed proportions are p_d = O_d / n and the expected counts are E_d = n · P(d). The deviation is the mean absolute deviation MAD = (1/9) Σ_{d=1}^{9} |p_d − P(d)|, graded on Nigrini's conformity bands: MAD < 0.006 is close conformity and scores 0.0, MAD < 0.012 is acceptable and scores 1.0, MAD < 0.015 is marginal and scores 2.5, and MAD >= 0.015 is nonconformity and scores 4.0. A Pearson chi-squared test of the observed against the expected counts, χ² = Σ_{d=1}^{9} (O_d − E_d)² / E_d, adds 1.0 when its p-value is below 0.01.
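The statistics above can be sketched in a few lines of standard-library Python. Function names are illustrative. The chi-squared p-value assumes 8 degrees of freedom (nine digit classes, no estimated parameters), which the text does not state explicitly; for an even number of degrees of freedom the survival function has the closed form used here, so no statistics library is needed.

```python
import math
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(v):
    """First significant digit of a positive number, via scientific notation."""
    return int(f"{v:.15e}"[0])


def benford_stats(values):
    """Return (observed counts O_1..O_9, MAD, chi-squared, p-value)."""
    counts = Counter(first_digit(v) for v in values)
    observed = [counts.get(d, 0) for d in range(1, 10)]
    n = sum(observed)
    mad = sum(abs(o / n - p) for o, p in zip(observed, BENFORD)) / 9
    chi2 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, BENFORD))
    # Survival function of chi-squared with 8 degrees of freedom (assumed):
    # exp(-x/2) * sum_{i=0..3} (x/2)^i / i!
    half = chi2 / 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(4))
    return observed, mad, chi2, p_value


# A sample whose leading digits are uniform rather than Benford-distributed.
uniform_digits = [d * 10 for d in range(1, 10)] * 20
observed, mad, chi2, p_value = benford_stats(uniform_digits)
print(observed, round(mad, 4), p_value < 0.01)
```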

The score is capped at 5.0. The metadata records the data-value count n, the total numbers found and how many were excluded as axis labels, the MAD, the chi-squared p-value, and the per-digit counts O_d.
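The banding, the chi-squared increment, and the cap combine as in this sketch. The function name is hypothetical; the thresholds and score values are the ones given above.

```python
def benford_score(mad, p_value):
    """Map MAD and the chi-squared p-value to the 0 to 5 score."""
    if mad < 0.006:
        base = 0.0   # close conformity
    elif mad < 0.012:
        base = 1.0   # acceptable conformity
    elif mad < 0.015:
        base = 2.5   # marginal conformity
    else:
        base = 4.0   # nonconformity
    bonus = 1.0 if p_value < 0.01 else 0.0
    return min(base + bonus, 5.0)


print(benford_score(0.013, 0.20), benford_score(0.020, 0.001))
```

Under these rules the reachable scores are 0, 1, 2, 2.5, 3.5, 4, and 5.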

Score thresholds

Score     Meaning
0 to 1    The leading digits follow Benford closely or acceptably. Consistent with natural data.
2 to 3    Marginal conformity: the distribution departs from Benford but not decisively.
4 to 5    Nonconformity: the leading digits do not match Benford. A score of 5 means the chi-squared test is also significant.

Why this matters

Benford's Law has been a forensic tool for decades. Newcomb and then Benford observed that leading digits in natural data are far from uniform, with 1 appearing about thirty percent of the time [1], and Nigrini turned the law into a practical fraud-detection method for accounting and auditing, defining the mean-absolute-deviation conformity bands this indicator uses to grade a distribution from close to nonconforming [2]. Applied to a chart, the test asks whether the plotted numbers carry the leading-digit signature of a natural process or the flatter, rounder signature that some fabricated data shows. Two preconditions make the test meaningful and are enforced directly: the data must span a wide enough range (the indicator requires at least one order of magnitude, although the law fits best across several), and it must be measured data rather than an assigned scale, which is why the axis ticks are removed before the digits are counted. The result is a model-free screen for a specific kind of digit anomaly, one signal among the chart-forensics suite rather than a verdict on its own.

Limitations

Benford's Law is a screen, not proof, and it has well-documented limits that bound this indicator. It needs a reasonably large sample to have power; for a chart that only just reaches the twenty-value floor the test is underpowered, and simulation work on very small samples shows that the ordinary chi-squared test in particular performs poorly there, so a passing result is weak evidence of authenticity [4]. The law does not apply to data within a narrow range, to assigned or bounded numbers such as percentages and years, or to distributions that are not the product of a multiplicative process, and the range guard only partly screens these out. First-digit conformity is also a comparatively weak fabrication signal: people fabricating numbers often reproduce the Benford-like decline of the first digit while failing on later digits, so first-digit analysis misses fabrication that gets the leading digit right [3]. The test needs at least twenty readable data values spanning an order of magnitude, which most ordinary bar and line charts do not provide. Second-digit and last-digit analyses, and mean-and-standard-deviation plausibility, live in sibling chart indicators, so G9 stays on the first-digit test.
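To get a feel for the small-sample behaviour, a short simulation can draw leading digits that are exactly Benford-distributed and look at the MAD they produce at different sample sizes. This is an illustrative sketch, not part of the indicator; the function names, trial counts, and seed are arbitrary choices. At n = 20 each observed proportion moves in steps of 0.05, so even genuine Benford draws tend to land above the 0.015 band, whereas large samples settle much closer to the expected distribution.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def mad_of_digits(digits):
    """Mean absolute deviation of observed digit proportions from Benford."""
    n = len(digits)
    return sum(abs(digits.count(d) / n - BENFORD[d - 1]) for d in range(1, 10)) / 9


def simulate(n, trials=2000, seed=0):
    """Return (mean MAD, share of trials with MAD >= 0.015) for Benford draws."""
    rng = random.Random(seed)
    mads = [
        mad_of_digits(rng.choices(range(1, 10), weights=BENFORD, k=n))
        for _ in range(trials)
    ]
    return sum(mads) / trials, sum(m >= 0.015 for m in mads) / trials


mean_small, share_small = simulate(20)
mean_large, share_large = simulate(2000, trials=200)
print(f"n=20:   mean MAD {mean_small:.4f}, share >= 0.015: {share_small:.2f}")
print(f"n=2000: mean MAD {mean_large:.4f}, share >= 0.015: {share_large:.2f}")
```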

Theoretical background

Benford's Law follows from scale invariance: if a dataset spans many orders of magnitude and is the product of a multiplicative process, the fractional parts of the base-10 logarithms of its values are roughly uniformly distributed, which makes the leading digit logarithmically distributed with P(d) = log10(1 + 1/d). Genuine measurements of this kind inherit the pattern; numbers invented without it, or assigned on a fixed scale, do not. G9 measures the gap between the observed leading-digit proportions and this expectation with the mean absolute deviation, graded on Nigrini's empirical conformity bands, and corroborates large gaps with a chi-squared test. The decisive modelling choice is the input: because a chart's axis is an assigned, evenly spaced scale rather than measured data, its tick labels are removed so that only the plotted values are tested. The signal is a property of the data's digits rather than a learned fingerprint, which keeps the screen independent of how the chart was produced.
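The scale-invariance argument can be checked numerically. The sketch below draws values whose base-10 logarithms are uniform over six decades, rescales them by an arbitrary constant (2.54, as in a unit change), and contrasts both with values drawn uniformly from a narrow range. All names, ranges, and sample sizes are illustrative assumptions.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(v):
    """First significant digit of a positive number, via scientific notation."""
    return int(f"{v:.15e}"[0])


def mad(values):
    """MAD between observed first-digit proportions and Benford."""
    n = len(values)
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    return sum(abs(c / n - p) for c, p in zip(counts, BENFORD)) / 9


rng = random.Random(0)
wide = [10 ** rng.uniform(0, 6) for _ in range(50_000)]     # log-uniform, 6 decades
scaled = [2.54 * v for v in wide]                           # same data, new unit
narrow = [rng.uniform(100, 999) for _ in range(50_000)]     # under one decade

mad_wide, mad_scaled, mad_narrow = mad(wide), mad(scaled), mad(narrow)
print(f"wide {mad_wide:.4f}  scaled {mad_scaled:.4f}  narrow {mad_narrow:.4f}")
```

The wide and rescaled samples should both sit within the close-conformity band, while the narrow-range sample, whose leading digits are nearly uniform, should not.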

References

  1. Benford F. The law of anomalous numbers. Proceedings of the American Philosophical Society. 1938;78(4):551-572. https://www.jstor.org/stable/984802
  2. Nigrini MJ. Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Hoboken (NJ): John Wiley & Sons; 2012. https://www.wiley.com/en-us/Benford's+Law:+Applications+for+Forensic+Accounting,+Auditing,+and+Fraud+Detection-p-9781118152850
  3. Diekmann A. Not the First Digit! Using Benford's Law to Detect Fraudulent Scientific Data. Journal of Applied Statistics. 2007;34(3):321-329. DOI: 10.1080/02664760601004940
  4. Cerasa A. Testing for Benford's Law in very small samples: Simulation study and a new test proposal. PLOS ONE. 2022;17(7):e0271969. DOI: 10.1371/journal.pone.0271969