ResAIKit
Research Integrity Toolkit
G3-img · Image forensics · Chart Analysis · Layer 1 (Deterministic)

Scale/Axis Incoherence

Reads the numeric labels off a chart's axes and screens them for the axis manipulations catalogued in the misleading-visualization literature: non-monotone axes, chaotic tick spacing, labels that do not match tick positions, inverted axes that reverse the perceived trend, a truncated (non-zero baseline) value axis on a bar chart, and dual y-axes. It works from optical character recognition (OCR) and axis geometry alone, with no model.

Technical description

G3 is a deterministic, generator-agnostic screen for charts whose axis scale does not behave the way a genuine plotting library would draw it. It detects the axis lines and tick positions, runs OCR to read the numeric label at each tick, and tests the resulting (pixel position, value) pairs against six structural expectations of a real numeric axis: values run in one direction (monotonicity), gaps are regular in linear or logarithmic space (spacing), pixel position maps cleanly to value (transfer function), orientation follows convention (inversion), the value axis on a bar chart starts at zero (truncation), and a single chart does not silently carry two different y-scales (dual axis). Each violation adds a fixed contribution to a score that is capped at 5. The indicator requires an image of at least 32 by 32 pixels, and it analyses an axis only when at least two numeric labels are read from it.

How it works

The indicator runs deterministically at Layer 1 using detect_axes, extract_axis_values (which OCRs the label at each tick), detect_bars, and extract_numbers. For each axis it works from the numeric tick labels v_1, ..., v_m paired with their pixel positions p_1, ..., p_m.

The monotonicity test requires the labels to run in one direction. The sequence is monotone when v_i <= v_{i+1} for all i (increasing) or v_i >= v_{i+1} for all i (decreasing). A non-monotone sequence contributes 2.5 at error severity; the x-axis additionally requires at least three distinct values before it can fire, because OCR can read spurious numbers from categorical labels.
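The monotonicity rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the toolkit's code: the function names and the `axis` argument are hypothetical, and only the 2.5 contribution, the two-label minimum, and the three-distinct-values gate for the x-axis come from the description above.

```python
def is_monotone(values):
    """True when the labels never change direction (ties are allowed)."""
    pairs = list(zip(values, values[1:]))
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    return increasing or decreasing


def monotonicity_contribution(values, axis="y"):
    """Hypothetical helper: 2.5 for a non-monotone axis, otherwise 0.0."""
    if len(values) < 2:
        return 0.0  # fewer than two numeric labels: the axis is not analysed
    if axis == "x" and len(set(values)) < 3:
        return 0.0  # x-axis gate: needs at least three distinct values
    return 0.0 if is_monotone(values) else 2.5


print(monotonicity_contribution([0, 10, 5, 30]))        # non-monotone y-axis
print(monotonicity_contribution([1, 2, 1], axis="x"))   # only two distinct values
```

The x-axis gate means a pair of stray numbers read from categorical labels cannot trigger the error on its own.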

The spacing-regularity test examines the consecutive differences d_i = v_{i+1} − v_i. Their coefficient of variation is CV = σ(d) / |μ(d)|, the standard deviation of the gaps over their mean. A logarithmic axis is also admitted by computing the same CV on the differences of log10(v_i), in either direction. The axis is flagged only when both the linear CV >= 0.1 and the log-space CV >= 0.1 (or the log test is undefined for non-positive values), contributing 1.5 at warning severity.
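A minimal sketch of the spacing rule follows. It assumes the population standard deviation for σ (the description does not say whether the population or sample form is used) and treats an undefined linear CV as "not flagged"; both choices, along with the function names, are assumptions made for illustration.

```python
import math
import statistics


def gap_cv(values):
    """Coefficient of variation of consecutive gaps, or None if undefined."""
    gaps = [b - a for a, b in zip(values, values[1:])]
    if not gaps:
        return None
    mean = statistics.fmean(gaps)
    if mean == 0:
        return None  # CV is undefined when the mean gap is zero
    return statistics.pstdev(gaps) / abs(mean)  # assumption: population sigma


def spacing_contribution(values, threshold=0.1):
    """Hypothetical helper: 1.5 when neither linear nor log spacing is regular."""
    linear_cv = gap_cv(values)
    if linear_cv is None or linear_cv < threshold:
        return 0.0  # regular on a linear scale (or not measurable)
    if all(v > 0 for v in values):
        log_cv = gap_cv([math.log10(v) for v in values])
        if log_cv is not None and log_cv < threshold:
            return 0.0  # regular on a logarithmic scale
    return 1.5  # irregular in both spaces, or log test undefined


print(spacing_contribution([1, 10, 100, 1000]))  # log axis passes
print(spacing_contribution([0, 10, 15, 40]))     # irregular, log undefined
```

Taking the absolute value of the mean gap is what lets a descending axis pass the same test as an ascending one.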

The transfer-function test checks that the labels sit where their tick positions predict. For at least three valid (pixel, value) pairs a least-squares line is fit and the coefficient of determination R² = r² is taken, where r is the Pearson correlation r = Σ(p_i − p̄)(v_i − v̄) / sqrt[ Σ(p_i − p̄)² · Σ(v_i − v̄)² ]. When all values are positive, a second fit of p_i against log10(v_i) is computed and the larger R² is kept, so a legitimate logarithmic axis is not penalised. A best R² < 0.95 means the labels do not map cleanly to their positions under either a linear or a logarithmic scale, contributing 1.0 at warning severity.
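The fit described above needs nothing beyond sums of squares, as the following sketch shows. The function names are hypothetical, and skipping the test when either series is constant (where R² is undefined) is an assumption of this sketch rather than documented behaviour.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation, or None when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return (sxy * sxy) / (sxx * syy)


def transfer_contribution(pixels, values, threshold=0.95):
    """Hypothetical helper: 1.0 when neither a linear nor a log fit is clean."""
    if len(pixels) < 3:
        return 0.0  # at least three (pixel, value) pairs are required
    fits = [r_squared(pixels, values)]
    if all(v > 0 for v in values):
        fits.append(r_squared(pixels, [math.log10(v) for v in values]))
    fits = [f for f in fits if f is not None]
    if not fits:
        return 0.0  # assumption: an undefined fit is skipped, not flagged
    return 1.0 if max(fits) < threshold else 0.0


pixels = [400, 300, 200, 100]
print(transfer_contribution(pixels, [1, 10, 100, 1000]))  # log axis passes
print(transfer_contribution(pixels, [0, 10, 50, 12]))     # labels off position
```

Because R² is the square of r, the sign of the slope plays no part here; orientation is examined separately.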

The inversion test uses the same pairs to read the orientation. The least-squares slope b and the correlation r of value against pixel position are computed, and the test fires only when the axis is cleanly linear, |r| >= 0.9. A value axis is inverted when value rises as pixel-y increases (slope b > 0), and a horizontal axis is inverted when value falls as pixel-x increases (slope b < 0); either case reverses the perceived trend and contributes 1.5 at warning severity.
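The orientation rule can be sketched as below, assuming the usual image coordinate system in which pixel-y grows downward and pixel-x grows to the right. The function names, the `axis` argument, and the three-pair minimum carried over from the transfer-function test are assumptions of this sketch.

```python
import math


def slope_and_r(pixels, values):
    """Least-squares slope of value on pixel, and Pearson r; None if undefined."""
    n = len(pixels)
    mp, mv = sum(pixels) / n, sum(values) / n
    spv = sum((p - mp) * (v - mv) for p, v in zip(pixels, values))
    spp = sum((p - mp) ** 2 for p in pixels)
    svv = sum((v - mv) ** 2 for v in values)
    if spp == 0 or svv == 0:
        return None
    return spv / spp, spv / math.sqrt(spp * svv)


def inversion_contribution(pixels, values, axis, min_abs_r=0.9):
    """Hypothetical helper: 1.5 when a cleanly linear axis runs backwards."""
    if len(pixels) < 3:
        return 0.0  # assumption: same three-pair minimum as the transfer test
    fit = slope_and_r(pixels, values)
    if fit is None:
        return 0.0
    b, r = fit
    if abs(r) < min_abs_r:
        return 0.0  # not cleanly linear, so orientation is not judged
    # y: value rising with pixel-y means larger values sit lower in the image.
    # x: value falling with pixel-x means larger values sit further left.
    inverted = b > 0 if axis == "y" else b < 0
    return 1.5 if inverted else 0.0


print(inversion_contribution([100, 200, 300, 400], [0, 10, 20, 30], "y"))
print(inversion_contribution([400, 300, 200, 100], [0, 10, 20, 30], "y"))
```

The |r| gate keeps a noisy OCR read from being interpreted as a deliberate flip.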

The truncation test applies to the value axis of a bar chart. When detect_bars finds at least two bars and every value label is positive, the baseline ratio is ρ = v_min / v_max. A ratio ρ > 0.05 with v_min > 0 means the axis starts well above zero, exaggerating the bar differences, and contributes 1.0 at warning severity.
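As a worked illustration, the baseline ratio can be computed as follows. The function name and the `bar_count` argument (standing in for the number of bars reported by detect_bars) are hypothetical; the thresholds are the ones stated above.

```python
def truncation_contribution(values, bar_count, max_ratio=0.05):
    """Hypothetical helper returning (contribution, baseline ratio or None)."""
    if bar_count < 2 or not values:
        return 0.0, None  # only bar charts with at least two bars are tested
    if any(v <= 0 for v in values):
        return 0.0, None  # a zero or negative label means the test does not apply
    ratio = min(values) / max(values)
    return (1.0 if ratio > max_ratio else 0.0), ratio


# Value axis labelled 50..100: the baseline sits at half the maximum.
print(truncation_contribution([50, 60, 70, 80, 90, 100], bar_count=4))
# Value axis labelled 0..100: a zero label is present, so nothing fires.
print(truncation_contribution([0, 25, 50, 75, 100], bar_count=4))
```

Note that a zero-based axis is excluded by the positivity condition itself: once a label of 0 is read, the ratio is never computed.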

The dual-axis test reads every numeric label in the figure with extract_numbers. A vertical axis column is a run of at least three labels within 0.22 · W of an edge (W the image width), tightly clustered in x (the standard deviation of the label-center x at most 0.06 · W), spanning at least 0.4 · H vertically (H the image height), and monotone in value with vertical position. When both a left and a right column are present, the chart carries two independent y-scales and contributes 1.0 at warning severity.
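The column criteria can be sketched as follows. This is a simplification: each label is assumed to be a hypothetical `(x_center, y_center, value)` tuple, and every label inside an edge band is treated as one candidate column, whereas a real implementation would presumably first separate the vertical run from other text near the edge.

```python
import statistics


def is_axis_column(labels, width, height):
    """True when labels look like one vertical axis column (see criteria above)."""
    if len(labels) < 3:
        return False
    xs = [x for x, _, _ in labels]
    ys = [y for _, y, _ in labels]
    if statistics.pstdev(xs) > 0.06 * width:
        return False  # not tightly clustered in x
    if max(ys) - min(ys) < 0.4 * height:
        return False  # does not span enough of the image height
    vals = [v for _, _, v in sorted(labels, key=lambda lab: lab[1])]
    pairs = list(zip(vals, vals[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def dual_axis_contribution(labels, width, height):
    """Hypothetical helper: 1.0 when both a left and a right column exist."""
    band = 0.22 * width
    left = [lab for lab in labels if lab[0] <= band]
    right = [lab for lab in labels if lab[0] >= width - band]
    both = is_axis_column(left, width, height) and is_axis_column(right, width, height)
    return 1.0 if both else 0.0


W, H = 800, 600
left = [(40, y, v) for y, v in zip(range(100, 600, 100), [40, 30, 20, 10, 0])]
right = [(760, y, v) for y, v in zip(range(100, 600, 100), [1000, 750, 500, 250, 0])]
print(dual_axis_contribution(left + right, W, H))  # two independent y-scales
print(dual_axis_contribution(left, W, H))          # single y-scale
```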

The contributions are summed across both axes and reported as min(5.0, total), with findings capped at ten. The metadata records, per axis, the values found, monotonicity, spacing CV, transfer R², and inversion flag, plus the truncation ratio and the truncation and dual-axis flags.
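The aggregation step amounts to a capped sum, sketched here with hypothetical names. The finding strings are placeholders for illustration only, not the toolkit's actual messages.

```python
def aggregate(contributions, findings, cap=5.0, max_findings=10):
    """Hypothetical helper: capped score plus a truncated findings list."""
    total = sum(contributions)
    return min(cap, total), findings[:max_findings]


# Non-monotone (2.5) + chaotic spacing (1.5) + poor transfer fit (1.0)
# + truncated bar baseline (1.0) sums to 6.0 and is reported as 5.0.
score, kept = aggregate(
    [2.5, 1.5, 1.0, 1.0],
    ["non-monotone", "spacing", "transfer", "truncation"],
)
print(score, kept)
```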

Score thresholds

Score | Meaning
0 to 1 | Monotone axes, regular linear or log spacing, clean pixel-to-value mapping, conventional orientation, zero-based bar axis, single y-scale. Consistent with a chart drawn by genuine plotting software.
2 to 3 | One manipulation present: an inverted axis, a truncated bar baseline, a dual y-axis, chaotic spacing, or a poor transfer fit.
4 to 5 | A non-monotone axis, or several manipulations together. Consistent with a fabricated, hand-edited, or deliberately misleading chart.

Why this matters

Axis manipulation is one of the most frequently catalogued families of chart deception, and the recent taxonomies of misleading visualizations cited here all give it a central place. A benchmark of 2,604 real-world charts and 57,665 synthetic ones built axis classifiers and rule-based systems precisely because truncated and inverted axes recur so often [1]. A study of vision-language models on misleading data found that truncated and dual axes are among the design errors that most affect both models and readers [2]. A benchmark of 21 misleader types over ten chart formats lists truncated, inverted, and dual axes as core axis subcategories [3], and a study of eight misleading designs includes truncated, inverted, and dual axes among them [4].

Each manipulation distorts the reading in a specific way. A truncated axis, where the value axis does not start at zero, exaggerates the differences between bars; an inverted axis flips the direction in which a trend appears to move; a dual y-axis lets two unrelated series be overlaid to suggest a correlation that the data do not support. G3 turns each of these into a deterministic test on the numeric labels. The transfer-function and spacing checks rest on the chart-data-extraction literature, where reconstructing values from a chart depends on fitting tick pixel positions to label values on both linear and logarithmic axes [5], a mapping that scatter-plot and chart extractors build from the detected axis ticks [6]. When that mapping does not fit, the labels and the geometry disagree, which is the trace of a fabricated or inconsistent axis.

Limitations

G3 reads numeric axis labels, so it analyses an axis only when OCR recovers at least two numbers from it; purely categorical axes, and axes whose labels OCR cannot read, are not scored on the numeric checks. The thresholds are heuristic cut-offs rather than exact boundaries, and some flagged designs are legitimate in context: a non-zero baseline can be appropriate for data that never approaches zero, an inverted axis is conventional in a few fields, and a dual y-axis is sometimes a reasonable choice. These checks are therefore screening cues that warrant review, not proof of manipulation. The truncation check is deliberately confined to charts where bars are detected, since that is where a truncated baseline distorts the visual comparison. Localized pixel editing, error-bar fabrication, and the mismatch between a bar's drawn height and its printed label are handled by sibling chart indicators, so G3 confines itself to scale and axis geometry to avoid duplicating them.

Theoretical background

G3 formalises the axis-manipulation taxonomy as a set of tests on the correspondence between tick pixel positions and OCR-read label values. A genuine numeric axis is a monotone, affine (or log-affine) map from pixels to values: monotonicity and regular spacing test that the labels themselves form such a scale; the transfer-function R² tests that the labels sit where that scale predicts, trying both a linear and a logarithmic fit so that scientific log axes pass; the slope sign tests orientation; the baseline ratio tests truncation; and the presence of two independent label columns tests for a hidden second scale. Each test is a structural property that real plotting software guarantees and that fabricated, edited, or deliberately misleading charts break, which keeps the screen robust and free of any dependence on having seen a particular chart generator before.

References

  1. Tonglet J, Zimny J, Tuytelaars T, Gurevych I. Is this chart lying to me? Automating the detection of misleading visualizations. In: Proceedings of the Association for Computational Linguistics (ACL). 2026. arXiv:2508.21675. https://arxiv.org/abs/2508.21675
  2. Lalai HN, Shah RS, Pfister H, Varma S, Guo G. When Visuals Aren't the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations. arXiv preprint arXiv:2603.22368. 2026. https://arxiv.org/abs/2603.22368
  3. Chen Z, Song S, Shum K, Lin Y, Sheng R, Wang W, Qu H. Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering. In: Proceedings of EMNLP. 2025. arXiv:2503.18172. https://arxiv.org/abs/2503.18172
  4. Mahbub R, Islam MS, Laskar MTR, Rahman M, Nayeem MT, Hoque E. The Perils of Chart Deception: How Misleading Visualizations Affect Vision-Language Models. In: IEEE VIS. 2025. arXiv:2508.09716. https://arxiv.org/abs/2508.09716
  5. Luo J, Li Z, Wang J, Lin CY. ChartOCR: Data Extraction from Charts Images via a Deep Hybrid Framework. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). 2021. https://openaccess.thecvf.com/content/WACV2021/html/Luo_ChartOCR_Data_Extraction_From_Charts_Images_via_a_Deep_Hybrid_WACV_2021_paper.html
  6. Cliche M, Rosenberg D, Madeka D, Yee C. Scatteract: Automated Extraction of Data from Scatter Plots. In: Machine Learning and Knowledge Discovery in Databases (ECML PKDD), LNCS vol 10534. Springer; 2017. arXiv:1704.06687. https://arxiv.org/abs/1704.06687