ResAIKit
Research Integrity Toolkit
G5-img · Image forensics · Chart Analysis · Layer 1 (Deterministic)

Proportion Mismatch

Checks whether the heights of bars in a chart actually match their labeled values, detecting cases where bars were resized to misrepresent the data.

Technical description

Screens bar charts for data-visual disproportion, measured by Tufte's lie factor: the size of the effect shown in the graphic divided by the size of the effect in the data, which equals 1.0 for an honest chart. Detects bars and numeric data labels and matches each label to its nearest bar within 80 px. In absolute mode (when the y-axis calibrates), each bar top is converted to a value through the axis mapping and compared with its label: a relative error above 0.15 is a warning, and above 0.30 an error. The score is min(5.0, mean(error) × 10), capped at 2.5 when only one label-bar pair is matched and at 3.5 when two are. In a calibration-free fallback (when the axis labels cannot be read), an affine line is fit from label value to bar-top pixel across at least three labeled bars. An R-squared below 0.90 means the heights do not encode the labels consistently; the bar with the largest residual is reported, and the score is min(5.0, (1 - R-squared) × 10). Reports a per-bar lie factor and the proportionality R-squared.
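The absolute-mode arithmetic can be sketched in Python as follows. This is a minimal sketch, not ResAIKit's implementation: the `Bar` and `Label` records, matching by Euclidean distance from label centre to bar top, the two-tick linear axis calibration in `make_axis`, skipping zero-valued labels, and taking the per-bar lie factor as axis-implied value divided by label (which presumes a zero baseline) are all illustrative assumptions. Only the 80 px radius, the 0.15 and 0.30 thresholds, the score formula, and the sparse-evidence caps come from the description above.

```python
import math
from dataclasses import dataclass

# Thresholds quoted in the description; everything else here is hypothetical.
MATCH_RADIUS_PX = 80.0
WARN_REL_ERR = 0.15
ERROR_REL_ERR = 0.30


@dataclass
class Bar:
    x: float    # horizontal centre of the bar, in pixels
    top: float  # pixel row of the bar top (image y grows downward)


@dataclass
class Label:
    x: float      # label centre, in pixels
    y: float
    value: float  # number printed on the chart


def match_labels(bars, labels, radius=MATCH_RADIUS_PX):
    """Pair each label with the nearest bar top lying within `radius` pixels."""
    pairs = []
    for lab in labels:
        best, best_d = None, radius
        for bar in bars:
            d = math.hypot(lab.x - bar.x, lab.y - bar.top)
            if d <= best_d:
                best, best_d = bar, d
        if best is not None:
            pairs.append((lab, best))
    return pairs


def make_axis(p0, v0, p1, v1):
    """Linear y-axis calibration: pixel row p0 reads v0, pixel row p1 reads v1."""
    slope = (v1 - v0) / (p1 - p0)
    return lambda pixel: v0 + (pixel - p0) * slope


def score_absolute(pairs, axis):
    """Compare each bar's axis-implied value with its label and score the chart."""
    report, errors = [], []
    for lab, bar in pairs:
        if lab.value == 0:
            continue  # relative error is undefined for a zero label
        measured = axis(bar.top)
        rel_err = abs(measured - lab.value) / abs(lab.value)
        if rel_err > ERROR_REL_ERR:
            severity = "error"
        elif rel_err > WARN_REL_ERR:
            severity = "warning"
        else:
            severity = None
        errors.append(rel_err)
        report.append({
            "label": lab.value,
            "measured": measured,
            "rel_err": rel_err,
            "lie_factor": measured / lab.value,  # assumes a zero baseline
            "severity": severity,
        })
    if not errors:
        return 0.0, report
    score = min(5.0, 10 * sum(errors) / len(errors))
    # Sparse-evidence caps: 2.5 for one pair, 3.5 for two.
    if len(errors) == 1:
        score = min(score, 2.5)
    elif len(errors) == 2:
        score = min(score, 3.5)
    return score, report


if __name__ == "__main__":
    axis = make_axis(400, 0, 100, 150)  # row 400 reads 0, row 100 reads 150
    bars = [Bar(50, 380), Bar(150, 360), Bar(250, 340)]
    labels = [Label(50, 365, 10), Label(150, 345, 20), Label(250, 325, 15)]
    score, report = score_absolute(match_labels(bars, labels), axis)
    print(round(score, 2), [(r["label"], r["severity"]) for r in report])
```

In the usage example the third bar is labeled 15 but drawn where the axis reads 30, so its relative error is 1.0 and its lie factor 2.0. With the other two bars exact, the score is 10 × mean(0, 0, 1.0), about 3.33.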

How it works

Layer 1 (deterministic). Detects bars and numeric data labels and matches each label to its nearest bar. When the y-axis calibrates, converts each bar top to a value and measures the relative error and lie factor against the label. When it does not, fits an affine line from label values to bar-top pixels and measures how well the heights encode the labels (R-squared), flagging the largest-residual bar. Caps the score on sparse evidence and reports each mismatch as a finding.
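The calibration-free fallback reduces to an ordinary least-squares line and its R-squared. Below is a minimal pure-Python sketch, assuming labels and bar-top pixel rows have already been paired. The function names, returning `None` when fewer than three bars or only identical labels are available, treating equal-height bars as R-squared 0, and computing the score even when R-squared clears 0.90 are assumptions of this sketch, not documented behaviour.

```python
R2_THRESHOLD = 0.90  # from the description
MIN_BARS = 3         # from the description


def affine_fit(values, tops):
    """Least-squares fit tops ~ slope * values + intercept; returns fit, R-squared, residuals."""
    n = len(values)
    mx, my = sum(values) / n, sum(tops) / n
    sxx = sum((x - mx) ** 2 for x in values)
    if sxx == 0:
        return None  # all labels equal: no proportionality to test
    sxy = sum((x - mx) * (y - my) for x, y in zip(values, tops))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (slope * x + intercept) for x, y in zip(values, tops)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((y - my) ** 2 for y in tops)
    # Assumption: equal-height bars with differing labels encode nothing, so R^2 = 0.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r2, resid


def score_calibration_free(values, tops):
    """Check that bar-top pixels are an affine function of the printed labels."""
    if len(values) < MIN_BARS:
        return None  # too few labeled bars to judge proportionality
    fit = affine_fit(values, tops)
    if fit is None:
        return None
    _, _, r2, resid = fit
    worst = None
    if r2 < R2_THRESHOLD:
        # Report the bar the fitted line explains least well (largest |residual|).
        worst = max(range(len(resid)), key=lambda i: abs(resid[i]))
    return {"r2": r2, "score": min(5.0, (1.0 - r2) * 10), "worst_bar": worst}


if __name__ == "__main__":
    labels = [10, 20, 30, 40]
    tops = [350, 300, 100, 200]  # honest tops would be 350, 300, 250, 200
    print(score_calibration_free(labels, tops))
```

With the third bar drawn at pixel row 100 instead of the 250 the other bars imply, the fit gives an R-squared of about 0.57, a score of about 4.27, and `worst_bar` index 2.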

Why this matters

Drawing a bar at the wrong height is one of the most direct ways to mislead with a chart, and data-visual disproportion is a named category in recent misleading-visualization taxonomies, alongside truncated and inverted axes. The standard is Tufte's graphical integrity rule that the size of a mark must be proportional to the quantity it encodes, quantified by the lie factor. When plotting software draws a chart from real data, bar heights match their labels up to the rounding of the printed values; a bar whose height contradicts its own label is a self-inconsistency that needs no external data to detect.
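For reference, Tufte defines the lie factor by comparing the relative change shown between two marks with the relative change in the numbers they stand for:

```latex
\text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}},
\qquad
\text{size of effect} = \frac{\text{second value} - \text{first value}}{\text{first value}}
```

For example, two bars labeled 10 and 20 (a data effect of 1.0) drawn 50 px and 150 px tall (a graphic effect of 2.0) give a lie factor of 2.0.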

Score thresholds

0-1: Bar heights match their labels (lie factor near 1.0) or are a consistent linear encoding of them
2-3: A measurable mismatch on one or two bars, or a moderately poor height-to-label fit
4-5: Large or widespread mismatch, with bars drawn well out of proportion to their stated values

Limitations

Needs detected bars and readable numeric data labels, so charts without printed bar values are not scored. The absolute check also needs a calibrated y-axis. When the axis labels cannot be read, the calibration-free check still runs from labels and heights, but it measures only relative proportionality, so a chart in which every bar is scaled by the same wrong factor passes it. Label-to-bar matching is by nearest position and can mislink labels in dense layouts. Rounding of printed labels produces small honest errors, so low single-bar errors are not flagged and sparse evidence is capped. Axis-scale manipulation, error-bar fabrication, and raster-level editing are handled by sibling chart indicators.
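The uniform-scaling blind spot can be seen in a toy example. In this sketch (pure Python; the axis function and pixel values are hypothetical), every bar is drawn 1.5 times taller than its label against an honestly calibrated axis. The heights remain a perfect affine function of the labels, so the calibration-free R-squared is 1.0, while the absolute check reads a relative error of 0.5 on every bar, above the 0.30 error threshold.

```python
def r_squared_affine(values, tops):
    """R-squared of a least-squares line tops ~ a * values + b (the squared Pearson r).

    Assumes both the labels and the bar heights vary.
    """
    n = len(values)
    mx, my = sum(values) / n, sum(tops) / n
    sxx = sum((x - mx) ** 2 for x in values)
    syy = sum((y - my) ** 2 for y in tops)
    sxy = sum((x - mx) * (y - my) for x, y in zip(values, tops))
    return sxy * sxy / (sxx * syy)


def axis_value(pixel):
    """Hypothetical calibrated axis: row 400 reads 0 and each value unit spans 4 px."""
    return (400 - pixel) / 4


labels = [10, 20, 30, 40]
# Every bar drawn at 1.5x its label against the honest axis above.
tops = [400 - 4 * 1.5 * v for v in labels]

r2 = r_squared_affine(labels, tops)
rel_errs = [abs(axis_value(t) - v) / v for v, t in zip(labels, tops)]
print(r2, rel_errs)  # prints: 1.0 [0.5, 0.5, 0.5, 0.5]
```

Because the fallback fits both a slope and an intercept, any consistent rescaling of all bars is absorbed into the fitted line; only an axis reading can expose it.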

References

  1. Tonglet J, Zimny J, Tuytelaars T, Gurevych I. (2026). Is this chart lying to me? Automating the detection of misleading visualizations. ACL 2026 (arXiv:2508.21675)
  2. Lalai HN, Shah RS, Pfister H, Varma S, Guo G. (2026). When Visuals Aren't the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations. arXiv:2603.22368
  3. Chen Z, Song S, Shum K, Lin Y, Sheng R, Wang W, Qu H. (2025). Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering. EMNLP 2025 (arXiv:2503.18172)
  4. Tufte ER. (2001). The Visual Display of Quantitative Information (2nd ed.). Graphics Press (first published 1983)
  5. Akhtar M, Subedi N, Gupta V, Tahmasebi S, Cocarascu O, Simperl E. (2024). ChartCheck: Explainable Fact-Checking over Real-World Chart Images. Findings of ACL 2024 (arXiv:2311.07453)
  6. Luo J, Li Z, Wang J, Lin CY. (2021). ChartOCR: Data Extraction from Charts Images via a Deep Hybrid Framework. IEEE/CVF WACV 2021