ResAIKit
Research Integrity Toolkit
D13 | Statistical analysis | Fabrication Detection | Layer 2 (Contextual)

Heteroscedasticity Absent

Checks whether a variable's spread stays suspiciously constant across the low, middle, and high parts of its value range. Real measurements usually spread out more in some parts of their range than others, but values generated from a single simple distribution (a machine-fabrication shortcut) tend to be uniformly spread throughout. The indicator sorts each numeric column, splits it into thirds by value, compares the variance within each third, and flags columns whose variance is nearly identical across thirds.

Technical description

A contextual screen for implausibly uniform spread across a variable's value range, used as a univariate proxy for absent variance heterogeneity. It requires at least twenty rows and at least three numeric columns, each with non-trivial standard deviation and at least ten distinct values (so that the variance comparison is not dominated by ties in binary or ordinal-coded columns). For each qualifying column it sorts the values, splits them into three equal-sized groups by value (thirds), computes the variance within each third, and forms the ratio of the largest variance to the smallest. A ratio below 1.5 (spread nearly the same in every third) marks the column as suspiciously homogeneous. The proportion of such columns maps to the score: a high proportion suggests data drawn from a fixed simple distribution rather than measured.
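The per-column check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the toolkit's actual code: the function names, the value of the variance floor, the reading of "non-trivial standard deviation" as simply non-zero, and the handling of a row count not divisible by three (extra values go to the top third) are all hypothetical choices. Only the thresholds (20 rows, 10 distinct values, ratio below 1.5) come from the text.

```python
from statistics import pstdev, variance

RATIO_THRESHOLD = 1.5    # from the text
MIN_ROWS = 20            # from the text
MIN_DISTINCT = 10        # from the text
VARIANCE_FLOOR = 1e-12   # assumed value; the text only mentions a "small floor"


def qualifies(values):
    """Column-level eligibility: enough rows, enough distinct values, some spread."""
    return (
        len(values) >= MIN_ROWS
        and len(set(values)) >= MIN_DISTINCT
        and pstdev(values) > 0  # assumed reading of "non-trivial standard deviation"
    )


def thirds_variance_ratio(values):
    """Largest within-third sample variance divided by the (floored) smallest."""
    ordered = sorted(values)
    k = len(ordered) // 3
    # Assumption: any remainder rows are placed in the high third.
    thirds = [ordered[:k], ordered[k:2 * k], ordered[2 * k:]]
    variances = [variance(third) for third in thirds]
    return max(variances) / max(min(variances), VARIANCE_FLOOR)


def is_homogeneous(values):
    """True when a qualifying column's spread is nearly the same in every third."""
    return qualifies(values) and thirds_variance_ratio(values) < RATIO_THRESHOLD


if __name__ == "__main__":
    evenly_spread = list(range(30))            # same spread in every third
    fanning_out = [x * x for x in range(30)]   # spread grows with the value
    print(is_homogeneous(evenly_spread), thirds_variance_ratio(evenly_spread))
    print(is_homogeneous(fanning_out), thirds_variance_ratio(fanning_out))
```

In this sketch the evenly spaced column is classed homogeneous, while the column whose gaps widen at higher values is not. The dataset-level requirements (at least three qualifying columns) would sit in a caller that is not shown here.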

How it works

Layer 2 (contextual): each qualifying column's complete (non-missing) values are sorted and divided into low, middle, and high thirds. The within-third sample variances give a homogeneity ratio: the largest variance divided by the smallest, with the smallest clamped to a small floor to avoid division by zero. A ratio below 1.5 marks the column homogeneous. The proportion of homogeneous columns maps to the score: above 0.90 gives 4.0, above 0.75 gives 3.0, above 0.60 gives 2.0, above 0.45 gives 1.0, and anything else gives 0. When five or more columns are checked and all are homogeneous, a further 0.5 is added. When median_brown_forsythe_p exceeds 0.95 (within-third variance statistically indistinguishable for the typical column), a further 0.5 is added. The total is capped at 5.0. A finding is raised once the proportion exceeds 0.45. Metadata records columns_checked, n_homoscedastic, prop_homosc, and median_brown_forsythe_p (the median p-value, across columns, of the Brown-Forsythe test, that is, the median-centred Levene test, applied to the three value-range thirds; a high p is consistent with equal variance).
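The mapping from proportion to score can be written out as a small function. The sketch below is illustrative only: the function name and return shape are hypothetical, the argument names are borrowed from the metadata keys listed above, and it assumes the two 0.5 bonuses are applied exactly as the text reads (unconditionally, before the cap), which the toolkit may handle differently.

```python
def fabrication_score(columns_checked, n_homoscedastic, median_brown_forsythe_p=None):
    """Return (score, finding_raised) from the column counts and the median p-value."""
    if columns_checked == 0:
        return 0.0, False  # assumption: nothing to score when no column qualifies

    prop_homosc = n_homoscedastic / columns_checked

    # Base score from the proportion of homogeneous columns (thresholds from the text).
    if prop_homosc > 0.90:
        score = 4.0
    elif prop_homosc > 0.75:
        score = 3.0
    elif prop_homosc > 0.60:
        score = 2.0
    elif prop_homosc > 0.45:
        score = 1.0
    else:
        score = 0.0

    # Bonus: five or more columns checked, every one of them homogeneous.
    if columns_checked >= 5 and n_homoscedastic == columns_checked:
        score += 0.5

    # Bonus: the typical column's Brown-Forsythe p-value is very high.
    if median_brown_forsythe_p is not None and median_brown_forsythe_p > 0.95:
        score += 0.5

    finding_raised = prop_homosc > 0.45
    return min(score, 5.0), finding_raised


if __name__ == "__main__":
    print(fabrication_score(6, 6, 0.97))   # all six columns homogeneous, high p
    print(fabrication_score(10, 4))        # proportion 0.4, below the finding cut-off
```

The median p-value is taken as an input here rather than computed. One way to obtain a per-column p, assuming SciPy is available, would be `scipy.stats.levene(low, mid, high, center='median')`, which implements the median-centred variant of Levene's test.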

Why this matters

Real measured variables rarely have the same spread everywhere along their range: they fan out at higher values, pile up against limits, or mix subpopulations, so local variability changes across the range, as Micceri's survey of real datasets documented. Data generated by sampling each value independently from one fixed distribution carries that distribution's uniform structure across its whole range, so the spread is homogeneous by construction. Taloni and colleagues showed that a large language model can fabricate a clinical dataset far too regular to be real, and the broader literature treats implausible regularity as a fabrication signal. A dataset in which nearly every column has constant spread across its range is more consistent with synthetic generation than with measurement.

Score thresholds

0-1
Variance differs across the value range as in most real data.
2-3
Many columns show suspiciously uniform spread across their range.
4-5
Almost all columns have near-identical variance across thirds, consistent with a fixed generating distribution.

Limitations

This is a univariate proxy, not a test of true conditional heteroscedasticity, which concerns how the variance of a response changes with a predictor and needs a paired predictor, which a single column does not provide. What it actually measures is whether the dispersion of one variable is constant across its own sorted range. That detects uniform-like or single-distribution data but does not correspond to the regression notion the name evokes. The thirds-variance ratio is sensitive near its 1.5 threshold, so a borderline column's classification can hinge on the exact split. A genuinely uniform real variable or a bounded score will be classed as homogeneous even though it is real, so a flag is a screening signal, not evidence of fabrication on its own. The check needs at least twenty rows and excludes constant and low-cardinality columns. The thresholds are heuristic. Related too-clean distributional signals are D2 and D28.

References

  1. Micceri T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin 105(1):156-166
  2. Taloni A, Scorcia V, Giannaccare G. (2023). Large Language Model Advanced Data Analysis Abuse to Create a Fake Data Set in Medical Research. JAMA Ophthalmology 141(12):1174-1175
  3. Simonsohn U. (2013). Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science 24(10):1875-1888
  4. Brown MB, Forsythe AB. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association 69(346):364-367
  5. Carlisle JB. (2017). Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia 72(8):944-952
  6. Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
  7. Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
  8. Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380