Higher-Order Interactions Absent
Tests whether the relationship between two variables changes depending on a third, which is what an interaction or moderation effect means. In real data the correlation between two measurements often differs between, say, younger and older participants. In data where each variable was sampled independently (a signature of machine-fabricated datasets), splitting on a third variable leaves the correlation essentially unchanged. The indicator splits each candidate third variable at its median, compares the correlation of the other two across the halves, and flags data where such interactions are almost never present.
Technical description
A contextual screen for absent two-way interaction effects in individual-patient data. It requires at least three numeric columns and thirty complete rows, and evaluates up to twenty triples (sampling deterministically if more). For each triple it median-splits the third variable, computes the correlation of the first two within each group, Fisher-z-transforms both, and takes the absolute difference as the moderation signal. An interaction is counted only when that difference clears BOTH a minimum effect-size floor of 0.30 AND twice the standard error of the difference (which depends on the group sizes), so the sampling noise of small split groups is not mistaken for moderation. The proportion of triples with an interaction, plus the mean difference, set the score; a low proportion indicates fabrication.
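The per-triple computation described above can be sketched as follows. This is a minimal illustration, not the indicator's actual code: the function name `moderation_signal` and its parameters are hypothetical, the standard error uses the usual Fisher-z form SE = sqrt(1/(n_low-3) + 1/(n_high-3)), the five-observation and zero-variance skip rules are taken from "How it works", and two details are assumptions the description does not specify (values equal to the median go to the low group, and correlations are clipped just inside ±1 so the transform stays finite).

```python
import numpy as np

def moderation_signal(x, y, z, floor=0.30, min_group=5):
    """Return (delta_z, se, flagged) for one triple, or None if it is skipped."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))

    # Median split on the candidate moderator.
    # Assumption: ties with the median fall in the low group.
    low = z <= np.median(z)
    high = ~low
    n_low, n_high = int(low.sum()), int(high.sum())

    # Skip triples whose split groups are too small.
    if n_low < min_group or n_high < min_group:
        return None
    # Skip triples where a correlation is undefined (zero variance in a group).
    for mask in (low, high):
        if np.std(x[mask]) == 0 or np.std(y[mask]) == 0:
            return None

    r_low = np.corrcoef(x[low], y[low])[0, 1]
    r_high = np.corrcoef(x[high], y[high])[0, 1]
    # Assumption: clip so arctanh (the Fisher z-transform) stays finite.
    r_low, r_high = np.clip([r_low, r_high], -0.999999, 0.999999)

    delta_z = abs(np.arctanh(r_low) - np.arctanh(r_high))
    se = np.sqrt(1.0 / (n_low - 3) + 1.0 / (n_high - 3))

    # An interaction must clear both the effect-size floor and twice the SE.
    flagged = delta_z > max(floor, 2.0 * se)
    return float(delta_z), float(se), bool(flagged)
```

Calling `moderation_signal(x, y, z)` once for each evaluated triple yields the differences and flags from which the proportion with an interaction and the mean difference are computed.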
How it works
Layer 2 (contextual): complete-case numeric data is used. A triple is skipped if either median-split group has fewer than five observations or any sub-group has zero variance. The within-group correlations are Fisher-z-transformed and their absolute difference recorded; the interaction is flagged when that difference exceeds the larger of 0.30 and twice the standard error of the difference (SE = sqrt(1/(n_low-3) + 1/(n_high-3))). With at least three valid triples, the proportion flagged maps to the score: below 0.05 gives 3.5, below 0.15 gives 2.5, below 0.25 gives 1.5, below 0.35 gives 0.5, else 0; a mean difference below 0.10 adds half a point, capped at 5.0. Metadata records n_triples_evaluated, n_with_interaction, prop_with_interaction, mean_delta_z, median_delta_z (robust to a few extreme triples), and mean_interaction_pvalue (the mean two-sided p-value of the formal z-difference test that the flag approximates).
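The score mapping above can be written out as a short sketch. The function name `score_from_triples` and its argument layout are hypothetical; the thresholds and metadata keys are the ones listed above. Two points are assumptions: returning `None` when fewer than three triples are valid (the description does not say what happens in that case), and computing the two-sided p-value from a standard normal reference distribution for delta_z / SE.

```python
import math
import statistics

def score_from_triples(delta_zs, ses, flags, min_triples=3):
    """Map per-triple results to a score plus metadata (illustrative only)."""
    n = len(delta_zs)
    if n < min_triples:
        return None  # Assumption: too few valid triples means no score.

    n_flagged = sum(bool(f) for f in flags)
    prop = n_flagged / n
    mean_delta = statistics.fmean(delta_zs)

    # Fewer detected interactions means a higher (more suspicious) score.
    if prop < 0.05:
        score = 3.5
    elif prop < 0.15:
        score = 2.5
    elif prop < 0.25:
        score = 1.5
    elif prop < 0.35:
        score = 0.5
    else:
        score = 0.0

    # A very small mean difference adds half a point; the total is capped.
    if mean_delta < 0.10:
        score += 0.5
    score = min(score, 5.0)

    # Two-sided normal p-value for each z-difference: erfc(|z| / sqrt(2)).
    pvals = [math.erfc((dz / se) / math.sqrt(2.0)) for dz, se in zip(delta_zs, ses)]

    return {
        "score": score,
        "n_triples_evaluated": n,
        "n_with_interaction": n_flagged,
        "prop_with_interaction": prop,
        "mean_delta_z": mean_delta,
        "median_delta_z": statistics.median(delta_zs),
        "mean_interaction_pvalue": statistics.fmean(pvals),
    }
```

Under this mapping, four triples with no flagged interaction and a mean difference below 0.10 score 3.5 + 0.5 = 4.0.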
Why this matters
Real biological and behavioural systems are full of moderation: the effect of one factor depends on another, so the correlation between two variables genuinely shifts across strata of a third. Independently sampled data has no such conditional structure, so within every stratum the correlation is the same up to noise, and interactions are absent by construction. This is a deeper version of the missing-dependence problem Taloni and colleagues observed in a model-fabricated clinical dataset. Using multivariate structure to separate genuine from invented data is well established (Al-Marzouki and colleagues; Simonsohn). Moderation is especially hard to fake because it requires the joint distribution of three variables to carry a specific shape, not merely a pairwise correlation, so its systematic absence is strong evidence the data were assembled one variable at a time. Requiring interactions to clear a significance bound, not just a fixed threshold, prevents small-group noise from masquerading as moderation.
Score thresholds
- 0-1: Interactions appear at the rate expected of real multivariate data.
- 2-3: Interactions are largely absent, suggesting weak or no moderation structure.
- 4-5: Interactions are almost entirely absent, consistent with independently generated variables.
Limitations
Requires individual-patient data with at least three numeric variables and thirty complete rows, so smaller or summary-only studies are out of scope. It detects only moderation that shows as a change in linear correlation across a median split, so non-linear, three-way and higher, or unmeasured-moderator interactions are missed. The median split halves the sample, so each within-group correlation rests on few observations; the significance bound accounts for this but power is still limited, and genuinely structured data with weak moderation could show few detectable interactions without being fabricated. The triple set is capped at twenty and sampled deterministically when larger. The thresholds and effect-size floor are heuristic. Marginal and conditional correlation structure is assessed by D1 and D11; D12 probes moderation by a third variable.
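The limited power of the median split can be made concrete by computing the flagging threshold for different sample sizes. The sketch below assumes an even split and the SE formula given under "How it works"; the helper name `flag_threshold` is hypothetical. It prints, for each number of complete rows, the threshold on the Fisher-z scale and the within-group correlation that one group would need if the other group's correlation were exactly zero.

```python
import math

def flag_threshold(n_low, n_high, floor=0.30):
    """Larger of the effect-size floor and twice the SE of the z-difference."""
    se = math.sqrt(1.0 / (n_low - 3) + 1.0 / (n_high - 3))
    return max(floor, 2.0 * se)

for n_rows in (30, 60, 100, 200, 400):
    half = n_rows // 2  # assumption: the median split is exactly even
    t = flag_threshold(half, half)
    # tanh inverts the Fisher z-transform, giving the equivalent correlation
    # when the comparison group's correlation is zero.
    print(n_rows, round(t, 3), round(math.tanh(t), 3))
```

At the minimum of thirty rows the threshold is set entirely by the standard-error term, and under these assumptions the 0.30 floor only becomes the binding constraint once each split group has about 92 observations.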
References
- Taloni A, Scorcia V, Giannaccare G. (2023). Large Language Model Advanced Data Analysis Abuse to Create a Fake Data Set in Medical Research. JAMA Ophthalmology 141(12):1174-1175
- Al-Marzouki S, Evans S, Marshall T, Roberts I. (2005). Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 331(7511):267-270
- Simonsohn U. (2013). Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science 24(10):1875-1888
- Fisher RA. (1921). On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample. Metron 1:3-32
- Aiken LS, West SG. (1991). Multiple Regression: Testing and Interpreting Interactions. Newbury Park, CA: Sage Publications. ISBN 978-0761907121
- Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
- Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
- Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380