Conditional Independence Violations
Checks specific pairs of variables against the relationships physiology says they should have. Some pairs should correlate, and in a known direction (such as systolic and diastolic blood pressure); others should be related only through a third variable, so controlling for that mediator should make the correlation disappear. The indicator uses a dictionary of domain-knowledge triples and flags pairs whose correlation is missing, has the wrong sign, or persists after the mediator is controlled for.
Technical description
A contextual screen that tests named variable pairs against the dependence or conditional independence expected by a domain-knowledge dictionary. Each entry specifies two variables, an expected correlation range, whether they should be independent after conditioning on a third variable (the mediator), and aliases. For a dependent pair, it computes the Pearson correlation and flags the pair when the magnitude is much weaker than expected or when the correlation runs in the wrong direction (an expected-negative relationship appearing positive or the reverse, which a magnitude-only check would miss). For a mediated pair, it computes the unconditional correlation and, when that correlation is large enough to be worth testing, the partial correlation controlling for the mediator, flagging the pair when a strong residual correlation remains. The failure rates among dependent and mediated pairs set the score.
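As a rough illustration of the dependent-pair rule, the following Python sketch shows one way a dictionary entry and its check might look. The `Triple` class, its field names, and `check_dependent` are hypothetical and not taken from the indicator's code; the 0.10 weakness margin and 0.10 sign threshold are the values listed on this page, but the exact way they are combined is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WEAKNESS_MARGIN = 0.10  # value listed on this page
SIGN_THRESHOLD = 0.10   # value listed on this page


@dataclass
class Triple:
    """Hypothetical shape of one dictionary entry (field names are illustrative)."""
    var_a: str
    var_b: str
    expected_low: float
    expected_high: float
    mediator: Optional[str] = None  # set only for mediated pairs
    aliases: List[str] = field(default_factory=list)


def check_dependent(r: float, entry: Triple) -> Optional[str]:
    """Return 'wrong_sign', 'too_weak', or None for an observed Pearson r."""
    low, high = entry.expected_low, entry.expected_high
    # Wrong sign: the expected range lies wholly on one side of zero
    # and the observed value is clearly on the other side.
    if low > 0 and r < -SIGN_THRESHOLD:
        return "wrong_sign"
    if high < 0 and r > SIGN_THRESHOLD:
        return "wrong_sign"
    # Too weak: magnitude is more than the margin below the smallest
    # expected magnitude (taken as zero if the range straddles zero).
    min_magnitude = min(abs(low), abs(high)) if low * high > 0 else 0.0
    if abs(r) < min_magnitude - WEAKNESS_MARGIN:
        return "too_weak"
    return None
```

Under these assumptions, an entry expecting a correlation between 0.5 and 0.8 would pass at 0.45, be flagged as too weak at 0.30, and be flagged as wrong-signed at -0.40.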
How it works
Layer 2 (contextual): each triple whose variables map to numeric columns with at least ten complete rows is checked. A dependent pair fails when the magnitude of its correlation falls more than 0.10 below the lower bound of the expected magnitude, or when the expected range lies wholly on one side of zero but the observed correlation lies on the other side by more than the sign threshold of 0.10 (wrong sign). A mediated pair is checked only when its unconditional correlation exceeds 0.15, and fails when the partial correlation (computed by regressing out the mediator) both exceeds 0.20 and is significantly nonzero by the Fisher z-transform test at the 5% level. The score sums a dependent-pair contribution (bands at 20, 40, and 60 percent failure) and a more heavily weighted mediated-pair contribution (bands at 25 and 50 percent), capped at 5.0. Findings name the variables, the observed and expected correlations, and any residual partial correlation. Metadata records the per-type and total checked and failed counts, the failure rates for both pair types, and the smallest Fisher z partial-correlation p-value among mediated pairs.
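The mediated-pair test can be sketched in plain Python as follows. This is a minimal illustration, not the indicator's implementation: the function names and the returned dictionary are hypothetical, the thresholds are assumed to apply to absolute values, and the Fisher z statistic uses the standard form atanh(r) * sqrt(n - k - 3) with k = 1 conditioning variable and a two-sided normal p-value.

```python
from math import atanh, erfc, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def residuals(y, z):
    """Residuals of y after a simple linear regression on the mediator z."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((a - mz) ** 2 for a in z)
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / szz
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


def fisher_z_pvalue(r, n, n_cond=1):
    """Two-sided p-value for H0: partial correlation is zero."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = atanh(r) * sqrt(n - n_cond - 3)
    return erfc(abs(stat) / sqrt(2))


def check_mediated(x, y, z, min_r=0.15, partial_cut=0.20, alpha=0.05):
    """Return None if the pair is not checked, else a result dict."""
    n = len(x)
    if n < 10:
        return None  # too few complete rows
    r_xy = pearson(x, y)
    if abs(r_xy) <= min_r:
        return None  # nothing for the mediator to explain
    r_partial = pearson(residuals(x, z), residuals(y, z))
    p = fisher_z_pvalue(r_partial, n)
    failed = abs(r_partial) > partial_cut and p < alpha
    return {"r": r_xy, "partial_r": r_partial, "p": p, "failed": failed}
```

Correlating the two sets of regression residuals is equivalent to the usual first-order partial correlation formula; the residual form is used here because it mirrors the description of regressing out the mediator.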
Why this matters
Domain knowledge specifies not just that variables relate but exactly how (strength, direction, conditional structure), and these specifics are far harder to fabricate than a generic correlation. Taloni and colleagues showed a model can fabricate a clinical dataset whose variables fail to carry realistic relationships, and the use of correlation structure to separate genuine from invented data is established (Al-Marzouki and colleagues; Simonsohn). D20 sharpens this with literature-anchored expectations: an expected coupling that is absent, a known negative relationship appearing positive, or a mediation that fails to remove a correlation are each specific, checkable contradictions. The wrong-direction check is especially hard to evade, because reproducing the correct sign of every physiological relationship requires the generator to encode the underlying biology.
Score thresholds
- 0: Expected correlations are present and mediated pairs become independent as predicted.
- 1-2: Some expected relationships are missing, reversed, or fail to mediate.
- 3-5: Many domain-expected dependence or independence relationships are violated.
Limitations
Can only test variable pairs in its triples dictionary whose names it can match to columns, so unencoded relationships and unrecognisably named columns are not examined, and a clean result is only as informative as the set of pairs that could be checked. Correlation and partial correlation are linear, so a genuine non-linear dependence can appear weak and be misread as missing, and the partial correlation controls for a single linear mediator only. The expected ranges and thresholds (unconditional minimum 0.15, partial threshold 0.20, sign threshold 0.10, weakness margin 0.10) are drawn from the literature but applied as fixed cutoffs. Real populations can differ from reference ranges for legitimate case-mix reasons, so a flag should prompt investigation rather than be read as a conclusion. The global correlation-matrix structure is assessed by D1 and S17, and conditional correlations more broadly by D11.
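The linearity limitation can be seen in a small, artificial example. In the sketch below (illustrative data only, not drawn from any dataset), y is completely determined by x, yet the Pearson correlation is essentially zero because the relationship is symmetric rather than linear; a check based on correlation magnitude alone would treat such a pair as unrelated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# y depends on x exactly, but through a U-shaped (quadratic) relationship.
x = list(range(-5, 6))
y = [v * v for v in x]

r = pearson(x, y)
print(f"Pearson r = {r:.3f}")
```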
References
- Taloni A, Scorcia V, Giannaccare G. (2023). Large Language Model Advanced Data Analysis Abuse to Create a Fake Data Set in Medical Research. JAMA Ophthalmology 141(12):1174-1175
- Al-Marzouki S, Evans S, Marshall T, Roberts I. (2005). Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 331(7511):267-270
- Simonsohn U. (2013). Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science 24(10):1875-1888
- Baba K, Shibata R, Sibuya M. (2004). Partial Correlation and Conditional Correlation as Measures of Conditional Independence. Australian & New Zealand Journal of Statistics 46(4):657-664
- Spirtes P, Glymour C, Scheines R. (2000). Causation, Prediction, and Search. 2nd ed. Cambridge, MA: MIT Press. ISBN 978-0262194402
- Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. (2021). Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology 136:189-202
- Wilkinson J, Heal C, Antoniou GA, et al. (2024). A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology 175:111512
- Crone G, Green CD. (2025). Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology 35(3):359-380
- Kalisch M, Bühlmann P. (2007). Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research 8:613-636