D20 | Statistical analysis | Fabrication Detection | Layer 2 (Contextual)

Conditional Independence Violations

D20 checks specific pairs of variables against the relationships that physiology says they must have. Some pairs, such as systolic and diastolic blood pressure, should correlate, and in a known direction. Others should be related only through a third variable, so that controlling for that mediator makes the correlation disappear. The indicator uses a dictionary of such domain-knowledge triples and flags pairs whose correlation is missing, has the wrong sign, or persists after the mediator is controlled for. It works on the individual-patient data and complements the global correlation-structure checks.

Technical description

D20 is a contextual screen that tests named variable pairs against the dependence or conditional independence expected from a domain-knowledge dictionary. Each entry specifies two variables, an expected correlation range, whether the pair should be independent after conditioning on a third, mediator variable, and aliases for matching the variables to columns. For a dependent pair, one expected to correlate, the indicator computes the Pearson correlation and flags it when the magnitude is much weaker than the expected range or when the correlation runs in the wrong direction (an expected-negative relationship appearing positive, or the reverse), a failure that a magnitude-only check would miss. For a mediated pair, one expected to become independent once a third variable is held constant, it computes the unconditional correlation and, when that is large enough to be meaningful, the partial correlation controlling for the mediator, flagging the pair when a strong residual correlation remains. The failure rates among dependent and mediated pairs set the score.
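One way to picture a dictionary entry is the minimal sketch below. The class name `Triple`, its field names, the helper `resolve_column`, and the numbers in the usage note are all hypothetical; the source describes what an entry contains but not its actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class Triple:
    """One domain-knowledge entry (hypothetical layout, not the tool's schema)."""
    var_a: str
    var_b: str
    expected_r: Tuple[float, float]      # expected correlation range (low, high)
    mediator: Optional[str] = None       # set when the pair should be independent given it
    aliases: Dict[str, Tuple[str, ...]] = field(default_factory=dict)

    @property
    def is_mediated(self) -> bool:
        # A mediated pair names a third variable; a dependent pair does not.
        return self.mediator is not None


def resolve_column(name: str,
                   aliases: Dict[str, Tuple[str, ...]],
                   columns: Sequence[str]) -> Optional[str]:
    """Return the dataset column matching the name or one of its aliases.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    lowered = {c.lower(): c for c in columns}
    for candidate in (name,) + tuple(aliases.get(name, ())):
        if candidate.lower() in lowered:
            return lowered[candidate.lower()]
    return None
```

For example, `Triple("sbp", "dbp", (0.5, 0.8), aliases={"sbp": ("systolic_bp",)})` would describe a dependent pair; the range shown is an illustrative placeholder, not a value taken from the indicator's dictionary.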

How it works

Each triple whose variables map to numeric columns with at least ten complete rows is checked. A dependent pair fails when the magnitude of its observed correlation falls below the lower end of the expected magnitude by more than 0.10, or when the expected range lies wholly on one side of zero but the observed correlation is meaningfully on the other side (beyond the sign threshold of 0.10). A mediated pair is checked only when its unconditional correlation exceeds 0.15. It fails when the partial correlation, computed after regressing out the mediator, both exceeds 0.20 in magnitude and is significantly different from zero by the Fisher z-transform test at the five percent level, so that a residual correlation too small for the sample to distinguish from zero is not counted as a surviving dependence [9]. The score sums two contributions: one from the dependent-pair failure rate, with bands at twenty, forty, and sixty percent, and one from the mediated-pair failure rate, weighted more heavily, with bands at twenty-five and fifty percent. The total is capped at 5.0. Each failure produces a finding naming the variables, the observed and expected correlations, and, for a mediated pair, the residual partial correlation. The metadata records the per-type and total checked and failed counts, the failure rates for both pair types, and the smallest Fisher z partial-correlation p-value among the mediated pairs examined.
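The two pair-level rules can be sketched as follows. This is an illustration under stated assumptions, not the indicator's implementation: the function and constant names are invented here, inputs are assumed to be already restricted to complete rows, the 0.15 gate is applied to the absolute unconditional correlation, and the scoring bands are left out because the text does not give the points attached to each band.

```python
import math
import numpy as np

WEAK_MARGIN = 0.10   # weakness margin from the text
SIGN_MARGIN = 0.10   # sign threshold from the text
MIN_UNCOND = 0.15    # unconditional minimum for checking a mediated pair
PARTIAL_MIN = 0.20   # partial-correlation magnitude threshold
ALPHA = 0.05         # five percent significance level
MIN_ROWS = 10        # minimum number of complete rows


def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])


def dependent_pair_fails(r_obs, lo, hi):
    """Too weak, or on the wrong side of zero, relative to the range [lo, hi]."""
    if lo > 0:
        sign, min_mag = 1, lo
    elif hi < 0:
        sign, min_mag = -1, -hi
    else:                      # range straddles zero: no sign expectation
        sign, min_mag = 0, 0.0
    too_weak = abs(r_obs) < min_mag - WEAK_MARGIN
    wrong_sign = sign != 0 and sign * r_obs < -SIGN_MARGIN
    return bool(too_weak or wrong_sign)


def partial_corr(x, y, m):
    """Correlate the residuals of x and y after regressing each on m."""
    design = np.column_stack([np.ones(len(m)), m])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return pearson(res_x, res_y)


def fisher_z_pvalue(r, n, k=1):
    """Two-sided p-value for a zero partial correlation, k conditioning variables."""
    r = max(min(r, 0.999999), -0.999999)          # keep atanh finite
    stat = math.sqrt(n - k - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2))         # equals 2 * (1 - Phi(stat))


def mediated_pair_fails(x, y, m):
    """True or False when the pair is checked; None when it is skipped."""
    x, y, m = (np.asarray(v, dtype=float) for v in (x, y, m))
    n = len(x)
    if n < MIN_ROWS or abs(pearson(x, y)) <= MIN_UNCOND:
        return None
    r_partial = partial_corr(x, y, m)
    return bool(abs(r_partial) > PARTIAL_MIN
                and fisher_z_pvalue(r_partial, n) < ALPHA)
```

Returning `None` for a skipped pair keeps it out of the denominator when failure rates are computed over the pairs that were actually checked.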

Score thresholds

Score	Meaning
0	Expected correlations are present and mediated pairs become independent as predicted.
1 to 2	Some expected relationships are missing, reversed, or fail to mediate.
3 to 5	Many domain-expected dependence or independence relationships are violated.

Why this matters

Domain knowledge specifies not just that variables relate but exactly how, in strength and direction and conditional structure, and these specifics are far harder to fabricate than a generic correlation. Taloni and colleagues showed that a model can fabricate a clinical dataset whose variables fail to carry the realistic relationships of a real cohort [1], and the use of correlation structure to separate genuine from invented data is established: Al-Marzouki and colleagues exploited the correlations and variances of trial variables [2], and Simonsohn detected fabrication from relationships a fabricator could not reproduce [3]. D20 sharpens this by testing literature-anchored expectations: an expected coupling that is absent, a known negative relationship that appears positive, or a mediation that fails to remove a correlation are each specific, checkable contradictions of how the variables behave in real physiology. The wrong-direction check is particularly hard to evade, because reproducing the correct sign of every physiological relationship requires the generator to encode the underlying biology, not merely to inject correlations. Partial correlation is the standard measure of conditional independence in multivariate statistics [4], and conditional-independence tests are the engine of causal-structure discovery, where the presence or absence of a dependence after conditioning identifies the underlying graph [5]; recent scoping reviews and trustworthiness instruments place dependence-structure checks among the standard screens for fabricated and machine-generated data [6, 7, 8].

Limitations

The check can only test variable pairs that are present in its triples dictionary and whose names it can match to columns, so relationships not encoded, or columns named unrecognisably, are not examined, and the absence of violations is only as informative as the pairs that could be checked. Correlation and partial correlation are linear measures, so a genuine non-linear dependence can appear weak and be misread as a missing correlation, and the partial correlation controls for a single linear mediator only. The expected ranges and the thresholds (an unconditional minimum of 0.15, a partial threshold of 0.20, a sign threshold of 0.10, and a weakness margin of 0.10) are drawn from the literature but applied as fixed cutoffs. Real populations can differ from the reference ranges for legitimate reasons of case mix, so a flag is a prompt to investigate rather than proof. The global correlation-matrix structure is assessed by indicators D1 and S17, and conditional correlations more broadly by D11; D20 examines only the specific domain pairs in its dictionary.
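The linearity limitation can be seen in a small constructed example, offered only as an illustration: the data are synthetic and the variable names are invented. Here `y` is fully determined by `x`, yet the Pearson correlation is essentially zero because the relationship is symmetric about zero.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 601)   # values symmetric about zero
y = x ** 2                        # fully determined by x, but not linearly

# Pearson correlation sees no linear trend in a symmetric U-shape.
r_linear = float(np.corrcoef(x, y)[0, 1])

# The dependence is visible once the symmetry is removed.
r_folded = float(np.corrcoef(np.abs(x), y)[0, 1])

print(f"corr(x, y)   = {r_linear:.3f}")
print(f"corr(|x|, y) = {r_folded:.3f}")
```

A pair of this shape would look like a missing correlation to a Pearson-based check even though the variables are strongly related.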

Theoretical background

D20 rests on encoding physiological causal knowledge as testable constraints on the joint distribution. Two variables that share a direct mechanism are marginally dependent, with a sign and strength set by that mechanism, so a genuine dataset reproduces both, and a reversed sign is not a weak version of the relationship but its contradiction, since it implies the mechanism runs backwards. Two variables related only through a common mediator are marginally dependent but conditionally independent given that mediator, a structure formalised in graphical models: conditioning on the mediator blocks the path between them, so the partial correlation collapses toward zero in real data. Fabricated data, generated without the underlying causal graph, tends to violate both: independently drawn variables lose the marginal couplings, and variables given a spurious direct association retain a partial correlation that mediation should have removed. The indicator estimates the marginal correlation directly and the conditional correlation by regressing out the mediator and correlating the residuals, which is the linear partial correlation. Reading dependent-pair failures and mediated-pair failures separately, and weighting the latter more, reflects that a surviving conditional dependence is a stronger structural anomaly than a merely weak marginal correlation, while checking the sign closes the gap that a magnitude-only test leaves for reversed relationships. Judging the surviving partial correlation by the Fisher z-transform test, and not by a fixed magnitude alone, makes the conditional-independence decision inferential. This is the same test that constraint-based causal-structure search uses to read a dependence as present or absent given the conditioning set [9], so the verdict scales with the sample size and a noisy partial correlation in a small dataset is not treated as a real violation.
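For a single mediator, the quantities described above can be written out explicitly. These are the standard textbook forms and are not taken from the indicator's code; the symbols x and y (the pair), m (the mediator), n (the number of complete rows) and α (the significance level) are notation introduced here.

```latex
% Linear partial correlation of x and y given one mediator m.
% It equals the correlation of the residuals after regressing x and y on m.
r_{xy \cdot m} = \frac{r_{xy} - r_{xm}\, r_{ym}}
                      {\sqrt{\left(1 - r_{xm}^{2}\right)\left(1 - r_{ym}^{2}\right)}}

% Fisher z-transform of a correlation r.
z(r) = \tfrac{1}{2} \ln \frac{1 + r}{1 - r}

% Two-sided test with one conditioning variable:
% reject conditional independence at level \alpha when
\sqrt{n - 1 - 3}\; \bigl| z\!\left(r_{xy \cdot m}\right) \bigr|
    > \Phi^{-1}\!\left(1 - \tfrac{\alpha}{2}\right)
```

In a linear model where x and y are connected only through m, the marginal correlation factorises as r_xy = r_xm r_ym, so the numerator of the first expression is zero in the population. The square-root factor in the test is what makes the decision depend on sample size.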

References

[1] Taloni A, Scorcia V, Giannaccare G. Large Language Model Advanced Data Analysis Abuse to Create a Fake Data Set in Medical Research. JAMA Ophthalmology. 2023;141(12):1174-1175. DOI: 10.1001/jamaophthalmol.2023.5162
[2] Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ. 2005;331(7511):267-270. DOI: 10.1136/bmj.331.7511.267
[3] Simonsohn U. Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science. 2013;24(10):1875-1888. DOI: 10.1177/0956797613480366
[4] Baba K, Shibata R, Sibuya M. Partial Correlation and Conditional Correlation as Measures of Conditional Independence. Australian & New Zealand Journal of Statistics. 2004;46(4):657-664. DOI: 10.1111/j.1467-842X.2004.00360.x
[5] Spirtes P, Glymour C, Scheines R. Causation, Prediction, and Search. 2nd ed. Cambridge, MA: MIT Press; 2000. ISBN 978-0262194402. https://direct.mit.edu/books/monograph/2057/Causation-Prediction-and-Search
[6] Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
[7] Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
[8] Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861
[9] Kalisch M, Bühlmann P. Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research. 2007;8:613-636. https://www.jmlr.org/papers/v8/kalisch07a.html