D11 · Statistical analysis · Fabrication Detection · Layer 2 (Contextual)

Conditional Correlations Absent

Tests whether controlling for a third variable changes the correlation between two others, as it does in real data with genuine causal structure. When age is related to both blood pressure and cholesterol, holding age fixed should weaken the apparent link between the latter two. Data in which each variable was generated independently is a common signature of machine-fabricated datasets. In such data, controlling for a third variable barely changes anything, because there is no shared structure to remove. The indicator compares each pairwise correlation with its partial correlation given a third variable and flags data where conditioning has almost no effect. It works on individual-patient data with at least five numeric variables.
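The small simulation below makes the contrast above concrete. The effect sizes, variable names and the helpers `partial_corr` and `conditioning_change` are hypothetical and exist only for illustration. In the structured data, age is the only shared cause of blood pressure and cholesterol. In the independent data, each column is drawn on its own.

```python
import numpy as np


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Hypothetical helper: first-order partial correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def conditioning_change(x, y, z) -> float:
    """Hypothetical helper: |corr(x, y) - partial corr(x, y | z)|."""
    r = np.corrcoef(np.vstack([x, y, z]))
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return abs(r_xy - partial_corr(r_xy, r_xz, r_yz))


rng = np.random.default_rng(0)
n = 2000

# Structured data (made-up effect sizes): age drives both blood pressure and cholesterol.
age = rng.normal(50, 10, n)
bp = 90 + 0.8 * age + rng.normal(0, 8, n)
chol = 150 + 1.0 * age + rng.normal(0, 10, n)

# Independently generated data with similar marginals but no shared cause.
age_i = rng.normal(50, 10, n)
bp_i = rng.normal(bp.mean(), bp.std(), n)
chol_i = rng.normal(chol.mean(), chol.std(), n)

structured_delta = conditioning_change(bp, chol, age)
independent_delta = conditioning_change(bp_i, chol_i, age_i)
print(f"structured:  delta = {structured_delta:.3f}")
print(f"independent: delta = {independent_delta:.3f}")
```

In this simulated model, the population correlation between blood pressure and cholesterol is 0.5. Their population partial correlation given age is exactly zero, so the structured delta lands near 0.5. The independent correlations are themselves near zero, so conditioning has almost nothing to change. Such a triple would usually be skipped by the 0.05 filter described in the technical description. It is shown here only to make the contrast concrete.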

Technical description

D11 is a contextual screen for the absence of conditional structure in individual-patient data, a hallmark of variables drawn independently rather than from a real system with confounders and mediators. It requires at least five numeric columns with at least ten non-missing values each. It excludes constant columns, because a zero-variance column has no defined correlation. Such a column would fill its row and column of the correlation matrix with undefined entries, which would then propagate into every triple that includes it and into the summary statistics. The indicator forms the correlation matrix and enumerates triples of variables. It considers each triple in which all three pairwise correlations are non-trivial (absolute value above 0.05). For such a triple, it computes the partial correlation of the first two variables given the third using the standard first-order formula. That formula is the correlation between the first two, minus the product of each one's correlation with the third, divided by the square root of the product of one minus the square of each of those two correlations. It records the absolute change from the unconditional correlation. It then summarises the mean change across triples and the proportion of triples whose change exceeds 0.15. A small mean change means conditioning does almost nothing, the signature of independence, and drives the score; a low proportion of meaningfully changed triples adds to it.
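The formula described above can be written compactly as follows. The symbols r_{xy·z} (the partial correlation of x and y given z) and Δ (the recorded change) are notation introduced here. The worked numbers are made up for illustration.

```latex
r_{xy\cdot z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}},
\qquad
\Delta_{xy\mid z} = \left|\, r_{xy} - r_{xy\cdot z} \,\right|

% Worked example with hypothetical correlations r_{xy} = 0.50, r_{xz} = r_{yz} = 0.60:
r_{xy\cdot z} = \frac{0.50 - 0.36}{\sqrt{0.64 \times 0.64}} = \frac{0.14}{0.64} = 0.21875,
\qquad
\Delta_{xy\mid z} = \left| 0.50 - 0.21875 \right| = 0.28125 > 0.15
```

In this example, conditioning on z removes more than half of the association. The triple would therefore count towards the proportion of meaningfully changed triples.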

How it works

The correlation matrix is computed over the complete-case numeric data, and every ordered triple of distinct variables is considered. A triple is skipped if any of its pairwise absolute correlations is at or below 0.05, since there is no relationship to condition on. A triple is also skipped if its partial-correlation denominator collapses to zero, which happens when either variable is perfectly correlated with the conditioning variable. For each remaining triple, the delta is the absolute difference between the unconditional and the partial correlation.

The indicator needs at least three valid triples. The mean delta then maps to a base score: below 0.03 scores 4.0, below 0.07 scores 3.0, below 0.12 scores 2.0, below 0.18 scores 1.0, and otherwise 0. If fewer than five percent of triples have a delta above 0.15, a further 1.0 is added, with the total capped at 5.0.

The partial-correlation formula uses only the three correlations and ignores the sample size. The indicator therefore also brings n in through Fisher's z-transform. On the z scale, a partial correlation that controls for one variable has a standard error of one over the square root of n minus four. This gives a two-sided p-value for the hypothesis that each partial correlation is zero. It also gives an n-aware test of whether each conditioning change is real, namely whether the shift in z-space exceeds twice that standard error [9]. Findings are raised when the score reaches the concerning band.

The metadata records the number of triples, the mean delta, the median delta (robust to a few extreme triples) and the proportion of deltas exceeding 0.15. It also records the number of columns tested, the mean partial-correlation p-value, the proportion of partial correlations that are individually significant, and the proportion of triples whose conditioning change is significant.
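The triple scan and scoring described above can be sketched as follows, assuming the individual-patient data arrive as a pandas DataFrame. This is a minimal sketch, not the indicator's actual code. The function name `d11_score`, the returned dictionary keys and the 1e-12 tolerance for a collapsing denominator are hypothetical choices. The Fisher z step and the raising of findings are omitted.

```python
import itertools

import numpy as np
import pandas as pd


def d11_score(df: pd.DataFrame) -> dict:
    """Hypothetical sketch of the D11 triple scan; names and tolerance are assumptions."""
    num = df.select_dtypes(include="number")
    # Keep columns with at least ten non-missing values.
    num = num.loc[:, num.notna().sum() >= 10]
    # Listwise deletion, then drop constant (zero-variance) columns.
    data = num.dropna()
    data = data.loc[:, data.std(ddof=1) > 0]
    if data.shape[1] < 5:
        return {"score": None, "reason": "fewer than five qualifying numeric columns"}

    r = data.corr().to_numpy()
    p = r.shape[0]
    eps = 1e-12  # hypothetical tolerance for a collapsing denominator
    deltas = []
    for i, j, k in itertools.permutations(range(p), 3):  # ordered triples
        r_ij, r_ik, r_jk = r[i, j], r[i, k], r[j, k]
        if min(abs(r_ij), abs(r_ik), abs(r_jk)) <= 0.05:
            continue  # some pair has no relationship to condition on
        a, b = 1 - r_ik**2, 1 - r_jk**2
        if a <= eps or b <= eps:
            continue  # |r| = 1 with the conditioning variable
        partial = (r_ij - r_ik * r_jk) / np.sqrt(a * b)
        deltas.append(abs(r_ij - partial))

    if len(deltas) < 3:
        return {"score": None, "reason": "fewer than three valid triples"}

    d = np.asarray(deltas)
    mean_delta = float(d.mean())
    prop_large = float((d > 0.15).mean())
    if mean_delta < 0.03:
        base = 4.0
    elif mean_delta < 0.07:
        base = 3.0
    elif mean_delta < 0.12:
        base = 2.0
    elif mean_delta < 0.18:
        base = 1.0
    else:
        base = 0.0
    bonus = 1.0 if prop_large < 0.05 else 0.0  # few meaningfully changed triples
    return {
        "score": min(base + bonus, 5.0),
        "n_triples": len(deltas),
        "mean_delta": mean_delta,
        "median_delta": float(np.median(d)),
        "prop_delta_gt_0_15": prop_large,
        "n_columns": p,
    }
```

`itertools.permutations` reproduces the ordered-triple enumeration, so each (pair, conditioning variable) combination is visited twice. Listwise deletion happens before the zero-variance check, so a column that becomes constant after dropping incomplete rows is also excluded.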

Score thresholds

Score	Meaning
0 to 1	Conditioning changes correlations as expected of data with real structure.
2 to 3	Conditioning has little effect, suggesting weak or absent confounding structure.
4 to 5	Conditioning has almost no effect across triples, consistent with independently generated variables.

Why this matters

Real multivariate data is shaped by shared causes, so the relationships among variables are entangled. The correlation between two variables generally changes, often substantially, once a third related variable is held constant, because part of their association ran through that third variable. Data generated by sampling each variable independently has no such entanglement, so its partial correlations nearly equal their unconditional counterparts. Independent sampling is a common default when a language model is asked to produce a dataset. Taloni and colleagues showed that a model can fabricate a clinical dataset whose variables lack the realistic dependence of a real cohort [1], and the absence of conditional structure is a sharper form of that defect. The principle that the multivariate dependence structure distinguishes genuine from fabricated data predates language models. Al-Marzouki and colleagues used the correlation structure of trials to separate real from invented datasets [2]. Simonsohn showed that fabrication is exposed by relationships among reported quantities that a fabricator cannot reproduce [3]. Conditional correlations are powerful precisely because a naive generator cannot reproduce a believable web of confounding, as opposed to a few marginal correlations. The use of partial correlation as a measure of conditional independence is well established in statistics: for multivariate-normal data, a zero partial correlation coincides with conditional independence [4]. Recent forensic re-analyses, scoping reviews and trustworthiness instruments place multivariate dependence-structure checks among the standard screens for fabricated and machine-generated data [5, 6, 7, 8].

Limitations

The check requires individual-patient data with at least five qualifying numeric variables, so smaller or summary-only studies are outside its scope. It uses listwise deletion, so heavy missingness shrinks the effective sample. Partial correlation as computed here controls for one variable at a time and assumes linear relationships, so non-linear dependence or higher-order confounding is not captured, and a real dataset whose variables genuinely lack confounders could show small deltas without being fabricated. The triple enumeration is over ordered triples, so the same pair conditioned on the same third variable is counted in both orderings, which does not bias the mean or proportion but inflates the raw triple count. The thresholds and the 0.15 delta cutoff are heuristic. The unconditional correlation structure, including suspiciously flat matrices and near-perfect correlations, is assessed by indicators D1 and S17; D11 focuses specifically on whether conditioning changes those correlations.
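The claim about ordered triples can be checked directly. The partial-correlation formula is symmetric in its first two variables, so enumerating ordered triples lists every delta exactly twice. The sketch below uses a hypothetical helper `deltas` and simulated data with a made-up shared component. It compares the ordered enumeration with one that keeps a single ordering of each pair.

```python
import itertools

import numpy as np


def deltas(r: np.ndarray, triples) -> list[float]:
    """Hypothetical helper: |r_ij - r_ij.k| for each (i, j, k) in triples."""
    out = []
    for i, j, k in triples:
        partial = (r[i, j] - r[i, k] * r[j, k]) / np.sqrt(
            (1 - r[i, k] ** 2) * (1 - r[j, k] ** 2)
        )
        out.append(abs(r[i, j] - partial))
    return out


rng = np.random.default_rng(2)
x = rng.normal(size=(200, 5))
x[:, 1:] += x[:, [0]]  # made-up shared component so correlations are non-trivial
r = np.corrcoef(x, rowvar=False)
p = r.shape[0]

ordered = list(itertools.permutations(range(p), 3))
# Keep one ordering of each pair for every conditioning variable.
unordered = [(i, j, k) for i, j, k in ordered if i < j]

d_ord, d_un = deltas(r, ordered), deltas(r, unordered)
print(len(ordered), len(unordered))               # 60 30
print(np.isclose(np.mean(d_ord), np.mean(d_un)))  # True
```

Because each value appears exactly twice, the mean and the proportion above 0.15 are unchanged. Only the raw count doubles.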

Theoretical background

D11 rests on the distinction between marginal and conditional association. In a system with real causal structure, two variables can be correlated for three reasons: one influences the other, both share a common cause, or a third variable mediates the effect of one on the other. In the latter two cases, conditioning on the third variable removes the shared component and changes the correlation, which is the basis of confounding adjustment and mediation analysis. The partial correlation formalises this by projecting out the linear effect of the conditioning variable. The difference between the marginal and partial correlations measures how much of the association was attributable to that third variable. In genuinely structured data these differences are routinely large for many triples, because variables are densely interdependent. When variables are generated independently, their population partial correlations equal their marginal correlations exactly (both are zero), since there is no shared variance to remove. The sample deltas therefore cluster near zero, up to estimation noise. D11 reads the mean delta as a measure of how much conditional structure exists and the proportion of large deltas as a measure of how widespread it is. It treats their joint smallness as evidence that the data lack the confounding web that real measurement produces. Excluding constant columns is essential, because a degenerate variable would inject undefined correlations that would corrupt every triple it touches. The pairwise formula uses only the three correlations and not the number of observations behind them. Fisher's variance-stabilising z-transform, whose standard error depends only on n and the number of conditioning variables, brings in the sample size. This lets the indicator ask not merely whether a conditioning change is small but whether it is distinguishable from zero at the available sample size. A near-absent effect in a large sample, where a real change would be detectable, is therefore not treated the same as one in a small sample, where it may simply be noise [9].
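The n-aware step can be sketched as below. This minimal sketch follows the rule as described: the standard error on the z scale is one over the square root of n minus four, and a change counts as real when the z-space shift exceeds twice that standard error. The function names are hypothetical. The correlations r = 0.50 and partial = 0.21875 are made up for illustration.

```python
import math


def _z_se(n: int) -> float:
    """Hypothetical helper: SE of Fisher's z for a partial correlation with one control."""
    if n <= 4:
        raise ValueError("need n > 4")
    return 1.0 / math.sqrt(n - 4)


def partial_corr_p_value(r_partial: float, n: int) -> float:
    """Hypothetical helper: two-sided p-value for H0 that the partial correlation is zero."""
    z = math.atanh(r_partial) / _z_se(n)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def conditioning_change_is_significant(r: float, r_partial: float, n: int) -> bool:
    """Hypothetical helper: True if the z-space shift exceeds twice the SE."""
    return abs(math.atanh(r) - math.atanh(r_partial)) > 2 * _z_se(n)


# The same made-up correlations evaluated at two sample sizes.
for n in (20, 50):
    print(n,
          round(partial_corr_p_value(0.21875, n), 3),
          conditioning_change_is_significant(0.50, 0.21875, n))
```

With these numbers, the z-space shift is about 0.327. That exceeds the threshold of about 0.295 at n = 50, but not the threshold of 0.5 at n = 20. Identical correlations can therefore be judged a real conditioning change in one sample and noise in the other.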

References

[1] Taloni A, Scorcia V, Giannaccare G. Large Language Model Advanced Data Analysis Abuse to Create a Fake Data Set in Medical Research. JAMA Ophthalmology. 2023;141(12):1174-1175. DOI: 10.1001/jamaophthalmol.2023.5162
[2] Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ. 2005;331(7511):267-270. DOI: 10.1136/bmj.331.7511.267
[3] Simonsohn U. Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science. 2013;24(10):1875-1888. DOI: 10.1177/0956797613480366
[4] Baba K, Shibata R, Sibuya M. Partial Correlation and Conditional Correlation as Measures of Conditional Independence. Australian & New Zealand Journal of Statistics. 2004;46(4):657-664. DOI: 10.1111/j.1467-842X.2004.00360.x
[5] Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
[6] Bordewijk EM, Li W, van Eekelen R, Wang R, Showell M, Mol BW, van Wely M. Methods to assess research misconduct in health-related research: A scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
[7] Wilkinson J, Heal C, Antoniou GA, et al. A survey of experts to identify methods to detect problematic studies: stage 1 of the INveStigating ProblEmatic Clinical Trials in Systematic Reviews project. Journal of Clinical Epidemiology. 2024;175:111512. DOI: 10.1016/j.jclinepi.2024.111512
[8] Crone G, Green CD. Tools of the data detective: A review of statistical methods to detect data and result anomalies in psychology. Theory & Psychology. 2025;35(3):359-380. DOI: 10.1177/09593543241311861
[9] Fisher RA. On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample. Metron. 1921;1:3-32.