D33 | Statistical analysis | Fabrication Extended | Layer 1 (Deterministic)

Temporal Anomalies

Looks at the dates in a dataset, such as visit or enrollment dates, and checks that they behave like real scheduling. Genuine clinical data spreads across weekdays and months with natural irregularity, while fabricated dates often betray themselves by landing mostly on weekends, all bunching into one week, falling on a single day, being spaced at a perfectly even interval, or sitting in the future or the distant past. The indicator parses each date column and flags these implausible patterns. It works on the individual-patient data (IPD).

Technical description

D33 is a deterministic screen for implausible temporal structure in the date columns of individual-patient data (IPD). It identifies candidate date columns by name keyword, parses each into timestamps, and requires at least ten parseable dates to assess a column. Numeric columns are not parsed, because interpreting their values as dates would turn an ordinary measurement into spurious timestamps. On each genuine date column it computes six signals: the fraction of dates falling on a weekend, since scheduled clinical visits are predominantly on weekdays; whether the seven day-of-week counts are statistically indistinguishable from a uniform spread, as a column of randomly generated dates would be; whether more than half the dates fall within any seven-day window, indicating mass bulk entry; the presence of impossible dates in the future or before 1900; perfectly uniform spacing between consecutive dates, indicating a fixed schedule rather than real appointments; and whether every date is the same calendar day. Each signal contributes additively to a capped score.
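The column-selection step described above can be sketched as follows. This is a minimal illustration rather than the indicator's actual code: the function names, the `FORMATS` list, and the representation of a column as a plain Python list are assumptions made for the example, and the source does not say which date formats the real parser accepts.

```python
from datetime import datetime

# Keywords quoted in the description; the real indicator's list may differ.
DATE_KEYWORDS = ("date", "time", "visit", "enrolled",
                 "admission", "discharge", "dob", "birth")
MIN_DATES = 10  # minimum number of parseable dates needed to assess a column

# Hypothetical format list, chosen only for this sketch.
FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S")


def is_date_candidate(column_name):
    """A column is a candidate when its name contains a date keyword."""
    lowered = column_name.lower()
    return any(keyword in lowered for keyword in DATE_KEYWORDS)


def parse_column(values):
    """Return the parsed datetimes, or None if the column is not assessable."""
    non_null = [v for v in values if v is not None]
    # Numeric columns are skipped outright rather than coerced into dates.
    if non_null and all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_null
    ):
        return None
    parsed = []
    for value in non_null:
        if isinstance(value, datetime):
            parsed.append(value)
            continue
        if not isinstance(value, str):
            continue
        for fmt in FORMATS:
            try:
                parsed.append(datetime.strptime(value.strip(), fmt))
                break
            except ValueError:
                continue  # try the next format; unparseable values are dropped
    return parsed if len(parsed) >= MIN_DATES else None
```

Under these assumptions, a column named `Visit_Date` holding twelve ISO-formatted strings is assessed, while a column of integer blood-pressure readings is skipped even if its name happens to contain a keyword.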

How it works

A column is a date candidate when its name contains a keyword such as date, time, visit, enrolled, admission, discharge, dob, or birth. Its values are parsed with the datetime parser, but only when the column is textual or already a datetime type; a numeric column is skipped because the parser would read its numbers as nanoseconds since 1970 and manufacture a cluster of false dates. A column needs at least ten parsed dates to be assessed. A weekend share above one half adds 2.5; otherwise a share above thirty percent adds 1.5, so the two weekend increments never stack. When at least twenty dates are present, a chi-square goodness-of-fit test compares the seven day-of-week counts against a uniform expectation. If the test cannot distinguish the distribution from uniform (p-value above 0.10) and the weekend share lies between twenty and thirty percent, so that the separate weekend check has not already fired, this adds 1.5. Clustering of more than half the dates within a seven-day window adds 2.0. A future date adds 1.0, and a date before 1900 adds another 1.0. If all dates fall on one calendar day, the single-day check adds 3.0; otherwise, perfectly uniform spacing, meaning all consecutive intervals are equal to within one day, adds 1.5. The total is capped at 5.0. Each triggered check emits a finding naming the column and the pattern, and the metadata records the analysed columns and the per-column results.
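The scoring rules above can be sketched for a single, already parsed date column as follows. This is an illustrative reading of the description, not the indicator's source: the function names and finding labels are hypothetical, "equal within one day" is interpreted as the largest and smallest gap differing by at most one day, and a seven-day window is taken to mean seven consecutive calendar days. The chi-square p-value uses the closed-form survival function for six degrees of freedom, so only the standard library is needed.

```python
import math
from collections import Counter
from datetime import date, datetime, timedelta


def uniform_dow_p_value(days):
    """Chi-square goodness-of-fit p-value of day-of-week counts vs. uniform."""
    counts = Counter(d.weekday() for d in days)
    expected = len(days) / 7
    stat = sum((counts.get(k, 0) - expected) ** 2 / expected for k in range(7))
    half = stat / 2
    # Exact chi-square survival function for 6 degrees of freedom.
    return math.exp(-half) * (1 + half + half * half / 2)


def score_dates(dates, today):
    """Return (capped score, finding labels) for a non-empty list of dates."""
    days = sorted(d.date() if isinstance(d, datetime) else d for d in dates)
    n = len(days)
    score, findings = 0.0, []

    def add(points, label):
        nonlocal score
        score += points
        findings.append(label)

    # Weekend share: the two increments are mutually exclusive.
    weekend = sum(d.weekday() >= 5 for d in days) / n
    if weekend > 0.5:
        add(2.5, "weekend_majority")
    elif weekend > 0.3:
        add(1.5, "weekend_elevated")

    # Flat day-of-week profile, only when the weekend check has not fired.
    if n >= 20 and 0.2 <= weekend <= 0.3 and uniform_dow_p_value(days) > 0.10:
        add(1.5, "uniform_day_of_week")

    # Largest number of dates inside any window of 7 consecutive days.
    start, best = 0, 0
    for end in range(n):
        while (days[end] - days[start]).days >= 7:
            start += 1
        best = max(best, end - start + 1)
    if best > n / 2:
        add(2.0, "seven_day_cluster")

    if any(d > today for d in days):
        add(1.0, "future_date")
    if any(d.year < 1900 for d in days):
        add(1.0, "before_1900")

    if days[0] == days[-1]:
        add(3.0, "single_day")
    else:
        gaps = [(b - a).days for a, b in zip(days, days[1:])]
        if max(gaps) - min(gaps) <= 1:  # assumed reading of "within one day"
            add(1.5, "uniform_spacing")

    return min(score, 5.0), findings


base = date(2021, 1, 4)  # a Monday
weekly = [base + timedelta(weeks=i) for i in range(12)]
print(score_dates(weekly, today=date(2024, 1, 1)))  # (1.5, ['uniform_spacing'])
```

In this sketch a column with every date on one calendar day reaches the cap, because the single-day increment (3.0) and the seven-day cluster increment (2.0) both apply.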

Score thresholds

Score	Meaning
0 to 1	Dates spread across weekdays and time as real scheduling produces.
2 to 3	One strong anomaly, such as heavy weekend scheduling or week-long clustering.
4 to 5	Several anomalies, or all dates on one day, consistent with programmatically generated dates.

Why this matters

Dates are metadata that fabrication frequently neglects. Carlisle's examination of trials submitted with individual-patient data found that structural impossibilities in the records were among the features that exposed fabricated datasets, and temporal fields are a natural place for such impossibilities to surface [1]. George and Buyse describe central statistical monitoring of trial data quality, in which implausible patterns invisible to a casual reader, including the timing and clustering of records, are precisely the signals that distinguish generated from collected data [2]. Buyse and colleagues set out the broader role of biostatistics in detecting fraud and note that genuine trials leave incidental structure (the weekday rhythm of clinic visits, the gradual accrual of enrolment, the natural jitter of appointment intervals) that a fabricator typing or generating a table rarely reproduces [3]. Each of D33's checks targets one such incidental feature: a real visit schedule avoids weekends, a real enrolment accrues over time rather than within a week, a real appointment series varies in spacing, and no real cohort is all recorded on a single day or dated in the future. In earlier work, Carlisle had applied a baseline-distribution screen of this family to thousands of trials and identified a tail whose summary statistics were too regular to arise by chance [4]. More recent data-integrity audits formalised the approach: Bordewijk and colleagues compared recruitment windows and baseline distributions across trials from single author groups and found patterns inconsistent with proper randomisation [5], and a scoping review catalogued the statistical methods now used to screen health research for misconduct [6]. An expert interview study placed implausible timelines and dates among the warning signs of a prepublication screening checklist [7], and Grey and colleagues argued that publication-integrity checks of this kind belong before any formal misconduct finding [8].

Limitations

The checks are heuristic, and a flag is a prompt to inspect provenance rather than proof of fabrication. Some study designs legitimately violate the assumptions: weekend or single-session recruitment events, intensive protocols with fixed-interval visits, and registries entered in a single batch can each raise a signal without any fraud. The indicator only examines columns whose names carry a date keyword, so an unlabelled date column is missed. It also does not parse numeric date encodings such as year integers or spreadsheet serial numbers, which it deliberately skips to avoid misreading measurements as dates. Date parsing depends on recognisable formats, and ambiguous day-month orders can be misread. The clustering and uniform-spacing checks need enough dates to be meaningful, which the ten-date minimum only partly ensures. Dates that are implausible relative to a participant's age are handled by indicator D03, so D33 focuses on the standalone temporal structure of the IPD date columns.
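The day-month ambiguity is easy to see with a single value. The snippet below is only an illustration using Python's standard `datetime.strptime`; the example string and the `is_weekend` helper are made up for the purpose and are not taken from the indicator.

```python
from datetime import datetime

raw = "03/04/2021"  # hypothetical value from a visit-date column

day_first = datetime.strptime(raw, "%d/%m/%Y")    # read as 3 April 2021
month_first = datetime.strptime(raw, "%m/%d/%Y")  # read as 4 March 2021


def is_weekend(d):
    """Saturday and Sunday are weekday() 5 and 6."""
    return d.weekday() >= 5


print(day_first.date(), is_weekend(day_first))      # 2021-04-03 True
print(month_first.date(), is_weekend(month_first))  # 2021-03-04 False
```

The same string lands on a Saturday under one reading and on a Thursday under the other, so a systematic misreading of the order can shift the weekend share as well as the individual dates.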

Theoretical background

D33 rests on the idea that timestamps in real data are the imprint of a physical scheduling process with known regularities, and that generation tends to violate those regularities in detectable ways. Clinic operations concentrate visits on weekdays, so the weekend fraction of a genuine visit column sits well below the two-sevenths a uniform random process would give, and a fabricator who draws dates uniformly, or who is indifferent to the calendar, raises that fraction toward or beyond it. The same logic extends from the weekend fraction to the whole week: a genuine visit column is markedly concentrated on particular weekdays, so a profile that is statistically flat across all seven days, which a uniform random date generator produces, is itself a generation signature, and a chi-square goodness-of-fit test of the day-of-week counts against the uniform expectation captures it even when the weekend share alone stays just below the weekend threshold. Enrolment in a real trial accrues gradually as eligible participants present, so the dates spread over months or years; a generator that assigns dates from a narrow window, or copies one date, collapses that spread into a cluster or a single day, which the window and single-day checks detect. Appointment intervals in real follow-up vary because life intervenes, so a sequence of exactly equal gaps is the signature of a schedule template rather than recorded reality, and the uniform-spacing check captures it. The impossible-date checks rest on hard physical constraints: data cannot record the future, and clinical dates do not precede the modern era, so either indicates an entry error or a generation artefact. Restricting parsing to textual and datetime columns is essential to the logic, because coercing a numeric measurement column into the nanosecond epoch would itself fabricate the very clustering and single-day patterns the indicator is built to find, turning the tool into a source of the false positives it should avoid.
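The effect of coercing a numeric column can be illustrated with a few lines of standard-library Python. This sketch assumes the behaviour the text describes, namely that a bare number is read as nanoseconds since 1970-01-01; the helper name and the blood-pressure values are invented for the example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def as_epoch_nanoseconds(value):
    """Interpret a bare number as nanoseconds since 1970-01-01."""
    # timedelta has microsecond resolution, so convert ns to microseconds.
    return EPOCH + timedelta(microseconds=value / 1000)


# Hypothetical systolic blood pressure readings, not dates at all.
systolic_bp = [118, 122, 135, 141, 109, 127, 130, 116, 124, 138]

misread = [as_epoch_nanoseconds(v) for v in systolic_bp]
distinct_days = {d.date() for d in misread}
print(distinct_days)  # {datetime.date(1970, 1, 1)}
```

Every reading collapses onto 1 January 1970, which would look like a single-day cluster to the checks described above if such a column were not skipped.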

References

[1] Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. DOI: 10.1111/anae.15263
[2] George SL, Buyse M. Data fraud in clinical trials. Clinical Investigation. 2015;5(2):161-173. DOI: 10.4155/cli.14.116
[3] Buyse M, George SL, Evans S, et al. The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Statistics in Medicine. 1999;18(24):3435-3451. DOI: 10.1002/(SICI)1097-0258(19991230)18:24<3435::AID-SIM365>3.0.CO;2-O
[4] Carlisle JB. Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals. Anaesthesia. 2017;72(8):944-952. DOI: 10.1111/anae.13938
[5] Bordewijk EM, Wang R, Askie LM, et al. Data integrity of 35 randomised controlled trials in women's health. European Journal of Obstetrics & Gynecology and Reproductive Biology. 2020;249:72-83. DOI: 10.1016/j.ejogrb.2020.04.016
[6] Bordewijk EM, Li W, van Eekelen R, et al. Methods to assess research misconduct in health-related research: a scoping review. Journal of Clinical Epidemiology. 2021;136:189-202. DOI: 10.1016/j.jclinepi.2021.05.012
[7] Parker L, Boughton S, Lawrence R, Bero L. Experts identified warning signs of fraudulent research: a qualitative study to inform a screening tool. Journal of Clinical Epidemiology. 2022;151:1-17. DOI: 10.1016/j.jclinepi.2022.07.006
[8] Grey A, Bolland MJ, Avenell A, Klein AA, Gunsalus CK. Check for publication integrity before misconduct. Nature. 2020;577(7789):167-169. DOI: 10.1038/d41586-019-03959-6