D33 · Statistical analysis · Fabrication Extended · Layer 1 (Deterministic)

Temporal Anomalies

Detects impossible or implausible temporal patterns in longitudinal individual-patient data (IPD): enrolment spikes, perfectly regular visit intervals, or impossible date sequences.

Technical description

Genuine clinical data spreads dates across weekdays and months with natural irregularity, while fabricated dates often land mostly on weekends, bunch into one week, fall on a single day, sit at a perfectly even interval, or lie in the future or before 1900. D33 identifies candidate date columns of the individual-patient data (IPD) by name keyword, parses each into timestamps (skipping numeric columns, whose values would be misread as nanosecond epochs), and requires at least ten parsed dates. It then computes the weekend fraction, the uniformity of the day-of-week distribution by a chi-square goodness-of-fit test, week-window clustering, presence of impossible dates, perfectly uniform spacing, and single-day concentration.
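The column selection and parsing step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the indicator's actual code: the table is assumed to be a plain dict of column name to list of values, the `FORMATS` tuple is a hypothetical set of accepted text formats, and the keyword list is the one named under "How it works", which may not be exhaustive.

```python
from datetime import datetime

# Keywords named in the description; the real list may be longer.
DATE_KEYWORDS = ("date", "time", "visit", "enrolled", "admission",
                 "discharge", "dob", "birth")
# Hypothetical set of accepted text formats (an assumption of this sketch).
FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S")
MIN_DATES = 10


def is_candidate(name):
    """True when the column name contains a date keyword."""
    lowered = name.lower()
    return any(key in lowered for key in DATE_KEYWORDS)


def parse_value(value):
    """Return a datetime, or None for numbers and unparseable text."""
    if isinstance(value, datetime):
        return value
    if not isinstance(value, str):
        return None  # numeric values are never treated as dates
    for fmt in FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


def date_columns(table):
    """Map each usable date column to its list of parsed dates."""
    result = {}
    for name, values in table.items():
        if not is_candidate(name):
            continue
        # Skip numeric columns entirely rather than reading them as epochs.
        if all(isinstance(v, (int, float)) for v in values):
            continue
        parsed = [d for d in map(parse_value, values) if d is not None]
        if len(parsed) >= MIN_DATES:
            result[name] = parsed
    return result
```

In this sketch a column such as `survival_time` matches the keyword `time` but is dropped because its values are numeric, and a `dob` column with only five parseable entries is dropped by the ten-date minimum.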

How it works

Layer 1 (deterministic): a column is a date candidate when its name contains a keyword such as date, time, visit, enrolled, admission, discharge, dob, or birth. Values are parsed with the datetime parser only when the column is textual or already datetime; a numeric column is skipped because the parser would read its numbers as nanoseconds since 1970. A column is scored only if at least ten of its values parse as dates. A weekend fraction above one half adds 2.5; otherwise a weekend fraction above thirty percent adds 1.5. When at least twenty dates are present and the weekend fraction lies between twenty and thirty percent (so the weekend check has not fired), a chi-square goodness-of-fit test of the seven day-of-week counts against a uniform expectation adds 1.5 if the distribution is indistinguishable from uniform (p above 0.10), a pattern that weekday-centred clinic scheduling rarely produces. More than half the dates falling within a seven-day window adds 2.0. A future date adds 1.0, and a date before 1900 adds 1.0. All dates on one calendar day adds 3.0; otherwise perfectly uniform spacing (all consecutive intervals equal to within one day) adds 1.5. The total score is capped at 5.0.
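The scoring rules above can be sketched in a single function. This is a hedged illustration, not the indicator's implementation: `temporal_score` and its `now` parameter are hypothetical names, a "seven-day window" is assumed to mean a span shorter than seven days, "equal within one day" is assumed to mean that the largest and smallest gaps differ by at most one day, and the p > 0.10 condition is checked by comparing the chi-square statistic with the standard critical value for six degrees of freedom (10.645) instead of computing a p-value.

```python
from datetime import datetime, timedelta

CHI2_CRIT_DF6_P10 = 10.645  # chi-square critical value, 6 d.o.f., p = 0.10


def temporal_score(dates, now=None):
    """Score a list of naive datetime objects using the described rules."""
    now = now or datetime.now()
    dates = sorted(dates)
    n = len(dates)
    if n < 10:
        return 0.0
    score = 0.0

    # Weekend fraction: Monday is 0, so 5 and 6 are Saturday and Sunday.
    weekend = sum(d.weekday() >= 5 for d in dates) / n
    if weekend > 0.5:
        score += 2.5
    elif weekend > 0.3:
        score += 1.5

    # Day-of-week uniformity, only when the weekend check has not fired.
    if n >= 20 and 0.2 <= weekend <= 0.3:
        counts = [0] * 7
        for d in dates:
            counts[d.weekday()] += 1
        expected = n / 7
        chi2 = sum((c - expected) ** 2 / expected for c in counts)
        if chi2 < CHI2_CRIT_DF6_P10:  # equivalent to p > 0.10
            score += 1.5

    # Clustering: more than half of the dates inside some seven-day window.
    best, left = 0, 0
    for right in range(n):
        while dates[right] - dates[left] >= timedelta(days=7):
            left += 1
        best = max(best, right - left + 1)
    if best > n / 2:
        score += 2.0

    # Impossible dates: each kind adds 1.0 once.
    if any(d > now for d in dates):
        score += 1.0
    if any(d.year < 1900 for d in dates):
        score += 1.0

    # Single-day concentration, otherwise perfectly uniform spacing.
    if len({d.date() for d in dates}) == 1:
        score += 3.0
    else:
        gaps = [(b - a).total_seconds() / 86400
                for a, b in zip(dates, dates[1:])]
        if max(gaps) - min(gaps) <= 1.0:
            score += 1.5

    return min(score, 5.0)
```

As a worked case under these assumptions, twelve records all dated Saturday 2 January 2021 collect 2.5 (weekend), 2.0 (clustering), and 3.0 (single day), a raw total of 7.5 that the cap reduces to 5.0. Twenty visits exactly seven days apart, all on Mondays, collect only the 1.5 for uniform spacing.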

Why this matters

Dates are metadata that fabrication frequently neglects. Structural impossibilities in individual-patient records are among the features that expose fabricated datasets, and central statistical monitoring of trial data quality relies on implausible patterns invisible to a casual reader, including the timing and clustering of records. Genuine trials leave incidental temporal structure (the weekday rhythm of clinic visits, the gradual accrual of enrolment, the natural jitter of appointment intervals) that a fabricator typing or generating a table rarely reproduces.

Score thresholds

0-1: Dates spread across weekdays and time as real scheduling produces
2-3: One strong anomaly, such as heavy weekend scheduling or week-long clustering
4-5: Several anomalies, or all dates on one day, consistent with programmatically generated dates

Limitations

The checks are heuristic, and a flag prompts inspection rather than proving fabrication. Some designs legitimately violate the assumptions: weekend or single-session recruitment, fixed-interval intensive protocols, and batch-entered registries can each raise a signal without fraud. Only columns whose names carry a date keyword are examined, so unlabelled date columns are missed, and numeric date encodings such as year integers or spreadsheet serials are deliberately skipped to avoid misreading measurements as dates. Parsing depends on recognisable formats, and ambiguous day-month orders can be misread. The clustering and spacing checks need enough dates to be meaningful. Implausible demographic dates relative to age are covered by indicator D03; D33 focuses on the standalone temporal structure of the IPD date columns.
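The day-month ambiguity mentioned above is easy to reproduce. The snippet below is a small standard-library illustration, not part of the indicator: the same string yields two different dates depending on the assumed field order, and here the two readings even differ in whether the date counts as a weekend.

```python
from datetime import datetime

raw = "03/04/2021"
day_first = datetime.strptime(raw, "%d/%m/%Y")    # 3 April 2021, a Saturday
month_first = datetime.strptime(raw, "%m/%d/%Y")  # 4 March 2021, a Thursday

# weekday() returns 0 for Monday, so values 5 and 6 are the weekend.
day_first_is_weekend = day_first.weekday() >= 5
month_first_is_weekend = month_first.weekday() >= 5
```

A column misread in this way could shift the weekend fraction or the interval pattern, so a flag on a column with an ambiguous format is worth checking against the source convention before drawing conclusions.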