Copy Artifacts
Detects traces of copy-paste from chatbot interfaces: residual Markdown, LaTeX and XML markup, conversational prefixes and closings, AI self-reference, and the near-decisive leaked output handles (turn0search3), citation-markup tokens (oaicite, contentReference) and mangled-LaTeX entities that do not occur in human writing.
Technical description
Scans for artifacts left when AI output is pasted verbatim: residual Markdown (bold, headings, bullets, links, code), conversational prefixes (Sure, Certainly, Here's) and closings (I hope this helps, feel free to), AI self-reference (As an AI, As a language model), residual LaTeX commands and inline math, XML/Claude artifacts (<thinking>, <artifact>), and numbered Markdown headings. It additionally detects three near-unfalsifiable fingerprints: structured-output handles of the form turn{n}{type}{m} (for example turn0search3) that GPT-5-class models emit for internal traceability; citation-markup leaks such as oaicite, oai_citation, contentReference and grok_card; and LaTeX commands mangled into HTML-entity-like tokens (for example &frac{). These last three carry high weight because they are absent from genuine human writing.
How it works
Layer 1 (deterministic): eleven sub-checks scan the text. Residual Markdown, LaTeX, XML, numbered headings, conversational prefixes/closings, self-reference and presentational scaffolding each add weighted points. The decisive sub-check matches structured-output handles turn{n}{type}{m}, citation-markup tokens (oaicite/oai_citation/contentReference/attributableIndex/attached_file/grok_card), and mangled-LaTeX entities (&name{, which cannot be a real HTML entity because those end in ';'); a handle or markup leak adds 3.0 points at error severity. Points are summed and capped at 5.0.
Why this matters
When AI-generated text is copied from a chatbot interface and pasted without editing, it carries formatting and scaffolding artifacts. Most are stylistic, but a few are decisive: the internal output handles (turn{n}{type}{m}) and citation-markup tokens are emitted only by the model's tooling and never produced by a human author, so their presence is near-definitive evidence that the passage was pasted verbatim from an AI. Wikipedia's guide to signs of AI writing lists exactly these markup leaks (turn0search0, contentReference, oaicite) among the strongest tells.
Score thresholds
- 0-1
- Clean formatting consistent with native authoring
- 2-3
- Some formatting inconsistencies suggesting partial copy-paste
- 4-5
- Abundant copy-paste artifacts indicating wholesale text transfer
Limitations
DOCX conversion from other formats introduces formatting artifacts. Collaborative writing tools may insert unusual whitespace. Some text editors preserve original formatting differently.
References
- Wikipedia contributors (WikiProject AI Cleanup). (2025). Wikipedia:Signs of AI writing. Wikipedia
- Petrovic D. (2025). Using GPT-5 structured output markers to detect AI-generated content online. DEJAN AI
- Krishna K, Song Y, Karpinska M, Wieting J, Iyyer M. (2023). Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. arXiv:2303.13408