ResAIKit
Research Integrity Toolkit
E3 | Text analysis | Forensic | Layer 1 (Deterministic)

Punctuation Entropy

Measures punctuation diversity and detects AI-characteristic punctuation patterns. Large language model (LLM) output is dominated by periods and commas, underuses semicolons and parentheses, and overuses mid-sentence dash parentheticals at 3.28 times the human rate.

Technical description

E3 computes Shannon entropy on the distribution of 16 punctuation characters and measures rates of specific marks per 1000 words. The core insight is directional: human writing shows richer, more varied punctuation (semicolons, parentheses, colons, varied dashes), while LLM output clusters on a small set of high-probability marks (periods, commas, and em-dashes used as parentheticals). The indicator also checks for the "paradoxical cleanliness" pattern familiar from E2: human writing almost always contains minor spacing errors around punctuation.
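As a minimal sketch of the two measurements described above, the following Python computes the entropy of a punctuation distribution and a per-1000-words rate. The names `PUNCTUATION`, `punctuation_entropy`, and `rate_per_1000_words` are hypothetical, and two details are assumptions rather than documented behaviour: entropy is taken in bits (log base 2), and words are counted by splitting on whitespace.

```python
import math
from collections import Counter

# Assumed character set: the 16 marks this entry names. The em-dash is
# written as an escape so the set is unambiguous.
PUNCTUATION = set(".,;:!?-\u2014()[]'\"/&")


def punctuation_entropy(text: str) -> float:
    """Shannon entropy of the punctuation distribution, in bits (assumed base 2)."""
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over marks of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def rate_per_1000_words(text: str, mark: str) -> float:
    """Occurrences of `mark` per 1000 whitespace-separated words (assumed tokenisation)."""
    words = len(text.split())
    return 1000.0 * text.count(mark) / words if words else 0.0
```

Under the base-2 assumption, a text that uses all 16 marks equally often reaches the maximum of 4 bits, while a text that uses only periods and commas in equal numbers scores 1 bit.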

How it works

1. Shannon entropy on punctuation distribution. Counts 16 punctuation characters (period, comma, semicolon, colon, exclamation mark, question mark, hyphen, em-dash, parentheses, brackets, straight quotes, slash, ampersand) and computes the Shannon entropy of the frequency distribution. Entropy below 1.5 on texts over 1000 words fires the sub-check (+1.5). Low entropy means the punctuation profile is dominated by two or three characters, which is typical of LLM output.

2. Semicolon rate. Counts semicolons per 1000 words. Rate below 0.5 on texts over 500 words fires (+1.0). LLMs systematically underuse semicolons in academic text, defaulting to comma splices or separate sentences instead.

3. Parenthesis absence. Zero parentheses on texts over 1500 words fires (+1.0). Human academic writing uses parentheses for citations, clarifications, and abbreviations; their complete absence is a strong LLM signal.

4. Spacing-error paradox. Zero spacing errors (space before punctuation, missing space after sentence-ending punctuation) on texts over 2000 words fires (+0.5). Human typing produces these errors regularly; LLM output is machine-perfect.

5. Mid-sentence dash parentheticals. Counts occurrences of words joined by a spaced hyphen or an em-dash ("word - word", and the same pattern with an em-dash in place of the hyphen). Rate above 3.0 per 1000 words on texts over 200 words fires (+1.5). Independent analysis (Freeburg, 2025) found that GPT-4.1 uses em-dash parentheticals at 3.28 times the human rate, and that the pattern is resistant to prompt manipulation. The human baseline is approximately 1 per 1000 words.
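The five sub-checks above can be sketched as a single scoring function. This is an illustration under stated assumptions, not the toolkit's implementation: the function name `e3_score`, the three regular expressions, whitespace word counting, and base-2 entropy are all hypothetical choices. Only the thresholds, minimum text lengths, and weights come from the list above.

```python
import math
import re
from collections import Counter

EM_DASH = "\u2014"
PUNCTUATION = set(".,;:!?-()[]'\"/&" + EM_DASH)  # the 16 marks from step 1

# Hypothetical patterns; the actual expressions are not documented here.
SPACE_BEFORE = re.compile(r"\s[,.;:!?]")              # e.g. "word ,"
MISSING_SPACE_AFTER = re.compile(r"[a-z][.!?][A-Z]")  # e.g. "end.Next"
DASH_PARENTHETICAL = re.compile(rf"\w(?: - |\s?{EM_DASH}\s?)\w")


def punctuation_entropy(text: str) -> float:
    """Shannon entropy of the punctuation distribution, in bits (assumed)."""
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def e3_score(text: str) -> float:
    """Sum the weights of every sub-check that fires."""
    words = len(text.split())
    if words == 0:
        return 0.0
    per_1000 = 1000.0 / words
    score = 0.0

    # 1. Entropy below 1.5 on texts over 1000 words (+1.5).
    if words > 1000 and punctuation_entropy(text) < 1.5:
        score += 1.5
    # 2. Fewer than 0.5 semicolons per 1000 words on texts over 500 words (+1.0).
    if words > 500 and text.count(";") * per_1000 < 0.5:
        score += 1.0
    # 3. No parentheses at all on texts over 1500 words (+1.0).
    if words > 1500 and "(" not in text and ")" not in text:
        score += 1.0
    # 4. No spacing errors on texts over 2000 words (+0.5).
    has_spacing_error = SPACE_BEFORE.search(text) or MISSING_SPACE_AFTER.search(text)
    if words > 2000 and not has_spacing_error:
        score += 0.5
    # 5. More than 3.0 dash parentheticals per 1000 words on texts over 200 words (+1.5).
    if words > 200 and len(DASH_PARENTHETICAL.findall(text)) * per_1000 > 3.0:
        score += 1.5
    return score
```

Because each sub-check has its own minimum length, short texts can only trigger the checks whose length requirement they meet. A 360-word text, for example, can only fire the dash check.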

Score thresholds

| Score | Meaning |
|---|---|
| 0 to 1 | Rich, varied punctuation profile with semicolons, parentheses, and moderate dash usage. Spacing errors present. Typical of human academic writing. |
| 2 to 3 | Moderate punctuation narrowing: low entropy, sparse semicolons, or elevated dash parentheticals. |
| 4 to 5 | Severe punctuation uniformity: low entropy, no semicolons, no parentheses, zero spacing errors, or heavy dash parenthetical overuse. Highly consistent with LLM output. |

Limitations

Punctuation patterns vary systematically by genre, discipline, and language. Legal and legislative texts naturally use low semicolon rates. Poetry and creative writing show different punctuation profiles than academic prose. The thresholds are calibrated for English academic and scientific writing and may not generalise to other registers.

The mid-sentence dash check uses a pattern that matches both standard hyphens and em-dashes. A human writer who uses spaced hyphens as parentheticals ("like this - with spaces - as shown") will trigger the sub-check. The 3.0 per 1000 words threshold is deliberately conservative to minimise false positives on writers with a dash-heavy style.
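The false-positive case can be illustrated with a short Python sketch. The pattern below is a hypothetical stand-in for the toolkit's expression, assumed to match a spaced hyphen or an em-dash between two word characters. It shows that a human-style spaced hyphen is counted while an ordinary hyphenated compound is not.

```python
import re

EM_DASH = "\u2014"
# Hypothetical pattern: word character, then " - " or an em-dash
# (with optional spaces), then another word character.
DASH_PARENTHETICAL = re.compile(rf"\w(?: - |\s?{EM_DASH}\s?)\w")


def dash_rate_per_1000(text: str) -> float:
    """Dash parentheticals per 1000 whitespace-separated words (assumed tokenisation)."""
    words = len(text.split())
    return 1000.0 * len(DASH_PARENTHETICAL.findall(text)) / words if words else 0.0


human = "The result - which surprised us - held across all three cohorts."
compound = "A well-known, peer-reviewed method."

print(len(DASH_PARENTHETICAL.findall(human)))     # 2: both spaced hyphens match
print(len(DASH_PARENTHETICAL.findall(compound)))  # 0: unspaced hyphens are ignored
print(dash_rate_per_1000(human) > 3.0)            # True
```

A single sentence like `human` is far below the 200-word minimum, so the sub-check would not fire on it alone. The sketch only shows that the match itself does not distinguish a human's spaced hyphens from LLM-style em-dashes.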

References

  1. Freeburg M. Independent analysis of em-dash frequency in GPT-4.1 output vs. human benchmarks. 2025.
  2. McGill OSS. Why Did LLMs Steal Our Em-Dashes? 2025. https://www.mcgill.ca/oss/article/critical-thinking-student-contributors-technology/why-did-llms-steal-our-em-dashes
  3. smellcheck: detect AI/LLM smells in texts. GitHub. 2025. https://github.com/fbuchinger/smellcheck