Punctuation Entropy
Measures the variety and unpredictability of punctuation usage. AI-generated text tends to use punctuation in more regular, predictable patterns than human writing.
Technical description
Computes Shannon entropy over the distribution of punctuation marks (periods, commas, semicolons, colons, dashes, parentheses, exclamation/question marks). Builds a punctuation n-gram model to measure sequence predictability. Compares the observed punctuation distribution against the expected distribution for academic text. Low entropy indicates overly regular punctuation patterns typical of AI generation.
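As a minimal sketch of the two distribution-level measurements described above, the Python below computes Shannon entropy (in bits) over extracted punctuation marks and a KL divergence against an expected distribution. The mark set, the function names, the `epsilon` smoothing, and the use of KL divergence as the comparison are all assumptions made for illustration; the description does not say how the comparison is actually performed.

```python
import math
from collections import Counter

# Assumed mark set, following the list in the description
# (en and em dashes are included via escapes); the real set may differ.
PUNCTUATION = set(".,;:-()!?\u2013\u2014")


def extract_punctuation(text):
    """Return the punctuation marks of `text` in order of appearance."""
    return [ch for ch in text if ch in PUNCTUATION]


def shannon_entropy(marks):
    """Shannon entropy in bits of the empirical distribution of `marks`."""
    counts = Counter(marks)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def kl_divergence(marks, expected_probs, epsilon=1e-9):
    """KL divergence in bits from the observed to an expected distribution.

    `expected_probs` maps each mark to its expected probability. Marks
    missing from it are given `epsilon` to avoid division by zero
    (a hypothetical smoothing choice).
    """
    counts = Counter(marks)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    divergence = 0.0
    for mark, c in counts.items():
        p = c / total
        q = max(expected_probs.get(mark, 0.0), epsilon)
        divergence += p * math.log2(p / q)
    return divergence
```

A text that uses only periods has entropy 0, while one that uses n distinct marks equally often has entropy log2(n), so the value grows with both the number of marks used and how evenly they are used.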
How it works
Layer 1 (deterministic): Extracts all punctuation marks from the text. Computes the Shannon entropy of the punctuation distribution. Builds bigram frequency tables for punctuation sequences. Compares the entropy against the expected range for human academic writing. Flags texts whose entropy falls below the human baseline threshold.
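The bigram and flagging steps above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: sequence predictability is measured here as the conditional entropy of each mark given the previous one, and `HUMAN_BASELINE_BITS` is a hypothetical placeholder, since no baseline value is given in this description.

```python
import math
from collections import Counter


def bigram_counts(marks):
    """Frequency table of adjacent punctuation pairs (previous, next)."""
    return Counter(zip(marks, marks[1:]))


def conditional_entropy(marks):
    """H(next | previous) in bits; 0 means each mark fully predicts the next."""
    pairs = bigram_counts(marks)
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    # How often each mark appears as the first element of a pair.
    prev_counts = Counter()
    for (prev, _), n in pairs.items():
        prev_counts[prev] += n
    return sum(
        (n / total) * math.log2(prev_counts[prev] / n)
        for (prev, _), n in pairs.items()
    )


# Hypothetical placeholder; a real baseline would be estimated from a
# corpus of human academic writing.
HUMAN_BASELINE_BITS = 1.5


def is_flagged(entropy_bits, baseline=HUMAN_BASELINE_BITS):
    """Flag a text whose entropy falls below the human baseline."""
    return entropy_bits < baseline
```

A strictly alternating sequence such as comma, period, comma, period has a conditional entropy of 0 even though its unigram entropy is 1 bit, which is why the bigram table adds information beyond the plain distribution.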
Why this matters
Human writers use punctuation with natural variation: some sentences chain multiple clauses with semicolons, others are short and punchy. AI models tend to produce more uniform punctuation patterns, favoring commas and periods while underusing semicolons, colons, dashes, and parentheses. This regularity is measurable through information-theoretic metrics.
Score thresholds
- 0-1: Rich, varied punctuation usage matching human patterns
- 2-3: Somewhat regular punctuation with reduced variety
- 4-5: Highly predictable punctuation patterns with minimal variety
Limitations
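Very short texts yield unreliable entropy measurements because they contain too few punctuation marks to estimate the distribution. Some academic styles (especially in STEM fields) naturally use simpler punctuation, which lowers entropy without any AI involvement. Technical writing with many equations may have unusual punctuation distributions.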