Voice Analysis
Uses a language model to read authorial voice. It scores three failures: a flat, uniform voice; abrupt shifts between human and machine writing; and a weak authorial stance. It then combines them into a weighted score. It reads for the person behind the prose.
Technical description
L4-Voice is the semantic, model-based counterpart to the deterministic voice-variation check, which measures stylistic variance by the numbers. A document can fail on voice in opposite ways: with a flat, average machine voice that shows little human idiosyncrasy, or with abrupt seams where human and machine writing meet. It can also simply lack a real authorial stance. The rubric scores all three. The indicator runs at Layer 4 when a model is configured and the text is at least 100 words. It sends the first 8000 characters of the document together with a three-dimension rubric, and aggregates the dimension scores into a weighted 0 to 5 score, retaining the per-dimension breakdown.
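The run conditions above can be sketched as a small gate. This is a minimal illustration, not the indicator's actual code: the function name `prepare_text` and the whitespace-based word count are assumptions, while the 100-word minimum and the 8000-character cap come from the description.

```python
from typing import Optional

MIN_WORDS = 100   # from the description: the text must be at least 100 words
MAX_CHARS = 8000  # from the description: only the first 8000 characters are sent


def prepare_text(text: str, model_configured: bool) -> Optional[str]:
    """Return the excerpt to send to the model, or None when the check is skipped.

    Hypothetical helper: the name and the word-count method are assumptions.
    """
    if not model_configured:
        return None
    # Assumption: words are counted by splitting on whitespace.
    if len(text.split()) < MIN_WORDS:
        return None
    # Truncate by characters, so a long document is judged on its opening.
    return text[:MAX_CHARS]
```

Under these assumptions, a 99-word text is skipped and a 5000-word text is cut to its first 8000 characters before it reaches the model.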
How it works
The model is given the text with a rubric of three dimensions scored 0 to 5 independently.
Flat voice asks whether the voice is unnaturally even and average throughout, lacking the idiosyncratic rhythm, syntactic variety, and texture that a distinctive author shows. That evenness is the characteristic trace of machine writing.
Voice shifts asks whether there are abrupt changes in formality, confidence, or expertise between passages, or signs of mixed human and machine authorship and machine polishing. This is judged as a matter of degree rather than as hard boundaries, since layered editing blurs where one author ends and another begins.
Authorial stance asks whether a real authorial presence comes through, with a position taken and the reader engaged, or only a generalized, impersonal surface with little of the hedging, emphasis, and engagement that mark a human voice.
Abstention. Because voice is partly a matter of taste, the prompt instructs the model to flag a passage only when reasonably confident, and any finding it marks low-confidence is dropped at the code level.
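The code-level half of abstention might look like the filter below. It is a sketch under stated assumptions: findings are assumed to be dictionaries with a `confidence` field whose low value is the string `"low"`, and a finding without that field is assumed to be kept. None of these names are confirmed by the description.

```python
from typing import Any, Dict, List


def drop_low_confidence(findings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only findings the model did not mark as low-confidence.

    Hypothetical schema: each finding is a dict with an optional
    "confidence" key; the value "low" (any casing) means it is dropped.
    """
    kept = []
    for finding in findings:
        confidence = str(finding.get("confidence", "")).strip().lower()
        if confidence != "low":
            kept.append(finding)
    return kept
```

Filtering in code rather than relying on the prompt alone means a low-confidence finding is removed even when the model reports it anyway.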
Aggregation. The dimension scores sᵢ are combined by a weighted mean over the dimensions the model actually returned, with weights wᵢ of 0.35 for uniformity (flat voice), 0.30 for shifts, and 0.35 for stance: score = Σ wᵢ sᵢ / Σ wᵢ, clamped to the range 0 to 5. Dividing by the sum of the weights present keeps the result on the 0 to 5 scale when a dimension is missing. When the model returns a single overall score instead of the rubric, that value is used directly. Each surviving finding is labelled with its dimension (Flat voice, Voice shift, Weak authorial stance), and the per-dimension sub-scores and a summary are returned in the metadata.
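The weighted mean can be written out as follows. The weights and the clamp come from the description; the function name `aggregate`, the dictionary keys, and the behaviour when nothing is returned (yielding `None`) are assumptions made for illustration.

```python
from typing import Dict, Optional

# Weights as stated in the description.
WEIGHTS = {"uniformity": 0.35, "shifts": 0.30, "stance": 0.35}


def aggregate(dim_scores: Dict[str, Optional[float]],
              overall: Optional[float] = None) -> Optional[float]:
    """Weighted mean over the dimensions returned, clamped to 0..5.

    Hypothetical helper: key names and the None result for empty input
    are assumptions, not the indicator's confirmed interface.
    """
    present = {name: score for name, score in dim_scores.items()
               if name in WEIGHTS and score is not None}
    if not present:
        # No rubric scores: a single overall score is used directly.
        return overall
    total_weight = sum(WEIGHTS[name] for name in present)
    weighted_sum = sum(WEIGHTS[name] * score for name, score in present.items())
    return max(0.0, min(5.0, weighted_sum / total_weight))
```

For example, scores of 4 for uniformity, 2 for shifts, and 3 for stance give (0.35·4 + 0.30·2 + 0.35·3) / 1.00, which is about 3.05. If only uniformity 4 and stance 2 are returned, the denominator shrinks to 0.70 and the result is about 3.0.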
Score thresholds
| Score | Meaning |
|---|---|
| 0 to 1 | A consistent, distinctive authorial voice runs throughout. |
| 2 to 3 | A somewhat flat voice, a passage that shifts, or a weak authorial presence. |
| 4 to 5 | A uniformly average voice, clear seams between styles, or no real authorial stance at all. |
Why this matters
Voice is where machine writing is at once most fluent and least itself. Stylometric measures find that model outputs cluster tightly, distinct from the broader spread of human authors, because the models default to a standardized, average profile rather than an individual one. The personal devices that build a voice in academic prose, the hedges, the emphasis, the moments where an author takes a position and addresses a reader, are exactly what machine text underuses, leaving a smooth but impersonal surface. The opposite failure appears in collaborative writing: when a human draft is polished by a model or a model draft continued by a human, the voice can shift mid-document, and recent work treats the extent of that editing as a continuous quantity rather than a clean boundary, because layered revision blurs the seam. L4-Voice reads for both, and for the absence of a voice in between, weighing each as a signal rather than a verdict.
Limitations
L4-Voice depends on a language model, so it is slower and costlier than the deterministic check, and its judgement carries the model's biases. Voice is partly a matter of taste, and how much stylistic variation is natural depends on the genre and field, so the indicator weighs its dimensions as signals and abstains rather than over-flagging. Even so, it can read an idiosyncratic human voice as inconsistent or a competent uniform one as machine-made. Its findings are assessments rather than proofs. Results vary between runs and between models, and the text is truncated to 8000 characters, so a long document is judged on its opening. It judges the texture of the writing, not whether its claims are true or its sources real; other indicators handle those questions.
Theoretical background
L4-Voice draws on three strands. The uniformity dimension follows the stylometric homogenization literature, which finds that model outputs converge to a narrow, average profile measurable through tight clustering and reduced diversity, with the caveat that the degree of homogenization is task-dependent. The shifts dimension follows the work on mixed human-machine authorship, which has moved from hard boundary detection toward estimating the continuous extent of machine editing, since layered revision makes sentence-level boundaries ill-posed; L4-Voice mirrors that by judging shifts as a matter of degree. The stance dimension rests on the stance-and-engagement framework, in which authorial presence is built from hedges, boosters, self-mention and reader engagement, the very devices machine writing underuses. Treating these as a rubric judged by a model, with abstention, follows the LLM-as-judge findings on subjective evaluation.
References
- Thai K, Emi B, Masrour E, Iyyer M. EditLens: quantifying the extent of AI editing in text. arXiv preprint arXiv:2510.03154. 2025. https://arxiv.org/abs/2510.03154
- Jain S, Lanchantin J, Nickel M, Ross C, Ullrich K, Wilson A, Watson-Daniels J. Task-dependent evaluation of LLM output homogenization: a taxonomy-guided framework. arXiv preprint arXiv:2509.21267. 2025. https://arxiv.org/abs/2509.21267
- Basani AR, Chen PY. Diversity boosts AI-generated text detection. arXiv preprint arXiv:2509.18880. 2025. https://arxiv.org/abs/2509.18880
- Alsadhan NA. Decoding AI authorship: can LLMs truly mimic human style across literature and politics? arXiv preprint arXiv:2603.23219. 2026. https://arxiv.org/abs/2603.23219
- Hyland K. Stance and engagement: a model of interaction in academic discourse. Discourse Studies. 2005;7(2):173-192. DOI: 10.1177/1461445605050365