Claude Fingerprint
Detects vocabulary and phrasing patterns specifically associated with Anthropic's Claude models, such as nuanced hedging, 'I'd be happy to' constructions, and distinctive politeness markers.
Technical description
Matches text against a curated lexicon of Claude-characteristic patterns: politeness markers ('I'd be happy to', 'That's a great question'), balanced-perspective phrases ('On one hand... on the other'), characteristic hedges ('I should note', 'It's worth considering'), and self-referential transparency ('As an AI'). Also detects the paragraph structure Claude tends to use when presenting balanced arguments. The lexicon covers conversational, cautious-hedging, and balanced meta-discourse registers in English and Romanian (EN + RO). A structural sub-check flags em-dash overuse density, the documented 'Claude em-dash problem', as a model-distinguishing signal that complements E1's general em-dash flag.
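The em-dash density sub-check could be sketched as below. This is a minimal illustration only: the function names, the per-100-words normalization, and the threshold of 1.0 are assumptions, not the indicator's calibrated values.

```python
import re

# Written as an escape so the source file stays ASCII.
EM_DASH = "\u2014"


def em_dash_density(text: str) -> float:
    """Return the number of em-dashes per 100 words (0.0 for empty text)."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return 100.0 * text.count(EM_DASH) / len(words)


def flags_em_dash_overuse(text: str, threshold: float = 1.0) -> bool:
    """Flag text whose em-dash density exceeds the cut-off.

    The default threshold is a hypothetical placeholder, not a calibrated value.
    """
    return em_dash_density(text) > threshold
```

Normalizing by word count keeps the measure comparable across documents of different lengths; a real implementation would also need to decide whether typographic substitutes such as a double hyphen count.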
How it works
Layer 1 (deterministic): Matches text against a Claude-specific vocabulary and phrase dictionary, detects balanced-argument paragraph structures, identifies hedging patterns characteristic of Claude, and computes a weighted fingerprint score from the combined signal types.
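A minimal sketch of the lexicon match and weighted score follows. The phrases are taken from the technical description; the category names, the weights, the per-category cap, and the 0-5 ceiling are illustrative assumptions rather than the indicator's actual configuration.

```python
import re

# Hypothetical lexicon: category -> (phrases, weight). Weights are assumed.
LEXICON = {
    "politeness": (["i'd be happy to", "that's a great question"], 1.0),
    "balance": (["on one hand", "on the other"], 1.0),
    "hedge": (["i should note", "it's worth considering"], 1.5),
    "self_reference": (["as an ai"], 2.0),
}
MAX_SCORE = 5.0      # assumed ceiling, matching the 0-5 score bands
CAP_PER_CATEGORY = 2  # assumed cap so one repeated phrase cannot dominate


def normalize(text: str) -> str:
    """Lowercase the text and convert curly apostrophes to straight ones."""
    return text.lower().replace("\u2019", "'")


def count_hits(text: str) -> dict:
    """Count whole-phrase matches per lexicon category."""
    norm = normalize(text)
    hits = {}
    for category, (phrases, _weight) in LEXICON.items():
        hits[category] = sum(
            len(re.findall(r"\b" + re.escape(phrase) + r"\b", norm))
            for phrase in phrases
        )
    return hits


def fingerprint_score(text: str) -> float:
    """Combine capped per-category hit counts into a weighted score."""
    hits = count_hits(text)
    raw = sum(
        LEXICON[category][1] * min(count, CAP_PER_CATEGORY)
        for category, count in hits.items()
    )
    return min(MAX_SCORE, raw)
```

The structural signals (balanced-argument paragraphs, em-dash density) are omitted here; in this sketch they would enter the sum as further weighted categories.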
Why this matters
Claude has a distinctive voice characterized by careful hedging, balanced perspectives, and explicit acknowledgment of uncertainty. These patterns differ measurably from those of other models and can be used to fingerprint text generated by Claude specifically. Identifying the specific model used is important for tracing the provenance of AI-generated academic text.
Score thresholds
- 0-1: No Claude-specific patterns detected
- 2-3: Some Claude-associated phrasing present
- 4-5: Strong Claude vocabulary fingerprint throughout
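The score bands above can be expressed as a small lookup. This is a sketch: the function name is hypothetical, and the handling of fractional scores (cut-offs at 2 and 4) is an assumption, since the bands are stated only for whole numbers.

```python
def score_band(score: float) -> str:
    """Map a 0-5 fingerprint score to its descriptive band."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if score < 2:  # band 0-1 (fractional cut-off is an assumption)
        return "No Claude-specific patterns detected"
    if score < 4:  # band 2-3
        return "Some Claude-associated phrasing present"
    return "Strong Claude vocabulary fingerprint throughout"  # band 4-5
```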
Limitations
Claude's style may overlap with naturally cautious academic writing. Model updates change Claude's vocabulary patterns over time. The politeness markers may appear in other contexts (customer service, educational writing). Calibration finding (AAVR controlled triad of source-confirmed samples, 2026): on a finished, cleaned document, vendor attribution from prose is unreliable. ChatGPT, Claude and Gemini produced superimposable stylistic profiles on the same topic and prompt; the shared LLM signature (negative parallelism, rule of three, systematic hedging, rigid structure) fires across all three, and the classic lexical cliches are cross-vendor and now down-weighted for vendor discrimination. This indicator should be read as 'patterns associated with this vendor', and on text without category-F technical artifacts the correct report is 'LLM, vendor uncertain'. The signals that actually separate vendors are leaked output handles and bibliography integrity, not style.
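The reporting rule from the calibration finding might be applied as in the sketch below. It assumes the text has already been judged LLM-generated by other indicators; the function name, the `artifact_vendor` parameter, the output wording for the artifact case, and the score cut-off of 4 are all hypothetical.

```python
from typing import Optional


def vendor_report(artifact_vendor: Optional[str], style_score: float) -> str:
    """Build a vendor line for text already judged LLM-generated.

    artifact_vendor: vendor named by category-F technical artifacts,
    or None when no such artifacts were found (hypothetical input).
    """
    if artifact_vendor is None:
        # Style alone does not separate vendors on a finished, cleaned document.
        return "LLM, vendor uncertain"
    note = ""
    if artifact_vendor == "Claude" and style_score >= 4:
        # The style score only corroborates; it never decides the vendor.
        note = " (style patterns associated with this vendor)"
    return f"LLM, vendor indicated by artifacts: {artifact_vendor}{note}"
```

The design choice here is that the style score never changes the vendor on its own: without artifacts the report stays 'LLM, vendor uncertain' regardless of how high the fingerprint score is.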
References
- Bisztray T, Cherif B, Dubniczky RA, et al. (2025). I know which LLM wrote your code last summer: LLM-generated code stylometry for authorship attribution. arXiv:2506.17323
- Bitton Y, Bitton E, Nisan S. (2025). Detecting stylistic fingerprints of large language models. arXiv:2503.01659
- Tercon L, et al. (2025). Linguistic characteristics of AI-generated text: a survey. arXiv:2510.05136