Gemini Fingerprint
Detects vocabulary and phrasing patterns specifically associated with Google's Gemini models, such as distinctive information structuring and Google-style formatting preferences.
Technical description
Matches text against a curated lexicon of Gemini-characteristic patterns: information-dense sentence structures, Google-style formatting (bold headers, bullet points in prose), characteristic knowledge synthesis patterns, and vocabulary preferences distinct from other models. It also detects Gemini's tendency toward encyclopedic comprehensiveness and its distinctive transition phrases. The lexicon covers both conversational and synthesis/structuring registers (EN + RO). Structural checks flag Gemini's report-style formatting, including heavy bold-header density, and a sub-check flags a high density of em-dashes, a punctuation habit shared with Claude.
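The structural checks described above can be illustrated with a minimal sketch. The function name, the regular expression used to recognise a bold header, and the per-100-words normalisation are assumptions made for illustration; the indicator's actual rules and cut-offs are not given in this description.

```python
import re

# Assumption: a "bold header" is a non-empty line made up only of **bold**
# text, optionally preceded by a list marker and followed by a colon.
BOLD_HEADER = re.compile(r"^\s*(?:[-*]\s+)?\*\*[^*]+\*\*:?\s*$")

EM_DASH = "\u2014"  # the em-dash character, written as an escape


def structural_densities(text: str) -> dict:
    """Return two illustrative density measures for a piece of text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    words = text.split()
    n_bold = sum(1 for ln in lines if BOLD_HEADER.match(ln))
    n_dash = text.count(EM_DASH)
    return {
        # share of non-empty lines that are bold headers
        "bold_header_density": n_bold / len(lines) if lines else 0.0,
        # em-dashes per 100 whitespace-separated tokens
        "em_dash_per_100_words": 100.0 * n_dash / len(words) if words else 0.0,
    }
```

Both values are plain ratios, so any threshold that turns them into a flag would have to be calibrated separately. Note that bold markers only survive if the text still carries its Markdown formatting.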
How it works
Layer 1 (deterministic): Matches text against a Gemini-specific vocabulary and phrase dictionary, detects characteristic formatting and information-structuring patterns, identifies knowledge synthesis patterns associated with Gemini, and computes a weighted fingerprint score from these signal types.
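A weighted score of this kind might be computed roughly as follows. This is a sketch under stated assumptions: the phrases, their weights, the cap on repeated hits, and the `structural_signal` parameter are all hypothetical placeholders, not entries from the real lexicon or the real scoring formula.

```python
import re

# Hypothetical lexicon: phrase -> weight. Illustrative entries only;
# the curated lexicon used by the indicator is not reproduced here.
LEXICON = {
    "it's important to note": 1.0,
    "here's a breakdown": 1.5,
    "in essence": 0.5,
}

MAX_HITS_PER_PHRASE = 2  # assumption: cap repeats so one phrase cannot dominate


def fingerprint_score(text: str, structural_signal: float = 0.0,
                      max_score: int = 5) -> int:
    """Combine weighted lexical hits with a structural signal into a 0-5 score."""
    lowered = text.lower()
    lexical = 0.0
    for phrase, weight in LEXICON.items():
        hits = len(re.findall(re.escape(phrase), lowered))
        lexical += weight * min(hits, MAX_HITS_PER_PHRASE)
    raw = lexical + structural_signal
    # Truncate toward zero, then clamp to the documented 0-5 range.
    return min(max_score, int(raw))
```

The function is deterministic: the same text and the same lexicon always yield the same score, which matches the Layer 1 description. Capping repeated hits is a design choice of this sketch, intended to keep a single recurring phrase from saturating the score.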
Why this matters
Gemini tends to produce information-dense, comprehensive responses with particular formatting preferences, a style often attributed to its training data. Identifying Gemini-associated patterns can help trace the provenance of AI-generated text to a specific model family, subject to the limitations listed below.
Score thresholds
- 0-1: No Gemini-specific patterns detected
- 2-3: Some Gemini-associated phrasing present
- 4-5: Strong Gemini vocabulary fingerprint throughout
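As a small illustration, the bands above can be mapped to their labels with a lookup like the one below. The function name `score_band` is hypothetical; the ranges and label texts are taken from the list above.

```python
def score_band(score: int) -> str:
    """Map a 0-5 fingerprint score to its documented interpretation."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if score <= 1:
        return "No Gemini-specific patterns detected"
    if score <= 3:
        return "Some Gemini-associated phrasing present"
    return "Strong Gemini vocabulary fingerprint throughout"
```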
Limitations
Gemini's style may overlap with well-researched encyclopedic writing. The model's formatting preferences may be stripped during copy-paste. Gemini is relatively new, so its distinctive patterns are less well documented than ChatGPT's.
Calibration finding (AAVR controlled triad of source-confirmed samples, 2026): on a finished, cleaned document, vendor attribution from prose is unreliable. ChatGPT, Claude and Gemini produced superimposable stylistic profiles on the same topic and prompt. The shared LLM signature (negative parallelism, rule of three, systematic hedging, rigid structure) fires across all three, and the classic lexical cliches are cross-vendor and are now down-weighted for vendor discrimination.
This indicator should be read as 'patterns associated with this vendor'. On text without category-F technical artifacts, the correct report is 'LLM, vendor uncertain'. The signals that actually separate vendors are leaked output handles and bibliography integrity, not style.
References
- Bisztray T, Cherif B, Dubniczky RA, et al. (2025). I know which LLM wrote your code last summer: LLM-generated code stylometry for authorship attribution. arXiv:2506.17323
- Bitton Y, Bitton E, Nisan S. (2025). Detecting stylistic fingerprints of large language models. arXiv:2503.01659
- Tercon L, et al. (2025). Linguistic characteristics of AI-generated text: a survey. arXiv:2510.05136