Locale-Specific Signals
Detects locale-specific language artifacts in non-English text, including encoding anomalies, English calques, and unnatural connector usage. Supports 11 languages: Romanian, German, French, Spanish, Portuguese, Italian, Turkish, Chinese, Japanese, Korean, and Russian.
Technical description
Applies three language-specific checks via a strategy pattern: (1) encoding/diacritic anomalies (e.g., comma-below vs cedilla mixing in Romanian, umlaut substitution in German, simplified/traditional mixing in Chinese), (2) English calques (literal translations that sound unnatural in the target language, matched against per-language dictionaries), and (3) the ratio of translated to idiomatic connectors (a high share of literal, English-style connectors relative to natural ones suggests AI translation).
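As an illustration of the first check, the following minimal sketch flags Romanian text that mixes the legacy cedilla letters (U+015E, U+015F, U+0162, U+0163) with the standard comma-below letters (U+0218 to U+021B). The function name and the simple "both forms present" rule are assumptions made for this example, not the tool's actual implementation.

```python
import re

# Legacy s/t with cedilla vs the standard Romanian s/t with comma below.
CEDILLA = re.compile("[\u015E\u015F\u0162\u0163]")
COMMA_BELOW = re.compile("[\u0218\u0219\u021A\u021B]")


def has_mixed_romanian_diacritics(text: str) -> bool:
    """Hypothetical helper: True when both diacritic families occur in the text."""
    return bool(CEDILLA.search(text)) and bool(COMMA_BELOW.search(text))
```

A document that uses only one family consistently is not flagged; only the mixture counts as an anomaly in this sketch.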
How it works
Layer 1 (deterministic): Reads the document language from the analysis context, then delegates to a per-language strategy that checks encoding anomalies (regex-based), calques (dictionary matching), and connector ratios (count-based). Each check contributes to a cumulative score capped at 5.0.
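The flow described above could be sketched roughly as follows. Everything here is hypothetical: the class and function names, the per-check point values, and the tiny German word lists are placeholders chosen for illustration. Only the three check types and the 5.0 cap come from the description.

```python
import re
from dataclasses import dataclass, field

MAX_SCORE = 5.0  # cap stated in the description


@dataclass
class LocaleStrategy:
    """Hypothetical per-language strategy; every field is illustrative."""
    encoding_patterns: list = field(default_factory=list)  # compiled regexes
    calques: list = field(default_factory=list)
    translated_connectors: list = field(default_factory=list)
    idiomatic_connectors: list = field(default_factory=list)

    def score(self, text: str) -> float:
        lowered = text.lower()
        total = 0.0
        # Check 1: regex-based encoding anomalies (assumed: 1 point per matching pattern).
        total += sum(1.0 for p in self.encoding_patterns if p.search(text))
        # Check 2: dictionary-matched calques (assumed: 1 point per phrase found).
        total += sum(1.0 for c in self.calques if c in lowered)
        # Check 3: count-based connector comparison (assumed: 2 points when
        # translated-style connectors outnumber idiomatic ones).
        translated = sum(lowered.count(c) for c in self.translated_connectors)
        idiomatic = sum(lowered.count(c) for c in self.idiomatic_connectors)
        if translated > idiomatic:
            total += 2.0
        return min(total, MAX_SCORE)


# Placeholder registry; real dictionaries would be far larger.
STRATEGIES = {
    "de": LocaleStrategy(
        calques=["macht sinn"],  # often cited as a calque of "makes sense"
        translated_connectors=["zusätzlich"],  # placeholder entry
        idiomatic_connectors=["außerdem"],  # placeholder entry
    ),
}


def analyze(text: str, language: str) -> float:
    """Look up the strategy for the document language and return the capped score."""
    strategy = STRATEGIES.get(language)
    if strategy is None:  # English or unsupported language: the signal stays inactive
        return 0.0
    return strategy.score(text)
```

Keeping each language's patterns and dictionaries in its own strategy object means a new language can be added by registering one more entry, without touching the scoring flow.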
Why this matters
When AI generates text in a non-English language, it often produces encoding inconsistencies, translates English idioms literally, and overuses connectors that are formally correct but unnatural. These locale-specific signals help detect AI-generated text across diverse language contexts in academic publishing.
Score thresholds
- 0-1: No locale-specific anomalies detected
- 2-3: Minor encoding or translation artifacts present
- 4-5: Strong signals of AI generation in the target language
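A small sketch of how a score might be mapped onto these bands. The listed ranges do not say where fractional scores such as 1.5 or 3.5 fall, so the cut-offs at 2.0 and 4.0 below are an assumption, as is the function name.

```python
def describe_score(score: float) -> str:
    """Hypothetical mapping from a capped score to the documented bands."""
    if not 0.0 <= score <= 5.0:
        raise ValueError("score must be between 0.0 and 5.0")
    if score < 2.0:  # assumed boundary between the 0-1 and 2-3 bands
        return "No locale-specific anomalies detected"
    if score < 4.0:  # assumed boundary between the 2-3 and 4-5 bands
        return "Minor encoding or translation artifacts present"
    return "Strong signals of AI generation in the target language"
```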
Limitations
Active only for non-English languages. Results depend on the correct language being selected. Calque and connector dictionaries cover common patterns but may not detect novel AI artifacts. CJK word counting is approximate.