A Translated Line Had Quietly Reverted to English — Guarding String Resources an Agent's Refactor Touched
Let an agent tidy your values folder and translated strings can silently revert to the source text. Here is a design and implementation that treats the default locale as the source of truth, reads every other locale as a diff, and blocks only dropped keys, reverted translations, and broken format arguments at pre-commit.
Last week, during a pre-release check, I stopped short. The Chinese settings screen showed the English word "Notifications" sitting plainly in the list. That was a line I had translated half a year earlier, down to fixing the layout that wrapped awkwardly.
Just before, I had handed the values folder to an agent to tidy up — merge duplicate keys, drop unused ones. Somewhere in that pass, a handful of translated values had reverted to the default-language source text. The diff spanned hundreds of lines, far past what the eye can follow.
When you ship a wallpaper app in many languages as an indie developer, string resources swell to several hundred keys before you notice. I have had a translated line quietly break and stay broken until a store review pointed it out. So this is the kind of accident I would rather stop with a machine's eye than a human's.
Ignore reordering, catch only the change in value
When an agent tidies XML, it reorders keys and re-indents. That on its own does no harm. If you try to guard with a text diff, those harmless reformats fire as violations too, and the gate stops being read within a day.
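What I want to stop is the three events where meaning, not appearance, changes: a key disappearing, a translated value reverting to the source, and the format arguments drifting from the default in count or type. The last one is not a cosmetic problem: when a translation asks for an argument the call site does not supply, or for one of the wrong type, String.format throws an IllegalFormatException (MissingFormatArgumentException or IllegalFormatConversionException) and takes the whole screen down with it.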
| Event | What happens at runtime | Gate verdict |
| --- | --- | --- |
| Keys reordered / reformatted | Nothing changes | Allowed (ignored) |
| Key missing from a locale | Falls back to the default; that one line mixes languages | Blocked |
| Translation reverts to source | English shows up on a supposedly translated screen | Blocked |
| Format-argument count mismatch | That screen crashes | Blocked |
Put the comparison on a key-to-value dictionary rather than a text diff, and reordering is treated as identical automatically, leaving only changes in value. That is the heart of the design.
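To make the point concrete, here is a minimal sketch of why a dictionary comparison ignores reordering and re-indentation. The helper names to_dict and changed_keys and the sample strings are illustrative assumptions, not part of the gate itself.

```python
import xml.etree.ElementTree as ET


def to_dict(xml_text: str) -> dict:
    """Map each <string name="..."> to its text; element order is discarded."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): "".join(s.itertext()) for s in root.findall("string")}


def changed_keys(old: dict, new: dict) -> list:
    """Keys present on both sides whose value differs."""
    return sorted(k for k in old.keys() & new.keys() if old[k] != new[k])


before = """<resources>
    <string name="title">设置</string>
    <string name="notifications">通知</string>
</resources>"""

# Same content, keys swapped and indentation changed
reordered_only = """<resources>
  <string name="notifications">通知</string>
  <string name="title">设置</string>
</resources>"""

# Keys swapped AND one value replaced by the English source
after = """<resources>
  <string name="notifications">Notifications</string>
  <string name="title">设置</string>
</resources>"""

print(to_dict(before) == to_dict(reordered_only))     # True
print(changed_keys(to_dict(before), to_dict(after)))  # ['notifications']
```

A line-based diff would report every line of both rewritten files as changed; the dictionary view reports nothing for the first and one key for the second.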
Treat the default locale as the source, each locale as a diff
I treat res/values/strings.xml as the source of truth and read each locale directory (res/values-en/, res/values-zh-rCN/, and so on) as a diff against it. Each plurals item expands into a name#quantity key, and strings marked translatable="false" are left out.
```python
#!/usr/bin/env python3
"""A pre-commit gate that diffs string resources under res/values against
HEAD to detect translation drift introduced by an agent's edits."""
import re
import subprocess
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Keys allowed to equal the source value (brand names, symbolic labels)
ALLOW_EQUAL = {"app_name", "ok_label", "brand_tagline"}

# Normalize %1$s / %d / %@ / {count} into an order-free multiset.
# The leading %% alternative consumes escaped percent signs so they are
# never misread as the start of a format argument.
ARG_RE = re.compile(r"%%|%(?:\d+\$)?[-#+ 0,(]*\d*(?:\.\d+)?([@a-zA-Z])|\{(\w+)\}")


def format_args(value: str):
    args = []
    for m in ARG_RE.finditer(value):
        if m.group(0) == "%%":  # literal percent sign, not an argument
            continue
        if m.group(2) is not None:  # ICU form {name}
            args.append("{}")
        else:  # printf form %1$s / %d / %@
            args.append("%" + m.group(1))
    return sorted(args)


def parse_strings(xml_text: str):
    """Turn strings.xml into a name->value dict; plurals as name#quantity."""
    out = {}
    if not xml_text.strip():
        return out
    root = ET.fromstring(xml_text)
    for s in root.findall("string"):
        name = s.get("name")
        if name and s.get("translatable") != "false":
            out[name] = "".join(s.itertext())
    for p in root.findall("plurals"):
        name = p.get("name")
        for item in p.findall("item"):
            out[f"{name}#{item.get('quantity')}"] = "".join(item.itertext())
    return out


def at_head(rel_path: str):
    """Fetch the same file at HEAD (empty dict for a new file)."""
    # rel_path must be relative to the repository root and use forward slashes
    r = subprocess.run(["git", "show", f"HEAD:{rel_path}"],
                       capture_output=True, text=True, encoding="utf-8")
    return parse_strings(r.stdout) if r.returncode == 0 else {}
```
Using itertext() matters: when a <string> wraps decoration tags like <b>, it still gathers the display text without dropping anything. If the agent reshapes the tag structure while reformatting, the joined string can still be compared as equal.
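A small sketch of the difference, using only the standard library. The sample strings and the display_text helper are made up for illustration.

```python
import xml.etree.ElementTree as ET


def display_text(xml_text: str) -> str:
    """Join every text fragment inside the element, including nested tags."""
    return "".join(ET.fromstring(xml_text).itertext())


bold = '<string name="hint">Tap <b>Save</b> to apply</string>'
italic = '<string name="hint">Tap <i>Save</i> to apply</string>'

# .text stops at the first child tag, so part of the value is lost
print(repr(ET.fromstring(bold).text))               # 'Tap '

# itertext() keeps walking through <b> and the tail after it
print(display_text(bold))                           # Tap Save to apply

# A change of decoration tag alone does not register as a value change
print(display_text(bold) == display_text(italic))   # True
```

The comparison is still exact on the text itself, so a value whose internal whitespace was rewrapped will count as changed under this approach.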
Block only three things — removed, reverted, format
Every extra rule you add raises false positives and hollows the gate out. I keep it to the three I can tell apart from a legitimate human edit with confidence.
The reverted check is the keystone. It flags a key only when the current value equals the source and the value at HEAD differed from the source. Keys that always matched the source — short symbols, brand names — fall out, and only a line that used to be translated and has now snapped back to the source is caught.
```python
def check_locale(default_now, base_loc, cur_loc, locale):
    v = []
    # 1. removed: a key present at HEAD is gone from this locale now
    for key in base_loc.keys() - cur_loc.keys():
        v.append((locale, key, "removed", "a key present at HEAD is gone"))
    # 2. reverted: a translated value has snapped back to the source text
    for key in cur_loc.keys() & base_loc.keys():
        src = default_now.get(key)
        if src is None or key in ALLOW_EQUAL:
            continue
        if cur_loc[key] == src and base_loc[key] != src:
            v.append((locale, key, "reverted", "value reverted to the source"))
    # 3. format: argument count/type no longer matches the default (crash source)
    for key, val in cur_loc.items():
        src = default_now.get(key)
        if src and format_args(val) != format_args(src):
            v.append((locale, key, "format",
                      f"format args differ {format_args(src)} != {format_args(val)}"))
    return v
```
Format arguments are compared as a multiset, not by order. Android lets you position arguments with %1$s, but what I want to guard here is the crash from a mismatched count or type. Swapping positions is often the correct translation, so only the multiset of conversion types is checked and the positions are ignored.
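As a rough illustration of the multiset comparison, the sketch below uses a condensed, printf-only stand-in for the article's format_args. The name arg_multiset, the use of Counter, and the sample strings are assumptions made for this example.

```python
import re
from collections import Counter

# printf-style specifiers only; %% is matched first so it is never counted
PRINTF_RE = re.compile(r"%%|%(?:\d+\$)?[-#+ 0,(]*\d*(?:\.\d+)?([@a-zA-Z])")


def arg_multiset(value: str) -> Counter:
    """Count conversion letters, ignoring their positions and indexes."""
    return Counter(m.group(1) for m in PRINTF_RE.finditer(value) if m.group(1))


source = "%1$s has %2$d new wallpapers"
reordered = "新增 %2$d 张壁纸，来自 %1$s"   # arguments swapped: a valid translation
dropped = "%1$s 有新壁纸"                    # the %d argument is gone
retyped = "%1$s 有 %2$s 张新壁纸"            # %d became %s

print(arg_multiset(reordered) == arg_multiset(source))  # True
print(arg_multiset(dropped) == arg_multiset(source))    # False
print(arg_multiset(retyped) == arg_multiset(source))    # False
```

Because positions are discarded, a translation that swaps the types between positions would still pass; this check targets count and type drift only.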
| Violation | Condition | Why block it |
| --- | --- | --- |
| removed | A HEAD key is absent in the current locale | Prevents mixed languages |
| reverted | current value = source AND HEAD value != source | Prevents silent un-translation |
| format | Format-argument multiset differs from the default | Prevents a runtime crash |
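The reverted condition is the least obvious of the three, so here is a tiny worked example of it in isolation. The function is_reverted and the sample values are hypothetical; the rule itself is the one stated in the table.

```python
def is_reverted(source: str, at_head: str, current: str) -> bool:
    """True only when a value that differed from the source at HEAD now equals it."""
    return current == source and at_head != source


# (source text, locale value at HEAD, locale value now)
cases = {
    "was translated, now English": ("Notifications", "通知", "Notifications"),
    "always matched the source": ("OK", "OK", "OK"),
    "translation reworded": ("Notifications", "通知", "消息通知"),
}

for label, (source, at_head, current) in cases.items():
    print(f"{label}: {is_reverted(source, at_head, current)}")
# was translated, now English: True
# always matched the source: False
# translation reworded: False
```

Comparing against HEAD is what lets values that have always equalled the source pass without being listed anywhere.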
Run it only when resources are staged
The driver sweeps the locale directories and exits with code 1 on any violation. So that CI can reuse the same script unchanged, messages go to stderr and the verdict is the exit code.
```python
def main():
    # Run from the repository root so the paths also resolve inside `git show`
    res = Path("app/src/main/res")
    default_now = parse_strings((res / "values/strings.xml").read_text("utf-8"))
    failures = []
    for d in sorted(res.glob("values-*")):
        f = d / "strings.xml"
        if not f.exists():
            continue
        locale = d.name.removeprefix("values-")  # Python 3.9+
        rel = f.as_posix()  # git expects forward slashes, even on Windows
        failures += check_locale(default_now, at_head(rel),
                                 parse_strings(f.read_text("utf-8")), locale)
    for locale, key, kind, msg in failures:
        print(f"x [{locale}] {key} - {kind}: {msg}", file=sys.stderr)
    if failures:
        print(f"\n{len(failures)} translation drift issue(s). Commit stopped.",
              file=sys.stderr)
        sys.exit(1)
    print("ok: no translation resource drift")


if __name__ == "__main__":
    main()
```
In the pre-commit hook, run it only when a string resource is staged. Sweeping every key on every commit is slow, so narrow the target with git diff --cached.
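One way to do that narrowing, sketched in Python so it can live next to the gate. The regular expression, the function names, and the assumption that resources sit under a res/ directory are mine; adjust them to the project layout.

```python
import re
import subprocess

# Matches res/values/strings.xml and res/values-<qualifier>/strings.xml
STRINGS_RE = re.compile(r"(^|/)res/values(-[^/]+)?/strings\.xml$")


def touches_strings(paths) -> bool:
    """True if any of the given repo-relative paths is a string resource file."""
    return any(STRINGS_RE.search(p) for p in paths)


def staged_paths() -> list:
    """Paths staged for the next commit, as reported by git."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


print(touches_strings(["app/src/main/res/values-zh-rCN/strings.xml"]))  # True
print(touches_strings(["app/src/main/res/values/colors.xml"]))          # False
```

In the hook's entry point, the gate would then run only when touches_strings(staged_paths()) is true, and the hook would exit 0 immediately otherwise.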
I recommend placing this check at both pre-commit and CI. CI catches what slips past your machine, and a change an agent opened unattended still passes through the same eye just before the release pipeline. For keys where matching the source is correct, list them in ALLOW_EQUAL and keep the reason for the exclusion in code.
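One possible way to keep the reason next to each excluded key is to store the exclusions as a mapping and derive the set from it. This is only a sketch: the name ALLOW_EQUAL_REASONS, the helper unexplained, and the reason texts are illustrative assumptions.

```python
# Hypothetical layout: key -> why it is allowed to equal the source text
ALLOW_EQUAL_REASONS = {
    "app_name": "brand name, identical in every locale",
    "ok_label": "short symbolic label",
    "brand_tagline": "brand copy kept in the source language",
}

# The set the gate checks against is derived, so the two cannot drift apart
ALLOW_EQUAL = frozenset(ALLOW_EQUAL_REASONS)


def unexplained(reasons: dict) -> list:
    """Keys that were excluded without a written reason."""
    return sorted(k for k, why in reasons.items() if not why.strip())


print(sorted(ALLOW_EQUAL))               # ['app_name', 'brand_tagline', 'ok_label']
print(unexplained(ALLOW_EQUAL_REASONS))  # []
```

A reviewer, or the CI job itself, can then reject an exclusion that arrives with an empty reason.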
What measuring eight languages and ~540 keys showed
I ran the gate for a week while tidying the wording around the AdMob consent dialog — eight languages, roughly 540 keys per release. I handed the duplicate merging and unused-key deletion to an agent and pushed its output through the gate.
| Detection | Count (one week) | Where it was caught |
| --- | --- | --- |
| reverted | 3 | Before commit (previously surfaced after release in review) |
| format mismatch | 1 | Before commit (before the crash) |
| removed key | 2 | Before commit |
| reorder / reformat false positives | 0 | Dictionary comparison never fires on these |
The numbers may look small. But a single format mismatch means a specific screen will always crash for that language's users. Catching it on my machine, before it arrives as a one-line complaint in a Google Play review, was not a small thing. The zero false positives are also why the gate stayed in use.
The same eye for iOS string catalogs
The iOS string catalog (.xcstrings) is JSON, so the same idea ports directly. Each key holds a stringUnit per locale, with state set to a value such as translated or needs_review. A string that ships while still needs_review can be caught here too.
```python
import json
from pathlib import Path


def parse_xcstrings(path, target):
    """Pull target-locale key->(value, state) out of an xcstrings file."""
    data = json.loads(Path(path).read_text("utf-8"))
    out = {}
    for key, entry in data.get("strings", {}).items():
        unit = entry.get("localizations", {}).get(target, {}).get("stringUnit")
        if unit:
            out[key] = (unit.get("value", ""), unit.get("state", ""))
    return out


def review_pending(path, target):
    """Keys whose target-locale translation is still marked needs_review."""
    return [k for k, (_, st) in parse_xcstrings(path, target).items()
            if st == "needs_review"]
```
Run each locale's value against the source through format_args and the same argument check works on iOS as well. Across platforms, what you are guarding does not change: whether a key has gone missing, whether a translation has reverted to the source, and whether the argument count still matches. Those three points.
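A hedged sketch of that argument check on a string catalog follows. It assumes the source text is the catalog key itself unless the source language has its own stringUnit, and it uses a hypothetical printf-only variant of format_args named ios_args that also skips C length modifiers such as ll. The sample catalog is invented for the example.

```python
import json
import re

# printf-style specifiers, with optional length modifiers (l, ll, h, z, ...)
IOS_ARG_RE = re.compile(
    r"%%|%(?:\d+\$)?[-#+ 0,(]*\d*(?:\.\d+)?(?:[hlqLztj]{1,2})?([@a-zA-Z])"
)


def ios_args(value: str) -> list:
    """Sorted conversion types, e.g. '%lld of %@' -> ['%@', '%d']."""
    return sorted("%" + m.group(1) for m in IOS_ARG_RE.finditer(value) if m.group(1))


def xc_format_mismatches(catalog: dict, target: str) -> list:
    """Keys whose target-locale value has different format arguments than the source."""
    source_lang = catalog.get("sourceLanguage", "en")
    bad = []
    for key, entry in catalog.get("strings", {}).items():
        locs = entry.get("localizations", {})
        src_unit = locs.get(source_lang, {}).get("stringUnit")
        # Assumption: with no source-language unit, the key is the source text
        src = src_unit.get("value", key) if src_unit else key
        unit = locs.get(target, {}).get("stringUnit")
        if unit and ios_args(unit.get("value", "")) != ios_args(src):
            bad.append(key)
    return bad


catalog = json.loads("""
{
  "sourceLanguage": "en",
  "strings": {
    "%lld wallpapers saved": {"localizations": {"zh-Hans": {"stringUnit":
        {"state": "translated", "value": "已保存 %lld 张壁纸"}}}},
    "Hello, %@": {"localizations": {"zh-Hans": {"stringUnit":
        {"state": "translated", "value": "你好"}}}}
  },
  "version": "1.0"
}
""")

print(xc_format_mismatches(catalog, "zh-Hans"))  # ['Hello, %@']
```

Catalog entries that use plural variations or substitutions store their values in a different shape, and this sketch does not cover them.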
If you want to act on this now, take one of your current values-* files and diff it against HEAD just once, to see whether a reverted translation is hiding in it. Finding even one is reason enough to make this gate permanent. Thank you for reading to the end.