The Built-in Guide Skill Is Only Advice — Pair It With a Gate That Mechanically Rejects Antigravity's Output
The v2.2.1 built-in Guide skill raises how often the agent complies, but it is still probabilistic advice. Here is the design for a deterministic gate that reliably stops the violations that slip through, with working code and measured results.
About two weeks into using the v2.2.1 built-in Guide skill, I nearly approved a commit I shouldn't have. The Guide clearly stated "write tables as HTML <table>," and the previous fifteen commits had all obeyed it, so I glanced at the sixteenth and reached for the approve button. That one commit had quietly reverted to Markdown pipe syntax.
The problem wasn't that the agent slipped. It was that compliance had risen high enough that I had stopped looking. The Guide skill genuinely works. But precisely because it works, the few remaining percent of violations hide in the place hardest to see. This article lays out how to combine that "advisory layer" with a "deterministic layer" that reliably stops violations — a design I settled on while running several repositories as an indie developer.
The built-in Guide skill is a probabilistic lift in compliance
The built-in Guide skill introduced in v2.2.1 gives the agent standing, repository-scoped guidance. Rules that used to be scattered across AGENTS.md — "in this repo, write it this way" — can now live in a place the agent is expected to consult every session.
In my own use, the before-and-after was unmistakable. Before placing a Guide, roughly one in four commits carried output that drifted from convention. After placing one, that fell to roughly one in twenty.
But here's the point you must not misread: the Guide skill raises how often the agent complies; it does not guarantee compliance. Model output is probabilistic, so even while reading the same Guide, the agent occasionally misses when context grows long or similar instructions compete. Operationally, the gap between "almost always obeys" and "always obeys" is enormous.
The higher the compliance, the harder the survivors are to see
This is the crux. Counterintuitively, the higher the Guide skill's compliance rate, the lower the reviewer's vigilance falls.
Human review stays sharp when violations are found now and then. But once nine or more out of ten commits sail through cleanly, you unconsciously shift into skimming. My near-miss on the sixteenth commit was exactly this state. The better the Guide performed, the more I — the detector — had dulled.
Framing it as rates makes it clear. Suppose per-commit violation probability falls from 25% to 5%. Violations drop fivefold, but the attention a human devotes to each commit thins more than proportionally as frequency falls. As a result, the probability that "the one that slipped through reaches production" does not necessarily decrease. The reason you can't close your workflow on advice alone isn't that quality is bad — it's the complacency that good quality invites.
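The arithmetic can be made concrete with a toy model: a violation reaches production only if it occurs and the reviewer misses it. The 25% and 5% violation rates come from the paragraph above. The two reviewer miss rates (10% when alert, 50% when skimming) are illustrative assumptions, not measurements.

```python
# Toy model: a slip happens only if a violation occurs AND review misses it.
# The miss rates below are hypothetical, chosen only to illustrate the effect.

def slip_rate(violation_rate: float, reviewer_miss_rate: float) -> float:
    """Probability that a commit carries a violation that review misses."""
    return violation_rate * reviewer_miss_rate

# Before the Guide: violations are frequent, so the reviewer stays alert.
before = slip_rate(violation_rate=0.25, reviewer_miss_rate=0.10)

# After the Guide: violations are rare, so the reviewer skims (assumed).
after = slip_rate(violation_rate=0.05, reviewer_miss_rate=0.50)

print(f"before Guide: {before:.3f} slips per commit")
print(f"after Guide:  {after:.3f} slips per commit")
```

Under these assumed numbers, the fivefold drop in violations is cancelled out by the fivefold rise in misses, so the per-commit slip rate does not move.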
Put "the reasons behind decisions" in the advisory layer
So what belongs in the Guide? I center it on things a machine can't inspect after the fact.
Why this repository uses HTML tables (the renderer doesn't parse GFM)
Why commits stay small (so review units can be reverted independently)
Naming and tone policy (reader-facing prose, avoid hyperbole)
Conversely, mechanically inspectable rules (table syntax, file correspondence, banned words) may go in the Guide, but I place them there on the premise that I won't rely on them. The Guide shares intent; inspection is a separate layer. That division of labor is the whole idea.
```markdown
# Repository Guide (excerpt)

## Tables
- Always write tables as HTML <table>. Markdown pipe syntax renders broken.
- Background: this project's renderer does not load the GFM table extension.

## Commit granularity
- One commit = one meaningful change. Don't mix mechanical replacements with semantic ones.
- Reason: keep units where a reviewer can choose to revert just this part.

## Tone
- Reader-facing prose stays polite and plain.
- No hyperbole (best-ever, blazing, godlike).
```
An agent that reads this Guide writes HTML tables almost every time. Turning "almost" into "always" is the job of the next, deterministic layer.
The deterministic layer: stop inspectable rules in code
Rules a machine can decide, I reject in code rather than ask for in prose. The key is that the gate returns only true or false. It admits no ambiguity.
Below is a minimal gate that verifies the two machine-checkable rules among the three in my Guide. It takes the agent's generated or edited output and, on violation, returns exit code 1 along with the offenses.
```python
#!/usr/bin/env python3
"""guide_gate.py: deterministically verify the machine-checkable parts of the Guide.

exit 0 = pass / exit 1 = violation (offenses printed to stdout)
"""
import re
import sys
from pathlib import Path

# Rule 1: forbid Markdown pipe tables (outside code fences); force HTML <table>
TABLE_SEP = re.compile(r"(?m)^\s*\|?[ :]*-{2,}[ :]*(?:\|[ :]*-{2,}[ :]*)+\|?\s*$")

# Rule 2: banned words (hyperbole)
BANNED = ["best-ever", "blazing-fast", "godlike", "everything you need"]


def strip_code(text: str) -> str:
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    return re.sub(r"`[^`\n]*`", "", text)


def check(path: Path):
    body = strip_code(path.read_text(encoding="utf-8"))
    issues = []
    if TABLE_SEP.search(body):
        issues.append("markdown-table: pipe-syntax table found; convert to <table>")
    for w in BANNED:
        if w in body:
            issues.append(f"banned-word: found '{w}'")
    return issues


def main():
    failed = False
    for arg in sys.argv[1:]:
        issues = check(Path(arg))
        if issues:
            failed = True
            print(f"[NG] {arg}")
            for i in issues:
                print(f" - {i}")
        else:
            print(f"[OK] {arg}")
    sys.exit(1 if failed else 0)


if __name__ == "__main__":
    main()
```
The implementation detail that matters is stripping code blocks with strip_code() first, so example tables inside fences aren't falsely flagged. When I first wrote this, the gate kept reacting to the sample tables in an article and looped on endless rework. That "exclude examples from what you inspect" trap is common to verification gates.
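To see the stripping step in isolation, here is a small sketch that reuses the gate's TABLE_SEP pattern and strip_code() on two inputs: one where the pipe table sits inside a fence, and one where it sits in the prose. The helper name has_pipe_table and the sample strings are illustrative assumptions. The fence is written as `{3} in the regex and built with "`" * 3 in the samples only so this example can itself live inside a fenced block.

```python
import re

# Same separator pattern as the gate above.
TABLE_SEP = re.compile(r"(?m)^\s*\|?[ :]*-{2,}[ :]*(?:\|[ :]*-{2,}[ :]*)+\|?\s*$")


def strip_code(text: str) -> str:
    # `{3} is equivalent to three literal backticks.
    text = re.sub(r"`{3}.*?`{3}", "", text, flags=re.DOTALL)
    return re.sub(r"`[^`\n]*`", "", text)


def has_pipe_table(text: str) -> bool:
    """Hypothetical helper: True if a pipe table appears outside code."""
    return bool(TABLE_SEP.search(strip_code(text)))


FENCE = "`" * 3
TABLE = "| a | b |\n|---|---|\n| 1 | 2 |\n"

fenced_example = f"Here is what not to write:\n{FENCE}\n{TABLE}{FENCE}\n"
real_violation = f"Results:\n\n{TABLE}"

print(has_pipe_table(fenced_example))  # False: the table is inside a fence
print(has_pipe_table(real_violation))  # True: the table is in the prose
```

Without the strip_code() call, both inputs would be flagged, which is the endless-rework loop described above.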
The loop that feeds violations back to the agent
When the gate returns a violation, a human doesn't fix it — the violation text is handed straight back to the agent to fix. When the Guide drives intent, the gate decides facts, and the bounce-back drives correction, the workflow runs almost unattended.
```bash
#!/usr/bin/env bash
# regenerate_until_clean.sh: have the agent fix until the gate passes
set -euo pipefail

MAX_RETRY=3
TARGET="$1"  # file to verify

for attempt in $(seq 1 "$MAX_RETRY"); do
  # Run the gate once per attempt and keep its report for the bounce-back.
  if VIOLATIONS="$(python3 guide_gate.py "$TARGET")"; then
    echo "✅ gate passed (attempt ${attempt})"
    exit 0
  fi
  echo "$VIOLATIONS"
  echo "↩️ bouncing violations back for regeneration (${attempt}/${MAX_RETRY})"
  agy run --file "$TARGET" --instruction "Fix only the following violations. Touch nothing else:
${VIOLATIONS}"
done

# Verify the result of the last regeneration before giving up.
if python3 guide_gate.py "$TARGET"; then
  echo "✅ gate passed (after ${MAX_RETRY} regenerations)"
  exit 0
fi
echo "🛑 did not converge in ${MAX_RETRY} attempts. Escalate to human review."
exit 1
```
agy run is how I invoke it in my environment, so adapt the launch to your own setup. The argument design is what counts. On bounce-back, scope it: "Fix only the following violations. Touch nothing else." Without that line, the agent rewrites unrelated lines while it's in there, and the review surface balloons. Capping retries (three here) and escalating to a human is equally essential — an unbounded loop quietly burns tokens.
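The control flow of the bounce-back loop does not depend on the shell. As a sketch, here is the same idea in Python with the gate and the agent passed in as plain callables. The names regenerate_until_clean, check, and fix are hypothetical. fix stands in for whatever command launches your agent, and the stub agent below exists only so the example runs without one.

```python
from typing import Callable, List

SCOPE = "Fix only the following violations. Touch nothing else:\n"


def regenerate_until_clean(
    check: Callable[[], List[str]],  # returns violations; empty list = pass
    fix: Callable[[str], None],      # hypothetical: sends an instruction to the agent
    max_retry: int = 3,
) -> bool:
    """Return True once the gate passes, False when it is time to escalate."""
    for _ in range(max_retry):
        issues = check()
        if not issues:
            return True
        # Scope the request so the agent does not rewrite unrelated lines.
        fix(SCOPE + "\n".join(issues))
    # Verify the result of the final fix before giving up.
    return not check()


# Stub agent for demonstration only: it clears one violation per call.
pending = ["markdown-table: pipe-syntax table found", "banned-word: found 'godlike'"]
instructions: List[str] = []


def stub_check() -> List[str]:
    return list(pending)


def stub_fix(instruction: str) -> None:
    instructions.append(instruction)
    pending.pop(0)


converged = regenerate_until_clean(stub_check, stub_fix)
print(converged, len(instructions))  # True 2
```

Returning a boolean keeps the escalation decision with the caller: False means the retry budget is spent and a human should look.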
Results: advice-only vs advice + gate
Here is the comparison from two weeks across four repositories. The counts are hand-tallied, so this is not rigorous statistics, but the trend was clear.
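<table>
  <tr><th>Configuration</th><th>Violation rate</th><th>Slips to production</th><th>Rework time per case</th></tr>
  <tr><td>No Guide</td><td>~1 in 4</td><td>several per month</td><td>long, manual fixes</td></tr>
  <tr><td>Guide only (advice)</td><td>~1 in 20</td><td>never reaches zero</td><td>short if caught / long if missed</td></tr>
  <tr><td>Guide + deterministic gate</td><td>~1 in 20 (same rate)</td><td>0 (all caught by gate)</td><td>~3x shorter via auto bounce-back</td></tr>
</table>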
Notice that adding the gate to the Guide doesn't change the violation rate itself. The gate isn't a tool for reducing occurrences; it's a tool for keeping the ones that occur out of production. The roles differ, so neither layer closes the workflow alone. Lower the violation rate with the Guide; drive slips to zero with the gate — keep that division in mind from the start and the design comes out clean.
For me, after switching to this combination, even changes like AdMob configuration files — where a break hits revenue directly but is hard to spot by eye — get stopped by the gate first. The drop in the psychological load of review mattered more than the numbers.
Your next step
Start by sorting the rules currently in your AGENTS.md into two piles: "intent a human should understand the background of" and "rules a machine can decide true or false." The former goes to the Guide skill; the latter to a ten-line gate. Begin your first gate with a rule you've personally let slip at least once before — you'll feel the effect immediately. Whether you can build the paradox "the better the advice works, the more inspection it needs" into your operations is, I think, what decides whether you can run an agent unattended for the long haul.