When the Agent Hands You 1,400 Replacements in One Commit, Ask for Batches Instead
Ask Antigravity to run a large codemod and you can get back one unreviewable commit. Here is a small design — ast-grep rules plus a verified batch driver — that splits a mechanical replacement into the machine's job and the human's check, with working code.
The morning a codemod came back as one giant commit
One morning I set out to move event tracking across my four indie apps onto a consent-aware wrapper. The old Analytics.logEvent("name", props) calls needed to become track("name", props), which checks consent state before sending. A plain mechanical job.
I asked Antigravity to "move everything onto track." It returned a single commit that rewrote 1,400 call sites across 230 files. The diff was endless, and I could not tell which parts were safe and which were risky.
The replacement itself was correct. The problem was that it arrived in a shape no human could verify. A change you cannot review does not ship, even when it happens to be right.
So rather than restate the recipe up front, let me walk through why I sent it back. What I landed on was a small design that re-splits the work into the machine's job and the human's check.
Why "one commit for everything" becomes unreviewable
A huge mechanical diff mixes changes of different natures.
Most of it is purely formulaic. Argument order and meaning are unchanged, and there is nothing worth reading. But buried inside are a few sites where meaning shifts — for example, calls that used to fire before consent now behave differently once the wrapper gates them.
Finding the dozen meaning-changing sites among 1,400 by eye is not realistic. The reviewer loses focus and waves it through with "probably fine." That was the real trap.
What the machine can safely fix should not need review; only what a human must judge should be carved out small. Without that separation, volume drowns quality.
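To make the meaning shift concrete, here is a minimal sketch of what a consent-aware wrapper might look like. The wrapper's real implementation is not shown in this article, so `ConsentState`, `setConsent`, `sendToBackend`, and the drop-when-not-granted policy are all assumptions for illustration; a real wrapper might queue events instead of dropping them.

```typescript
// Hypothetical sketch: none of these names come from the real wrapper.
type Props = Record<string, unknown>;
type ConsentState = "unknown" | "granted" | "denied";

let consent: ConsentState = "unknown";
const sent: Array<{ name: string; props: Props }> = [];

// Stand-in for the real network call.
function sendToBackend(eventName: string, props: Props): void {
  sent.push({ name: eventName, props });
}

function setConsent(next: ConsentState): void {
  consent = next;
}

// Assumed policy: drop the event unless consent has been granted.
// Returns whether the event was actually sent.
function track(eventName: string, props: Props = {}): boolean {
  if (consent !== "granted") return false;
  sendToBackend(eventName, props);
  return true;
}

// A startup call that always fired under Analytics.logEvent is now dropped:
// the rewrite is syntactically trivial, but the behavior is different.
const beforeConsent = track("app_open", { cold: true });
setConsent("granted");
const afterConsent = track("purchase", { sku: "pro" });

console.log(beforeConsent, afterConsent, sent.length);
```

Under this assumed policy, a call site that runs before consent is exactly the kind of edit that looks identical to the other 1,400 in a diff and still deserves a human's attention.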
The design splits the work into two layers. The first layer is the safe mechanical replacement that matches on a syntax pattern. A transform that only carries arguments through does not need a human; the agent runs it and the human only watches the tests.
The second layer is the set of sites where meaning might change: tracking that ran before consent, repeated calls inside a loop, and call sites whose return value is used. A syntax pattern alone cannot tell whether rewriting these is safe, so the agent does not rewrite them; it reports them as a list for a human.
I implemented these two layers with ast-grep rules and a verified batch driver. Here they are in order.
Writing the rules in ast-grep
ast-grep matches patterns against the syntax tree. Unlike regex, it is robust to nested arguments and line breaks because it matches the shape of the code itself.
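As a small illustration of the line-break point, here is a sketch of how a line-oriented regex can silently miss a call that is formatted across several lines. The regex is a strawman of my own, not something from the original workflow, and the sample call sites are made up.

```typescript
// A line-oriented regex, the kind a quick find-and-replace might use.
// In JavaScript, `.` does not match a newline unless the `s` flag is set.
const naive = /Analytics\.logEvent\((.*), (.*)\)/g;

const oneLine = 'Analytics.logEvent("open", props);';
const multiLine = 'Analytics.logEvent(\n  "open",\n  props\n);';

const rewrittenOneLine = oneLine.replace(naive, "track($1, $2)");
const rewrittenMultiLine = multiLine.replace(naive, "track($1, $2)");

console.log(rewrittenOneLine);
// The multi-line call is returned unchanged: the regex never matched it.
console.log(rewrittenMultiLine === multiLine);
```

The failure mode is the worrying part: nothing errors, the site is simply left behind. Matching on the syntax tree sidesteps this because formatting is not part of the shape being matched.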
The first-layer "safe replacement" carries arguments through with meta-variables.
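```yaml
# rules/migrate-logevent-safe.yml
id: migrate-logevent-safe
language: typescript
rule:
  # $NAME and $PROPS are meta-variables that capture the two arguments
  pattern: Analytics.logEvent($NAME, $PROPS)
  # Exclude init paths that run before consent (defer to layer two)
  not:
    inside:
      kind: function_declaration
      stopBy: end   # search all ancestors, not only the direct parent
      has:
        pattern: onBeforeConsent
        stopBy: end # search all descendants, not only direct children
fix: track($NAME, $PROPS)
```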
The second layer does not replace anything — it only surfaces sites a human should look at. Omit fix and detect only.
```yaml
# rules/flag-logevent-in-loop.yml
id: flag-logevent-in-loop
language: typescript
severity: warning
rule:
  pattern: Analytics.logEvent($NAME, $PROPS)
  inside:
    stopBy: end # the loop may be several levels above the call
    any:
      - kind: for_statement
      - kind: while_statement
      - kind: call_expression
        has: { pattern: $ARR.forEach }
note: Tracking inside a loop. A human must check repeated sends and consent state.
```
Keeping the second layer as its own rule lets ast-grep scan list the warnings for you. Tell the agent: "leave any warning site untouched and report it as-is."
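To turn that warning list into something a reviewer can work through, a small formatter helps. The sketch below assumes each entry of `ast-grep scan --json` carries `file`, `ruleId`, and a `range.start.line` that is 0-based; check those field names against your ast-grep version before relying on them. The sample paths are hypothetical.

```typescript
// Assumed subset of one entry in the `ast-grep scan --json` output.
interface Match {
  file: string;
  ruleId: string;
  range: { start: { line: number; column: number } };
}

// Turn raw matches for one rule into sorted "file:line" entries.
function toReviewList(matches: Match[], ruleId: string): string[] {
  return matches
    .filter((m) => m.ruleId === ruleId)
    // Assumption: JSON line numbers are 0-based, editors show 1-based.
    .map((m) => ({ file: m.file, line: m.range.start.line + 1 }))
    .sort((a, b) => a.file.localeCompare(b.file) || a.line - b.line)
    .map((site) => `${site.file}:${site.line}`);
}

// Hypothetical sample data standing in for real scan output.
const sample: Match[] = [
  { file: "src/feed/list.ts", ruleId: "flag-logevent-in-loop", range: { start: { line: 41, column: 4 } } },
  { file: "src/cart/sync.ts", ruleId: "flag-logevent-in-loop", range: { start: { line: 9, column: 2 } } },
  { file: "src/cart/sync.ts", ruleId: "migrate-logevent-safe", range: { start: { line: 3, column: 0 } } },
];

console.log(toReviewList(sample, "flag-logevent-in-loop"));
```

Filtering by `ruleId` keeps the safe-replacement matches out of the human's list, which is the whole point of keeping the two layers separate.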
The batch driver: one commit = one directory = verified
To stack mechanical replacements safely, I run replace → verify → commit per directory. The key invariant: every commit has passed typecheck and tests.
```bash
#!/usr/bin/env bash
# codemod-batch.sh <target directories>
set -euo pipefail

MAX_FILES_PER_BATCH=25 # files a single batch may touch
RULE="rules/migrate-logevent-safe.yml"

for dir in "$@"; do
  echo "▶ batch: $dir"

  # 1) count matching files in this directory
  hit=$(ast-grep scan -r "$RULE" "$dir" --json | jq '[.[].file] | unique | length')
  if [ "$hit" -eq 0 ]; then echo "  skip (0 hits)"; continue; fi
  if [ "$hit" -gt "$MAX_FILES_PER_BATCH" ]; then
    echo "  ✋ $hit files > cap $MAX_FILES_PER_BATCH. Re-run on a smaller path."
    exit 2
  fi

  # 2) apply the mechanical replacement
  ast-grep scan -r "$RULE" "$dir" --update-all

  # 3) verification gate (no commit unless this passes)
  npm run typecheck
  # --findRelatedTests expects source files rather than a directory,
  # so pass the files this batch actually changed.
  git diff --name-only -z -- "$dir" \
    | xargs -0 npm run test -- --passWithNoTests --findRelatedTests

  # 4) commit only this batch
  git add "$dir"
  git commit -m "codemod: migrate logEvent→track in $dir ($hit files)"
done

# 5) show what remains (including layer-two warnings)
echo "── remaining (needs a human) ──"
ast-grep scan -r rules/flag-logevent-in-loop.yml . --json | jq 'length'
```
The driver stops a failing batch while it is still uncommitted. A "commit advancing in a broken state" cannot happen by construction. You isolate the failing directory, investigate, fix, and resume.
Why I scope tests to "related only"
Running the full suite on every batch is not practical at this scale. I run only the tests related to the changed files mid-flight, then run the full suite once at the end of the batch run. Final pre-production confidence comes from the whole suite; speed wins in between.
Dry-run the counts before applying
Before ever running --update-all, I run ast-grep scan with --json only and look at the per-directory hit counts. If a number comes back far higher than expected, that is a signal the rule is too broad and is catching sites you never meant to touch. Once, an expected 230 files ballooned to 400 because the pattern also matched a same-named method from another library. Catching the count anomaly at the dry-run stage avoids a painful rollback after the fact.
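The dry-run check can be sketched as two small functions: one that counts distinct matched files per directory, and one that compares the total against what you expected. The `depth` parameter, the 1.2 tolerance, and the sample paths are my own assumptions; the 230 and 400 figures are the ones from the incident above.

```typescript
interface DirCount {
  dir: string;
  files: number;
  overCap: boolean;
}

// Count distinct matched files per directory, truncated to `depth` segments.
function countPerDir(matchedFiles: string[], depth: number, cap: number): DirCount[] {
  const perDir = new Map<string, Set<string>>();
  for (const file of matchedFiles) {
    const dirParts = file.split("/").slice(0, -1); // drop the file name
    const dir = dirParts.slice(0, depth).join("/") || ".";
    const seen = perDir.get(dir) ?? new Set<string>();
    seen.add(file); // a file with several matches still counts once
    perDir.set(dir, seen);
  }
  return Array.from(perDir.entries())
    .map(([dir, seen]) => ({ dir, files: seen.size, overCap: seen.size > cap }))
    .sort((a, b) => a.dir.localeCompare(b.dir));
}

// Assumed heuristic: more than 20% over the expected count suggests a rule that is too broad.
function looksTooBroad(totalFiles: number, expectedFiles: number, tolerance = 1.2): boolean {
  return totalFiles > expectedFiles * tolerance;
}

const counts = countPerDir(
  ["src/cart/a.ts", "src/cart/a.ts", "src/cart/b.ts", "src/feed/x.ts"],
  2, // group by the first two path segments
  1, // tiny cap so the example trips it
);
console.log(counts);
console.log(looksTooBroad(400, 230), looksTooBroad(240, 230));
```

The exact tolerance matters less than having a number written down before the run, so that a jump from 230 to 400 is a stop signal rather than a surprise.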
The constraint you hand the agent: no giant diffs
This part mattered most. State three rules to the agent explicitly:
1. Run replacements via codemod-batch.sh per directory. Never --update-all the whole repo at once.
2. If a commit would exceed MAX_FILES_PER_BATCH, split into a smaller path and resubmit.
3. Do not replace sites flagged by the layer-two rule; report file and line as a list.
With these in place, the agent stops saying "all done" and starts saying "I processed these directories, each commit verified. Seven warnings, list attached." The reviewer confirms only the tests for the formulaic batches and concentrates on the seven warnings.
The more freedom the agent has, the easier it slides back to a giant diff, so I also write these three into AGENTS.md and have it read them every time.
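The "split into a smaller path" rule can itself be made mechanical. The following is a minimal sketch, under my own assumptions, of one way to pick paths: descend one directory level at a time until every path holds at most the cap. The function name and sample paths are hypothetical, and a directory with many files directly inside it degrades to one path per file.

```typescript
// Return the shallowest paths such that each covers at most `cap` matched files.
function splitPaths(files: string[], cap: number, prefix = ""): string[] {
  if (cap < 1) throw new Error("cap must be at least 1");
  const unique = Array.from(new Set(files));
  if (unique.length === 0) return [];
  if (unique.length <= cap) return [prefix === "" ? "." : prefix];

  // Group files by the next path segment below the current prefix.
  const groups = new Map<string, string[]>();
  for (const file of unique) {
    const rest = prefix === "" ? file : file.slice(prefix.length + 1);
    const segment = rest.split("/")[0] ?? rest;
    const child = prefix === "" ? segment : `${prefix}/${segment}`;
    const bucket = groups.get(child) ?? [];
    bucket.push(file);
    groups.set(child, bucket);
  }

  // Recurse into each child; a child that is a single file always fits.
  const result: string[] = [];
  for (const [child, bucket] of Array.from(groups.entries())) {
    result.push(...splitPaths(bucket, cap, child));
  }
  return result.sort();
}

const paths = splitPaths(
  ["src/cart/a.ts", "src/cart/b.ts", "src/feed/x.ts", "src/feed/ui/y.ts", "src/feed/ui/z.ts"],
  2,
);
console.log(paths);
```

Each returned path can be passed to the batch driver as one argument, so an over-cap directory becomes several small verified commits instead of one refusal.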
One-commit replacement vs. verified batches
Here is the difference along the axes that actually bit me.
| Axis | One-commit replacement | Verified batches |
| --- | --- | --- |
| Review | Eyeball 1,400 sites (not feasible) | Focus on a few warnings |
| Isolating a break | Unclear which batch caused it | Investigate only the failing directory |
| Rollback | Revert the whole thing | Revert just that commit |
| Meaning-changing sites | Hidden among formulaic edits | Tracked separately in layer two |
| Confidence to ship | Too scary to release | Every commit is verified |
In my own work, regrouping into batches after the rejection made review roughly 2.5x faster. The number of replacements did not shrink; what a human had to look at narrowed from "1,380 formulaic sites" to "a dozen sites to judge."
A verification matrix: how to trust the replacement
Confidence in a mechanical replacement is measured by "did not break," not "did change." These are the axes I check on every batch.
| Axis | Method | What a failure means |
| --- | --- | --- |
| Type integrity | npm run typecheck | Argument types mismatch the wrapper |
| Related tests | --findRelatedTests | Behavior shifted somewhere |
| Idempotency | Re-run the rule, expect 0 hits | Missed sites or double replacement |
| Layer-two backlog | JSON count of the flag rule | Human judgment still pending |
The idempotency check is humble but important. Run the same rule again after replacing; if it reports zero, you can at least say there are no "missed sites of the same pattern" and no accidental double-wrapping like track(track(...)).
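The property being checked is simply that applying the transform twice gives the same result as applying it once. Here is a toy sketch of that check; `migrate` is a plain string rewrite standing in for the codemod, and `doubleWrap` is a deliberately buggy transform I made up to show what a failure looks like. Neither is ast-grep itself.

```typescript
// Toy stand-in for the codemod: a plain string rewrite.
function migrate(source: string): string {
  return source.split("Analytics.logEvent(").join("track(");
}

// Deliberately buggy transform: it wraps every track call again on each run.
function doubleWrap(source: string): string {
  return source.replace(/track\(([^)]*)\)/g, "track(track($1))");
}

// Idempotent means a second pass changes nothing.
function isIdempotent(transform: (s: string) => string, input: string): boolean {
  const once = transform(input);
  return transform(once) === once;
}

const original = 'Analytics.logEvent("open", props);';
console.log(isIdempotent(migrate, original));
console.log(isIdempotent(doubleWrap, 'track("open", props);'));
```

In the real workflow the equivalent of the second pass is re-running the rule and expecting zero hits, which is cheap enough to do after every batch.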
How I settled here, and the next step
I, too, started out thinking "it is a mechanical edit, just do it all at once." Standing in front of the giant diff taught me that correctness and reviewability are different things. The gap bites hardest exactly when you touch code whose behavior depends on consent state, like the tracking around AdMob.
If you want to try it, run codemod-batch.sh once on your smallest directory first. Once you feel each commit stack up having passed typecheck and tests, the rest is just widening the scope. Tell the agent only this: "verified, small, leave the warnings and report them" — and the giant diff stops coming back.
If you also hand large mechanical replacements to AI, I hope this helps you ship them to production with a little more peace of mind.