Agents & Manager · 2026-06-30 · Advanced

When the Agent Hands You 1,400 Replacements in One Commit, Ask for Batches Instead

Ask Antigravity to run a large codemod and you can get back one unreviewable commit. Here is a small design — ast-grep rules plus a verified batch driver — that splits a mechanical replacement into the machine's job and the human's check, with working code.

The morning a codemod came back as one giant commit

One morning I set out to move event tracking across my four indie apps onto a consent-aware wrapper. The old Analytics.logEvent("name", props) calls needed to become track("name", props), which checks consent state before sending. A plain mechanical job.
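
To make the target of the migration concrete, here is a minimal sketch of what a consent-aware wrapper could look like. The article only says that track checks consent state before sending; the names ConsentState, setConsent, and the in-memory sent array are hypothetical stand-ins, not part of any real SDK.

```typescript
// Hypothetical sketch: none of these names come from a real analytics SDK.
type Props = Record<string, unknown>;
type ConsentState = "unknown" | "granted" | "denied";

let consent: ConsentState = "unknown";

// Stand-in for the real analytics backend, so the sketch runs on its own.
const sent: Array<{ name: string; props: Props }> = [];

function setConsent(next: ConsentState): void {
  consent = next;
}

// Sends only when consent is granted. Returns whether the event was sent.
function track(name: string, props: Props = {}): boolean {
  if (consent !== "granted") {
    return false; // dropped: consent is unknown or denied
  }
  sent.push({ name, props });
  return true;
}
```

Under these assumptions, a call that used to fire before consent is now silently dropped, which is exactly the kind of behavior change a purely mechanical rewrite can hide.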

I asked Antigravity to "move everything onto track." It returned a single commit that rewrote 1,400 call sites across 230 files. The diff was endless, and I could not tell which parts were safe and which were risky.

The replacement itself was correct. The problem was that it arrived in a shape no human could verify. A change you cannot review does not ship, even when it happens to be right.

So rather than restate the recipe up front, let me walk through why I sent it back. What I landed on was a small design that re-splits the work into the machine's job and the human's check.

Why "one commit for everything" becomes unreviewable

A huge mechanical diff mixes changes of different natures.

Most of it is purely formulaic. Argument order and meaning are unchanged, and there is nothing worth reading. But buried inside are a few sites where meaning shifts — for example, calls that used to fire before consent now behave differently once the wrapper gates them.

Finding the dozen meaning-changing sites among 1,400 by eye is not realistic. The reviewer loses focus and waves it through with "probably fine." That was the real trap.

What the machine can safely fix should not need review; only what a human must judge should be carved out small. Without that separation, volume drowns quality.

The core idea: split the codemod into two layers

The split I adopted has two layers.

The first layer is the safe mechanical replacement that matches on a syntax pattern. A transform that only carries arguments through does not need a human; the agent runs it and the human only watches the tests.

The second layer is the set of sites where meaning might change: tracking that ran before consent, repeated calls inside a loop, or call sites whose return value is used. A syntax pattern alone cannot guarantee that rewriting these is safe. The agent does not rewrite them; it reports them as a list for a human.
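
As an illustration, the call-site shapes below show one layer-one site and three layer-two sites. The Analytics stub and all function names except onBeforeConsent are hypothetical, written only so the sketch compiles and runs by itself.

```typescript
// Hypothetical stub so the call sites below compile and run on their own.
const recorded: string[] = [];
const Analytics = {
  logEvent(name: string, props: Record<string, unknown>): boolean {
    recorded.push(name + " " + JSON.stringify(props));
    return true;
  },
};

// Layer one: arguments pass straight through, nothing to judge.
function onCheckout(total: number): void {
  Analytics.logEvent("checkout", { total });
}

// Layer two: repeated sends inside a loop.
function onCartLoaded(skus: string[]): void {
  for (const sku of skus) {
    Analytics.logEvent("cart_item", { sku });
  }
}

// Layer two: the return value is used, so a wrapper that can drop
// events would change which branch runs.
function onSignup(plan: string): string {
  const ok = Analytics.logEvent("signup", { plan });
  return ok ? "logged" : "retry-later";
}

// Layer two: runs before consent has been collected.
function onBeforeConsent(): void {
  Analytics.logEvent("first_launch", {});
}
```

Only onCheckout is a candidate for the automatic rewrite; the other three are the shapes the agent should list rather than touch.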

I implemented these two layers with ast-grep rules and a verified batch driver. Here they are in order.

Writing the rules in ast-grep

ast-grep matches patterns against the syntax tree. Unlike regex, it is robust to nested arguments and line breaks because it matches the shape of the code itself.

The first-layer "safe replacement" carries arguments through with meta-variables.

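# rules/migrate-logevent-safe.yml
id: migrate-logevent-safe
language: typescript
rule:
  # Meta-variables carry both arguments through unchanged
  pattern: Analytics.logEvent($NAME, $PROPS)
  # Exclude init paths that run before consent (defer to layer two)
  not:
    inside:
      kind: function_declaration
      has:
        pattern: onBeforeConsent
      # Check every enclosing function, not just the direct parent node
      stopBy: end
fix: track($NAME, $PROPS)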

The second layer does not replace anything — it only surfaces sites a human should look at. Omit fix and detect only.

# rules/flag-logevent-in-loop.yml
id: flag-logevent-in-loop
language: typescript
severity: warning
rule:
  pattern: Analytics.logEvent($NAME, $PROPS)
  inside:
    any:
      - kind: for_statement
      # for...in and for...of share this node kind
      - kind: for_in_statement
      - kind: while_statement
      - kind: call_expression
        has: { pattern: $ARR.forEach }
    # Match a loop at any depth above the call, not just the direct parent
    stopBy: end
note: Tracking inside a loop. A human must check repeated sends and consent state.

Keeping the second layer as its own rule lets ast-grep scan list the warnings for you. Tell the agent: "leave any warning site untouched and report it as-is."

The batch driver: one commit = one directory = verified

To stack mechanical replacements safely, I run replace → verify → commit per directory. The key invariant: every commit has passed typecheck and tests.

#!/usr/bin/env bash
# codemod-batch.sh <target directories>
# Run from the repo root on a clean working tree.
set -euo pipefail

MAX_FILES_PER_BATCH=25   # files a single batch may touch
RULE="rules/migrate-logevent-safe.yml"

for dir in "$@"; do
  echo "▶ batch: $dir"

  # 1) count matching files in this directory
  hit=$(ast-grep scan -r "$RULE" "$dir" --json | jq '[.[].file] | unique | length')
  if [ "$hit" -eq 0 ]; then echo "  skip (0 hits)"; continue; fi
  if [ "$hit" -gt "$MAX_FILES_PER_BATCH" ]; then
    echo "  ✋ $hit files > cap $MAX_FILES_PER_BATCH. Re-run on a smaller path."
    exit 2
  fi

  # 2) apply the mechanical replacement
  ast-grep scan -r "$RULE" "$dir" --update-all

  # 3) verification gate (no commit unless this passes)
  npm run typecheck
  # --findRelatedTests expects source files, not a directory,
  # so pass the files this batch actually changed
  git diff --name-only -z -- "$dir" \
    | xargs -0 npm run test -- --passWithNoTests --findRelatedTests

  # 4) commit only this batch
  git add "$dir"
  git commit -m "codemod: migrate logEvent→track in $dir ($hit files)"
done

# 5) show what remains (including layer-two warnings)
echo "── remaining (needs a human) ──"
ast-grep scan -r rules/flag-logevent-in-loop.yml . --json | jq 'length'

The driver stops a failing batch while it is still uncommitted. A "commit advancing in a broken state" cannot happen by construction. You isolate the failing directory, investigate, fix, and resume.

Why I scope tests to "related only"

Running the full suite on every batch is not practical at this scale. I run only the tests related to the changed files mid-flight, then run the full suite once at the end of the batch run. Final pre-production confidence comes from the whole suite; speed wins in between.

Dry-run the counts before applying

Before ever running --update-all, I run ast-grep scan with --json only and look at the per-directory hit counts. If a number comes back far higher than expected, that is a signal the rule is too broad and is catching sites you never meant to touch. Once, an expected 230 files ballooned to 400 because the pattern also matched a same-named method from another library. Catching the count anomaly at the dry-run stage avoids a painful rollback after the fact.
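
The dry-run comparison can be sketched as a small helper that reads the parsed JSON and flags directories whose counts look inflated. This is a minimal sketch, not the tooling I actually used: it relies only on the file field that the driver's jq filter already reads, and the function names, the top-level-directory grouping, and the 1.5 ratio are illustrative assumptions.

```typescript
// Minimal shape of one ast-grep JSON match; only `file` is read here.
interface Match {
  file: string;
}

// Counts distinct matching files per top-level directory.
function filesPerDirectory(matches: Match[]): Record<string, number> {
  const seen: Record<string, boolean> = Object.create(null);
  const counts: Record<string, number> = Object.create(null);
  for (const m of matches) {
    const file = m.file.replace(/^\.\//, ""); // tolerate a leading "./"
    if (seen[file]) continue; // several hits in one file count once
    seen[file] = true;
    const slash = file.indexOf("/");
    const dir = slash === -1 ? "." : file.slice(0, slash);
    counts[dir] = (counts[dir] ?? 0) + 1;
  }
  return counts;
}

// Lists directories whose count exceeds the expected count times `ratio`.
// A directory with no expectation at all is always reported.
function overExpected(
  counts: Record<string, number>,
  expected: Record<string, number>,
  ratio = 1.5, // hypothetical threshold, tune to taste
): string[] {
  return Object.keys(counts)
    .sort()
    .filter((dir) => (counts[dir] ?? 0) > (expected[dir] ?? 0) * ratio);
}
```

Feeding it the matches from a --json scan and a rough per-directory expectation surfaces cases like a vendored library with a same-named method before anything is rewritten.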

The constraint you hand the agent: no giant diffs

This part mattered most. State three rules to the agent explicitly:

1. Run replacements via codemod-batch.sh per directory. Never --update-all the whole repo at once.
2. If a commit would exceed MAX_FILES_PER_BATCH, split into a smaller path and resubmit.
3. Do not replace sites flagged by the layer-two rule; report file and line as a list.

With these in place, the agent stops saying "all done" and starts saying "I processed these directories, each commit verified. Seven warnings, list attached." The reviewer confirms only the tests for the formulaic batches and concentrates on the seven warnings.
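
The "list attached" part of that report can be produced mechanically. The sketch below turns parsed matches from the flag rule into sorted file:line entries. It assumes each JSON match carries a file field and a range.start.line field that is zero-based; verify both against the output of your ast-grep version before relying on it. The function name is hypothetical.

```typescript
// Assumed shape of one ast-grep --json match (only the fields used here).
interface FlaggedMatch {
  file: string;
  range: { start: { line: number; column: number } };
}

// Formats matches as sorted "file:line" entries.
// Assumption: ast-grep reports zero-based lines, so add 1 for humans.
function warningReport(matches: FlaggedMatch[]): string[] {
  return matches
    .map((m) => ({ file: m.file, line: m.range.start.line + 1 }))
    .sort((a, b) => {
      if (a.file === b.file) return a.line - b.line;
      return a.file < b.file ? -1 : 1;
    })
    .map((e) => `${e.file}:${e.line}`);
}
```

A reviewer can then walk the list top to bottom without opening the diff for the formulaic batches at all.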

The more freedom the agent has, the easier it slides back to a giant diff, so I also write these three into AGENTS.md and have it read them every time.
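
The "split into a smaller path" rule can also be planned up front. The following is a rough sketch of one way to do it, not part of the driver itself: given the list of matching files, it picks the largest directories that stay under the cap and descends only where a directory is too big. The Batch type and planBatches are hypothetical names.

```typescript
interface Batch {
  path: string; // directory to hand to the batch driver
  files: string[]; // matching files that fall under it
}

// Groups files by directory, descending one level at a time until every
// group fits under `cap`. Files that sit directly in one oversized
// directory cannot be split by path, so they are chunked instead.
function planBatches(files: string[], cap: number, prefix = ""): Batch[] {
  if (cap < 1) throw new Error("cap must be at least 1");
  const groups: Record<string, string[]> = Object.create(null);
  for (const f of files) {
    const rest = f.slice(prefix.length);
    const slash = rest.indexOf("/");
    // Files directly in `prefix` group under the prefix itself.
    const key = slash === -1 ? prefix : prefix + rest.slice(0, slash + 1);
    (groups[key] = groups[key] ?? []).push(f);
  }
  const batches: Batch[] = [];
  for (const key of Object.keys(groups).sort()) {
    const group = groups[key] ?? [];
    if (group.length <= cap) {
      batches.push({ path: key || ".", files: group });
    } else if (key !== prefix) {
      batches.push(...planBatches(group, cap, key)); // too big: go deeper
    } else {
      for (let i = 0; i < group.length; i += cap) {
        batches.push({ path: key || ".", files: group.slice(i, i + cap) });
      }
    }
  }
  return batches;
}
```

One caveat: the driver scans a directory recursively, so a batch made of loose files in a parent directory needs its subdirectories processed first, or the file list passed explicitly.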

One-commit replacement vs. verified batches

Here is the difference along the axes that actually bit me.

Axis	One-commit replacement	Verified batches
Review	Eyeball 1,400 sites (not feasible)	Focus on a few warnings
Isolating a break	Unclear which change caused it	Investigate only the failing directory
Rollback	Revert the whole thing	Revert just that commit
Meaning-changing sites	Hidden among formulaic edits	Tracked separately in layer two
Confidence to ship	Too scary to release	Every commit is verified

In my own work, regrouping into batches after the rejection cut my review time by a factor of roughly 2.5. The number of replacements did not shrink; what a human had to look at narrowed from "1,380 formulaic sites" to "a dozen sites to judge."

A verification matrix: how to trust the replacement

Confidence in a mechanical replacement is measured by "did not break," not "did change." These are the axes I check on every batch.

Axis	Method	What a failure means
Type integrity	npm run typecheck	Argument types mismatch the wrapper
Related tests	--findRelatedTests	Behavior shifted somewhere
Idempotency	Re-run the rule, expect 0 hits	Missed sites or double replacement
Layer-two backlog	JSON count of the flag rule	Human judgment still pending

The idempotency check is humble but important. Run the same rule again after replacing; if it reports zero, you can at least say there are no "missed sites of the same pattern" and no accidental double-wrapping like track(track(...)).
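
As a cheap companion to re-running the rule, a text-level check can look for the two failure shapes directly. This is only a regex approximation under my own assumptions (it will also hit comments and string literals, and the names are hypothetical), so treat it as a smoke test and keep the ast-grep re-run as the real gate.

```typescript
interface IdempotencyReport {
  residualLogEvent: number; // old calls that survived the rewrite
  nestedTrack: number; // accidental track(track(...)) wrapping
}

// Counts leftover Analytics.logEvent( calls and directly nested track( calls
// in a source string. Purely textual: no parsing is done.
function checkIdempotency(source: string): IdempotencyReport {
  const residual = source.match(/\bAnalytics\.logEvent\s*\(/g);
  const nested = source.match(/\btrack\s*\(\s*track\s*\(/g);
  return {
    residualLogEvent: residual ? residual.length : 0,
    nestedTrack: nested ? nested.length : 0,
  };
}
```

A clean batch should report zero for both fields, apart from sites the layer-two rule deliberately left untouched.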

How I settled here, and the next step

I, too, started out thinking "it is a mechanical edit, just do it all at once." Standing in front of the giant diff taught me that correctness and reviewability are different things. The gap bites hardest exactly when you touch code whose behavior depends on consent state, like the tracking around AdMob.

If you want to try it, run codemod-batch.sh once on your smallest directory first. Once you feel each commit stack up having passed typecheck and tests, the rest is just widening the scope. Tell the agent only this: "verified, small, leave the warnings and report them" — and the giant diff stops coming back.

If you also hand large mechanical replacements to AI, I hope this helps you ship them to production with a little more peace of mind.
