On Cutover Day to the Antigravity CLI, Verify Production Automation by Side-Effect Equivalence, Not Output
On the day you switch from the Gemini CLI to the Antigravity CLI, verify production automation by the equivalence of side effects — files written, commits, network calls — instead of matching stdout. A sandbox parallel run and a go/no-go cutover gate, with implementation steps.
Today, June 18, the Gemini CLI and the Gemini Code Assist IDE extension stop serving requests for free personal use and for AI Pro / Ultra plans, consolidating into the successor Antigravity CLI. For anyone who has built automation around the CLI, today is the day to switch.
Many migration checks reach for the same thing: compare the old and new stdout and see whether the same string comes out. At first I thought that was enough too. But automation creates its value not in stdout but in side effects: the steps that write files, make commits, and call external APIs. Even if the output matches character for character, production breaks if a single write target moves. So verify by the equivalence of side effects, not output.
Why matching output is not enough
A CLI agent's job is usually not to print something to the screen. It generates files in a repo, commits changes, and calls a deployment API. Stdout is just the log that streams along the way.
In other words, matching output only guarantees that both CLIs wrote the same thing to the log. If the new CLI subtly reorders its tool calls or changes their arguments, similar-looking logs let that slip through. I fell into this trap once and had a near-incident in which the output was identical and only the path of the written file differed.
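To make the trap concrete, here is a small self-contained illustration. `fake_old_cli` and `fake_new_cli` are hypothetical stand-ins, not the real CLIs: both print the same line to stdout, but one writes to `out/` and the other to `output/`. A stdout comparison passes, while a recursive diff of what each one wrote fails.

```shell
#!/usr/bin/env bash
# Illustration only: fake_old_cli and fake_new_cli are hypothetical stand-ins
# for the real CLIs. Same stdout, different write target.
fake_old_cli() { mkdir -p out && printf 'ok\n' > out/report.txt && echo "Wrote report"; }
fake_new_cli() { mkdir -p output && printf 'ok\n' > output/report.txt && echo "Wrote report"; }

# Run COMMAND inside DIR (created if needed) and pass its stdout through.
run_in() {  # usage: run_in DIR COMMAND
  mkdir -p "$1" && ( cd "$1" && "$2" )
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

old_stdout=$(run_in "$work/old" fake_old_cli)
new_stdout=$(run_in "$work/new" fake_new_cli)

# Output-based check: compares only what was printed.
if [ "$old_stdout" = "$new_stdout" ]; then echo "stdout check:      PASS"; else echo "stdout check:      FAIL"; fi

# Side-effect check: compares what was actually written to disk.
if diff -r "$work/old" "$work/new" > /dev/null; then echo "side-effect check: PASS"; else echo "side-effect check: FAIL"; fi
```

Running it prints PASS for the stdout check and FAIL for the side-effect check, which is exactly the gap described above.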
Capture side effects on three surfaces
Side effects are easier to handle when split into three observable surfaces. If all three match between the old and new CLIs, you can expect the behavior in production to be the same.
| Surface | What to observe | How to capture |
|---|---|---|
| Files | Paths and contents created, changed, or deleted | Before/after snapshot diff |
| Git | Diff and message of generated commits | git diff and git log |
| Network | Destination host, method, and call count | Egress proxy access log |
Of these, the file surface causes the most accidents, and output verification cannot catch it at all.
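For the file surface, the "before/after snapshot diff" can be sketched as a manifest of path and SHA-256 per file, taken before and after a run. This assumes GNU userland (`sha256sum`, `sort -z`, `xargs -r`); on macOS you would substitute `shasum -a 256`. The helper names `snapshot` and `classify` are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch of a before/after snapshot for the file surface (assumes GNU coreutils/findutils).

# Write "<sha256>  ./relative/path" for every regular file under DIR, skipping .git.
# Keep MANIFEST outside DIR so it does not end up in its own snapshot.
snapshot() {  # usage: snapshot DIR MANIFEST
  ( cd "$1" && find . -path ./.git -prune -o -type f -print0 |
      LC_ALL=C sort -z | xargs -0 -r sha256sum ) > "$2"
}

# Compare two manifests and print CREATED / CHANGED / DELETED per path.
# A sha256sum line is 64 hex chars plus 2 separator chars, so the path starts at column 67.
# (Paths containing newlines or backslashes are escaped by sha256sum and not handled here.)
classify() {  # usage: classify BEFORE AFTER
  awk 'FILENAME == ARGV[1] { before[substr($0, 67)] = $1; next }
       { path = substr($0, 67); seen[path] = 1
         if (!(path in before)) print "CREATED " path
         else if (before[path] != $1) print "CHANGED " path }
       END { for (p in before) if (!(p in seen)) print "DELETED " p }' "$1" "$2" |
    LC_ALL=C sort
}
```

Take one snapshot before a run and one after, then call `classify before.txt after.txt`. Doing this in both the old and new sandboxes and diffing the two classifications shows whether both CLIs touched the same paths in the same way. Matching on `FILENAME == ARGV[1]` rather than the common `NR == FNR` idiom keeps the comparison correct when the "before" manifest is empty.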
Run the old and new CLIs in parallel in a sandbox
Verify in a working directory isolated from production. Give the old and new CLIs the same input (the prompt and a copy of the target repo), one run each, and compare the resulting state when both are done.
The point is to start each run from a pristine copy. Mix in leftovers from a prior run and you can no longer tell whether a diff comes from the CLI difference or from execution order.
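A minimal sketch of the "pristine copy per run" rule follows. It assumes you keep one untouched copy of the target repo (called `PRISTINE` here) and that each CLI is invoked by whatever command your existing automation already uses; `fresh_sandbox` and `run_isolated` are hypothetical helper names, and no CLI flags are assumed.

```shell
#!/usr/bin/env bash
# Rebuild DEST from the untouched PRISTINE copy, discarding anything a previous run left.
fresh_sandbox() {  # usage: fresh_sandbox PRISTINE DEST
  rm -rf -- "$2"
  cp -a -- "$1" "$2"
}

# Reset the sandbox, then run the given command with the sandbox as its working directory.
# The command (old or new CLI plus the shared prompt) is supplied by the caller.
run_isolated() {  # usage: run_isolated PRISTINE DEST CMD [ARGS...]
  local pristine=$1 dest=$2
  shift 2
  fresh_sandbox "$pristine" "$dest" || return
  ( cd "$dest" && "$@" )
}
```

For example, call `run_isolated pristine_repo sandbox_old` followed by your old CLI invocation, then the same with `sandbox_new` and the new CLI. Because the reset happens inside `run_isolated`, a re-run can never inherit the previous run's files.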
Diff what the CLIs wrote
After the runs, compare the two sandboxes wholesale. diff -r is enough. If there is zero diff here, the file-surface side effects are equivalent.
# File surface: compare the two sandboxes wholesale (.git is compared separately)
if diff -r --exclude=.git sandbox_old sandbox_new > file_diff.txt; then
  echo "File side effects: equivalent"
else
  echo "File side effects differ:"
  head -n 40 file_diff.txt
fi

# Git surface: compare the diff and the message of the generated commit
# (assumes each CLI made exactly one commit on top of the same starting commit;
# git errors are left visible instead of being silenced)
for side in old new; do
  git -C "sandbox_$side" diff HEAD~1 HEAD > "$side.patch"
  git -C "sandbox_$side" log -1 --format=%B >> "$side.patch"
done
if diff old.patch new.patch > git_diff.txt; then
  echo "Git side effects: equivalent"
else
  echo "Commit contents differ:"
  head -n 40 git_diff.txt
fi
The first time I ran this diff, it flagged files whose content matched exactly but whose line endings differed: CRLF on one side, LF on the other. Output verification would certainly have missed that, and it is exactly this kind of detail that breaks builds in production.
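When the file check goes red, it helps to know whether the difference is line endings only or real content. A sketch, relying on GNU diff's `--strip-trailing-cr` option; `classify_file_diff` and `files_with_crlf` are hypothetical helper names. A line-endings-only result is still a side-effect difference, not a pass.

```shell
#!/usr/bin/env bash
# Sketch: separate "differs only in line endings" from "differs in content".
# Relies on GNU diff's --strip-trailing-cr option.

classify_file_diff() {  # usage: classify_file_diff DIR_A DIR_B
  if diff -r --exclude=.git "$1" "$2" > /dev/null; then
    echo "identical"
  elif diff -r --exclude=.git --strip-trailing-cr "$1" "$2" > /dev/null; then
    # Still a real side-effect difference: report it, do not wave it through.
    echo "line-endings-only"
  else
    echo "content"
  fi
}

# List text files under DIR that contain a carriage return, to see which side wrote CRLF.
files_with_crlf() {  # usage: files_with_crlf DIR
  grep -rlI --exclude-dir=.git $'\r' "$1" | LC_ALL=C sort
}
```

`classify_file_diff sandbox_old sandbox_new` turns a red file check into a more specific message, and `files_with_crlf sandbox_new` points at the files to inspect.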
Confirm network-call equivalence
The third surface is network. If the destinations or call counts change between the old and new CLIs, that bears directly on rate limits and billing. The reliable way to check is to route both runs through a single egress proxy and compare its access logs.
Check that the set of destination hosts and methods matches and that the call counts have not drifted far. In my experience, a count drift of more than 20% is a sign that the internal tool-call design has changed. Cut over without equivalence here and you will run into unexpected billing or rate limits. It pays to be strict on this surface.
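A sketch of the network check, assuming each proxy access log has already been reduced to one `METHOD HOST` pair per line. That normalized format is hypothetical; real proxies write richer lines, so adapt the extraction to yours. `network_equivalent` is an illustrative name, and the 20% default is the rule of thumb from the text.

```shell
#!/usr/bin/env bash
# Sketch of the network-surface check. Assumes each egress proxy access log has
# already been reduced to one "METHOD HOST" pair per line (a hypothetical,
# normalized format; adapt the extraction to what your proxy actually writes).

MAX_DRIFT_PCT=${MAX_DRIFT_PCT:-20}   # the 20% rule of thumb from the text

network_equivalent() {  # usage: network_equivalent OLD_LOG NEW_LOG
  local old_set new_set old_n new_n drift
  [ -r "$1" ] && [ -r "$2" ] || { echo "network: log missing"; return 1; }
  # Distinct (method, host) pairs; blank lines are ignored.
  old_set=$(awk 'NF { print $1, $2 }' "$1" | LC_ALL=C sort -u)
  new_set=$(awk 'NF { print $1, $2 }' "$2" | LC_ALL=C sort -u)
  if [ "$old_set" != "$new_set" ]; then
    echo "network: destination set differs"
    diff <(printf '%s\n' "$old_set") <(printf '%s\n' "$new_set")
    return 1
  fi
  # Total number of calls on each side and the absolute difference.
  old_n=$(awk 'NF { c++ } END { print c + 0 }' "$1")
  new_n=$(awk 'NF { c++ } END { print c + 0 }' "$2")
  drift=$(( new_n > old_n ? new_n - old_n : old_n - new_n ))
  # drift / old_n > MAX_DRIFT_PCT / 100, rearranged to stay in integer arithmetic.
  if [ $(( drift * 100 > old_n * MAX_DRIFT_PCT )) -eq 1 ]; then
    echo "network: call count drifted too far ($old_n -> $new_n)"
    return 1
  fi
  echo "network: equivalent ($old_n -> $new_n calls)"
}
```

This compares total call counts. Comparing counts per (method, host) pair would be stricter, at the cost of more noise on jobs with naturally variable retries.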
A gate that mechanically decides cutover
Finally, fold the three surfaces into a single verdict. The day's cutover is decided by the gate's pass/fail, not by a person's "probably fine."
Is the file-surface diff -r zero?
Do the Git commit diffs match?
Does the network destination set match, with count drift within 20%?
If all three are green, cutover is allowed. If even one is red, hold, pin down the cause of the diff, then re-verify.
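The three questions above can be sketched as one gate in shell. Each check is a function that returns 0 when its surface is equivalent; the `check_*` bodies are simplified, hypothetical stand-ins, and `OLD`, `NEW`, `OLD_LOG`, `NEW_LOG` default to illustrative names. The Git check assumes one commit per run, and the network check assumes the same hypothetical `METHOD HOST` log format as a normalized proxy log.

```shell
#!/usr/bin/env bash
# Sketch of a go/no-go gate. Each check returns 0 when its surface is equivalent.
OLD=${OLD:-sandbox_old}
NEW=${NEW:-sandbox_new}
OLD_LOG=${OLD_LOG:-old_egress.log}   # assumed "METHOD HOST" per line
NEW_LOG=${NEW_LOG:-new_egress.log}

# 1. Files: the recursive diff is empty.
check_files() { diff -r --exclude=.git "$OLD" "$NEW" > /dev/null; }

# 2. Git: the last commit's diff and message match (a git error counts as red).
check_git() {
  local a b
  a=$(git -C "$OLD" diff HEAD~1 HEAD && git -C "$OLD" log -1 --format=%B) || return 1
  b=$(git -C "$NEW" diff HEAD~1 HEAD && git -C "$NEW" log -1 --format=%B) || return 1
  [ "$a" = "$b" ]
}

# 3. Network: same (method, host) set and total-count drift within 20%.
check_network() {
  local o n
  [ "$(awk 'NF { print $1, $2 }' "$OLD_LOG" | LC_ALL=C sort -u)" = \
    "$(awk 'NF { print $1, $2 }' "$NEW_LOG" | LC_ALL=C sort -u)" ] || return 1
  o=$(awk 'NF { c++ } END { print c + 0 }' "$OLD_LOG") || return 1
  n=$(awk 'NF { c++ } END { print c + 0 }' "$NEW_LOG") || return 1
  [ $(( (n > o ? n - o : o - n) * 100 <= o * 20 )) -eq 1 ]
}

cutover_gate() {  # usage: cutover_gate CHECK_FN...
  local check failed=0
  for check in "$@"; do  # run every check so all red surfaces are reported at once
    if "$check"; then echo "GREEN $check"; else echo "RED   $check"; failed=1; fi
  done
  if [ "$failed" -eq 0 ]; then
    echo "GO: cutover allowed"
  else
    echo "NO-GO: hold, find the cause of the diff, then re-verify"
  fi
  return "$failed"
}
```

Invoked as `cutover_gate check_files check_git check_network`, the exit status is what a scheduler or CI step can act on, so the decision never rests on someone reading logs.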
Only jobs that pass this gate get swapped into the production schedule. I switched the Dolice Labs automation one job at a time with this procedure and caught two jobs in advance whose output was identical but whose side effects had diverged. Had I judged by output alone, those two would have broken in production.
With a switching deadline bearing down, it is tempting to settle for "it ran, so it's fine." But what protects production is not the sense that it ran; it is evidence that the side effects are equivalent. For today's cutover, put at least one job through this gate first, and swap it in only after it goes green. That order is what prevents accidents.
Thank You for Reading
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.