On Cutover Day to the Antigravity CLI, Verify Production Automation by Side-Effect Equivalence, Not Output

On the day you switch from the Gemini CLI to the Antigravity CLI, verify production automation by the equivalence of side effects (files written, commits, network calls) instead of matching stdout. This article covers a sandbox parallel run and a go/no-go cutover gate, with implementation steps.

Today, June 18, the Gemini CLI and the Gemini Code Assist IDE extension stop serving requests for free personal use and for AI Pro / Ultra, and are consolidated into their successor, the Antigravity CLI. For anyone who has built automation around the CLI, today is cutover day.

Many migration checks reach for the same thing: comparing old and new stdout to see whether the same string comes out. I thought that was enough at first, too. But automation creates value through its side effects, not its stdout: the parts that write files, make commits, and call external APIs. Even if the output matches character for character, production breaks if a single write target shifts. So verify by the equivalence of side effects, not output.

Why matching output is not enough

A CLI agent's job is usually not to print something to the screen. It generates files in a repo, commits changes, and calls a deployment API. Stdout is just the log that streams along the way.

In other words, matching output only guarantees that both CLIs wrote the same thing to the log. If the order or arguments of the tool calls shifted subtly inside the new CLI, similar-looking logs let the change slip by. I fell into this trap once: a near-incident where the output was identical and only the path of the written file differed.

Capture side effects on three surfaces

Side effects are easier to handle when split into three observable surfaces. If all three match between old and new, you have concrete evidence that the behavior will be the same in production.

Surface	What to observe	How to capture
Files	Paths and contents created, changed, deleted	Before/after snapshot diff
Git	Diff and message of generated commits	`git diff` and `git log`
Network	Destination host, method, count	Egress proxy access log

Of these, the files surface causes the most accidents. Output verification simply cannot catch it.
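
A before/after snapshot can be as simple as a sorted manifest of content hashes. The following is a minimal sketch, assuming GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -r`) are available; the function names `snapshot` and `changed_paths` are illustrative and not part of either CLI.

```shell
#!/usr/bin/env bash
# Sketch: record a directory's file state as "hash  path" lines so that two
# snapshots can be compared with ordinary text tools.
# Assumes GNU coreutils/findutils; function names are hypothetical.

# snapshot DIR -> one "sha256  ./relative/path" line per file, sorted by path.
# The .git directory is skipped; the Git surface is compared separately.
snapshot() {
  ( cd "$1" && find . -path ./.git -prune -o -type f -print0 \
      | LC_ALL=C sort -z \
      | xargs -0 -r sha256sum )
}

# changed_paths BEFORE AFTER -> paths that were created, changed, or deleted.
# A manifest line that appears in only one snapshot marks a difference.
changed_paths() {
  LC_ALL=C sort "$1" "$2" | uniq -u \
    | sed 's/^[0-9a-f]*  //' | LC_ALL=C sort -u
}
```

Taking one snapshot before the CLI runs and one after, then calling `changed_paths before.txt after.txt`, lists every path the run touched, including a file that landed one directory off from where it should have.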

Run both CLIs on identical input in a sandbox

Verify in a working directory isolated from production. Give the same input (the prompt and a copy of the target repo) to each of the old and new CLIs, and compare state when done.

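#!/usr/bin/env bash
set -euo pipefail
INPUT_PROMPT="$1"            # identical task instruction
SRC="$2"                     # target repository

# Run one CLI against a pristine copy of the repo. Its stdout/stderr is
# saved next to the sandbox (sandbox_old.stdout, sandbox_new.stdout).
# NOTE: adjust "run --prompt" to the non-interactive form each CLI accepts.
run_one() {
  local cli="$1" outdir="$2"
  rm -rf "$outdir" && cp -a "$SRC" "$outdir"   # -a keeps modes, symlinks, timestamps
  ( cd "$outdir" && "$cli" run --prompt "$INPUT_PROMPT" \
      > "../$(basename "$outdir").stdout" 2>&1 )
}

run_one gemini    sandbox_old   # old CLI
run_one agy       sandbox_new   # Antigravity CLI (successor)
echo "Both CLI runs finished. Comparing side effects."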

The point is to start each run from a pristine copy. Mix in leftovers from a prior run and you can no longer tell whether a diff comes from the CLI difference or from execution order.

Diff what the CLIs wrote

After the runs, compare the two sandboxes wholesale. diff -r is enough. If there is zero diff here, the file-surface side effects are equivalent.

# File-surface equivalence (exclude .git; compared separately)
if diff -r --exclude=.git sandbox_old sandbox_new > file_diff.txt; then
  echo "File side effects: equivalent"
else
  echo "File side effects differ:"
  head -40 file_diff.txt
fi
 
# Git surface: compare the diff of the generated commit
# (assumes each CLI made exactly one commit on top of the copied HEAD)
( cd sandbox_old && git diff HEAD~1 HEAD ) > old.patch 2>/dev/null || true
( cd sandbox_new && git diff HEAD~1 HEAD ) > new.patch 2>/dev/null || true
if diff old.patch new.patch; then
  echo "Git side effects: equivalent"
else
  echo "Commit contents differ"
fi

The first time I ran this diff, I hit a difference where the body matched perfectly but the line endings split between CRLF and LF. Output verification would have missed it for certain. Exactly these details break the build in production.
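
When a diff does appear, it helps to know quickly whether it is a line-ending split like this one or a real content change. The sketch below is one way to triage, assuming GNU diff, whose `--strip-trailing-cr` option ignores a trailing carriage return on each line; `classify_diff` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: classify the difference between two sandboxes.
# Assumes GNU diff (--strip-trailing-cr); the function name is hypothetical.

# classify_diff OLD_DIR NEW_DIR -> prints one of:
#   equivalent         no difference at all
#   line-endings-only  files differ only by CRLF vs LF
#   content            anything else (including diff errors)
classify_diff() {
  local old="$1" new="$2"
  if diff -r --exclude=.git "$old" "$new" > /dev/null 2>&1; then
    echo "equivalent"
  elif diff -r --exclude=.git --strip-trailing-cr "$old" "$new" > /dev/null 2>&1; then
    echo "line-endings-only"
  else
    echo "content"
  fi
}
```

The classification is for pinning down the cause faster, not for waiving the result: a `line-endings-only` verdict is still a side-effect difference and should still fail the comparison.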

Confirm network-call equivalence

The third surface is network. If the destinations or call counts of the old and new CLIs differ, that bears directly on rate limits and billing. Putting a single egress proxy in front of both runs and comparing its access logs is the reliable way.

Check that the set of destination hosts and methods matches and that the call counts have not drifted far. In my experience, a count drift of more than 20% is a sign that the internal tool-call design changed. Cut over without equivalence and you will run into unexpected billing or rate limits. It is fine to be strict here.
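
The comparison can be sketched in a few shell functions. This assumes each proxy log has already been reduced to one line per request in the form `METHOD HOST` (for example `POST api.example.com`); that format, the host names, and the function names are all assumptions for illustration, since the real log layout depends on the proxy in use.

```shell
#!/usr/bin/env bash
# Sketch: compare two request logs by endpoint set and by call count.
# Assumed input: one "METHOD HOST" line per request (format is an assumption).

# endpoint_set LOG -> unique "METHOD HOST" pairs, sorted
endpoint_set() {
  LC_ALL=C sort -u "$1"
}

# count_drift_pct OLD_LOG NEW_LOG -> absolute change in request count,
# as a percentage of the old count, rounded up to a whole number
count_drift_pct() {
  local old new delta
  old=$(( $(wc -l < "$1") ))
  new=$(( $(wc -l < "$2") ))
  delta=$(( new > old ? new - old : old - new ))
  if [ "$old" -eq 0 ]; then
    # No baseline calls: any new call counts as full drift
    if [ "$new" -eq 0 ]; then echo 0; else echo 100; fi
    return
  fi
  echo $(( (delta * 100 + old - 1) / old ))
}

# network_equivalent OLD_LOG NEW_LOG [MAX_DRIFT_PCT] -> exit 0 if the endpoint
# sets are identical and the count drift is within the limit (default 20)
network_equivalent() {
  local old="$1" new="$2" max="${3:-20}"
  [ "$(endpoint_set "$old")" = "$(endpoint_set "$new")" ] || return 1
  [ "$(count_drift_pct "$old" "$new")" -le "$max" ]
}
```

Rounding the drift up rather than down keeps the check on the strict side: a drift of 20.1% is reported as 21 and fails a 20% limit.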

A gate that mechanically decides cutover

Finally, fold the three surfaces into a single verdict. The day's cutover is decided by the gate's pass/fail, not by a person's "probably fine."

Is the file-surface diff -r zero?
Do the Git commit diffs match?
Does the network destination set match, with count drift within 20%?

If all three are green, cutover is allowed. If even one is red, hold, pin down the cause of the diff, then re-verify.
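
The three questions above can be folded into one exit code. The following is a minimal sketch of such a gate, assuming each surface check has already been reduced to an exit status (0 for equivalent, non-zero otherwise); `cutover_gate` and the GREEN/RED/GO/NO-GO labels are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: combine the three surface checks into a single go/no-go verdict.
# The function name and output labels are hypothetical.

# cutover_gate FILES_RC GIT_RC NETWORK_RC
# Each argument is the exit status of one surface check (0 = equivalent).
# Prints one line per surface plus a final GO / NO-GO, and returns 0 only
# when all three surfaces are green.
cutover_gate() {
  local verdict=0 entry name rc
  for entry in "files:$1" "git:$2" "network:$3"; do
    name="${entry%%:*}"
    rc="${entry#*:}"
    if [ "$rc" -eq 0 ]; then
      echo "GREEN $name"
    else
      echo "RED   $name"
      verdict=1
    fi
  done
  if [ "$verdict" -eq 0 ]; then echo "GO"; else echo "NO-GO"; fi
  return "$verdict"
}

# Example wiring (file names follow the earlier steps in this article):
#   files_rc=0; diff -r --exclude=.git sandbox_old sandbox_new > /dev/null || files_rc=$?
#   git_rc=0;   diff old.patch new.patch > /dev/null || git_rc=$?
#   net_rc=0    # set non-zero if hosts/methods differ or count drift exceeds 20%
#   cutover_gate "$files_rc" "$git_rc" "$net_rc" || exit 1
```

Because the verdict is an exit code, the gate can sit directly in front of the step that swaps a job into the production schedule, so a red surface blocks the swap without anyone having to read the logs.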

Only what passes this gate gets swapped into the production schedule. I switched the Dolice Labs automation one job at a time with this procedure, and held back two jobs whose output was identical but whose side effects had diverged. Had I judged by output alone, those two would have broken in production.

With a switching deadline bearing down, it is tempting to settle for "it ran, so it's fine." But what protects production is not the feeling that it ran; it is the evidence that the side effects are equivalent. For today's cutover, run at least one job through this gate first. Swap it in only after green: that order is what prevents accidents.
