Antigravity Lab · Antigravity Basics · 2026-06-16 · Advanced

Does the New CLI Do the Same Job? An Output-Parity Gate Before Switching to Antigravity CLI

With Gemini CLI shutting down on June 18, here is how I froze the old CLI's artifacts as a golden baseline and built a parity harness to catch regressions before cutting over to Antigravity CLI, with the normalization step and the go/no-go gate implemented in code.

On June 18, Gemini CLI and the Gemini Code Assist IDE extension stop serving requests, and everything moves to the Go-based Antigravity CLI. If your automation was built around Gemini CLI, the real fear is not whether the command runs. You can confirm that in a couple of minutes. The fear is that it runs and quietly produces a slightly different result.

I use CLI-driven agents to generate drafts and prepare releases for several sites across my indie projects. Confirming that agy --version succeeds tells me nothing about that work. What I actually want to know is whether the draft structure has shifted, whether the release-note format still holds, and whether asset naming still follows the same rules.

So a few days before the deadline, I froze the old CLI's output as a "golden" baseline and built a gate that mechanically compares it against the new CLI's output. This article records the design of that parity harness, how I normalized nondeterministic output, and how I decided whether to stop or proceed, along with the actual code.

"It runs" and "it produces the same result" are different claims

Migration checklists usually end at a startup check: is the binary installed, does auth pass, do the subcommands exist. Those are preconditions, not regression detection.

An agent CLI's output can shift in three places at once: the model, how the prompt is interpreted, and the execution plan. Antigravity CLI shares the same agent harness as the Antigravity 2.0 desktop app and runs on Gemini 3.5 Flash, which is reportedly about 4x faster than competing frontier models. But speed says nothing about whether the new CLI makes the same decisions as the old one. If anything, speed adds a new risk: a large pile of artifacts accumulates before a human reads any of them.

That is why the unit of verification should be the final artifact, not the command's exit status: the draft itself, the build output, the files that get committed. Those are what I, as both reader and operator, actually end up holding.
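To make the final artifact the unit of comparison, the old CLI's output first has to be frozen somewhere it will outlive the shutdown. Here is a minimal Python sketch, assuming the agent writes its artifacts as UTF-8 text files under a single output directory. The function names (snapshot, freeze_golden, load_golden) and the rule that skips *.log files are illustrative assumptions, not the harness described in this article.

```python
import json
from pathlib import Path

# Assumption: progress/stdout logs end in .log and are kept only for debugging,
# so they are excluded from the baseline.
IGNORED_SUFFIXES = {".log"}


def snapshot(root: str) -> dict[str, str]:
    """Map each artifact's path (relative to root, POSIX style) to its text content."""
    base = Path(root)
    artifacts: dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix not in IGNORED_SUFFIXES:
            rel = path.relative_to(base).as_posix()
            artifacts[rel] = path.read_text(encoding="utf-8", errors="replace")
    return artifacts


def freeze_golden(root: str, golden_path: str) -> None:
    """Write the old CLI's artifacts to a JSON file that serves as the golden baseline."""
    data = snapshot(root)
    Path(golden_path).write_text(
        json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True),
        encoding="utf-8",
    )


def load_golden(golden_path: str) -> dict[str, str]:
    """Read a previously frozen baseline back into a path -> content mapping."""
    return json.loads(Path(golden_path).read_text(encoding="utf-8"))
```

The idea is to run freeze_golden once against the old CLI's output before the cutover; after June 18 every new-CLI run is compared against the loaded baseline, because there will be no old CLI left to re-run.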

Compare only the final artifacts

My first mistake was comparing the full standard output. An agent's stdout mixes progress logs, reasoning summaries, and timings, so the diff ran to hundreds of lines on every run and regressions drowned in it.

Limiting the comparison to three axes made it stable.

The three axes to compare

  1. Generated file contents: the drafts, configs, and code the agent writes. This is the heart of it.
  2. Exit code and the set of side effects: whether expected files were created or not. This catches silent failures where empty output sails through with exit code 0.
  3. Structural metadata: for a draft, the heading count, code-block count, and the set of frontmatter keys. The skeleton, not every word.
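The third axis can be approximated with a few regular expressions. This is a rough sketch that assumes Markdown drafts with YAML-style frontmatter between --- lines; skeleton and its return shape are hypothetical, and it only recognizes # headings and backtick or tilde code fences, so it is not a full Markdown parser.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")         # ATX headings only ("# Title", "## Section")
FENCE = re.compile(r"^(`{3,}|~{3,})")       # opening or closing code fence
FRONT_KEY = re.compile(r"^([A-Za-z0-9_-]+)\s*:")  # top-level frontmatter key (no indent)


def skeleton(markdown: str) -> dict:
    """Extract a draft's structure: heading count, code-block count, frontmatter keys."""
    lines = markdown.splitlines()
    keys: set[str] = set()
    start = 0
    # Assumption: frontmatter is delimited by '---' lines at the very top of the file.
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                start = i + 1
                for line in lines[1:i]:
                    m = FRONT_KEY.match(line)
                    if m:
                        keys.add(m.group(1))
                break
    headings = 0
    code_blocks = 0
    in_fence = False
    for line in lines[start:]:
        if FENCE.match(line):
            if not in_fence:
                code_blocks += 1  # count each block once, at its opening fence
            in_fence = not in_fence
        elif not in_fence and HEADING.match(line):
            headings += 1  # '#' lines inside code blocks are not headings
    return {"headings": headings, "code_blocks": code_blocks, "frontmatter_keys": sorted(keys)}
```

Comparing skeleton(old) with skeleton(new) catches a draft that lost a section or a code block even when every sentence that remains reads fine.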

I keep the stdout logs out of the comparison and save them for debugging only. I have seen an empty page published to production with exit code 0, so the second axis, verifying side effects independently of the exit code, is non-negotiable.
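A minimal sketch of that independent side-effect check, assuming the expected output paths for each job are known ahead of time (for example, taken from the golden baseline). check_side_effects is a hypothetical helper, and treating a whitespace-only file as empty is an assumption of this sketch.

```python
from pathlib import Path


def check_side_effects(exit_code: int, root: str, expected: list[str]) -> list[str]:
    """Return the problems found; an empty list means the side effects look right.

    Exit code 0 is not trusted on its own: every expected file must exist
    and contain something other than whitespace.
    """
    problems: list[str] = []
    if exit_code != 0:
        problems.append(f"non-zero exit code: {exit_code}")
    base = Path(root)
    for rel in expected:
        path = base / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif not path.read_text(encoding="utf-8", errors="replace").strip():
            problems.append(f"empty: {rel}")  # the "empty page with exit code 0" case
    return problems
```

The check deliberately reports every problem rather than stopping at the first, so a single run shows both a missing file and an empty one.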

What you'll learn

  • How to freeze the old Gemini CLI's artifacts as a golden baseline and build a harness that flags a regression the instant it appears after cutover
  • A three-stage normalization that strips timestamps, run IDs, and model phrasing so that only real regressions survive the diff
  • A go/no-go gate that separates mechanical diffs from semantic ones to decide whether to stop or proceed with the switch
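As a minimal sketch of the normalization and gate described above: three stages mask timestamps, mask run IDs, and drop model phrasing, in that order, and the gate blocks the switch unless every file is either byte-identical or identical after normalization. The regex patterns (ISO-8601-style timestamps, UUID or run-<hex> IDs, and a short list of conversational openers) are assumptions about what the noise looks like, not this article's actual rules; tune them to your own output.

```python
import re

# Stage 1 (assumed format): ISO-8601-style dates with an optional time and zone.
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?)?"
)
# Stage 2 (assumed format): run IDs as UUIDs or "run-<hex>" tokens.
RUN_ID = re.compile(
    r"\b(?:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}|run-[0-9a-f]{6,})\b",
    re.IGNORECASE,
)
# Stage 3 (hypothetical list): conversational openers that vary between models.
FILLER = re.compile(r"^(?:sure|certainly|here is|here's)\b.*$", re.IGNORECASE | re.MULTILINE)


def normalize(text: str) -> str:
    """Mask timestamps and run IDs, drop filler lines, ignore blank lines and trailing spaces."""
    text = TIMESTAMP.sub("<TS>", text)
    text = RUN_ID.sub("<RUN_ID>", text)
    text = FILLER.sub("", text)
    lines = (line.rstrip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def classify(golden: str, candidate: str) -> str:
    if golden == candidate:
        return "identical"
    if normalize(golden) == normalize(candidate):
        return "mechanical"  # every difference disappeared after normalization
    return "semantic"  # a real difference survived normalization


def gate(golden: dict[str, str], candidate: dict[str, str]) -> tuple[bool, dict[str, str]]:
    """Return (go, per-file report); go is True only if nothing semantic changed."""
    report: dict[str, str] = {}
    for path in sorted(golden.keys() | candidate.keys()):
        if path not in candidate:
            report[path] = "missing"
        elif path not in golden:
            report[path] = "unexpected"
        else:
            report[path] = classify(golden[path], candidate[path])
    go = all(status in ("identical", "mechanical") for status in report.values())
    return go, report
```

The point of the split is that a mechanical diff (only timestamps or IDs moved) never needs human review, while any semantic diff, missing file, or unexpected file stops the cutover until someone reads it.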