Antigravity Lab · Agents & Manager · 2026-06-27 · Advanced

Pin Your Agent's Output With Golden Snapshots Before Switching Models

When Antigravity's engine moves to Gemini 3.5 Flash, an agent's output can drift silently. This article walks through a golden-snapshot regression gate that catches the drift, with the actual test code and a migration-day checklist.

Tags: Antigravity, agent-design, regression, testing, model-migration

The morning the engine moved to a new model, the agent received yesterday's prompt and returned output that differed slightly from yesterday's. No errors. Logs clean. Yet one entry under tags had quietly vanished from the generated front matter.

I run several sites on my own, with an agent that drafts content overnight. If the output breaks outright, I notice. What I fear is output that drifts in a direction neither clearly better nor clearly worse. Even when the move is to a model as fast and capable as Gemini 3.5 Flash, some quiet drift of this kind should be expected.

Why output changes silently during a model migration

The drift travels by three paths.

First, format wobble. Given the same "return JSON" instruction, a new model may change key ordering or how it treats empty arrays. Second, the habit of omission. Smarter models decide "this is obvious, I'll skip it" and drop fields they used to state explicitly. Third, tone. The length of a summary or the firmness of an assertion shifts, falling outside the length range the downstream step assumed.
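The three paths can be checked mechanically. The following is a minimal sketch, assuming the agent returns a flat JSON object. The function name drift_report, the summary field, and the 80-200 character range are hypothetical choices for illustration, not part of Antigravity or of the original setup.

```python
import json


def drift_report(baseline_json: str, candidate_json: str,
                 summary_key: str = "summary",
                 length_range: tuple[int, int] = (80, 200)) -> list[str]:
    """Compare two JSON outputs and list findings for the three drift paths."""
    baseline = json.loads(baseline_json)
    candidate = json.loads(candidate_json)
    findings = []

    # Path 1, format wobble: key order disappears once parsed into a dict,
    # so only a change of value type (for example [] becoming null) is reported.
    for key in sorted(baseline.keys() & candidate.keys()):
        old_type = type(baseline[key]).__name__
        new_type = type(candidate[key]).__name__
        if old_type != new_type:
            findings.append(
                f"format: '{key}' changed type from {old_type} to {new_type}")

    # Path 2, omission: keys the old output stated and the new one dropped.
    for key in sorted(baseline.keys() - candidate.keys()):
        findings.append(f"omission: '{key}' is missing")

    # Path 3, tone: summary length outside the range downstream assumes
    # (the range here is an assumed value).
    low, high = length_range
    summary = candidate.get(summary_key)
    if isinstance(summary, str) and not low <= len(summary) <= high:
        findings.append(
            f"tone: '{summary_key}' length {len(summary)} outside {low}-{high}")

    return findings
```

An empty list means none of the three paths fired. Sorting the keys keeps the report order stable between runs, which makes the findings themselves easy to diff.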

Each change is small on its own, and existing tests keep printing "pass." That is exactly why you need to pin the pre-migration output properly, once, before the switch.

The idea of a golden snapshot

A golden snapshot is an agent output that a human has reviewed and certified as correct at a given point in time, saved as a file. From then on you compare the agent's output against this saved answer.
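Pinning can be as small as writing the approved output to a file under version control. This is a rough sketch, not the original implementation: the helper names pin_snapshot and load_snapshot, the .golden.json suffix, and the approved_by field are all assumptions made for illustration.

```python
import json
from pathlib import Path


def pin_snapshot(name: str, output: dict, approved_by: str, root: Path) -> Path:
    """Save a human-approved output as the golden file for `name`."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{name}.golden.json"
    # Record who certified the output alongside the output itself.
    record = {"approved_by": approved_by, "output": output}
    # sort_keys keeps the file stable so version-control diffs stay readable.
    text = json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)
    path.write_text(text, encoding="utf-8")
    return path


def load_snapshot(name: str, root: Path) -> dict:
    """Load the pinned output; raises FileNotFoundError if nothing was pinned."""
    path = root / f"{name}.golden.json"
    return json.loads(path.read_text(encoding="utf-8"))["output"]
```

Keeping the approver in the file means every later change to a golden file shows up in review as a deliberate re-approval rather than a silent overwrite.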

The key is not to aim for an exact string match. Generative output wobbles by nature. What you pin is not the surface string but the structure and invariants that downstream steps depend on. For example: "the front matter has all seven required keys," "the body has at least six H2s," "internal links point only to articles that actually exist."
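Those three example invariants could be checked along the following lines. This is a sketch under stated assumptions: the output is assumed to be Markdown with '---' delimited front matter, the seven key names are hypothetical (the text gives only the count), and the /articles/<slug> link form is invented for illustration. The front matter parser handles only flat key: value lines, so a real YAML parser may be a better fit in practice.

```python
import re

# Hypothetical key set: seven keys are required, but their names are assumed.
REQUIRED_KEYS = {"title", "date", "description", "tags", "category", "slug", "lang"}
MIN_H2 = 6


def split_front_matter(doc: str) -> tuple[dict[str, str], str]:
    """Split '---' delimited front matter from the body (top-level keys only)."""
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", doc, re.DOTALL)
    if match is None:
        return {}, doc
    meta = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        # Skip indented lines and list items; they belong to a parent key.
        if sep and key.strip() and not line.startswith((" ", "-")):
            meta[key.strip()] = value.strip()
    return meta, match.group(2)


def check_invariants(doc: str, existing_slugs: set[str]) -> list[str]:
    """Return the violated invariants; an empty list means the gate passes."""
    meta, body = split_front_matter(doc)
    problems = []

    missing = sorted(REQUIRED_KEYS - set(meta))
    if missing:
        problems.append(f"missing front matter keys: {', '.join(missing)}")

    h2_count = len(re.findall(r"^## \S", body, re.MULTILINE))
    if h2_count < MIN_H2:
        problems.append(f"only {h2_count} H2 headings, need {MIN_H2}")

    # Assumed internal link form: [text](/articles/<slug>)
    for slug in re.findall(r"\]\(/articles/([\w-]+)/?\)", body):
        if slug not in existing_slugs:
            problems.append(f"broken internal link: {slug}")

    return problems
```

Because the check looks at structure rather than wording, a rephrased paragraph passes while a dropped key or a link to a nonexistent article fails.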

What you'll learn

  • The three paths by which output drifts during a model migration, and which parts to pin with a snapshot
  • How to write a golden test that judges by structure and invariants rather than exact match, so it survives in production
  • The concrete pin, diff, approve steps to run on the day you switch to Gemini 3.5 Flash