Antigravity Lab · Integrations · 2026-06-15 · Advanced

Regression-Testing Antigravity Agent Output in CI

Agent output drifts between identical runs and turns CI red for no real reason. Here is how I stabilized snapshot regression testing for Antigravity agents using a normalization layer and pytest golden files, drawn from running it in my own indie developer CI.

One morning, an agent I had moved onto a schedule produced something subtly off. The prompt was identical to the day before, yet the output had shifted. Running it twice locally gave me two different diffs. My CI snapshot test went red, as expected, but that red told me nothing: was it broken, or had it just drifted?

When you are an indie developer handing several sites to agents, this "drifting red" is the worst kind of failure: once red no longer reliably means "broken," it quietly hides real regressions. Here is the setup I built to regression-test Antigravity agent output reliably in CI, step by step.

Two identical runs, two different diffs

My first attempt was the naive one: save the output to a file and compare with git diff. That collapsed within half a day.

Agent output always contains fragments that are semantically equivalent but textually different every time: generation timestamps, run IDs, temp file paths, list ordering, JSON key order. Compare them raw, and a meaningful regression sits in the same diff pile as meaningless drift.

So the problem was never "comparing." It was "removing drift before comparing."
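As a minimal sketch of what "removing drift before comparing" can look like, the snippet below scrubs timestamps, UUID-style run IDs, and temp paths, sorts lists whose order carries no meaning, and serializes with sorted keys. The regex patterns, the placeholder tokens, and the names scrub_text, normalize, and canonical_json are my assumptions for illustration, not the author's actual layer; adjust them to whatever your agent really emits.

```python
import json
import re

# Hypothetical patterns: tune these to the drift your own agent produces.
_ISO_TS = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"
)
_UUID = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
_TMP_PATH = re.compile(r"""(?:/tmp|/var/folders)/[^\s"']+""")


def scrub_text(text):
    """Replace drifting substrings with stable placeholders."""
    text = _ISO_TS.sub("<TIMESTAMP>", text)
    text = _UUID.sub("<UUID>", text)
    text = _TMP_PATH.sub("<TMP_PATH>", text)
    return text


def normalize(value, unordered_keys=frozenset(), _key=None):
    """Recursively scrub strings and sort lists stored under unordered_keys."""
    if isinstance(value, str):
        return scrub_text(value)
    if isinstance(value, dict):
        return {k: normalize(v, unordered_keys, k) for k, v in value.items()}
    if isinstance(value, list):
        items = [normalize(v, unordered_keys) for v in value]
        if _key in unordered_keys:
            # Sort by a canonical serialization so mixed item types still order stably.
            items.sort(key=lambda item: json.dumps(item, sort_keys=True))
        return items
    return value


def canonical_json(value, unordered_keys=frozenset()):
    """Serialize with sorted keys so JSON key order never shows up in a diff."""
    return json.dumps(
        normalize(value, unordered_keys), sort_keys=True, indent=2, ensure_ascii=False
    )
```

Only lists named in unordered_keys are sorted, on purpose: sorting every list would hide regressions in output where order does matter, such as a sequence of steps.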

Why agent output fights snapshot testing

Snapshot testing itself is well proven for UI components and API responses. You record an expected value once, then assert against it on later runs.
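The record-once, assert-later loop can be sketched with a small golden-file helper for pytest. This is an illustration under assumptions rather than the author's code: the tests/golden directory, the .golden.txt suffix, the UPDATE_GOLDEN environment variable, and the name assert_matches_golden are all hypothetical.

```python
import os
from pathlib import Path

GOLDEN_DIR = Path("tests/golden")  # hypothetical location for recorded outputs


def assert_matches_golden(name, actual, golden_dir=GOLDEN_DIR):
    """Compare actual text with a recorded golden file.

    With UPDATE_GOLDEN=1 set, the golden file is (re)written instead of checked.
    """
    path = Path(golden_dir) / f"{name}.golden.txt"
    if os.environ.get("UPDATE_GOLDEN") == "1":
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(actual, encoding="utf-8")
        return
    # A missing golden file fails loudly rather than being recorded silently in CI.
    assert path.exists(), f"No golden file at {path}; rerun with UPDATE_GOLDEN=1"
    expected = path.read_text(encoding="utf-8")
    assert actual == expected, f"Output differs from {path}"
```

In this sketch a test calls assert_matches_golden("some_name", output), and you record or refresh snapshots deliberately by running pytest with UPDATE_GOLDEN=1 set locally, then review the changed golden files in the commit diff.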

It looks like a bad fit for agents only because three kinds of non-determinism live together in the output. The first is environmental drift (time, IDs, paths). The second is ordering drift (a set returned as an array). The third is model paraphrasing (the same intent in different words).

The first two are mechanically removable. Only the third needs a different matching strategy. Without this separation, you jump to the wrong conclusion that "snapshots are impossible." In reality you flatten what is flattenable, then apply meaning-based checks only to what remains.
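One way to picture that split: compare the mechanically stable fields exactly, and give only the free-text fields a looser check. The sketch below uses Jaccard token overlap as a deliberately crude stand-in for a "meaning-based" check; the article does not say which matcher the author uses, and the field names, the 0.6 threshold, and the function names are assumptions for illustration.

```python
import re


def token_set(text):
    """Lowercased word tokens, ignoring punctuation and word order."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token overlap between two strings, from 0.0 (disjoint) to 1.0 (same tokens)."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def compare_output(actual, golden, exact_keys, fuzzy_keys, min_overlap=0.6):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    for key in exact_keys:
        # Normalized, structured fields must match exactly.
        if actual.get(key) != golden.get(key):
            failures.append(
                f"{key}: expected {golden.get(key)!r}, got {actual.get(key)!r}"
            )
    for key in fuzzy_keys:
        # Paraphrased text only needs to stay close to the recorded wording.
        score = jaccard(str(actual.get(key, "")), str(golden.get(key, "")))
        if score < min_overlap:
            failures.append(f"{key}: token overlap {score:.2f} below {min_overlap}")
    return failures
```

Token overlap tolerates reordering and small rewording but knows nothing about meaning, so treat it as a placeholder for whatever semantic check fits your outputs.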

What you'll learn

  • A concrete pytest workflow for regression-testing non-deterministic agent output
  • A normalization layer that flattens drifting values like timestamps and UUIDs
  • Operational tactics to cut CI flake rate with 3 retries and a 5% diff threshold
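The retry and threshold tactic in the last item could look roughly like the sketch below. The visible text gives only the two numbers, so everything else is an assumption on my part: I read "3 retries" as up to three reruns after the first attempt, "5% diff" as a character-level difference ratio from Python's difflib, and run_agent as a hypothetical callable that returns already-normalized text.

```python
import difflib


def diff_ratio(expected, actual):
    """Fraction of the text that differs: 0.0 for identical strings."""
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    return 1.0 - matcher.ratio()


def run_with_retries(run_agent, expected, retries=3, max_diff=0.05):
    """Call run_agent until one output is within max_diff of expected.

    Returns (passed, ratios), where ratios holds the diff ratio of each attempt.
    """
    ratios = []
    for _ in range(retries + 1):  # first attempt plus the allowed retries
        ratio = diff_ratio(expected, run_agent())
        ratios.append(ratio)
        if ratio <= max_diff:
            return True, ratios
    return False, ratios
```

Keeping the per-attempt ratios makes a pass on the third try visible in the CI log, which matters because retries can otherwise mask a slow drift toward the threshold.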