Antigravity Lab · Agents & Manager · 2026-07-05 · Advanced

Protecting Your Agent Stack's Known-Good State with a Single Lockfile — Change-Budget Design for an Era of Simultaneously Moving Parts

When the IDE build, CLI, model, and dependencies all move at once, you can no longer tell which one caused a regression. Here is a change-budget design that pins your known-good state to one lockfile, with working code and operational logs.



On July 1st, Antigravity shipped v2.2.1, v2.1.4, and v2.0.11 on the same day, which lets you choose between a stable line and a feature line. Around the same time, my local CLI moved from the Gemini CLI to the Antigravity CLI, and the default model was quietly updated.

As an indie developer, I run daily article generation for four technical blogs, and I hand that work to agents. One morning, the tables in a generated article came out broken. I went to investigate, and my hand stopped over the keyboard.

Between the last known-good run and that morning, too many parts had moved. The IDE build had bumped, the CLI had swapped to a different implementation, the model had refreshed, and two dependency packages had gone up. Which one broke it? There was nothing to bisect from.

This article is the change-budget design I rebuilt out of that morning's regret. It is not about flashy automation. It is about the quiet foundation that lets you find the cause of a regression quickly, in an environment where the number of moving parts keeps growing.

When moving parts multiply, you lose track of what broke it

An agent development stack becomes, almost without notice, a collection of independently moving axes. At minimum, these four update on their own separate schedules.

| Moving axis | Who updates it | Felt cadence |
| --- | --- | --- |
| Antigravity IDE build | Official releases (stable and feature lines in parallel) | Weekly |
| Antigravity CLI | Updates tied to tool consolidation and migration | Irregular, bursty during migration |
| Default model | Platform-side swaps | Quiet, without notice |
| Dependency packages | Your own npm update or auto-updates | Anytime |

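The trouble is that these matter in combination. The number of possible states grows exponentially with the number of axes: even if each axis has only an old and a new version, four axes already give 2^4 = 16 combinations. And if each axis moves just once in a week, your current environment differs from the last known-good run on all four axes.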
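To make the combinatorics concrete, here is a minimal sketch that enumerates every combination when each axis has just two candidate versions. The axis names and the old/new labels are illustrative placeholders, not real version strings.

```typescript
// Enumerate every combination of candidate versions across the given axes.
// Each state is a list such as ["ide=old", "cli=new", ...].
function enumerateStates(axes: string[], versions: string[]): string[][] {
  let states: string[][] = [[]];
  for (const axis of axes) {
    const next: string[][] = [];
    for (const state of states) {
      for (const version of versions) {
        next.push(state.concat([axis + "=" + version]));
      }
    }
    states = next;
  }
  return states;
}

// Placeholder axis names standing in for the four rows of the table above.
const allStates = enumerateStates(["ide", "cli", "model", "deps"], ["old", "new"]);
console.log(allStates.length); // 16
```

With a third candidate version per axis the count rises to 3^4 = 81, which is why reconstructing "what changed" by hand stops scaling so quickly.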

When a regression hits, we try to reconstruct what changed. But when four axes of change are stacked on top of each other at once, the very starting point for a bisection is gone. There are too many suspects, and you fall back on intuition.

In unattended runs, this gets heavier still. If a pipeline you ran overnight produces broken output, and you never captured a snapshot of the environment at that moment, then by morning all you have is "the environment as it is now." The state at the moment it broke is no longer reproducible.
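One way to avoid losing the state at the moment of failure is to write a snapshot line at the start of every unattended run. The sketch below is an assumption about how that could look, not the author's implementation: the `StackSnapshot` shape, the `toSnapshotLine` helper, and the run ID are all hypothetical, and how you collect each version string is left to the caller.

```typescript
// Hypothetical snapshot shape; field names are illustrative, not from any Antigravity API.
interface StackSnapshot {
  runId: string;
  capturedAt: string; // ISO 8601 timestamp in UTC
  axes: Record<string, string>;
  deps: Record<string, string>;
}

// Return a copy with keys in sorted order so successive snapshots diff cleanly.
function sortKeys(input: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(input).sort()) {
    out[key] = input[key];
  }
  return out;
}

// Serialize one run's environment as a single JSON line.
// The caller is expected to append it to a log before the run starts.
function toSnapshotLine(
  runId: string,
  axes: Record<string, string>,
  deps: Record<string, string>,
  now: Date
): string {
  const snapshot: StackSnapshot = {
    runId: runId,
    capturedAt: now.toISOString(),
    axes: sortKeys(axes),
    deps: sortKeys(deps),
  };
  return JSON.stringify(snapshot);
}

// Example values copied from the article's lockfile; the run ID is made up.
const snapshotLine = toSnapshotLine(
  "nightly-example",
  { default_model: "gemini-3.5-flash-2026-06", antigravity_ide: "2.0.11" },
  { yaml: "2.6.1" },
  new Date(Date.UTC(2026, 6, 1, 0, 0, 0))
);
console.log(snapshotLine);
```

Because each run leaves one line behind, a broken overnight run can be compared against the last good line instead of against "the environment as it is now."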

Pin the known-good state to a single lockfile

The first thing to do is write down the environment of the last run that worked, in one file. Take the same idea that an application's package-lock.json uses to pin dependencies, and extend it across the whole stack.

I keep one file named stack.known-good.yml at the root of the repository.

# stack.known-good.yml
# Records the last state that ran to completion correctly with "this combination"
known_good:
  captured_at: "2026-07-01T09:00+09:00"
  note: "Article generation across 4 sites finished with JA=EN parity and no broken tables"
  axes:
    antigravity_ide: "2.0.11"
    antigravity_cli: "0.4.2"
    default_model: "gemini-3.5-flash-2026-06"
    node: "22.14.0"
  deps:
    "@antigravity/sdk": "2.3.1"
    "yaml": "2.6.1"
# Change budget: the max number of axes you may move at once from the last known-good state
change_budget: 1

Three things matter here.

First, record each axis's version in a form a human can read. A hash alone will not help you remember "which model was that." Add a date and a short note.

Second, write the condition that declares "good" in the note. What counts as "good" differs per operation. For me it is "the article counts match across Japanese and English, and it finished without broken tables." Putting that criterion into words keeps your judgment steady the next time you update.

Third, keep the change budget in the same file. When the budget lives alongside the environment, it is visible at a glance during review.
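The lockfile and its budget can be checked mechanically before an unattended run. The following is a minimal sketch under stated assumptions, not the author's preflight: it takes the lockfile as an already-parsed object (YAML parsing is omitted), the `preflight` and `diffKeys` names are hypothetical, and it treats all of `deps` as a single axis, matching the four-axis table earlier in the article. The "current" versions in the usage are made up for illustration.

```typescript
// Assumed shape of an already-parsed stack.known-good.yml (parsing is out of scope here).
interface StackState {
  axes: Record<string, string>;
  deps: Record<string, string>;
}

interface Lockfile {
  known_good: StackState;
  change_budget: number;
}

// Keys whose values differ between two version maps, including keys present on one side only.
function diffKeys(a: Record<string, string>, b: Record<string, string>): string[] {
  const merged = { ...a, ...b };
  return Object.keys(merged).filter((key) => a[key] !== b[key]).sort();
}

// Count drifted axes and compare the count against the change budget.
// Assumption: any change inside deps counts as one drifted axis named "deps".
function preflight(lock: Lockfile, current: StackState): { drifted: string[]; withinBudget: boolean } {
  const drifted = diffKeys(lock.known_good.axes, current.axes);
  if (diffKeys(lock.known_good.deps, current.deps).length > 0) {
    drifted.push("deps");
  }
  return { drifted: drifted, withinBudget: drifted.length <= lock.change_budget };
}

// Known-good values from the article's lockfile.
const lockfile: Lockfile = {
  known_good: {
    axes: { antigravity_ide: "2.0.11", antigravity_cli: "0.4.2", default_model: "gemini-3.5-flash-2026-06", node: "22.14.0" },
    deps: { "@antigravity/sdk": "2.3.1", yaml: "2.6.1" },
  },
  change_budget: 1,
};

// Hypothetical current environment: the IDE and one dependency have moved.
const currentState: StackState = {
  axes: { antigravity_ide: "2.2.1", antigravity_cli: "0.4.2", default_model: "gemini-3.5-flash-2026-06", node: "22.14.0" },
  deps: { "@antigravity/sdk": "2.3.1", yaml: "2.6.2" },
};

const preflightResult = preflight(lockfile, currentState);
console.log(preflightResult.drifted, preflightResult.withinBudget);
```

In this example two axes have drifted against a budget of 1, so the run would be held back until one of them is reverted or the known-good state is re-captured.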

WHAT YOU'LL LEARN
A preflight that pins the IDE build, CLI, model, and dependency versions to one lockfile and counts drifted axes before an unattended run
The idea of a change budget that limits you to moving one axis at a time, plus a bisection procedure to isolate the culprit when a regression appears
Real logs from running daily article generation across four sites, showing how time-to-diagnosis changed
