ANTIGRAVITY LABJP
Articles/Tips & Best Practices
Tips & Best Practices/2026-06-28Advanced

Confirm Your Unattended Runs Survived a Point Release in 30 Seconds

As Antigravity ships point releases at a fast clip, how do you stop scheduled runs from quietly breaking after an update? Here is a lightweight smoke test that detects regressions against a golden output and rolls back automatically when it diverges.

Antigravity285CLI3unattended runsregression testing2indie developer10

"The nightly batch that ran fine yesterday came up empty this morning." During the June week when Antigravity shipped v2.2.1, v2.1.4, and v2.0.11 back to back, I ran straight into exactly this. Interactively, nothing looked different — but my scheduled run, which calls agy headless, was failing downstream because a subtle change in the output format broke the parser.

Reading the official changelog every morning isn't realistic, and even when you do, it won't tell you whether a change affects your script. What you need is a mechanism that decides, mechanically and quickly, whether the output of your own use case changed across an update. Below, we build a lightweight smoke test that detects regressions against a golden output and automatically reverts to the previous version when something breaks.

Why the breakage is so quiet

Point releases are dangerous precisely because you don't brace for them, unlike a major upgrade. They're mostly bug fixes and performance work, so it never occurs to you that your automation might stop. Yet what matters in unattended operation isn't the presence of features but the fine details of the output.

Here's what actually bit me: a progress line appeared at the top of the headless output, JSON formatting shifted so the trailing newline came and went, and the exit code stayed 0 while the body came back empty. None of these are noticeable in the interactive UI. But for a downstream stage consuming the output with jq or a regex, they're fatal. If you judge success by exit code alone, you'll pass an empty body forward as a "success."

So the boundary to defend is not "did the command succeed" but "did output come back in the usual shape." Frame it that way and the fix becomes far more concrete.

Capture a golden output once

The first thing to do is record, just once, the output of a representative input on the "current version" that runs stably. Demanding an exact match breaks on dates and random values every time, so normalize the volatile parts before comparing.

#!/usr/bin/env bash
# capture_golden.sh — save a baseline from the version that works today
set -euo pipefail
 
GOLDEN_DIR="${HOME}/.agy_smoke"
mkdir -p "$GOLDEN_DIR"
 
# reproduce the same headless call as production (swap to match your batch)
run_case() {
  agy run --headless --no-tty \
      --prompt-file "$1" \
      --output json 2>/dev/null
}
 
# mask volatile values (dates, UUIDs, elapsed time) so comparison is stable
normalize() {
  sed -E \
    -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:.]+Z?/<TS>/g' \
    -e 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/<UUID>/g' \
    -e 's/"elapsed_ms":[0-9]+/"elapsed_ms":<N>/g'
}
 
agy --version > "$GOLDEN_DIR/version.txt"
for f in cases/*.prompt; do
  name="$(basename "$f" .prompt)"
  run_case "$f" | normalize > "$GOLDEN_DIR/${name}.golden"
  echo "captured: $name"
done
echo "golden version: $(cat "$GOLDEN_DIR/version.txt")"

The important thing is to make the smoke inputs cases/*.prompt the same kind as your production batch. I keep only three representative cases — article generation, summarization, and JSON extraction. I don't aim for coverage; the policy is to thinly defend only the paths that hurt most if they break. Too many cases make each check heavy, and you end up never running it.

Smoke before the update, roll back if it fails

With a baseline in place, run the same cases again right before an update and compare against the golden. If a structural difference appears, treat it as a likely sign that the update breaks your use case, and stop.

#!/usr/bin/env bash
# smoke_check.sh — diff current output against the golden. exit 1 on drift
set -euo pipefail
GOLDEN_DIR="${HOME}/.agy_smoke"
 
run_case() { agy run --headless --no-tty --prompt-file "$1" --output json 2>/dev/null; }
normalize() {
  sed -E \
    -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:.]+Z?/<TS>/g' \
    -e 's/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/<UUID>/g' \
    -e 's/"elapsed_ms":[0-9]+/"elapsed_ms":<N>/g'
}
 
fail=0
for g in "$GOLDEN_DIR"/*.golden; do
  name="$(basename "$g" .golden)"
  current="$(run_case "cases/${name}.prompt" | normalize)"
 
  # 1) body must be non-empty (reject empty returns even on exit code 0)
  if [ -z "$(echo "$current" | jq -r '.text // empty' 2>/dev/null)" ]; then
    echo "✗ ${name}: empty body"
    fail=1; continue
  fi
  # 2) structural diff (detect key-set changes; ignore value contents)
  ref_keys="$(cat "$g" | jq -S 'keys' 2>/dev/null || echo '[]')"
  cur_keys="$(echo "$current" | jq -S 'keys' 2>/dev/null || echo '[]')"
  if [ "$ref_keys" != "$cur_keys" ]; then
    echo "✗ ${name}: schema drift"
    diff <(echo "$ref_keys") <(echo "$cur_keys") || true
    fail=1; continue
  fi
  echo "✓ ${name}"
done
exit "$fail"

There's a reason I narrow the comparison to "key-set changes." The body text naturally drifts with each update, so checking for an exact match turns the smoke red every time and erodes trust in it. What the downstream parser actually depends on is the output's structure — which keys exist — so I'm strict only there. The empty-body check sits separately to catch the quietest failure mode of all: exit code 0 with empty content.

Make update and verification one flow

Finally, fold version pinning, the update, the smoke, and automatic rollback into one script. You're turning the rule of thumb "phased migration is safer" into a fixed procedure.

#!/usr/bin/env bash
# guarded_upgrade.sh — commit the update only if smoke passes, else revert
set -euo pipefail
GOLDEN_DIR="${HOME}/.agy_smoke"
 
prev="$(agy --version | awk '{print $NF}')"
echo "current: $prev"
 
# 1) pre-update smoke (prove the current environment is healthy first)
if ! ./smoke_check.sh; then
  echo "Already red before updating. Don't update until you re-baseline."; exit 1
fi
 
# 2) update (pin the version explicitly; avoid auto-tracking latest)
target="${1:?usage: guarded_upgrade.sh <version>}"
agy self update --version "$target"
 
# 3) post-update smoke. roll back immediately if it fails
if ./smoke_check.sh; then
  echo "✅ $target passed smoke. Committing."
  agy --version > "$GOLDEN_DIR/version.txt"
else
  echo "🚨 $target failed smoke. Reverting to $prev."
  agy self update --version "$prev"
  ./smoke_check.sh && echo "rollback complete (healthy on $prev)"
  exit 1
fi

The option names and subcommands for agy self update vary by environment and version, so confirm the actual syntax with agy self update --help before wiring it in. The core idea is "don't auto-track latest; explicitly pin only the version that passed verification" — adapt the command details and it works as is.

Since putting this in place, my attitude toward point releases changed. Instead of postponing updates out of fear or jumping to the latest unguarded, I take it in if a 30-second smoke passes and revert if it fails. Running the several Dolice Labs sites alone, I can't spend time deliberating over each update, so this "proceed if it passes, revert if it fails" automation paid off.

Shift to defending small

Against the uncertainty of point releases, reading the changelog and bracing yourself doesn't last. Instead, baseline a single point — the output of your own use case — and compare it mechanically across the update. Just shifting what you defend from "did the command succeed" to "the usual output shape" catches most quiet regressions.

As a next step, pick the one path that hurts most if it breaks and capture a baseline with capture_golden.sh. Even a single case gives you a foothold to decide, in seconds, "should I take this in?" when the next point release lands. I hope it helps anyone caught between unattended operation and a steady stream of updates.

Share

Thank You for Reading

Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.

  • Copy-paste ready implementation code
  • New advanced guides published daily
  • $5/mo or $10 for lifetime access
View Membership →

If you found this article helpful, a small tip ($1.50) would mean a lot to us. Your support helps keep this site ad-free and covers server and hosting costs.

Related Articles

Tips2026-06-28
Hearing the Audio an Agent Made, Right Inside the Conversation
A recent Antigravity point release added inline audio rendering in the conversation view. Here is how playing agent-made audio in place changes the way I audition sound assets for my apps.
Tips2026-06-27
Before Desktop and CLI Drift Apart: Put Agent Steps in One Versioned File
As Antigravity 2.0 multiplies entry points across desktop, CLI, and SDK, the instructions for the same task slowly diverge per surface. As an indie developer running several sites on autopilot, I lay out a design that consolidates the steps into a single versioned file each surface merely reads.
Tips2026-06-22
Strip Secrets Out of the Agent Logs You Keep: Designing a Redaction Layer
Once you start keeping logs from unattended agents, a token or API key eventually lands in them in plaintext. Rotating the key doesn't unmake the leaked log. This designs a redaction layer that reliably drops secrets right before the write, going beyond regex to register known secrets and mask them for certain, with working Python and field notes.
📚RECOMMENDED BOOKS
Build a Large Language Model (From Scratch)
Sebastian Raschka
LLM Dev
Prompt Engineering for LLMs
Berryman & Ziegler
Prompting
AI Engineering
Chip Huyen
AI Eng
* Contains affiliate links
See all →