TAG

Reliability

12 articles

Antigravity (6), agents (5), antigravity (4), Agents (3), production (3), idempotency (2), operations (2), Rate Limit (1), Backoff (1), unattended execution (1), file writes (1), Scheduling (1)

⚙ AI Tools / 2026-06-20 / Advanced

A Schedule That Survives 429s: Backoff and Jitter for Agent Automation

Run agents in parallel and rate-limit 429s can cascade until everything dies. Here is how to design exponential backoff and jitter so the retries themselves don't create new congestion, from an indie developer's automation setup.

◈ Agents / 2026-06-20 / Advanced

When a Timed-Out Unattended Agent Leaves a Half-Written File Behind

When a scheduled agent gets killed on timeout, it can leave a half-written file that silently poisons the next stage. Here is the atomic write, stale-temp cleanup, and post-write content assertion I use to keep unattended pipelines from breaking.

◈ Agents / 2026-06-20 / Advanced

Don't Lose Failed Agent Jobs: Designing a Dead-Letter and Requeue Path

Scheduled agents fail silently overnight and the work simply vanishes. Here is how to catch those failures with a dead-letter store and a staged requeue, drawn from running four sites on autopilot as an indie developer.

◈ Agents / 2026-06-17 / Advanced

Making Managed Agent Batches Safe to Re-run: Idempotency and Checkpoints

Run overnight batches on the Antigravity 2.0 Managed Agents API and sooner or later you have to recover from a partial failure. Starting from a duplicate-post incident, I share the implementation of idempotency keys, a checkpoint store, and resume logic, with real numbers from solo operations.

◈ Agents / 2026-06-16 / Advanced

When Your Antigravity Agent Eval Gate Keeps Flickering — Build Notes on Pass/Fail That Survives Non-Determinism

Same code, yet the eval passes in the morning and fails by noon. The first thing that breaks when you put agent evaluation into CI on Antigravity is the stability of the verdict. Here's how I separate noise from real regression and lock down pass/fail in code.

◈ Agents / 2026-06-13 / Advanced

When a Scheduled Agent Runs Twice — Designing for Idempotency Against Overlap and Retry

A scheduled agent can do the same work twice when the next run triggers before the last one finishes. Here is a design with an overlap lock and an idempotency guard that survives mid-run failures, drawn from a double-publish incident I ran into in production.

◈ Agents / 2026-06-03 / Advanced

Delegate the Undoable, Guard the Irreversible — Tiering Agent Autonomy by Reversibility

When you hand production work to an Antigravity agent, the thing that bites first isn't intelligence — it's whether the operation can be undone. Here is a design that sorts every operation into three reversibility tiers and routes each to auto-execution, checkpointed execution, or a human gate, with TypeScript implementations and real numbers from running six apps in parallel.

◈ Agents / 2026-06-02 / Advanced

Rehearsing an Agent's Actions Before They Touch Production — Designing a Zero-Side-Effect Dry-Run Layer

Some accidents slip past shadow mode and canaries, and they happen the very first time an agent touches an external API. This is the design and TypeScript implementation of a zero-side-effect dry-run layer you can bolt onto Antigravity's parallel agents, with the real numbers from running six sites autonomously.

◈ Agents / 2026-04-29 / Advanced

Teaching Antigravity Agents to Learn from Failure — A Solo Developer's Loop for Reusing Failure History

SRE for Antigravity Agents — Taming Probabilistic Systems with SLOs and Error Budgets

Solving the Reliability Problem in Vibe Coding — Antigravity Artifacts Verification Guide

Production-Ready AI Agent Design: Orchestration Patterns and Reliability