ANTIGRAVITY LAB
Agents & Manager · 2026-07-02 · Advanced

Turning Last Night's Failed Runs into Tomorrow's Prevention — Designing a Postmortem Feedback Loop

Stop letting unattended failures end at a notification. A concrete design for classifying failures and feeding fixes back into Guide skills, gates, and schedules, with measured recurrence rates.

Tags: antigravity · agents · postmortem · automation-operations · unattended-runs

Premium Article

An agent run scheduled for 2 a.m. fails, and all that greets you in the morning is a notification. You skim the log, mutter "timeout again," patch something, rerun it, and move on. A few days later a suspiciously similar failure shows up in a different task. As an indie developer running several unattended jobs every night, I lived in that loop longer than I'd like to admit.

The diagnosis is simple. I was responding to failures, but nothing carried the lesson back into my prompts, gates, or schedules. There was no return path.

This article is about building that return path as a fixed piece of machinery. Incident response itself (detection, mitigation, recovery) is covered in "Designing Production Incident Runbooks for Antigravity Agents: A Practical Framework from Detection to Recovery", so here I focus strictly on what happens after recovery: making sure the same failure cannot come back unchanged.

Why response and review must be separated

Right after a failure, you are in "just make it pass" mode. Once the rerun succeeds, the motivation to record a root cause evaporates.

So I split the roles by time of day. At night, the only automated reactions are a retry and a notification. Classification and correction happen in a fixed five-minute slot the next morning. Since adopting that separation, the quality of my follow-ups stopped depending on how sleepy I was.
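The night-time half of that split is small enough to sketch. What follows is a minimal version, assuming bash: run the task, retry it once if it fails, and only then notify. `run_with_retry`, `notify_stderr`, and `NOTIFY_CMD` are hypothetical names, and the default notifier only prints to stderr; swap in whatever single notification command you already use.

```shell
#!/usr/bin/env bash
# Night-time policy sketch: one retry, then a notification. Nothing else.
# run_with_retry, notify_stderr and NOTIFY_CMD are hypothetical names.

# Default notifier writes to stderr; set NOTIFY_CMD to your own command or function.
NOTIFY_CMD="${NOTIFY_CMD:-notify_stderr}"

notify_stderr() {
  printf 'NOTIFY: %s\n' "$*" >&2
}

# run_with_retry <task-name> <command...>
# Returns 0 as soon as an attempt succeeds; otherwise notifies once
# and returns the exit code of the last attempt.
run_with_retry() {
  local task="$1"; shift
  local attempt code
  for attempt in 1 2; do
    "$@" && return 0
    code=$?   # exit status of the failed attempt
  done
  "$NOTIFY_CMD" "$task failed after $attempt attempts (exit $code)"
  return "$code"
}
```

In this sketch the recorder can sit inside the retry (for example `run_with_retry site-a-premium-article run-with-record.sh site-a-premium-article ./task.sh`, where `./task.sh` is a placeholder), so each attempt leaves its own run record; the `retryCount` field in the sample record is the alternative of folding both attempts into one record.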

Thinning out the immediate response only works if every run leaves machine-readable evidence behind. That is the foundation.

Recording evidence as a run record

Every run, pass or fail, writes one JSON file on exit. Mine looks like this:

{
  "task": "site-a-premium-article",
  "startedAt": "2026-07-02T02:00:11+09:00",
  "endedAt": "2026-07-02T02:14:52+09:00",
  "exitCode": 1,
  "phase": "quality-gate",
  "lastOutputTail": "templating_gate: duplicated paragraph detected ...",
  "configHash": "9f2c31a",
  "modelUsed": "gemini-3.5-flash",
  "retryCount": 1
}

The field that earns its keep is phase. Slice the run into stages (prepare, generation, quality-gate, push, log) and record where it died. Most of the classification below falls out of that one field.
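To illustrate how far that one field goes on its own, here is a minimal sketch of a per-phase failure count over one day of records. It is not the author's digest script. It assumes the directory layout used by the wrapper script below (`~/.agent-runs/YYYY-MM-DD/*.json`), treats any non-zero `exitCode` as a failure, and needs jq; `failures_by_phase` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: count one day's failed runs by phase from the per-run JSON records.
# Assumes records in <dir>/*.json with exitCode and phase fields; needs jq.
# failures_by_phase is a hypothetical helper name.

# failures_by_phase <record-dir>
# Prints "<count> <phase>" lines, most frequent phase first.
failures_by_phase() {
  local dir="$1"
  shopt -s nullglob
  local files=("$dir"/*.json)
  shopt -u nullglob
  # No records at all: print nothing.
  [ "${#files[@]}" -gt 0 ] || return 0
  # --slurp reads every record into one array so they can be grouped together.
  jq -r -s '
    map(select(.exitCode != 0))
    | group_by(.phase)
    | map({phase: .[0].phase, n: length})
    | sort_by(-.n)
    | .[] | "\(.n) \(.phase)"
  ' "${files[@]}"
}

# Yesterday's records (GNU date syntax):
# failures_by_phase "$HOME/.agent-runs/$(date -d yesterday +%Y-%m-%d)"
```

Ties between phases come out in alphabetical order, because `group_by` sorts by phase before `sort_by` reorders by count.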

configHash is a hash over the prompt and config files together. It exists to answer "did failures spike right after I changed the config?" — a question it has settled for me twice already.
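A sketch of how such a hash could be computed, assuming the prompt and config files can be listed explicitly and that `sha256sum` from GNU coreutils is available (on macOS, `shasum -a 256` is the usual substitute). `config_hash` is a hypothetical helper and the paths in the commented usage line are placeholders. The per-file digests are sorted by path so argument order does not matter; the paths are part of what gets hashed, so pass the same relative paths on every run. The seven-character prefix simply mirrors the `9f2c31a` in the sample record.

```shell
#!/usr/bin/env bash
# Sketch: one short, order-independent hash over the prompt and config files.
# config_hash is a hypothetical helper; needs sha256sum (GNU coreutils).

# config_hash <file...>
# Prints the first 7 hex characters of a SHA-256 over every file's path and content.
config_hash() {
  [ "$#" -gt 0 ] || { echo "config_hash: no files given" >&2; return 2; }
  local f
  for f in "$@"; do
    [ -f "$f" ] || { echo "config_hash: not a file: $f" >&2; return 2; }
  done
  # One "digest  path" line per file, sorted by path, then hashed again
  # to collapse the whole set into a single value.
  sha256sum -- "$@" | LC_ALL=C sort -k 2 | sha256sum | cut -c 1-7
}

# Placeholder paths:
# CONFIG_HASH="$(config_hash prompts/*.md config/*.yaml)"
```

The wrapper script below writes six of the nine fields in the sample record; `configHash` would be one more `--arg` in its jq call (for example `--arg hash "$CONFIG_HASH"` plus `configHash:$hash` in the filter).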

The record is written by a wrapper script that registers an EXIT trap, so a record is written whether the command succeeds or fails:

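#!/usr/bin/env bash
# run-with-record.sh <task-name> <command...>
# Runs the command, tees its output, and always writes one JSON run record on exit.
TASK="$1"; shift
REC_DIR="$HOME/.agent-runs/$(date +%Y-%m-%d)"
mkdir -p "$REC_DIR"
START="$(date -Iseconds)"
LOG="$(mktemp)"
# The task runs as a child process, so it cannot change this script's
# environment. It reports its current phase by writing to this file instead.
AGENT_PHASE_FILE="$(mktemp)"
export AGENT_PHASE_FILE

finish() {
  local code=$?   # first line, so it still holds the script's exit status
  local phase
  phase="$(tail -n 1 "$AGENT_PHASE_FILE" 2>/dev/null)"
  jq -n \
    --arg task "$TASK" --arg started "$START" \
    --arg ended "$(date -Iseconds)" \
    --arg tail "$(tail -c 800 "$LOG")" \
    --arg phase "${phase:-unknown}" \
    --argjson code "$code" \
    '{task:$task, startedAt:$started, endedAt:$ended,
      exitCode:$code, phase:$phase, lastOutputTail:$tail}' \
    > "$REC_DIR/${TASK}-$(date +%H%M%S).json"
  rm -f "$LOG" "$AGENT_PHASE_FILE"
}
trap finish EXIT

"$@" 2>&1 | tee "$LOG"
exit "${PIPESTATUS[0]}"

The task itself only needs to write its current phase to that file as it moves between stages, for example echo generation > "$AGENT_PHASE_FILE". Running export AGENT_PHASE=generation inside the task would not reach the wrapper: environment changes never propagate from a child process to its parent, and every command in a bash pipeline runs in a subshell anyway. Existing tasks still barely change.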

WHAT YOU'LL LEARN
  • A five-way failure taxonomy where each class maps to exactly one place to fix
  • A run-record JSON schema plus a script that turns yesterday's failures into a five-minute morning digest
  • Field data from cutting same-cause recurrence from roughly 40% to 12% over six weeks