Turning Last Night's Failed Runs into Tomorrow's Prevention — Designing a Postmortem Feedback Loop

Stop letting unattended failures end at a notification. A concrete design for classifying failures and feeding fixes back into Guide skills, gates, and schedules, with measured recurrence rates.

antigravity⁴⁰⁹ agents¹¹⁷ postmortem automation-operations unattended-runs

✦ Premium Article

An agent run scheduled for 2 a.m. fails, and all that greets you in the morning is a notification. You skim the log, mutter "timeout again," patch something, rerun it, and move on. A few days later a suspiciously similar failure shows up in a different task. As an indie developer running several unattended jobs every night, I lived in that loop longer than I'd like to admit.

The diagnosis is simple. I was responding to failures, but nothing carried the lesson back into my prompts, gates, or schedules. There was no return path.

This article is about building that return path as a fixed piece of machinery. Incident response itself — detection, mitigation, recovery — is covered in Designing Production Incident Runbooks for Antigravity Agents: A Practical Framework from Detection to Recovery, so here I focus strictly on what happens after recovery: making sure the same failure cannot come back unchanged.

Why response and review must be separated

Right after a failure, you are in "just make it pass" mode. Once the rerun succeeds, the motivation to record a root cause evaporates.

So I split the roles by time of day. At night, the only automated reactions are a retry and a notification. Classification and correction happen in a fixed five-minute slot the next morning. Since adopting that separation, the quality of my follow-ups stopped depending on how sleepy I was.

Thinning out the immediate response only works if every run leaves machine-readable evidence behind. That is the foundation.

Recording evidence as a run record

Every run, pass or fail, writes one JSON file on exit. Mine looks like this:

{
  "task": "site-a-premium-article",
  "startedAt": "2026-07-02T02:00:11+09:00",
  "endedAt": "2026-07-02T02:14:52+09:00",
  "exitCode": 1,
  "phase": "quality-gate",
  "lastOutputTail": "templating_gate: duplicated paragraph detected ...",
  "configHash": "9f2c31a",
  "modelUsed": "gemini-3.5-flash",
  "retryCount": 1
}

The field that earns its keep is phase. Slice the run into stages — prepare, generate, quality gate, push, log — and record where it died. Most of the classification below falls out of that one field.

configHash is a hash over the prompt and config files together. It exists to answer "did failures spike right after I changed the config?" — a question it has settled for me twice already.

The record is written by a wrapper script using a trap:

#!/usr/bin/env bash
# run-with-record.sh <task-name> <command...>
TASK="$1"; shift
REC_DIR="$HOME/.agent-runs/$(date +%Y-%m-%d)"
mkdir -p "$REC_DIR"
START="$(date -Iseconds)"
LOG="$(mktemp)"
 
finish() {
  local code=$?
  jq -n \
    --arg task "$TASK" --arg started "$START" \
    --arg ended "$(date -Iseconds)" \
    --arg tail "$(tail -c 800 "$LOG")" \
    --arg phase "${AGENT_PHASE:-unknown}" \
    --argjson code $code \
    '{task:$task, startedAt:$started, endedAt:$ended,
      exitCode:$code, phase:$phase, lastOutputTail:$tail}' \
    > "$REC_DIR/${TASK}-$(date +%H%M%S).json"
}
trap finish EXIT
 
"$@" 2>&1 | tee "$LOG"
exit "${PIPESTATUS[0]}"

The task itself only needs to update export AGENT_PHASE=generation as it moves between stages. Existing tasks barely change.

✦

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN

✦A five-way failure taxonomy where each class maps to exactly one place to fix

✦A run-record JSON schema plus a script that turns yesterday's failures into a five-minute morning digest

✦Field data from cutting same-cause recurrence from roughly 40% to 12% over six weeks

Secure payment via Stripe · Cancel anytime

✦

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

Unlock all articles with Membership →

The five-way taxonomy and its return paths

The morning review assigns each failed record to one of five classes. The crucial design decision is that each class has exactly one predetermined place where the fix goes.

Class	Typical example	Return path
Environment	Disk full, expired auth, network outage	Add a preflight check item
Context	Missing reference file, stale repository	Preconditions section of the Guide skill
Prompt	Output format drift from ambiguous instructions	The task prompt itself
Upstream change	CLI update or model switch altering behavior	Version pinning and canary settings
By design	A gate correctly rejected output, schedule collision	Gate threshold or schedule definition

Why exactly one return path? Because if you fix three things at once, you never learn which one worked. One failure, one correction; if it does not stick, revisit the classification the following week. Unglamorous, but it converged faster than anything else I tried.

Note that the fifth class includes cases where a quality gate did its job. Those are closed as "no fix needed." Accepting up front that not every failure demands a correction keeps the review light.

The digest script that makes five minutes real

Yesterday's records get collapsed into one screen:

#!/usr/bin/env bash
# morning-digest.sh — aggregate yesterday's failed runs
DAY="${1:-$(date -d yesterday +%Y-%m-%d)}"
DIR="$HOME/.agent-runs/$DAY"
[ -d "$DIR" ] || { echo "no records for $DAY"; exit 0; }
 
echo "== $DAY failed runs =="
for f in "$DIR"/*.json; do
  jq -r 'select(.exitCode != 0) |
    "\(.task)\t phase=\(.phase)\t \(.lastOutputTail | gsub("\n"; " ") | .[0:80])"' "$f"
done | sort | uniq -c | sort -rn

The uniq -c is deliberate: if the same task failed at the same phase more than once, it floats to the top as the obvious priority.

The routine itself is fixed. Read the digest, classify each failure, edit exactly one file at the designated return path, and leave a one-line note pairing the class with the fix. That fits in five minutes. Anything that will not fit gets promoted to regular working hours instead of bloating the review.

Measuring whether the loop works

The health metric is same-cause recurrence: the share of this week's failures that match a class-and-task pair already recorded in the previous four weeks.

In my own operation — four content sites run unattended as a solo Dolice Labs project — recurrence stood at roughly 40% when I started. Four out of ten failures were reruns of something I had already seen. Six weeks after fixing the return path, it hovered around 12%, and absolute failures dropped from about nine per week to just under four.

More valuable than the numbers is the qualitative shift: eventually only novel failures occur. Novel failures point at genuine design gaps, which turns the morning review into something I almost look forward to.

If collisions between nighttime runs are the suspected culprit, the return path is the scheduling layer — the approach in Don't Build Your Own Peak: Time-Spreading Background Agent Schedules pairs well with this loop.

The next step

Start by wiring up the run record tonight and nothing else. Classification and reviews can wait until a week of evidence has accumulated. Evidence first, process second — in that order, adoption costs almost nothing.

I hope this helps if you are growing an unattended fleet of your own. Thanks for reading.

Thank You for Reading

Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.