ANTIGRAVITY LABJP
Articles/Agents & Manager
Agents & Manager/2026-06-29Advanced

Spotting Agents That Are Alive but Stuck — Designing a Progress Heartbeat and Watchdog

The process is alive but the work isn't moving — the nastiest state for a background agent. Here is how to switch from liveness to progress monitoring to detect it, and how to stop it safely, with working code.

background agents3Antigravity288watchdog2timeout designoperations19

Premium Article

In the morning, the dashboard was still green. The process was running, it was eating a little CPU, the last log line read "querying the model" — and yet not a single character had advanced in six hours.

As an indie developer, I run an article-generation agent overnight. That night an external API never responded, never even entered retry, and just kept waiting. The process was genuinely alive. So the monitoring that watched "is it alive" kept returning green the whole time. The mistake was treating liveness and progress as the same thing.

"Alive" and "progressing" are different questions

Out of server-monitoring habit, we reach for "does the process exist" and "does the port respond" as health signals. For short requests that is enough. But a background agent can spend minutes to tens of minutes on a single job, and throughout it can be "perfectly present while advancing nothing."

The classic stalls are an external call that never returns, an infinite reasoning loop in front of an unsolvable dependency, and a deadlock where it waits on itself. In all of them the process is alive. Liveness monitoring catches none of them. What you need is progress monitoring.

Kind of monitoringQuestion it answersCatches a stall?
Liveness (process exists, port responds)Is it running?No
Log output presentIs it talking?Fooled by endless "waiting…"
Progress (amount advanced)Did work move forward?Yes

Judging by whether logs are flowing is also dangerous. Code that keeps printing "waiting…" is talking but not advancing. Watch the amount of forward motion, not the chattiness.

Emitting progress as a marker

To watch by progress, the agent itself must leave "I got this far" in an externally observable form. Write a monotonically increasing step number and the time it last advanced to a shared location.

# progress.py — stamp a progress marker into a shared file
import json, os, tempfile, time
 
def mark_progress(path: str, step: int, note: str = "") -> None:
    """step is monotonic. Call only when you advanced (not while waiting)."""
    payload = {"step": step, "note": note, "ts": int(time.time())}
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, path)        # atomic swap

The discipline that matters: call mark_progress only when you advanced. If you design the heartbeat to "always fire on a timer," the pulse continues even while stalled, and you are back to liveness monitoring. A pulse must mean "moved forward," not "still alive."

In the agent body, advance the step at meaningful boundaries.

mark_progress(P, 1, "collection done")
# ... external API call (this is where it tends to freeze) ...
mark_progress(P, 2, "draft generated")
# ... formatting ...
mark_progress(P, 3, "verification passed")

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN
Understand that 'is it alive' and 'is it progressing' are entirely different questions, and why liveness monitoring alone misses a silent stall
A complete implementation from progress heartbeat to no-progress timeout to safe termination (a Python watchdog plus a bash integration). Drop it straight into your own job
From the real experience of running an agent overnight as an indie developer and finding it frozen silently until morning, a clear rule for where to place progress markers
Secure payment via Stripe · Cancel anytime

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

or
Unlock all articles with Membership →
Share

Thank You for Reading

Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.

  • Copy-paste ready implementation code
  • New advanced guides published daily
  • $5/mo or $10 for lifetime access
View Membership →

Related Articles

Agents & Manager2026-06-29
When Background Agents Run Twice — Stopping Double Execution with Leases and Fencing Tokens
The same scheduled job fires from two machines at once and they overwrite each other's output. Here is how to stop that failure mode at the root in Antigravity 2.0 background agents, using leases and fencing tokens, with working code.
Agents & Manager2026-06-28
Treating Built-in Guide Skills as Design Assets, Not Throwaway Prompts
Antigravity v2.2.1 added built-in Guide skills. Here is a concrete structure and set of judgment calls for running them as version-controlled, shared design assets instead of one-off instructions.
Agents & Manager2026-06-27
Keep a Tamper-Evident Audit Log of Your Autonomous Agent's Actions
To record the decisions and actions an Antigravity agent takes autonomously in a form you can trace and verify later, design an append-only audit log whose hash chain detects tampering. Includes the implementation.
📚RECOMMENDED BOOKS
Build a Large Language Model (From Scratch)
Sebastian Raschka
LLM Dev
Prompt Engineering for LLMs
Berryman & Ziegler
Prompting
AI Engineering
Chip Huyen
AI Eng
* Contains affiliate links
See all →