Antigravity Lab · Agents & Manager · 2026-06-17 · Advanced

Tracing Parallel Agents After the Fact: Observability with Structured Logs and Spans

Running multiple agents in parallel on the Antigravity 2.0 desktop makes it impossible to tell which one is doing what. I share an observability design that drops tangled print debugging for run_ids and spans you can trace afterward, with a solo-operator implementation and numbers.

Tags: antigravity, multi-agent, observability, logging, tracing, operations, debugging


One morning I had six agents running in parallel, and one of them was rewriting the wrong file. The problem was that even reading the logs, I couldn't tell which agent, at which stage, had done it. Output from all six poured into the same console in time order, and I couldn't isolate the one broken process. Triage took 40 minutes, most of it spent just hunting for the log lines worth reading.

The Antigravity 2.0 desktop runs multiple agents in parallel and schedules them in the background. For an indie developer like me, running several apps and sites alone, that parallelism is a big productivity gain. But the moment work goes parallel, logs stop being a linear story. Unless you design observability (the ability to trace a run after the fact) up front, parallelism turns directly into "untraceability."

Why print debugging breaks under parallelism

While you run one agent at a time, you can follow the story by reading the log top to bottom. Go parallel, and several stories interleave on the same screen. If you can't tell which agent the line you're reading belongs to, the log is noise, not information.

Worse, a failure doesn't necessarily appear at the end of the log. When a dead agent's last line sits next to a healthy agent's line, you misread an unrelated line as the cause. I wasted time on wrong fixes this way more than once.

There's only one way out: tag every log line, in machine-readable form, with the run, the agent, and the stage it belongs to.

Tag every log with run_id and agent_id

Assign one run_id to the whole batch run and one agent_id to each agent, and include both in every log. Treat them not as human-readable text but as keys for mechanical filtering later.

interface LogContext {
  run_id: string;    // one per batch run
  agent_id: string;  // one per agent
  span?: string;     // current phase name
}
 
function createLogger(ctx: LogContext) {
  const emit = (level: string, msg: string, extra: Record<string, unknown> = {}) => {
    // One JSON per line: filterable by machine, not just grep
    console.log(JSON.stringify({
      ts: new Date().toISOString(),
      level,
      ...ctx,
      msg,
      ...extra,
    }));
  };
  return {
    info: (m: string, e?: Record<string, unknown>) => emit("info", m, e),
    error: (m: string, e?: Record<string, unknown>) => emit("error", m, e),
    child: (agent_id: string) => createLogger({ ...ctx, agent_id }),
  };
}
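
The span field in LogContext is declared above, but createLogger offers no way to set it. One possible way to fill it in is a small wrapper that runs a phase and logs its start, end, and duration under a span name. This is a minimal sketch, not the article's implementation: withSpan, emit, Sink, and the span_start / span_end / span_failed message names are all hypothetical, and the logger is re-declared in reduced form so the snippet runs on its own.

```typescript
// Hypothetical sketch: names below are illustrative assumptions, not from the article.
interface LogContext {
  run_id: string;    // one per batch run
  agent_id: string;  // one per agent
  span?: string;     // current phase name
}

// Where a finished log line goes; defaults to the console.
type Sink = (line: string) => void;

function emit(
  sink: Sink,
  ctx: LogContext,
  level: string,
  msg: string,
  extra: Record<string, unknown> = {},
): void {
  sink(JSON.stringify({ ts: new Date().toISOString(), level, ...ctx, msg, ...extra }));
}

// Runs one phase of an agent's work and tags every line it emits with the span name.
async function withSpan<T>(
  ctx: LogContext,
  span: string,
  fn: (spanCtx: LogContext) => Promise<T>,
  sink: Sink = console.log,
): Promise<T> {
  const spanCtx: LogContext = { ...ctx, span };
  const started = Date.now();
  emit(sink, spanCtx, "info", "span_start");
  try {
    const result = await fn(spanCtx);
    emit(sink, spanCtx, "info", "span_end", { duration_ms: Date.now() - started });
    return result;
  } catch (err) {
    // Record the failure under the same span, then rethrow so the caller still sees it.
    emit(sink, spanCtx, "error", "span_failed", {
      duration_ms: Date.now() - started,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err;
  }
}
```

A call might look like `await withSpan({ run_id, agent_id }, "write_files", async () => { /* phase work */ })`. Because a failed span emits its own error line carrying run_id, agent_id, and span, the failing stage can be found without reading neighboring lines.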

With one JSON per line, you can filter completely by run_id and agent_id afterward. The "six tangled" problem dissolves: filter by agent_id and only that agent's lines remain. Swapping plain-text print for JSON structured logs is the single move that changes how triage feels.
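
The filtering step can be sketched as follows. This is a minimal example under stated assumptions: the log lines have already been read into an array of strings, and filterLogs, LogFilter, and LogRecord are hypothetical names introduced here for illustration. Lines that are not valid JSON (for example, stray plain-text prints) are skipped rather than treated as errors.

```typescript
// Hypothetical sketch: filterLogs, LogFilter, and LogRecord are illustrative names.
interface LogRecord {
  ts: string;
  level: string;
  run_id: string;
  agent_id: string;
  span?: string;
  msg: string;
  [key: string]: unknown; // any extra fields the logger attached
}

interface LogFilter {
  run_id?: string;
  agent_id?: string;
  level?: string;
}

// Keeps only the records that match every field set in the filter.
function filterLogs(lines: string[], filter: LogFilter): LogRecord[] {
  const out: LogRecord[] = [];
  for (const line of lines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // not JSON: ignore leftover plain-text output
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const rec = parsed as LogRecord;
    if (filter.run_id !== undefined && rec.run_id !== filter.run_id) continue;
    if (filter.agent_id !== undefined && rec.agent_id !== filter.agent_id) continue;
    if (filter.level !== undefined && rec.level !== filter.level) continue;
    out.push(rec);
  }
  return out;
}

// Example: interleaved output from two agents plus one stray print.
const sampleLines: string[] = [
  '{"ts":"2026-06-17T00:00:00.000Z","level":"info","run_id":"run-1","agent_id":"agent-a","msg":"start"}',
  '{"ts":"2026-06-17T00:00:00.100Z","level":"info","run_id":"run-1","agent_id":"agent-b","msg":"start"}',
  "plain text that slipped through",
  '{"ts":"2026-06-17T00:00:00.200Z","level":"error","run_id":"run-1","agent_id":"agent-b","msg":"write failed"}',
];

const agentB = filterLogs(sampleLines, { run_id: "run-1", agent_id: "agent-b" });
for (const rec of agentB) {
  console.log(rec.ts, rec.level, rec.msg);
}
```

Filtering on run_id first also matters once logs from several batch runs end up in the same file, since agent_id alone may repeat across runs.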

