Before Your Dynamic Sub-agents Branch Out Too Far: Designing a Depth Budget and Fan-out Cap

Antigravity 2.0's dynamic sub-agents can spawn their own sub-agents at runtime. Handy, but without depth and fan-out control they can burn through your quota overnight. Here are three guards, with concrete code.

One night I put the updates for four blog sites onto a single scheduled run. The next morning, the quota screen showed nearly three times the consumption I had expected. The jobs all succeeded; only the spend had ballooned. Tracing the logs, I found that a dynamic sub-agent spawned by the parent had spawned another sub-agent, whose child had spawned a grandchild. Nobody had entered an infinite loop. The decision "this is a bit involved, let me split it one more level" had simply repeated itself, quietly, at every node of the tree.

Antigravity 2.0's dynamic sub-agents let a parent agent spin up child agents on demand at runtime. Operationally, that is a completely different beast from static parallelism where you fix the degree of concurrency up front. The upside is flexibility; the catch is that you hand the "when and how much to branch" decision to the agent itself, so left alone the tree grows deep and wide. Starting from the trap I stepped on in my own off-peak automation, this article shares a concrete design for controlling three things: depth, width, and cancellation.

Treat static parallelism and dynamic sub-agents as different things

The "parallel agents" we are used to had a fixed width — like firing five tasks at once with Promise.all. A human decides the maximum concurrency in advance. In tree terms, it's a shallow thicket of depth 1 and width 5.

Dynamic sub-agents differ at the root. If a child agent decides mid-task that "this refactor splits cleanly into three independent modules," it can stand up three grandchildren on the spot. The grandchildren make the same call. As a result, both the depth and the width of the tree are decided at runtime and cannot be predicted in advance.

If you grasp only "it's parallel, so it should be fast" without understanding this property, you will misjudge both spend and latency. With depth 3 and a fan-out of 3 per node, the tree has 3 to the power of 3, or 27, leaf nodes, and 40 agents in total once the root and the intermediate levels are counted (1 + 3 + 9 + 27). Every one of them calls the model, so token consumption scales with that count. I burned nearly three times my expected quota in one night precisely because I had overlooked this exponential spread.
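To make the arithmetic concrete, here is a small sketch that counts leaves and total agents for a uniform tree. It assumes every node spawns exactly `fanOut` children until `depth` levels are filled, which real runs will not do; `treeSize` is a hypothetical helper, not part of any SDK.

```typescript
// Hypothetical helper: size of a uniform tree in which every node spawns
// `fanOut` children until `depth` levels below the root are filled.
function treeSize(depth: number, fanOut: number): { leaves: number; total: number } {
  let levelCount = 1; // the root level holds a single agent
  let total = 1;      // running total, root included
  for (let level = 1; level <= depth; level++) {
    levelCount *= fanOut; // number of agents on this level
    total += levelCount;
  }
  return { leaves: levelCount, total };
}

console.log(treeSize(3, 3)); // { leaves: 27, total: 40 }
console.log(treeSize(2, 3)); // { leaves: 9, total: 13 }
```

Dropping the depth from 3 to 2 at the same fan-out cuts the total from 40 agents to 13, which is why the depth ceiling is the first dial to reach for.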

So the first premise to hold is simple: when you use dynamic sub-agents, assume the tree will grow on its own, and put explicit ceilings on depth and width.

Three guards: depth budget, fan-out cap, cancellation propagation

The axes worth controlling break down into three.

First, the depth budget: the maximum number of levels of sub-agents you can spawn, counting from the parent. The node count grows exponentially with depth, so a shallow ceiling is the most direct brake on the size of the tree.

Second, the fan-out cap: the maximum number of sub-agents running concurrently across the whole tree. Even if you allow depth, capping the simultaneous count keeps the peak spend down.

Third, cancellation propagation: a mechanism to tell every descendant to stop when a branch stalls or is no longer needed. Without it, grandchildren keep calling the model long after the parent has given up.

These look independent, but operationally they are easier to handle if you fold them into a single orchestration layer. Below I show each implementation in Node.js (TypeScript), assuming a thin wrapper around the Antigravity SDK's sub-agent launch.

Embed the depth budget into the context

The key to a depth budget is carrying the "current depth" as part of the context you pass to sub-agents. When a parent spawns a child, it passes its own depth plus one. Once the ceiling is reached, the agent stops spawning and finishes the work itself.

type SpawnContext = {
  depth: number;       // current depth (root is 0)
  maxDepth: number;    // depth budget
  traceId: string;     // ID for tracing the whole tree
};
 
// Thin wrapper around sub-agent launch
async function runSubAgent(
  prompt: string,
  ctx: SpawnContext,
  spawnChildren: (childCtx: SpawnContext) => Promise<void>,
) {
  // At the depth ceiling, instruct it to complete solo without spawning
  const canSpawn = ctx.depth < ctx.maxDepth;
  const guardedPrompt = canSpawn
    ? prompt
    : `${prompt}\n\n[Constraint] Do not launch any further sub-agents. ` +
      `Complete this work yourself.`;
 
  const result = await antigravity.agents.run({
    prompt: guardedPrompt,
    metadata: { traceId: ctx.traceId, depth: ctx.depth },
  });
 
  if (canSpawn) {
    // Always pass depth+1 when spawning children
    await spawnChildren({ ...ctx, depth: ctx.depth + 1 });
  }
  return result;
}

The important part is to not turn hitting the depth ceiling into an error. My first implementation rejected over-limit spawns with an exception, which failed legitimate work that "just wanted one more split." Instead, I add a constraint to the prompt of any sub-agent that reaches the ceiling: "complete it yourself from here." The behavior tips toward "stop splitting at the leaves and do the work."

In my own runs, maxDepth settled at 2. The root (0) carves up the big picture, its child (1) splits individual tasks, and the grandchild (2) focuses on the actual work. I experimented with allowing a third level, but the debugging pain did not justify the extra parallelism it bought.

Fan-out cap and queuing

Even with depth capped, the concurrent count spikes if every node spawns many children at once. So manage "how many are running right now" with a shared counter across the whole tree, and queue anything over the limit.

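class FanOutLimiter {
  private running = 0;
  private queue: (() => void)[] = [];
 
  constructor(private readonly maxConcurrent: number) {}
 
  async acquire(): Promise<void> {
    if (this.running < this.maxConcurrent) {
      this.running++;
      return;
    }
    // Wait until release() hands its slot over. The count is not touched
    // here: the slot passes directly from the finished agent to this waiter.
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }
 
  release(): void {
    const next = this.queue.shift();
    if (next) {
      // Hand the slot straight to the next waiter. Decrementing first would
      // let a new caller slip in before the waiter resumes and push the
      // in-flight count past maxConcurrent.
      next();
    } else {
      this.running--;
    }
  }
 
  get inFlight(): number {
    return this.running;
  }
}
 
// Share one limiter across the whole tree
const limiter = new FanOutLimiter(6);
 
// childPrompt below is a placeholder for whatever prompt the child receives
async function runGuarded(prompt: string, ctx: SpawnContext) {
  await limiter.acquire();
  let holdingSlot = true;
  const releaseOnce = () => {
    if (holdingSlot) {
      holdingSlot = false;
      limiter.release();
    }
  };
  try {
    return await runSubAgent(prompt, ctx, async (childCtx) => {
      // This node's own model call is done by now, so give its slot back
      // before spawning. Parents that keep their slot while waiting on
      // queued children can fill every slot and deadlock the tree.
      releaseOnce();
      // Child launches go through the same limiter
      await runGuarded(childPrompt, childCtx);
    });
  } finally {
    // No-op if the slot was already returned before spawning
    releaseOnce();
  }
}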

The point is to share exactly one limiter across the whole tree. Give each node its own limiter and every node respects its own cap while the total goes uncapped. The concurrency ceiling belongs to the tree, not to individual nodes.
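One way to check this claim is to simulate the tree with fake agents. The sketch below makes several assumptions: `SlotLimiter` is a simplified stand-in for the limiter, `fakeAgent` only sleeps instead of calling a model, and every node spawns three children down to depth 2. None of these names come from the Antigravity SDK.

```typescript
// Simplified limiter for the simulation: a freed slot is handed straight to
// the next waiter, so the in-flight count can never exceed `max`.
class SlotLimiter {
  private running = 0;
  private readonly max: number;
  private waiters: (() => void)[] = [];
  peak = 0; // highest in-flight count observed

  constructor(max: number) {
    this.max = max;
  }

  async acquire(): Promise<void> {
    if (this.running < this.max) {
      this.running++;
      this.peak = Math.max(this.peak, this.running);
      return;
    }
    // release() keeps `running` unchanged and passes the slot to this waiter.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next();
    else this.running--;
  }
}

type SimStats = { launched: number };

// Stand-in for a model call: just waits a few milliseconds.
const fakeAgent = () => new Promise<void>((resolve) => setTimeout(resolve, 5));

// Every node uses the same limiter and holds a slot only for its own call.
async function simulateNode(
  limiter: SlotLimiter,
  stats: SimStats,
  depth: number,
  maxDepth: number,
  fanOut: number,
): Promise<void> {
  await limiter.acquire();
  try {
    stats.launched++;
    await fakeAgent();
  } finally {
    limiter.release();
  }
  if (depth < maxDepth) {
    const children = Array.from({ length: fanOut }, () =>
      simulateNode(limiter, stats, depth + 1, maxDepth, fanOut),
    );
    await Promise.all(children);
  }
}

async function simulate(cap: number): Promise<{ launched: number; peak: number }> {
  const limiter = new SlotLimiter(cap);
  const stats: SimStats = { launched: 0 };
  await simulateNode(limiter, stats, 0, 2, 3);
  return { launched: stats.launched, peak: limiter.peak };
}

// 13 agents launch in total (1 + 3 + 9); the peak never rises above the cap of 4.
simulate(4).then((summary) => console.log(summary));
```

If each call to `simulateNode` created its own `SlotLimiter`, every node would stay under its private cap while the tree as a whole ran all nine leaves at once.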

The value of maxConcurrent, alongside the depth budget, is the main dial for cost. I use 6 as the baseline, working backward from the model's rate limits and my remaining quota. I tighten it to 4 during daytime manual work and push it to 8 for the late-night batch nobody is watching, and that time-of-day split takes effect by editing this one spot.
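The time-of-day split can live in one small function. This is a minimal sketch: the mode names and `resolveMaxConcurrent` are hypothetical, and only the values 4, 6, and 8 come from the text above.

```typescript
// Hypothetical run modes; only the numbers mirror the 4 / 6 / 8 split.
type RunMode = "baseline" | "daytime-manual" | "night-batch";

// Map a mode string (for example from an environment variable) to a cap.
// Anything unrecognized falls back to the baseline, so a typo never raises it.
function resolveMaxConcurrent(mode: string | undefined): number {
  switch (mode as RunMode | undefined) {
    case "daytime-manual":
      return 4;
    case "night-batch":
      return 8;
    default:
      return 6;
  }
}

console.log(resolveMaxConcurrent("night-batch"));    // 8
console.log(resolveMaxConcurrent("daytime-manual")); // 4
console.log(resolveMaxConcurrent(undefined));        // 6
```

The result would then be passed to the limiter's constructor once, at the point where the tree is launched.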

Cancellation propagation — so a stalled branch doesn't take everything with it

Of the three, cancellation propagation was the one that paid off most in production. When a sub-agent hung on an external API timeout, its grandchildren never received a stop signal and kept running — eating quota long after the parent had given up.

The fix is to place a single AbortController at the root, distribute its signal across the whole tree, and have every sub-agent launch honor it. Call abort at any one point and it propagates to all descendants.

import { randomUUID } from "node:crypto";
 
type SpawnContext = {
  depth: number;
  maxDepth: number;
  traceId: string;
  signal: AbortSignal;  // stop signal shared across the tree
};
 
async function runSubAgent(
  prompt: string,
  ctx: SpawnContext,
  spawnChildren: (childCtx: SpawnContext) => Promise<void>,
) {
  // Bail immediately if already cancelled
  if (ctx.signal.aborted) return { skipped: true };
 
  // The depth-guard prompt from the earlier version is omitted here for brevity
  const result = await antigravity.agents.run({
    prompt,
    signal: ctx.signal,           // hand the stop signal to the SDK
    metadata: { traceId: ctx.traceId, depth: ctx.depth },
  });
 
  // Spawn only while depth budget remains and nobody aborted in the meantime
  if (ctx.depth < ctx.maxDepth && !ctx.signal.aborted) {
    await spawnChildren({ ...ctx, depth: ctx.depth + 1 });
  }
  return result;
}
 
// Launch the tree at the root
const controller = new AbortController();
 
// Stop every branch once the whole-tree budget (e.g. 20 min) is exceeded
const deadline = setTimeout(() => controller.abort("deadline"), 20 * 60 * 1000);
 
try {
  await runGuarded(rootPrompt, {
    depth: 0,
    maxDepth: 2,
    traceId: randomUUID(),
    signal: controller.signal,
  });
} finally {
  // Clear the timer so a finished job does not keep the process alive
  clearTimeout(deadline);
}

Two pitfalls to watch here. One: check signal.aborted right before every sub-agent launch. The whole tree can be aborted while a node waits in the queue, and if it runs without checking on wake-up, one supposedly-stopped branch will still fire. Two: always put post-abort cleanup in finally. Forget to call the limiter's release and the counter drifts, breaking concurrency management from then on. In production I learned to suspect these two on every abort.
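The first pitfall can also be handled inside the limiter itself. The sketch below is one possible variant, not the limiter shown earlier: a hypothetical `AbortableLimiter` whose `acquire` takes the tree's `AbortSignal`, rejects if the signal fires while the caller is still queued, and removes that caller from the queue so a freed slot is never handed to a cancelled branch.

```typescript
class AbortableLimiter {
  private running = 0;
  private readonly max: number;
  private waiters: (() => void)[] = [];

  constructor(max: number) {
    this.max = max;
  }

  acquire(signal: AbortSignal): Promise<void> {
    if (signal.aborted) return Promise.reject(new Error("aborted before queueing"));
    if (this.running < this.max) {
      this.running++;
      return Promise.resolve();
    }
    return new Promise<void>((resolve, reject) => {
      const grant = () => {
        signal.removeEventListener("abort", onAbort);
        resolve(); // the slot is handed over; `running` stays unchanged
      };
      const onAbort = () => {
        // Leave the queue so release() never hands a slot to a dead branch.
        this.waiters = this.waiters.filter((waiter) => waiter !== grant);
        reject(new Error("aborted while queued"));
      };
      signal.addEventListener("abort", onAbort, { once: true });
      this.waiters.push(grant);
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next();
    else this.running--;
  }

  get inFlight(): number {
    return this.running;
  }
}

// Demo: one branch holds the only slot, a second waits, then the tree aborts.
async function abortDemo(): Promise<string[]> {
  const log: string[] = [];
  const controller = new AbortController();
  const limiter = new AbortableLimiter(1);

  await limiter.acquire(controller.signal); // takes the only slot
  const queued = limiter
    .acquire(controller.signal)             // has to wait in the queue
    .then(() => log.push("ran"))
    .catch(() => log.push("skipped"));

  controller.abort();  // the whole tree is cancelled while one branch waits
  limiter.release();   // the slot frees up, but no waiter is left to take it
  await queued;
  log.push(`inFlight=${limiter.inFlight}`);
  return log;
}

abortDemo().then((log) => console.log(log)); // [ 'skipped', 'inFlight=0' ]
```

A caller still wraps the work in try/finally, but only calls `release` when `acquire` actually resolved; a rejected `acquire` never held a slot.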

Being able to set a single deadline for the entire tree is another benefit of this structure. My scheduled task caps at 20 minutes, and any branch past that is stopped regardless of reason. The reassurance that "even if something hangs, quota leakage stops within 20 minutes at worst" mattered more than anything for running unattended at night.

Thresholds and observation points that worked in production

The numbers vary by environment, but here are the values I settled on as a starting point — at the scale of one indie developer running four sites plus a set of apps.

Dial	Value (baseline)	Intent
maxDepth (depth budget)	2	Structurally forbids exponential growth; leaves do the work without splitting
maxConcurrent (fan-out)	6 (8 at night / 4 by day)	Ceiling on concurrency; derived from quota and rate limits
Whole-tree deadline	20 min	Caps quota leakage from a stalled branch at 20 minutes worst case
Per-node retries	1 max	A retry is also one sub-agent; count it in the tree
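The last row is easy to get wrong, so here is a minimal sketch of a retry wrapper that reports every attempt as a launch. `runWithRetry` and its `onLaunch` callback are hypothetical names, and the flaky agent is a stand-in that fails once on purpose.

```typescript
// Hypothetical wrapper: every attempt, including a retry, is reported through
// onLaunch so it shows up in the per-job launch total.
async function runWithRetry<T>(
  attempt: () => Promise<T>,
  maxRetries: number,
  onLaunch: () => void,
): Promise<T> {
  let lastError: unknown = new Error("no attempt was made");
  for (let tryIndex = 0; tryIndex <= maxRetries; tryIndex++) {
    onLaunch();
    try {
      return await attempt();
    } catch (error) {
      lastError = error; // remember the failure and fall through to the retry
    }
  }
  throw lastError;
}

// Demo: a stand-in agent that fails once, then succeeds.
async function retryDemo(): Promise<{ result: string; launches: number }> {
  let launches = 0;
  let calls = 0;
  const flakyAgent = async (): Promise<string> => {
    calls++;
    if (calls === 1) throw new Error("simulated timeout");
    return "done";
  };
  const result = await runWithRetry(flakyAgent, 1, () => {
    launches++;
  });
  return { result, launches };
}

retryDemo().then((summary) => console.log(summary)); // { result: 'done', launches: 2 }
```

With one retry allowed, a single node can account for two launches, so the worst-case launch total for a job is double the node count.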

What I always watch is the "total number of sub-agents launched" and the "peak concurrent count" per job. Thread a traceId through every node's metadata and you can later reconstruct the tree and list "which node spawned how many." On a day when the total jumps to twice the usual, it's almost always one particular node branching more than expected. Add a constraint to just that node's prompt — "proceed with this work solo, without splitting" — and the total returns to normal the next day.
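Reconstructing the tree from logged metadata can be sketched as follows. The record shape is an assumption: the snippets in this article log only `traceId` and `depth`, so `nodeId` and `parentId` are hypothetical fields you would have to add to the metadata yourself, and `summarizeTree` is an illustrative helper.

```typescript
type LaunchRecord = {
  traceId: string;
  nodeId: string;          // assumed: unique ID per sub-agent launch
  parentId: string | null; // assumed: null for the root
  depth: number;
};

// Rebuild "which node spawned how many" for one job from flat log records.
function summarizeTree(records: LaunchRecord[], traceId: string) {
  const nodes = records.filter((record) => record.traceId === traceId);
  const childCount = new Map<string, number>();
  for (const node of nodes) {
    if (node.parentId !== null) {
      childCount.set(node.parentId, (childCount.get(node.parentId) ?? 0) + 1);
    }
  }
  // Heaviest spawners first, so an over-branching node lands at the top.
  const spawners = Array.from(childCount.entries()).sort((a, b) => b[1] - a[1]);
  const maxDepth = nodes.reduce((deepest, node) => Math.max(deepest, node.depth), 0);
  return { total: nodes.length, maxDepth, spawners };
}

const sampleRecords: LaunchRecord[] = [
  { traceId: "job-1", nodeId: "root", parentId: null, depth: 0 },
  { traceId: "job-1", nodeId: "a", parentId: "root", depth: 1 },
  { traceId: "job-1", nodeId: "b", parentId: "root", depth: 1 },
  { traceId: "job-1", nodeId: "a1", parentId: "a", depth: 2 },
  { traceId: "job-1", nodeId: "a2", parentId: "a", depth: 2 },
  { traceId: "job-1", nodeId: "a3", parentId: "a", depth: 2 },
  { traceId: "job-1", nodeId: "b1", parentId: "b", depth: 2 },
  { traceId: "job-2", nodeId: "root", parentId: null, depth: 0 },
];

const summary = summarizeTree(sampleRecords, "job-1");
console.log(summary.total);    // 7
console.log(summary.spawners); // [ [ 'a', 3 ], [ 'root', 2 ], [ 'b', 1 ] ]
```

Tracking the peak concurrent count would additionally need start and end timestamps on each record, which this sketch leaves out.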

Put differently: as long as you hold depth, width, and the deadline as numbers, isolating the cause is fast even when spend balloons. The night I burned 3x, I was observing none of the three. You cannot control what you cannot observe — a truism the quota screen drove home for me in hard figures.

What to do next

If you are about to fold dynamic sub-agents into automation, start by fixing maxDepth at 1 or 2 and routing everything through a single shared FanOutLimiter. Cancellation propagation can come later, but put the ceilings on depth and fan-out in place from the very first run, and the "3x overnight" accident I hit is unlikely to happen at all. The more convenient the mechanism, the more quietly it grows when left alone. Decide the ceiling before you let it run — keeping that order is, for me, the conclusion that makes unattended operation sustainable.
