Antigravity Lab · Agents & Manager · 2026-06-29 · Advanced

When Parallel Sub-Agents Fight Over One API's Rate Limit: A Shared Token Bucket That Caps the Aggregate

Run Antigravity 2.0 dynamic sub-agents in parallel and each one hits the same external API independently, pushing the aggregate rate over the limit and triggering cascades of 429s. Here is a shared token bucket that caps the aggregate proactively, with working code from a single-process version through to a Redis version.


Once the convenience of Antigravity 2.0 dynamic sub-agents settles in — four tasks moving at once — the next wall is rarely speed. It is the external API limit. As an indie developer running several blogs on an automated pipeline, I once watched every sub-agent push a commit to GitHub at the same moment and collectively trip a 403 secondary rate limit. Running them one at a time, I had never seen that error.

The cause is simple. Line up N sub-agents and the send rate observed from the outside becomes up to N times higher. Each agent may believe it is backing off politely, but to the service that owns the shared limit, N agents are simply arriving all at once.

What follows is how to solve that N-times problem not with "each agent's good manners" but with "one faucet every agent must pass through" — with working code and measured numbers. The example targets GitHub's secondary rate limit, but the same design applies anywhere a single limit is shared by multiple actors: Stripe, AdMob reporting APIs, your own backend.

Why per-agent backoff breaks under concurrency

The first thing I tried was the obvious fix: wrap each sub-agent's HTTP call in a retrier that backs off exponentially when it sees a 429 (or, for GitHub's secondary rate limit, a 403).

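// Looks correct, but falls apart under concurrency: per-agent backoff
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function callWithBackoff(fn: () => Promise<Response>): Promise<Response> {
  let delay = 500; // initial backoff in ms, doubled after every rejection
  for (let attempt = 0; attempt < 6; attempt++) {
    const res = await fn();
    // Simplification: every 403 is treated as a rate-limit response here,
    // although a 403 can also mean a permissions error.
    if (res.status !== 429 && res.status !== 403) return res;
    await sleep(delay);     // every agent waits the same delay at nearly the same time
    delay *= 2;
  }
  throw new Error("rate limit: gave up");
}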

For a single agent this works. But run six sub-agents in parallel and you get this chain:

  1. Six fire almost simultaneously; the aggregate rate exceeds the limit
  2. Almost simultaneously, all six receive a 429
  3. All six wait the same initial delay (500ms)
  4. After 500ms, all six retry simultaneously again — and all six get 429 again

This is the thundering herd. The same spike is regenerated on every retry, and backoff only widens the gap between spikes; it never flattens the spike itself. Adding jitter spreads things out a little, but that only lowers the collision probability. It is not a mechanism that guarantees the aggregate stays at or below the limit.
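
The chain above can be reproduced with a small, idealized simulation. This is a sketch, not a measurement: it assumes every agent starts at t = 0, is rejected on every attempt, and sees zero network latency. The helper names sendTimes and peakSimultaneous are made up for illustration. It uses the same 500 ms initial delay and doubling as callWithBackoff.

```typescript
// Send times (ms) for one agent that is rejected on every attempt.
// Mirrors callWithBackoff: 500 ms initial delay, doubled after each rejection.
function sendTimes(attempts: number, initialDelayMs = 500): number[] {
  const times: number[] = [];
  let t = 0;
  let delay = initialDelayMs;
  for (let i = 0; i < attempts; i++) {
    times.push(t);
    t += delay;
    delay *= 2;
  }
  return times;
}

// Largest number of requests that land on the same timestamp across all agents.
function peakSimultaneous(schedules: number[][]): number {
  const counts: Record<string, number> = {};
  let peak = 0;
  for (const schedule of schedules) {
    for (const t of schedule) {
      const key = String(t);
      counts[key] = (counts[key] || 0) + 1;
      if (counts[key] > peak) peak = counts[key];
    }
  }
  return peak;
}

// Six agents (an assumed count), all starting together, four attempts each.
const schedules: number[][] = [];
for (let agent = 0; agent < 6; agent++) schedules.push(sendTimes(4));

console.log(schedules[0]);                // [ 0, 500, 1500, 3500 ]
console.log(peakSimultaneous(schedules)); // 6
```

The gaps between spikes grow, but the height of each spike stays at six. Jitter would scatter these timestamps, yet nothing in it bounds how many requests land inside one rate-limit window.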

So change the framing. Stop apologizing after you send (reactive); ask permission before you send (proactive). Concentrate the permission-granting into one place, and by definition the aggregate rate can never exceed that place's issue rate. That is what a token bucket is for.

A shared token bucket in a single process

A token bucket is a pail that holds at most capacity tokens and is refilled at refillPerSec tokens per second; each API call consumes one token. If no token is available, the caller waits for a refill. The crucial part is that all sub-agents share one and the same bucket.
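
To make the two parameters concrete: a bucket can issue at most capacity + refillPerSec × W tokens in any window of W seconds, because the stored burst can be spent only once and everything after it has to be refilled. A minimal sketch of that arithmetic follows. maxCallsInWindow is a hypothetical helper name, and the window lengths are only examples, not a statement about any particular API's limit.

```typescript
// Upper bound on the calls a token bucket can admit in any window of
// `windowSec` seconds, assuming each call costs exactly one token.
function maxCallsInWindow(capacity: number, refillPerSec: number, windowSec: number): number {
  return Math.floor(capacity + refillPerSec * windowSec);
}

console.log(maxCallsInWindow(5, 1.0, 60)); // 65
console.log(maxCallsInWindow(1, 1.0, 60)); // 61
console.log(maxCallsInWindow(5, 0.5, 10)); // 10
```

The bound does not depend on how many sub-agents share the bucket, which is exactly what per-agent backoff cannot offer.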

// shared-limiter.ts — a fair FIFO async token bucket
type Waiter = { cost: number; resolve: () => void };
 
export class TokenBucket {
  private tokens: number;
  private last: number;
  private waiters: Waiter[] = [];
  private timer: ReturnType<typeof setInterval> | null = null;
 
  constructor(
    private readonly capacity: number,    // burst allowance
    private readonly refillPerSec: number // steady-state rate (issued per second)
  ) {
    this.tokens = capacity;
    this.last = Date.now();
  }
 
  private refill(): void {
    const now = Date.now();
    const elapsed = (now - this.last) / 1000;
    if (elapsed <= 0) return;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.last = now;
  }
 
  // Always await before the call. The wait delays sending so the aggregate stays under the limit.
  async acquire(cost = 1): Promise<void> {
    if (cost > this.capacity) {
      throw new Error("cost exceeds capacity: it can never be acquired");
    }
    this.refill();
    // Take immediately only if nobody is queued (no jumping the line = fairness)
    if (this.waiters.length === 0 && this.tokens >= cost) {
      this.tokens -= cost;
      return;
    }
    return new Promise<void>((resolve) => {
      this.waiters.push({ cost, resolve });
      this.startDraining();
    });
  }
 
  private startDraining(): void {
    if (this.timer) return;
    this.timer = setInterval(() => {
      this.refill();
      // Release from the head of the queue while tokens suffice (FIFO)
      while (this.waiters.length > 0 && this.tokens >= this.waiters[0].cost) {
        const w = this.waiters.shift()!;
        this.tokens -= w.cost;
        w.resolve();
      }
      if (this.waiters.length === 0 && this.timer) {
        clearInterval(this.timer);
        this.timer = null;
      }
    }, 50); // re-evaluate refill and release every 50ms
  }
}

Using it is just a matter of slipping acquire() in immediately before the external call.

// Keep GitHub content-creating calls conservative: 1.0/sec, burst 5
// (githubApi, FileChange, and subAgents are assumed to be defined elsewhere in the pipeline)
const github = new TokenBucket(5, 1.0);
 
async function commitViaSubAgent(agentId: string, change: FileChange): Promise<void> {
  await github.acquire(1);          // this is where queueing happens
  await githubApi.createCommit(change);
}
 
// Six sub-agents share the same github bucket and run in parallel
await Promise.all(
  subAgents.map((a) => commitViaSubAgent(a.id, a.pendingChange))
);

Three things matter here. First, while one agent waits on acquire(), the other sub-agents keep computing, so the only throughput cost is the part that was pinned to the limit anyway. Second, the queue is FIFO, so no single agent is starved by being pushed back forever. Third, capacity represents the burst allowance, so the smaller you make it, the more you suppress instantaneous spikes.
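
The third point can be illustrated with an idealized calculation. The sketch below assumes n requests arrive together at t = 0 on a full bucket, each costs one token, capacity is a whole number, and refill is continuous. The TokenBucket above re-evaluates every 50 ms, so real release times can be slightly later. releaseTimes is a hypothetical helper, not part of the class.

```typescript
// Release time (seconds) of each of `n` one-token requests that all arrive at
// t = 0 on a full bucket and are served in FIFO order.
function releaseTimes(n: number, capacity: number, refillPerSec: number): number[] {
  const times: number[] = [];
  for (let i = 0; i < n; i++) {
    // The first `capacity` requests spend the initial burst immediately;
    // each later request waits for one more token to be refilled.
    times.push(i < capacity ? 0 : (i - capacity + 1) / refillPerSec);
  }
  return times;
}

console.log(releaseTimes(6, 5, 1.0)); // [ 0, 0, 0, 0, 0, 1 ]
console.log(releaseTimes(6, 1, 1.0)); // [ 0, 1, 2, 3, 4, 5 ]
```

With a burst of 5, five of six sub-agents still go out in the same instant and only the sixth is delayed. With a burst of 1, the same six calls are spaced one second apart, at the cost of a longer wait for the last agent.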

