Parallel Agents Quietly Burn Through Your Quota — A Self-Defense Circuit Breaker When Limits Are Invisible
Even on AI Ultra's high ceiling, running parallel agents can exhaust your allowance without warning and leave later runs half-failing. Assuming the limit is invisible from outside, here is a circuit breaker that records consumption on your side and applies the brakes, drawn from real operation.
I once watched the third through fifth of five parallel agents all stall out with half-finished output.
As an indie developer running several sites and apps in parallel, I upgraded to AI Ultra thinking I could finally run agents side by side without worrying about limits. Then on a day when I threw a batch of heavy jobs in the morning, only the later agents ran partway and returned thin results without any clear error. At first I blamed the model; but moving to a different time of day, it did not reproduce. I had used up the allowance in a short window.
The tricky part was that this exhaustion did not arrive in the easy-to-read form of "returns a 429 and stops." This article lays out a design for building a brake on your own side, assuming the limit is invisible from outside.
Quota exhaustion does not arrive as a clear error
When we hear "rate limit," we picture an HTTP 429 that a retry fixes. With agentic tools, that is not always how it goes.
In operation, exhaustion showed up in roughly three patterns. The first is mid-run quality degradation: in a multi-step job, the early steps proceed normally and only the later steps produce short, shallow output. The second is silent truncation: the agent reports "complete," yet the artifact is only partway done. The third is perceptible slowdown: responses get extremely slow and fail at the timeout boundary.
What they share is that none carries an explicit "you used up your quota" signal. Top-tier plans like Ultra have a high ceiling, but exactly how much remains is not visible from outside. That is precisely why a reactive approach relying on error codes is too slow. Rather than stopping after you detect exhaustion, stop yourself before exhaustion. That is the circuit-breaker idea.
Estimate and record consumption on your side
If you can't see the remaining balance from outside, you have to count what you dispatch. You cannot get exact token counts, but an approximation is enough. I append a rough estimate of "one agent run = approximate tokens" to a file on each run.
# record_usage.sh <weight># Accumulate the day's consumption in approximate-token weights (reset per day)DAY="$(TZ=Asia/Tokyo date +%F)"LEDGER="$HOME/.agent_quota/${DAY}.tally"mkdir -p "$HOME/.agent_quota"WEIGHT="${1:-1000}" # approximate weight of this run (heavier generation = larger)echo "$WEIGHT" >> "$LEDGER"TOTAL="$(awk '{s+=$1} END{print s+0}' "$LEDGER")"echo "Today's running total (approx): $TOTAL"
The weights need not be exact. Decide relative magnitudes — light triage 500, a normal article generation 3000, a heavy job with a browser subagent 8000 — and you can grasp how tall a day's consumption peak is. I fix the date with TZ=Asia/Tokyo to avoid the date boundary slipping and overwriting the previous day's ledger.
✦
Thank you for reading this far.
Continue Reading
What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.
WHAT YOU'LL LEARN
✦Understand why quota exhaustion shows up not as a clear error but as mid-run quality degradation and silent failure
✦Build a self-defense circuit breaker in bash that estimates and records consumption and halts agent dispatch at a threshold
✦Learn how to draw a token budget that spreads a day's work without hitting exhaustion in an environment with opaque limits
Secure payment via Stripe · Cancel anytime
✦
Unlock This Article
Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.
Once the ledger fills, check before dispatching a new agent whether "today's total exceeds the daily budget." If it does, do not run that job — defer it to tomorrow. This is the body of the breaker.
#!/usr/bin/env bash# quota_breaker.sh <weight-of-this-run># Do not start this agent if the daily budget is exceededset -euo pipefailDAY="$(TZ=Asia/Tokyo date +%F)"LEDGER="$HOME/.agent_quota/${DAY}.tally"DAILY_BUDGET=60000 # daily approximate budget (tune with measurement)SOFT_RATIO=80 # above this ratio, stop heavy jobs only (%)WEIGHT="${1:-3000}"TOTAL="$(awk '{s+=$1} END{print s+0}' "$LEDGER" 2>/dev/null || echo 0)"PROJECTED=$(( TOTAL + WEIGHT ))if [ "$PROJECTED" -ge "$DAILY_BUDGET" ]; then echo "⛔ Breaker tripped: projected $PROJECTED ≥ budget $DAILY_BUDGET. Deferring to tomorrow." exit 75 # EX_TEMPFAIL: temporary failure (safe to retry later)fiSOFT_LINE=$(( DAILY_BUDGET * SOFT_RATIO / 100 ))if [ "$TOTAL" -ge "$SOFT_LINE" ] && [ "$WEIGHT" -ge 5000 ]; then echo "⚠️ Soft-limit zone ($TOTAL ≥ $SOFT_LINE). Defer heavy jobs, allow light ones only." exit 75fiecho "✅ Dispatch allowed (projected $PROJECTED / budget $DAILY_BUDGET)"
The two-stage design is the point. Set a "soft-limit zone" before fully exceeding the budget, and once you enter that zone, stop only the heavy jobs (generation with a browser subagent and the like) first. Light triage and small fixes still pass. This avoids the worst pattern, where a heavy job devours the allowance right before exhaustion and drags down the lighter work that follows.
I return exit code 75 (EX_TEMPFAIL) because this is not a permanent failure but a temporary "busy now, later" signal. The scheduler can read it and naturally re-dispatch on the next cycle.
Split heavy and light jobs to decide dispatch order
With the breaker in place, one more lever helps: dispatch-order design. The daily budget is the same, but the order in which you throw jobs changes how exhaustion plays out.
I stopped front-loading heavy generation in the early morning and switched to interleaving heavy and light jobs. By letting light jobs through before heavy ones build the budget peak, even after you enter the soft-limit zone the light work still runs to completion.
In my setup, where light periodic jobs like AdMob report aggregation share the same allowance as heavy generation of articles and App Store assets, capping heavy jobs at three per day and spacing each about four hours apart made the thin late-day output all but disappear. With the same number of runs, simply spreading dispatch over time stretched the margin before exhaustion by roughly 1.5x in my experience.
Moving from observe-only to production
Deploying straight into a stop-everything configuration risks halting work you actually need when your budget estimate is off. The rollout I follow has three stages:
First run the breaker in observe-only mode — record but never stop — for one to two weeks, and measure the running total at which later agents start going thin.
Set DAILY_BUDGET to about 80% of that measured value and enable the soft-limit zone (which stops heavy jobs only) first. Only heavy jobs stop here, so light work is never dragged down.
After a week of confirming that needed work is not unfairly blocked, enable the hard limit (stop all jobs).
Widening in this order avoids the pitfall of an over-eager breaker killing work you should be doing. I recommend treating the budget not as a fixed value but as something you revisit monthly against measurements.
How to draw a daily token budget
You do not need to nail the budget value (DAILY_BUDGET) on the first try. Run the breaker in an observe-only mode — record but do not stop — for a week or two, and measure the running total at which later agents start going thin. Set the budget a little below that value and the brake engages before you reach exhaustion.
In my experience, concentrating heavy generation in the morning hastened exhaustion, while spreading it on roughly four-hour intervals made the same number of runs less prone to it. That tells me the ledger total bites on the daily aggregate, not an hourly ceiling. Which is exactly why capping the daily total with a budget works better than throttling the instantaneous rate.
In an environment with invisible limits, the more you wait for an external signal, the more your response lags. Count your own consumption, and stop yourself before exhaustion. Inserting this one self-defense mechanism shrinks the "later agents quietly going thin" phenomenon dramatically. Start with the observe-only ledger.
Share
Thank You for Reading
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.