Keep an Agent Running on a Nearly Empty Quota — Designing Graceful Degradation
When the monthly quota is almost gone, stopping the agent entirely is not the only option. Here is how to design graceful degradation — dropping capability one tier at a time while still producing valuable output — with policy code.
As month-end approaches, the AI Ultra usage cap creeps closer. Facing a thinning quota, I used to run a circuit breaker: "stop when the cap is hit." It is safe. But the instant it stops, the output that should have shipped that night becomes zero.
Stop, or run at full power. Having only those two options was the mistake. What I actually needed was the middle: keep producing valuable output while dropping capability one tier at a time. This design idea is called graceful degradation, close to the way emergency lights stay on during a blackout.
"Stop," "allocate," and "drop" are different designs
There are three quota designs with different goals. Confuse them and you stop when you should not, or try to allocate when you should drop, and it falls apart.
| Design | What it does | Where it fits |
| --- | --- | --- |
| Circuit breaker | Halts execution at the cap | When you must cut off runaway damage |
| Budget allocation | Pre-assigns shares to multiple jobs | When you must prevent contention across parallel jobs |
| Graceful degradation | Lowers capability and keeps running | When the remainder is thin but output must not be zero |
The three are not exclusive. In my setup, a breaker handles runaway detection, budget allocation handles parallel jobs, and graceful degradation works inside a single job that is about to exhaust its budget. This article digs into the last one.
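As a rough illustration of how the three can layer inside one decision, here is a minimal sketch. The names `JobBudget` and `next_action`, the call-rate runaway check, and the 60-call and 0.2 cutoffs are all assumptions made up for this example, not values from my setup.

```python
from dataclasses import dataclass


@dataclass
class JobBudget:
    share: float  # quota units pre-assigned to this job (budget allocation)
    spent: float  # quota units this job has used so far


def next_action(job: JobBudget, calls_last_minute: int,
                runaway_limit: int = 60, degrade_below: float = 0.2) -> str:
    # 1. Circuit breaker: an abnormal call rate suggests a runaway loop, so halt.
    if calls_last_minute > runaway_limit:
        return "halt"
    # 2. Budget allocation: the job looks only at its own share, not the global quota.
    remaining = 0.0 if job.share <= 0 else max(job.share - job.spent, 0.0) / job.share
    # 3. Graceful degradation: a thin remainder lowers capability instead of stopping.
    return "degrade" if remaining < degrade_below else "run"
```

The ordering is the point: the breaker check comes first because it guards against damage, while degradation only applies to a job that is behaving normally but running short.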
Define degradation as discrete tiers
The core of graceful degradation is deciding, ahead of time, a set of discrete capability tiers keyed to the remaining quota. Compared with shaving capability continuously, clear tiers make behavior predictable and easy to verify.
I run with four tiers.
Full: Plenty left. Run all subtasks on the high-capability model.
Thrifty: Some slack left. Keep the high-capability model but skip non-essential subtasks.
Demoted: Low remainder. Downgrade to a light, fast model and run only essential subtasks.
Minimal: A sliver left. Stop new generation and focus solely on finishing and saving what is already in progress.
The nice thing about tiers: log "which tier are we in" and you can later tell at a glance that "that night was the demoted tier." Continuous throttling loses this after-the-fact explainability.
Put the tier logic in one place, separate from the agent body. The trick is to look at both the remaining ratio and the days left in the month: at the same remainder, the more days there are to go, the earlier you should start economizing.
```python
# degrade.py: pick a capability tier from the remaining quota and days left
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    model: str           # model to use
    skip_optional: bool  # skip non-essential subtasks?
    new_work: bool       # allow new generation?


TIERS = {
    "full": Tier("full", "gemini-3.5-pro", False, True),
    "thrifty": Tier("thrifty", "gemini-3.5-pro", True, True),
    "demoted": Tier("demoted", "gemini-3.5-flash", True, True),
    "minimal": Tier("minimal", "gemini-3.5-flash", True, False),
}


def choose_tier(remaining_ratio: float, days_left: int) -> Tier:
    """remaining_ratio: fraction of monthly quota left (0.0-1.0); days_left: days to month-end."""
    # Normalize: how much can we spend per day if we even it out to month-end
    daily_budget = remaining_ratio / max(days_left, 1)
    if daily_budget >= 0.04:     # comfortable: 4%+ of the monthly quota per remaining day
        return TIERS["full"]
    if daily_budget >= 0.02:
        return TIERS["thrifty"]
    if remaining_ratio >= 0.03:  # thin, but essentials still pass
        return TIERS["demoted"]
    return TIERS["minimal"]      # near empty -> only finish what's in progress
```
The daily_budget normalization is what makes this work. Looking at an evened-out pace instead of the raw remainder naturally separates early-month overspending from end-of-month thrift.
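To make the normalization concrete, the sketch below applies the same cutoffs (0.04, 0.02, and 0.03) to a few remainder and days-left pairs. The helper `tier_name` is a hypothetical condensed stand-in that returns only the tier name, and the sample pairs are made up for illustration.

```python
def tier_name(remaining_ratio: float, days_left: int) -> str:
    # Same cutoffs as choose_tier, returning just the tier name.
    daily_budget = remaining_ratio / max(days_left, 1)
    if daily_budget >= 0.04:
        return "full"
    if daily_budget >= 0.02:
        return "thrifty"
    if remaining_ratio >= 0.03:
        return "demoted"
    return "minimal"


# (remaining_ratio, days_left) pairs chosen for illustration only
cases = [(0.50, 10), (0.50, 20), (0.20, 15), (0.02, 3), (1.00, 30)]
for ratio, days in cases:
    print(f"remaining={ratio:.2f} days_left={days:2d} "
          f"daily={ratio / days:.4f} -> {tier_name(ratio, days)}")
```

The same 50% remainder lands in full with 10 days left (0.05 per day) but in thrifty with 20 days left (0.025 per day). One thing worth checking against your own pace: a completely unused quota with 30 days left evens out to about 0.033 per day, which these cutoffs also classify as thrifty, so the 0.04 line may need tuning if you want a month to open in the full tier.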
Reflecting the tier in the job body
Translate the chosen tier into the agent's execution. The key is to handle skipping non-essentials separately from making essentials lighter.
```python
tier = choose_tier(get_remaining_ratio(), days_until_month_end())
log(f"running at tier={tier.name} model={tier.model}")

if not tier.new_work:
    finalize_in_progress_only()  # finish and save work-in-progress, then exit
else:
    run_required_subtasks(model=tier.model)
    if not tier.skip_optional:
        run_optional_subtasks(model=tier.model)  # only when there's room
```
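The two inputs can be computed in many ways. The following is one possible sketch using only the standard library. Counting today as a spendable day is an assumption, and `remaining_ratio_from_usage` is a hypothetical helper: how you obtain the used and cap figures depends on your provider and is not shown here.

```python
import calendar
from datetime import date
from typing import Optional


def days_until_month_end(today: Optional[date] = None) -> int:
    """Days left in the month, counting today (an assumption of this sketch)."""
    today = today or date.today()
    last_day = calendar.monthrange(today.year, today.month)[1]  # number of days in the month
    return last_day - today.day + 1


def remaining_ratio_from_usage(used: float, cap: float) -> float:
    """Fraction of the monthly cap still unused, clamped to the 0.0-1.0 range."""
    if cap <= 0:
        return 0.0
    return min(max(1.0 - used / cap, 0.0), 1.0)
```

Clamping matters because a usage figure slightly over the cap would otherwise produce a negative ratio and make the log lines harder to read.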
Where you draw the line between essential and non-essential subtasks determines operational quality. In my article-generation job, generating and saving the body is essential, while a second pass on phrasing and extra metadata optimization are non-essential. The thrifty tier drops the polish first; the demoted tier also downgrades the model. The minimum value that reaches the reader (a readable body) is protected all the way down to the minimal tier.
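One way to keep that line explicit is to tag each subtask rather than bury the decision in control flow. This is a sketch only: `Subtask`, `ARTICLE_JOB`, `select_subtasks`, and the subtask names are hypothetical labels for the article-generation example described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subtask:
    name: str
    essential: bool  # True = must run in every tier that allows new work


# Hypothetical subtask list for an article-generation job
ARTICLE_JOB = [
    Subtask("generate_body", True),
    Subtask("save_body", True),
    Subtask("polish_phrasing", False),
    Subtask("optimize_metadata", False),
]


def select_subtasks(subtasks: List[Subtask], skip_optional: bool, new_work: bool) -> List[str]:
    if not new_work:
        # No new generation in this tier; finishing in-progress work is handled separately.
        return []
    return [s.name for s in subtasks if s.essential or not skip_optional]
```

With the flags in data, changing what gets cut first is a one-line edit to the list instead of a change to the job's branching.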
Make degradation observable
The pitfall of graceful degradation is that it works so quietly you do not notice. You can end up running for weeks in the demoted tier, shipping low-quality output. So always record tier transitions and make them viewable across months.
```python
# record only the moment a tier changes (the transition, not every run)
def log_tier_change(prev: str, cur: str):
    if prev != cur:
        append_jsonl("tier_history.jsonl", {"from": prev, "to": cur, "ts": now()})
```
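The snippet relies on two helpers, `append_jsonl` and `now`, that are not defined in it. A minimal version might look like the following, assuming ISO 8601 UTC timestamps and one JSON object per line; both choices are assumptions of this sketch.

```python
import json
from datetime import datetime, timezone


def now() -> str:
    # ISO 8601 timestamp in UTC, e.g. "2025-01-29T12:00:00+00:00"
    return datetime.now(timezone.utc).isoformat()


def append_jsonl(path: str, record: dict) -> None:
    # Append one JSON object per line; the file is created on first write.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Append-only JSONL keeps each run's write cheap and makes the history easy to read back line by line.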
Review this history weekly and patterns emerge, like "the demoted tier kicks in for just three days each month-end." If that is acceptable, leave it; if it hurts quality, move to the real fix: raising the cap or revisiting output volume. I strongly recommend holding to the premise that degradation is a buffer, never a tool to justify permanently low-quality operation.
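For the weekly review, a small script can turn the transition log into days spent per tier. This is a sketch under assumptions: `days_in_tier` is a hypothetical helper, each line carries the "from", "to", and "ts" fields, "ts" is an ISO 8601 string, and each stay is attributed to the month in which it started.

```python
import json
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def days_in_tier(lines: Iterable[str], end: datetime) -> Dict[Tuple[str, str], float]:
    """Sum the days spent in each tier, keyed by (month the stay started, tier)."""
    events = [json.loads(line) for line in lines if line.strip()]
    # ISO 8601 strings in a single timezone sort chronologically as plain text.
    events.sort(key=lambda e: e["ts"])
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for i, ev in enumerate(events):
        start = datetime.fromisoformat(ev["ts"])
        # A stay lasts until the next transition, or until `end` for the latest one.
        stop = datetime.fromisoformat(events[i + 1]["ts"]) if i + 1 < len(events) else end
        totals[(start.strftime("%Y-%m"), ev["to"])] += (stop - start).total_seconds() / 86400
    return dict(totals)


# Made-up sample history for illustration
sample = [
    '{"from": "full", "to": "thrifty", "ts": "2025-01-20T00:00:00"}',
    '{"from": "thrifty", "to": "demoted", "ts": "2025-01-29T00:00:00"}',
    '{"from": "demoted", "to": "full", "ts": "2025-02-01T00:00:00"}',
]
report = days_in_tier(sample, end=datetime(2025, 2, 3))
```

If the timestamps in your log carry a UTC offset, pass a timezone-aware `end` as well, since Python refuses to subtract aware and naive datetimes.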
Where to start and where to stop
For a first step, you do not need all four tiers. What worked in practice was a two-tier version: just "full" and "minimal," switching to finishing work-in-progress once the remainder drops below a line. Even that avoids "everything is zero at month-end."
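A two-tier starting point can be as small as the sketch below. The function name `choose_tier_two_level` and the 0.05 floor are placeholders chosen for illustration; pick the line that matches how much quota finishing in-progress work actually costs you.

```python
def choose_tier_two_level(remaining_ratio: float, floor: float = 0.05) -> str:
    # Above the floor: run normally. Below it: only finish and save work in progress.
    return "full" if remaining_ratio >= floor else "minimal"
```

Because it returns the same tier names as the four-tier version, the intermediate tiers can be added later without changing the callers.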
From there, if the quality drop in production feels too steep, add thrifty and demoted in between. More tiers smooth the curve but add verification effort. At an indie developer scale, anywhere from two to four tiers was plenty.
Stop treating it as a choice between stopping and running at full power, and yield capability one tier at a time according to what is left. Quota is a constraint, but if you design how you yield, you can avoid letting that constraint take your output to zero.