A Schedule That Survives 429s: Backoff and Jitter for Agent Automation
Run agents in parallel and 429 rate-limit errors can cascade until every job fails. Drawing on my own indie-developer automation setup, here is how to design exponential backoff and jitter so the retries themselves don't create new congestion.
The night I ran three agents at once, the morning log was wall-to-wall 429s. One hit the limit, retried at the same instant, collided with the other two requests, and tripped the limit again — a textbook case of retries making the jam worse. As an indie developer I update four sites in parallel on autopilot, so I've spent plenty of nights with this "retries strangling themselves" phenomenon.
A rate limit isn't an error; it's a congestion signal. If everyone reacts to that signal at the same moment, the congestion never clears. Backoff and jitter are a design for deliberately staggering retry timing so the congestion resolves on its own.
Why fixed-interval retries are dangerous
The first thing most people write is a fixed-interval loop: "on failure, wait 5 seconds and retry." That looks fine for a single job. But when several jobs hit the limit at once, they all retry exactly 5 seconds later, in lockstep. The same number of requests floods in at the same instant, and they all fail again.
This "marching in step" is the thundering herd. Fixed intervals lock the failed jobs into the same rhythm, so instead of clearing congestion, they cement it. Calling Antigravity's Managed Agents in parallel hits the same trap, because multiple agents share a common API quota.
Exponential backoff alone isn't enough
The next improvement is exponential backoff, which doubles the wait on each attempt: 1s, 2s, 4s. Because the wait grows exponentially, runaway retry loops genuinely calm down.
But exponential backoff by itself still has a pitfall. Several jobs that failed together compute the same wait from the same formula. One second, two, four — they all climb the same staircase at the same time. The intervals widen, but the lockstep remains. Breaking that is jitter's job.
| Strategy | Wait on attempt 3 | Lockstep |
|---|---|---|
| Fixed interval | Always 5s | Fully aligned (worst) |
| Exponential only | Always 4s | Still aligned |
| Exponential + Full Jitter | Random 0–4s | Naturally spread |
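To make the table concrete, here is a small simulation sketch. It assumes 20 jobs that hit a 429 at the same instant and reach their third attempt together. It then counts how many retries land in the busiest one-second window. This is a simplification: in practice, jittered jobs would already have drifted apart on earlier attempts. The job count, the one-second bucket, and the helper names `attempt_delay` and `peak_collisions` are illustrative assumptions.

```python
import random
from collections import Counter

def attempt_delay(strategy: str, attempt: int, rng: random.Random,
                  base: float = 1.0, cap: float = 30.0, fixed: float = 5.0) -> float:
    """Wait before retry `attempt` (0-indexed) under one of the three strategies."""
    if strategy == "fixed":
        return fixed  # every job waits exactly the same amount
    ceiling = min(cap, base * (2 ** attempt))
    if strategy == "exponential":
        return ceiling  # grows per attempt, but is still identical across jobs
    if strategy == "full_jitter":
        return rng.uniform(0, ceiling)  # each job draws its own wait under the ceiling
    raise ValueError(f"unknown strategy: {strategy}")

def peak_collisions(strategy: str, jobs: int = 20, attempt: int = 2, seed: int = 0) -> int:
    """All jobs failed at t=0 and retry after one wait (attempt=2 is the 3rd attempt).
    Returns the size of the busiest one-second bucket, i.e. how many retries collide."""
    rng = random.Random(seed)
    buckets = Counter(int(attempt_delay(strategy, attempt, rng)) for _ in range(jobs))
    return max(buckets.values())
```

Running `for s in ("fixed", "exponential", "full_jitter"): print(s, peak_collisions(s))` shows all 20 retries piling into one window for the first two strategies. Full Jitter spreads them across several windows.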
There are several jitter variants, but the one I use in my indie automation is Full Jitter. You compute the exponential ceiling, then pick the wait uniformly at random between zero and that ceiling. The formula is simple: wrap the capped ceiling in random.uniform(0, ceiling).
```python
import random
import time

def backoff_sleep(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    # Grow the ceiling exponentially, then pick uniformly within it (Full Jitter)
    ceiling = min(cap, base * (2 ** attempt))
    delay = random.uniform(0, ceiling)
    time.sleep(delay)
    return delay
```
Setting a cap is the key. Forget it, and as attempts climb the wait balloons to minutes, eating into the next scheduled cycle. For overnight jobs I fix cap=30 seconds and refuse to wait longer.
Waiting isn't the whole story — respect what the server tells you, too. Many APIs return a Retry-After header alongside the 429. When it's present, prefer the server's instruction over your own math.
```python
import random

def next_delay(attempt: int, retry_after: float | None) -> float:
    if retry_after is not None:
        # Honor the server hint, add just a touch of jitter
        return retry_after + random.uniform(0, 1.0)
    # No hint: fall back to Full Jitter under a 30-second cap
    ceiling = min(30.0, 1.0 * (2 ** attempt))
    return random.uniform(0, ceiling)
```
Hold both a retry cap and an overall deadline
Even with jitter spreading out the timing, allowing unlimited retries creates a different problem. You keep waiting on a failure that isn't transient, and everything scheduled after it slips later. So I hold two caps.
One is the attempt count. For parallel jobs I cap it at six. The other is the job's overall deadline — a wall drawn in time: "give up if this job hasn't finished within ten minutes of starting." Count alone can over-persist when Retry-After is long, so the time wall earns its keep.
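```python
import time

class RateLimitError(Exception):
    """Stand-in for your API client's 429 exception (hypothetical name)."""

def run_with_retry(call, max_attempts: int = 6, deadline_s: float = 600.0):
    start = time.monotonic()
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError as e:
            # next_delay() is the function from the previous snippet
            delay = next_delay(attempt, getattr(e, "retry_after", None))
            if time.monotonic() - start + delay > deadline_s:
                raise  # the time wall; stop persisting, send to the dead letter
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            time.sleep(delay)
    raise RuntimeError("max attempts exceeded")
```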
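Raising the abandoned job to the caller and diverting it with the dead letter pattern links retrying and diverting cleanly. Deciding in advance how long to persist actually raises the overall completion rate, because a job that will never succeed stops holding up the rest of the schedule.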
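On the caller's side, the hand-off can be as small as the sketch below. Each job goes through a retry runner such as `run_with_retry`. Anything the runner gives up on is recorded and the batch moves on. `run_all` is a hypothetical helper, and the in-memory `dead_letter` list is an assumption standing in for whatever durable queue or file you actually use.

```python
import time
from typing import Any, Callable

def run_all(jobs: dict[str, Callable[[], Any]],
            runner: Callable[[Callable[[], Any]], Any],
            dead_letter: list[dict]) -> dict[str, Any]:
    """Run each job through `runner` (e.g. run_with_retry). Jobs the runner
    abandons are recorded in `dead_letter` instead of stopping the whole batch."""
    results: dict[str, Any] = {}
    for name, call in jobs.items():
        try:
            results[name] = runner(call)
        except Exception as exc:  # the runner gave up: divert, don't crash the batch
            dead_letter.append({"job": name, "error": repr(exc), "failed_at": time.time()})
    return results
```

The dead-letter entries can then be replayed in a later, quieter window rather than retried immediately.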
In parallel runs, scatter the "simultaneous launch"
It's worth scattering not just retry timing but the initial launch timing too. Running four sites, the change with the biggest payoff was dead simple: offset each site's job start time by tens of minutes. Antigravity 2.0 puts parallel agent execution front and center, so it's tempting to fire everything at once — but since they share a common quota, aligning launches makes you trip the limit on the very first call.
As a practical rule, I stick to three things. First, don't get greedy with parallelism, since the agents share a quota. Second, spread start times across off-peak hours so job peaks don't overlap. Third, always add jitter to retries and never use a fixed interval. Those three alone make 429 cascades far less likely.
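For the first rule, one common way to keep parallelism modest is a semaphore around the calls that draw on the shared quota. The sketch below uses `asyncio.Semaphore`. `run_limited` and the fake agent call inside `demo` are hypothetical stand-ins, and a limit of 2 is an arbitrary example value rather than a recommendation.

```python
import asyncio
import random
from typing import Awaitable, Callable

async def run_limited(jobs: list[Callable[[], Awaitable[object]]], limit: int = 2) -> list[object]:
    """Run jobs concurrently, but never more than `limit` at once, so the
    agents don't all draw on the shared quota at the same moment."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[object]]) -> object:
        async with sem:  # waits here while `limit` jobs are already in flight
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))

async def demo(n_jobs: int = 6, limit: int = 2) -> int:
    """Fake agent calls that track how many run at once; returns the peak."""
    in_flight = peak = 0

    async def fake_agent_call() -> str:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(random.uniform(0.01, 0.05))  # stand-in for the API call
        in_flight -= 1
        return "ok"

    results = await run_limited([fake_agent_call] * n_jobs, limit=limit)
    assert results == ["ok"] * n_jobs
    return peak
```

`asyncio.run(demo())` should never report more than two calls in flight, however many jobs you queue.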
Rate limits are an unavoidable fact of life, but how you respond decides whether a night ends in a wipeout or just a mild delay. Building one idea into the design, deliberately scattering timing, visibly changes the completion rate of agents running in parallel while you sleep. Start by swapping a single fixed-interval retry for Full Jitter. I hope this helps anyone stuck in the same 429 traffic jam.
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.