Spending Less on Failure Without Swallowing It: A Retry Budget for Agents Built Around Gemini 3.5 Flash
A design that separates an agent's retries from quietly swallowing errors: classify the failure first, then retry within a budget. Grounded in the speed and price of Gemini 3.5 Flash, with per-task caps, logging, and a weekly tightening routine.
When you hand work to an agent, it does not always succeed on the first shot. A test fails, a tool call errors out, the output is malformed. Telling it to "try again" is the natural reflex. But if you allow retries without thinking, the agent repeats the same failure at high speed, and before you notice, only your quota and your bill have grown.
I run four blogs as an indie developer, and most of the automation I run overnight is handed to agents. What that taught me is that a retry is one step away from swallowing a failure. Precisely because a fast, cheap model like Gemini 3.5 Flash sits at the core, the cost of retrying has dropped — which makes it easier to drift into the sloppy habit of "just keep it running." That is exactly why retries need a budget around them.
Swallowing and retrying are not the same
The first thing to separate is swallowing versus retrying. Swallowing means "pretend the failure never happened and move on"; retrying means "acknowledge the failure, change a condition, and try once more." Mix the two and errors keep spinning without ever landing in a log, and you lose the ability to trace the cause afterward.
I enforce this distinction at the code level. Before any retry, I classify why it failed, and a failure I cannot classify does not get retried. If it cannot be classified, throwing it back under the same conditions is unlikely to change the outcome.
Sort failures into three kinds first
In practice, agent failures settle into roughly three kinds. As a rule, only the transient ones earn a retry.
Sending permanent failures into a retry is the most typical way to waste a budget. An agent will not say "I can't"; it fails plausibly, over and over. Just stopping it here visibly lowers the bill.
✦
Thank you for reading this far.
Continue Reading
What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.
WHAT YOU'LL LEARN
✦How to budget retries by separating them from error-swallowing and classifying failures first
✦How to set a per-task retry cap grounded in the speed and price of Gemini 3.5 Flash
✦A weekly routine for spotting wasted retries in your logs and tightening the budget
Secure payment via Stripe · Cancel anytime
✦
Unlock This Article
Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.
Make the budget a concrete number on each launch, not an abstract policy. What I use is a controller that watches three things at once — attempt count, cumulative cost, and cumulative time. The moment any one of them hits its cap, it stops.
import timefrom dataclasses import dataclass, field@dataclassclass RetryBudget: max_attempts: int = 3 # at most 3 even for transient max_cost_usd: float = 0.15 # cap this task may spend max_seconds: float = 90.0 # stop runaways by time spent_usd: float = 0.0 started: float = field(default_factory=time.monotonic) attempts: int = 0 def allow(self) -> bool: if self.attempts >= self.max_attempts: return False if self.spent_usd >= self.max_cost_usd: return False if time.monotonic() - self.started >= self.max_seconds: return False return Truedef run_with_budget(run_once, classify, budget: RetryBudget): last_error = None while budget.allow(): budget.attempts += 1 result = run_once() # launch the agent once budget.spent_usd += result.cost_usd if result.ok: return result kind = classify(result.error) # transient / input / permanent if kind == "permanent": raise PermanentFailure(result.error) # do not retry if kind == "transient": time.sleep(min(2 ** budget.attempts, 20)) # exponential backoff last_error = result.error raise BudgetExhausted(last_error)
The key is to always pass through classify. Spin on while attempts < 3 alone and you will throw even permanent failures three times. Inserting classification alone erases those two wasted attempts.
Use Flash's speed as the basis for "spending less"
Why can the caps sit at the values above? The basis is the speed and price of Gemini 3.5 Flash. Flash is built around being fast and inexpensive, so a single attempt is cheap and short. That is exactly what lets you keep the cap low while still affording enough retries.
The reverse is also true: with an expensive, slow model at the core, the same budget buys only one or two attempts. The obvious fact that model choice is inseparable from retry design bites here. For this reason I deliberately assign Flash to the unstable stages where retries cluster (external scraping, format normalization), and hand only the single final pass to a higher-tier model — a two-stage setup.
Measuring one stage of an overnight batch, simply stopping the swallowing and adding classification cut the calls spent on retries by roughly 40% by feel. The three throws I had been wasting on permanent failures vanished entirely.
Log retries to make wasted shots visible
Even with a budget, you cannot tighten it unless you can see where it is being spent. On every retry, leave a one-line record of the classification, cost, and elapsed time.
With this log you can tally "which task succeeds on which attempt." What first surprised me was that one specific task succeeded on the second attempt every single time. In other words the first attempt was structurally guaranteed to fail. That was not a retry problem but an input-driven failure I should have fixed in the prompt. Without the log, I would have kept absorbing it through retries forever.
Tighten the budget weekly
A retry budget is not set once and forgotten. Once a week I read the log and tighten the caps. The procedure is simple.
Compute the average attempt count per task. Anything close to 1.0 has room to lower its cap.
Count the permanent rows. Many of them mean failures that should never be absorbed by retries are slipping in.
Surface tasks that only succeed on the second attempt or later, and fix the prompt or the context you supply.
Run a week on the tightened caps and check that BudgetExhausted does not climb too far.
Run those four steps and retries shift from "a device that hides failure" to "a device that measures failure." In the Dolice Labs automation, making this weekly tightening a habit nearly eliminated the unreadable spikes in quota.
Allowing retries is not the sin. The problem is allowing them without a numeric boundary and without classifying the failure. Start by attaching just one cap — attempts and cost — to an agent you already run. From there, the log will tell you the next move.
Share
Thank You for Reading
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.