Spending Less on Failure Without Swallowing It: A Retry Budget for Agents Built Around Gemini 3.5 Flash

A design that separates an agent's retries from quietly swallowing errors: classify the failure first, then retry within a budget. Grounded in the speed and price of Gemini 3.5 Flash, with per-task caps, logging, and a weekly tightening routine.

When you hand work to an agent, it does not always succeed on the first shot. A test fails, a tool call errors out, the output is malformed. Telling it to "try again" is the natural reflex. But if you allow retries without thinking, the agent repeats the same failure at high speed, and before you notice, only your quota and your bill have grown.

I run four blogs as an indie developer, and most of the automation I run overnight is handed to agents. What that taught me is that a retry is one step away from swallowing a failure. Precisely because a fast, cheap model like Gemini 3.5 Flash sits at the core, the cost of retrying has dropped — which makes it easier to drift into the sloppy habit of "just keep it running." That is exactly why retries need a budget around them.

Swallowing and retrying are not the same

The first thing to separate is swallowing versus retrying. Swallowing means "pretend the failure never happened and move on"; retrying means "acknowledge the failure, change a condition, and try once more." Mix the two and errors keep spinning without ever landing in a log, and you lose the ability to trace the cause afterward.

I enforce this distinction at the code level. Before any retry, I classify why it failed, and a failure I cannot classify does not get retried. If it cannot be classified, throwing it back under the same conditions is unlikely to change the outcome.

Sort failures into three kinds first

In practice, agent failures settle into roughly three kinds. As a rule, only the transient ones earn a retry.

Kind	Example	Retry	Condition to change
Transient	Rate limit, timeout, brief network drop	Yes	Wait time (exponential backoff)
Input-driven	Broken JSON, missing context, vague instructions	Conditional	Prompt and supplied context
Permanent	Missing permission, nonexistent API, logically impossible	No	Hold until a human steps in

Sending permanent failures into a retry is the most typical way to waste a budget. An agent will not say "I can't"; it fails plausibly, over and over. Just stopping it here visibly lowers the bill.
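The table can be turned into a small classifier. What follows is a minimal sketch, assuming failures arrive as Python exceptions. The mapping from exception types to kinds, the "unknown" label, and the should_retry helper are illustrative assumptions, not the classifier used in this article.

```python
import json

# Hypothetical mapping from exception types to the three kinds in the table.
TRANSIENT = (TimeoutError, ConnectionError)
INPUT_DRIVEN = (json.JSONDecodeError, KeyError)
PERMANENT = (PermissionError, NotImplementedError)


def classify(error: BaseException) -> str:
    """Return 'transient', 'input', 'permanent', or 'unknown'."""
    # Check permanent first so it can never be mistaken for a retryable kind.
    if isinstance(error, PERMANENT):
        return "permanent"
    if isinstance(error, TRANSIENT):
        return "transient"
    if isinstance(error, INPUT_DRIVEN):
        return "input"
    return "unknown"  # could not be classified


def should_retry(kind: str) -> bool:
    """Only transient and input-driven failures are candidates for a retry."""
    return kind in ("transient", "input")
```

Anything that falls through to "unknown" is treated like a permanent failure here, which matches the rule that an unclassifiable failure does not get retried.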

Confine the retry cap to a single task

Make the budget a concrete number on each launch, not an abstract policy. What I use is a controller that watches three things at once — attempt count, cumulative cost, and cumulative time. The moment any one of them hits its cap, it stops.

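import time
from dataclasses import dataclass, field


class PermanentFailure(Exception):
    """A failure classified as permanent; never retried."""


class BudgetExhausted(Exception):
    """The attempt, cost, or time cap was reached without a success."""
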
@dataclass
class RetryBudget:
    max_attempts: int = 3        # at most 3 even for transient
    max_cost_usd: float = 0.15   # cap this task may spend
    max_seconds: float = 90.0    # stop runaways by time
    spent_usd: float = 0.0
    started: float = field(default_factory=time.monotonic)
    attempts: int = 0
 
    def allow(self) -> bool:
        if self.attempts >= self.max_attempts:
            return False
        if self.spent_usd >= self.max_cost_usd:
            return False
        if time.monotonic() - self.started >= self.max_seconds:
            return False
        return True
 
def run_with_budget(run_once, classify, budget: RetryBudget):
    last_error = None
    while budget.allow():
        budget.attempts += 1
        result = run_once()           # launch the agent once
        budget.spent_usd += result.cost_usd
        if result.ok:
            return result
        kind = classify(result.error)  # transient / input / permanent
        if kind not in ("transient", "input"):
            # permanent, or a failure that could not be classified: do not retry
            raise PermanentFailure(result.error)
        last_error = result.error
        if kind == "transient" and budget.allow():
            # exponential backoff, skipped when no further attempt is allowed
            time.sleep(min(2 ** budget.attempts, 20))
    raise BudgetExhausted(last_error)

The key is to always pass through classify. A bare while attempts < 3 loop throws even permanent failures three times. Adding the classification step by itself removes those two wasted attempts.
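To see how much of the time cap the backoff consumes, the waits can be listed directly. This is a sketch only: backoff_delay and total_wait are hypothetical helper names, the base of 2 and the 20-second ceiling are taken from the code above, and it assumes no sleep follows the final attempt.

```python
def backoff_delay(attempt: int, cap: float = 20.0) -> float:
    """Wait in seconds after the given 1-based attempt: 2**attempt, capped."""
    return float(min(2 ** attempt, cap))


def total_wait(max_attempts: int, cap: float = 20.0) -> float:
    """Total sleep if every attempt fails as transient.

    Assumption: nothing is slept after the final attempt, since no
    further attempt follows it.
    """
    return sum((backoff_delay(n, cap) for n in range(1, max_attempts)), 0.0)


print([backoff_delay(n) for n in range(1, 7)])  # [2.0, 4.0, 8.0, 16.0, 20.0, 20.0]
print(total_wait(3))  # 6.0
```

With three attempts, about 6 of the 90 seconds go to waiting, so the time cap is spent mostly on the attempts themselves.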

Use Flash's speed as the basis for "spending less"

Why can the caps sit at the values above? The basis is the speed and price of Gemini 3.5 Flash. Flash is built around being fast and inexpensive, so a single attempt is cheap and short. That is exactly what lets you keep the cap low while still affording enough retries.

The reverse is also true: with an expensive, slow model at the core, the same budget buys only one or two attempts. Model choice and retry design cannot be separated, and this is where that becomes concrete. For this reason I use a two-stage setup: I deliberately assign Flash to the unstable stages where retries cluster (external scraping, format normalization) and hand only the single final pass to a higher-tier model.
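The link between price and attempt count is simple arithmetic. The sketch below uses placeholder token counts and placeholder per-million-token prices chosen only for illustration; they are not published Gemini prices, and cost_per_attempt and affordable_attempts are hypothetical helper names.

```python
import math


def cost_per_attempt(input_tokens: int, output_tokens: int,
                     usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimated cost of one attempt from token counts and per-million prices."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


def affordable_attempts(max_cost_usd: float, per_attempt_usd: float,
                        max_attempts: int) -> int:
    """Attempts that fit entirely inside the cost cap, limited by the attempt cap."""
    if per_attempt_usd <= 0:
        raise ValueError("per_attempt_usd must be positive")
    # The small epsilon guards against float rounding just below a whole number.
    fits = math.floor(max_cost_usd / per_attempt_usd + 1e-9)
    return min(fits, max_attempts)


# Placeholder figures: 20k input tokens and 2k output tokens per attempt.
cheap = cost_per_attempt(20_000, 2_000, 0.50, 3.00)     # hypothetical low-cost model
pricey = cost_per_attempt(20_000, 2_000, 2.50, 15.00)   # hypothetical higher-tier model

print(affordable_attempts(0.15, cheap, 3))   # 3
print(affordable_attempts(0.15, pricey, 3))  # 1
```

Under these made-up numbers the cheap model is limited by the attempt cap, while the expensive one is limited by the cost cap after a single attempt.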

In one stage of an overnight batch, simply stopping the swallowing and adding classification cut the calls spent on retries by roughly 40%. That figure is my rough impression, not a controlled measurement. The three-attempt loops I had been wasting on permanent failures disappeared entirely.

Log retries to make wasted shots visible

Even with a budget, you cannot tighten it unless you can see where it is being spent. On every retry, leave a one-line record of the classification, cost, and elapsed time.

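import json
import time

def log_attempt(task_id, attempt, kind, cost, ok, elapsed_s=None):
    """Append one JSON line per attempt to retry_log.jsonl."""
    line = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),  # local time
        "task": task_id,
        "attempt": attempt,
        "kind": kind,        # transient / input / permanent
        "cost_usd": round(cost, 4),
        "elapsed_s": elapsed_s,  # seconds this attempt took, if measured
        "ok": ok,
    }
    with open("retry_log.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(line, ensure_ascii=False) + "\n")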

With this log you can tally "which task succeeds on which attempt." What first surprised me was that one specific task succeeded on the second attempt every single time. In other words the first attempt was structurally guaranteed to fail. That was not a retry problem but an input-driven failure I should have fixed in the prompt. Without the log, I would have kept absorbing it through retries forever.
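The tally described here takes only a few lines. This sketch assumes the JSONL rows carry the "task", "attempt", and "ok" fields written by log_attempt; success_attempts and never_first_try are hypothetical helper names.

```python
import json
from collections import Counter, defaultdict


def success_attempts(lines):
    """Map each task to a Counter of the attempt numbers on which it succeeded."""
    tally = defaultdict(Counter)
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        row = json.loads(raw)
        if row["ok"]:
            tally[row["task"]][row["attempt"]] += 1
    return dict(tally)


def never_first_try(tally):
    """Tasks that have successes on record but none on attempt 1."""
    return sorted(task for task, counts in tally.items()
                  if counts and counts[1] == 0)
```

Passing the open log file to success_attempts and the result to never_first_try lists the tasks whose first attempt never succeeds, which are the candidates for a prompt fix, not a higher cap.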

Tighten the budget weekly

A retry budget is not set once and forgotten. Once a week I read the log and tighten the caps. The procedure is simple.

1. Compute the average attempt count per task. Anything close to 1.0 has room to lower its cap.
2. Count the permanent rows. Many of them mean failures that should never be absorbed by retries are slipping in.
3. Surface tasks that only succeed on the second attempt or later, and fix the prompt or the context you supply.
4. Run a week on the tightened caps and check that BudgetExhausted does not climb too far.
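The first two steps reduce to a pair of counts over the log. The sketch below is one possible way to compute them, assuming each row has "task", "attempt", and "kind" fields and that every run of a task logs its attempts starting from 1. weekly_summary is a hypothetical helper name.

```python
import json
from collections import defaultdict


def weekly_summary(lines):
    """Return (average attempts per run by task, permanent rows by task).

    Assumption: each run starts at attempt 1, so the number of rows with
    attempt == 1 equals the number of runs of that task.
    """
    rows = defaultdict(int)
    runs = defaultdict(int)
    permanent = defaultdict(int)
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        row = json.loads(raw)
        task = row["task"]
        rows[task] += 1
        if row["attempt"] == 1:
            runs[task] += 1
        if row.get("kind") == "permanent":
            permanent[task] += 1
    # Tasks whose first attempt is missing from the log are left out.
    averages = {task: rows[task] / runs[task]
                for task in rows if runs.get(task, 0)}
    return averages, dict(permanent)
```

A task whose average sits at 1.0 is a candidate for a lower attempt cap, and any task that appears in the permanent counts deserves a look before the next run.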

Run those four steps and retries shift from "a device that hides failure" to "a device that measures failure." In the Dolice Labs automation, making this weekly tightening a habit nearly eliminated the unpredictable spikes in quota usage.

Allowing retries is not the sin. The problem is allowing them without a numeric boundary and without classifying the failure. Start by attaching just two caps, attempts and cost, to one agent you already run. From there, the log will tell you the next move.
