Treating the Managed Agent as a Cost-Capped Throwaway Worker: Isolating Untrusted Input from Production
How to use the Managed Antigravity Agent, now in Gemini API public preview, as a throwaway worker that is born and discarded per request. Cost caps, isolation, and idempotency with implementation steps.
A Managed Agent called antigravity-preview-05-2026 has reached public preview in the Gemini API. Inside a sandbox it can plan, reason, run code, manipulate files, and even browse the web autonomously. When I first tried it, I treated it as a "resident partner I could hand anything to," and quickly changed my mind. Keeping it resident meant that cost, permissions, and state all crept up bit by bit until they became unmanageable.
I switched to launching it for a single request and discarding it as soon as it returns a result: not a persistent assistant but a throwaway worker. Framed that way, the Managed Agent became much easier to handle. Here is why throwaway suits it, and how to implement it.
Why "throwaway" suits untrusted input
The appeal of the Managed Agent is that it runs in a sandbox isolated from your own environment. That means it is ideal for processing untrusted external input.
Summarizing the contents of a URL a user sent you, or reshaping JSON of unknown provenance pulled from an external API: running tasks like these directly in your own production environment is risky. Hand them to a throwaway worker running in a sandbox, and whatever happens is confined to that one run and leaves no trace in production.
Conversely, keeping it resident instead of throwaway raises the concern that state from the previous request leaks into the next. The more you handle untrusted input, the more being stateless acts as a safety device. I try to see this statelessness not as a feature limitation but as a design advantage.
Confine the cap to one request
The crux of a throwaway worker is always attaching a cost and time ceiling to each launch. With a resident agent, adding caps after the fact is hard; with throwaway you can state them explicitly on every launch.
```python
from google import genai

client = genai.Client()

def run_ephemeral(task_prompt: str, untrusted_input: str) -> dict:
    # 1 request = 1 worker. Drop the reference when done
    resp = client.agents.run(
        model="antigravity-preview-05-2026",
        instructions=task_prompt,
        input=untrusted_input,
        config={
            "max_cost_usd": 0.20,    # ceiling this worker may spend
            "timeout_s": 120,        # stop runaways by time
            "tools": ["web_fetch"],  # hand over only the tools needed
            "sandbox": "isolated",   # cut off access to production files
        },
    )
    return {"ok": resp.status == "completed", "output": resp.output}
```
What works here is the two-stage guard of max_cost_usd and timeout_s. The cost cap stops token runaways; the timeout stops infinite loops and stuck waiting states. In my own usage, before I added these two limits, a few requests a month cost several times what I expected; after adding the caps, that dropped to zero.
Narrowing tools to only what's needed matters too. There's no reason to hand file-write access to a worker that only fetches the web. If the worker never receives a tool, then even if its instructions get hijacked (for example by text embedded in a fetched page), that tool cannot be abused and the damage doesn't spread.
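One way to keep the tool list narrow is to declare an allowlist per task type and build each launch config from it. The sketch below is only an illustration: the task names, the `file_read` tool name, and the `build_config` helper are assumptions of mine, while the config keys mirror the ones used in the launch code above.

```python
# Hypothetical per-task allowlists. Only "web_fetch" appears in this article;
# "file_read" and the task names are placeholders for illustration.
TASK_TOOLS = {
    "summarize_url": ["web_fetch"],
    "reshape_json": [],  # pure transformation, no tools at all
    "inspect_upload": ["file_read"],
}

def build_config(task: str, max_cost_usd: float = 0.20, timeout_s: int = 120) -> dict:
    """Return a launch config that grants only the tools this task needs."""
    if task not in TASK_TOOLS:
        # Unknown task types are rejected instead of getting a default tool set.
        raise ValueError(f"no tool allowlist defined for task {task!r}")
    return {
        "max_cost_usd": max_cost_usd,
        "timeout_s": timeout_s,
        "tools": list(TASK_TOOLS[task]),  # copy so callers cannot mutate the allowlist
        "sandbox": "isolated",
    }
```

Failing closed on an unknown task type means a new kind of work has to be given an explicit allowlist before it can launch at all.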
Don't write the result a throwaway worker returns straight into your production database. Even though the worker is isolated, its output is a derivative of untrusted external input.
I insert three stages between the worker and production:
1. Place the worker's output in a quarantine area first.
2. Check the format mechanically with schema validation (reject unexpected keys or types).
3. Bring only the validated items into production.
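```python
def adopt_result(raw: dict) -> bool:
    # Don't let anything that fails the expected schema into production
    required = {"title", "summary", "source_url"}
    # Reject missing keys and unexpected extra keys alike
    if not isinstance(raw, dict) or set(raw.keys()) != required:
        return False
    # Reject unexpected types: every field must be a string
    if not all(isinstance(raw[k], str) for k in required):
        return False
    if len(raw["summary"]) > 2000:  # reject abnormally long output
        return False
    commit_to_production(raw)  # your own write path into production (not shown here)
    return True
```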
Inserting this quarantine step reduces the damage when the worker returns something strange from "production gets polluted" to "one item gets rejected." In my experience running it, isolation is needed on the output side as well as the input side.
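The three stages can be sketched as a small in-memory pipeline. This is a minimal illustration, not the article's production code: the `Intake` class, its list-backed stores, and the `is_valid` helper are assumptions standing in for a real quarantine table and a real production write path.

```python
from dataclasses import dataclass, field

REQUIRED_KEYS = {"title", "summary", "source_url"}

def is_valid(item) -> bool:
    """Stage 2: mechanical schema check on keys, types, and length."""
    if not isinstance(item, dict) or set(item.keys()) != REQUIRED_KEYS:
        return False
    if not all(isinstance(item[k], str) for k in REQUIRED_KEYS):
        return False
    return len(item["summary"]) <= 2000

@dataclass
class Intake:
    """In-memory stand-ins for a quarantine area and a production store."""
    quarantine: list = field(default_factory=list)
    production: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def receive(self, raw) -> None:
        # Stage 1: worker output always lands in quarantine first.
        self.quarantine.append(raw)

    def promote(self) -> int:
        # Stage 3: move only validated items into production.
        promoted = 0
        while self.quarantine:
            item = self.quarantine.pop(0)
            if is_valid(item):
                self.production.append(item)
                promoted += 1
            else:
                self.rejected.append(item)  # kept for inspection, never promoted
        return promoted
```

Keeping rejected items in their own list, rather than dropping them, makes it possible to inspect later what the worker actually returned.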
Make the same input return the same result
A throwaway worker can be re-run casually on failure, but there's no point if re-running causes side effects twice. So make intake idempotent.
Concretely, build a hash key from the external input and use that key to decide whether it's already been taken in.
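```python
import hashlib

def idempotency_key(untrusted_input: str) -> str:
    # First 16 hex characters (64 bits) of the SHA-256 digest of the raw input
    return hashlib.sha256(untrusted_input.encode()).hexdigest()[:16]

def adopt_once(key: str, raw: dict) -> bool:
    if already_adopted(key):  # never take in the same input twice
        return True
    if not adopt_result(raw):
        return False
    mark_adopted(key)  # hypothetical helper: record the key so a re-run is skipped
    return True
```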
With this in place, even if a worker times out and restarts, you avoid accidents like summarizing the same URL twice and saving it in duplicate. Because it's throwaway, re-running becomes the premise; because re-running is the premise, idempotency starts to matter.
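To make the re-run case concrete, here is a hedged sketch of a retry loop that launches a fresh worker on timeout and takes in at most one result per input. The `run_with_retry` name, the dict used as the adoption store, and the use of the built-in `TimeoutError` to signal a timed-out worker are all assumptions for illustration; a real setup would use a durable store and whatever error the SDK actually raises.

```python
import hashlib
from typing import Callable

def run_with_retry(
    untrusted_input: str,
    run_worker: Callable[[str], dict],
    adopted: dict,
    max_attempts: int = 3,
) -> dict:
    """Re-run a throwaway worker on timeout, adopting at most one result per input."""
    key = hashlib.sha256(untrusted_input.encode("utf-8")).hexdigest()
    if key in adopted:
        return adopted[key]  # already taken in: skip the worker entirely
    last_error = None
    for _ in range(max_attempts):
        try:
            result = run_worker(untrusted_input)
        except TimeoutError as exc:
            last_error = exc  # discard this worker and launch a fresh one
            continue
        adopted[key] = result
        return result
    raise RuntimeError(f"worker failed after {max_attempts} attempts") from last_error
```

Calling it twice with the same input launches a worker only for the first call; the second call returns the stored result without spending anything.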
Pitfalls and how to avoid them
A few pitfalls I ran into while operating it.
First, setting the cost cap too low gets work cut off midway. Squeeze max_cost_usd down to something like 0.05 and a complex task stops before completion, returning incomplete output. I set it in the 0.10–0.30 range depending on the weight of the work, and only raise the cap for tasks where cut-offs happen often.
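The cap selection described here can be sketched as a small lookup plus a bump for tasks that are cut off often. The weight labels, the 20% cut-off threshold, the 1.5x bump, and the 0.50 hard ceiling are illustrative assumptions of mine; only the 0.10 to 0.30 base range comes from the text above.

```python
# Base caps per task weight, inside the range mentioned above.
CAP_BY_WEIGHT = {"light": 0.10, "medium": 0.20, "heavy": 0.30}

def choose_cap(
    weight: str,
    recent_runs: int,
    recent_cutoffs: int,
    cutoff_rate_threshold: float = 0.2,  # assumed: what counts as "often"
    bump: float = 1.5,                   # assumed multiplier when cut-offs are frequent
    hard_ceiling: float = 0.50,          # assumed absolute limit per worker
) -> float:
    """Pick max_cost_usd for one launch, raising it only for cut-off-prone tasks."""
    base = CAP_BY_WEIGHT[weight]
    if recent_runs == 0:
        return base  # no history yet: use the base cap
    if recent_cutoffs / recent_runs > cutoff_rate_threshold:
        return min(round(base * bump, 2), hard_ceiling)
    return base
```

The hard ceiling keeps the automatic bump from quietly undoing the cap it is supposed to protect.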
Second, the preview's rate limits. During the public preview, launching many workers in a short window gets throttled. If you design throwaway workers to run in parallel at scale, you need a queue on the launching side to control the flow. Neglecting this once caused about half of a production batch to fail.
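A launch-side limit can be as simple as a semaphore around each launch. The sketch below assumes an async `launch` coroutine that you supply (hypothetical here) and caps how many workers are in flight at once; it limits concurrency, not requests per minute, so a strict rate quota would need an additional delay or token bucket.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

async def run_throttled(
    inputs: Iterable[str],
    launch: Callable[[str], Awaitable[dict]],
    max_concurrent: int = 4,  # assumed value; tune it to the preview's limits
) -> list:
    """Launch one worker per input, with at most max_concurrent in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(item: str):
        async with sem:  # wait here until a slot frees up
            return await launch(item)

    # return_exceptions=True keeps one failed worker from cancelling the batch;
    # failures come back as exception objects in the result list.
    return await asyncio.gather(*(guarded(i) for i in inputs), return_exceptions=True)
```

Results come back in input order, so each failure can be matched to its input and re-queued on its own.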
Third, the sandbox startup cost. Since a sandbox is spun up per request, throwing very light work at a throwaway worker makes the startup overhead larger than the work itself. For reshaping that finishes in a second, I judge it faster and cheaper to handle it locally rather than send it to the Managed Agent. As an indie developer watching every dollar of API spend, that distinction matters.
What to try next
Start by picturing one process you currently feel is "a little scary to run directly in production." You almost certainly have something that handles URLs or JSON coming from outside. Carve that out into a throwaway worker, and you'll feel where the Managed Agent fits.
I'm still refining the design as I live with the preview, but committing to throwaway rather than persistence cleared up the outlook on both cost and permissions. I hope it helps anyone else wrestling with how to handle untrusted input.
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.