Running Gemini's Managed Agents API: Where Cloud Execution Ends and My Local Agents Begin
A hands-on record of launching Gemini's Managed Agents (public preview) from Python — polling, artifact retrieval, and a cost guard — plus five criteria I use to decide what stays on my local CLI agents.
With the June 18 shutdown of Gemini CLI approaching, I had been steadily migrating my local scripts over to Antigravity CLI. Partway through that work, something else caught my attention: Managed Agents, the public-preview layer that landed on the Gemini API side.
One API call. Behind it, reasoning, tool use, and code execution — all inside an isolated Linux environment. I knew the outline from the I/O 2026 announcements, but actually running it produced two competing reactions: "this overlaps with my local agent setup" and "no, this deserves to be treated as something else entirely."
After a few days of moving real jobs across, I have settled on a third view: draw the boundary properly and you will not want to give up either side. This article is the record of that experiment. All code reflects the public preview as of this writing (June 12, 2026); field names and behavior may change before GA.
What Managed Agents actually takes off your plate
Antigravity 2.0 spans five surfaces: the desktop app, the CLI, the SDK, the Managed Agents API, and the enterprise path. The first three are entry points for running agents on your own machine. Managed Agents is different in kind — the execution environment itself lives on Google's infrastructure.
You make one call. On the other side, Gemini 3.5 Flash reasons, uses tools as needed, executes code in an isolated Linux sandbox, and hands back the results.
Building that yourself means owning Docker containers, per-run cleanup, privilege separation, and network restrictions. As an indie developer I ran an isolated execution setup for my publishing pipeline in self-managed containers for a while, and the scaffolding for "don't break anything, don't leak anything" ended up larger than the actual job logic. Moving that entire perimeter to the far side of an API is, to me, the real point of this feature.
What it is not suited for is interactive work that reads deeply through a local repository. I will come back to that boundary later.
The minimal launch: a run is a job, not a request
This was the first mental shift. Unlike a synchronous generateContent call, a Managed Agents execution is an asynchronous job. You submit it, get back a run ID, and the state moves through QUEUED → RUNNING → SUCCEEDED (or FAILED / TIMED_OUT).
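To make the job lifecycle concrete, here is a minimal sketch of the states as a small enum with a terminal-state check. The state names are the ones used in this article (CANCELLED appears in the polling code later on); the `RunState` class and `is_terminal` helper are hypothetical names of my own, not part of the SDK.

```python
from enum import Enum


class RunState(str, Enum):
    """Run states as named in this article (preview-era; may change)."""
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    TIMED_OUT = "TIMED_OUT"
    CANCELLED = "CANCELLED"


# States after which a run will not change again.
TERMINAL_STATES = frozenset(
    {RunState.SUCCEEDED, RunState.FAILED, RunState.TIMED_OUT, RunState.CANCELLED}
)


def is_terminal(state: str) -> bool:
    """Return True if the run has finished, successfully or not.

    Unknown state strings are treated as non-terminal, so a state added
    later in the preview is not mistaken for completion.
    """
    try:
        return RunState(state) in TERMINAL_STATES
    except ValueError:
        return False
```

Treating unknown states as non-terminal is a design choice: the caller keeps waiting until its own deadline instead of acting on a state it does not understand.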
Here is a small job that audits dependency upgrades:
    import os
    from google import genai

    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

    run = client.agents.runs.create(
        model="gemini-3.5-flash",
        instructions=(
            "Read the uploaded package.json and produce a Markdown list of "
            "major-version upgrade candidates, with brief compatibility notes."
        ),
        input_files=["./package.json"],
        sandbox={"timeout_seconds": 600},
        metadata={"job": "dep-audit-2026-06-12"},
    )
    print(run.id, run.state)  # e.g. runs/8f3c... QUEUED
What this solves: launching a code-execution-backed research job with zero environment setup. sandbox.timeout_seconds is the server-side execution cap. The metadata job name is there for idempotency, which I will get to shortly.
In my environment, the time from submission to sandbox acquisition (the transition to RUNNING) averaged around 8 seconds. That is comparable to spinning up a local Docker container. I had braced for cold-start pain and was pleasantly surprised.
Collect outputs and artifacts immediately — not later
Once the run finishes, you collect the text output and any artifacts (files generated inside the sandbox):
    result = client.agents.runs.get(run.id)
    if result.state == "SUCCEEDED":
        print(result.output_text)  # the agent's final response
        for artifact in result.artifacts:
            data = client.agents.artifacts.download(artifact.id)
            with open(artifact.filename, "wb") as f:
                f.write(data)
            print(f"saved: {artifact.filename}")
    elif result.state == "FAILED":
        print(result.error.message)
One caution. The sandbox is destroyed when the run ends, and artifact retention is not indefinite — in my preview testing, artifacts from runs older than 48 hours were no longer retrievable. If retrieval is something you plan to do "later," your outputs will quietly disappear. My recommendation: the moment you detect completion, download everything in the same code path.
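One way to make "download everything in the same code path" hard to forget is to wrap it in a single helper. The sketch below is an illustration, not the SDK's own API: `collect_artifacts` is a hypothetical name, the run object is assumed to have the `artifacts`, `.id`, and `.filename` shape used in the snippet above, and the download function is passed in as a plain callable.

```python
from pathlib import Path


def collect_artifacts(run, download, dest_dir):
    """Save every artifact of a finished run under dest_dir.

    `run` is any object with an `artifacts` iterable whose items expose
    `.id` and `.filename`. `download` is a callable mapping an artifact ID
    to bytes, for example `client.agents.artifacts.download`.
    Returns the list of paths written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, artifact in enumerate(run.artifacts):
        # Keep only the final path component so a name like "../x" coming
        # out of the sandbox cannot write outside dest_dir.
        name = Path(artifact.filename).name
        if name in ("", ".", ".."):
            name = f"artifact-{index}"
        target = dest / name
        target.write_bytes(download(artifact.id))
        saved.append(target)
    return saved
```

Calling this directly in the branch that detects SUCCEEDED, with a per-run destination directory, keeps outputs from different runs from overwriting each other.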
I also wavered on when to use output_text versus artifacts. My current rule: text is for humans, files are for downstream processing. Even a Markdown report is easier to handle as an artifact if another job will consume it.
Designing the wait: polling, deadlines, idempotency
As of the preview I could not find a completion webhook, so I settled on polling. If you intend to put this on a schedule, this section is effectively the real implementation.
    import time

    def wait_for_run(client, run_id, deadline_seconds=900):
        """Wait for run completion with exponential backoff.
        The deadline is a client-side failsafe."""
        started = time.monotonic()
        interval = 2
        while True:
            run = client.agents.runs.get(run_id)
            if run.state in ("SUCCEEDED", "FAILED", "TIMED_OUT", "CANCELLED"):
                return run
            if time.monotonic() - started > deadline_seconds:
                client.agents.runs.cancel(run_id)
                raise TimeoutError(f"client deadline exceeded: {run_id}")
            time.sleep(interval)
            interval = min(interval * 1.5, 30)  # 2s → 3s → ... capped at 30s
Three design decisions are baked in here:
Hold two timeouts. sandbox.timeout_seconds (server side) and deadline_seconds (client side) do different jobs. The former stops a runaway agent; the latter stops your own polling loop from hanging forever if the run never reports a terminal state. Rely only on the server side and your script may simply never return.
Back off exponentially. Polling at a fixed 2 seconds just piles up status requests on long jobs. Easing up to a 30-second ceiling was plenty.
Search by metadata before resubmitting. If a network drop loses you the run ID, you end up in the unpleasant state of "billed, but the result is unfindable." I now write the job name to a local log one line before calling create, and on any retry I first search existing runs by metadata.job before submitting again. A poor man's write-ahead log.
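For the second point, it helps to see how few status requests the backoff actually produces. The sketch below reproduces the same schedule (start at 2 seconds, multiply by 1.5, cap at 30) as a generator; `backoff_intervals` and `polls_within` are hypothetical helper names, and the count ignores the time each status request itself takes.

```python
from itertools import islice


def backoff_intervals(start=2.0, factor=1.5, cap=30.0):
    """Yield sleep intervals: start, start*factor, ... never above cap."""
    interval = start
    while True:
        yield interval
        interval = min(interval * factor, cap)


def polls_within(total_seconds, **kwargs):
    """Count the sleeps that fit inside total_seconds of waiting."""
    elapsed = 0.0
    count = 0
    for interval in backoff_intervals(**kwargs):
        if elapsed + interval > total_seconds:
            return count
        elapsed += interval
        count += 1


# First eight intervals of the schedule used in wait_for_run.
print(list(islice(backoff_intervals(), 8)))
# Sleeps that fit in a 600-second job, with backoff and at a fixed 2 s.
print(polls_within(600), polls_within(600, factor=1.0))
```

Under these assumptions a 600-second wait comes to 24 sleeps with backoff against 300 at a fixed 2 seconds; real counts will be slightly lower because each request takes time too.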
That third one is not hypothetical — a Wi-Fi drop cost me an ID and I double-submitted the same job. The damage was small, but I was glad it happened before anything reached cron.
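The write-ahead pattern can be sketched as follows. This is an illustration under assumptions: `submit_once` is a hypothetical helper, and because I do not want to guess at the preview's list or search API, both the submit call and the metadata search are passed in as callables rather than written against specific SDK methods.

```python
import json
import time
from pathlib import Path


def submit_once(job_name, create, find_by_job, log_path):
    """Submit a job at most once per job_name.

    `create(job_name)` submits the run and returns the run object.
    `find_by_job(job_name)` returns an existing run or None; how it searches
    (for example, filtering runs on metadata) is left to the caller.
    """
    log = Path(log_path)
    already_logged = False
    if log.exists():
        lines = [ln for ln in log.read_text().splitlines() if ln.strip()]
        already_logged = any(json.loads(ln)["job"] == job_name for ln in lines)

    if already_logged:
        # A previous attempt got at least as far as the intent record,
        # so a run may exist even though its ID was never seen locally.
        existing = find_by_job(job_name)
        if existing is not None:
            return existing
    else:
        # Record intent BEFORE calling create, so a crash or network drop
        # in between still leaves a job name to search by.
        with log.open("a") as f:
            f.write(json.dumps({"job": job_name, "ts": time.time()}) + "\n")

    return create(job_name)
```

The log line costs almost nothing, and it turns "did I already submit this?" from a guess into a lookup. If the search finds nothing for a logged job, the sketch submits again, on the assumption that the earlier attempt never reached the server.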
Put a budget cap in front of any scheduled run
On cost: in my trials, a light job like the package.json audit came to roughly $0.08–0.12 per run, and a job that ran an npm build plus tests landed around $0.30–0.45 (token usage plus sandbox time, at preview-era rates; treat these as order-of-magnitude only). Small numbers, but they add up on a schedule, so every scheduled job first checks a local ledger against a daily cap:
    import json
    from datetime import date
    from pathlib import Path

    LEDGER = Path.home() / ".agent_budget.json"
    DAILY_CAP_USD = 2.00  # daily ceiling; past this, do not submit

    def spend_or_block(estimated_usd: float) -> bool:
        today = date.today().isoformat()
        ledger = json.loads(LEDGER.read_text()) if LEDGER.exists() else {}
        used = ledger.get(today, 0.0)
        if used + estimated_usd > DAILY_CAP_USD:
            return False  # skip; defer to tomorrow or rethink the job
        ledger[today] = used + estimated_usd
        LEDGER.write_text(json.dumps(ledger))
        return True
A median of past actuals is a perfectly serviceable estimate. Precision is not the point — the principle "never put an uncapped job on cron" is.
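The median estimate is a few lines with the standard library. In this sketch `estimate_cost` and its `fallback_usd` default are hypothetical choices of mine; the fallback is a deliberately pessimistic guess for jobs that have no history yet.

```python
from statistics import median


def estimate_cost(past_actuals_usd, fallback_usd=0.50):
    """Estimate the next run's cost as the median of past actual costs.

    The median is used rather than the mean so that one unusually
    expensive run does not inflate every later estimate.
    """
    if not past_actuals_usd:
        return fallback_usd
    return median(past_actuals_usd)


# Example history for one job, in USD (illustrative numbers only).
history = [0.09, 0.11, 0.10, 0.42, 0.10]
print(estimate_cost(history))
```

The result would be passed straight to spend_or_block, and the actual cost appended to the job's history once the run completes.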
Cloud jobs versus local jobs: my five criteria
After a few days of sorting real work, my decision rules settled into five questions:
Do the inputs and outputs close over files? If yes, it leans cloud. Jobs that take files in and hand files back are a natural fit for Managed Agents.
Does it need my local credentials or secrets? If yes, it stays local. You choose which files enter the sandbox, but I am not willing to ship API keys or signing keys off my machine.
Has environment reproducibility been a recurring pain? The longer a job has suffered from "works on my machine," the more it benefits from a pristine sandbox on every run.
Will scheduled parallelism grow? If so, cloud — but only after the budget guard from the previous section is in place.
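The questions listed above can be written down as a small routing function, which is handy when sorting a backlog of scripts. This is a sketch of one possible encoding: the `Job` fields and `route` function are hypothetical names, and the precedence (secrets veto the cloud first, scheduled fan-out goes to the cloud only behind a budget guard) is my own assumption about how the answers combine.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    file_in_file_out: bool       # inputs and outputs close over files
    needs_local_secrets: bool    # API keys, signing keys, local credentials
    reproducibility_pain: bool   # recurring "works on my machine" trouble
    scheduled_parallelism: bool  # expected to fan out on a schedule


def route(job: Job, budget_guard_in_place: bool) -> str:
    """Return "local" or "cloud" for a job."""
    if job.needs_local_secrets:
        return "local"  # secrets never leave the machine
    if job.scheduled_parallelism and not budget_guard_in_place:
        return "local"  # hold back until a spending cap exists
    if job.file_in_file_out or job.reproducibility_pain or job.scheduled_parallelism:
        return "cloud"
    return "local"  # nothing argues for moving it


release_notes = Job("release-notes", True, False, False, False)
app_signing = Job("app-signing", True, True, False, False)
print(route(release_notes, budget_guard_in_place=True))
print(route(app_signing, budget_guard_in_place=True))
```

A function like this will not settle borderline cases, but it forces each question to be answered explicitly for every job instead of by feel.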
In my own split, the jobs I moved to Managed Agents were release-note formatting for the App Store and Google Play, and the monthly AdMob report aggregation — exactly the file-in, file-out routine work that quietly accumulates when you run multiple apps and sites solo, as I do at Dolice Labs. Repository edits and reviews stay with local agy. Once the line was drawn, it became clear the two are a division of labor, not a replacement.
The first job to move
If you are about to try this, pick exactly one script whose inputs and outputs close over files, and port it using the create → wait_for_run → collect pattern above. Once one job runs end to end, the instinct for sorting the rest comes naturally.
The preview is still a moving target, and I expect to keep redrawing this boundary myself. If you are navigating the same migration window, I hope these notes save you a detour.