Running Gemini's Managed Agents API: Where Cloud Execution Ends and My Local Agents Begin

A hands-on record of launching Gemini's Managed Agents (public preview) from Python — polling, artifact retrieval, and a cost guard — plus five criteria I use to decide what stays on my local CLI agents.

With the June 18 shutdown of Gemini CLI approaching, I had been steadily migrating my local scripts over to Antigravity CLI. Partway through that work, something else caught my attention: Managed Agents, the public-preview layer that landed on the Gemini API side.

One API call. Behind it, reasoning, tool use, and code execution — all inside an isolated Linux environment. I knew the outline from the I/O 2026 announcements, but actually running it produced two competing reactions: "this overlaps with my local agent setup" and "no, this deserves to be treated as something else entirely."

After a few days of moving real jobs across, I have settled on a third view: draw the boundary properly and you will not want to give up either side. This article is the record of that experiment. All code reflects the public preview as of this writing (June 12, 2026); field names and behavior may change before GA.

What Managed Agents actually takes off your plate

Antigravity 2.0 spans five surfaces: the desktop app, the CLI, the SDK, the Managed Agents API, and the enterprise path. The first three are entry points for running agents on your own machine. Managed Agents is different in kind — the execution environment itself lives on Google's infrastructure.

You make one call. On the other side, Gemini 3.5 Flash reasons, uses tools as needed, executes code in an isolated Linux sandbox, and hands back the results.

Building that yourself means owning Docker containers, per-run cleanup, privilege separation, and network restrictions. As an indie developer I ran an isolated execution setup for my publishing pipeline in self-managed containers for a while, and the scaffolding for "don't break anything, don't leak anything" ended up larger than the actual job logic. Moving that entire perimeter to the far side of an API is, to me, the real point of this feature.

What it is not suited for is interactive work that reads deeply through a local repository. I will come back to that boundary later.

The minimal launch: a run is a job, not a request

This was the first mental shift. Unlike a synchronous generateContent call, a Managed Agents execution is an asynchronous job. You submit it, get back a run ID, and the state moves through QUEUED → RUNNING → SUCCEEDED (or FAILED / TIMED_OUT).
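To make that lifecycle concrete, here is a minimal sketch of the states as a Python enum. RunState, TERMINAL_STATES, and is_terminal are illustrative names of my own, not SDK types; the state strings are the ones listed above plus CANCELLED, which the polling code further down also checks for.

```python
from enum import Enum


class RunState(str, Enum):
    """Run states observed in the preview (illustrative, not an SDK type)."""
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    TIMED_OUT = "TIMED_OUT"
    CANCELLED = "CANCELLED"


# States after which a run no longer changes.
TERMINAL_STATES = frozenset({
    RunState.SUCCEEDED,
    RunState.FAILED,
    RunState.TIMED_OUT,
    RunState.CANCELLED,
})


def is_terminal(state: str) -> bool:
    """Return True once a run has reached a final state."""
    try:
        return RunState(state) in TERMINAL_STATES
    except ValueError:
        # Unknown state string (the preview may add more): treat as not finished.
        return False
```

Treating an unknown state as "not finished" is a deliberate choice for this sketch: a waiting loop keeps going and relies on its own deadline rather than stopping on a string it does not recognize.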

Here is a small job that audits dependency upgrades:

import os
from google import genai
 
client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
 
run = client.agents.runs.create(
    model="gemini-3.5-flash",
    instructions=(
        "Read the uploaded package.json and produce a Markdown list of "
        "major-version upgrade candidates, with brief compatibility notes."
    ),
    input_files=["./package.json"],
    sandbox={"timeout_seconds": 600},
    metadata={"job": "dep-audit-2026-06-12"},
)
 
print(run.id, run.state)  # e.g. runs/8f3c... QUEUED

What this solves: launching a code-execution-backed research job with zero environment setup. sandbox.timeout_seconds is the server-side execution cap. The metadata job name is there for idempotency, which I will get to shortly.

In my environment, the time from submission to sandbox acquisition (the transition to RUNNING) averaged around 8 seconds. That is comparable to spinning up a local Docker container. I had braced for cold-start pain and was pleasantly surprised.

Collect outputs and artifacts immediately — not later

Once the run finishes, you collect the text output and any artifacts (files generated inside the sandbox):

result = client.agents.runs.get(run.id)
 
if result.state == "SUCCEEDED":
    print(result.output_text)  # the agent's final response
    for artifact in result.artifacts:
        data = client.agents.artifacts.download(artifact.id)
        with open(artifact.filename, "wb") as f:
            f.write(data)
        print(f"saved: {artifact.filename}")
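elif result.state == "FAILED":
    print(result.error.message)
else:
    # TIMED_OUT, CANCELLED, or still in progress: nothing to collect
    print(f"nothing collected; state is {result.state}")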

One caution. The sandbox is destroyed when the run ends, and artifact retention is not indefinite — in my preview testing, artifacts from runs older than 48 hours were no longer retrievable. If retrieval is something you plan to do "later," your outputs will quietly disappear. My recommendation: the moment you detect completion, download everything in the same code path.
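One way to make "download everything in the same code path" a habit is to wrap it in a helper. The sketch below is a hedged illustration, not SDK code: save_artifacts is my own name, and the download callable is passed in so the function does not assume any particular client method. It also keeps only the final path component of each filename, on the assumption that an artifact name generated inside a sandbox should not be trusted as a local path.

```python
from pathlib import Path
from typing import Callable, Iterable


def save_artifacts(artifacts: Iterable, download: Callable[[str], bytes],
                   out_dir: Path) -> list[Path]:
    """Download every artifact of a finished run into out_dir right away.

    Each artifact is expected to expose .id and .filename, as in the
    snippet above. download(artifact_id) must return the file's bytes.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, artifact in enumerate(artifacts):
        # Keep only the last path component so a name like "../x"
        # cannot write outside out_dir.
        name = Path(artifact.filename).name
        if name in ("", ".."):
            name = f"artifact-{index}"  # fallback for unusable names
        target = out_dir / name
        target.write_bytes(download(artifact.id))
        saved.append(target)
    return saved
```

With the client from earlier, a call might look like save_artifacts(result.artifacts, client.agents.artifacts.download, Path("out") / "dep-audit"), placed directly inside the SUCCEEDED branch.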

I also wavered on when to use output_text versus artifacts. My current rule: text is for humans, files are for downstream processing. Even a Markdown report is easier to handle as an artifact if another job will consume it.

Designing the wait: polling, deadlines, idempotency

As of the preview I could not find a completion webhook, so I settled on polling. If you intend to put this on a schedule, this section is effectively the real implementation.

import time
 
def wait_for_run(client, run_id, deadline_seconds=900):
    """Wait for run completion with exponential backoff. The deadline is a client-side failsafe."""
    started = time.monotonic()
    interval = 2
    while True:
        run = client.agents.runs.get(run_id)
        if run.state in ("SUCCEEDED", "FAILED", "TIMED_OUT", "CANCELLED"):
            return run
        if time.monotonic() - started > deadline_seconds:
            client.agents.runs.cancel(run_id)
            raise TimeoutError(f"client deadline exceeded: {run_id}")
        time.sleep(interval)
        interval = min(interval * 1.5, 30)  # 2s → 3s → ... capped at 30s

Three design decisions are baked in here:

1. Hold two timeouts. sandbox.timeout_seconds (server side) and deadline_seconds (client side) do different jobs. The former stops a runaway agent; the latter stops your own polling loop from waiting forever when the run never reports a terminal state from the client's point of view. Rely only on the server side and your script may simply never return.
2. Back off exponentially. Polling at a fixed 2 seconds just piles up status requests on long jobs. Easing up to a 30-second ceiling was plenty.
3. Search by metadata before resubmitting. If a network drop loses you the run ID, you end up in the unpleasant state of "billed, but the result is unfindable." I now write the job name to a local log one line before calling create, and on any retry I first search existing runs by metadata.job before submitting again. A poor man's write-ahead log.

That third one is not hypothetical — a Wi-Fi drop cost me an ID and I double-submitted the same job. The damage was small, but I was glad it happened before anything reached cron.
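The write-ahead idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not my production script: submit_once is a hypothetical helper, find_existing stands in for whatever lookup of runs by metadata.job the SDK exposes (I am not showing that call here, so treat it as a placeholder), and submit wraps the create call and returns the run ID.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def submit_once(job_name: str, log_path: Path,
                find_existing: Callable[[str], Optional[str]],
                submit: Callable[[str], str]) -> str:
    """Return a run ID for job_name, submitting only if none exists yet."""
    log = json.loads(log_path.read_text()) if log_path.exists() else {}

    if job_name in log:
        if log[job_name]:
            return log[job_name]          # ID was recorded earlier: reuse it
        # Intent was logged but the ID was lost: look the run up remotely.
        found = find_existing(job_name)
        if found:
            log[job_name] = found
            log_path.write_text(json.dumps(log))
            return found

    # Record intent BEFORE calling the API, so a crash leaves a trace.
    log[job_name] = None
    log_path.write_text(json.dumps(log))

    run_id = submit(job_name)
    log[job_name] = run_id
    log_path.write_text(json.dumps(log))
    return run_id
```

The ordering is the whole point: the job name reaches disk before the network call, so a retry always knows that a submission may already exist and checks before paying for a second one.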

Put a budget cap in front of any scheduled run

On cost: in my trials, a light job like the package.json audit came to roughly $0.08–0.12 per run, and a job that ran an npm build plus tests landed around $0.30–0.45 (token usage plus sandbox time, at preview-era rates — treat these as order-of-magnitude only).
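To put those per-run figures on a schedule, here is a small projection. The hourly cadence and the 30-day month are hypothetical choices for illustration, and project_cost is my own helper name; the per-run range is the light-job figure above.

```python
def project_cost(per_run_usd: float, runs_per_day: int,
                 days: int = 30) -> tuple[float, float]:
    """Return (daily, total) projected spend in USD, rounded to cents."""
    daily = per_run_usd * runs_per_day
    return round(daily, 2), round(daily * days, 2)


# Light audit job on an hourly schedule, low and high end of the range.
low = project_cost(0.08, 24)
high = project_cost(0.12, 24)
print(low, high)  # (1.92, 57.6) (2.88, 86.4)
```

At an hourly cadence the light job alone lands somewhere around $58 to $86 a month on these rough numbers, which is why per-run cost is the wrong figure to watch for anything scheduled.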

One-off runs are noise. The danger is the scheduled job you forget about. I wrote about adding a budget guard to my local parallel agents in Capping Parallel Agents With a Token Budget — Designing a Guard That Stops Runaway Cost, and the same thinking transfers directly: block before execution.

import json
from datetime import date
from pathlib import Path
 
LEDGER = Path.home() / ".agent_budget.json"
DAILY_CAP_USD = 2.00  # daily ceiling; past this, do not submit
 
def spend_or_block(estimated_usd: float) -> bool:
    today = date.today().isoformat()
    ledger = json.loads(LEDGER.read_text()) if LEDGER.exists() else {}
    used = ledger.get(today, 0.0)
    if used + estimated_usd > DAILY_CAP_USD:
        return False  # skip; defer to tomorrow or rethink the job
    ledger[today] = used + estimated_usd
    LEDGER.write_text(json.dumps(ledger))
    return True

A median of past actuals is a perfectly serviceable estimate. Precision is not the point — the principle "never put an uncapped job on cron" is.
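A minimal sketch of that estimate, assuming you keep a list of past actual costs per job somewhere. estimate_usd and the fallback parameter are my own illustrative names; the fallback covers a job with no history yet.

```python
from statistics import median


def estimate_usd(past_actuals: list[float], fallback: float) -> float:
    """Estimate the next run's cost as the median of past actual costs."""
    if not past_actuals:
        return fallback  # no history yet: use a conservative guess
    return median(past_actuals)
```

The estimate then feeds the guard directly, for example spend_or_block(estimate_usd(history, fallback=0.45)). I prefer the median over the mean here because one unusually expensive run does not drag the estimate up.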

Cloud jobs versus local jobs: my five criteria

After a few days of sorting real work, my decision rules settled into five questions:

1. Do the inputs and outputs close over files? If yes, it leans cloud. Jobs that take files in and hand files back are a natural fit for Managed Agents.
2. Does it need my local credentials or secrets? If yes, it stays local. You choose which files enter the sandbox, but I am not willing to ship API keys or signing keys off my machine.
3. Will I want to inspect intermediate state on failure? Work I want to poke at midway (deep repository changes, mostly) stays on the local CLI. The interactive style I described in Antigravity CLI (agy) First Look: Migrating from Gemini CLI and Reading the Slash Commands belongs on this side.
4. Has environment reproducibility been a recurring pain? The longer a job has suffered from "works on my machine," the more it benefits from a pristine sandbox on every run.
5. Will scheduled parallelism grow? If so, cloud, but only after the budget guard from the previous section is in place.
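Written as code, the five questions look roughly like this. It is one possible reading rather than a rule engine I actually run: Job, route, and the field names are hypothetical, the secrets and inspection questions are treated as vetoes because the text says those jobs stay local, and the other three are treated as pulls toward the cloud.

```python
from dataclasses import dataclass


@dataclass
class Job:
    file_in_file_out: bool           # inputs and outputs close over files
    needs_local_secrets: bool        # needs local credentials or secrets
    inspect_midway: bool             # want to inspect intermediate state
    repro_pain: bool                 # recurring "works on my machine" pain
    scheduled_parallel_growth: bool  # scheduled parallelism will grow


def route(job: Job, budget_guard_ready: bool) -> str:
    """Return "local", "cloud", or a note that the guard must come first."""
    # Secrets and mid-run inspection keep a job on the local CLI.
    if job.needs_local_secrets or job.inspect_midway:
        return "local"
    # Without any pull toward the cloud, stay local by default.
    pulls = (job.file_in_file_out, job.repro_pain,
             job.scheduled_parallel_growth)
    if not any(pulls):
        return "local"
    # Growing scheduled work goes to the cloud only behind a budget guard.
    if job.scheduled_parallel_growth and not budget_guard_ready:
        return "blocked: add budget guard first"
    return "cloud"
```

For example, a file-in, file-out formatting job with no secrets routes to "cloud", while anything that needs mid-run inspection routes to "local" regardless of the other answers.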

In my own split, the jobs I moved to Managed Agents were release-note formatting for the App Store and Google Play, and the monthly AdMob report aggregation — exactly the file-in, file-out routine work that quietly accumulates when you run multiple apps and sites solo, as I do at Dolice Labs. Repository edits and reviews stay with local agy. Once the line was drawn, it became clear the two are a division of labor, not a replacement.

The first job to move

If you are about to try this, pick exactly one script whose inputs and outputs close over files, and port it using the create → wait_for_run → collect pattern above. Once one job runs end to end, the instinct for sorting the rest comes naturally.

The preview is still a moving target, and I expect to keep redrawing this boundary myself. If you are navigating the same migration window, I hope these notes save you a detour.
