Calling a Managed Antigravity Agent from the Gemini API: Design Notes on the Preview Model

antigravity-preview-05-2026, now in public preview on the Gemini API, is a Managed Agent that plans, runs code, edits files, and browses the web autonomously inside a sandbox. Here is how it differs from rolling your own orchestration, and where to draw the line.

Until now, as an indie developer, whenever I handed work to an agent, I built the plan-act-verify loop in my own code. Send a prompt, interpret the tool calls that come back, return results, send again. It works, but the burden of state management and retries is entirely mine.

antigravity-preview-05-2026, which entered public preview on the Gemini API in June, is the option that shoves most of that burden onto the server side. It is a Managed Agent that autonomously runs planning, reasoning, code execution, file edits, and web browsing inside a sandbox. The caller hands over a goal and waits for the result.

Convenient as that is, it demands a design decision: what to entrust to the Managed side, and what to keep in hand. Leave that vague and both cost and control end up compromised. This article draws that line, with implementation alongside.

The responsibility boundary

First, the two are not competitors; they sit at different layers. My breakdown is this.

Keep in hand: when to start (scheduling), what goal to hand over, and how to verify and absorb the result. This is business logic and cannot leave your side.
Entrust to the Managed side: the intermediate steps toward the goal. Writing files, trying commands, switching tactics on failure: the trial-and-error loop itself.

Put another way, the Managed Agent takes on "how," while you focus on "what, when, and how to receive it." The thing I sweated most when writing tool loops by hand was the intermediate state management, so losing that is significant.

That said, you should not entrust everything. Output verification stays in hand. The agent saying "done" is one thing; trusting that and pushing it to production is another.
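The idea of keeping verification in hand can be sketched as a small gate that runs before any result is absorbed. This is a minimal sketch under assumptions: the result shape (`files` with `path` and `content`) and the `verifyResult` helper are hypothetical, not fields the API is known to return.

```javascript
// Hypothetical result shape: { files: [{ path, content }] }.
// Returns a list of problems; an empty list means the result may be absorbed.
function verifyResult(result, { allowedDir, maxFiles = 20 } = {}) {
  if (!result || typeof result !== "object") return ["result is not an object"];

  const problems = [];
  const files = Array.isArray(result.files) ? result.files : [];
  if (files.length === 0) problems.push("no files produced");
  if (files.length > maxFiles) problems.push(`too many files: ${files.length}`);

  for (const f of files) {
    if (!f || typeof f.path !== "string" || f.path.includes("..")) {
      // Reject missing paths and anything that tries to climb out of the tree.
      problems.push(`suspicious path: ${String(f && f.path)}`);
    } else if (allowedDir && !f.path.startsWith(allowedDir)) {
      problems.push(`outside allowed dir: ${f.path}`);
    }
  }
  return problems;
}
```

Only when the list comes back empty would the caller move on to merging or publishing; anything else goes to a log and a human.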

Start from the smallest call

Begin by specifying the preview model name and handing over a single goal. Here is a Node example.

import { GoogleGenerativeAI } from "@google/generative-ai";
 
const genai = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
 
// A Managed Agent starts as a long-running, sandboxed job
async function startAgentJob(goal) {
  const model = genai.getGenerativeModel({
    model: "antigravity-preview-05-2026",
  });
 
  const job = await model.startAgentTask({
    goal,
    sandbox: { filesystem: true, network: "restricted" },
    maxSteps: 24,            // always set a ceiling to stop runaways
  });
 
  return job.id;             // do not wait synchronously; take the job ID
}

Two points matter here. Always set a ceiling with maxSteps. And do not wait for the result synchronously; take a job ID and fetch it later. A Managed Agent can run for minutes, and holding an HTTP request open that long is not realistic.

Poll long-running tasks

Once you hold a job ID, poll the state and wait for completion. I run this pattern from background scheduled tasks.

async function waitForJob(model, jobId, { intervalMs = 5000, timeoutMs = 600000 } = {}) {
  const deadline = Date.now() + timeoutMs;
 
  while (Date.now() < deadline) {
    const status = await model.getAgentTask(jobId);
 
    if (status.state === "succeeded") return status.result;
    if (status.state === "failed") {
      // log the reason; never retry silently
      throw new Error(`agent failed: ${status.error?.reason ?? "unknown"}`);
    }
    // while running, wait. You may grow the interval exponentially
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("agent timeout");
}
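The comment about growing the interval exponentially can be sketched as a small pure helper. `pollDelayMs` and its defaults are hypothetical names and values chosen for illustration; the cap keeps long jobs from going unchecked for too long.

```javascript
// Hypothetical helper: delay before poll number `attempt` (0-based).
// Doubles from baseMs on each attempt and is capped at maxMs.
function pollDelayMs(attempt, { baseMs = 5000, maxMs = 60000 } = {}) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Inside the polling loop, one way to use it would be to keep an attempt counter and sleep for `pollDelayMs(attempt++)` instead of the fixed interval.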

The rule against silent retries on a failed state comes from experience. A Managed Agent's failures mix the transient kind with the kind where the goal itself is too vague to achieve. Retrying the latter mechanically just pays again for the same failure. Logging the reason and revising how the goal is phrased is faster in the end.
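One way to encode that distinction is to classify the failure reason before deciding anything. The sketch below is built on assumptions: the reason codes are invented for illustration (the real values depend on what the API reports), and it expects the thrown error to carry a `reason` property, which the caller would have to attach.

```javascript
// Hypothetical reason codes treated as transient.
const TRANSIENT_REASONS = new Set(["sandbox_unavailable", "rate_limited", "internal_error"]);

function classifyFailure(reason) {
  if (TRANSIENT_REASONS.has(reason)) return "retry";
  return "needs_human"; // unknown or goal-related: stop spending on it
}

// Runs `runOnce` and retries only transient failures, a bounded number of times.
// Every failure is logged, so no retry happens silently.
async function runWithPolicy(runOnce, { maxRetries = 2 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await runOnce();
    } catch (err) {
      const reason = err.reason ?? "unknown";
      console.error(`agent failed (attempt ${attempt + 1}): ${reason}`);
      if (classifyFailure(reason) !== "retry" || attempt >= maxRetries) throw err;
    }
  }
}
```

Defaulting unknown reasons to `needs_human` is the conservative choice: a new failure mode costs one run, not three.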

Designing the sandbox and network

Setting sandbox.network to restricted is deliberate. A Managed Agent can browse the web, but if the goal needs no external access, keep it closed. Two reasons.

First, reproducibility. Open the network and the same goal is swayed by external state on every run, so results vary. Second, safety. Granting an autonomous agent free network access and file writes at the same time violates least privilege.

In my operation, when a task needs external information, I fetch it on my side first, fold it into the goal's context, and leave only the processing to the Managed Agent in a closed sandbox. Splitting fetch from processing this way also made debugging easier, because it is simpler to isolate where something went wrong.
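The fetch-then-process split can be sketched as a helper that folds already-fetched material into the goal text. `buildGoal`, the `label`/`text` fields, and the truncation limit are all hypothetical; how much context a goal can carry is an assumption to check against the actual limits.

```javascript
// Hypothetical helper: fold pre-fetched material into the goal so the
// agent can work with the network closed.
function buildGoal(instruction, sources, { maxCharsPerSource = 4000 } = {}) {
  const sections = sources.map((s, i) => {
    // Truncate oversized sources so one input cannot dominate the goal.
    const body = s.text.length > maxCharsPerSource
      ? s.text.slice(0, maxCharsPerSource) + "\n[truncated]"
      : s.text;
    return `--- source ${i + 1}: ${s.label} ---\n${body}`;
  });
  return [
    instruction,
    "Use only the material below; do not access the network.",
    ...sections,
  ].join("\n\n");
}
```

Because the goal string is built deterministically from inputs I already hold, a bad run can be replayed with exactly the same text.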

Cost and idempotency traps

Cost bites first in real operation. A Managed Agent runs inference at every intermediate step, so a loose maxSteps burns more tokens per job than you would guess. I first ran with a ceiling of 64, and on one day when even light tasks wandered near the limit, I burned through several days' worth of my monthly API budget.
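The arithmetic behind that is simple enough to put in a helper and run before choosing a ceiling. This is a rough sketch: `worstCaseCost` is a hypothetical name, and the tokens-per-step and price figures in the usage lines are made-up placeholders, not measured or published numbers.

```javascript
// Worst-case spend for one job if every step up to the ceiling is used.
// pricePerMTok is the price per one million tokens, in any currency unit.
function worstCaseCost({ maxSteps, tokensPerStep, pricePerMTok }) {
  return (maxSteps * tokensPerStep * pricePerMTok) / 1e6;
}

// Placeholder figures, for comparison only.
const loose = worstCaseCost({ maxSteps: 64, tokensPerStep: 8000, pricePerMTok: 2 });
const tight = worstCaseCost({ maxSteps: 24, tokensPerStep: 8000, pricePerMTok: 2 });
console.log({ loose, tight, ratio: loose / tight });
```

The absolute values mean little without real prices, but the ratio shows how directly the ceiling scales the worst case.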

The second trap is idempotency. A Managed Agent writes files and runs commands. Start the same job twice and it writes twice. If you call it from a scheduled task, either state "do nothing if it already exists" in the goal, or hold an executed flag on your side.

Start maxSteps at 12 to 24 for light tasks, and keep heavy ones under 48
Classify failures before responding: auto-retry only transient ones; let a human revisit goal ambiguity
Before starting a job, check on your side whether you recently sent the same goal
Keep the network closed by default, opening it explicitly only when needed
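The third item, checking whether the same goal was sent recently, can be sketched as a small guard. `createGoalGuard` is a hypothetical helper and keeps its state in memory; a real scheduled task would need to persist it (a file or a database row) so the check survives restarts.

```javascript
// Hypothetical in-memory guard against starting the same goal twice
// within a time window.
function createGoalGuard({ windowMs = 60 * 60 * 1000, now = () => Date.now() } = {}) {
  const lastSent = new Map(); // normalized goal -> timestamp of last start

  // Collapse whitespace so trivially different strings count as the same goal.
  const normalize = (goal) => goal.trim().replace(/\s+/g, " ");

  return {
    // Returns true and records the goal if it may start;
    // returns false if the same goal started within the window.
    tryAcquire(goal) {
      const key = normalize(goal);
      const t = now();
      const prev = lastSent.get(key);
      if (prev !== undefined && t - prev < windowMs) return false;
      lastSent.set(key, t);
      return true;
    },
  };
}
```

The caller would start a job only when `tryAcquire(goal)` returns true, and log the skip otherwise.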

Where to choose Managed, and where to keep your own

Finally, my split. A Managed Agent suits tasks with many intermediate steps, where trial and error is the essence and the result can be verified in one pass at the end. For example, lifting structured data out of messy input, or merging fragmented notes into one draft.

Conversely, I keep my own tool loop for tasks where I want human judgment or strict verification at each move. For production deploys and billing operations, I still stop in the middle every time. In those areas, being able to stop is worth more than being fast through autonomy.
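For contrast, the stop-at-each-move shape I keep for those areas looks roughly like this. It is a minimal sketch: `runGated`, the step objects, and the `approve` callback are hypothetical, standing in for whatever human check or strict verification sits between moves.

```javascript
// Runs steps in order, asking `approve` before each one.
// Stops at the first step that is not approved; nothing after it runs.
async function runGated(steps, approve) {
  const done = [];
  for (const step of steps) {
    if (!(await approve(step))) {
      return { status: "stopped", done, stoppedAt: step.name };
    }
    done.push(await step.run());
  }
  return { status: "completed", done };
}
```

The point of the shape is that the stop is structural: a step that is not approved cannot run, whatever the earlier steps produced.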

A Managed Agent gives the old question of "how far to entrust autonomy" a new granularity. Start with a closed sandbox and a small maxSteps, on a task you can afford to throw away. The range you can safely entrust only becomes tangible once you run it.
