Receiving Managed Agent Async Jobs Through a Propose, Verify, Adopt Pipeline
The Managed Antigravity Agent, now in public preview via the Gemini API, autonomously plans, executes, and verifies inside a sandbox. Here is a design for catching its async deliverables through three stages — propose, verify, adopt — before they reach production, with implementation code and operational pitfalls.
The Managed Antigravity Agent has reached public preview on the Gemini API. It handles everything from planning to execution autonomously inside a sandbox.
The convenience is exciting at first. Run it for a while, though, and a different worry sets in: autonomous operation means deliverables come out without anyone reviewing them. Should those really flow straight into production?
That question stopped me when I started handing multi-app operations to the agent. What I want to share is a design for catching an autonomous agent's async deliverables in three stages: propose, verify, and adopt.
Don't "adopt on the spot"
The crux of the problem is that an autonomous agent returns "plausible but wrong" deliverables with full confidence.
It edits files, browses the web, and runs code. Each individual action may work correctly, yet the final judgment can be off. Adopt that into production unverified and the error reaches your users intact.
So I always treat autonomous output as a "proposal." A proposal is not yet an adoption. Insert a verification stage in between, and let only what passes proceed to adoption. This three-way split alone sharply lowers the chance that a runaway agent's output reaches production.
Stage one: Propose
In the first stage, you submit an async job to the Managed Agent and receive the result. This output never touches production; it sits in an isolated place.
```python
import time

from google import genai

client = genai.Client()


def propose(task: str, poll_interval: float = 3.0, timeout: float = 300.0):
    """Submit an async job to the Managed Agent and receive its deliverable as a proposal."""
    job = client.agents.create_run(
        agent="antigravity-preview-05-2026",
        input=task,
    )
    deadline = time.monotonic() + timeout
    while True:
        status = client.agents.get_run(job.id)
        if status.state in ("succeeded", "failed"):
            break
        if time.monotonic() > deadline:
            client.agents.cancel_run(job.id)  # never leave it running
            raise TimeoutError(f"job {job.id} timed out")
        time.sleep(poll_interval)
    if status.state == "failed":
        raise RuntimeError(f"job {job.id} failed: {status.error}")
    return {"job_id": job.id, "artifact": status.output}
```
The key is to always call cancel_run on timeout.
Fire an async job and forget it, and it keeps running on the sandbox side, racking up charges. I once went pale the next morning seeing the cost of a job that ran all night. I strongly recommend preparing a path that always stops what you start.
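A timeout is not the only way a polling loop can exit early: an exception or a Ctrl-C in the caller leaves the job running just the same. One way to guarantee "always stop what you start" is a small context manager. This is a minimal sketch, not part of the SDK: `cancel_on_exit` and the `guard` dict are hypothetical names, and `client.agents.cancel_run` is assumed to behave as in `propose()` above.

```python
from contextlib import contextmanager


@contextmanager
def cancel_on_exit(client, job_id: str):
    """Cancel the job on any exit unless the caller marked it as finished.

    Assumption: client.agents.cancel_run(job_id) exists as used in propose().
    """
    guard = {"done": False}
    try:
        yield guard
    finally:
        # Runs on normal exit, exceptions, and KeyboardInterrupt alike.
        if not guard["done"]:
            client.agents.cancel_run(job_id)
```

Inside `propose()`, the polling loop would sit in `with cancel_on_exit(client, job.id) as guard:` and set `guard["done"] = True` once a terminal state is observed, so that only unfinished jobs get cancelled.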
Stage two: Verify
Once you receive the proposal, verify it before it touches production. This is the heart of the three-stage design.
Split verification into two layers. First, automatically reject what can be checked mechanically; escalate only what machines cannot judge to a human.
```python
def verify(artifact: dict) -> dict:
    """Verify the proposal mechanically. Return one of three: approved, needs_review, rejected."""
    issues = []

    # 1. Format check: does it satisfy the expected structure?
    if not artifact.get("files"):
        return {"verdict": "rejected", "reason": "empty deliverable"}

    # 2. Safety check: did it reach into areas it must not touch?
    for f in artifact["files"]:
        if f["path"].startswith((".env", "secrets/", ".git/")):
            issues.append(f"write to protected area: {f['path']}")

    # 3. Regression check: do existing tests pass (run in isolation)?
    if not run_tests_in_sandbox(artifact):
        issues.append("existing tests failed")

    if any("protected area" in i for i in issues):
        return {"verdict": "rejected", "reason": "; ".join(issues)}
    if issues:
        return {"verdict": "needs_review", "reason": "; ".join(issues)}
    return {"verdict": "approved", "reason": "passed all automated checks"}
```
What matters most here is that verification is performed by "a party other than the proposer."
Let the proposing agent grade itself and it will plausibly justify its own mistakes. Verification has to be carried out by independent logic that took no part in generation. I consider this separation the precondition for using autonomous execution safely.
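Because the verifier is independent, it should also not trust the agent's path strings as written. A plain prefix check like the one in `verify()` can be sidestepped by paths such as `./.env` or `config/../secrets/key.pem`. The following is a hedged sketch of one possible hardening, assuming the deliverable uses workspace-relative paths; `is_protected` and `PROTECTED_PREFIXES` are hypothetical names, not part of the original pipeline.

```python
import posixpath

# Same protected areas as the safety check in verify().
PROTECTED_PREFIXES = (".env", "secrets/", ".git/")


def is_protected(path: str) -> bool:
    """Return True if the normalized path lands in a protected area."""
    normalized = posixpath.normpath(path.replace("\\", "/"))
    # Absolute paths and paths that climb out of the workspace are treated as protected.
    if normalized.startswith("/") or normalized == ".." or normalized.startswith("../"):
        return True
    # The trailing "/" lets a directory prefix match the directory itself ("secrets").
    return (normalized + "/").startswith(PROTECTED_PREFIXES)
```

Like the original check, this only matches protected areas at the workspace root; a nested `app/.env` would need an additional rule.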
Stage three: Adopt
Only what verification marks as approved is reflected into production.
In the adopt stage, make every adoption something you can undo later. Concretely, leave a record per adoption so you can always step back to the previous one.
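```python
def adopt(proposal: dict, verdict: dict) -> str:
    """Reflect an approved proposal into production and record the adoption."""
    if verdict["verdict"] != "approved":
        raise PermissionError(f"cannot adopt unapproved deliverable: {verdict['reason']}")
    # `proposal` is the dict returned by propose(): {"job_id": ..., "artifact": ...}
    rev = commit_to_production(proposal["artifact"])  # one commit = one adoption; never mix
    record_adoption(job_id=proposal["job_id"], revision=rev)
    return rev
```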
Keep adoption to one commit each and, when something goes wrong, you can isolate "which adoption caused it." Adopt several proposals together and that isolation suddenly gets hard. Plain as it is, this principle ties directly to production peace of mind.
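To make "step back to the previous one" concrete, here is a minimal sketch of what the per-adoption record might look like. It is an in-memory stand-in for whatever store `record_adoption` writes to; `AdoptionLedger` and its methods are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AdoptionLedger:
    """In-memory record of adoptions, oldest first (hypothetical helper)."""

    entries: list = field(default_factory=list)

    def record(self, job_id: str, revision: str) -> None:
        # One entry per adoption, mirroring the one-commit-per-adoption rule.
        self.entries.append({"job_id": job_id, "revision": revision})

    def rollback_target(self) -> Optional[str]:
        """Revision to return to if the latest adoption has to be undone."""
        if len(self.entries) < 2:
            return None
        return self.entries[-2]["revision"]

    def job_for(self, revision: str) -> Optional[str]:
        """Job that produced the given revision, if it was recorded."""
        for entry in self.entries:
            if entry["revision"] == revision:
                return entry["job_id"]
        return None
```

Because each entry maps exactly one job to one revision, "which adoption caused it" becomes a lookup rather than an investigation.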
Pitfalls you easily hit while polling
Receiving async jobs has a few classic pitfalls.
First, polling too frequently and hitting rate limits. I start at a three-second interval and tune it against the job's average duration.
Second, failing to handle states other than succeeded and failed. Enumerate the state machine explicitly, and stop if an unexpected state arrives.
Third, not setting a timeout ceiling. Autonomous agents sometimes think for longer than you'd imagine. Without a ceiling, a single job keeps eating cost and resources indefinitely.
I learned all three only after actually hitting them in production. Build the handling in ahead of time and you'll stop bolting awake at midnight to a cost alert.
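The three pitfalls above can be handled in one polling helper. This is a sketch under stated assumptions: the in-progress state names `queued` and `running` are guesses and should be replaced with whatever states the API actually reports, and `wait_for_terminal`, `UnexpectedState`, and the injected `get_state`, `sleep`, and `clock` callables are hypothetical names chosen to keep the loop testable.

```python
import time
from typing import Callable

TERMINAL_STATES = {"succeeded", "failed"}
IN_PROGRESS_STATES = {"queued", "running"}  # assumed names; check the real API


class UnexpectedState(RuntimeError):
    """Raised when the job reports a state the caller did not enumerate."""


def wait_for_terminal(
    get_state: Callable[[], str],
    poll_interval: float = 3.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until a terminal state, an unknown state, or the timeout ceiling."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if state not in IN_PROGRESS_STATES:
            # Pitfall two: stop instead of silently looping on a state we do not know.
            raise UnexpectedState(f"unknown job state: {state!r}")
        if clock() > deadline:
            # Pitfall three: a hard ceiling on how long one job may run.
            raise TimeoutError("job did not finish before the deadline")
        sleep(poll_interval)  # Pitfall one: the interval is a tunable parameter.
```

The helper only raises; cancelling the job on `TimeoutError` or `UnexpectedState` remains the caller's responsibility.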
The effect of three stages, in numbers
My operations changed clearly before and after introducing the three-stage gate.
Before I added the verify stage, roughly 70% of the autonomous agent's proposals were fine to adopt as-is. The remaining 30% included writes to protected areas or failures in existing tests. After automating verification, all of that 30% was stopped before adoption, and errors reaching production dropped to nearly zero.
I also tuned the polling interval against measurements. Widening a one-second interval that frequently hit rate limits to three seconds cut limit errors by more than 90%. For jobs averaging around 40 seconds, a three-second interval was just right.
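A rough way to see why the wider interval helps is to count status requests per job. The arithmetic below is a back-of-the-envelope sketch, assuming the first status check is issued immediately after submission (as in `propose()`) and the job takes exactly its average duration; `polls_per_job` is a hypothetical helper name.

```python
import math


def polls_per_job(avg_duration_s: float, interval_s: float) -> int:
    """Status requests for one job: the immediate first poll plus one per interval."""
    return math.ceil(avg_duration_s / interval_s) + 1


one_second = polls_per_job(40, 1.0)    # 41 requests
three_second = polls_per_job(40, 3.0)  # 15 requests
```

For a 40-second job, the three-second interval issues 15 requests instead of 41, at the price of noticing completion up to three seconds later.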
Turning it into numbers makes the effect of three stages easy to explain. Clever prompts raise the quality of deliverables; the three-stage gate lowers the probability of incidents. The two play different roles, something that sank in the longer I kept operating.
Why three stages are worth it
Once you decide to use an autonomous agent, you're tempted to spend your time crafting clever prompts.
But in my experience, building the vessel that catches the output beats polishing the output. Isolate it in propose, reject it with independent verification, adopt it in an undoable form. These three stages don't depend on the agent's cleverness; they lower the very probability that a mistake reaches production.
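Put together, the three stages form a short pipeline. The sketch below is one way to wire them, not the definitive implementation: the stage functions are injected as callables so the flow can be exercised with stand-ins, `run_pipeline` and `review_queue` are hypothetical names, and it is assumed that the propose stage returns a dict with `job_id` and `artifact` keys as shown earlier.

```python
from typing import Callable


def run_pipeline(
    task: str,
    propose: Callable[[str], dict],
    verify: Callable[[dict], dict],
    adopt: Callable[[dict, dict], str],
    review_queue: list,
) -> dict:
    """Run propose, verify, adopt in order; only approved work reaches adopt."""
    proposal = propose(task)                # isolated; nothing touches production yet
    verdict = verify(proposal["artifact"])  # independent of the generating agent

    if verdict["verdict"] == "approved":
        return {"status": "adopted", "revision": adopt(proposal, verdict)}
    if verdict["verdict"] == "needs_review":
        # Escalate to a human instead of deciding mechanically.
        review_queue.append({"proposal": proposal, "reason": verdict["reason"]})
        return {"status": "queued_for_review", "reason": verdict["reason"]}
    # "rejected" and any unknown verdict both fail closed.
    return {"status": "rejected", "reason": verdict["reason"]}
```

Note that anything other than an explicit `approved` never reaches `adopt`, so a bug in the verifier's verdict strings fails closed rather than open.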
As an indie developer shipping apps on both the App Store and Google Play, I find an autonomous agent that works quietly through the night a reassuring ally. To keep using that ally with confidence, investing early in the design that receives its work is, I feel, well worth it.
I hope this helps anyone considering the same mechanism.