Agents & Manager · 2026-06-18 · Advanced

Three Boundaries I Draw Before Handing Work to an Antigravity 2.0 Agent

What to hand a background agent, and what to keep in your own hands. The three boundaries I actually drew while running solo-dev automation in parallel, and how to encode them so the lines hold.


On June 18, Gemini CLI stopped serving requests for individual and AI Pro / Ultra users. I had a few routine jobs running on Gemini CLI, so I spent the day moving them onto Antigravity 2.0 background agents.

The first thing I ran into wasn't "how do I move this." It was "should this run unattended at all?" Antigravity 2.0 can take a single prompt all the way from plan to implementation, test, and deploy, and it runs several agents in parallel. The wider the surface you can delegate, the more deliberately you have to mark the surface you must not — otherwise convenience quietly becomes your failure rate.

On the very first day I had a near miss. One agent, while "cleaning up artifacts that were no longer needed," listed a cache that should not have been touched as a deletion candidate. I had a human checkpoint just before execution, so nothing broke. Had that step been automatic, I would not have noticed until after the cache was gone. That moment convinced me to write down, in just three lines, what I would not delegate, before tuning how I delegate the rest. This article is that record.

Why the "don't delegate" line comes before the "how to delegate" one

The better your automation runs, the less you look at it. After ten clean runs in a row, you want to wave the eleventh through unseen. But the incident always happens on that unseen eleventh run.

So the safety of automation isn't set by its success rate on a good day. It's set by where it stops on a bad one, and by who notices. That's why the starting point of the design should be "where does this hand back to a human," not "how do I run it faster." Draw the boundaries first, and you can widen the automated surface with confidence. Do it in the wrong order and you tend to get scared after the fact and roll everything back to manual — the long way around.

Boundary 1 — Let the agent prepare irreversible actions, but keep the trigger

The first line splits actions by whether they can be undone. Running tests, building, drafting output: you can redo those if they go wrong. Pushing, deploying to production, deleting, publishing: once those run, rolling back is a separate cost of its own.

In my setup the agent owns everything up to and including reversible actions. For irreversible ones it stops at "prepared." It builds the diff, tidies the commit message, lays out the output for review — and a human pulls the trigger. Stated as a verbal rule this always erodes, so I pin it in code.

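# Classify an action by whether it can be undone,
# and never let the agent run irreversible ones itself.
IRREVERSIBLE = {"push", "deploy", "delete", "drop", "publish", "purge"}
# Verbs known to be safe to redo. Anything outside both sets is ambiguous.
REVERSIBLE = {"build", "run", "test", "draft"}

def classify(action: str) -> str:
    tokens = action.lower().split()
    # Scan every token, so "git push origin main" is caught by "push"
    # instead of being waved through on its first word, "git".
    if any(t in IRREVERSIBLE for t in tokens):
        return "irreversible"
    if tokens and tokens[0] in REVERSIBLE:
        return "reversible"
    # Ambiguous or empty input errs to the irreversible side.
    return "irreversible"
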
def gate(action: str) -> dict:
    kind = classify(action)
    if kind == "irreversible":
        # The agent prepares; the human triggers the final run.
        return {"action": action, "auto_run": False, "needs_human": True}
    return {"action": action, "auto_run": True, "needs_human": False}
 
if __name__ == "__main__":
    for a in ["build site", "git push origin main", "run tests", "deploy production"]:
        r = gate(a)
        flag = "needs-human" if r["needs_human"] else "auto"
        print(f"{flag:12} {a}")
 
# Output:
# auto         build site
# needs-human  git push origin main
# auto         run tests
# needs-human  deploy production

The point is not to make the classifier perfect. The rule is that when a verb is ambiguous, the ambiguity itself sends it to the irreversible side. Erring safe costs you one extra confirmation; erring unsafe costs you a broken production. Those costs are not the same size. To untangle the dependencies the Gemini CLI shutdown leaves behind, I wrote a separate audit of automation dependencies that reads well alongside this migration.
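The "prepared, but not triggered" half of Boundary 1 can be sketched the same way. This is a minimal illustration, not part of Antigravity or of my actual pipeline: PreparedAction, approve, execute, and ApprovalRequired are hypothetical names, and the needs_human field is assumed to be filled from the gate() result above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PreparedAction:
    """An action the agent has staged but not run (hypothetical structure)."""
    action: str
    needs_human: bool                  # assumed to mirror gate()'s "needs_human" flag
    review_notes: str = ""             # e.g. diff summary or commit message for review
    approved_by: Optional[str] = None  # stays None until a human signs off


class ApprovalRequired(Exception):
    """Raised when an irreversible action is executed without human approval."""


def approve(prepared: PreparedAction, reviewer: str) -> PreparedAction:
    # Only a named human reviewer flips the state; the agent never calls this.
    if not reviewer.strip():
        raise ValueError("reviewer name must not be empty")
    prepared.approved_by = reviewer.strip()
    return prepared


def execute(prepared: PreparedAction, runner: Callable[[str], None]) -> str:
    # Reversible actions run straight through.
    # Irreversible ones stop here unless a human has approved them.
    if prepared.needs_human and prepared.approved_by is None:
        raise ApprovalRequired(f"awaiting human trigger: {prepared.action}")
    runner(prepared.action)
    return "ran"


if __name__ == "__main__":
    log = []  # stands in for a real command runner
    staged = PreparedAction(
        "deploy production",
        needs_human=True,
        review_notes="diff and commit message ready for review",
    )
    try:
        execute(staged, log.append)
    except ApprovalRequired as exc:
        print(exc)
    execute(approve(staged, "alice"), log.append)
    print(log)
```

The design choice is that the refusal lives in execute(), not in the agent's prompt. The agent can stage as many irreversible actions as it likes, but none of them runs until a human has called approve().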

