Matching Antigravity 2.0's Three Layers to Development Phases: Explore, Iterate, Operate
How I assign Antigravity 2.0's desktop, CLI, and SDK to development phases instead of features, with concrete handoff patterns between layers and the production pitfalls I hit.
On June 15, Google signalled that it would consolidate its AI coding tools into Antigravity. What remains in front of me are three entry points: the Antigravity 2.0 desktop app, the CLI, and the SDK. For the first few days I opened them more or less by mood, ran the same task in both the desktop and the terminal, and repeatedly lost track of which run was the source of truth.
The cause was not overlapping features. All three share the same agent harness, so they can do nearly the same things. The real problem was that I had never decided when to open which one.
Once I reassigned the three layers by development phase rather than by feature, the hesitation disappeared. Here is the split, and how I hand work off between layers.
The desktop is for exploration
Exploration belongs on the desktop app: checking how a new library behaves, prototyping a feature whose spec isn't settled, or deciding direction while watching the agent's suggestions.
The reason is that you can see the output and change course immediately. The code preview, the browser actions, and the diff all sit on one screen, so the moment you realize "this direction is wrong," you can stop.
In this phase I deliberately leave my instructions vague. I'll say something like "this screen feels cramped, please tidy the spacing" and work backward from the suggestions to figure out the actual requirements. The point of exploration is not to produce the right answer but to discover the requirements.
The CLI is for iteration and verification
Once the direction is fixed, I move to the CLI. The iteration phase means running the same process over many inputs, judging results mechanically, and applying changes in bulk. What matters for all of that is being able to invoke the agent non-interactively from the terminal.
For example, when I want to run a quality gate across 30 articles at once, checking them one by one on the desktop isn't realistic. The CLI drops straight into a shell loop.
```bash
#!/usr/bin/env bash
set -euo pipefail

# Run a consistency check only over articles that changed
changed=$(git diff --name-only HEAD~1 -- 'content/**/*.mdx')
fail=0
for f in $changed; do
  # Start in headless mode and read pass/fail from the exit code
  if ! agy run --headless --prompt "$(cat checks/consistency.md)" --input "$f"; then
    echo "FAIL: $f"
    fail=$((fail + 1))
  fi
done
echo "Done: ${fail} need fixing"
[ "$fail" -eq 0 ]
```
What matters here is pinning the instructions to a file. Writing the prompt you refined interactively on the desktop into something like checks/consistency.md keeps it from drifting across iterations. In my experience, simply moving the instructions from inline strings to an external file noticeably reduced the variance in how the same input was judged.
The SDK is for steady-state operation
A process that runs at a fixed time every day, or a task that fires while no human is watching: this steady-state phase is the SDK's domain. With the SDK, scheduled execution and sub-agent assembly live in your own code.
```python
from antigravity import Agent, Schedule

# Every morning, aggregate and summarize the previous day's crash reports
triage = Agent(
    name="crash-triage",
    instructions=open("agents/crash_triage.md").read(),
    tools=["fetch_crashlytics", "read_repo"],
    max_cost_usd=0.50,  # always set a per-run ceiling
)

@Schedule.daily(at="07:00", tz="Asia/Tokyo")
def morning_triage():
    result = triage.run(
        input={"window_hours": 24},
        timeout_s=600,
    )
    if result.severity >= 3:
        notify_me(result.summary)  # only notify when it's serious
```
My top priority in steady-state operation is to always attach a cost ceiling and a timeout. On the desktop you'd notice a runaway, but nobody watches an unattended task. I once ran a schedule without max_cost_usd and it burned through nearly 3x the tokens I expected; since then I never omit these two.
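One way to make "never omit these two" hard to forget is a preflight check that runs before any schedule is registered. The sketch below is plain Python and does not touch the SDK. The `jobs` dictionary and the function names are hypothetical, and only the key names `max_cost_usd` and `timeout_s` come from the example above.

```python
REQUIRED_LIMITS = ("max_cost_usd", "timeout_s")


def missing_limits(job: dict) -> list[str]:
    """Return the limits that are absent or not a positive number."""
    gaps = []
    for key in REQUIRED_LIMITS:
        value = job.get(key)
        # bool is a subclass of int, so reject it explicitly
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value <= 0:
            gaps.append(key)
    return gaps


def check_jobs(jobs: dict[str, dict]) -> dict[str, list[str]]:
    """Map each offending job name to the limits it lacks."""
    report = {}
    for name, job in jobs.items():
        gaps = missing_limits(job)
        if gaps:
            report[name] = gaps
    return report


# Hypothetical job table; in practice this would mirror your schedule definitions.
jobs = {
    "crash-triage": {"max_cost_usd": 0.50, "timeout_s": 600},
    "nightly-report": {"timeout_s": 900},  # cost ceiling forgotten
}
print(check_jobs(jobs))  # {'nightly-report': ['max_cost_usd']}
```

Running this in CI or at process start turns a forgotten ceiling into a failed check instead of a surprise on the bill.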
Handing off between layers
Once you split the three layers, passing work between them becomes the next challenge. You carry insight from exploration into iteration, and a hardened process from iteration into operation. If you carry that state in your head or in scattered notes, something always goes missing.
What I do is write each layer's output to a file in the repository.
When desktop exploration ends, write the settled instructions into agents/<task>.md
The CLI iteration script reads that .md so the prompt isn't maintained twice
The SDK schedule definition references the same .md via open()
This way, the canonical instruction lives in one place. Change direction on the desktop, update the .md, and both the CLI and SDK follow automatically. Conversely, if each layer keeps its own copy of the instruction, you fix one and leave the other stale. That was exactly my first mistake as an indie developer, and it took me half a day to track down.
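A minimal sketch of the single-source idea, assuming the `agents/<task>.md` layout described above. The helper name `load_instructions` and the empty-file check are my own additions for illustration, not part of any Antigravity API.

```python
from pathlib import Path

AGENTS_DIR = Path("agents")  # assumed location of the canonical instruction files


def load_instructions(task: str, base: Path = AGENTS_DIR) -> str:
    """Read the canonical instructions for one task from <base>/<task>.md.

    A missing file raises FileNotFoundError and an empty file raises
    ValueError, so a broken handoff fails loudly instead of running
    with a blank prompt.
    """
    path = base / f"{task}.md"
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"{path} is empty")
    return text
```

Any Python entry point, whether a batch wrapper around the CLI or the SDK schedule definition, can call `load_instructions("crash_triage")` rather than embedding its own copy of the prompt.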
How to decide which layer to open
When I'm unsure, I sort it with these questions.
Do I want to decide direction while watching the output? If yes, desktop
Will I run the same process at least twice, over different inputs? If yes, CLI
Will it run while no human is watching? If yes, SDK
If two or more apply, I treat the work as a phase in transition rather than forcing it into a single layer. When exploration starts picking up a little repetition, that's the signal to move to the CLI.
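The three questions can be written down as a tiny function, which makes the "two or more apply" case explicit. This is only a sketch of the rule as I use it; the function and parameter names are made up for illustration.

```python
def layers_for(watch_output: bool, repeats_over_inputs: bool, unattended: bool) -> list[str]:
    """Return every layer whose question was answered with yes, in phase order."""
    answers = [
        ("desktop", watch_output),      # decide direction while watching output
        ("cli", repeats_over_inputs),   # same process, at least twice, different inputs
        ("sdk", unattended),            # runs while no human is watching
    ]
    return [layer for layer, yes in answers if yes]


def describe(layers: list[str]) -> str:
    """Turn the matched layers into a short recommendation."""
    if not layers:
        return "no layer matched"
    if len(layers) == 1:
        return f"open the {layers[0]}"
    return "in transition: " + " -> ".join(layers)


print(describe(layers_for(True, True, False)))  # in transition: desktop -> cli
```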
You also don't need to use all three. Throwaway verification finishes on the desktop alone; a one-off batch is fine in the CLI. I only reach for the SDK once I'm confident the same process will run tomorrow and the day after. Automating too early has the side effect of freezing requirements that are still moving.
Production pitfalls I hit
A few cautions that mattered most while running the three layers together.
First, don't wire desktop exploration straight into steady-state operation. Exploration-phase instructions are written to tolerate ambiguity, so judgments wobble when you put them on unattended runs. I always insert CLI iteration in between, run it a dozen-plus times to confirm the output is stable, and only then promote it to the SDK.
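The "run it a dozen-plus times" step can be made mechanical. The sketch below measures how often repeated runs agree on a verdict for the same input. Here `judge` is a hypothetical stand-in for whatever invokes the agent and returns a verdict string, and both the default of 12 runs and the threshold of full agreement are assumptions you would tune.

```python
from collections import Counter
from typing import Callable, Iterable


def agreement(judge: Callable[[str], str], item: str, runs: int = 12) -> float:
    """Share of runs that return the most common verdict for one input."""
    verdicts = Counter(judge(item) for _ in range(runs))
    top_count = verdicts.most_common(1)[0][1]
    return top_count / runs


def ready_to_promote(
    judge: Callable[[str], str],
    items: Iterable[str],
    runs: int = 12,
    threshold: float = 1.0,
) -> bool:
    """True only if every input reaches the agreement threshold."""
    return all(agreement(judge, item, runs) >= threshold for item in items)
```

A judge that always answers the same way scores 1.0. One that flips on every third run scores about 0.67 and would be held back from the SDK.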
Second, measure cost per layer. Even with the same agent, interactive desktop use and unattended SDK runs differ by an order of magnitude in consumption. In my case steady-state operation accounted for about 70% of total cost, so the SDK side was clearly where reductions paid off. Optimizing without knowing where cost concentrates is wasted effort.
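Measuring per layer needs nothing more than tagging each run with the layer it came from. A minimal sketch, assuming you can log one `(layer, cost_usd)` pair per run. The record format is hypothetical and the numbers in the usage lines are illustrative, not my real figures.

```python
from collections import defaultdict
from typing import Iterable


def cost_share_by_layer(runs: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Return each layer's fraction of total cost from (layer, cost_usd) records."""
    totals: dict[str, float] = defaultdict(float)
    for layer, cost_usd in runs:
        totals[layer] += cost_usd
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {layer: 0.0 for layer in totals}
    return {layer: cost / grand_total for layer, cost in totals.items()}


# Illustrative records only
log = [("desktop", 1.0), ("cli", 2.0), ("sdk", 3.0), ("sdk", 4.0)]
for layer, share in cost_share_by_layer(log).items():
    print(f"{layer}: {share:.0%}")
```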
Third, the legacy CLI reaches end of service on June 18. If your iteration-phase scripts assume the old CLI, they'll simply stop working one morning. Keeping the command name in a single place inside your shell scripts means there's only one spot to fix when you migrate.
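The same single-place rule applies if your batch wrapper is written in Python rather than shell. In this sketch the executable name `agy` and the flags are copied from my shell script earlier in the article and are not verified against any CLI reference. The constant and function names are hypothetical.

```python
AGENT_CMD = "agy"  # the only place the executable name is written


def build_check_command(prompt_text: str, input_path: str) -> list[str]:
    """Build the argv list for one headless check.

    Returning a list (not a shell string) means it can be passed straight
    to subprocess.run without quoting problems.
    """
    return [
        AGENT_CMD, "run", "--headless",
        "--prompt", prompt_text,
        "--input", input_path,
    ]
```

When the command is renamed or replaced, only `AGENT_CMD` (and, if the flags change, this one function) needs to be touched.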
A next step
Start by putting into words whether the work in front of you is exploration, iteration, or operation. That alone tends to decide which entry point to open.
I'm still redrawing the boundaries as I go, but the view of assigning layers by phase rather than by feature has greatly reduced my hesitation in front of these three entry points. I hope it helps anyone wrestling with the same split.
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.