Ever since Antigravity 2.0 started spinning up a real Chrome during a build, clicking buttons, filling forms, taking screenshots, and debugging itself, my UI-related rework has dropped noticeably. But there was one moment in the first few days that made my stomach drop. During a self-debug run, the agent typed a made-up email into a sign-up form and completed the registration, and behind the scenes a real "welcome" email job actually fired.
The address was fictional, so there was no real harm. But imagining the same run pointed at a shared production database and a live mail platform was chilling. The value of real-browser self-debug lies in doing real actions in a real browser. The flip side is that the more real the target, the more real the side effects that run. This article is about locking that aim onto something that is neither localhost nor production: a per-branch throwaway preview.
Why localhost is the most dangerous target
This may sound backwards, but in my experience the easiest place to have an accident is not production but localhost. The reason is simple: in a lot of indie setups, the local dev server is wired to a shared development database, or holds production keys for a mail service that is not actually running in sandbox mode.
One of the apps I run as an indie developer has a screen where sign-up is immediately followed by AdMob consent and a welcome email, and that exact sequence ran in one breath during a self-debug pass. A self-debugging agent completes actions to the end even in situations where a human would think "this is just a test, I won't press submit." Form submission, finalizing a checkout flow, account deletion, firing a webhook. To the agent these are all just "buttons on a screen," and whether they are destructive is not visible from appearance alone. If you relax because "it's only localhost," the real external services wired behind it still run.
That is exactly why switching the aim at the environment level is the most reliable move. Rather than trying to coach the agent into good manners with ever more detailed instructions, it is simpler and more robust to point it at a place that is fine to break.
Aim at a per-branch throwaway preview
The ideal is an ephemeral preview environment that comes up automatically when you push a working branch and disappears automatically on merge or close. Cloudflare Pages and Vercel preview deploys fit this. You point self-debug at that URL.
Throwaway previews have three advantages. First, since the data is assumed to reset every time, whatever the agent registers or deletes is gone by the next day. Second, because URLs are separated per branch, running multiple agents in parallel won't have them stepping on each other's data. Third, you can give them a separate environment-variable set from production, which is where you can stop the side effects themselves.
On the Antigravity side, the key is to state the verification target URL explicitly in the task instructions you hand the agent, and never let localhost be the implicit default.
# Verification instructions for the agent (excerpt)
Always verify in the following environment.
- Target URL: $PREVIEW_URL (e.g. https://feat-consent-flow.example-preview.pages.dev)
- Do not access localhost or the production domain at all.
- State-changing actions such as sign-up, billing, or account deletion
  may be performed only on the target URL.

Simply spelling out "do not use localhost" sharply reduces accidents where the agent connects to a handy dev server on its own. I also cover how to phrase prohibitions in "Designing a zero-side-effect dry-run layer so agents rehearse before touching production," which pairs well with this.
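The written instruction can be backed by a small check in whatever script hands the URL to the agent. The following is a minimal sketch under stated assumptions: `assertPreviewTarget`, the preview suffix, and the production host list are hypothetical names for illustration, not part of Antigravity. It uses only the standard `URL` class and accepts a target only when its hostname ends with the preview suffix.

```typescript
// Hypothetical pre-flight check: refuse any target that is not a preview host.
// The function name and all host values are assumptions for illustration.
const LOCAL_HOSTS = ["localhost", "127.0.0.1", "0.0.0.0", "[::1]"];

function assertPreviewTarget(
  rawUrl: string,
  previewSuffix: string, // e.g. ".example-preview.pages.dev"
  productionHosts: string[], // e.g. ["example.com"]
): URL {
  const url = new URL(rawUrl); // throws on malformed input
  const host = url.hostname.toLowerCase();

  // Explicit deny checks exist mainly to produce clearer error messages.
  if (LOCAL_HOSTS.indexOf(host) !== -1) {
    throw new Error(`refusing local target: ${host}`);
  }
  if (productionHosts.indexOf(host) !== -1) {
    throw new Error(`refusing production target: ${host}`);
  }
  // The allowlist is the real gate: anything that is not a preview host fails.
  if (!host.endsWith(previewSuffix)) {
    throw new Error(`not a preview host: ${host}`);
  }
  return url;
}
```

Because the last check is an allowlist, a host that nobody thought to deny still gets rejected, which matches the idea of never letting a default target slip in.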
Neutralize side effects on the preview side
Moving the aim is not enough if the preview environment actually sends email or finalizes charges. This is the most important part in practice: you don't stop side effects by instructing the agent; you neutralize them at the root with environment variables.
Concretely, read a flag that exists only in the preview environment and swap every outbound side effect for a safe substitute. Here is a thin guard that suppresses email and pins payments to test mode.
// lib/side-effect-guard.ts
// Neutralize outbound side effects in the preview environment
import { realMailer } from "./mailer"; // assumed path: wherever your real mail client lives

// This flag is set only in the preview environment.
const isPreview = process.env.SELF_DEBUG_ENV === "preview";

export async function sendEmail(to: string, body: string) {
  if (isPreview) {
    // Don't actually send; just record it. The agent's trace still shows the attempt.
    console.log(`[preview] email suppressed -> ${to}`);
    return { suppressed: true };
  }
  return realMailer.send(to, body);
}

export function paymentApiKey(): string {
  // Never load the live key into preview. Hand over the test key only.
  if (isPreview) return process.env.STRIPE_TEST_KEY!;
  return process.env.STRIPE_LIVE_KEY!;
}

The point is to log the fact that you suppressed it. The agent can still complete the action to the end, so self-debug verification holds, and yet no real email flies. I also swap the webhook destination via the same flag to a preview receiver (a throwaway endpoint like RequestBin), which lets me confirm only whether it fired.
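The webhook swap can follow the same pattern. This is a rough sketch, not the exact code I run: `SELF_DEBUG_ENV` is the flag from the guard above, while `PREVIEW_WEBHOOK_URL`, `WEBHOOK_URL`, and the function name are assumed names for illustration. The environment is passed in as a plain object so the function is easy to test.

```typescript
// Hypothetical helper: pick the webhook destination from the environment.
// PREVIEW_WEBHOOK_URL and WEBHOOK_URL are assumed variable names.
function webhookDestination(env: Record<string, string | undefined>): string {
  const isPreview = env.SELF_DEBUG_ENV === "preview";
  const key = isPreview ? "PREVIEW_WEBHOOK_URL" : "WEBHOOK_URL";
  const url = env[key];
  // Fail closed: preview never falls back to the real destination.
  if (!url) {
    throw new Error(`${key} is not set`);
  }
  return url;
}
```

In an app you would call it as `webhookDestination(process.env)`. Throwing when the preview receiver is missing is a deliberate choice: a misconfigured preview stops instead of quietly posting to the real endpoint.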
One more thing that matters: never load any production credentials into the preview environment. If only test keys exist, then even if the agent finalizes a payment, no real charge occurs. You guarantee a state where "even if the agent makes a mistake, no damage results" through the design of permissions. This line of thinking continues from the operations note "Isolation in the sandbox is less effective than you'd expect when running multiple agents."
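One way to make the "test keys only" rule checkable is a startup scan of the preview environment. The sketch below is an assumption-laden illustration: the function and constant names are hypothetical, `STRIPE_LIVE_KEY` is the variable name from the guard above, and the value pattern relies on Stripe live-mode secret and restricted keys starting with `sk_live_` and `rk_live_`. Other services would need their own rules.

```typescript
// Hypothetical startup check: list variables that look like production
// credentials inside a preview environment. Names here are assumptions.
const FORBIDDEN_IN_PREVIEW = ["STRIPE_LIVE_KEY"];
// Stripe live-mode secret/restricted keys start with sk_live_ / rk_live_.
const LIVE_VALUE_PATTERN = /^(sk|rk)_live_/;

function findProductionCredentials(
  env: Record<string, string | undefined>,
): string[] {
  // Only the preview environment is checked; elsewhere live keys are expected.
  if (env.SELF_DEBUG_ENV !== "preview") return [];

  const offenders: string[] = [];
  for (const name of Object.keys(env)) {
    const value = env[name];
    if (!value) continue; // unset or empty variables are harmless
    const forbiddenName = FORBIDDEN_IN_PREVIEW.indexOf(name) !== -1;
    if (forbiddenName || LIVE_VALUE_PATTERN.test(value)) {
      offenders.push(name);
    }
  }
  return offenders.sort();
}
```

A preview build could call `findProductionCredentials(process.env)` at boot and refuse to start when the list is non-empty, so a leaked live key is caught before the agent ever opens the page.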
Keep seed data identical every time to stabilize verification
Another benefit of a throwaway environment is that you can rebuild data from scratch every time. If you make the seed data deterministic here, the self-debug screenshots stabilize.
If usernames and creation timestamps are random each run, the agent's captures change each time too, and visual diff review falls apart. If you seed just a few records at preview initialization with fixed IDs, fixed display names, and fixed timestamps, you secure the assumption that "the same input yields the same screen as last time." This pays off when a human reviews the real-browser self-debug trace. For how to place the trace itself, "Where to put the trace and approval when an agent self-debugs in a real browser" goes into detail.
Seed data does not need to be large. I insert just three fixed records (a new user, a paying user, and a user pending cancellation) and let the agent walk screens that include state transitions. Covering state variety with a small set raises verification density more than increasing the row count.
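A deterministic seed along these lines might look like the sketch below. The field names, IDs, display names, and timestamps are all hypothetical. What matters is that nothing comes from a random generator or the current clock, so every preview starts from identical rows.

```typescript
// Hypothetical seed: three fixed records covering three account states.
// All field names and values are illustrative assumptions.
type SeedStatus = "new" | "paying" | "pending_cancellation";

interface SeedUser {
  id: string;
  displayName: string;
  status: SeedStatus;
  createdAt: string; // fixed ISO 8601 timestamp, never "now"
}

const SEED_USERS: readonly SeedUser[] = [
  { id: "user-0001", displayName: "New User", status: "new", createdAt: "2024-01-01T00:00:00.000Z" },
  { id: "user-0002", displayName: "Paying User", status: "paying", createdAt: "2024-01-02T00:00:00.000Z" },
  { id: "user-0003", displayName: "Cancelling User", status: "pending_cancellation", createdAt: "2024-01-03T00:00:00.000Z" },
];

// Return fresh copies so callers cannot mutate the shared seed definition.
function buildSeed(): SeedUser[] {
  return SEED_USERS.map((user) => ({ ...user }));
}
```

Whatever inserts these rows at preview initialization should write them as-is, without letting the database fill in generated IDs or creation times.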
Next step
First, check where your self-debug is aimed right now. If localhost has quietly become the default, I recommend starting on the next branch with just one thing: hand the agent an explicit preview URL. You can add the environment-variable guard and fixed seed data later. Moving the aim from the real thing to a throwaway is a single move, and it turns real-browser self-debug from a "useful but slightly scary feature" into "verification you can run with peace of mind."