Antigravity 2.0 opens a real Chrome browser partway through a build. It loads the UI it just generated, clicks buttons, types into forms, takes screenshots, and tries to find and fix defects without a human ever opening DevTools. I will admit that the first time I watched it, it unsettled me a little. It is fast. Within a few minutes you get back something that looks like it works.
The problem is what that word "looks" hides. If you cannot see what the agent checked, which buttons it pressed, and where it applied the fix, you cannot make a shipping decision. Trading verification transparency for speed defeats the purpose. This article lays out a concrete design for keeping evidence and drawing the approval boundary, so that real-browser self-debugging can become part of daily operations.
Why real-browser self-debug is fast, and why it is scary as-is
Older agents tended to stop at "this probably works" after writing code. Real-browser self-debug carries on, in the same motion, to "actually open it and confirm." Broken layouts, buttons that do not respond to clicks, exceptions in the console: static analysis cannot catch any of these. An agent that can surface and repair defects that only appear when you actually touch the page is a real step forward.
But a real browser has side effects. Submitting a form fires an actual request; a link navigates to an actual page. If self-debug runs while dev and production environments are still blurred together, a run meant as a test can mutate production data. And if the repair process is not recorded, you can verify neither why a change was made nor whether it actually fixed the defect.
So the need is to add two things without killing the speed: evidence, and an approval boundary.
Keep evidence in three layers
Call one execution of self-debug a "run," and create a timestamped directory for each run. Inside it, keep three kinds of proof. Screenshots let a human grasp state at a glance, DOM snapshots enable diffing, and network logs confirm what side effects occurred.
| Layer | What to keep | When it helps |
|---|---|---|
| Screenshots | PNGs before/after each step | Spot broken layouts, blank states, error screens by eye |
| DOM snapshots | Formatted outerHTML text | Diff between runs to see "what changed" mechanically |
| Network logs | Method, URL, status, target host | Detect dangerous side effects like POSTs to production |
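To make the network layer concrete, here is a minimal sketch of a check you could run over a run's network log after the fact. It assumes each log line is a JSON object with `method` and `host` fields, as in the wrapper used in this article. The names `findRiskyRequests` and `ALLOWED_HOSTS`, and the hosts in the allowlist, are my own placeholders rather than part of any tool.

```javascript
// Sketch: flag mutating requests that went to hosts outside an allowlist.
// Assumption: each log line is JSON shaped like { t, method, url, status, host }.

const MUTATING_METHODS = new Set(['POST', 'PUT', 'PATCH', 'DELETE']);

// Hypothetical allowlist; replace with your own dev hosts.
// `host` includes the port when it is not the protocol default.
const ALLOWED_HOSTS = new Set(['localhost:3000', '127.0.0.1:3000']);

// Takes the full text of a JSON Lines log and returns the risky entries.
function findRiskyRequests(jsonlText, allowedHosts = ALLOWED_HOSTS) {
  return jsonlText
    .split('\n')
    .filter((line) => line.trim() !== '') // skip the trailing empty line
    .map((line) => JSON.parse(line))
    .filter((entry) => MUTATING_METHODS.has(entry.method) && !allowedHosts.has(entry.host));
}
```

In practice you would read the run's `network.jsonl` with `readFile(path, 'utf8')` from `node:fs/promises` and pass the text in. An empty result does not prove the run was safe, since a GET can also have side effects on a badly designed endpoint, but a non-empty result is a clear reason to stop and look.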
The script below wraps a browser session with Playwright. It logs every network response as it arrives, and saves a screenshot and a DOM snapshot each time `capture` is called around an agent action. The premise is that the agent's browser actions run through this thin wrapper.
```javascript
// evidence-session.mjs: a thin wrapper that captures evidence around real browser actions
import { chromium } from 'playwright';
import { mkdir, writeFile, appendFile } from 'node:fs/promises';
import { join } from 'node:path';

const RUN_ID = new Date().toISOString().replace(/[:.]/g, '-');
const RUN_DIR = join('evidence', RUN_ID);

export async function openEvidenceSession() {
  await mkdir(RUN_DIR, { recursive: true });
  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext();
  const page = await context.newPage();

  // Network layer: append every response as one JSON line
  page.on('response', async (res) => {
    const req = res.request();
    const line = JSON.stringify({
      t: Date.now(),
      method: req.method(),
      url: res.url(),
      status: res.status(),
      host: new URL(res.url()).host,
    });
    await appendFile(join(RUN_DIR, 'network.jsonl'), line + '\n');
  });

  let step = 0;

  // Screenshot + DOM layers, saved per step
  async function capture(label) {
    const n = String(++step).padStart(3, '0');
    await page.screenshot({ path: join(RUN_DIR, `${n}-${label}.png`) });
    const html = await page.evaluate(() => document.documentElement.outerHTML);
    await writeFile(join(RUN_DIR, `${n}-${label}.html`), html);
    return n;
  }

  return { browser, page, capture, RUN_DIR };
}
```

The point of this wrapper is not to stop the agent. Instead of a human checking each move, it quietly stacks up material you can review afterward. Speed stays; you only add verifiability.
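As one example of that after-the-fact review, here is a rough sketch of comparing two saved DOM snapshots. The wrapper writes raw `outerHTML`, so the sketch first breaks the markup into roughly one tag per line. It is a naive set-based comparison, not a true line diff, so it ignores reordering and repeated lines. `normalizeHtml` and `diffSnapshots` are names I made up for illustration.

```javascript
// Sketch: report lines that appear in only one of two DOM snapshots.
// Naive on purpose: for real reviews, feed the normalized text to a proper diff tool.

// Break raw outerHTML into trimmed, non-empty lines, splitting between adjacent tags.
function normalizeHtml(html) {
  return html
    .replace(/>\s*</g, '>\n<')
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '');
}

// Compare two snapshots; returns lines unique to each side.
function diffSnapshots(beforeHtml, afterHtml) {
  const before = normalizeHtml(beforeHtml);
  const after = normalizeHtml(afterHtml);
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  return {
    removed: before.filter((line) => !afterSet.has(line)),
    added: after.filter((line) => !beforeSet.has(line)),
  };
}
```

Run against the `.html` files from two steps, or from the same step in two runs, this gives a quick answer to "what changed" before you open the screenshots.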