ANTIGRAVITY LAB
Agents & Manager / 2026-07-04 / Advanced

Where to Put Evidence and Approval When Your Agent Self-Debugs in a Real Browser

Antigravity 2.0 launches a real Chrome mid-build, clicking buttons and taking screenshots to self-heal. It is fast, but shipping that as-is is risky. Here is how to capture evidence and draw the approval boundary.

Antigravity 2.0, browser agent, self-debugging, verification design, agent operations


Antigravity 2.0 opens a real Chrome partway through a build. It loads the UI it just generated, clicks buttons, types into forms, takes screenshots, and tries to find and fix defects without a human ever opening DevTools. The first time I watched it, I will admit it unsettled me a little. It is fast. Within a few minutes you get back something that looks like it works.

The problem is what that "looks like" leaves out. If you cannot see what the agent checked, which buttons it pressed, and where it made the fix, you cannot decide whether to ship. Trading verification transparency for speed defeats the purpose. This article lays out, concretely, how to keep evidence and where to draw the approval boundary so that real-browser self-debugging can become part of daily operations.

Why real-browser self-debug is fast, and why it is scary as-is

Older agents tended to stop at "this probably works" after writing code. Real-browser self-debug carries straight on to "actually open it and confirm" in one motion. Broken layouts, buttons that do not respond to clicks, exceptions in the console: static analysis cannot catch these. An agent that can surface and repair defects that only appear when you actually touch the page is a real step forward.

But a real browser has side effects. Submitting a form fires an actual request; a link navigates to an actual page. If self-debug runs while dev and production environments are still blurred together, a run meant as a test can mutate production data. And if the repair process is not recorded, you can verify neither why it was fixed nor whether it was truly fixed.

So the need is to add two things without killing the speed: evidence, and an approval boundary.
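As a rough illustration of what an approval boundary can look like in code, the sketch below classifies a request before it is sent. The function name `classifyRequest`, the `DEV_HOSTS` allowlist, and the rule itself (reads to known dev hosts run automatically, everything else waits for a human) are assumptions made for this example, not documented Antigravity behavior.

```javascript
// Hypothetical allowlist of hosts considered safe for unattended runs.
const DEV_HOSTS = new Set(['localhost:3000', '127.0.0.1:3000']);

// HTTP methods treated as reads (no intended side effects).
const READ_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

// Returns 'auto' when the agent may proceed alone,
// 'needs-approval' when a human should confirm first.
function classifyRequest(method, url, devHosts = DEV_HOSTS) {
  const host = new URL(url).host;
  const isRead = READ_METHODS.has(method.toUpperCase());
  const isDev = devHosts.has(host);
  return isRead && isDev ? 'auto' : 'needs-approval';
}

console.log(classifyRequest('GET', 'http://localhost:3000/items'));   // auto
console.log(classifyRequest('POST', 'http://localhost:3000/items'));  // needs-approval
console.log(classifyRequest('GET', 'https://app.example.com/items')); // needs-approval
```

Keeping the rule in a pure function means the boundary can be unit-tested on its own, separately from whatever browser hook eventually calls it.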

Keep evidence in three layers

Call one execution of self-debug a "run," and create a timestamped directory for each run. Inside it, keep three kinds of evidence. Screenshots let a human grasp the state at a glance, DOM snapshots make diffing possible, and network logs confirm which side effects occurred.

| Layer | What to keep | When it helps |
|---|---|---|
| Screenshots | PNGs before/after each step | Spot broken layouts, blank states, error screens by eye |
| DOM snapshots | Formatted outerHTML text | Diff between runs to see "what changed" mechanically |
| Network logs | Method, URL, status, target host | Detect dangerous side effects like POSTs to production |
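To make the network-log row concrete, here is a minimal sketch of scanning a log for side effects worth a human look. It assumes one JSON object per line with `method`, `url`, `status`, and `host` fields. The function name `findRiskyRequests` and the `EXPECTED_HOSTS` set are hypothetical, chosen only for this example.

```javascript
// Hypothetical set of hosts a self-debug run is expected to talk to.
const EXPECTED_HOSTS = new Set(['localhost:3000']);

// Parse JSON-lines text and return entries that deserve review:
// any request that is not GET/HEAD, or any request to an unexpected host.
function findRiskyRequests(jsonlText, expectedHosts = EXPECTED_HOSTS) {
  return jsonlText
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line))
    .filter((e) => !['GET', 'HEAD'].includes(e.method) || !expectedHosts.has(e.host));
}

const sample = [
  '{"t":1,"method":"GET","url":"http://localhost:3000/","status":200,"host":"localhost:3000"}',
  '{"t":2,"method":"POST","url":"https://api.example.com/orders","status":201,"host":"api.example.com"}',
].join('\n') + '\n';

console.log(findRiskyRequests(sample).map((e) => `${e.method} ${e.url}`));
// [ 'POST https://api.example.com/orders' ]
```

The filter is deliberately conservative: it flags anything that is not a plain read against a known host, so a false positive costs a glance while a missed write costs much more.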

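The script below wraps a Playwright browser session: it logs every network response as it arrives, and saves a screenshot and a DOM snapshot each time capture() is called. The premise is that the agent's browser actions run through this thin wrapper and call capture() before and after each step.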

// evidence-session.mjs — a thin wrapper that captures evidence around real browser actions
import { chromium } from 'playwright';
import { mkdir, writeFile, appendFile } from 'node:fs/promises';
import { join } from 'node:path';
 
const RUN_ID = new Date().toISOString().replace(/[:.]/g, '-');
const RUN_DIR = join('evidence', RUN_ID);
 
export async function openEvidenceSession() {
  await mkdir(RUN_DIR, { recursive: true });
  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext();
  const page = await context.newPage();
 
  // Network layer: append every response as one JSON line
  page.on('response', async (res) => {
    try {
      const req = res.request();
      const line = JSON.stringify({
        t: Date.now(),
        method: req.method(),
        url: res.url(),
        status: res.status(),
        host: new URL(res.url()).host,
      });
      await appendFile(join(RUN_DIR, 'network.jsonl'), line + '\n');
    } catch (err) {
      // A logging failure must not crash the run via an unhandled rejection
      console.error('network log write failed:', err);
    }
  });
 
  let step = 0;
  // Screenshot + DOM layers, saved per step
  async function capture(label) {
    const n = String(++step).padStart(3, '0');
    await page.screenshot({ path: join(RUN_DIR, `${n}-${label}.png`) });
    const html = await page.evaluate(() => document.documentElement.outerHTML);
    await writeFile(join(RUN_DIR, `${n}-${label}.html`), html);
    return n;
  }
 
  return { browser, page, capture, RUN_DIR };
}

The point of this wrapper is not to stop the agent. Instead of a human checking each move, it quietly stacks up material you can review afterward. Speed stays; you only add verifiability.
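One way to review that material afterward is to diff two saved DOM snapshots. The sketch below is a deliberately naive, assumption-laden version: `toLines` and `diffSnapshots` are hypothetical helpers, the "formatting" is just a line break between adjacent tags, and the comparison ignores line order and duplicates. A real diff tool would do better. The sketch only shows the shape of the check.

```javascript
// Break HTML into roughly one tag per line so a diff can work line by line.
function toLines(html) {
  return html
    .replace(/>\s*</g, '>\n<')
    .split('\n')
    .map((l) => l.trim())
    .filter((l) => l !== '');
}

// Naive diff: report lines that appear in only one of the two snapshots.
function diffSnapshots(beforeHtml, afterHtml) {
  const before = new Set(toLines(beforeHtml));
  const after = new Set(toLines(afterHtml));
  return {
    removed: [...before].filter((l) => !after.has(l)),
    added: [...after].filter((l) => !before.has(l)),
  };
}

const beforeHtml = '<html><body><button disabled>Save</button></body></html>';
const afterHtml = '<html><body><button>Save</button></body></html>';
console.log(diffSnapshots(beforeHtml, afterHtml));
// removed: the '<button disabled>Save</button>' line; added: '<button>Save</button>'
```

In practice the two inputs would be the .html files from two run directories, read from disk and passed in as strings.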
