Agents & Manager / 2026-06-15 / Advanced

Stop Letting Antigravity Agents Self-Report 'Done' — Completion Contracts and External Verification

An Antigravity agent reporting 'done' when the work was not actually finished is a failure mode I kept hitting. Moving the completion decision out of the agent and into code fixed it. Here is the contract, a three-layer verifier, and how it holds up under unattended, scheduled runs.

Tags: Antigravity, agent design, completion verification, quality assurance, unattended ops

I asked Antigravity to move a payment module onto a new SDK. It came back with "Done — build and tests pass." The diff did swap in the new SDK calls, and the unit tests were green. But when I ran the production-like integration suite with retries, one timeout path was still catching the old SDK's exception type, so recovery never fired.

The agent did not lie. Its definition of "done" and mine were simply different from the start. This article is about pinning the definition of completion outside the agent, in what I call a "completion contract", and about the automated verification that enforces it, with the code I actually run. As an indie developer running several sites in parallel, I lean on agents heavily. In 2026, with scheduled and unattended runs now routine, having or lacking this design directly shapes operating cost.

Why "done" drifts structurally

When you let the agent judge completion, three layers quietly blur together: formal completion (files changed, functions added as instructed), functional completion (it behaves as expected at runtime), and intentional completion (the real goal is met and nothing else broke). When the instruction is vague, the agent stops at the fastest reachable point — formal completion. Build passes, tests go green, the prompt's bullet list is filled in, and that becomes the stopping point.

The real problem is that the stopping point lives inside the agent. No matter how capable the model gets, as long as the yardstick is held by the other party, the mismatch is structural. The fix is simple: take the right to declare completion away from the agent so only an external verifier can emit "done." The agent's job becomes "turn the verifier green," and I am the one who writes the verifier.
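
A minimal sketch of that inversion, assuming the agent can be driven as a plain callable. The names `run_until_verified`, `agent_step`, `verify`, and `max_attempts` are hypothetical and are not part of Antigravity. The point is only that the loop never reads the agent's own claim of completion:

```python
from typing import Callable


def run_until_verified(
    agent_step: Callable[[int], None],
    verify: Callable[[], bool],
    max_attempts: int = 3,
) -> bool:
    """Return True only if the external verifier passes within max_attempts.

    agent_step: hypothetical hook that lets the agent work for one round.
    verify:     the external check; its result is the only source of "done".
    """
    for attempt in range(1, max_attempts + 1):
        agent_step(attempt)  # the agent edits files; whatever it reports is ignored
        if verify():         # completion is declared here and nowhere else
            return True
    # Out of attempts: treat the task as not done and hand it to a human.
    return False
```

In practice `verify` would wrap the completion check script and compare its exit code to 0, and the attempt cap keeps an unattended run from looping forever on a task it cannot finish.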

Fix the completion contract first

For each task I write what counts as done as a machine-readable contract, before starting, in the same commit. This matters: if you write it after the agent produces a diff, the contract inevitably gets loosened to match that diff.

# tasks/payment-sdk-migration/contract.yaml
task_id: payment-sdk-migration
# only this file holds the right to declare completion
formal:
  - "pnpm lint"
  - "pnpm tsc --noEmit"
  - "! grep -rn 'legacy-pay' src --include='*.ts'"   # no old SDK name left
functional:
  - "pnpm test src/payment"
  - "pnpm test:contract -- --grep 'timeout-recovery'" # the path that broke, now required
intent:
  questions:
    - id: side_effects
      ask: "Which screens outside payment relied on the old SDK types, and in which file did you confirm it"
    - id: error_paths
      ask: "For timeout, 5xx, and network-drop, which test proves recovery fires"
    - id: untouched
      ask: "Did you change any file you should not have? List all of them"
budget:
  max_files_changed: 18        # over this = possible intent drift, needs review
  forbid_paths: ["infra/", "src/auth/"]  # boundaries the agent must not cross
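
One way the formal, functional, and budget sections of such a contract could be evaluated is sketched below. It is not the article's actual verifier. It assumes the YAML has already been parsed into a dict (for example with a YAML library), that the list of changed files comes from something like a diff against the base branch, and that each layer gets its own exit code. The specific code values and all function names are assumptions made for illustration.

```python
import subprocess

# Hypothetical exit codes, one per failure class, so a scheduler can tell them apart.
EXIT_DONE, EXIT_FORMAL, EXIT_FUNCTIONAL, EXIT_BUDGET = 0, 1, 2, 4


def first_failing(commands, cwd=None):
    """Run shell commands in order; return the first one that fails, else None."""
    for cmd in commands:
        # shell=True so entries such as "! grep ..." are interpreted by the shell.
        result = subprocess.run(cmd, shell=True, cwd=cwd,
                                capture_output=True, text=True)
        if result.returncode != 0:
            return cmd
    return None


def budget_violations(changed_files, budget):
    """List human-readable budget problems for the given changed paths."""
    problems = []
    limit = budget.get("max_files_changed")
    if limit is not None and len(changed_files) > limit:
        problems.append(f"{len(changed_files)} files changed, limit is {limit}")
    for path in changed_files:
        for prefix in budget.get("forbid_paths", []):
            if path.startswith(prefix):
                problems.append(f"forbidden path touched: {path}")
    return problems


def check_contract(contract, changed_files, cwd=None):
    """Map a parsed contract to an exit code; 0 means formal and functional pass."""
    if budget_violations(changed_files, contract.get("budget", {})):
        return EXIT_BUDGET
    if first_failing(contract.get("formal", []), cwd) is not None:
        return EXIT_FORMAL
    if first_failing(contract.get("functional", []), cwd) is not None:
        return EXIT_FUNCTIONAL
    return EXIT_DONE
```

The budget is checked first in this sketch because it is the cheapest test and a crossed boundary makes the remaining results moot. The intent questions are left out here since they need a different kind of check than an exit status.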

When I hand the contract to the agent at the top of the task, I nail down the stop condition in one line: "This task is done only when scripts/done_check.py returns exit code 0. Even if build and lint pass mid-way, do not report done until done_check passes. Answer each intent question in tasks/<id>/intent.md, one per question." That single sentence fixes what "done" means.
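
The intent answers can be screened mechanically before anyone reads them. The sketch below assumes a layout for intent.md that the text above does not specify: one `## <id>` heading per question, with the answer underneath. It also assumes a crude evidence rule, namely that selected answers must contain at least one path-like token. `missing_intent_answers` and `needs_file` are hypothetical names.

```python
import re

# Assumed evidence rule: a token such as src/payment/retry.test.ts
PATH_LIKE = re.compile(r"\b[\w.-]+/[\w./-]+\.\w+")


def missing_intent_answers(intent_md, question_ids, needs_file=frozenset()):
    """Return the ids whose answer is absent, empty, or lacks a required file path."""
    sections = {}
    current = None
    for line in intent_md.splitlines():
        heading = re.match(r"^##\s+(\S+)\s*$", line)
        if heading:
            current = heading.group(1)   # start collecting a new answer
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    missing = []
    for qid in question_ids:
        body = "\n".join(sections.get(qid, [])).strip()
        if not body:
            missing.append(qid)          # question skipped or left blank
        elif qid in needs_file and not PATH_LIKE.search(body):
            missing.append(qid)          # answered, but with no concrete file cited
    return missing
```

With the contract's three questions, `needs_file` would plausibly hold `side_effects` and `error_paths`, while `untouched` may legitimately be answered with "None". A non-empty result would then map to a non-zero exit from the completion check. This only screens for missing evidence; whether the cited file actually supports the answer still needs review.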

WHAT YOU'LL LEARN
How to write a machine-readable completion contract that takes the 'done' decision away from the agent, with meaningful exit codes
A three-layer verifier (formal, functional, intent) and intent questions that force the agent to write concrete evidence
Running completion checks under scheduled and unattended agents: retry limits, diff audits, and quarantine on failure