Antigravity Lab | Agents & Manager | 2026-04-22 | Advanced

Harness Engineering for Antigravity — Turning an Agent into a Project Executor

A deep dive into 'harness engineering' for Antigravity — the prompt-and-template design that makes an agent complete a project, not just answer a question.

Tags: antigravity, agents, harness-engineering, prompt-design, agent-architecture

A few weeks into using agentic AI tools, most developers hit the same wall: the agent answers individual questions well but fails to deliver a "project." I've lived this repeatedly with Antigravity. Ask it to "prepare a feasibility document for an on-demand transit service," and you get a long-form blog post — not a structured deliverable a municipality could actually route through a decision process.

"Harness engineering," a framing originally articulated by nouernet on Qiita, offers a way out. The idea: don't ask the agent for an answer. Configure it to perform the work. This post translates that philosophy into Antigravity-specific templates, with three real case studies showing the before and after.

Why agents fail to ship projects

Human work implicitly runs a six-stage loop:

  1. Research
  2. Structure the information
  3. Assign roles
  4. Each role authors their piece
  5. Review and flag issues
  6. Revise based on feedback

When we ask an AI to "write something," we're asking for stage 4 — the authoring — without any guidance on the other five. That's why the output reads as a single coherent essay but doesn't have the depth and structure a real deliverable needs.

Harness engineering makes all six stages explicit in the prompt. Antigravity, built for long-horizon runs, responds especially well to this kind of scaffolding. Once you've written one harness, you tend to reuse it.

The six-stage harness, Antigravity-ready

Here's the template I keep at .antigravity/harnesses/project-execution.md and invoke as /harness project-execution:

# Project Execution Harness
 
You will run a multi-stage project to completion. Each stage's output
must be written to a file and used as input for the next stage.
 
## Inputs
- Project: {{project}}
- Research target: {{target}}
- Audience: {{audience}}
- Deadline: {{deadline}}
 
## Step 1: Research Agent — Gather information
- Web search; find at least 6 cases across 2+ countries
- For each case, summarize: name / operator / tech stack / cost model / success factors / challenges
- Output: `docs/01-research.md`
 
## Step 2: Architect Agent — Structural design
- Read the Research Agent's output
- Design the documentation structure for the project
- Build a top-level `README.md` listing all docs
- For each doc: purpose, audience, page budget
- Output: `README.md` and `docs/_outline.md`
 
## Step 3: Business Agent — Business case
- Market size, competitive landscape, revenue model
- Japan-specific constraints (regulation, business customs)
- Output: `docs/02-business.md`
 
## Step 4: Engineer Agent — Technical design
- Compare tech stacks from research; recommend a stack
- Estimate headcount, dev duration, run-rate cost
- Output: `docs/03-engineering.md`
 
## Step 5: Reviewer Agent — Quality review
- Read all docs; propose improvements across:
  - Difficulty mismatch for the audience
  - Missing prerequisites
  - Terminology inconsistencies
  - Where diagrams are needed
  - Risk/issue coverage
- Output: `docs/99-review.md`
 
## Step 6: Revision
- Apply Reviewer feedback to each doc
- Append a final summary to README.md
 
## Quality floor (apply to all docs)
- Named audience
- Prerequisites listed
- 5+ defined terms
- Diagram instructions (actual diagrams later)
- Pros/cons comparisons
- Risk/issue inventory
 
## Completion criteria
- 6+ docs under `docs/`
- `README.md` links all docs
- 100% of Reviewer feedback resolved

Hand this to Antigravity and you get a directory of deliverables rather than a single long reply. The non-obvious detail is that every step specifies an output file. Without that, the agent tends to produce one monolithic document and call it done.
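The `{{project}}`-style placeholders under Inputs have to be filled before each run. A minimal sketch of doing that with plain string substitution follows. `render_harness` is a hypothetical helper, not part of Antigravity, and the sample values are invented for illustration.

```python
import re

# Matches {{name}} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_harness(template: str, values: dict[str, str]) -> str:
    """Replace every {{name}} placeholder; fail loudly if a value is missing."""
    needed = {match.group(1) for match in PLACEHOLDER.finditer(template)}
    missing = sorted(needed - set(values))
    if missing:
        raise KeyError(f"no value supplied for: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


# Excerpt of the Inputs section from the template above.
inputs_section = """## Inputs
- Project: {{project}}
- Research target: {{target}}
- Audience: {{audience}}
- Deadline: {{deadline}}
"""

# Illustrative values only (assumptions, not from a real project file).
rendered = render_harness(
    inputs_section,
    {
        "project": "On-demand transit feasibility study",
        "target": "on-demand transit deployments",
        "audience": "municipal transport planning staff",
        "deadline": "end of quarter",
    },
)
print(rendered)
```

Raising on a missing value is a deliberate choice here: a harness sent with a literal `{{deadline}}` left in it tends to fail quietly rather than visibly.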

Running five agents through one workflow

When running Research / Architect / Business / Engineer / Reviewer in a single conversation, the key is explicit role handoff. Antigravity respects transition markers; this block style works well:

---
## 🔄 Role handoff: Research Agent → Architect Agent
 
Your work as Research Agent is complete.
From here, operate as the Architect Agent.
 
Architect Agent responsibilities:
- Read the previous output (docs/01-research.md)
- Design the overall documentation structure
- Do not author content (structure only)
 
Previous output: `docs/01-research.md`
Input to next agent: `docs/_outline.md`
---

Without the handoff block, a prompt like "proceed to the next step" produces Architect-stage output in the Research Agent's voice: hybrids that read like research summaries masquerading as structural design.
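Because every handoff block has the same shape, it can be generated rather than retyped. The sketch below assumes the layout shown above is what you want for every transition. `render_handoff` is a hypothetical helper name, not an Antigravity feature.

```python
def render_handoff(prev_role, next_role, responsibilities, prev_output, next_input):
    """Build a role-handoff block in the layout used in this article."""
    lines = [
        "---",
        f"## 🔄 Role handoff: {prev_role} → {next_role}",
        "",
        f"Your work as {prev_role} is complete.",
        f"From here, operate as the {next_role}.",
        "",
        f"{next_role} responsibilities:",
        *[f"- {item}" for item in responsibilities],
        "",
        f"Previous output: `{prev_output}`",
        f"Input to next agent: `{next_input}`",
        "---",
    ]
    return "\n".join(lines)


block = render_handoff(
    prev_role="Research Agent",
    next_role="Architect Agent",
    responsibilities=[
        "Read the previous output (docs/01-research.md)",
        "Design the overall documentation structure",
        "Do not author content (structure only)",
    ],
    prev_output="docs/01-research.md",
    next_input="docs/_outline.md",
)
print(block)
```

Paste the printed block into the conversation at the point where the previous role's file has been written.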

Case study 1 — On-demand transit feasibility

A real project: prepare a feasibility document for a local government considering an on-demand transit service. Comparing a naive request to the harnessed version:

Naive request (one 3,000-word essay)

  • No structure. Intro → benefits → challenges → examples → conclusion.
  • Missing what the municipality actually needs: budget envelope, operator options, 3-year exit criteria.
  • Unusable for a formal review.

Harnessed run (six files: five under docs/ plus a top-level README.md)

  • 01-research.md: Five domestic plus three international cases with tech stacks, cost structure, and operators
  • 02-business.md: Estimated market size, 3-year P&L, subsidy options
  • 03-engineering.md: Headcount, maintenance cost, tech-stack comparison
  • 04-risk.md: Region-specific risks (driver shortage, difficult demand forecasting)
  • 99-review.md: 12 reviewer findings, resolution status
  • README.md: Executive summary + links

The harnessed version could be attached directly to a municipal review package. Run time was the same, about 40 minutes of agent execution. The gap comes entirely from whether the implicit human process is made explicit for the agent.

Case study 2 — SaaS feature planning

Feature planning for a personal finance app: "add AI-based expense categorization." I adapted the harness:

  • Step 1 (Research): 5 competing implementations
  • Step 2 (Architect): Integration points with existing app
  • Step 3 (Business): Three monetization scenarios with estimates
  • Step 4 (Engineer): Tech selection (Gemini vs Claude vs custom model)
  • Step 5 (UX Designer): A new role. UI flow and error patterns.
  • Step 6 (Reviewer): Review across privacy, accuracy, experience

The on-demand transit harness went straight from the Engineer to the Reviewer. Here I inserted a UX Designer at step 5 and moved the Reviewer to step 6, because UI flow would determine the deliverable's quality. The harness shouldn't be frozen; treat it as a framework you customize per project by swapping in the right specialists.
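One way to make that customization cheap is to hold the steps as data and derive the numbering. This is a sketch under my own assumptions: the `Step` class and `insert_before` helper are hypothetical, and `docs/04-ux.md` is an invented file name, not one from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    role: str
    task: str
    output: str


# The five agent steps from the project-execution template.
BASE_STEPS = [
    Step("Research Agent", "Gather information", "docs/01-research.md"),
    Step("Architect Agent", "Structural design", "docs/_outline.md"),
    Step("Business Agent", "Business case", "docs/02-business.md"),
    Step("Engineer Agent", "Technical design", "docs/03-engineering.md"),
    Step("Reviewer Agent", "Quality review", "docs/99-review.md"),
]


def insert_before(steps, before_role, new_step):
    """Return a new list with new_step placed just before before_role."""
    index = next(i for i, step in enumerate(steps) if step.role == before_role)
    return steps[:index] + [new_step] + steps[index:]


def render_step_list(steps):
    """Number the steps in order, so inserting a role renumbers the rest."""
    return [f"Step {n} ({s.role}): {s.task}" for n, s in enumerate(steps, start=1)]


# Hypothetical UX step for the SaaS feature-planning variant.
saas_steps = insert_before(
    BASE_STEPS,
    "Reviewer Agent",
    Step("UX Designer Agent", "UI flow and error patterns", "docs/04-ux.md"),
)
for line in render_step_list(saas_steps):
    print(line)
```

`insert_before` returns a new list, so the base harness stays untouched and each project gets its own variant.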

Case study 3 — 50-person enterprise AI rollout

For a B2B client ("plan a ChatGPT Enterprise rollout for our 50-person financial services firm"):

  • Research Agent: Case studies of comparable rollouts
  • Policy Agent: Draft internal guidelines and prohibitions
  • Training Agent: Staff training curriculum
  • Budget Agent: 3-year cost projection
  • Reviewer Agent: Legal and HR perspective

The distinctive outcome was that the Policy Agent produced prohibitions aligned specifically with financial-sector customer data handling rather than generic AI usage guidance. That happened because the Research Agent had already collected "AI deployments in financial services" in step 1. The payoff from early-stage specificity compounds in later stages — exactly what harness design is optimizing for.

Three common failure modes, and the order to debug them

1. Over-stuffed steps. Cramming "5 domestic + 5 international cases + timeline + tech stack + cost + success factors + challenges + stakeholder quotes" into one Research Agent step means the agent silently drops one or two items. Five to seven requirements per step is a realistic ceiling.

2. Vague role names. "Planning Agent" or "Main Agent" makes the agent lose track of which identity it's wearing across handoffs. "Japanese Municipality Researcher" or "Financial-sector Policy Drafter" — names that carry the industry, specialty, and region — hold up under role changes.

3. Missing completion criteria. Without an explicit "done when" condition, the agent declares victory after the first draft and skips the review cycle. Minimum viable criterion: "until 100% of Reviewer findings are resolved."
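The completion criteria from the template can also be checked mechanically after a run. The sketch below assumes one thing the article does not specify: that `docs/99-review.md` records findings as Markdown task-list items (`- [ ]` open, `- [x]` resolved). `check_completion` is a hypothetical helper.

```python
from pathlib import Path


def check_completion(root: Path, min_docs: int = 6) -> list[str]:
    """Return a list of unmet completion criteria (empty means done)."""
    problems = []

    # Criterion: 6+ docs under docs/
    docs = sorted((root / "docs").glob("*.md"))
    if len(docs) < min_docs:
        problems.append(f"only {len(docs)} docs under docs/; the harness asks for {min_docs}+")

    # Criterion: README.md links all docs
    readme = root / "README.md"
    if readme.exists():
        readme_text = readme.read_text(encoding="utf-8")
        for doc in docs:
            if f"docs/{doc.name}" not in readme_text:
                problems.append(f"README.md does not link docs/{doc.name}")
    else:
        problems.append("README.md is missing")

    # Criterion: 100% of Reviewer feedback resolved.
    # Assumption: findings are written as '- [ ]' / '- [x]' task-list items.
    review = root / "docs" / "99-review.md"
    if review.exists():
        lines = review.read_text(encoding="utf-8").splitlines()
        open_items = [line for line in lines if line.lstrip().startswith("- [ ]")]
        if open_items:
            problems.append(f"{len(open_items)} reviewer finding(s) still open")
    else:
        problems.append("docs/99-review.md is missing")

    return problems
```

Calling `check_completion(Path("."))` from the project root returns an empty list once all three conditions hold. If your Reviewer uses a different format for findings, the last check needs to change with it.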

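Debug order when an output doesn't meet expectations: (1) check completion criteria, (2) check role-name specificity, (3) check whether any step is over-stuffed, (4) check the handoff blocks. This runs the list above in reverse, then adds the handoff blocks described earlier.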
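The four checks can be approximated with a small linter over the harness text, reporting findings in that debug order. This is a rough sketch with several assumptions of mine: step headings look like `## Step N: ...`, each bullet other than `Output:` counts as one requirement, the vague-name list contains only the two examples named above, and one handoff block is expected per transition between steps.

```python
import re

VAGUE_ROLES = {"Planning Agent", "Main Agent"}  # the two examples named above
MAX_REQUIREMENTS = 7  # upper end of the "five to seven" ceiling
STEP_HEADING = re.compile(r"^## Step (\d+):\s*(.+)$")


def parse_steps(text):
    """Map each '## Step N:' title to its requirement bullets."""
    steps, current = {}, None
    for line in text.splitlines():
        heading = STEP_HEADING.match(line)
        if heading:
            current = heading.group(2).strip()
            steps[current] = []
        elif line.startswith("## "):
            current = None  # another section, e.g. Inputs or Quality floor
        elif current and line.lstrip().startswith("- ") and "Output:" not in line:
            steps[current].append(line.strip())
    return steps


def lint_harness(text):
    """Return findings, numbered by the debug order they belong to."""
    findings = []
    steps = parse_steps(text)

    # (1) completion criteria
    if not re.search(r"^## Completion criteria", text, flags=re.MULTILINE):
        findings.append("1. no '## Completion criteria' section")

    # (2) role-name specificity; the role is the part before the dash separator
    for title in steps:
        role = re.split(r"\s+[\u2014-]\s+", title)[0]
        if role in VAGUE_ROLES:
            findings.append(f"2. vague role name: {role}")

    # (3) over-stuffed steps
    for title, bullets in steps.items():
        if len(bullets) > MAX_REQUIREMENTS:
            findings.append(f"3. {title!r} has {len(bullets)} requirements")

    # (4) handoff blocks: assume one per transition between steps
    handoffs = len(re.findall(r"Role handoff:", text))
    if steps and handoffs < len(steps) - 1:
        findings.append(f"4. {handoffs} handoff block(s) for {len(steps)} steps")

    return findings
```

Run it on the harness file concatenated with whatever handoff blocks you plan to send. An empty list only means these four heuristics passed, not that the harness is good.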

How to build your own harness

Harness engineering doesn't transfer by copy-paste. The required work is verbalizing your own implicit process.

Pick one recent AI-assisted project that disappointed you. If a person (you, or a teammate) had done it manually, what would the flow have been?

  1. What would you research first? (Research)
  2. How would you structure it? (Architect)
  3. Which specialists would you consult? (Subject-matter agents)
  4. Who would review? (Reviewer)
  5. What defines "done"? (Completion criteria)

The answers to these five questions are the skeleton of your domain-specific harness. You pay the verbalization cost once, and it pays back across every similar project afterward.
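To show how the five answers map onto a harness, here is a sketch that assembles them into a skeleton in the same layout as the template earlier in this post. `build_skeleton` and `slug` are hypothetical helpers, the numbered output file names are my own convention, and the sample answers are loosely based on case study 3.

```python
import re


def slug(role: str) -> str:
    """Lowercase a role name into a file-name-safe slug."""
    return re.sub(r"[^a-z0-9]+", "-", role.lower()).strip("-")


def build_skeleton(research, structure, specialists, reviewer, done_when):
    """Assemble a harness skeleton from the five answers.

    specialists is a list of (role, task) pairs; done_when is a list of strings.
    """
    steps = [("Research Agent", research), ("Architect Agent", structure)]
    steps += list(specialists)
    steps.append((reviewer, "Review every doc and list findings"))

    lines = ["# Project Execution Harness", ""]
    for number, (role, task) in enumerate(steps, start=1):
        lines += [
            f"## Step {number}: {role}",
            f"- {task}",
            # Hypothetical naming scheme: step number plus role slug.
            f"- Output: `docs/{number:02d}-{slug(role)}.md`",
            "",
        ]
    lines.append("## Completion criteria")
    lines += [f"- {criterion}" for criterion in done_when]
    return "\n".join(lines)


skeleton = build_skeleton(
    research="Collect comparable rollouts in financial services",
    structure="Design the documentation structure",
    specialists=[
        ("Financial-sector Policy Drafter", "Draft internal guidelines and prohibitions"),
        ("Budget Agent", "Project 3-year cost"),
    ],
    reviewer="Legal and HR Reviewer",
    done_when=["100% of Reviewer findings are resolved"],
)
print(skeleton)
```

The output is only a starting point. The per-step requirements and the quality floor still have to come from your own process.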

Harness engineering is, fundamentally, the act of translating your implicit expertise into agent-readable form. Paired with a long-horizon runtime like Antigravity, it's where the real upside of agentic AI starts to appear.
