A few weeks into using agentic AI tools, most developers hit the same wall: the agent answers individual questions well but fails to deliver a "project." I've lived this repeatedly with Antigravity. Ask it to "prepare a feasibility document for an on-demand transit service," and you get a long-form blog post — not a structured deliverable a municipality could actually route through a decision process.
"Harness engineering," a framing originally articulated by nouernet on Qiita, offers a way out. The idea: don't ask the agent for an answer. Configure it to perform the work. This post translates that philosophy into Antigravity-specific templates, with three real case studies showing the before and after.
Why agents fail to ship projects
Human work implicitly runs a six-stage loop:
- Research
- Structure the information
- Assign roles
- Each role authors their piece
- Review and flag issues
- Revise based on feedback
When we ask an AI to "write something," we're asking for stage 4 — the authoring — without any guidance on the other five. That's why the output reads as a single coherent essay but doesn't have the depth and structure a real deliverable needs.
Harness engineering makes all six stages explicit in the prompt. Antigravity, built for long-horizon runs, responds especially well to this kind of scaffolding. Once you've written one harness, you tend to reuse it.
The six-stage harness, Antigravity-ready
Here's the template I keep at .antigravity/harnesses/project-execution.md and invoke as /harness project-execution:
```markdown
# Project Execution Harness
You will run a multi-stage project to completion. Each stage's output
must be written to a file and used as input for the next stage.
## Inputs
- Project: {{project}}
- Research target: {{target}}
- Audience: {{audience}}
- Deadline: {{deadline}}
## Step 1: Research Agent — Gather information
- Web search; find at least 6 cases across 2+ countries
- For each case, summarize: name / operator / tech stack / cost model / success factors / challenges
- Output: `docs/01-research.md`
## Step 2: Architect Agent — Structural design
- Read the Research Agent's output
- Design the documentation structure for the project
- Build a top-level `README.md` listing all docs
- For each doc: purpose, audience, page budget
- Output: `README.md` and `docs/_outline.md`
## Step 3: Business Agent — Business case
- Market size, competitive landscape, revenue model
- Japan-specific constraints (regulation, business customs)
- Output: `docs/02-business.md`
## Step 4: Engineer Agent — Technical design
- Compare tech stacks from research; recommend a stack
- Estimate headcount, dev duration, run-rate cost
- Output: `docs/03-engineering.md`
## Step 5: Reviewer Agent — Quality review
- Read all docs; propose improvements across:
  - Difficulty mismatch for the audience
  - Missing prerequisites
  - Terminology inconsistencies
  - Where diagrams are needed
  - Risk/issue coverage
- Output: `docs/99-review.md`
## Step 6: Revision
- Apply Reviewer feedback to each doc
- Append a final summary to README.md
## Quality floor (apply to all docs)
- Named audience
- Prerequisites listed
- 5+ defined terms
- Diagram instructions (actual diagrams later)
- Pros/cons comparisons
- Risk/issue inventory
## Completion criteria
- 6+ docs under `docs/`
- `README.md` links all docs
- 100% of Reviewer feedback resolved
```

Hand this to Antigravity and you get a directory of deliverables rather than a single long reply. The non-obvious detail is that every step up to and including the review names its own output file. Without that, the agent tends to produce one monolithic document and call it done.
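The first two completion criteria are mechanical enough to check once the agent says it is finished. Below is a minimal Python sketch, assuming the layout the template asks for: Markdown files under docs/ and a top-level README.md that links them with relative paths such as docs/01-research.md. The function name check_completion is hypothetical, and the third criterion (Reviewer feedback resolved) still needs a human or a Reviewer pass.

```python
import re
from pathlib import Path


def check_completion(root: Path, min_docs: int = 6) -> list[str]:
    """Return unmet completion criteria for a finished run; an empty list means both checks pass."""
    problems: list[str] = []

    # Criterion: at least `min_docs` Markdown files under docs/
    docs_dir = root / "docs"
    docs = sorted(p.name for p in docs_dir.glob("*.md")) if docs_dir.is_dir() else []
    if len(docs) < min_docs:
        problems.append(f"only {len(docs)} docs under docs/ (need {min_docs}+)")

    # Criterion: README.md links every doc, via relative links like (docs/01-research.md)
    readme = root / "README.md"
    if not readme.is_file():
        problems.append("README.md is missing")
        return problems
    text = readme.read_text(encoding="utf-8")
    linked = set(re.findall(r"\]\((?:\./)?docs/([^)#\s]+)", text))
    problems.extend(f"README.md does not link docs/{name}" for name in docs if name not in linked)
    return problems
```

Run it against the project root, e.g. check_completion(Path(".")); anything it returns is a reason to send the agent back for another pass instead of accepting "done".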
Running five agents through one workflow
When running Research / Architect / Business / Engineer / Reviewer in a single conversation, the key is explicit role handoff. Antigravity respects transition markers; this block style works well:
```markdown
---
## 🔄 Role handoff: Research Agent → Architect Agent
Your work as Research Agent is complete.
From here, operate as the Architect Agent.
Architect Agent responsibilities:
- Read the previous output (docs/01-research.md)
- Design the overall documentation structure
- Do not author content (structure only)
Previous output: `docs/01-research.md`
Input to next agent: `docs/_outline.md`
---
```

Without the handoff block, a prompt like "proceed to the next step" produces Architect-stage output in the Researcher's voice: hybrids that read like research summaries masquerading as structural design.
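When a run has several transitions, generating the blocks keeps the wording identical at every handoff, so no role change gets announced more loosely than the others. A minimal Python sketch, assuming each stage is described as a (role, output file) pair; handoff_block and handoffs are hypothetical helper names, and the layout simply copies the example block rather than any format Antigravity requires.

```python
def handoff_block(prev_role: str, next_role: str, duties: list[str],
                  prev_output: str, next_output: str) -> str:
    """Render one role-handoff block in the same layout as the example."""
    lines = [
        "---",
        f"## 🔄 Role handoff: {prev_role} → {next_role}",
        f"Your work as {prev_role} is complete.",
        f"From here, operate as the {next_role}.",
        f"{next_role} responsibilities:",
        *(f"- {duty}" for duty in duties),
        f"Previous output: `{prev_output}`",
        f"Input to next agent: `{next_output}`",
        "---",
    ]
    return "\n".join(lines)


def handoffs(pipeline: list[tuple[str, str]], duties: dict[str, list[str]]) -> list[str]:
    """One block per consecutive pair of (role, output_file) stages in `pipeline`."""
    return [
        handoff_block(prev_role, next_role, duties.get(next_role, []), prev_out, next_out)
        for (prev_role, prev_out), (next_role, next_out) in zip(pipeline, pipeline[1:])
    ]
```

As in the example, "Previous output" is the outgoing role's file and "Input to next agent" is the file the incoming role will write.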
Case study 1 — On-demand transit feasibility
A real project: prepare a feasibility document for a local government considering an on-demand transit service. Comparing a naive request to the harnessed version:
Naive request (one 3,000-word essay)
- No deliverable structure, just an essay arc: intro → benefits → challenges → examples → conclusion.
- Missing what the municipality actually needs: budget envelope, operator options, 3-year exit criteria.
- Unusable for a formal review.
Harnessed run (six files under docs/)
- 01-research.md: five domestic plus three international cases, with tech stacks, cost structure, and operators
- 02-business.md: estimated market size, 3-year P&L, subsidy options
- 03-engineering.md: headcount, maintenance cost, tech-stack comparison
- 04-risk.md: region-specific risks (driver shortage, difficult demand forecasting)
- 99-review.md: 12 reviewer findings, with resolution status
- README.md: executive summary + links
The harnessed version could attach directly to a municipal review package. Same runtime — about 40 minutes of agent execution. The gap comes entirely from whether the implicit human process is made explicit for the agent.
Case study 2 — SaaS feature planning
Feature planning for a personal finance app: "add AI-based expense categorization." I adapted the harness:
- Step 1 (Research): 5 competing implementations
- Step 2 (Architect): Integration points with existing app
- Step 3 (Business): Three monetization scenarios with estimates
- Step 4 (Engineer): Tech selection (Gemini vs Claude vs custom model)
- Step 5 (UX Designer): A new role. UI flow and error patterns.
- Step 6 (Reviewer): Review across privacy, accuracy, experience
In the on-demand transit harness, step 5 was the Reviewer. Here Business stays at step 3, but I inserted a UX Designer at step 5 and moved the Reviewer to step 6, because UI flow would determine the deliverable's quality. The harness shouldn't be frozen: treat it as a framework you customize per project by swapping in the right specialists.
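Treating the harness as data rather than a fixed file makes that kind of swap concrete. A minimal Python sketch, assuming a harness is just an ordered list of steps, each with a role, its tasks, and one output file; Step, insert_step, render, and the output file names below are hypothetical illustrations, not Antigravity features.

```python
from dataclasses import dataclass


@dataclass
class Step:
    role: str          # e.g. "UX Designer Agent"
    tasks: list[str]   # requirements for this step
    output: str        # file this step must write


def insert_step(steps: list[Step], position: int, new: Step) -> list[Step]:
    """Return a new list with `new` at 1-based `position`; later steps shift down by one."""
    if not 1 <= position <= len(steps) + 1:
        raise ValueError(f"position must be between 1 and {len(steps) + 1}")
    return steps[: position - 1] + [new] + steps[position - 1 :]


def render(steps: list[Step]) -> str:
    """Render steps as numbered '## Step N: Role' sections, each ending with its output file."""
    lines: list[str] = []
    for number, step in enumerate(steps, start=1):
        lines.append(f"## Step {number}: {step.role}")
        lines.extend(f"- {task}" for task in step.tasks)
        lines.append(f"- Output: `{step.output}`")
    return "\n".join(lines)


if __name__ == "__main__":
    # Feature-planning harness before the UX Designer is added (file names are hypothetical)
    base = [
        Step("Research Agent", ["5 competing implementations"], "docs/01-research.md"),
        Step("Architect Agent", ["Integration points with existing app"], "docs/_outline.md"),
        Step("Business Agent", ["Three monetization scenarios with estimates"], "docs/02-business.md"),
        Step("Engineer Agent", ["Tech selection (Gemini vs Claude vs custom model)"], "docs/03-engineering.md"),
        Step("Reviewer Agent", ["Review across privacy, accuracy, experience"], "docs/99-review.md"),
    ]
    ux = Step("UX Designer Agent", ["UI flow and error patterns"], "docs/04-ux.md")
    print(render(insert_step(base, 5, ux)))
```

insert_step returns a new list instead of mutating the original, so the base harness stays reusable for the next project.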
Case study 3 — 50-person enterprise AI rollout
For a B2B client ("plan a ChatGPT Enterprise rollout for our 50-person financial services firm"):
- Research Agent: Case studies of comparable rollouts
- Policy Agent: Draft internal guidelines and prohibitions
- Training Agent: Staff training curriculum
- Budget Agent: 3-year cost projection
- Reviewer Agent: Legal and HR perspective
The distinctive outcome was that the Policy Agent produced prohibitions aligned specifically with financial-sector customer data handling rather than generic AI usage guidance. That happened because the Research Agent had already collected cases of AI deployments in financial services in step 1. Specificity gained in early stages compounds in later ones, which is exactly what harness design optimizes for.
Three common failure modes, and how to debug them
1. Over-stuffed steps. Cramming "5 domestic + 5 international cases + timeline + tech stack + cost + success factors + challenges + stakeholder quotes" into one Research Agent step means the agent silently drops one or two items. Five to seven requirements per step is a realistic ceiling.
2. Vague role names. "Planning Agent" or "Main Agent" makes the agent lose track of which identity it's wearing across handoffs. "Japanese Municipality Researcher" or "Financial-sector Policy Drafter" — names that carry the industry, specialty, and region — hold up under role changes.
3. Missing completion criteria. Without an explicit "done when" condition, the agent declares victory after the first draft and skips the review cycle. Minimum viable criterion: "until 100% of Reviewer findings are resolved."
Debug order when an output doesn't meet expectations: (1) check completion criteria, (2) check role name specificity, (3) check step loading (no step over the five-to-seven requirement ceiling), (4) check handoff blocks.
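The three failure modes are regular enough to catch with a crude static check before a run. A minimal Python sketch, assuming the harness uses the template's "## Step N: Role" headings and "- " bullets; lint_harness and the GENERIC_ROLE_NAMES list are hypothetical, the limit of seven comes from the ceiling above, and passing the lint only means no role name is obviously generic, not that every name is specific enough.

```python
import re

MAX_REQUIREMENTS = 7  # upper end of the five-to-seven ceiling per step
# Hypothetical, extendable list of role names too generic to survive a handoff
GENERIC_ROLE_NAMES = {"planning agent", "main agent", "agent", "assistant"}


def lint_harness(text: str) -> list[str]:
    """Flag over-stuffed steps, generic role names, and missing completion criteria."""
    warnings: list[str] = []
    has_completion_criteria = False
    # Each "## " heading starts a section; text before the first one is skipped.
    for section in re.split(r"^## ", text, flags=re.MULTILINE)[1:]:
        heading, _, body = section.partition("\n")
        bullets = re.findall(r"^[ \t]*- (.+)$", body, flags=re.MULTILINE)
        if heading.strip().lower().startswith("completion criteria"):
            has_completion_criteria = bool(bullets)
            continue
        # "Step 1: Research Agent \u2014 Gather information" -> ("1", "Research Agent")
        match = re.match(r"Step (\d+):\s*(.+?)(?:\s*\u2014.*)?$", heading.strip())
        if not match:
            continue  # not a step heading (e.g. "Inputs", "Quality floor")
        number, role = match.group(1), match.group(2)
        requirements = [b for b in bullets if not b.startswith("Output:")]
        if len(requirements) > MAX_REQUIREMENTS:
            warnings.append(
                f"Step {number}: {len(requirements)} requirements (over-stuffed, max {MAX_REQUIREMENTS})"
            )
        if role.lower() in GENERIC_ROLE_NAMES:
            warnings.append(f"Step {number}: generic role name '{role}'")
    if not has_completion_criteria:
        warnings.append("no '## Completion criteria' section with at least one bullet")
    return warnings
```

Nested bullets count as requirements, so a Reviewer step with five review dimensions plus its lead bullet counts as six.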
How to build your own harness
Harness engineering doesn't transfer by copy-paste. The required work is verbalizing your own implicit process.
Pick one recent AI-assisted project that disappointed you. If a person (you, or a teammate) had done it manually, what would the flow have been?
- What would you research first? (Research)
- How would you structure it? (Architect)
- Which specialists would you consult? (Subject-matter agents)
- Who would review? (Reviewer)
- What defines "done"? (Completion criteria)
The answers to these five questions are the skeleton of your domain-specific harness. You pay the verbalization cost once, and it pays back across every similar project afterward.
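Written down, the five answers map almost one-to-one onto the template's sections. A minimal Python sketch, assuming each answer is captured as a short list of task strings; HarnessAnswers, skeleton, and the docs/NN-role.md naming rule are hypothetical, chosen only so that every step names an output file.

```python
from dataclasses import dataclass


@dataclass
class HarnessAnswers:
    research: list[str]                # What would you research first?
    structure: list[str]               # How would you structure it?
    specialists: dict[str, list[str]]  # Which specialists? role name -> tasks
    review: list[str]                  # Who would review, and against what?
    done_when: list[str]               # What defines "done"?


def slug(role: str) -> str:
    """'Budget Agent' -> 'budget' (hypothetical file-naming rule)."""
    return role.lower().removesuffix(" agent").replace(" ", "-")


def skeleton(title: str, answers: HarnessAnswers) -> str:
    """Turn the five answers into a step-per-role harness with one output file per step."""
    steps = [
        ("Research Agent", answers.research),
        ("Architect Agent", answers.structure),
        *answers.specialists.items(),  # specialists keep the order you listed them in
        ("Reviewer Agent", answers.review),
    ]
    lines = [f"# {title}"]
    for number, (role, tasks) in enumerate(steps, start=1):
        lines.append(f"## Step {number}: {role}")
        lines.extend(f"- {task}" for task in tasks)
        lines.append(f"- Output: `docs/{number:02d}-{slug(role)}.md`")
    lines.append("## Completion criteria")
    lines.extend(f"- {criterion}" for criterion in answers.done_when)
    return "\n".join(lines)
```

The result is a first draft to edit by hand; the quality floor and the revision step still have to be written for your domain.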
Harness engineering is, fundamentally, the act of translating your implicit expertise into agent-readable form. Paired with a long-horizon runtime like Antigravity, it's where the real upside of agentic AI starts to appear.