Size Antigravity Agent Tasks by What You Can Review — a Practical Cure for Rework

On a Friday evening I wanted CSV export on an admin screen, so I handed the agent a single line: "Add a CSV export feature to the list view." Forty minutes later, the implementation came back — and it worked. It also duplicated an existing download routine almost line for line in a new file, and the button matched none of the surrounding screens. I rewrote it myself on Monday morning.

The agent was not the problem. My request was. For an indie developer running several projects in parallel, the expensive part is rarely the time an agent spends working — it is the time you spend fixing what comes back. After that Monday, I rebuilt how I size tasks and write briefs, and the shape of my rework changed visibly. These are the rules, with the actual briefs included.

Why big requests break — count the hidden decisions

Afterward, I counted the implicit decisions buried in that one-line request. Which file the feature lives in. Whether to reuse the existing export logic. Whether current filters apply. Which encoding to write. Where the button goes and how it is styled. Conservatively, more than ten.

Handing over one line means the agent makes all ten decisions alone. Worse, decisions stack like building blocks: one early misjudgment — "no need to look at existing download code" — and the remaining nine are stacked correctly on top of a wrong foundation. When a result is "working but unusable," what is broken is usually not the code. It is the premise underneath it.

The opposite extreme fails too. Split the work into ten chat-sized questions and the context shatters; every exchange loses the premises of the last one, and you pay the re-explanation tax each time. If both ends fail, the question becomes where to cut.

One task = what I can review in fifteen minutes

The rule I landed on: a task is the amount of output I can read and judge in about fifteen minutes. Anything larger gets split.

Cutting by reviewability tends to land the seams at natural phase boundaries — investigation, implementation, verification. It also plays well with how Antigravity works: agents produce implementation plans and walkthroughs as Artifacts, which gives you a checkpoint to review the plan before any code exists. Saying "please reuse the existing export helper" at the plan stage costs a few seconds. Saying it after implementation costs the whole forty minutes again.

The same feature, redone as three briefs

A few days later, on a different screen, I reissued the failed request as three separate briefs.

Brief 1 (investigation — no code changes)

In src/admin, list every existing routine related to exporting or downloading data. For each, note whether it can be reused; where reuse is a bad idea, say why. Do not change any code.

Brief 2 (implementation — with acceptance criteria)

Using useExportBase from the investigation memo, implement CSV export on the orders list. Acceptance criteria: (1) a button appears at the top right, styled like the other screens; (2) the downloaded CSV reflects the currently active filters; (3) no new utility functions are introduced.

Brief 3 (verification)

Exercise the feature in the browser and leave a walkthrough. Please check three cases: zero rows, one hundred thousand rows, and cells containing line breaks and quotation marks.

The three briefs take longer end to end than the one-liner did. But with no Monday rewrite at the end, the total is faster. One small trick worth copying: the explicit "do not change any code" in Brief 1. Without it, an investigation request sometimes comes back with a helpful, unrequested implementation attached.

Acceptance criteria: describe the finished state in three lines

The acceptance criteria attached to Brief 2 are the load-bearing part of this whole routine. The format is plain: three observable statements describing what "done" looks like.

The craft is in deleting vague words. To an agent, "appropriately," "as needed," and "in a clean way" all translate to "you decide." If there is a decision you do not want delegated, write it down as a criterion on the spot. Whatever you leave unwritten, you have delegated — and you forfeit the right to complain about how it was decided. Moving that line into the brief itself has eliminated most of my arguments with the review (the counterparty being an agent notwithstanding).

Criteria pay a second dividend: Brief 3 reuses them verbatim as its verification checklist. Two minutes of writing buys the test plan for free.

Going parallel? Cut along file boundaries

Antigravity 2.0 added worktrees and project management, which makes running several agents at once genuinely practical. But parallel failures almost always reduce to a single cause: two tasks touching the same file.

My rule is conservative. I parallelize only when I can say with confidence that the sets of files involved do not intersect. If I hesitate, it runs serially. In practice, the time saved by parallel execution has been smaller than I hoped and the time lost untangling merge conflicts larger than I feared. The wins survive only across coarse boundaries — frontend versus backend, app code versus documentation.

When a big, vague request is the right call

The review-unit rule has a deliberate exception. Throwaway prototypes, greenfield experiments, and explorations where failure costs nothing — those I hand over in one big line, on purpose. Output that will never be reviewed has no business being sized for review.

The deciding question is the lifespan of the artifact. Code I will maintain gets cut small and walked through checkpoints; code I will discard gets thrown large and fast. Same agent, different brief shapes, chosen by how long the result has to live.

One question before you hit send

Next time you are about to hand an agent a task, pause after writing the brief and ask: how many minutes will it take me to review what comes back? If the honest answer is more than fifteen, send only the investigation or the plan first. That single habit changes both the precision of what returns and the shape of the time you spend correcting it.