ANTIGRAVITY LAB
Agents & Manager / 2026-06-30 / Advanced

When the Android CLI Got 3x Faster and Cut Tokens by ~70%, the Right Move Was More Verification Per Change — Not More Parallelism

Reading that the Android CLI agent runs ~3x faster while using ~70% fewer tokens, my first instinct was to ask how many runs to parallelize. But a faster agent doesn't change how much work ships — it changes where the queue forms. This walks through why, sizes the new bottleneck (review and verification gates) with Little's Law, enforces a WIP cap with a working Python admission controller, and reinvests the freed budget into depth per change — with measured results.

When I read that the Android CLI agent completes tasks about 3x faster while using roughly 70% fewer tokens, my first instinct — as an indie developer juggling several apps and sites — was to ask how many runs I should fire in parallel. If each run is faster, surely I can get through more work.

A little hands-on time proved that instinct wrong. What actually changes when the agent gets faster is not "how much work I can produce," but "where the queue forms." When the model was the slow step, adding capacity moved things forward. The moment the model gets fast, the head of the line moves to review and the verification gates. Raise parallelism without looking there, and you don't ship more — you just miss more.

This post frames why you should not convert speed into parallelism, sizes the new bottleneck with queueing math, enforces the limit as an admission controller, and reinvests the freed token budget into verification per change — all with code and measured numbers.

What a 3x speedup and 70% token cut really change: where it clogs

How fast an agent setup feels depends on where the rate-limiting step sits. When generation was slow, most of the waiting was on the model, so both a faster model and more parallelism paid off directly. Cut generation time to a third and the time spent waiting on the model drops to a third as well.

The catch is that the path from a change to production is not just generation. The agent's diff has to pass verification gates (type checks, tests, lint, evals), then a human review, before it merges. Speeding up generation does not widen those later stages at the same rate. Test runtime is unchanged, and the hours a person can spend reviewing are roughly fixed per day.

So a 3x/70% improvement doesn't speed up the whole pipeline — it moves the rate-limiting step. The head of the line shifts from the model to "gates plus review." Ignore that and you accumulate changes that are generated but neither verified nor reviewed. Only the entrance got faster; the exit is the same width.
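To make the shift concrete, here is a minimal sketch that treats end-to-end throughput as the minimum of the stage capacities. The stage names follow the text; the weekly capacities are hypothetical numbers chosen for illustration, not measurements from this article.

```python
def pipeline_throughput(stage_capacity: dict[str, float]) -> tuple[str, float]:
    """Return the bottleneck stage and the end-to-end throughput.

    A serial pipeline can only deliver as fast as its slowest stage,
    so throughput is the minimum capacity across stages.
    """
    bottleneck = min(stage_capacity, key=lambda stage: stage_capacity[stage])
    return bottleneck, stage_capacity[bottleneck]


# Hypothetical capacities in changes per week (assumed, for illustration only).
before = {"generate": 8.0, "gates": 30.0, "review": 12.0}
# Same pipeline with generation 3x faster; gates and review are unchanged.
after = {**before, "generate": before["generate"] * 3}

print(pipeline_throughput(before))  # ('generate', 8.0)
print(pipeline_throughput(after))   # ('review', 12.0)
```

With these assumed numbers, tripling generation lifts throughput from 8 to 12 changes per week, not to 24, and the bottleneck moves from generation to review.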

What happens when you convert speed into parallelism (Little's Law)

To reason in numbers instead of vibes, use the most basic relationship in queueing theory — Little's Law.

L = λ × W
  L … items in the system at once (here: changes in progress = WIP)
  λ … throughput (items passing through the system per unit time)
  W … average time an item spends in the system (lead time)

Treat the system as "generate -> verify -> review -> merge." Then λ (changes you can truly merge per week) is set by the later stages. No matter how fast generation gets, the λ that verification and review can absorb does not rise.

Increase only parallelism (WIP = L) and the math forces W (lead time) to grow. With λ fixed and L up, W = L / λ must increase. That is exactly the state of "lots in flight, but each item takes longer and longer to come out." I reproduced it myself: right after switching to a faster model I doubled parallelism, and the days-to-merge for a given change actually got longer.
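A quick worked example of that relationship, as a sketch: the throughput of 10 merged changes per week is an assumed figure, not one of the measurements from this post.

```python
def lead_time(wip: float, throughput: float) -> float:
    """Little's Law rearranged: W = L / lambda."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


# Assumed: gates plus review can merge 10 changes per week, regardless of
# how fast the agent generates them.
throughput_per_week = 10.0

for wip in (5, 10, 20):
    days = lead_time(wip, throughput_per_week) * 7
    print(f"WIP={wip}: lead time = {days:.1f} days")
```

Under that assumption, doubling WIP from 10 to 20 doubles lead time from 7 to 14 days while the number of merges per week stays the same.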

The conclusion is simple. Unless you raise the later-stage throughput λ, more WIP does not increase the number of changes delivered. The speedup should go somewhere other than parallelism.
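Until the later stages get wider, the practical consequence is a hard limit on admitted work. Rearranging Little's Law gives a cap (admissible WIP = downstream throughput × target lead time), and a small admission controller can enforce it. The following is a minimal sketch of that idea, not the controller used in this article; `wip_cap`, `AdmissionController`, and the numbers are hypothetical.

```python
import math
import threading


def wip_cap(throughput_per_week: float, target_lead_time_weeks: float) -> int:
    """Little's Law as a budget: L = lambda * W, rounded down, never below 1."""
    return max(1, math.floor(throughput_per_week * target_lead_time_weeks))


class AdmissionController:
    """Admits a new agent run only while in-flight changes are below the cap."""

    def __init__(self, cap: int) -> None:
        self.cap = cap
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self) -> bool:
        """Reserve a WIP slot if one is free; return False when at the cap."""
        with self._lock:
            if self._in_flight >= self.cap:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Free a slot. Call when a change merges or is abandoned."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1

    @property
    def in_flight(self) -> int:
        with self._lock:
            return self._in_flight


# Assumed inputs: 10 merges per week downstream, target lead time of half a week.
cap = wip_cap(throughput_per_week=10.0, target_lead_time_weeks=0.5)
controller = AdmissionController(cap)

admitted = [controller.try_admit() for _ in range(7)]
print(cap, admitted)  # 5 [True, True, True, True, True, False, False]

controller.release()           # one change merged
print(controller.try_admit())  # True
```

The slot is released when a change merges or is dropped, not when generation finishes, because a generated but unreviewed change still counts as WIP.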

WHAT YOU'LL LEARN
  • Use Little's Law to size the cap: admissible WIP equals review-plus-gate throughput times target lead time, so a 3x-faster agent doesn't tempt you past the parallelism your downstream can actually absorb
  • See, structurally, why pouring the speedup into parallelism leaves weekly delivered flat while only review quality drops, and stop it with a working Python admission controller (WIP cap)
  • Get a concrete rule for reinvesting the ~70% token savings into a second self-review pass and extra evals per change, plus the two metrics (delivered/week and escape rate) that prove it worked