When Your Agent Got 4x Faster: Rebuilding the Parallel Pipeline

When the Antigravity CLI moves to a faster model, the bottleneck in your parallel agent pipeline shifts. Here is a practical way to rethink verification, task granularity, concurrency, and cost caps with speed as the new baseline.


One morning I moved the helper pipeline that drafts my nightly content over to the new Antigravity CLI. The engine underneath had switched to a faster model, and each step now responded roughly three to four times quicker. I assumed total throughput would climb by the same amount.

When I actually measured it, the improvement was only about 1.4x. The agent's "thinking time" had clearly shrunk, yet the pipeline as a whole had barely moved.

Chasing the reason, I realized my design assumptions had gone stale. The structure I had built to "hide the waiting" back when the model was slow was now the very thing holding me back. As an indie developer running several apps and channels in parallel, the efficiency of these background pipelines quietly adds up. So where exactly should you rebuild the design when speed changes? Here is the set of judgment calls I landed on.

When speed rises, verification — not waiting — becomes the bottleneck

A parallel pipeline built around a slow model is usually organized to hide inference latency. You fire several tasks at once and process one result while the others are still thinking. As long as inference latency dominates, this scales the whole thing cleanly.

But once inference is 4x faster, everything else moves to the foreground. In my pipeline the new bottleneck was the verification I ran after each step: format checks, broken-link checks, build validation. These depend on external processes and the network, so they do not shrink just because the model got faster.

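In other words, raising speed shifts the inference-to-verification ratio, and it does not take much for the ratio to flip. Take a step that spends 80 seconds on inference and 20 on verification. At 4x, inference drops to 20 seconds while verification stays at 20, so verification goes from a fifth of the step to half of it; any further speedup, or any extra verification load from higher concurrency, pushes verification past inference. At that point, cranking up concurrency while leaving verification untouched only swells the verification queue. The first thing to confront is this inversion.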
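A small back-of-the-envelope model makes the shift concrete. This is a sketch under a simplifying assumption: each step runs inference and then verification back to back, and only inference gets faster, which is Amdahl's law. A real parallel pipeline overlaps work, so treat the output as a rough guide rather than a measurement; the function names are illustrative.

```python
# Back-of-the-envelope model (assumption): a step is inference time plus
# verification time, run back to back. Only inference gets faster.


def shares_after_speedup(inference: float, verification: float, speedup: float) -> tuple[float, float]:
    """Return (inference_share, verification_share) after inference speeds up."""
    new_inference = inference / speedup
    total = new_inference + verification
    return new_inference / total, verification / total


def overall_speedup(inference_share: float, speedup: float) -> float:
    """Amdahl's law: overall gain when only `inference_share` of the time gets faster."""
    return 1.0 / ((1.0 - inference_share) + inference_share / speedup)


def implied_inference_share(observed: float, speedup: float) -> float:
    """Invert Amdahl's law: which inference share explains an observed overall gain?"""
    return (1.0 - 1.0 / observed) / (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    print(shares_after_speedup(80, 20, 4))              # (0.5, 0.5)
    print(round(overall_speedup(0.8, 4), 2))            # 2.5
    print(round(implied_inference_share(1.4, 4), 3))    # 0.381
```

Starting from an 80/20 split, a 4x faster model lands at exactly 50/50, and the overall gain tops out at 2.5x. Run in reverse, the 1.4x gain from the opening would mean, under this simplified model, that inference was only about 38% of the end-to-end time before the switch.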

Re-slice tasks into smaller units

The old slow-model rule of thumb was to "do a lot in one call." With heavy per-call latency, minimizing round trips was rational.

That premise collapses when calls get cheap. Large units become a liability: a big task that fails is expensive to redo. If one call generates five files and fails on the fourth, you roll back the three that already succeeded too.

In this pipeline I re-sliced the unit of work from "a whole article" down to "a single section." The blast radius of a failure stays inside one section, and retries get lighter. Smaller granularity means more round trips, but with each trip now cheap, that increase is easily absorbed. My rule of thumb: if a single failure forces you to roll back more work than the failed piece itself, your granularity is too coarse.


Set concurrency by blast radius, not by speed

When things get faster, it is tempting to push concurrency up. Going from two concurrent agents to eight should, on paper, be 4x faster.

But I have come to believe you should never derive a concurrency cap from speed. The deciding factor is blast radius: how far the damage spreads when one of the running agents goes off the rails. Eight agents writing to the same repository at once is a recipe for endless conflicts and rollbacks. And the faster things are, the faster a runaway spreads.

I partition concurrency by shared-resource boundaries. Tasks whose write targets are isolated (separate directories, separate branches) get higher concurrency; tasks that touch shared state get serialized. Concretely, independent generation tasks run at up to six in parallel, while an integration step that touches a shared config file runs strictly one at a time. Extra speed turns into extra concurrency only inside an isolated boundary.
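The boundary split maps naturally onto one `asyncio.Semaphore` per boundary. This is a sketch, not the actual orchestrator: `generate_section` and `integrate_index` are hypothetical coroutines that only sleep, and `BoundaryLimiter` is an illustrative wrapper that also records the peak concurrency so you can confirm the caps hold. The limits of six and one come from the paragraph above.

```python
import asyncio
from typing import Any, Awaitable, Callable


class BoundaryLimiter:
    """Caps concurrency inside one write boundary and records the peak reached."""

    def __init__(self, limit: int) -> None:
        self.sem = asyncio.Semaphore(limit)
        self.running = 0
        self.peak = 0

    async def run(self, fn: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        async with self.sem:                 # waits here once `limit` tasks are inside
            self.running += 1
            self.peak = max(self.peak, self.running)
            try:
                return await fn(*args)
            finally:
                self.running -= 1


async def generate_section(name: str) -> str:
    await asyncio.sleep(0.01)               # stand-in for an agent call
    return f"draft:{name}"


async def integrate_index(name: str) -> str:
    await asyncio.sleep(0.01)               # stand-in for editing a shared config file
    return f"merged:{name}"


async def main() -> tuple[int, int]:
    # Limiters are created inside the running event loop.
    isolated = BoundaryLimiter(6)           # separate directories/branches
    shared = BoundaryLimiter(1)             # shared config file: strictly serial
    jobs = [isolated.run(generate_section, f"s{i}") for i in range(12)]
    jobs += [shared.run(integrate_index, f"i{i}") for i in range(3)]
    await asyncio.gather(*jobs)
    return isolated.peak, shared.peak


peaks = asyncio.run(main())
```

Twelve generation jobs never exceed six at a time, and the three integration jobs never overlap, no matter how fast each call returns.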

Make the verification loop two-tiered (cheap checks first)

When verification becomes the bottleneck, the instinct is to make verification itself faster. What actually helps more is reordering it: cheap checks first, expensive checks last.

I split verification into two tiers. Tier one is static checks that finish in tens of milliseconds (required fields, banned words, format). Tier two is the heavy work that takes seconds (build, link reachability, consistency scan). Simply not letting tier-one rejects reach tier two cut the number of heavy verification runs to less than half.
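A minimal sketch of the ordering, assuming artifacts are dicts with frontmatter fields and a body. The field names, banned words, and sample docs are hypothetical, and `heavy` is any callable standing in for build or link checks. The counter shows how many artifacts actually pay for tier two.

```python
from typing import Callable

# Hypothetical tier-one rules; swap in your own fields and word list.
REQUIRED_FIELDS = ("title", "date")
BANNED_WORDS = ("todo", "lorem ipsum")

Doc = dict[str, str]


def has_required_fields(doc: Doc) -> bool:
    return all(doc.get(key) for key in REQUIRED_FIELDS)


def no_banned_words(doc: Doc) -> bool:
    body = doc.get("body", "").lower()
    return not any(word in body for word in BANNED_WORDS)


CHEAP_CHECKS: list[Callable[[Doc], bool]] = [has_required_fields, no_banned_words]


def two_tier(docs: list[Doc], heavy: Callable[[Doc], bool]) -> tuple[list[Doc], int]:
    """Run cheap checks first; only survivors pay for the heavy check."""
    passed: list[Doc] = []
    heavy_runs = 0
    for doc in docs:
        if not all(check(doc) for check in CHEAP_CHECKS):
            continue                      # rejected in tier one; regenerate instead
        heavy_runs += 1
        if heavy(doc):
            passed.append(doc)
    return passed, heavy_runs


# Sample drafts (made up for illustration).
docs = [
    {"title": "A", "date": "2025-01-01", "body": "fine"},
    {"title": "", "date": "2025-01-01", "body": "fine"},          # missing title
    {"title": "C", "date": "2025-01-02", "body": "TODO: write"},  # banned word
    {"title": "D", "date": "2025-01-03", "body": "ok"},
]
passed, heavy_runs = two_tier(docs, heavy=lambda doc: True)      # heavy stand-in
```

Here two of the four drafts are rejected by tier one, so only two heavy runs happen.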

This trick only pays off because speed went up. When generation is fast, regenerating an artifact that tier one rejected is cheap. The "generate freely, reject quickly with cheap checks" loop did not pay off on a slow model, but on a fast one it becomes the most efficient strategy. I stopped thinking of verification as something to remove and started treating it as something to reorder.

Bake cost caps and retry budgets into the design

Speed reshapes cost. When each attempt is cheap and fast, the psychological brake on retrying comes off, and before you know it you have retried something dozens of times. The faster things are, the faster a runaway bill stacks up too.

So I started embedding retries as a budget rather than unlimited goodwill. Each task carries a retry ceiling; when it hits the ceiling, the work drops to a human queue instead of staying with the agent. On top of that, the whole pipeline has an hourly call limit (a rate limit), so even an unexpected retry cascade caps the cost per hour.
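Here is a minimal sketch of both brakes, assuming a synchronous attempt function that returns `None` on failure. `HourlyRateLimiter`, `run_with_budget`, and the injectable clock are hypothetical names for illustration, not part of the Antigravity CLI. Parking a rate-limited task in the human queue is also an assumption; you might prefer to wait for the window to roll over.

```python
import time
from collections import deque
from typing import Callable, Optional


class HourlyRateLimiter:
    """Sliding one-hour window: at most `limit` calls in any 3600-second span."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.clock = clock
        self.calls: deque = deque()   # timestamps of calls inside the window

    def try_acquire(self) -> bool:
        now = self.clock()
        while self.calls and now - self.calls[0] >= 3600:
            self.calls.popleft()      # drop calls older than one hour
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True


def run_with_budget(
    task_id: str,
    attempt: Callable[[], Optional[str]],
    limiter: HourlyRateLimiter,
    human_queue: list,
    retry_limit: int = 3,
) -> Optional[str]:
    """Try a task up to `retry_limit` times, then hand it to a human."""
    for _ in range(retry_limit):
        if not limiter.try_acquire():
            human_queue.append((task_id, "rate limit reached"))
            return None
        result = attempt()            # None means this attempt failed
        if result is not None:
            return result
    human_queue.append((task_id, f"failed {retry_limit} times"))
    return None
```

In a real run you would share one limiter across the whole pipeline, for example `HourlyRateLimiter(limit=240)` to match the 240 calls per hour used in the configuration skeleton later in this article, so a retry cascade in any task draws from the same hourly budget.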

As a rule of thumb I set the per-task retry ceiling to three. If something fails in the same spot three times, that is an input or design problem, and a fourth attempt is unlikely to fix it. The faster the environment, the more valuable it is to put this kind of "stopping" logic in up front.

The skeleton of the async pipeline I rebuilt

Pulling these decisions together, the configuration ends up looking roughly like this. The real orchestrator is more involved, but the skeleton makes the intent visible.

# pipeline_config.py: parallel pipeline rebuilt for a fast-model baseline
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StageBudget:
    name: str
    max_concurrency: int      # decided by blast radius (not speed)
    retry_limit: int = 3      # 3 failures at the same spot -> human queue
    cheap_checks: list[str] = field(default_factory=list)   # tier 1: tens of ms
    heavy_checks: list[str] = field(default_factory=list)   # tier 2: seconds


STAGES = [
    StageBudget(
        name="generate-section",       # granularity: a section, not an article
        max_concurrency=6,             # write targets are isolated -> parallel ok
        cheap_checks=["required_fields", "banned_words", "frontmatter"],
        heavy_checks=["build_ok", "link_reachable"],
    ),
    StageBudget(
        name="integrate-index",        # step that touches a shared config file
        max_concurrency=1,             # shared state -> serialize
        retry_limit=2,
        heavy_checks=["consistency_scan"],
    ),
]

# A global rate limit (calls per hour) caps the cost.
GLOBAL_RATE_LIMIT_PER_HOUR = 240


def run_cheap(check: str, artifact: Any) -> bool:
    # Placeholder: dispatch to your tier-1 static check by name.
    raise NotImplementedError(f"cheap check {check!r} not wired up")


def run_heavy(check: str, artifact: Any) -> bool:
    # Placeholder: dispatch to your tier-2 build/link/consistency check by name.
    raise NotImplementedError(f"heavy check {check!r} not wired up")


def gate(stage: StageBudget, artifact: Any) -> bool:
    # Don't let tier-1 (cheap) rejects flow into tier-2 (expensive).
    for check in stage.cheap_checks:
        if not run_cheap(check, artifact):
            return False
    # Only artifacts that passed every cheap check pay for the heavy ones.
    for check in stage.heavy_checks:
        if not run_heavy(check, artifact):
            return False
    return True

What matters here is not the function bodies but the structure: concurrency, retries, verification order, and the rate limit are all held explicitly on an axis separate from speed. Speed is variable, but blast radius and cost caps must not drift just because the model got faster. Externalizing them as constants means that the next time the model speeds up again, you can raise throughput without breaking the design.

After the rebuild, the throughput that started at 1.4x climbed to about 2.9x in my measurements. The gain from faster inference was finally recovered by redesigning verification and concurrency.

The order to rebuild in — applying it safely in production

When you actually touch an existing pipeline, the key is to not raise concurrency first. Get the order wrong and it breaks before it gets faster. Here is the sequence I follow in production.

1. Measure the inference-to-verification ratio just once. If it hasn't inverted, there's nothing to rebuild.
2. Split verification into a cheap tier one and an expensive tier two, and put the cheap one first. This is low-risk and high-impact.
3. Re-slice task granularity down to a unit where a failure rolls back little.
4. Identify every step that touches a shared resource and pin only those steps to a concurrency of one.
5. Only then raise concurrency within the isolated boundary, and only after retry ceilings and rate limits are in place.
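For the first step, a pair of accumulating timers is enough. This sketch assumes you can wrap the agent call and the checks in separate `with` blocks; `PhaseTimer` and the phase names `"inference"` and `"verification"` are hypothetical, and wall-clock time from `time.perf_counter` stands in for whatever your pipeline already logs.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Iterator


class PhaseTimer:
    """Accumulates wall-clock time per named phase."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self.clock = clock
        self.totals: dict[str, float] = defaultdict(float)

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = self.clock()
        try:
            yield
        finally:
            self.totals[name] += self.clock() - start   # counted even if the step raises

    def ratio(self) -> tuple[float, float]:
        """Return (inference_share, verification_share) of the measured time."""
        inference = self.totals["inference"]
        verification = self.totals["verification"]
        total = inference + verification
        return (inference / total, verification / total) if total else (0.0, 0.0)

    def inverted(self) -> bool:
        inference_share, verification_share = self.ratio()
        return verification_share > inference_share
```

Wrap each agent call in `with timer.phase("inference"):` and each check in `with timer.phase("verification"):`, run one representative batch, and read `ratio()`. If `inverted()` comes back `False`, the rebuild is probably not worth it yet.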

There's a reason concurrency comes last: raise it before verification and granularity are in order, and isolating the cause of a failure suddenly gets much harder. I run this helper pipeline overnight while operating my calm-wallpaper apps on the App Store and Google Play. The closer a step gets to production data, the more it pays to respect this "don't-break-it" order. The trap is getting giddy about speed and changing several axes at once. Change one axis at a time, and re-measure the ratio after each.

What stays human after the speed changes

The biggest change from the speed-up was not the numbers but my own role. The faster agents move, the more a human's job shifts from "individual executions" to "the assumptions behind the design." How much concurrency, what to keep serial, where to hand back to a person — speed will not make those calls for you.

With the old CLI shutting down on June 18, many of you will be porting your automation over. When you do, don't stop at swapping in the new CLI — add the one extra step of rebuilding the design around the new speed. A swap alone got me 1.4x; a rebuild got me 2.9x. Same model, very different outcomes.

I would suggest measuring the inference-to-verification ratio in your own pipeline just once. If it has inverted, the rebuild described here should apply directly. Thank you for reading to the end.

Thank You for Reading

Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.