More Agents Won't Speed Up Every Part of Your Pipeline — Designing the Parallel/Serial Line

Antigravity 2.0's parallel multi-agent execution is powerful, but adding agents doesn't make everything faster. Here's how I decide which work to parallelize and which to keep serial by reasoning from invariants and a dependency graph, with examples from running several sites as a solo developer.



The day I raised the number of agents running at once from two to four, the overall job actually finished later. As an indie developer running several of my own sites (Dolice Labs) in parallel, I had assumed that giving each site's article generation its own agent would simply double my throughput. In reality, generation itself did get faster, but the final verification and push backed up into a single queue, and the total wall-clock time grew.

With Antigravity 2.0 putting "true parallel execution of multiple agents" front and center, the picture of one agent writing a component, another wiring up an API route, and a third running visual regression tests is now real. But the more room there is for parallelism, the easier it becomes to fall for the illusion that lining everything up will make it fast. The real task is to decide from the structure of the work, not from intuition, which parts to parallelize and which to keep serial.

The Moment Parallelism Feels "Free"

Parallel execution feels unconditionally good when you look at each agent's work as an independent box. In practice, though, the boxes are connected by invisible lines: the same git working tree, the same disk, the same model API rate budget, and cross-cutting invariants like "the Japanese and English article counts must always match." Every one of these becomes kindling for contention the instant you step into parallelism.

Parallelism shrinks only the parts that can progress independently. The parts that touch shared resources, or where ordering carries meaning, don't shrink no matter how many agents you stack; they can even get slower as the locks and retries needed for coordination pile up. The first move isn't adding agents; it's putting into words which of your work is independent and which is shared.
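This ceiling is essentially Amdahl's law. A minimal sketch, assuming the serial share is expressed as a fraction of single-agent wall-clock time; the 25% used below is an illustrative number, not a measurement from my pipeline. Amdahl's model assumes the parallel part scales perfectly and ignores coordination overhead, so treat it as an optimistic bound: the locks and retries described above can push real numbers lower.

```javascript
// Amdahl's law: if a fraction `serial` of the single-agent run cannot be
// parallelized, n agents give at most this speedup over one agent.
function amdahlSpeedup(serial, n) {
  if (serial < 0 || serial > 1) throw new RangeError("serial must be in [0, 1]");
  if (n < 1) throw new RangeError("n must be >= 1");
  return 1 / (serial + (1 - serial) / n);
}

// The limit as n grows without bound: 1 / serial.
function amdahlCeiling(serial) {
  return serial === 0 ? Infinity : 1 / serial;
}

for (const n of [1, 2, 4, 8]) {
  console.log(`agents=${n}: speedup <= ${amdahlSpeedup(0.25, n).toFixed(2)}x`);
}
console.log(`ceiling: ${amdahlCeiling(0.25)}x`);
// agents=1: speedup <= 1.00x
// agents=2: speedup <= 1.60x
// agents=4: speedup <= 2.29x
// agents=8: speedup <= 2.91x
// ceiling: 4x
```

Even with only a quarter of the work serial, doubling from four to eight agents buys well under 30% more.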

Three Questions for Spotting Parallel-Safe Work

Whether a piece of work can go into the parallel lane comes down to three questions. If even one answer falls on the "pull toward serial" side, running it in parallel as-is is likely to break something.

Question	Fits parallel	Pull toward serial
Is the output independent of others?	Site A's draft and Site B's draft don't reference each other	A later stage takes the earlier stage's output as input
Does it write to shared mutable state?	Each writes only to its own file or branch	They update the same index or the same aggregate file
Does it share an external rate limit?	Targets are separate, or there's ample headroom under the cap	Everyone competes for the same quota on the same model
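The three questions can be turned into a pairwise check. This is a sketch under assumptions: the task shape (`name`, `inputs`, `writes`, `quotaPool`) and the `poolHasHeadroom` callback are hypothetical, and the write check compares paths by exact string, so it won't catch one path nested inside another.

```javascript
// Returns the reasons two tasks should NOT run side by side (empty = parallel-safe).
// Hypothetical task shape: inputs = names of tasks whose output it reads,
// writes = paths it mutates, quotaPool = external rate-limit pool (or null).
function conflicts(a, b, poolHasHeadroom = () => false) {
  const reasons = [];
  // Q1: is the output independent? (neither consumes the other's output)
  if (a.inputs.includes(b.name) || b.inputs.includes(a.name)) {
    reasons.push("output dependency");
  }
  // Q2: do they write to shared mutable state? (exact path match only)
  const shared = a.writes.filter((p) => b.writes.includes(p));
  if (shared.length > 0) reasons.push(`shared writes: ${shared.join(", ")}`);
  // Q3: do they share an external rate limit without enough headroom?
  if (a.quotaPool && a.quotaPool === b.quotaPool && !poolHasHeadroom(a.quotaPool)) {
    reasons.push(`shared quota: ${a.quotaPool}`);
  }
  return reasons;
}

const draftA = { name: "draftA", inputs: [], writes: ["siteA/drafts/"], quotaPool: "gemini" };
const draftB = { name: "draftB", inputs: [], writes: ["siteB/drafts/"], quotaPool: "gemini" };
const indexer = { name: "indexer", inputs: ["draftA"], writes: ["siteA/index.json"], quotaPool: null };

console.log(conflicts(draftA, draftB));             // [ 'shared quota: gemini' ]
console.log(conflicts(draftA, draftB, () => true)); // []
console.log(conflicts(draftA, indexer));            // [ 'output dependency' ]
```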

Of these three, the third is the easiest to miss. Even if each agent's work is independent, if they all draw on the same Gemini quota, raising the agent count produces more 429 (Too Many Requests) responses, and the retries inflate wall-clock time. It's safer to treat the upper bound on parallelism as set by the narrowest pipe among the shared resources, not by how many agents you happen to have.
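One way to make "the narrowest pipe" concrete is to divide each shared resource's capacity by what one agent consumes of it and take the minimum. The resource names and numbers below are placeholders made up for illustration, not Gemini's actual limits.

```javascript
// Effective parallelism = min(agent count, what each shared resource can sustain).
// Capacities and per-agent usage are hypothetical placeholders.
function effectiveParallelism(agentCount, resources) {
  let cap = agentCount;
  let bottleneck = "agents";
  for (const r of resources) {
    // how many agents this resource can feed at the same time
    const sustainable = Math.floor(r.capacity / r.perAgent);
    if (sustainable < cap) {
      cap = sustainable;
      bottleneck = r.name;
    }
  }
  // cap can be 0: even a single agent would overrun that resource
  return { cap, bottleneck };
}

const result = effectiveParallelism(4, [
  { name: "model API (requests/min)", capacity: 30, perAgent: 12 },
  { name: "local disk writes (MB/s)", capacity: 200, perAgent: 40 },
]);
console.log(result); // { cap: 2, bottleneck: 'model API (requests/min)' }
```

With these numbers, two of the four agents would spend their time waiting on the quota rather than working.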


Deriving What Stays Serial From the Invariants

What stays serial is reasoned backward from the promises you must not break (the invariants), not from preference. The strictest promise in my pipeline is that the Japanese article count and the English article count are always equal. It's what keeps a language switch from ever landing on a 404, and if it breaks, trust in the whole site suffers.

To hold this invariant, the operation that writes a JA/EN pair as one unit must not be split. If two agents push to the same category at the same time, a rebase can leave a half-finished state in which only one agent's English version has landed. So while generation (drafting) can run in parallel, the join step of verify → count check → push is kept on a single serial track per repository.

[parallel OK] per-site drafting (independent outputs)
        ├─ agent-A: claudelab ja/en drafts
        ├─ agent-B: gemilab ja/en drafts
        └─ agent-C: antigravitylab ja/en drafts
                    │
[serial required] per-repo join (the single track that holds the invariant)
        for repo in repos:
          gate(repo) -> assert ja_count == en_count -> push(repo)
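The diagram translates fairly directly into code. A minimal sketch, assuming hypothetical `draft`, `countArticles`, and `push` hooks that you would wire to the real generation, counting, and git steps; the stubs at the bottom exist only to make it runnable. The gate fails closed: on a JA/EN mismatch it throws before pushing, and the repos after it in the loop are not pushed either.

```javascript
// Drafting fans out; the join runs one repo at a time.
// draft(), countArticles() and push() are hypothetical hooks.
async function runPipeline(repos, { draft, countArticles, push }) {
  // [parallel OK] independent outputs, so start every draft at once
  await Promise.all(repos.map((repo) => draft(repo)));

  // [serial required] one repo at a time: gate on JA/EN parity, then push
  const pushed = [];
  for (const repo of repos) {
    const { ja, en } = await countArticles(repo);
    if (ja !== en) {
      // fail closed: this repo and every repo after it stay unpushed
      throw new Error(`${repo}: ja=${ja} en=${en}, refusing to push`);
    }
    await push(repo);
    pushed.push(repo);
  }
  return pushed;
}

// Stub hooks so the sketch runs on its own; each draft adds one JA/EN pair.
const counts = {
  claudelab: { ja: 0, en: 0 },
  gemilab: { ja: 0, en: 0 },
  antigravitylab: { ja: 0, en: 0 },
};
const stubs = {
  draft: async (repo) => {
    counts[repo].ja += 1;
    counts[repo].en += 1;
  },
  countArticles: async (repo) => counts[repo],
  push: async (repo) => console.log(`pushed ${repo}`),
};

runPipeline(Object.keys(counts), stubs).then((done) => console.log("joined:", done));
```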

The point is to make the serial region as small as you can. Running everything serially is safe but slow. Keep serial only the minimal stretch that touches the invariant you want to protect, and open everything else to parallel. Where you draw this boundary directly determines how fast the whole thing runs.
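If a single global track becomes the bottleneck, one possible way to shrink the serial region further is to serialize per repository rather than globally: joins on the same repo queue up, while joins on different repos overlap. This is a sketch of a keyed lock built on promise chaining inside a single Node process; `KeyedLock` is a hypothetical helper, not part of any Antigravity API, and it won't coordinate agents running in separate processes.

```javascript
// Hypothetical helper: jobs with the same key run strictly one after another,
// jobs with different keys run concurrently. Single-process only.
class KeyedLock {
  constructor() {
    this.tails = new Map(); // key -> promise that settles when that key's queue drains
  }

  run(key, job) {
    const prev = this.tails.get(key) ?? Promise.resolve();
    const result = prev.then(() => job());
    // The stored tail must never reject, or one failed join would block the repo forever.
    const tail = result.catch(() => {});
    this.tails.set(key, tail);
    // Drop the entry once nothing else has queued behind this job.
    tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}

const lock = new KeyedLock();
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const log = [];

// Stand-in for verify -> count check -> push on one repository.
const join = (repo, id) =>
  lock.run(repo, async () => {
    log.push(`start ${repo}#${id}`);
    await sleep(10);
    log.push(`end ${repo}#${id}`);
  });

Promise.all([join("claudelab", 1), join("claudelab", 2), join("gemilab", 1)]).then(() =>
  console.log(log)
);
```

The two `claudelab` joins never interleave, while `gemilab` starts without waiting for either of them.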

Deriving the Concurrently Runnable Set From a Dependency Graph

Keeping track of the boundary in your head breaks down as the number of tasks grows. If you record inter-task dependencies as an explicit graph, you can mechanically derive the set of tasks that may run at the same time right now. Below is a minimal implementation that computes the critical path and each stage's concurrently runnable set from a dependency map. If you write your orchestration with Antigravity's SDK, running this as a preprocessing step makes the tasks that are safe to run side by side surface on their own.

// tasks: declare each task's dependencies (an array of prerequisite task names)
const tasks = {
  draftClaude:  { deps: [],                cost: 40 }, // generation is independent
  draftGemini:  { deps: [],                cost: 40 },
  draftAnti:    { deps: [],                cost: 40 },
  gateClaude:   { deps: ["draftClaude"],   cost: 8  }, // verification depends on its draft
  gateGemini:   { deps: ["draftGemini"],   cost: 8  },
  gateAnti:     { deps: ["draftAnti"],     cost: 8  },
  pushAll:      { deps: ["gateClaude","gateGemini","gateAnti"], cost: 6 }, // the join
};
 
// group tasks whose deps are satisfied into "waves" (same wave = concurrently runnable)
function scheduleWaves(tasks) {
  const done = new Set();
  const waves = [];
  const all = Object.keys(tasks);
  while (done.size < all.length) {
    const ready = all.filter(
      (t) => !done.has(t) && tasks[t].deps.every((d) => done.has(d))
    );
    if (ready.length === 0) throw new Error("cyclic dependency");
    waves.push(ready);
    ready.forEach((t) => done.add(t));
  }
  return waves;
}
 
// compute the critical-path length (the theoretical minimum wall-clock time)
function criticalPath(tasks) {
  const memo = {};
  const longest = (t) => {
    if (memo[t] != null) return memo[t];
    const base = tasks[t].deps.reduce((m, d) => Math.max(m, longest(d)), 0);
    return (memo[t] = base + tasks[t].cost);
  };
  return Math.max(...Object.keys(tasks).map(longest));
}
 
console.log(scheduleWaves(tasks));
// => [['draftClaude','draftGemini','draftAnti'], ['gateClaude','gateGemini','gateAnti'], ['pushAll']]
console.log("critical path:", criticalPath(tasks), "→ 40 + 8 + 6 = 54");

The 54 that criticalPath returns is what matters. It is the theoretical floor: even with unlimited agents, the pipeline won't finish faster than this, and that already assumes the costs are accurate and the agents never contend for shared resources. No matter how parallel the three drafts run, as long as the pushAll join exists, the total is pinned to the length of one draft plus its verification plus the join (40 + 8 + 6). Knowing this floor before you add agents keeps your judgment honest.
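To see how wall-clock time moves between the sum of all costs and that floor as agents are added, you can simulate the same task map with a fixed number of agents. This sketch uses greedy list scheduling (whenever an agent is free, it takes the first ready task in declaration order), treats each cost as exact, and ignores contention, so real runs will only look worse.

```javascript
// Same task map as above, repeated so this block runs on its own.
const tasks = {
  draftClaude: { deps: [], cost: 40 },
  draftGemini: { deps: [], cost: 40 },
  draftAnti: { deps: [], cost: 40 },
  gateClaude: { deps: ["draftClaude"], cost: 8 },
  gateGemini: { deps: ["draftGemini"], cost: 8 },
  gateAnti: { deps: ["draftAnti"], cost: 8 },
  pushAll: { deps: ["gateClaude", "gateGemini", "gateAnti"], cost: 6 },
};

// Greedy list scheduling with a fixed number of agents; returns total wall-clock.
function simulateMakespan(tasks, agents) {
  const finishAt = new Map(); // task -> time it finishes (recorded when it starts)
  const pending = new Set(Object.keys(tasks));
  let running = []; // [{ name, end }]
  let now = 0;
  while (pending.size > 0 || running.length > 0) {
    // hand ready tasks to free agents
    for (const t of [...pending]) {
      if (running.length >= agents) break;
      const ready = tasks[t].deps.every((d) => finishAt.has(d) && finishAt.get(d) <= now);
      if (ready) {
        pending.delete(t);
        const end = now + tasks[t].cost;
        finishAt.set(t, end);
        running.push({ name: t, end });
      }
    }
    if (running.length === 0) throw new Error("cyclic or missing dependency");
    // jump to the next moment an agent frees up
    now = Math.min(...running.map((r) => r.end));
    running = running.filter((r) => r.end > now);
  }
  return now;
}

for (const agents of [1, 2, 3, 4]) {
  console.log(`${agents} agent(s): ${simulateMakespan(tasks, agents)}`);
}
// 1 agent(s): 150
// 2 agent(s): 94
// 3 agent(s): 54
// 4 agent(s): 54
```

With one agent the total is simply the sum of all costs; with three it reaches the 54 floor, and a fourth agent adds nothing because no moment ever has more than three ready tasks.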

Measuring That Throughput Doesn't Scale Linearly

Even when the floor is visible in theory, the actual gain has to be measured. Keep a plain harness that times wall-clock while varying the degree of parallelism, and you'll find the agent count that actually helps your pipeline.

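#!/usr/bin/env bash
# Vary parallelism across 1, 2, 4 and time the total wall-clock for the same jobs.
# Note: with only three jobs, parallelism 4 behaves the same as 3.
set -euo pipefail
jobs=(draftClaude draftGemini draftAnti)   # jobs that can run independently

# Placeholder: replace the sleep with the real generation call for job "$1".
# The random 2-4 s sleep makes single runs noisy; repeat the loop a few times.
run_one() { sleep "$((RANDOM % 3 + 2))"; }
export -f run_one   # xargs starts fresh bash processes, so the function must be exported

for p in 1 2 4; do
  start=$(date +%s)
  # Pass the job name as "$1" instead of splicing {} into the script text
  printf '%s\n' "${jobs[@]}" | xargs -P "$p" -I{} bash -c 'run_one "$1"' _ {}
  end=$(date +%s)
  echo "parallelism ${p}: $((end - start))s"
done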

When I measured my article-generation pipeline in my own environment, parallelism 4 cut wall-clock time by only about 1.8x relative to parallelism 1, not the naive 4x. The reason is plain: generation can be opened up in parallel, but the serial verify-and-push stretch and the shared model-API quota remain. That 1.8x is the signal that adding more agents has stopped paying off. If you want more speed, shortening the serial stretch itself (lighter verification, a different join granularity) does more than adding agents.
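To turn a measured speedup into a rough estimate of how much of the pipeline behaves serially, the Karp-Flatt metric inverts Amdahl's law. Plugging in the roughly 1.8x at parallelism 4 from above is only a back-of-envelope reading: the metric folds quota contention and retries into the "serial" fraction, and a single measurement is noisy.

```javascript
// Karp-Flatt metric: effective serial fraction e from a measured speedup psi
// at p-way parallelism, assuming an Amdahl-style model:
//   e = (1/psi - 1/p) / (1 - 1/p)
function karpFlatt(speedup, p) {
  if (p <= 1) throw new RangeError("p must be > 1");
  return (1 / speedup - 1 / p) / (1 - 1 / p);
}

const e = karpFlatt(1.8, 4); // ~1.8x at parallelism 4
console.log(`effective serial fraction ~ ${(e * 100).toFixed(0)}%`); // ~ 41%
console.log(`Amdahl ceiling at that fraction ~ ${(1 / e).toFixed(2)}x`); // ~ 2.45x
```

Read that way, about 40% of the run acts serially, and under the same model no number of agents would get much past 2.5x, which is why shrinking the serial stretch pays more than adding agents.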

Where I Drew the Line in My Pipeline

The boundary I eventually settled on is very simple, and it comes down to three rules.

1. Open drafting to parallel per site (independent outputs, so they may run side by side).
2. Close verification, count check, and push into a serial track per repository (the single track that holds the invariant).
3. Cap concurrent calls to the model API so the agents don't devour a shared quota among themselves.

I recommend applying them in this order. With just these three rules, the queue backup that had appeared when I increased the agent count almost entirely disappeared.
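The third rule can be enforced with a counting semaphore around every model call. A minimal sketch, assuming all agents run in the same Node process; `callModel` is a hypothetical stand-in that only sleeps, and the limit of 2 is illustrative. Agents in separate processes would need a shared limiter instead, and this cap complements, rather than replaces, backing off on 429 responses.

```javascript
// Counting semaphore: at most `limit` callers hold a slot at once.
class Semaphore {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.waiters = [];
  }
  async acquire() {
    if (this.active < this.limit) {
      this.active += 1;
      return;
    }
    // wait until release() hands this caller a slot
    await new Promise((resolve) => this.waiters.push(resolve));
  }
  release() {
    const next = this.waiters.shift();
    if (next) next(); // pass the slot straight to the next waiter; active stays the same
    else this.active -= 1;
  }
  async run(fn) {
    await this.acquire();
    try {
      return await fn();
    } finally {
      this.release();
    }
  }
}

const apiSlots = new Semaphore(2);
let inFlight = 0;
let peak = 0;

// Hypothetical model call: tracks concurrency and pretends to wait on the network.
async function callModel(prompt) {
  inFlight += 1;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 20));
  inFlight -= 1;
  return `result for ${prompt}`;
}

const prompts = ["a", "b", "c", "d", "e"];
Promise.all(prompts.map((p) => apiSlots.run(() => callModel(p)))).then((results) =>
  console.log(`${results.length} calls, peak concurrency ${peak}`)
);
```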

Parallelism is the work of finding the independent parts and opening them up; it isn't a race to move everything at once. First put into words the promise you want to keep, leave serial only the smallest stretch that touches it, and open the rest to parallel. Always work in that order.

As a next step, write down just one place in your currently-parallel work where multiple agents "write to the same mutable state." That spot is the real bottleneck that won't get faster no matter how many agents you add.

Thank You for Reading

Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.