Measuring the Go-Based Antigravity CLI's Responsiveness to Rethink My Nightly Batch
The Antigravity CLI was reimplemented in Go, and startup and first-response feel different now. I measure startup, time-to-first-token, and throughput as three separate intervals, then use those numbers to move my nightly batch from serial to parallel.
On June 18, the Gemini CLI and the Gemini Code Assist IDE extension stop serving individual users, with the successor Antigravity CLI taking over. I run automated posting for several sites in a nightly batch as an indie developer, so I swapped in the new CLI a little ahead of the deadline to try the destination out.
The first thing I noticed after swapping was that startup was clearly faster. The Antigravity CLI has been reimplemented in Go. The old Gemini CLI was Node-based, so the time it takes for the process to spin up is fundamentally different. But changing how I structure a batch based on "it feels faster" is risky, so I decided to put numbers on it first. This article is that measurement, and how I rebuilt the nightly batch once I had the numbers.
Before You Argue About "Fast" or "Slow," Split It Into Three Times
When a CLI feels slow inside automation, the cause usually falls into one of three buckets. If you collapse them into a single number, you'll reach for the wrong fix.
Startup (cold start): from launching the process until config loading and auth initialization finish. The time between issuing the command and the process being "ready."
Time to first token: from sending the request until the model's first output starts coming back. This is where the network and the model-side queue come into play.
Throughput: from the first token until the response completes. The longer the output, the more this dominates.
Whether the slowness you felt in your nightly batch was startup or throughput completely changes the fix. If startup is heavy, you should reduce the number of calls (i.e., batch them); if throughput is heavy, you should split the output into shorter chunks or run things in parallel. Looking only at total time, without splitting the intervals, makes that decision impossible.
Benchmark With Headless Execution
The Antigravity CLI offers non-interactive (headless) execution: you pass the prompt as an argument and the result streams to stdout. That's the foundation for benchmarking. Measuring in interactive mode is unreliable because the presence or absence of a TTY changes how output is emitted, so always measure non-interactively.
First, prepare a minimal command that measures startup alone. An invocation that returns immediately, like --version, gives you the "launch plus exit" time with no real work, which is your cold-start indicator.
#!/usr/bin/env bash
# bench_startup.sh: measure startup only, N times, and report the median.
# Using --version, which does no real work, isolates cold start.
# %N is a GNU date extension; where date lacks it, use gdate from coreutils.
set -euo pipefail

CMD="${1:-antigravity}"   # pass "gemini" as the arg to compare with the old CLI
N="${2:-20}"
times=()

for i in $(seq 1 "$N"); do
  start=$(date +%s.%N)
  "$CMD" --version >/dev/null 2>&1
  end=$(date +%s.%N)
  times+=("$(echo "$end - $start" | bc)")
done

# Report the median (robust to outliers); the mean is skewed by warmup
printf '%s\n' "${times[@]}" | sort -n | awk '
  { a[NR] = $1 }
  END {
    if (NR % 2) print "median:", a[(NR + 1) / 2];
    else        print "median:", (a[NR / 2] + a[NR / 2 + 1]) / 2;
    print "min:", a[1], "max:", a[NR];
  }'
There's a reason I report the median rather than the mean. The first few runs tend to be slow because of file caching and network connection setup, and the mean is heavily pulled by that. What you want to know when designing a batch is "how many seconds does it usually take to spin up," so the outlier-resistant median is more useful for the decision.
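To make the mean-versus-median point concrete, here is a small sketch on an invented sample in which only the first run is slow. The helper names `mean_of` and `median_of` and the sample values are made up for illustration; they are not measurements from this article.

```bash
#!/usr/bin/env bash
# Compare mean and median on a sample with one slow warmup run.
# The sample values are invented for illustration.
samples="1.80 0.21 0.20 0.19 0.22 0.20 0.21"

# Arithmetic mean of whitespace-separated numbers on stdin
mean_of() { tr ' ' '\n' | awk '{ s += $1 } END { printf "%.2f\n", s / NR }'; }

# Median of whitespace-separated numbers on stdin
median_of() {
  tr ' ' '\n' | sort -n | awk '{ a[NR] = $1 } END {
    if (NR % 2) printf "%.2f\n", a[(NR + 1) / 2]
    else        printf "%.2f\n", (a[NR / 2] + a[NR / 2 + 1]) / 2
  }'
}

echo "mean:   $(echo "$samples" | mean_of)"     # mean:   0.43
echo "median: $(echo "$samples" | median_of)"   # median: 0.21
```

A single 1.80s warmup run doubles the mean, while the median stays at the value the process "usually" takes.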
Next, measure first response and throughput separately. If the CLI streams its response, recording the moment the first line reaches stdout gives you the boundary between first response and throughput.
#!/usr/bin/env bash
# bench_response.sh: measure first response and throughput separately.
# Split the intervals at the moment the first line hits stdout.
set -euo pipefail

CMD="${1:-antigravity}"
PROMPT="${2:-Write about 200 characters of dummy text for benchmarking.}"

start=$(date +%s.%N)
first_token=""

# run --print is assumed to run the prompt non-interactively and stream to stdout
# The while loop runs in a pipeline subshell, so ttft is printed from inside it
"$CMD" run --print "$PROMPT" 2>/dev/null | while IFS= read -r line; do
  if [ -z "$first_token" ]; then
    first_token=$(date +%s.%N)
    echo "ttft: $(echo "$first_token - $start" | bc)s"
  fi
done

end=$(date +%s.%N)
# throughput = total - ttft
echo "total: $(echo "$end - $start" | bc)s"
The key here is stamping the time the instant the first line arrives, inside while read. If you write it to collect all output at the end, you miss the first-response timestamp even when streaming is happening. The CLI's subcommand names (run --print and so on) change between versions, so check your local antigravity --help for the actual non-interactive option and adjust.
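bench_response.sh prints ttft and total, and leaves throughput as the difference between them. As a sketch of reporting all three at once, the helper below reads a command's streamed output on stdin. The names `split_intervals`, `now`, and `fake_cli` are hypothetical, and `fake_cli` is only a stand-in for the real CLI so the example runs anywhere; it assumes GNU date for %N.

```bash
#!/usr/bin/env bash
# split_intervals START_TS: read streamed output on stdin and print
# ttft (start -> first line), throughput (first line -> EOF), and total.
now() { date +%s.%N; }   # assumes GNU date; use gdate where %N is missing

split_intervals() {
  local start="$1" first="" end line
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$first" ] || first=$(now)   # stamp only when the first line arrives
  done
  end=$(now)
  [ -n "$first" ] || first="$end"     # no output at all: ttft equals total
  awk -v s="$start" -v f="$first" -v e="$end" 'BEGIN {
    printf "ttft: %.2fs\nthroughput: %.2fs\ntotal: %.2fs\n", f - s, e - f, e - s
  }'
}

# Stand-in for the real CLI: waits, then emits two lines 0.5s apart
fake_cli() { sleep 0.3; echo "first"; sleep 0.5; echo "second"; }

start=$(now)
fake_cli | split_intervals "$start"
```

To use it for real, replace `fake_cli` with whatever non-interactive invocation your local `antigravity --help` documents.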
The medians I measured on my setup (macOS, wired connection, the same prompt 20 times) came out roughly like this. The numbers depend heavily on the environment, so look at "which interval moved" rather than the absolute values.
Startup: old Gemini CLI ~0.9s → Antigravity CLI ~0.2s. This is where the Go implementation made the biggest difference.
First response: old ~1.3s → new ~0.8s. The switch to Gemini 3.5 Flash as the engine seems to help, but this interval is dominated by the network and the model-side queue, so it wobbles run to run.
Throughput (same output length): no meaningful difference. With the same number of output tokens, this just scales with token generation speed.
On paper, the Gemini 3.5 Flash engine is said to be roughly 4x faster than competing frontier models, but in my measurement throughput at the same output length showed no meaningful difference. The headline figure is just a reference point; it's worth re-running the benchmark for your own use case.
In other words, the slowness that mattered for my use case was startup, not throughput. My nightly batch called the CLI dozens of times per site, so a 0.7s startup saving per call added up across every call. Conversely, for tasks dominated by throughput you'd barely feel this difference. That's exactly why it's worth measuring the intervals separately.
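As a back-of-the-envelope check on how that saving accumulates, the sketch below multiplies it out. The 0.9s and 0.2s figures are the medians quoted above; `CALLS_PER_SITE=30` and `SITES=4` are assumed values standing in for "dozens of calls" and "several sites", not numbers from my batch.

```bash
#!/usr/bin/env bash
# Rough startup savings per night.
# CALLS_PER_SITE and SITES are assumptions for illustration only.
OLD_STARTUP=0.9
NEW_STARTUP=0.2
CALLS_PER_SITE=30
SITES=4

# startup_savings OLD NEW CALLS SITES: seconds saved per nightly run
startup_savings() {
  awk -v o="$1" -v n="$2" -v c="$3" -v s="$4" \
    'BEGIN { printf "%.1f\n", (o - n) * c * s }'
}

echo "saved per night: $(startup_savings "$OLD_STARTUP" "$NEW_STARTUP" "$CALLS_PER_SITE" "$SITES")s"
# saved per night: 84.0s
```

Under those assumptions the startup difference alone is worth well over a minute per night, before any change to the batch structure.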
Rebuild the Nightly Batch From the Measured Numbers
With numbers in hand, I revisited the design. My old setup was "one site at a time, serially, calling the CLI many small times." On the old CLI with heavy startup this was pure loss, but lighter startup changes the calculus.
The principle is simple: if startup dominates, reduce the call count (batch); if throughput dominates, stagger across time and parallelize. Since startup is light enough on the Antigravity CLI, rather than cramming everything into one process, running each site as an independent process staggered to a different time made failure isolation much easier.
#!/usr/bin/env bash
# nightly.sh: split each site into an independent process and stagger start times.
# Startup is light, so prefer "split and stagger" over "cram into one process".
# Needs bash 4+ for associative arrays (declare -A).
set -euo pipefail

declare -A OFFSET=(
  [siteA]=0      # anchored at 00:00
  [siteB]=900    # +15 min
  [siteC]=2700   # +45 min
  [siteD]=3600   # +60 min
)

run_site() {
  local site="$1" status=0
  sleep "${OFFSET[$site]}"
  # One site's work. If throughput is heavy, subdivide further here
  antigravity run --print "$(cat "prompts/${site}.txt")" \
    > "logs/${site}-$(date +%F).log" 2>&1 || status=$?
  # Capture the status explicitly: under set -e a failure would otherwise
  # abort this function before the line below could report it
  echo "[$(date +%T)] ${site} done (exit ${status})"
}

for site in "${!OFFSET[@]}"; do
  run_site "$site" &   # launch each site as an independent parallel process
done
wait
echo "all sites finished"
I stagger the times because if every process fires its first request at the same instant, they all hit the API rate limit and the model queue together, and first response ends up slower anyway. Startup may be fast, but the first-response interval is bound by the network and the queue. Confirming this by measurement helps you avoid the "I parallelized it but it didn't get faster" trap. I also cover off-peak distribution and how far to hand non-interactive execution over to automation in Running the Antigravity CLI Non-Interactively: Designing Before You Put It on CI and cron.
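Failure isolation only pays off if you can tell which site failed. A bare `wait` with no arguments returns 0 regardless of how the children exited, so one option is to wait on each PID individually. This is a minimal sketch of that pattern; `run_job` and `run_all` are hypothetical names, and `run_job` is a stand-in for the per-site work that fails on purpose for the name "bad".

```bash
#!/usr/bin/env bash
# Launch jobs in parallel, then wait on each PID to collect its exit status.
# run_job is a stand-in for the per-site work; "bad" fails on purpose.
run_job() { [ "$1" != "bad" ]; }

run_all() {
  local pids=() names=() failed=0 i name
  for name in "$@"; do
    run_job "$name" &
    pids+=("$!"); names+=("$name")
  done
  for i in "${!pids[@]}"; do
    if wait "${pids[$i]}"; then     # wait PID returns that job's exit status
      echo "${names[$i]}: ok"
    else
      echo "${names[$i]}: FAILED"
      failed=$((failed + 1))
    fi
  done
  return "$failed"                  # number of failed jobs (0 means success)
}

run_all siteA bad siteC || echo "failures: $?"
```

The nightly script can then exit non-zero, or send a notification, when any single site fails, without the other sites being affected.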
Traps I Hit While Measuring
Actually measuring surfaced a few things that skew the numbers. Knowing them up front saves wasted re-measurement.
Throw away the warmup. The first two or three runs are slow from connection setup and cache misses. Take the median, or explicitly discard the first few runs before measuring. Measuring once and declaring it "slow" is the most dangerous pattern.
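Explicitly discarding the warmup can be sketched as a filter placed in front of the median calculation. The function name `median_after_warmup` and the sample timings are invented for illustration; the input must be in the order the runs happened, one timing per line.

```bash
#!/usr/bin/env bash
# median_after_warmup K: read one timing per line on stdin (in run order),
# drop the first K as warmup, then print the median of the rest.
median_after_warmup() {
  local skip="${1:-3}"
  tail -n +"$((skip + 1))" | sort -n | awk '{ a[NR] = $1 } END {
    if (NR == 0) { print "no samples left"; exit 1 }
    if (NR % 2) print a[(NR + 1) / 2]
    else        print (a[NR / 2] + a[NR / 2 + 1]) / 2
  }'
}

# Invented sample: the first three runs are slow warmup
printf '%s\n' 1.90 1.10 0.40 0.21 0.20 0.22 0.19 | median_after_warmup 3
# prints 0.205
```

Dropping before sorting matters: once the timings are sorted, you can no longer tell which ones were the first runs.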
Watch for output buffering. Passed through a pipe, the CLI may block-buffer its output, so even though it's streaming, the first line arrives late in a clump. When the measured first-response time comes out nearly equal to the total time, suspect this. For stdout disappearing or lagging in a non-TTY environment, I've written it up in detail in Running the Antigravity CLI (agy) Non-Interactively in CI: Handling the Non-TTY stdout Problem.
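One way to flag the buffering symptom automatically is to check whether the time to first output takes up almost the entire run. This is only a heuristic sketch: the function name `looks_buffered` and the 0.95 threshold are assumptions, and a genuinely short response can trip it too.

```bash
#!/usr/bin/env bash
# looks_buffered TTFT TOTAL: succeed when the first output arrived so late
# that it accounts for nearly the whole run (0.95 is an arbitrary threshold).
looks_buffered() {
  awk -v f="$1" -v t="$2" 'BEGIN { exit !(t > 0 && f / t >= 0.95) }'
}

if looks_buffered 2.95 3.00; then
  echo "ttft is ~all of total: suspect pipe buffering"
fi
```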
Don't confuse rate limiting with your measurement. Firing repeatedly in a short window can make what looks like first-response time actually be rate-limit wait time. Put a sufficient gap between iterations in your measurement loop, or detect rate-limit responses and exclude them from the measurement.
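A measurement loop with a gap between iterations might look like the sketch below. It treats a non-zero exit as a run to exclude, which assumes a rate-limited call fails with a non-zero status; check how your CLI actually reports rate limiting before relying on that. `timed_runs`, `now`, and `flaky` are hypothetical names, and `flaky` is a stand-in command that fails every other call.

```bash
#!/usr/bin/env bash
# timed_runs N GAP CMD...: run CMD N times with GAP seconds between runs and
# print one elapsed time per successful run. Failed runs are skipped, on the
# assumption that a rate-limited call exits non-zero (verify for your CLI).
now() { date +%s.%N; }   # assumes GNU date; use gdate where %N is missing

timed_runs() {
  local n="$1" gap="$2" i start end
  shift 2
  for ((i = 1; i <= n; i++)); do
    start=$(now)
    if "$@" >/dev/null 2>&1; then
      end=$(now)
      awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
    else
      echo "run $i failed, excluded" >&2
    fi
    if [ "$i" -lt "$n" ]; then sleep "$gap"; fi   # no gap after the last run
  done
}

# Stand-in command that fails on every other call, like an intermittent limit
flaky() { count=$((count + 1)); [ $((count % 2)) -eq 1 ]; }
count=0
timed_runs 4 0.1 flaky
```

Only the successful runs reach stdout, so the output can be piped straight into a median calculation.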
Check whether --version is really lightweight. On some CLIs, --version still loads config files or runs an update check, so it isn't pure startup time. Comparing its timing against --help, or tracing it once with strace to confirm no extra work is mixed in, gives you peace of mind.
How to Decide When to Migrate
Even with the June 18 deadline, I don't recommend rushing every piece of automation over at once. I first verified startup and first response with a small measurement prompt, identified which interval dominates my batch, then switched just one low-impact site to the Antigravity CLI and ran it for a week. After confirming it was stable, I moved the rest. For how to choose between the CLIs in the first place, Choosing Seriously Between Gemini CLI and Antigravity: The Conclusion I Reached in the Field, May 2026 is also worth a look.
Start by running bench_startup.sh locally (it defaults to 20 iterations) to get the median startup time for your own environment. With just that one number, the decision of "should I batch the calls, or stagger them in parallel" becomes far more concrete.
Thank You for Reading
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.