Running Pre-Release Checks Without Opening the IDE — Designing the Android CLI as the Verification Gate of an Unattended Pipeline
How to slot Android CLI v1.0 into an unattended pipeline as its verification gate — three layers of checks, an exit-code contract, and a density-by-locale matrix, sized for an indie developer's day-to-day.
There is always a small pause before I tap the release button. I poke around in the emulator, glance at the main screens, and quietly ask myself whether it is really fine. As an indie developer juggling several apps, that "one last look" piles up, and shipping starts to feel heavier than it should.
On June 24, 2026, the Android CLI reached its v1.0 stable release. The headline is that you can now run semantic analysis, Compose preview rendering, and UI tests from the command line without opening the IDE. What drew me in was not the feature list itself, but how it widens a design choice: where you put your verification.
For a long time, most pre-release verification silently assumed a human moving the mouse inside the IDE. You eyed the preview, right-clicked to run a test, and read the result. As long as one person sits in front of one screen, that works.
The trouble appears the moment you try to push automation one step further. You can hand the build to a scheduled agent, but the final verification still stalls until a person opens the IDE, and the flow breaks right there. Drop a tool built for humans into an unattended context, and everything jams at the step that waits for the human.
What the Android CLI changes is the location of verification
The real significance of the Android CLI is that it carries verification out of the IDE and into the world of the command line and exit codes. When pass or fail comes back as an exit code, verification becomes one stage in a pipeline. A shell, a cron job, and an agent's run loop can all invoke it the same way.
That raises one design decision: rather than handing everything to the machine, where do you draw the line between "checks the machine owns" and "judgment a human takes back"? In this article I organize that line into three layers.
WHAT YOU'LL LEARN
✦How to build a three-layer gate that runs semantic analysis, Compose preview rendering, and UI tests without opening the IDE
✦An exit-code contract that separates failures a machine may block from failures a human must judge
✦A practical pattern that covers 4 densities x 2 locales = 8 combinations unattended, catching breakage before release
The defects you want to catch before release differ wildly in nature. Mix them into one gate and it becomes unclear which ones the machine is allowed to block on. I think in terms of these three layers.
| Layer | Mainly catches | Safe to block unattended? |
|---|---|---|
| Semantic analysis | Unresolved references, type mismatches, unreachable code | Yes (deterministic) |
| Preview rendering | Broken layouts, overflow, rendering exceptions | Conditional (via baseline diff) |
| UI tests | Broken flows, missed state transitions | Yes (if reproducible) |
Layer one: checks that settle statically
Semantic analysis does not waver. An unresolved reference, a type that does not fit, code that is never reached — none of these leave room for taste. The machine may block on them, which makes this the ideal first gate for an unattended run.
Layer two: breakage that only shows once rendered
Compose preview rendering catches the breakage static analysis cannot see. Judging "is it broken" needs a reference, though. I keep baseline images per density and treat anything above a tolerance threshold (0.5% in my setup) as advisory — routed to a human rather than blocked. Let the machine rule on this and it will stop intentional tweaks too.
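The 0.5% tolerance leaves open what is actually being measured. Here is a minimal sketch, assuming the metric is the share of pixels that differ from the baseline and that both images are already decoded into equal-length lists of RGBA tuples. Both are assumptions for illustration, not documented Android CLI behavior.

```python
# Minimal sketch of a per-image baseline diff with a 0.5% tolerance.
# Assumptions (hypothetical): images are pre-decoded into lists of RGBA
# tuples, and "0.5%" means the share of pixels whose color changed.
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int, int]
TOLERANCE = 0.005  # 0.5% of pixels, the threshold mentioned in the text


def diff_ratio(baseline: Sequence[Pixel], rendered: Sequence[Pixel]) -> float:
    """Fraction of pixels that differ between the baseline and the new render."""
    if len(baseline) != len(rendered):
        # A size change counts as a full diff: the layout itself moved.
        return 1.0
    if not baseline:
        return 0.0
    changed = sum(1 for a, b in zip(baseline, rendered) if a != b)
    return changed / len(baseline)


def classify(baseline: Sequence[Pixel], rendered: Sequence[Pixel]) -> str:
    """'pass' within tolerance; otherwise 'advisory' (route to a human, never block)."""
    return "pass" if diff_ratio(baseline, rendered) <= TOLERANCE else "advisory"
```

Note that nothing here ever returns a blocking verdict: the worst outcome is "advisory", which matches the rule that a rendering shift is a matter of degree for a human to judge.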
Layer three: defects that only surface when operated
UI tests actually walk the screens and confirm state transitions. If a test is reproducible, blocking the release when it fails is fine. Conversely, blocking on a flaky test that fails now and then erodes trust in unattended operation itself. A flaky test needs a decision — fix it or remove it — before it enters the gate.
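One way to force that decision is to run a candidate test several times before admitting it to the gate. A minimal sketch, assuming `run_once` is any callable you provide that runs the test once and returns its exit code:

```python
# Minimal sketch: decide whether a UI test is fit for the blocking gate by
# repeating it. Assumption: `run_once` is a callable you supply that runs
# the test once and returns its exit code (0 means pass).
from typing import Callable


def stability(run_once: Callable[[], int], attempts: int = 5) -> str:
    """Return 'stable-pass', 'stable-fail', or 'flaky' across repeated runs."""
    if attempts < 2:
        raise ValueError("need at least two attempts to detect flakiness")
    codes = [run_once() for _ in range(attempts)]
    passed = sum(1 for c in codes if c == 0)
    if passed == attempts:
        return "stable-pass"
    if passed == 0:
        return "stable-fail"
    # Mixed results: fix the test or remove it before it enters the gate.
    return "flaky"
```

In practice `run_once` might wrap `subprocess.run([...]).returncode` around whatever command launches your UI test; the command is left open here rather than guessed. Only tests that come back "stable-pass" on a known-good build belong in the blocking layer.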
Designing the exit code as a contract
The single most useful thing in unattended operation is the promise that each check returns its result as an exit code. The subcommand names can follow the official docs as they evolve; what you should pin down is this contract.
```bash
#!/usr/bin/env bash
set -euo pipefail

# Verification gate for an unattended pipeline. Each check runs in sequence.
# A failing check records its name and exit code in failures.txt, and the
# remaining checks still run, so the triage step sees every failure (an
# advisory preview diff must not hide a blocking UI-test failure).
# Let subcommand names track the docs and pin down only the
# "return via exit code" contract.

REPORT_DIR="reports/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$REPORT_DIR"

run_stage () {
  local name="$1"; shift
  echo "> $name"
  if "$@" > "$REPORT_DIR/$name.log" 2>&1; then
    echo "  ok   $name"
  else
    local code=$?   # exit status of the failed check
    echo "$name $code" >> "$REPORT_DIR/failures.txt"
    echo "  fail $name (exit=$code)"
  fi
}

run_stage analyze android analyze --format json --out "$REPORT_DIR/analyze.json"
run_stage preview android preview render --baseline baselines/ --out "$REPORT_DIR/preview"
run_stage uitest  android test ui --module app --out "$REPORT_DIR/uitest.xml"

# Last line of stdout: the report directory to hand to the triage step.
echo "$REPORT_DIR"
```
This wrapper guarantees exactly one thing regardless of how a check is implemented: every failing stage records its name and exit code in failures.txt. The triage step that follows only needs to read that file.
Separating failures a machine may block from failures a human takes back
Treat all three layers alike and unattended operation turns brittle. A deterministic check failing stops the release. An advisory check's diff does not stop anything; it goes into a human review queue. Make that split explicit in code and your morning self will not hesitate.
```python
import sys
import pathlib

REPORT = pathlib.Path(sys.argv[1])

# Deterministic checks block on failure. Advisory checks go to a human.
BLOCKING = {"analyze", "uitest"}
ADVISORY = {"preview"}

failures = {}
fpath = REPORT / "failures.txt"
if fpath.exists():
    for line in fpath.read_text().splitlines():
        if not line.strip():
            continue  # tolerate stray blank lines
        stage, code = line.split()
        failures[stage] = int(code)

block = [s for s in failures if s in BLOCKING]
review = [s for s in failures if s in ADVISORY]

digest = REPORT / "digest.md"
digest.write_text(
    f"# Pre-release check {REPORT.name}\n\n"
    f"- Blocked: {', '.join(block) or 'none'}\n"
    f"- Review: {', '.join(review) or 'none'}\n"
)

# Make the exit code the answer to the caller (0=pass / 1=human / 2=stop)
sys.exit(2 if block else (1 if review else 0))
```
The caller can branch on the result: 2 stops the release, 1 pushes it onto a human review queue, and 0 proceeds. The key is not to treat a "return to human" result as a failure at all. A diff is material for judgment, not a reason to stop.
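On the calling side, the contract reduces to a small lookup. A minimal sketch, assuming the triage script is saved as `triage.py` (a hypothetical filename) and takes the report directory as its only argument:

```python
# Minimal sketch of the caller side of the 0/1/2 contract.
# Assumption: the triage script is saved as triage.py (hypothetical name)
# and takes the report directory as its only argument.
import subprocess
import sys

ACTIONS = {0: "proceed", 1: "review", 2: "stop"}


def action_for(code: int) -> str:
    """Map a triage exit code to an action; unknown codes fail safe to 'stop'."""
    return ACTIONS.get(code, "stop")


def run_triage(report_dir: str) -> str:
    """Run the triage script and translate its exit code into an action."""
    result = subprocess.run([sys.executable, "triage.py", report_dir])
    return action_for(result.returncode)
```

Treating any unexpected code (for example 137 from a killed process) as "stop" keeps the gate fail-safe. One caveat: an uncaught Python exception also exits with status 1, so a crashing triage script would read as "review". If that matters, wrap the triage logic in a `try`/`except` that exits with 2.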
Covering the density-by-locale matrix unattended
Back when I was building wallpaper apps, rendering breakage often appeared only in a specific combination of density and locale. By hand, coverage inevitably gets thin. If you can drive rendering from the CLI, you can brute-force the combinations.
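```bash
DENSITIES=(mdpi hdpi xhdpi xxhdpi)
LOCALES=(ja en)

for d in "${DENSITIES[@]}"; do
  for l in "${LOCALES[@]}"; do
    # Any nonzero exit (a baseline diff or a render error) queues the
    # combination for human review instead of stopping the loop.
    android preview render --density "$d" --locale "$l" \
      --baseline "baselines/$d-$l" --out "out/$d-$l" \
      || echo "diff: $d/$l" >> review-queue.txt
  done
done
```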
Four densities and two locales make 8 combinations. Compared with reviewing each one by hand, missed cases dropped noticeably. Rendering itself takes a few seconds per screen, so even 8 combinations finish quickly. Only the combinations that diff end up in review-queue.txt, so a human looks only at what actually broke. I strongly recommend this "render everything, hand only the breakage to a person" shape.
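If the matrix gains a third dimension (font scale, dark mode), a Python version built on `itertools.product` keeps the combinations declarative. A minimal sketch, assuming a hypothetical `render(density, locale)` callable you supply that returns True when the render matches its baseline; it does not call the Android CLI itself:

```python
# Minimal sketch of the density-by-locale sweep.
# Assumption: `render` is a hypothetical callable you supply that renders
# one combination against its baseline and returns True on a match.
from itertools import product
from pathlib import Path
from typing import Callable, List

DENSITIES = ["mdpi", "hdpi", "xhdpi", "xxhdpi"]
LOCALES = ["ja", "en"]


def sweep(render: Callable[[str, str], bool], queue: Path) -> List[str]:
    """Render every combination; append only the ones that diff to the queue."""
    diffs = [f"{d}/{l}" for d, l in product(DENSITIES, LOCALES) if not render(d, l)]
    with queue.open("a", encoding="utf-8") as fh:
        for combo in diffs:
            fh.write(f"diff: {combo}\n")
    return diffs
```

Adding a dimension means adding one list and one argument to `product`; the "render everything, queue only the breakage" shape stays the same.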
Logs your next-morning self will not curse
What scares me in unattended operation is a quiet failure at 3 a.m. that leaves no trace by breakfast. A mistake I made in production taught me this one: I streamed logs to standard output and never saved them, so when I tried to reproduce the issue, no clues were left.
The fix is unglamorous. Cut a timestamped directory per run, and keep both the raw log of each check and a human-readable digest.md. With that, the morning check becomes two steps: read the digest, drop into the raw log only if needed. One caution — keep proper nouns and any secrets out of the digest by trimming what it prints.
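A minimal sketch of that layout, assuming raw logs are named `<stage>.log` as in the shell wrapper. The redaction pattern and the "last line, capped length" trimming rule are illustrative assumptions; extend `SECRET_PATTERNS` with whatever token formats your project actually uses.

```python
# Minimal sketch: one timestamped directory per run, raw logs untouched,
# and a digest that prints only a trimmed, redacted excerpt per failure.
# Assumptions (hypothetical): logs are named <stage>.log, and the secret
# pattern below is illustrative, not exhaustive.
import re
from datetime import datetime
from pathlib import Path
from typing import Dict

SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[=:]\s*\S+"),
]


def new_run_dir(root: Path, now: datetime) -> Path:
    """Create <root>/<YYYYmmdd-HHMMSS>; fail loudly if the run dir already exists."""
    run_dir = root / now.strftime("%Y%m%d-%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def redact(line: str) -> str:
    """Replace the value of anything that looks like a credential assignment."""
    for pattern in SECRET_PATTERNS:
        line = pattern.sub(lambda m: f"{m.group(1)}=[redacted]", line)
    return line


def write_digest(run_dir: Path, results: Dict[str, int], max_chars: int = 120) -> Path:
    """Write digest.md: one line per stage, plus a short excerpt for failures."""
    lines = [f"# Pre-release check {run_dir.name}", ""]
    for stage, code in results.items():
        if code == 0:
            lines.append(f"- {stage}: ok")
            continue
        log = run_dir / f"{stage}.log"
        # Only the last log line, capped in length, reaches the digest.
        tail = log.read_text(encoding="utf-8").strip().splitlines()[-1:] if log.exists() else []
        excerpt = redact(tail[0])[:max_chars] if tail else "(no log)"
        lines.append(f"- {stage}: fail (exit={code}) {excerpt}")
    digest = run_dir / "digest.md"
    digest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return digest
```

The digest deliberately shows only one redacted line per failure; anything more goes in the raw log, which you open only when the digest tells you to.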
What to delegate, and what to keep in hand
What the Android CLI gives is the freedom to move verification outside the IDE. But that is not the same as surrendering everything to the machine. Hand deterministic checks to the machine, and keep the "matters of degree" — like a rendering shift — in your own hands. That line, I believe, is the spine that lets unattended operation last.
As a next step, start by wiring in just the semantic-analysis layer. It is deterministic, low on false positives, and its effect is immediately clear. Once trust has grown there, adding rendering and UI tests in that order fits an indie developer's scale.
Thank you for reading this far. I hope it offers a useful design cue for anyone building their own unattended operation.
Thank You for Reading
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.