The Day Generation, Device, and Internal-Test Shipping Became One Step — What I Refused to Hand Over

AI Studio now turns a text prompt into a Kotlin/Compose app and carries it through the emulator, a real device, and Google Play's internal test track from one screen. Behind that convenience sits a question: how much of the moment of shipping do you hand to the machine, and what do you keep in your own hands? Here is where I draw the line as a solo developer running several apps, and the implementation that holds that boundary.

I still hold my breath at the moment of shipping. Code can be fixed later. A crash, a review note, a clumsy store description — all of those can be rolled back the instant I notice them. But the fact that "this reached tens of thousands of people on the production track" cannot be undone once I press the button. As a solo developer juggling several apps, this is the one point that never feels lighter, no matter how far automation goes.

On June 24, 2026, Google AI Studio began generating Kotlin and Jetpack Compose apps from a text prompt, running them in an embedded emulator, pushing them to a real device over USB, and shipping them to Google Play's internal test track, all from a single screen. The distance between "build it, try it, ship it" collapsed at once. I welcomed the update, and at the same time I gave myself one warning: when everything connects this smoothly, I will lose the ability to stop at all unless I decide in advance where to stop.

This article is about that line: where I let the machine carry the work, and where I keep my own hands on it, as I fit this end-to-end flow into a solo developer's shipping pipeline. It also covers the implementation that holds the boundary.

What became one step was the chaining of reversible work

The first thing worth seeing clearly is what distance AI Studio actually shortened: generation, emulator runs, device transfer, internal-test shipping. Every one of those is reversible work, or close to it. You can regenerate, you can relaunch the emulator endlessly, you can overwrite the install on the device. Even shipping to the internal test track only reaches the people you invited (yourself and a handful of testers), so a mistake stays contained.

In other words, what got chained into one screen was a series of reversible steps. That is where the real convenience lives: the cost of bouncing back and forth through reversible work dropped to nearly zero. In my own case, the loop of redesigning the settings screen of a wallpaper app and then checking it on a real device grew to a dozen-plus rounds in a single night. Before, my concentration broke at each step: waiting for a build, plugging in the cable, waiting for the install.

The catch is that this smoothness makes the irreversible step look continuous with everything before it. After internal testing come closed testing, open testing, and then production. The same screen and the same feel carry through, and only the final push changes its nature. The first thing I decided was to build a deliberate step at that change of nature.

What I automate, and what I keep in my hands

The rule for drawing the line is simple. If it can be redone, give it to the machine; if there is no way back, keep it in your own hands. Applied step by step, the work split out like this.

Step	Nature	Owner	Why
Code generation / edits	Reversible	AI Studio	Regenerate anytime; diffs tracked in git
Emulator check	Reversible	Automated	Screenshot diffs of key screens are machine-judgeable
Device transfer	Reversible	Automated	Overwrite install; damage stays on my device
Pre-ship sanity check	Reversible	Automated (contract)	Version, signing, mapping integrity are sure things for a machine
Ship to internal test	Nearly reversible	Automated + approval	Only invitees; but recorded in the ledger
Promote to production	Irreversible	My own hands	Reaches the public. This is the one button I do not press from a script
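
The split in the table can also be written down as data, so that a script can ask who owns a step before it does anything. This is only a minimal sketch: the step identifiers and the `owner` helper are names made up for illustration, not part of any tool mentioned here.

```python
from enum import Enum


class Nature(Enum):
    REVERSIBLE = "reversible"
    NEARLY_REVERSIBLE = "nearly reversible"
    IRREVERSIBLE = "irreversible"


# Step names mirror the table above; the identifiers themselves are illustrative.
STEPS = {
    "code_generation": Nature.REVERSIBLE,
    "emulator_check": Nature.REVERSIBLE,
    "device_transfer": Nature.REVERSIBLE,
    "preflight": Nature.REVERSIBLE,
    "ship_internal": Nature.NEARLY_REVERSIBLE,
    "promote_production": Nature.IRREVERSIBLE,
}


def owner(step: str) -> str:
    """Apply the rule: redoable work goes to the machine, the rest stays with me."""
    nature = STEPS[step]
    if nature is Nature.REVERSIBLE:
        return "machine"
    if nature is Nature.NEARLY_REVERSIBLE:
        return "machine + approval"
    return "human"


for step in STEPS:
    print(f"{step:20} {owner(step)}")
```

The point of keeping it this small is that the only step mapped to "human" is the one with no way back.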

What I want to emphasize is that "ship to internal test" sits between full automation and manual approval. Internal testing is nearly reversible in nature, but if I made it fully automatic, the psychological step between it and promotion to production would vanish. So I deliberately left a small piece of friction on internal-test shipping: writing a one-line shipment note. By recording, in my own words, what the shipment is for, I can later recall what I actually verified when I move on to closed testing or production.

Making the pre-ship sanity check a contract

Before handing a generated app to the internal test track, I gathered only the checks a machine can reliably perform and wrote a script that reports the result through exit codes. The design point here is a deliberate cut: let the machine handle only what the machine can judge, and leave out anything that needs judgment.

Concretely: is the versionCode strictly greater than the last shipment's, is the applicationId what I expect, does the release build have a mapping.txt, and is the signing config something other than debug? Every one is an objective item with a clear true or false.

#!/usr/bin/env python3
"""preflight.py: the sanity check that runs before internal-test shipping.
Gathers only items with a clear true/false and reports the result via exit codes.
  exit 0  pass
  exit 1  a violation the machine should block on (stop shipping)
  exit 2  bad input (failed to read configuration)
"""
import json
import re
import sys
from pathlib import Path
 
EXPECTED_APP_ID = "net.dolice.wallpaper"   # match your own app
LEDGER = Path("release_ledger.jsonl")      # past shipment records
 
def read_gradle_version(gradle: Path):
    text = gradle.read_text(encoding="utf-8")
    code = re.search(r"versionCode\s*[=\s]\s*(\d+)", text)
    name = re.search(r'versionName\s*[=\s]\s*"([^"]+)"', text)
    app  = re.search(r'applicationId\s*[=\s]\s*"([^"]+)"', text)
    if not (code and name and app):
        print("read failed: versionCode / versionName / applicationId")
        sys.exit(2)
    return int(code.group(1)), name.group(1), app.group(1)
 
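def last_released_code() -> int:
    """Highest versionCode recorded in the ledger, or 0 if nothing has shipped yet."""
    if not LEDGER.exists():
        return 0
    lines = LEDGER.read_text(encoding="utf-8").splitlines()
    codes = [json.loads(line)["versionCode"] for line in lines if line.strip()]
    return max(codes) if codes else 0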
 
def main():
    gradle = Path("app/build.gradle.kts")
    if not gradle.exists():
        print("app/build.gradle.kts not found"); sys.exit(2)
 
    version_code, version_name, app_id = read_gradle_version(gradle)
    failures = []
 
    if app_id != EXPECTED_APP_ID:
        failures.append(f"applicationId mismatch: {app_id} != {EXPECTED_APP_ID}")
    if version_code <= last_released_code():
        failures.append(f"versionCode not monotonic: {version_code} <= {last_released_code()}")
 
    mapping = Path("app/build/outputs/mapping/release/mapping.txt")
    if not mapping.exists():
        failures.append("missing mapping.txt (R8 may not have run)")
 
    bundle = Path("app/build/outputs/bundle/release/app-release.aab")
    if not bundle.exists():
        failures.append("release .aab not found")
 
    if failures:
        print("stopping the shipment:")
        for f in failures:
            print(f"  - {f}")
        sys.exit(1)
 
    print(f"OK: {app_id} versionCode={version_code} ({version_name})")
 
if __name__ == "__main__":
    main()

I would not ask this script to judge whether the design is tidy or the wording reads naturally. That is the human's domain. To the machine I hand only the dull, certain items — monotonic increments, signing configs — that humans most easily overlook. Dull verification is exactly what a machine is good at and what a tired human misses most.
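
The checklist names the signing config, though the preflight.py listing above does not check it. One way that item could be added is a text heuristic over the Gradle file, sketched below. It assumes the release build type is declared as `release { ... }` or `getByName("release") { ... }` and that the debug config would be referenced as `signingConfigs.getByName("debug")` or `signingConfigs.debug`. The function names are hypothetical, and braces inside strings or comments would confuse it.

```python
import re

# Patterns are assumptions about how the build file is written, not a Gradle parser.
DEBUG_SIGNING = re.compile(
    r'signingConfig\s*=\s*signingConfigs\.(?:getByName\(\s*"debug"\s*\)|debug\b)'
)
RELEASE_OPEN = re.compile(r'(?:getByName\(\s*"release"\s*\)|\brelease)\s*\{')


def release_blocks(gradle_text: str):
    """Yield the body of every `release { ... }` block, found by brace matching."""
    for match in RELEASE_OPEN.finditer(gradle_text):
        depth = 1
        for i in range(match.end(), len(gradle_text)):
            ch = gradle_text[i]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    yield gradle_text[match.end():i]
                    break


def release_uses_debug_signing(gradle_text: str) -> bool:
    """True if any release block assigns the debug signing config."""
    return any(DEBUG_SIGNING.search(block) for block in release_blocks(gradle_text))
```

Inside preflight.py this could feed the existing `failures` list, for example by appending a message when `release_uses_debug_signing(gradle.read_text(encoding="utf-8"))` is true. Because it only reads text, it shows what the build file says, not how the .aab was actually signed.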

Handing the build to the internal test track

Once the sanity check passes, I hand the build to the internal test track through the Play Developer API. AI Studio's screen can do the same thing, but I want to keep the last few steps of shipping as my own script, in my own hands. The reason connects to the talk of ownership later on.

#!/usr/bin/env python3
"""publish_internal.py — hand the .aab to the internal test track.
Reads the service account key from the path in env var PLAY_SA_JSON.
"""
import os
import sys
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload
 
PACKAGE = "net.dolice.wallpaper"
AAB = "app/build/outputs/bundle/release/app-release.aab"
SCOPES = ["https://www.googleapis.com/auth/androidpublisher"]
 
def service():
    key_path = os.environ["PLAY_SA_JSON"]   # path to the service account key
    creds = service_account.Credentials.from_service_account_file(key_path, scopes=SCOPES)
    return build("androidpublisher", "v3", credentials=creds, cache_discovery=False)
 
def main():
    note = sys.argv[1] if len(sys.argv) > 1 else ""
    if not note:
        print("pass a shipment note as an argument (what is this shipment for)"); sys.exit(2)
 
    svc = service()

    # Open an edit: changes made inside it only take effect on commit.
    edit = svc.edits().insert(packageName=PACKAGE, body={}).execute()
    edit_id = edit["id"]

    # Upload the bundle; the response carries the versionCode read from the .aab.
    uploaded = svc.edits().bundles().upload(
        packageName=PACKAGE, editId=edit_id,
        media_body=MediaFileUpload(AAB, mimetype="application/octet-stream"),
    ).execute()
    version_code = uploaded["versionCode"]

    # Assign the uploaded build to the internal track, with the note as release notes.
    svc.edits().tracks().update(
        packageName=PACKAGE, editId=edit_id, track="internal",
        body={"releases": [{
            "versionCodes": [version_code],
            "status": "completed",
            "releaseNotes": [{"language": "en-US", "text": note}],
        }]},
    ).execute()

    # Commit the edit; until this call nothing has been published.
    svc.edits().commit(packageName=PACKAGE, editId=edit_id).execute()
    print(f"shipped to internal: versionCode={version_code} note={note!r}")
 
if __name__ == "__main__":
    main()

If you change track="internal" to another value — say "production" — the same script could ship straight to production. That is precisely why I do not write the production track name in the script. Internal testing is the script's territory; promotion to production happens from the Play Console screen, with my own finger. What can be written in code and what should be written in code are, to me, two different things.
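
Because preflight.py reports through exit codes, the two scripts can be chained so that publishing never runs after a failed check. A minimal sketch of such a runner follows. `run_pipeline` is a hypothetical helper, and the commented usage assumes both scripts sit in the current directory.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each command (an argv list) in order; stop at the first non-zero exit."""
    for argv in steps:
        code = subprocess.run(argv).returncode
        if code != 0:
            print(f"stopped at: {' '.join(argv)} (exit {code})")
            return code
    return 0


# Intended use, with the shipment note passed through to the publish step:
#   code = run_pipeline([
#       [sys.executable, "preflight.py"],
#       [sys.executable, "publish_internal.py", "what this shipment is for"],
#   ])
#   sys.exit(code)
```

The runner deliberately knows nothing about tracks. It only honors the contract: exit 0 continues, anything else stops the shipment.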

Running staged rollout safely, even alone

I let the build rest on the internal track for a few days, use it daily on my main device, and actually operate the AdMob placements and the purchase flow to confirm that nothing broke. I do not automate this. The time I spend touching the app as a single user is the time I measure the distance between the generated code and myself.

Promotion goes one rung at a time: internal, then closed, then production. On closed and production I use Play Console's staged rollout, starting at a few percent and widening it while watching the crash rate and the ANR rate. The table below shows the metrics I watch at each rung when I run this alone, and the thresholds at which I stop.

Track	Audience	Soak time	What I mainly watch	Stop threshold
Internal	Me + a few	2-3 days	Launch, key flows, purchases	Any reproducible failure in a key flow
Closed	20-50 invitees	3-5 days	Crash rate, ANR rate	Crash rate over 1%
Production (staged)	5% → 20% → 100%	1-2 days each	Crash rate, low ratings	Crash rate over 0.5%

The strength of running solo is that there is no back-and-forth in judgment. The weakness is that no one else points out what I missed. So I decide the thresholds as numbers in advance. It is a promise I make to myself ahead of time, so that in the elation of shipping I do not soften into "a little more watching and it'll be fine." In fact, on an update to one manifestation-themed app, the crash rate touched 0.6% at the production 5% stage, and following the threshold I had set, I stopped the rollout without hesitation. The cause turned out to be a layout break on older devices, and I was relieved that stopping had been the right call.
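
Writing the thresholds down as numbers also means they can live in a few lines of code rather than in memory. The sketch below assumes the crash rate is read off Play Console by hand and typed in as a percentage. `HALT_ABOVE` and `should_halt` are illustrative names, and the internal track is left out because its stop condition is a judgment about key flows, not a number.

```python
# Crash-rate thresholds in percent, copied from the table above.
HALT_ABOVE = {"closed": 1.0, "production": 0.5}


def should_halt(track: str, crash_rate_percent: float) -> bool:
    """True when the observed crash rate is over the threshold set in advance."""
    return crash_rate_percent > HALT_ABOVE[track]


# The case described above: 0.6% at the production 5% stage.
print(should_halt("production", 0.6))  # → True
```

The function is trivial on purpose. Its value is that the answer was fixed before the shipment, not negotiated during it.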

A ledger, so I never hand over ownership of what was generated

Finally, the step I value most in this end-to-end flow: recording one line in a ledger every time I ship.

The more code AI Studio generates for me, the more my sense of how much I actually grasp of this app's current shape fades. Generation is convenient, but leaning on it entirely leads to a moment where I cannot explain the insides of my own app. To prevent that, at every shipping milestone I record the version code, the git commit, the shipment note, and the track, with my own words attached.

#!/usr/bin/env bash
# ledger.sh — append one line to release_ledger.jsonl on every shipment
set -euo pipefail
 
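if [ "$#" -ne 3 ]; then
  echo "usage: ledger.sh VERSION_CODE TRACK NOTE" >&2
  exit 2
fi

VERSION_CODE="$1"   # the value printed by publish_internal.py
TRACK="$2"          # internal / closed / production
NOTE="$3"           # in my own words, "what this shipment is for"

# versionCode is written unquoted into the JSON line, so refuse anything non-numeric.
case "$VERSION_CODE" in
  ''|*[!0-9]*) echo "versionCode must be an integer: $VERSION_CODE" >&2; exit 2 ;;
esac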
 
SHA="$(git rev-parse --short HEAD)"
TS="$(TZ=Asia/Tokyo date '+%Y-%m-%dT%H:%M:%S%z')"
 
printf '{"versionCode":%s,"track":"%s","sha":"%s","ts":"%s","note":%s}\n' \
  "$VERSION_CODE" "$TRACK" "$SHA" "$TS" "$(printf '%s' "$NOTE" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))')" \
  >> release_ledger.jsonl
 
echo "recorded in ledger: code=$VERSION_CODE track=$TRACK sha=$SHA"

This ledger is both the input preflight.py reads to confirm monotonic increments and a diary that explains to me alone "when, what, and why I shipped." However much generated code piles up, the shipping decisions remain in my own words. That single line is my last anchor for keeping the generated work mine, as my own creation.
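
Reading the ledger back before a promotion is one way to put those words in front of me again. Below is a minimal sketch that picks the most recent entry per track from lines in the format ledger.sh writes. `latest_by_track` is a hypothetical helper, and the two sample lines are made-up values for illustration only.

```python
import json


def latest_by_track(lines):
    """Return the most recent ledger entry for each track (later lines win)."""
    latest = {}
    for line in lines:
        if line.strip():
            entry = json.loads(line)
            latest[entry["track"]] = entry
    return latest


# Two hypothetical ledger lines, shaped like the output of ledger.sh.
sample = [
    '{"versionCode":41,"track":"internal","sha":"abc1234",'
    '"ts":"2026-06-25T21:00:00+0900","note":"settings screen redesign"}',
    '{"versionCode":42,"track":"internal","sha":"def5678",'
    '"ts":"2026-06-27T22:10:00+0900","note":"fix layout on small screens"}',
]

for track, entry in latest_by_track(sample).items():
    print(f'{track}: versionCode={entry["versionCode"]} sha={entry["sha"]} note={entry["note"]}')
```

Against the real file, the input would be `Path("release_ledger.jsonl").read_text(encoding="utf-8").splitlines()`.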

A convenient end-to-end flow becomes something you can truly use with peace of mind only once you have designed the places to stop. The next time I start a new app in AI Studio, I plan to set up this preflight.py and the ledger first, deciding "where I stop" before I generate. Thank you for reading.
