Vetting AI Studio's Native Android Code Before It Reaches Your Live App
AI Studio's native Android vibe coding produces working screens at startling speed. But before that generated code goes into a live app, it needs a vetting step of its own. Here is a pre-merge review design for generated Kotlin.
The first time I tried AI Studio's native Android vibe coding, a single prompt stood up an entire settings screen and I caught my breath. Layout and navigation both worked. Then I went to drop that generated code straight into an app that had been running for years, and my hand stopped. Working in a fresh project and behaving correctly as part of a live app are two different things.
The app I maintain as an indie developer carries conventions that years of operation have settled, things you cannot infer from a screen alone. Generated code knows none of that context. So here I will design the vetting that AI Studio's Kotlin passes through before it enters a live app, split into what a machine filters and what a human reviews.
Why "works in a fresh project" is not "safe in production"
Vibe-coded output is usually correct in isolation. The question is whether it meshes with an existing app's assumptions. The generator does not know your established dependency-injection style, how you share state across screens, your custom Activity base class, or the threading contract the whole app honors. None of that is visible in a screenshot, so generated code tends to be written in a way that works in the moment but causes incidents once it runs inside the app.
The three areas that break quietly in production
Three areas came up again and again in pre-merge review.
| Area | What generated code tends to do | What happens in production |
| --- | --- | --- |
| Lifecycle | Holds state without accounting for Activity recreation | State vanishes on rotation or resume |
| Memory leaks | Passes Context or a View to a long-lived object | Memory climbs as you move between screens |
| Threading | Calls I/O on the main thread | ANRs and jank on slow devices |
None of these surface in a short emulator session. That is exactly why you need a machine layer before relying on human eyes.
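To make the memory-leak row concrete, here is a minimal sketch in plain Kotlin. It deliberately avoids Android classes so it can run anywhere: `Screen`, `LeakyAnalytics`, and `SaferAnalytics` are hypothetical names standing in for an Activity and an app-lifetime singleton, not anything AI Studio or the framework provides.

```kotlin
import java.lang.ref.WeakReference

// Hypothetical stand-in for an Activity or Fragment; not a framework class.
class Screen(val name: String)

// Risky shape: an app-lifetime object keeps a strong reference to every
// screen it is handed, so no screen can be collected after it closes.
object LeakyAnalytics {
    private val screens = mutableListOf<Screen>()
    fun attach(screen: Screen) { screens += screen }
    fun retained(): Int = screens.size
}

// Safer shape: hold screens weakly and offer an explicit detach that the
// screen is expected to call from its teardown callback.
object SaferAnalytics {
    private val screens = mutableListOf<WeakReference<Screen>>()
    fun attach(screen: Screen) { screens += WeakReference(screen) }
    fun detach(screen: Screen) {
        // Drop this screen and any entries the GC has already cleared.
        screens.removeAll { ref -> ref.get().let { it == null || it === screen } }
    }
    fun retained(): Int = screens.count { it.get() != null }
}
```

The leaky version looks harmless in a short session because nothing fails; the retained count simply grows with every screen visit, which is the "memory climbs as you move between screens" symptom from the table.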
The first defense is static analysis. But running Detekt over the whole app buries the generated-code issues under existing warnings. So apply a stricter ruleset to only the files in this import.
Narrow the run to changed files with a shell wrapper.
#!/usr/bin/env bash
set -euo pipefail

# only the .kt files changed on the import branch (deleted files excluded)
CHANGED=$(git diff --name-only --diff-filter=d origin/main...HEAD -- '*.kt')
if [ -z "$CHANGED" ]; then echo "nothing to check"; exit 0; fi

# run the generated-only profile against changed files only;
# the detekt CLI takes its targets as a comma-separated --input list
detekt \
  --input "$(echo "$CHANGED" | paste -sd, -)" \
  --config detekt-generated.yml \
  --fail-on-issues \
  --report txt:build/detekt-generated.txt

echo "✅ diff-only static analysis passed"
The three-dot origin/main...HEAD gives you only what changed since the branch point. Narrowing to the diff rather than scanning everything is the key: that alone strips out nearly all of the pre-existing warnings, so what remains is specific to the generated code.
Lifecycle and leaks need more than a machine
Static analysis catches problems with a fixed, recognizable pattern, such as main-thread I/O, but it cannot fully catch a design question like "this way of holding state will not survive recreation." That part needs a human reviewer, so pin down in advance what that reviewer looks for.
The four points I always check:
1. Does state that must survive rotation live somewhere recreation-proof (a ViewModel, etc.)?
2. Does anything that receives a Context outlive the screen?
3. Is each coroutine's scope tied to the screen's lifetime and reliably canceled on exit?
4. Does it bypass the existing base class or shared navigation with its own implementation?
Fixing the lens to four points turned generated-code review from "read it and see" into "knock down these four in order," and misses dropped.
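Points 1 and 3 are easier to review once you have seen their shape. The sketch below models them in plain Kotlin, without Android or coroutine dependencies, so every name in it (`StateStore`, `ScreenScope`, `SettingsScreen`) is a hypothetical stand-in. In a real app, a ViewModel plays the role of the store, and `viewModelScope` or `lifecycleScope` plays the role of the scope.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Point 1: state lives in a holder that outlives any single screen instance.
class SettingsState(var darkMode: Boolean = false)

object StateStore {
    private val states = mutableMapOf<String, SettingsState>()
    fun stateFor(key: String): SettingsState = states.getOrPut(key) { SettingsState() }
}

// Point 3: background work belongs to a scope that the screen closes on exit.
class ScreenScope : AutoCloseable {
    private val worker: ExecutorService = Executors.newSingleThreadExecutor()
    val isActive: Boolean get() = !worker.isShutdown
    fun launch(task: () -> Unit) { worker.execute { task() } }
    override fun close() { worker.shutdownNow() }
}

class SettingsScreen {
    val state = StateStore.stateFor("settings") // survives recreation
    var scratch = false                         // lost when the screen is recreated
    val scope = ScreenScope()
    fun onDestroy() { scope.close() }           // cancels pending work on exit
}
```

When reading generated code, the question is which of the two fields a given piece of state resembles: `state`, which a fresh `SettingsScreen` gets back intact, or `scratch`, which silently resets.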
Do not land 5,000 lines at once
The thing that helped most was not technique but how I imported. Vibe coding generates groups of screens, but landing them as one big change makes both review and rollback heavy all at once.
1. Split the output along feature boundaries (settings, list, detail...)
2. Branch one feature at a time and pass the pre-merge gate
3. Pass the human four-point review
4. Ship one feature to production; watch crash rate and memory for 2-3 days
5. If nothing's wrong, move to the next feature
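Step 1 can be done by hand, but a small helper makes the split repeatable. This is a sketch under one loud assumption: generated screens sit under a directory layout like `.../ui/<feature>/...`. The `ui` marker and the function name `groupByFeature` are hypothetical, so adjust them to your own package structure.

```kotlin
// Groups file paths by the directory segment that follows the marker segment.
// Files that sit directly under the marker, or outside it, go to "unsorted".
fun groupByFeature(paths: List<String>, marker: String = "ui"): Map<String, List<String>> =
    paths.groupBy { path ->
        val segments = path.split('/')
        val index = segments.indexOf(marker)
        // Require a feature directory plus a file name after the marker.
        if (index >= 0 && index + 2 < segments.size) segments[index + 1] else "unsorted"
    }
```

Each key in the resulting map is a candidate for its own branch in step 2; anything that lands in "unsorted" is shared code that deserves a closer look before it rides along with a feature.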
In my case I landed the generated settings screen as the first feature, confirmed the crash rate was no different from usual, and only then moved on. Staging it means that if a problem appears, the cause is confined to one feature, so isolation is fast. Land it all at once and destabilize the whole thing, and just figuring out which generated piece is at fault can melt days.
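"No different from usual" is easier to judge against numbers you wrote down beforehand. The sketch below compares observed metrics to a baseline; the `Metrics` fields and both tolerance defaults are assumptions made for illustration, not values from my rollout, and the result is meant to inform the human go/no-go call rather than replace it.

```kotlin
// crashFreeRate is a fraction between 0.0 and 1.0; peakMemoryMb is in megabytes.
data class Metrics(val crashFreeRate: Double, val peakMemoryMb: Double)

data class RolloutCheck(val crashRegressed: Boolean, val memoryRegressed: Boolean) {
    val clean: Boolean get() = !crashRegressed && !memoryRegressed
}

// Flags a regression when the crash-free rate drops by more than crashTolerance
// (absolute) or peak memory grows by more than memoryTolerance (relative).
fun compareToBaseline(
    baseline: Metrics,
    observed: Metrics,
    crashTolerance: Double = 0.001, // assumed: 0.1 percentage points
    memoryTolerance: Double = 0.10  // assumed: 10% growth
): RolloutCheck = RolloutCheck(
    crashRegressed = observed.crashFreeRate < baseline.crashFreeRate - crashTolerance,
    memoryRegressed = observed.peakMemoryMb > baseline.peakMemoryMb * (1 + memoryTolerance)
)
```

Because only one feature ships per stage, a flagged regression points at that feature's generated code and nothing else.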
Where to delegate and where a human takes over
Finally, a decision table. Vibe coding is fast, but it blurs where responsibility sits, so drawing the line up front saves hesitation.
| Step | Owner | Why |
| --- | --- | --- |
| Screen scaffolding | AI Studio | The speed benefit is largest here |
| Static analysis / diff gate | Machine | Fixed-pattern problems are caught reliably by a machine |
| Lifecycle / design review | Human | Only a human holds the app-specific context |
| Go/no-go for production | Human | Reads the observed data and owns the final call |
If the speed of generation pulls you into handing even those last two rows to a machine, a live app becomes hard to walk back. Use generation freely, and lock down the vetting with a static gate, the four points, and staged rollout. That combination is the approach I actually use in my own indie development now. I hope it helps steady your footing if you want to bring this new generation experience into production.
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.