A Week of Coding Hands-Free with Antigravity 2.0's Live Voice Transcription

There are evenings when I sit with my hands on the keyboard, staring at the ceiling. The request I want to hand to the agent is already clear in my head, but turning it into text feels like a chore, so I stall. I started using the live transcription added in Antigravity 2.0 hoping to shrink exactly that gap between having a thought and typing it out.

After mixing it into a week of real solo development work, it turned out useful somewhere quite different from where I expected. My honest takeaway: it isn't a tool for writing code by voice. It's a tool for pouring your intent into the agent quickly.

What actually changed

Until now, developing by voice meant standing up a separate dictation tool such as Aqua Voice (see "Aqua Voice: Setup and Workflow for Voice-Only Development") or Typeless (see "Typeless: The AI Voice Dictation App That Pairs Perfectly with Any AI Tool"), then pasting the transcribed text into Antigravity's chat box. A two-step shuffle.

What 2.0 changes is that this transcription, powered by a Gemini Audio model, now lives inside the editor. Toggle the mic and what you say flows straight into the instruction field for the agent. No window-hopping between apps, no round trip through the clipboard. It looks like a small change, but whether you can issue an instruction without breaking your train of thought hinges on exactly that one removed step.

It is not for dictating code

For the first two days I got greedy and tried to speak the contents of functions. Saying "if the user is nil, return early" transcribes accurately enough, but turning that into code is the agent's job, and it is a finer granularity than I should be voicing. Spelling out brackets, matching symbols, and indentation by voice is plainly slower than typing.

Things clicked once I raised the granularity by one level. "For this payment handler, I want a retry: exponential backoff, max three attempts, only retry on 429, throw everything else as-is." I speak the whole approach and hand it to the agent. At this level, voice is faster than the keyboard, and it carries the context in my head across without dropping pieces. Once I stopped thinking of transcription as "a stand-in typist" and started treating it as "a mouth for intent," it suddenly fit.
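To make that spoken request concrete, here is a minimal sketch of what the agent might hand back. The names `HttpError` and `call_with_retry` are hypothetical, "max three attempts" is read as three calls in total, and the half-second base delay is my own assumption, not something the spoken instruction specified.

```python
import time


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(request, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call request(); retry only on HTTP 429, with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except HttpError as err:
            # Anything other than 429 is rethrown as-is,
            # and so is a 429 on the final attempt.
            if err.status != 429 or attempt == max_attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The point is the ratio: one sentence of speech covers every branch in this function, which is the level of granularity where voice pays off.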

Accuracy with technical terms and mixed languages

As a developer working in Japanese, the thing I worried about most was accuracy on Japanese sentences laced with technical terms. The short version: terms that have settled into loanwords — refactoring, deploy, migration — are basically fine. Trouble shows up with proper nouns read in English.

Library and command names pronounced in English tend to wobble. Say useEffect out loud and it can split into use effect or morph into something else entirely. My fix is blunt: I don't speak proper nouns at all, and patch them in by keyboard afterward. I pour the approach in by voice and fix only the spelling of proper nouns with my fingers. After splitting the work this way, redos of the transcription nearly disappeared. Cut down the typing, but keep the parts that demand precision in your hands. A week of trial as an indie developer pointed me to dividing the roles, rather than handing everything to voice, as the fastest path.
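I do this patching by hand, but for the predictable case where a name merely splits into words, a small glossary pass over the transcript could do part of it. This is only a sketch of that idea: the `GLOSSARY` entries and the `patch_identifiers` helper are hypothetical, and it does nothing for names that morph into something else entirely.

```python
import re

# Hypothetical glossary: spoken (split) form -> exact identifier.
GLOSSARY = {
    "use effect": "useEffect",
    "use state": "useState",
}


def patch_identifiers(transcript, glossary=GLOSSARY):
    """Replace known split-up spoken forms with their exact spelling."""
    patched = transcript
    for spoken, exact in glossary.items():
        # \b keeps the match on whole words; IGNORECASE covers "Use Effect".
        pattern = re.compile(r"\b" + re.escape(spoken) + r"\b", re.IGNORECASE)
        # A function replacement inserts the identifier literally.
        patched = pattern.sub(lambda _match, exact=exact: exact, patched)
    return patched
```

For example, `patch_identifiers("move the fetch into Use Effect")` gives back the sentence with `useEffect` spelled correctly, while a phrase like "because effect" is left alone because the match must start on a word boundary.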

Numbers come with one caveat. Short figures like "three times" or "429" are stable, but long digit strings and version numbers (anything dot-separated like 2.0.3) get misread. When a version belongs in the instruction, I say only "the latest" out loud and attach the exact number in text.
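If you would rather have the risky numbers pointed out than hunt for them, a check like the following could list the tokens worth verifying by eye. It is a sketch under my own assumptions: the `tokens_to_verify` name is hypothetical, and treating five or more digits as "long" is an arbitrary threshold.

```python
import re

# Dot-separated numbers (2.0.3, 1.10) or runs of five or more digits.
# The five-digit threshold is an assumption, not a measured cutoff.
PRECISION_TOKEN = re.compile(r"\d+(?:\.\d+)+|\d{5,}")


def tokens_to_verify(transcript):
    """Return the numeric tokens in a transcript that deserve a manual check."""
    return PRECISION_TOKEN.findall(transcript)
```

Run on "upgrade to 2.0.3 and retry on 429", it returns only `2.0.3`; the short status code is left out, matching the split between stable and unstable numbers described above.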

What stuck after a week

As an indie developer I run everything from design to implementation to post-release operations alone, so any time shaved off input is welcome. Three situations turned into near-daily use.

First, the opening request to an agent. I can speak the background and constraints in one breath, so the preamble I used to type as bullet points now takes a twenty- or thirty-second utterance. Second, review comments on generated code. Looking at the code on screen and replying out loud — "this error handling is swallowing the exception, rethrow it upward" — is faster precisely because I never move my eyes. Third, notes to myself. I transcribe something I want to look into later without interrupting the task at hand.

Conversely, for commit messages or fine-grained naming where character-level precision matters, I don't use it. I just go back to the keyboard.

Remember there's a human behind the mic

One operational note to close. Voice transcription speeds up input, but I wouldn't drop the habit of reviewing the instruction that actually reaches the agent. Spoken instructions carry more momentum than typed ones, and it's easy to skip the check. For requests that involve deleting files or sending data outward in particular, I read the transcribed text once with my own eyes before running it. Having operated scheduled agents, I've learned the hard way that accidents during "hours when no one is watching" are the scariest (the kind described in "When a Scheduled Agent Runs Twice: Designing for Idempotency Against Overlap and Retry"), so I make a rule of putting every instruction in front of human eyes at the point of entry.
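The habit itself is manual, but the same rule can be expressed as a small gate in front of the agent. What follows is a sketch, not a feature of Antigravity: `needs_review`, `submit`, the `run` and `confirm` callbacks, and the keyword stems are all hypothetical, and a keyword match is a crude stand-in for actually reading the text.

```python
import re

# Hypothetical stems for destructive or outbound actions.
# Matching on stems errs on the side of asking too often.
RISKY = re.compile(r"\b(delet|remov|drop|send|sent|upload|publish)\w*", re.IGNORECASE)


def needs_review(instruction):
    """Return True if the transcribed instruction mentions a risky action."""
    return RISKY.search(instruction) is not None


def submit(instruction, run, confirm):
    """Pass the instruction to run() only after confirm() approves risky ones."""
    if needs_review(instruction) and not confirm(instruction):
        return False  # held back; nothing reaches the agent
    run(instruction)
    return True
```

Here `confirm` would show the transcribed text and wait for a yes or no, which forces the one read-through that spoken momentum tends to skip.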

If you were hoping for development that completes entirely by voice, this may still feel short of it. But seen as "a tool for shortening the distance from thinking to typing," Antigravity 2.0's live transcription is well into usable territory. Try voicing just the opening request once this weekend — you'll probably notice the same gap I did between where you assumed it would help and where it actually does.