After asking an agent to produce a short narration or an ambient loop, I used to take the same detour every time: export the file, hunt for it in the file manager, open it in a separate player, then come back to the conversation. Each step is small on its own, but when I want to line up several variants and compare them, that round trip quietly eats my focus.
The string of point releases Antigravity shipped in late June (the v2.2.1 line) folded in a small change that pays off daily: inline audio rendering in the conversation view. Audio an agent produces, or that you attach, can now be played without leaving the flow of the conversation.
What inline rendering actually changes
Until now, audio sat in the conversation as a linked artifact. Checking it meant stepping outside the conversation, and the moment you step out, you are cut off from the instruction you just gave the agent and from the reasoning behind the parameters you chose. With inline rendering, audio appears as a playable element inside the conversation itself. Instruction, output, and audition all sit in a single vertical thread, and that is the biggest shift.
In my own testing, when I asked an agent to "write three takes of the same script with only the voice tone changed," the three takes stacked vertically in the conversation and I played them top to bottom to compare. Just removing the export-and-open step noticeably lowers the friction of A/B listening.
The bottleneck was auditioning, not generation
When we hand audio work to an agent, we tend to fixate on how fast and how well it can generate. But as an indie developer shipping apps that carry their own audio, what actually slows me down is the auditioning step, not generation. Producing ten takes is instant; listening to ten takes, judging them, and deciding which one to ship is human ear-work, and that does not get faster.
That is exactly why trimming the incidental steps around listening (export, file hunting, app switching) matters more than it looks. Doubling generation speed does not speed up judgment, but removing the audition round trip shortens the listen-reject-redo loop directly. The thing agent workflows genuinely shorten, I think, is the time it takes to reject a candidate.
Using the conversation itself as an audition log
Because playback is inline now, you can run the conversation view as an audition log. I keep two small rules.
First, when I have an agent produce candidates, I make it put a meaningful label in each filename. Something like narration_v3_warm_slow, so the name tells me what each version changed. Later, scrolling back through the conversation, I can trace which sound matched which intent by both ear and name.
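A labeling convention like this is easy to check mechanically. The following is a minimal sketch, assuming a hypothetical pattern of base name, then `_v` plus a version number, then optional underscore-separated tags; the pattern, the `Take` type, and `parse_take` are illustrative names of my own, not anything Antigravity provides.

```python
import re
from pathlib import PurePath
from typing import NamedTuple, Tuple

# Assumed convention (hypothetical): <base>_v<number>[_<tag>_<tag>...]
# e.g. narration_v3_warm_slow
TAKE_PATTERN = re.compile(
    r"^(?P<base>[a-z0-9_]+?)_v(?P<version>\d+)"
    r"(?:_(?P<tags>[a-z0-9]+(?:_[a-z0-9]+)*))?$"
)


class Take(NamedTuple):
    base: str
    version: int
    tags: Tuple[str, ...]


def parse_take(filename: str) -> Take:
    """Split a labeled take filename into base name, version, and tags."""
    stem = PurePath(filename).stem  # drop a trailing extension such as .wav
    match = TAKE_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"filename does not follow the take convention: {filename!r}")
    raw_tags = match.group("tags")
    tags = tuple(raw_tags.split("_")) if raw_tags else ()
    return Take(match.group("base"), int(match.group("version")), tags)


take = parse_take("narration_v3_warm_slow.wav")
print(take.version, take.tags)
```

A filename that does not fit raises `ValueError`, which is a cheap way to notice when an agent has drifted away from the naming rule.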
Second, I leave a one-line text note on whether I kept or dropped each take. Just writing "Keeping v3. v2 ends too stiff" turns the conversation into a record of the decision. The next time I rebuild audio for the same app, the old rejection reasons are right there to reuse. For navigating long threads, pairing this with the search approach I described in "Tracing What a Long Agent Run Actually Did: Review That Starts From In-Conversation Search" makes the audition log even easier to pull from.
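If you want the keep-or-drop notes to stay consistent from one session to the next, a small helper can generate them. This is only a sketch under my own assumptions: `Verdict`, `format_note`, and `rejection_reasons` are hypothetical names, and the note wording is one possible phrasing rather than a fixed format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Verdict:
    take: str          # short label for the take, e.g. "v3"
    kept: bool         # True if the take was kept, False if dropped
    reason: str = ""   # optional one-phrase justification


def format_note(verdicts: Iterable[Verdict]) -> str:
    """Render verdicts as a one-line note, listing kept takes first."""
    ordered = sorted(verdicts, key=lambda v: not v.kept)  # stable sort
    parts = []
    for v in ordered:
        status = "Keeping" if v.kept else "Dropping"
        detail = f": {v.reason}" if v.reason else ""
        parts.append(f"{status} {v.take}{detail}.")
    return " ".join(parts)


def rejection_reasons(verdicts: Iterable[Verdict]) -> Dict[str, str]:
    """Collect the reasons for dropped takes so they can be reused later."""
    return {v.take: v.reason for v in verdicts if not v.kept and v.reason}


session = [Verdict("v2", False, "ends too stiff"), Verdict("v3", True)]
print(format_note(session))
```

The printed line can be pasted into the conversation as the note, and `rejection_reasons(session)` gives a quick lookup of why earlier takes were dropped.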
Where I draw the line on inline playback
The more convenient it gets, the more I want a clear boundary. I treat in-conversation playback strictly as a way to confirm direction quickly. Final loudness, export format, and how a clip actually sounds on a device are not things browser or editor playback can settle.
When I work on loops or notification sounds, I narrow the candidates inline and then always drop them into the real app and play them on the device. A sound that works in a quiet room often reads differently outdoors, played softly through a phone speaker. Inline playback is the first round and on-device checking is the final round. That two-stage setup cuts steps without lowering quality. For building voice agents at production quality, I go deeper in "Building Voice AI Apps with ElevenLabs and Antigravity — A Practical Development Guide".
Small point releases rarely make headlines, and they are easy to skim past. But a change like inline audio rendering, one that removes a single step from a tool you touch every day, adds up and genuinely shifts the rhythm of the work. Next time you ask an agent for audio, stop before you export and open it, and just listen right there in the conversation.