Generating Multilingual Release Notes with the Managed Antigravity Agent via the Gemini API

A hands-on record of building a pipeline that turns git commit logs into multilingual App Store and Google Play release notes using the Managed Antigravity Agent, now in public preview through the Gemini API.

When you ship four apps in parallel to the App Store and Google Play, the quietly time-consuming part of every release is localizing the release notes. When I rolled out v2.1.0 in a staged release recently, rewriting the English draft into each language, fitting each store's character limit, and keeping the tone consistent ate up the better part of an hour. The work around the code often feels longer than the code itself.

On June 15, antigravity-preview-05-2026 (the Managed Antigravity Agent) entered public preview through the Gemini API, so I built a small pipeline to hand this tedious step to an agent. Here is what differs from an ordinary generate_content call, and where I got stuck.

Why a Managed Agent instead of a one-shot generation call

At first I assumed I could just pass the commit log to generate_content and have it write the notes in each language. In practice the prose was fine, but Google Play's "What's new" field caps at roughly 500 characters, and whenever the output ran over, the trimming came right back to me. I ended up rewriting the prompt and regenerating every time something exceeded the limit.

What makes a Managed Agent fundamentally different from one-shot generation is that it can plan, reason, run code, touch files, and browse the web autonomously inside a sandbox. My task has several stages — classify commits, translate to each language, count characters and fit the limit — and the part that really paid off was letting the agent count the characters itself and trim when it went over. The job of watching the limit and bouncing things back moves cleanly onto the agent.
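To make that loop concrete, here is its control flow in plain Python. This is only an illustration: `fit_to_limit` and `drop_last_line` are hypothetical helpers, and the `shorten` callable stands in for the model's "compress this" step, which in the real pipeline is the agent's job.

```python
from typing import Callable


def fit_to_limit(
    draft: str,
    limit: int,
    shorten: Callable[[str], str],
    max_rounds: int = 5,
) -> str:
    """Generate -> measure -> trim loop, the same shape the agent runs.

    `shorten` is a hypothetical stand-in for the model's rewrite step.
    """
    text = draft
    for _ in range(max_rounds):
        if len(text) <= limit:  # measure deterministically, not by "feel"
            return text
        shorter = shorten(text)
        if len(shorter) >= len(text):
            break  # the trimmer made no progress; stop instead of spinning
        text = shorter
    if len(text) > limit:
        raise ValueError(f"still {len(text) - limit} chars over after trimming")
    return text


def drop_last_line(text: str) -> str:
    """Naive stand-in for the model: drop the last (least important) bullet."""
    lines = text.splitlines()
    return "\n".join(lines[:-1]) if len(lines) > 1 else text


notes = "- Faster startup\n- Fixed crash on rotate\n- New dark theme"
print(fit_to_limit(notes, 40, drop_last_line))
```

The no-progress guard matters as much as the length check: a trimmer that cannot shorten any further ends the loop with an error rather than burning rounds.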

The way I think about dividing labor between "one-shot generation" and "an agent" continues what I laid out in my article on the cloud-vs-local boundary of the Managed Agents API. Treat this article as its implementation companion.

The overall flow

The pipeline I built has four stages. Submission to the stores is not one of them: that happens only after I review everything by hand. I limit the agent's autonomy to text generation and press the publish button myself.

Extract a structured git log on the caller side
Hand the agent the commit summary plus constraints, and let it plan, classify, generate per-language, and validate length
Receive JSON back from the agent
Re-validate that JSON on the caller side and bounce anything that breaks a limit

WHAT YOU'LL LEARN

✦Solve the "translate while staying under the character limit" step that a single generate_content call kept choking on, by letting the agent check its own output in a loop

✦Get concrete code that calls the Managed Agent from the google-genai SDK and uses Function Calling to validate length, ready to drop into your own release flow

✦Sidestep three traps (sandbox file access, over-translation, and cost) and cut a 50-minute manual chore down to about 6 minutes

Step 1: Extract the commit log in a structured form

The first thing to watch for is that your local git repository is not directly visible from the agent's sandbox. I missed this at first and told it to "look at the repository," and the agent peered into an empty working directory and got confused. The right move is to extract the log on the caller side and pass it as text.

import json
import re
import subprocess

# Conventional Commits header: type, optional "(scope)", optional "!" (breaking), then ":"
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?!?:", re.IGNORECASE)
KNOWN_TYPES = {"feat", "fix", "perf", "refactor", "chore", "docs"}

def extract_commits(since_tag: str) -> list[dict]:
    """Return commits in since_tag..HEAD, classified Conventional-Commits style."""
    # %s=subject, %b=body; %x1f / %x1e emit ASCII unit / record separators
    fmt = "%s%x1f%b%x1e"
    raw = subprocess.run(
        ["git", "log", f"{since_tag}..HEAD", f"--pretty=format:{fmt}"],
        capture_output=True, text=True, check=True,
    ).stdout

    commits = []
    for entry in raw.split("\x1e"):
        entry = entry.strip()
        if not entry:
            continue
        subject, _, body = entry.partition("\x1f")
        # Pick up feat: / fix(ui): / feat!: style prefixes as the type
        m = HEADER_RE.match(subject)
        head = m.group("type").lower() if m else ""
        ctype = head if head in KNOWN_TYPES else "other"
        commits.append({"type": ctype, "subject": subject.strip(), "body": body.strip()})
    return commits


if __name__ == "__main__":
    items = extract_commits("v2.0.0")
    # Users only care about feat / fix / perf; drop chore, docs, refactor, and other
    visible = [c for c in items if c["type"] in {"feat", "fix", "perf"}]
    print(json.dumps(visible, ensure_ascii=False, indent=2))

Dropping chore, docs, and the other non-user-facing types here matters. Filtering mechanically on the caller side is more reliable than asking the agent in prose to "exclude non-user-facing changes," and it saves tokens. Leaving the agent only the work that genuinely needs judgment also makes the quality more stable.
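Header parsing is easy to get subtly wrong (a scoped header like `feat(search):` is still a feat), so it helps to separate it from the `git` call and exercise it on a hand-built string. The sketch below is a hypothetical pure-function variant: `parse_log` and `user_facing` are illustrative names, and the input mimics what `git log --pretty=format:%s%x1f%b%x1e` prints.

```python
import re

# Conventional Commits header: type, optional "(scope)", optional "!", then ":"
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?!?:", re.IGNORECASE)
KNOWN_TYPES = {"feat", "fix", "perf", "refactor", "chore", "docs"}
USER_FACING = {"feat", "fix", "perf"}


def parse_log(raw: str) -> list[dict]:
    """Parse git log output that uses \\x1f between fields and \\x1e after records."""
    commits = []
    for entry in raw.split("\x1e"):  # one record per commit
        entry = entry.strip()
        if not entry:
            continue
        subject, _, body = entry.partition("\x1f")  # subject / body split
        m = HEADER_RE.match(subject)
        head = m.group("type").lower() if m else ""
        ctype = head if head in KNOWN_TYPES else "other"
        commits.append({"type": ctype, "subject": subject.strip(), "body": body.strip()})
    return commits


def user_facing(commits: list[dict]) -> list[dict]:
    """Mechanical filter: keep only changes users will notice."""
    return [c for c in commits if c["type"] in USER_FACING]


raw = (
    "feat(search): add voice search\x1fUses on-device model\x1e\n"
    "fix: crash on rotate\x1f\x1e\n"
    "chore: bump deps\x1f\x1e\n"
    "Merge branch 'main'\x1f\x1e"
)
for c in user_facing(parse_log(raw)):
    print(c["type"], "|", c["subject"])
```

Merge commits and free-form subjects fall into `other` and are filtered out with the rest, so nothing needs special-casing on the agent side.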

Step 2: Call the Managed Agent from the Gemini API

You call it from the google-genai SDK by specifying the model name antigravity-preview-05-2026. The basics of the SDK are covered in the google-genai SDK Python quickstart, so here I'll show only the delta.

import os
import json
from google import genai
from google.genai import types
 
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
 
# Per-language character limits of each store's release-notes field
STORE_LIMITS = {
    "app_store": 4000,   # "What's New in This Version"
    "google_play": 500,  # "What's new"
}
TARGET_LOCALES = ["ja", "en", "zh-Hans", "ko", "es"]
 
SYSTEM = """You are an editor who writes mobile app release notes.
From the given commit summary, write short, user-facing release notes in each language.
- Describe new features, bug fixes, and performance improvements in user terms, not developer jargon
- Do not translate app names, version numbers, or proper nouns (keep them as-is)
- Always respect each store's character limit; trim information if you are about to exceed it
Return only a JSON object whose keys are the target locale codes and whose values are the release-note strings."""
 
def build_prompt(commits, store, locales, limit):
    return (
        f"# Target store: {store} (limit {limit} chars)\n"
        f"# Target locales: {', '.join(locales)}\n\n"
        f"# Commit summary\n{json.dumps(commits, ensure_ascii=False)}\n\n"
        "Write release notes for each language and verify, by counting yourself, that each stays within the limit."
    )

A Managed Agent can verify deterministic work like "count the characters" by running code in the sandbox instead of relying on a language model's sense of length. That is the single biggest difference from one-shot generation. If you also expose length validation as a function call, the agent loops generate → measure → (compress if over) on its own, and your code only has to execute the measurements it requests.

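def validate_length(text: str, limit: int) -> dict:
    """The validation tool the agent calls. Returns the overflow amount."""
    n = len(text)  # Unicode code points
    return {"length": n, "limit": limit, "over_by": max(0, n - limit)}

length_tool = types.Tool(function_declarations=[
    types.FunctionDeclaration(
        name="validate_length",
        description="Verify whether one release note is within the character limit",
        parameters={
            "type": "object",
            "properties": {
                "text": {"type": "string"},
                "limit": {"type": "integer"},
            },
            "required": ["text", "limit"],
        },
    )
])

def generate_notes(commits, store, max_turns=20):
    limit = STORE_LIMITS[store]
    prompt = build_prompt(commits, store, TARGET_LOCALES, limit)
    config = types.GenerateContentConfig(
        system_instruction=SYSTEM,
        tools=[length_tool],
        response_mime_type="application/json",
        temperature=0.4,
    )
    contents = [types.Content(role="user", parts=[types.Part(text=prompt)])]
    for _ in range(max_turns):
        resp = client.models.generate_content(
            model="antigravity-preview-05-2026",
            contents=contents,
            config=config,
        )
        calls = resp.function_calls or []
        if not calls:
            # No pending tool calls: this turn is the final JSON answer
            return json.loads(resp.text)
        # A bare FunctionDeclaration is not executed by the SDK, so run each
        # requested validate_length call here and send the results back.
        contents.append(resp.candidates[0].content)
        contents.append(types.Content(role="user", parts=[
            types.Part.from_function_response(
                name=fc.name, response=validate_length(**fc.args))
            for fc in calls
        ]))
    raise RuntimeError(f"{store}: no final answer after {max_turns} tool rounds")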

I lower temperature to 0.4. Release notes value consistency over creativity, and higher values made the tone drift across languages. Around 0.4, the tone stayed consistent without the prose going flat.
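Stage 3 of the flow, receiving JSON, is where a preview model is most likely to surprise you. `resp.text` can be `None` (for example when the response holds only a function call), and `json.loads(None)` raises an unhelpful `TypeError`; a JSON list or a note that isn't a string would also slip through to the verifier. A small guard like the hypothetical `parse_notes` below gives clearer errors. It assumes the agent was asked for a flat object mapping locale codes to strings.

```python
import json
from typing import Optional


def parse_notes(raw: Optional[str]) -> dict[str, str]:
    """Turn the agent's final text into {locale: note}, failing loudly on bad shapes."""
    if not raw:
        raise ValueError("empty response (did the model end on a function call?)")
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    bad = [k for k, v in data.items() if not isinstance(v, str)]
    if bad:
        raise ValueError(f"non-string notes for: {', '.join(bad)}")
    return data


print(parse_notes('{"ja": "不具合を修正しました", "en": "Bug fixes"}'))
```

Unexpected or missing locales are deliberately left for the caller-side verifier, so this function only answers "is it the right shape?"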

Step 3: Re-validate the JSON on the caller side

Even when the agent says it stayed within the limit, I count again on the caller side. Once you put a preview-stage model into a production flow, trusting only the agent's self-report before pushing to a store is unnerving. The double check looks excessive, but it actually caught one case that ran three characters over Google Play's limit.

def verify(notes: dict, store: str) -> list[str]:
    """The caller's final guard. Returns a list of problems (empty if all is well)."""
    limit = STORE_LIMITS[store]
    problems = []
    # A locale the agent silently dropped is as bad as one that runs long
    for locale in TARGET_LOCALES:
        if locale not in notes:
            problems.append(f"{locale}: missing")
    for locale, text in notes.items():
        if locale not in TARGET_LOCALES:
            problems.append(f"Unexpected locale: {locale}")
            continue
        if not isinstance(text, str) or not text.strip():
            problems.append(f"{locale}: empty note")
            continue
        if len(text) > limit:
            problems.append(f"{locale}: {len(text)} chars / limit {limit} ({len(text) - limit} over)")
    return problems


# `visible` is the filtered commit list from Step 1
for store in ("app_store", "google_play"):
    notes = generate_notes(visible, store)
    issues = verify(notes, store)
    if issues:
        print(f"WARNING - {store} needs fixes:")
        for m in issues:
            print(f"  - {m}")
    else:
        print(f"OK - {store}: all {len(notes)} locales within limit")

This two-layer stance (let the agent do the work, but always re-check the deterministically verifiable parts on the caller side) is, I believe, the baseline posture for putting autonomous execution into production. Trust the agent's judgment, but leave verification to the machine.
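The caller-side recount is only as good as its counting rule. Python's `len()` counts code points, so a decomposed "é" (e + combining accent) measures 2, and an emoji outside the Basic Multilingual Plane measures 1 code point but 2 UTF-16 units. I haven't confirmed exactly how either store counts, so this sketch is deliberately pessimistic: it takes the larger UTF-16 count of the raw and NFC-normalized text. `strict_length` and `over_by` are hypothetical helpers, not part of any SDK.

```python
import unicodedata


def utf16_units(text: str) -> int:
    """UTF-16 code units: 1 per BMP character, 2 per astral one (most emoji)."""
    return len(text.encode("utf-16-le")) // 2


def strict_length(text: str) -> int:
    """Pessimistic length for caller-side checks.

    Assumption: the stores count something no larger than UTF-16 units
    of either the raw or the NFC form. UTF-16 units are never fewer than
    code points, so this also covers plain len() on both forms.
    """
    nfc = unicodedata.normalize("NFC", text)
    return max(utf16_units(text), utf16_units(nfc))


def over_by(text: str, limit: int) -> int:
    """How many characters to cut, under the pessimistic count."""
    return max(0, strict_length(text) - limit)


for sample in ("Cafe\u0301", "Caf\u00e9", "🎉"):
    print(repr(sample), len(sample), strict_length(sample))
```

For BMP text already in NFC, `strict_length` equals `len()`; the two only diverge on decomposed accents and emoji, which is exactly where a store-side rejection would be most surprising.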

Three things that tripped me up

After actually running this across four apps, three snags showed up that the docs don't mention.

First, sandbox file access. As noted, the agent cannot see your local git repository. If you write "analyze the repository," the agent goes looking for files that don't exist. Extract whatever you need on the caller side and pass it all as text in contents.

Second, over-translation. With no guidance, the agent tried to translate even the app name ("Beautiful HD Wallpapers") and version numbers into each language. Spelling out "do not translate proper nouns" in the system instruction, and attaching a do-not-translate glossary if needed, settles it.
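A glossary can be bolted onto both ends of the pipeline: append it to the system instruction, then check on the caller side that terms the English note uses survive verbatim in every other locale. The glossary contents, `glossary_block`, and `glossary_violations` are hypothetical, and "present in English implies present everywhere" is a heuristic that won't catch a term dropped from every locale.

```python
# Hypothetical do-not-translate glossary; replace with your own app names etc.
GLOSSARY = ["Beautiful HD Wallpapers", "2.1.0"]


def glossary_block(terms: list[str]) -> str:
    """Lines to append to the system instruction."""
    listed = "\n".join(f"- {t}" for t in terms)
    return f"Never translate or alter these terms; copy them exactly:\n{listed}"


def glossary_violations(
    notes: dict[str, str], terms: list[str], reference: str = "en"
) -> list[str]:
    """Flag glossary terms used in the reference note but absent elsewhere."""
    ref = notes.get(reference, "")
    problems = []
    for term in terms:
        if term not in ref:
            continue  # not mentioned in this release; nothing to enforce
        for locale, text in notes.items():
            if term not in text:
                problems.append(f"{locale}: '{term}' missing (translated or dropped?)")
    return problems


notes = {
    "en": "Beautiful HD Wallpapers 2.1.0 adds dark mode.",
    "ja": "Beautiful HD Wallpapers 2.1.0 でダークモードに対応。",
    "es": "Fondos HD Hermosos 2.1.0 añade modo oscuro.",
}
print(glossary_violations(notes, GLOSSARY))
```

In the pipeline, the block would go into `system_instruction` (e.g. `SYSTEM + "\n" + glossary_block(GLOSSARY)`) and the check would run next to the length verifier.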

Third, cost and latency. I first crammed 4 apps × 5 languages × 2 stores into a single job; responses grew long and one language went missing mid-run. Splitting the jobs per app and per store shortened each response and stabilized the output. Splitting means more calls with some repeated input (the system instruction goes out with every job), but because the engine is in the Flash family that overhead barely affects cost, and the smaller blast radius when a failed job has to be retried is the bigger win.
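Splitting boils down to a loop plus per-job retry. In this sketch, `generate(app, store)` and `verify(notes, store)` are injected callables with hypothetical signatures (the article's `generate_notes` takes commits rather than an app name), so a bad job is retried on its own without regenerating the other seven.

```python
from itertools import product
from typing import Callable

Key = tuple[str, str]  # (app, store)


def run_jobs(
    apps: list[str],
    stores: list[str],
    generate: Callable[[str, str], dict],
    verify: Callable[[dict, str], list[str]],
    max_attempts: int = 2,
) -> tuple[dict[Key, dict], dict[Key, list[str]]]:
    """One small job per (app, store); a failure retries only that job."""
    results: dict[Key, dict] = {}
    failures: dict[Key, list[str]] = {}
    for app, store in product(apps, stores):
        problems: list[str] = []
        for attempt in range(1, max_attempts + 1):
            try:
                notes = generate(app, store)
                problems = verify(notes, store)
            except Exception as exc:  # API error, unparsable JSON, ...
                problems = [f"attempt {attempt}: {exc}"]
                continue
            if not problems:
                results[(app, store)] = notes
                break
        else:  # every attempt failed; keep the last problem list for review
            failures[(app, store)] = problems
    return results, failures
```

Failures come back alongside the successes instead of aborting the run, which fits the "human presses publish" step: you review what passed and rerun only what didn't.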

Before / After

Here is how the time spent on release notes changed.

Before: draft in English, hand-translate to each language, adjust character counts per store — about 50 minutes per release
After: extract the commit log, run the pipeline, do a final human review — about 6 minutes per release

More than the numbers, what I value is the mental relief of no longer thinking about translation every release. In solo development, small frictions like this quietly drag down how often you ship, and each one you remove gives back a little more time for the work you actually want to do.

How it relates to the CLI migration (June 18)

Because this pipeline hits the API directly from Python, it is unaffected by the June 18 Gemini CLI shutdown. That said, if your workflow is CLI-centric, the migration is a good moment to move automation like this onto antigravity CLI scheduled runs, consolidating your pre-release checks into one flow. What changes in the migration is covered in the dependency migration notes for the Gemini CLI shutdown.

The next step

Rather than running everything across all apps and languages at once, I'd suggest one app and two languages (Japanese and English) for a single pass first. Calling generate_notes for just one store and watching the verify output is enough to get a feel for how the agent handles the character limit. Once its behavior is predictable, add languages and apps gradually.

Building a preview-stage model into production still makes me cautious, to be honest. Even so, a usage pattern that keeps the final human check and delegates only the tedious steps has proven genuinely practical. Thank you for reading.

Thank You for Reading

Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.