After Gemini 3.5 Flash Became the Default, Route Flash and Pro Per Task

Now that Antigravity's default Flash is Gemini 3.5 Flash, leaving everything on Flash wastes accuracy and forcing everything onto Pro wastes time. Here is a two-axis decision table for splitting work between Flash and Pro, plus the routing setup to wire it into your agents.

On the day the default Flash model switched to Gemini 3.5 Flash, several of my routine tasks got visibly faster without me touching a single setting. At the same time, when I handed off one tangled refactor, a change that used to land on the first try now took two round trips.

The part that got faster and the part that got sloppier are two sides of the same model change.

There are two stances you can take here. One is to shrug and say "it's fast, so put everything on Flash." The other is "I can't afford a quality drop, so I'll just go back to Pro." I tried both, once each, and both wasted something. The first added rework whenever accuracy mattered; the second paid heavy-model latency even for trivial replacements.

Where I landed was splitting the work per task. The hard part is turning "how to split" into a reproducible rule rather than a gut feeling.

Speed and correctness are different axes

When we compare models, we tend to reach for a single ruler: which one is smarter? But in real development, "does this task actually need that smartness?" matters more than the smartness itself.

I learned to look at tasks along two axes.

The first is the weight of the decision: how large is the rework if it's wrong? A bulk rename of a variable is easy to spot and fix when it goes wrong. A change to how state is managed propagates downstream, and you notice late.

The second is the breadth of context: how much surrounding code must be held in view to decide? A task that closes within one file and a task that requires the dependency graph of several files demand entirely different fields of view.

Lay tasks out on these two axes and the territory where Flash shines separates cleanly from the territory you want Pro to own.

The routing decision table

Here is the routing I actually run, one quadrant per row: weight of the decision first, breadth of context second.

Light decision x narrow context -> Flash. Formatting, rename replacements, comment additions, boilerplate test scaffolding. Speed translates directly into how the work feels.
Light decision x broad context -> Flash. Cross-file string replacements, import cleanup: simple judgment but scattered targets. You need the field of view, but mistakes are cheap to spot and fix.
Heavy decision x narrow context -> Pro. Even within one file, the boundaries of async work or the shape of error handling break quietly when wrong. The scope is narrow, but saving here costs you later.
Heavy decision x broad context -> Pro. Architecture changes, reshaping data flow, reconciling several modules. Going to Pro once is often faster than two round trips on Flash.

The nice thing about this table is that when you hesitate, you only ask two questions: "does it hurt if this is wrong?" and "how widely do I need to see?" You don't have to memorize the models' internal capabilities.
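As a minimal sketch, the two questions can be written as a lookup. The names `pick_model` and `DECISION_TABLE` are illustrative, and the model identifiers are simply the ones used in this article; swap in whatever your environment calls them.

```python
# The two questions from the table, expressed as a lookup.
# Model names are the ones this article uses; treat them as placeholders.
FLASH = "gemini-3.5-flash"
PRO = "gemini-3.5-pro"

# Keys are (heavy_decision, broad_context).
DECISION_TABLE = {
    (False, False): FLASH,  # formatting, renames, comments
    (False, True): FLASH,   # cross-file replacements, import cleanup
    (True, False): PRO,     # async boundaries, shape of error handling
    (True, True): PRO,      # architecture, data flow, module reconciliation
}


def pick_model(hurts_if_wrong: bool, needs_wide_view: bool) -> str:
    """Answer the two questions and get a model name back."""
    return DECISION_TABLE[(hurts_if_wrong, needs_wide_view)]


if __name__ == "__main__":
    print(pick_model(hurts_if_wrong=False, needs_wide_view=True))
    print(pick_model(hurts_if_wrong=True, needs_wide_view=False))
```

Written this way, it becomes visible that the model choice in this table follows the first question alone; the second question decides how much surrounding code the task needs to see.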

In my case, counting daily tasks, about 70% land on the Flash side and about 30% on Pro. Measured by share of development time, though, it flips: the Pro-side tasks take up more than half of the day, at least by feel. Heavy tasks are fewer in number but cost more each. That asymmetry is worth keeping in mind when you reason about the payoff of routing.
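The arithmetic behind that flip is short. With 30% of tasks on the Pro side, they pass half of total time once an average Pro-side task costs more than 7/3 (about 2.3) times an average Flash-side one. The sketch below assumes a single average cost ratio, which is a simplification; the ratios in the loop are hypothetical, not measurements.

```python
def pro_time_share(pro_count_share: float, cost_ratio: float) -> float:
    """Fraction of total time spent on Pro-side tasks.

    pro_count_share: fraction of tasks routed to Pro (0.3 in the text).
    cost_ratio: average time of a Pro-side task divided by that of a
                Flash-side task (hypothetical; measure your own).
    """
    pro_time = pro_count_share * cost_ratio
    flash_time = 1.0 - pro_count_share  # Flash-side tasks cost 1 unit each
    return pro_time / (pro_time + flash_time)


if __name__ == "__main__":
    for ratio in (1.0, 2.0, 3.0, 5.0):
        share = pro_time_share(0.3, ratio)
        print(f"cost ratio {ratio:.0f}x -> Pro share of time {share:.0%}")
```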

Wiring the routing into config

Keeping the table in your head helps, but deciding by hand every time doesn't last. Pinning the model in Antigravity's subagent definitions and routing each task at the point of entry reduces decision drift.

The idea is to declare, per kind of work, which model it uses.

# .antigravity/agents.yaml
# Pin a model per task type and route at the entrance
agents:
  quick-edit:
    model: gemini-3.5-flash
    description: Formatting, renames, comments — safe even when wrong
    temperature: 0.1
    max_context_files: 8
 
  bulk-refactor:
    model: gemini-3.5-flash
    description: Mechanical cross-file replacements — wide view, simple judgment
    temperature: 0.1
    max_context_files: 40
 
  design-change:
    model: gemini-3.5-pro
    description: Async boundaries, error design, state — mistakes propagate downstream
    temperature: 0.2
    max_context_files: 24
 
  architecture:
    model: gemini-3.5-pro
    description: Data-flow redesign, module reconciliation — heavy work to land in one pass
    temperature: 0.3
    max_context_files: 80

Holding temperature low on the Flash side is deliberate. The lighter the task, the more value there is in getting the same result every time you run it. Being offered a different layout on every format pass only makes diffs harder to read. The architecture side is nudged up, by contrast, because at design time a few alternatives feed the discussion.

Varying max_context_files per type also curbs wasted reading. There's no point feeding 80 files to quick-edit, and showing only 8 to architecture strips it of the very reconciliation it exists for. Matching breadth to type keeps Flash-side tasks from slowing down under context they don't need.
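A small consistency check can catch a definition that drifts from these intentions. The sketch below is an assumption-laden illustration: `AGENTS` is a hand-copied mirror of the YAML (the standard library has no YAML parser), and `check_agents` is a hypothetical helper, not part of Antigravity.

```python
# Hand-copied mirror of the agent definitions; keep it in sync with the YAML.
AGENTS = {
    "quick-edit": {"model": "gemini-3.5-flash", "temperature": 0.1, "max_context_files": 8},
    "bulk-refactor": {"model": "gemini-3.5-flash", "temperature": 0.1, "max_context_files": 40},
    "design-change": {"model": "gemini-3.5-pro", "temperature": 0.2, "max_context_files": 24},
    "architecture": {"model": "gemini-3.5-pro", "temperature": 0.3, "max_context_files": 80},
}

REQUIRED_KEYS = ("model", "temperature", "max_context_files")


def check_agents(agents: dict) -> list:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for name, spec in agents.items():
        missing = [key for key in REQUIRED_KEYS if key not in spec]
        if missing:
            problems.append(f"{name}: missing {', '.join(missing)}")

    # Only compare temperatures across agents that are fully specified.
    complete = [s for s in agents.values() if all(k in s for k in REQUIRED_KEYS)]
    flash_temps = [s["temperature"] for s in complete if "flash" in s["model"]]
    pro_temps = [s["temperature"] for s in complete if "pro" in s["model"]]
    if flash_temps and pro_temps and max(flash_temps) > min(pro_temps):
        problems.append("a Flash agent runs at a higher temperature than a Pro agent")
    return problems


if __name__ == "__main__":
    print(check_agents(AGENTS) or "agent definitions look consistent")
```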

Entrance routing works well enough even when decided mechanically from how the task is phrased.

# route.py — a minimal router from task description to destination agent

# Words that suggest a "heavy decision" — lean toward Pro.
# Matched as substrings, so "design" also catches "redesign"
# and the stems "migrat" / "concurren" catch their longer forms.
HEAVY_SIGNALS = (
    "design", "architecture", "state", "async", "error handling",
    "data flow", "refactor", "reconcile", "migrat", "concurren",
)
 
# Words that suggest "broad context"
WIDE_SIGNALS = ("multiple files", "across", "cross-cutting", "dependencies", "between modules")
 
 
def route(task: str) -> str:
    t = task.lower()
    heavy = any(s in t for s in HEAVY_SIGNALS)
    wide = any(s in t for s in WIDE_SIGNALS)
 
    if heavy and wide:
        return "architecture"
    if heavy:
        return "design-change"
    if wide:
        return "bulk-refactor"
    return "quick-edit"
 
 
if __name__ == "__main__":
    samples = [
        "rename the variable userId to accountId across the file",
        "redesign the error handling in the auth flow",
        "clean up imports that span multiple files",
        "redesign state to reconcile between modules",
    ]
    for s in samples:
        print(f"{route(s):16} <- {s}")

In my use, even a router this plain sorts the large majority of daily tasks correctly. You only hand-override the borderline cases where the word match is ambiguous; there's no need to chase perfection. Keeping the rules simple also means you can explain why a task ended up where it did, which is reassuring.
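To make "explain why" and "hand-override" concrete, here is one possible variant that returns the matched words along with the destination. The signal lists are copied from the router above; `explain_route`, `Decision`, and the `override` parameter are hypothetical names introduced for this sketch.

```python
from typing import NamedTuple, Optional

HEAVY_SIGNALS = (
    "design", "architecture", "state", "async", "error handling",
    "data flow", "refactor", "reconcile", "migrat", "concurren",
)
WIDE_SIGNALS = ("multiple files", "across", "cross-cutting", "dependencies", "between modules")


class Decision(NamedTuple):
    agent: str
    reason: str


def explain_route(task: str, override: Optional[str] = None) -> Decision:
    """Route a task and report which signal words drove the decision."""
    if override is not None:
        return Decision(override, "manual override")

    t = task.lower()
    heavy = [s for s in HEAVY_SIGNALS if s in t]
    wide = [s for s in WIDE_SIGNALS if s in t]

    if heavy and wide:
        agent = "architecture"
    elif heavy:
        agent = "design-change"
    elif wide:
        agent = "bulk-refactor"
    else:
        agent = "quick-edit"
    return Decision(agent, f"heavy={heavy or 'none'}, wide={wide or 'none'}")


if __name__ == "__main__":
    task = "rename the variable userId to accountId across the file"
    print(explain_route(task))
    print(explain_route(task, override="quick-edit"))
```

The sample task is a useful borderline case: "across the file" matches the wide signal "across", so it lands on bulk-refactor even though it stays within one file. Both destinations run on Flash, so the cost is only extra context, and the override covers it when that matters.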

Checking that a swap didn't drop quality

When the default model changes, tasks that ran fine on the old Flash can quietly lose accuracy. When you're distracted by the speed, this regression is easy to miss.

So I keep a handful of representative tasks as fixed inputs and, when the model changes, lay the outputs side by side to see how much they moved. The comparison is deliberately light: it doesn't auto-decide pass or fail; a human eyeballs the diff and judges.

# compare.py — feed the same input to two models and lay out the diff
import difflib
import subprocess
import sys

# A few representative tasks you run often; pick to match your project
CASES = [
    "Add type annotations to this function",
    "Extract this component's props into an interface",
    "Reorganize this error handling into early returns",
]

# Number of added/removed lines above which a case is flagged for a closer look
BIG_DIFF = 40


def run(model: str, prompt: str, fixture: str) -> str:
    # Assumes calling the antigravity CLI non-interactively; adapt to your env
    result = subprocess.run(
        ["antigravity", "run", "--model", model, "--input", fixture, "--prompt", prompt],
        capture_output=True, text=True, timeout=120,
    )
    # A failed call returns empty output, which would otherwise look like a "small" diff
    if result.returncode != 0:
        raise RuntimeError(f"{model} failed on {prompt!r}: {result.stderr.strip()}")
    return result.stdout


def main(fixture: str) -> None:
    for prompt in CASES:
        a = run("gemini-3.5-flash", prompt, fixture)
        b = run("gemini-3.5-pro", prompt, fixture)
        diff = list(difflib.unified_diff(
            a.splitlines(), b.splitlines(),
            fromfile="flash", tofile="pro", lineterm="",
        ))
        # Skip the two file-header lines, then count only added/removed lines
        # (hunk headers and unchanged context lines are not changes)
        changed = sum(1 for line in diff[2:] if line.startswith(("+", "-")))
        mark = "BIG" if changed > BIG_DIFF else "small"
        print(f"[{mark}/{changed} lines] {prompt}")
        if changed > BIG_DIFF:
            print("\n".join(diff[:30]))
            print("...")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "fixtures/sample.ts")

This script does not emit a verdict. It emits only "which tasks did Flash and Pro disagree on the most." Tasks with a small diff are safe to leave on Flash; tasks with a large diff become candidates to revisit in the table and move to the Pro side.

I run it once, when I see an announcement that the default model was updated. It is not a daily job. It is an inspection tool for the moment a model changes, to confirm that my routing is still reasonable. When 3.5 Flash arrived, type-annotation tasks showed a small diff and stayed on Flash, while error-handling cleanups showed a large diff and became the reason to move them to design-change.
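If you also want to compare the same agent before and after a default change, the outputs from before the swap have to be saved somewhere first. The following is a rough sketch of that idea using only the standard library; `save_snapshot`, `compare_to_snapshot`, and the directory layout are hypothetical and not something Antigravity provides.

```python
import difflib
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def snapshot_path(root: Path, label: str, prompt: str) -> Path:
    """One file per (label, prompt); the prompt is hashed to get a safe file name."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]
    return root / label / f"{key}.txt"


def save_snapshot(root: Path, label: str, prompt: str, output: str) -> Path:
    """Store a model output so it can be compared after the next model change."""
    path = snapshot_path(root, label, prompt)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(output, encoding="utf-8")
    return path


def compare_to_snapshot(root: Path, label: str, prompt: str, new_output: str) -> Optional[float]:
    """Line-level similarity in [0, 1] against the stored output, or None if none exists."""
    path = snapshot_path(root, label, prompt)
    if not path.exists():
        return None
    old_lines = path.read_text(encoding="utf-8").splitlines()
    return difflib.SequenceMatcher(None, old_lines, new_output.splitlines()).ratio()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        prompt = "Add type annotations to this function"
        save_snapshot(root, "flash-default", prompt, "def f(x: int) -> int:\n    return x")
        score = compare_to_snapshot(root, "flash-default", prompt, "def f(x: int) -> int:\n    return x + 0")
        print(f"similarity to stored output: {score:.2f}")
```

As with the diff script, the score is a pointer to where a human should look, not a verdict.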

Where to start

Trying to assign the optimal model to every task makes the config complex and unsustainable. As an indie developer watching my own time, what I'd suggest is to pin only the "heavy decision x broad context" quadrant to Pro at first, and leave the rest on the default Flash.

That single move keeps you from dropping the tasks where rework hurts, while running the bulk of your day on the fast model. Refine the routing later, one quadrant at a time, whenever practice shows you that something got dropped.

Faster models are a change to welcome. What keeps you calm when the default model shifts is having decided for yourself where you accept the speed and where you refuse to trade quality for it.

Thank You for Reading

Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.