AI Tools / 2026-06-14 / Advanced

Pairing a Local LLM With Antigravity to Keep Sensitive Code Off the Cloud

Should you really let a cloud agent read code that holds your billing keys and revenue logic? For indie developers that worry is concrete. Here I pair Ollama and Gemma as a local LLM with Antigravity, routing sensitive parts to local and general parts to the cloud, with the decision rules and measurements.

Should billing logic really go to the cloud?

Antigravity's agents are convenient, but when I was about to let them read the whole codebase, I hesitated. That codebase holds the Stripe billing flow, in-app purchase receipt verification, and server logic tied directly to revenue. Is it fine to hand all of that to a cloud agent wholesale? As an indie developer shipping paid apps to the App Store and Google Play at Dolice, this wasn't abstract security theory for me; it was a daily, practical call.

Yet processing everything locally isn't realistic either. A local LLM has the comfort of staying on your own machine, but for large design decisions and codebase-wide investigation, the cloud agent is overwhelmingly faster and deeper. So it isn't a binary; you need a design that routes sensitive parts to local and general parts to the cloud. Here I build that routing assuming Ollama and Gemma run alongside Antigravity.

First, set up Ollama + Gemma locally

The local side runs models through Ollama. The Gemma family balances code comprehension and summarization well, and runs at practical speed even on a personal machine.

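# prepare local models with Ollama
# note: tag names differ by Gemma generation; if a pull fails, check the Ollama
# library for the matching size (27b is published under newer tags such as gemma2:27b)
ollama pull gemma:7b          # speed-first; for everyday code reading
ollama pull gemma:27b         # accuracy-first; when you want design review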
 
# confirm it's listening as a local API
curl http://localhost:11434/api/generate -d '{
  "model": "gemma:7b",
  "prompt": "Summarize the role of the following function in one line",
  "stream": false
}'

Splitting 7b and 27b by purpose is practical. Everyday code understanding is plenty fast on 7b, and you bump up to 27b only when accuracy matters, as in design judgment or review. Given the memory limits of a personal machine, I recommend making 7b the default at first.
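For reference, the same endpoint can be called from Python with the function source embedded in the prompt. This is a minimal sketch, assuming Ollama is listening on its default port 11434 and the model is already pulled. The helper names build_payload and summarize_locally are hypothetical, not part of Ollama or Antigravity.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(source, model="gemma:7b"):
    """Build a non-streaming /api/generate request body for one function."""
    prompt = "Summarize the role of the following function in one line:\n\n" + source
    return {"model": model, "prompt": prompt, "stream": False}

def summarize_locally(source, model="gemma:7b", timeout=120):
    """Send the source to the local model and return its text reply."""
    body = json.dumps(build_payload(source, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # with "stream": false the reply is a single JSON object
        return json.loads(resp.read())["response"]

# usage (needs a running Ollama server):
# print(summarize_locally("def to_yen(n):\n    return round(n)"))
```

Switching to the larger model for review work is then a matter of passing a different model argument.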

Score file sensitivity automatically

The core of routing is mechanically deciding "should this file be processed locally, or is it fine to send to the cloud?" Leave it to per-moment human judgment and it will leak on a busy day. I derive a sensitivity score from the file path and in-content signals.

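# sensitivity_score.py: score a file's sensitivity and pick a destination
import re

# content signals that suggest secrets or revenue logic (matched case-insensitively)
# note: r"\.env" is matched against content, so it also fires on process.env reads
SECRET_PATTERNS = [
    r"sk_live_", r"STRIPE_SECRET", r"priceId", r"webhook.*secret",
    r"receipt.*verif", r"private[_-]?key", r"\.env",
]
# path fragments that mark sensitive areas of the repo
SENSITIVE_PATH = ["/billing/", "/payments/", "/auth/", "/secrets/", "config/pricing"]
LOCAL_THRESHOLD = 3  # a score at or above this pins the file to local

def score(path, content):
    """Return a sensitivity score: +3 per path hit, +4 per content-pattern hit."""
    s = 0
    for p in SENSITIVE_PATH:
        if p in path:
            s += 3
    for pat in SECRET_PATTERNS:
        if re.search(pat, content, re.IGNORECASE):
            s += 4
    return s

def route(path, content):
    """Return "local" at or above the threshold, otherwise "cloud-ok"."""
    # calibrate LOCAL_THRESHOLD to your own risk tolerance
    return "local" if score(path, content) >= LOCAL_THRESHOLD else "cloud-ok"

print(route("src/config/pricing.ts", "export const priceId = 'price_x'"))  # local
print(route("src/utils/format.ts", "export const toYen = n => n"))          # cloud-ok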

Don't chase perfection on the threshold from day one. Set it conservatively first (anything remotely suspicious goes local), then relax it as you operate, which with this scoring means raising the threshold. The failure you don't want in production is "sending something to the cloud without realizing it was sensitive," so you tolerate erring toward local. It's a deny-by-default mindset.
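One way to make that calibration concrete is to sweep candidate thresholds over a small hand-labeled sample and count misses on each side. The following is a sketch under assumptions: the sample data is hypothetical, and sweep and loosest_safe_threshold are illustrative helpers rather than part of the scoring script.

```python
def sweep(samples, thresholds):
    """For each threshold, count leaks and over-pinned files.

    samples: list of (score, is_sensitive) pairs labeled by hand.
    A file goes local when score >= threshold.
    """
    rows = []
    for t in thresholds:
        # sensitive files that would fall below the threshold and reach the cloud
        leaks = sum(1 for s, sensitive in samples if sensitive and s < t)
        # harmless files that would be pinned to local anyway
        overpinned = sum(1 for s, sensitive in samples if not sensitive and s >= t)
        rows.append((t, leaks, overpinned))
    return rows

def loosest_safe_threshold(samples, thresholds):
    """Highest threshold that sends no labeled-sensitive file to the cloud."""
    safe = [t for t, leaks, _ in sweep(samples, thresholds) if leaks == 0]
    return max(safe) if safe else None

# hypothetical hand-labeled sample: (score, is_sensitive)
SAMPLES = [(7, True), (4, True), (3, True), (3, False), (0, False), (4, False)]

for row in sweep(SAMPLES, [1, 3, 4, 7]):
    print(row)  # (threshold, leaks, overpinned)
print("loosest safe threshold:", loosest_safe_threshold(SAMPLES, [1, 3, 4, 7]))  # 3
```

In this sample, raising the threshold from 3 to 4 removes one over-pinned file but lets one sensitive file through, which is exactly the trade that deny-by-default refuses.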

Measure the local-vs-cloud feel ahead of time

If you're going to design routing, you should know how long pinning to local makes you wait. Route to local without measuring the difference and you reach the worst ending: "safe but too slow, so I stopped using it."

On my own setup, the latency ballpark for the same "summarize a single function" looked like this.

Latency ballpark (single-file summary)

Gemma 7b (local): a few seconds. No friction for everyday use.
Gemma 27b (local): roughly a dozen seconds. Acceptable for review work.
Cloud agent: feels fast, but varies with network and queueing.
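Ballparks like these can be gathered with a small timing harness. This is a minimal sketch: measure is a hypothetical helper, and fake_summarize is a stand-in workload that you would swap for a real call to the local model or the cloud agent.

```python
import statistics
import time

def measure(fn, runs=5):
    """Call fn() `runs` times and return (median, worst) wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), max(durations)

def fake_summarize():
    # stand-in workload; replace with a real request to the model under test
    time.sleep(0.01)

median_s, worst_s = measure(fake_summarize, runs=3)
print(f"median {median_s:.3f}s, worst {worst_s:.3f}s")
```

Reporting the worst run next to the median matters for the cloud side, where network and queueing variance is the part you actually feel.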

The important point: for a "summary" of a sensitive file or an explanation of a single function, local 7b is plenty practical. Where local doesn't fit is codebase-wide investigation and multi-file refactors. I settled on leaving those to the cloud, but only after excluding sensitive files in advance.
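Excluding sensitive files before a cloud task can be sketched as a simple partition step. The names below are assumptions for illustration: split_for_cloud is a hypothetical helper, and demo_classifier is a deliberately tiny stand-in for the real sensitivity scorer.

```python
def split_for_cloud(files, is_sensitive):
    """Split {path: content} into (cloud_batch, local_only) using a classifier."""
    cloud_batch, local_only = {}, []
    for path, content in files.items():
        if is_sensitive(path, content):
            local_only.append(path)       # never leaves the machine
        else:
            cloud_batch[path] = content   # safe to include in the cloud task
    return cloud_batch, sorted(local_only)

def demo_classifier(path, content):
    # hypothetical stand-in for the sensitivity scorer
    return "/billing/" in path or "sk_live_" in content

# hypothetical file set
FILES = {
    "src/billing/charge.ts": "export const charge = () => {}",
    "src/utils/format.ts": "export const toYen = n => n",
    "src/server/keys.ts": "const key = 'sk_live_xxx'",
}

cloud_batch, local_only = split_for_cloud(FILES, demo_classifier)
print(sorted(cloud_batch))  # ['src/utils/format.ts']
print(local_only)           # ['src/billing/charge.ts', 'src/server/keys.ts']
```

Only the contents of cloud_batch would be handed to the cloud agent; the paths in local_only stay with the local model.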

Verify afterward that nothing sensitive left

A design only matters if you can audit whether it actually worked. I always log "which file went where," and periodically confirm that no file judged sensitive ever appears in the cloud-side records.

# routing_audit.py: record routing in JST so leaks become verifiable
import datetime
import json
import pathlib

AUDIT = pathlib.Path.home() / "routing_audit.jsonl"
JST = datetime.timezone(datetime.timedelta(hours=9))

def log_route(path, destination, sensitivity):
    """Append one routing decision to the audit log as a JSON line."""
    rec = {
        "ts": datetime.datetime.now(JST).isoformat(),
        "path": path, "dest": destination, "score": sensitivity,
    }
    with AUDIT.open("a", encoding="utf-8") as f:
        f.write(json.dumps(rec) + "\n")

def verify_no_leak():
    """Return paths of sensitive files (score >= 3) that were sent to the cloud."""
    if not AUDIT.exists():
        return []  # nothing has been logged yet
    leaks = []
    for line in AUDIT.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        r = json.loads(line)
        if r["score"] >= 3 and r["dest"].startswith("cloud"):
            leaks.append(r["path"])
    return leaks

print("leaks:", verify_no_leak())  # should be empty

If verify_no_leak() ever returns a non-empty list, there is a hole in the scoring threshold or the path patterns. Add a pattern that catches that file so the same leak cannot recur. One gotcha: write every audit timestamp in one fixed timezone (JST here), or verification across a day boundary will miss entries. In production, this audit log is the basis for asserting "it's safe."
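The day-boundary gotcha is easier to see with a check scoped to a single date. The sketch below assumes the same JSON-lines record shape as the audit log; leaks_on is a hypothetical helper and the records are invented for illustration. It converts every timestamp to JST before comparing dates, so an entry written in another zone still lands on the right day.

```python
import datetime
import json

JST = datetime.timezone(datetime.timedelta(hours=9))

def leaks_on(day, lines, threshold=3):
    """Return paths of sensitive files sent to the cloud on the given JST date."""
    leaks = []
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        # normalize to JST so the date comparison uses one calendar
        ts = datetime.datetime.fromisoformat(rec["ts"]).astimezone(JST)
        if (ts.date() == day and rec["score"] >= threshold
                and rec["dest"].startswith("cloud")):
            leaks.append(rec["path"])
    return leaks

# hypothetical records: the second was written in UTC but falls on June 15 in JST
LINES = [
    '{"ts": "2026-06-14T23:50:00+09:00", "path": "src/utils/format.ts", "dest": "cloud-ok", "score": 0}',
    '{"ts": "2026-06-14T15:30:00+00:00", "path": "src/billing/charge.ts", "dest": "cloud-ok", "score": 7}',
]

print(leaks_on(datetime.date(2026, 6, 15), LINES))  # ['src/billing/charge.ts']
print(leaks_on(datetime.date(2026, 6, 14), LINES))  # []
```

Comparing the raw date prefix of the second record would have filed the leak under June 14, which is how a daily check ends up missing it.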

Drop the binary, and you move fast with peace of mind

Local-LLM discussions tend to collapse into the extreme of "the cloud is dangerous, so everything goes local," but what actually works is routing design. Score sensitivity per file, pin only the risky ones to local 7b/27b, and enjoy the cloud agent's speed for the rest — then keep the destination verifiable with an audit log.

Since adopting this design, my hand no longer freezes in front of billing logic. Because the machine draws the safe line, the human gets to confidently pick the fast path. If you've been hesitant to hand sensitive code to a cloud agent, I hope this helps.

Thank You for Reading

Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.