Faster Substring Search Changes How You Should Let Agents Explore Code

The 6/26 update made substring search noticeably faster. Rather than treating it as a comfort improvement, here is how to redesign the way agents explore code, budget context, and verify their targets, with measurements from indie development.

The 6/26 update made Antigravity's substring search visibly faster. On a monorepo of roughly 20,000 files, a search that used to make me wait a few breaths now returns almost instantly.

But what mattered to me was not the speed itself. It was what came after. While search was slow, we quietly designed around it by having agents search as little as possible. Once search is cheap, that assumption deserves to be rebuilt. This article is about turning the substring-search speedup into an actual redesign of how agents explore code, measured against my own indie development work.

What we quietly gave up when search was slow

When you let an agent work in a codebase, exploration splits into two broad styles: raw substring search (grep-style) to pin down a lead, and semantic search that surfaces candidates by conceptual closeness.

Slow search makes each grep expensive. Designs then drift toward "search rarely, load a large context up front." I used to have the agent read entire likely directories before starting. That looks efficient, but it fills the context with irrelevant files and blurs the judgment that actually matters.

Fast substring search tips the scale back. A design closer to how humans actually navigate (search narrowly first, then read only what you need) becomes practical again.

Measure the change before trusting the feel

Before talking about feel, I turn exploration cost into numbers. I use a thin wrapper that records searches, total tokens read, and round-trips per task.

#!/usr/bin/env bash
# agy-probe.sh: record per-task cost of an agent run
# usage: ./agy-probe.sh "investigate the AdMob consent flow, scoped to the consent module"
set -euo pipefail

# Fail with a usage message instead of an unbound-variable error when no task is given
TASK="${1:?usage: $0 '<task description>'}"
LOG="probe_$(date +%Y%m%d_%H%M%S).jsonl"

# Save the raw JSON event stream, one event per line
agy run --json-events --prompt "$TASK" > "$LOG"

# Summarize the log: search calls, file reads, tokens read, assistant turns
python3 - "$LOG" << 'PY'
import json, sys
searches = reads = tokens = turns = 0
with open(sys.argv[1], encoding="utf-8") as log:
    for ln in log:
        if not ln.strip():
            continue  # skip blank lines in the event stream
        ev = json.loads(ln)
        t = ev.get("type")
        if t == "tool_call" and ev.get("name") in ("search", "grep"):
            searches += 1
        elif t == "file_read":
            reads += 1
            tokens += ev.get("tokens", 0)
        elif t == "assistant_turn":
            turns += 1
print(f"searches={searches} file_reads={reads} read_tokens={tokens} turns={turns}")
PY

Running the same investigation task ten times each on pinned before/after versions produced this trend in my environment. These figures are one sample from my own monorepo.

Metric (avg of 10)	Old design (read broadly)	New design (search narrowly)
Search calls	3.2	11.4
Tokens read	~48,000	~16,500
Assistant round-trips	6.1	5.4
Edits to the wrong file	2	0

Search calls actually went up. That is not the point. The point is that tokens read dropped to about a third (from roughly 48,000 to roughly 16,500) while wrong-file edits went to zero. Because search got cheap, there was room to narrow down before reading.
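To turn ten separate probe logs into the averages in a table like the one above, something along these lines works. This is a minimal sketch, not the exact script behind the numbers: it assumes each log is a JSONL file with the same event shape the wrapper parses (`type`, `name`, `tokens`), and the function names `summarize_log`, `average_runs`, and `read_ratio` are made up for illustration.

```python
import json
from statistics import mean


def summarize_log(path):
    """Count searches, file reads, read tokens, and turns in one JSONL probe log."""
    counts = {"searches": 0, "file_reads": 0, "read_tokens": 0, "turns": 0}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            ev = json.loads(line)
            kind = ev.get("type")
            if kind == "tool_call" and ev.get("name") in ("search", "grep"):
                counts["searches"] += 1
            elif kind == "file_read":
                counts["file_reads"] += 1
                counts["read_tokens"] += ev.get("tokens", 0)
            elif kind == "assistant_turn":
                counts["turns"] += 1
    return counts


def average_runs(paths):
    """Average each metric over several probe logs of the same task."""
    rows = [summarize_log(p) for p in paths]
    return {key: mean(row[key] for row in rows) for key in rows[0]}


def read_ratio(old_tokens, new_tokens):
    """Fraction of the old read volume that the new design still reads."""
    return new_tokens / old_tokens
```

With the table's figures, `read_ratio(48000, 16500)` comes out at about 0.34, which is where "about a third" comes from.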

Put grep first and treat semantic search as the finish

My current default, given the speedup, is to lead with grep-style substring search and use semantic search to finish. The reason is simple: the entry point to code exploration is usually a concrete identifier.

Function names, constants, error messages, API paths: these are matched fastest by exact string, not by conceptual nearness. Faster substring search pays off exactly here. Semantic search earns its keep when you are chasing "a concept I can't remember where I wrote." The two are not rivals; it is a question of order.
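The ordering rule can be made mechanical with a small router that decides which search to try first. The sketch below is only an illustration of the idea: the patterns and the name `choose_search` are my own assumptions, not anything Antigravity provides, and they would need tuning for a real codebase.

```python
import re

# Hypothetical patterns for "concrete identifier" queries; tune for your codebase.
_IDENTIFIER_PATTERNS = [
    re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$"),   # function, class, or constant name
    re.compile(r"^[\w.-]*/[\w./{}:-]+$"),       # file path or API route
]


def choose_search(query: str) -> str:
    """Return "substring" for concrete identifiers, "semantic" for everything else."""
    q = query.strip()
    # Quoted text is treated as a literal, e.g. a copied error message
    if len(q) >= 2 and q[0] == q[-1] and q[0] in "\"'":
        return "substring"
    if any(p.match(q) for p in _IDENTIFIER_PATTERNS):
        return "substring"
    # Free-form descriptions of a concept go to semantic search first
    return "semantic"
```

A query such as `CONSENT_STATUS` or `/api/v1/consent` goes to substring search, while "where do we decide to show the consent dialog" falls through to semantic search.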

Writing the exploration policy down for the agent keeps it stable.

## Default exploration steps (excerpt from AGENTS.md)
1. Hit known identifiers (function/constant/error text/path) with substring search first
2. Of the hits, read only the definition and the call sites (never the whole neighborhood)
3. For a concept you can't name, list up to 5 semantic candidates, then confirm with grep
4. Read at most 20,000 tokens per task. If you'd exceed it, split the scope with search

That per-task token ceiling matters more than it looks. Left alone, an agent reads broadly for reassurance. When search is cheap, you get better results by making it search rather than read.
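If the agent runs behind your own tool layer, the ceiling can be enforced in code instead of left to the prompt. The following is a rough sketch under two assumptions of mine: the class name `ReadBudget` is hypothetical, and tokens are estimated at about four characters each rather than counted with a real tokenizer.

```python
class ReadBudget:
    """Tracks estimated tokens read in one task and refuses reads past the ceiling."""

    def __init__(self, max_tokens: int = 20_000, chars_per_token: int = 4):
        self.max_tokens = max_tokens
        self.chars_per_token = chars_per_token  # rough assumption, not a real tokenizer
        self.used = 0

    def estimate(self, text: str) -> int:
        # Ceiling division so short snippets still cost at least one token
        return -(-len(text) // self.chars_per_token)

    def try_read(self, text: str) -> bool:
        """Charge the read against the budget, or refuse it if it would overflow."""
        cost = self.estimate(text)
        if self.used + cost > self.max_tokens:
            return False  # caller should narrow the scope with another search
        self.used += cost
        return True
```

When `try_read` returns `False`, the intended reaction is step 4 of the policy: split the scope with another search rather than raise the limit.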

Build a scaffold so agents don't trust the first hit

Fast search has a trap. As hits multiply, the agent is quicker to leap at the first match. While working on AdMob consent, I once had a constant with the same name in two places, old and new, and the agent nearly edited the old one.

To prevent this, I insert a scoping step right after the search. When hits exceed expectations, I have the agent produce only the count and locations first, declare its edit target, and then proceed.

# helper to force a "what will you touch" declaration first
# fold grep hits by file and show the busiest first
# (assumes one "path:count" line per file, as with grep -c)
agy search --pattern "CONSENT_STATUS" --count-by-file \
  | sort -t: -k2,2nr \
  | head -20
# → tell the agent: "pick 1-2 edit targets from this list, state your reason, then start"

Just inserting a human-readable summary visibly reduced beelines to the wrong candidate. Precisely because search is fast, you need to separate "hitting fast and often" from "narrowing correctly" in the design.
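The same scoping step can be expressed as a small function that summarizes hits and flags when a declaration is required. This is a sketch under my own assumptions: hits arrive as `(path, line_number)` pairs, and "more files than expected" is modeled by a hypothetical `max_expected_files` threshold.

```python
from collections import Counter


def count_by_file(hits):
    """hits: iterable of (path, line_number) pairs from a substring search."""
    counts = Counter(path for path, _ in hits)
    # most_common sorts by count, highest first; ties keep first-seen order
    return counts.most_common()


def scoping_summary(hits, max_expected_files=2, limit=20):
    """Build the count-and-location summary shown to the agent before any edit."""
    ranked = count_by_file(hits)
    return {
        "total_hits": sum(n for _, n in ranked),
        "files": ranked[:limit],
        # More files than expected: require an explicit target declaration first
        "must_declare_targets": len(ranked) > max_expected_files,
    }
```

In the duplicated-constant case described above, the old and new definitions show up as separate rows in `files`, so the agent has to name which one it intends to touch.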

Wiring it into scheduled, unattended tasks

I maintain several apps and four blogs through nightly scheduled runs. In unattended operation, exploration efficiency turns directly into token cost.

When embedding the exploration policy into scheduled tasks, set an even tighter ceiling than in interactive use. With no one watching, an agent's "just in case" broad reads are what inflate cost the most.

# example exploration policy for a scheduled task
search_policy:
  substring_first: true           # known identifiers via substring search first
  max_read_tokens_per_task: 12000 # lower than interactive for unattended runs
  semantic_fallback_limit: 5      # cap semantic candidates at 5
  declare_targets_before_edit: true

After this change, nightly token consumption dropped by roughly a third. That figure is my rough impression, not a controlled measurement. It also became easier to trace "why did it read this file" in the morning logs. In unattended operation, "don't read needlessly" beats raw speed.
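One way to keep the unattended ceiling from drifting upward is to validate the policy when the scheduled task starts. The sketch below mirrors the keys from the example policy; the `SearchPolicy` class, the `for_unattended` helper, and the choice to load the policy as a plain dict are assumptions of mine, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchPolicy:
    substring_first: bool = True
    max_read_tokens_per_task: int = 20_000  # interactive default from AGENTS.md
    semantic_fallback_limit: int = 5
    declare_targets_before_edit: bool = True


INTERACTIVE = SearchPolicy()


def for_unattended(policy_dict, interactive=INTERACTIVE):
    """Build a policy for a scheduled run and check that its ceiling is tighter."""
    policy = SearchPolicy(**policy_dict)
    if policy.max_read_tokens_per_task >= interactive.max_read_tokens_per_task:
        raise ValueError("unattended ceiling should be lower than the interactive one")
    return policy
```

Failing at startup is deliberate here: a nightly job that silently inherits the interactive ceiling is exactly the "just in case" cost this section is trying to avoid.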

At what scale does this pay off

Honestly, on a repo of a few hundred files the benefit of this rework is small. You can simply read the whole thing and nothing breaks. The effect becomes clear only past a few thousand files, in the territory where "you can't read all of it."

In my case, the code for a wallpaper app or a healing app is small on its own, but shared libraries and automation scripts swell the exploration surface. There, the speed of substring search converts directly into how much reading you can cut.

For a small project, adding just the measurement wrapper and introducing the exploration policy once a token-heavy task appears is enough. Questioning your design assumptions when the tools change is, in itself, an investment in operating something for the long haul.

If you are wrestling with a codebase that has grown the same way, I hope this gives you a reason to rework your steps.
