Turning Faster Substring Search into Solid Grounding for Agents in Large Repos
Antigravity's substring search got faster. Rather than stopping at perceived speed, here is how to wire it into a search design that hands agents exactly the right context in a huge codebase, with concrete steps and pitfalls.
In a repo with tens of thousands of files, have you ever asked an agent to "fix the callers of this function" and watched it edit the wrong place entirely? Most of the time the cause is that the agent could not narrow down where to look and pulled unrelated code into context.
A recent Antigravity point release made substring search faster. It would be a waste to file that under "less waiting, nicer experience." Fast search also means you can narrow the context you assemble for the agent, cheaply, as many times as you like. Speed converts directly into grounding accuracy.
As an indie developer, I maintain a monorepo that has grown large under Dolice, and I have repeatedly hit the wall where letting the agent read the whole thing produces more off-target suggestions. Once I started using search as a context-design tool, the precision of fixes improved noticeably. Here is the thinking.
Why agents miss in large repositories
When accuracy drops in a huge codebase, the cause is less about capability and more about how context is supplied. Looking back at my own failures, nearly 80% of the off-target edits were triggered by giving too much context.
Hand over loosely related files and the agent treats them as clues, editing places that look similar but are not the target. Conversely, leave out a needed file and it fills the gap by guessing, writing code that papers over the hole. Too much or too little, accuracy falls either way. The target is a single bundle that is neither.
This is where fast search earns its keep. If you can narrow cheaply and repeatedly, you do not need to land the perfect query on the first try. You can afford the round trip of casting wide, looking at results, then narrowing.
Narrow in two passes
I never finish search in one shot; I always split it into two passes.
Pass 1: cast wide by symbol
First, search the whole repo for the function or class name itself to understand its distribution. See which directories it concentrates in and whether it has spread to unexpected places.
# Pass 1: grasp the overall distribution by symbol name
rg --no-heading -n "processPayment" \
  | awk -F: '{print $1}' | sort | uniq -c | sort -rn
Tallying hit counts per file makes "where the core lives" obvious at a glance. Files with one or two hits are periphery; files with a dozen are the center.
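The reading of the tally can be made mechanical. The following is a minimal sketch, not part of the original workflow: `classify_hits` is a hypothetical helper, and its default cutoffs (12 or more hits for "core", 2 or fewer for "periphery", "middle" otherwise) are assumptions taken from the rule of thumb above. It reads `path:line:text` lines on stdin, which is the shape `rg -n` prints when it searches multiple files.

```shell
# classify_hits [CORE_MIN] [PERIPHERY_MAX]
# Hypothetical helper: reads "path:line:text" lines on stdin and prints
# "count label path", sorted by hit count (highest first).
# The cutoffs are assumptions; tune them to your repository.
classify_hits() {
  awk -F: -v c="${1:-12}" -v p="${2:-2}" '
    { hits[$1]++ }                       # count matches per file
    END {
      for (f in hits) {
        label = "middle"
        if (hits[f] >= c) label = "core"
        else if (hits[f] <= p) label = "periphery"
        printf "%d %s %s\n", hits[f], label, f
      }
    }' | sort -k1,1nr -k3,3
}
```

A typical call would be `rg --no-heading -n "processPayment" | classify_hits`. Like the pipeline above, it splits on the first colon, so paths that contain a colon would be miscounted.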
Pass 2: narrow by context terms
Next, restrict the scope to the apparent core and search again with terms tied directly to the fix. If the goal is "change the arguments at the call site," search for the call pattern itself.
# Pass 2: extract only the call pattern inside the core directory
rg --no-heading -n "processPayment\(" src/billing/ \
  --type ts -C 2
With two passes, the candidates you hand the agent shrink from "everything" to "the relevant calls inside the core directory." It is not unusual to cut context volume by an order of magnitude.
Narrowing precision also depends on whether you have excluded irrelevant matter from the search target in advance. When build artifacts and generated files mix into search, they distort the pass-1 distribution.
I keep a search-only exclude file at the repo root.
# .ignore — kept out of search (builds, generated, large data)
node_modules/
.next/
dist/
build/
*.min.js
*.map
# generated HTML output
public/content/
# build-time artifacts
src/generated/
Excluding generated files alone makes the pass-1 tally reflect real source only. In my setup, generated files had been occupying the top of the results; once they were excluded, a different set of files rose to the top of the distribution and my guesses about where the core lives became clearly more accurate.
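One way to confirm the exclude file is doing its job is to check the list of files ripgrep would search, which `rg --files` prints without running a search. The sketch below assumes that list arrives on stdin; `leaked_paths` is a hypothetical helper name, and the directory prefixes passed to it are whatever you expect to be excluded.

```shell
# leaked_paths PREFIX...
# Hypothetical helper: reads one path per line on stdin and prints every
# path that starts with one of the given prefixes. Empty output means
# none of the supposedly excluded directories are being searched.
# Assumption: prefixes contain no spaces.
leaked_paths() {
  awk -v prefixes="$*" '
    BEGIN { n = split(prefixes, p, " ") }
    {
      path = $0
      sub(/^\.\//, "", path)             # tolerate a leading "./"
      for (i = 1; i <= n; i++)
        if (index(path, p[i]) == 1) { print $0; break }
    }'
}
```

For example, `rg --files | leaked_paths node_modules/ dist/ src/generated/` should print nothing once the exclude file is in effect. Any path it does print is still feeding the pass-1 tally.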
Bundle only the relevant files
After narrowing, assemble the context itself lightly. A small script that bundles only files above a hit threshold from the pass-1 tally is handy.
#!/usr/bin/env bash
# focus-context.sh <symbol> <min-hits>
# list only files with hits at or above the threshold as context candidates
set -euo pipefail
symbol="$1"; min="${2:-3}"
rg --no-heading -n "$symbol" \
  | awk -F: '{print $1}' | sort | uniq -c \
  | awk -v m="$min" '$1 >= m {print $2}' \
  | while read -r f; do
      echo "=== $f ==="
      # pull only around the symbol (8 lines of context)
      rg -n -C 8 "$symbol" "$f"
      echo
    done
For each file at or above the threshold, this script pulls only the 8 lines on either side of each match. Passing the surroundings of the relevant spot rather than the whole file trims context further. I recommend starting the threshold near 3 and adjusting from results: raise it if the bundle is too wide, lower it if you are missing things.
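To choose a threshold without rerunning the bundle several times, it can help to see how many files survive at each candidate value. This is a rough sketch under the same input assumption as the pass-1 tally (`path:line:text` lines on stdin); `sweep_thresholds` is a hypothetical name, not something from the script above.

```shell
# sweep_thresholds [MAX]
# Hypothetical helper: reads "path:line:text" lines on stdin and, for each
# threshold from 1 to MAX (default 5), prints "threshold files_kept",
# where files_kept is the number of files with at least that many hits.
sweep_thresholds() {
  awk -F: -v max="${1:-5}" '
    { hits[$1]++ }                       # count matches per file
    END {
      for (t = 1; t <= max; t++) {
        kept = 0
        for (f in hits) if (hits[f] >= t) kept++
        printf "%d %d\n", t, kept
      }
    }'
}
```

Running `rg --no-heading -n "processPayment" | sweep_thresholds 6` shows where the file count drops sharply. A threshold just past that drop is a reasonable first guess.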
Common pitfalls
Two traps are easy to step on with this method.
The first is skipping pass 1 and firing a narrow search immediately. If you narrow before you know the distribution, you miss the core entirely whenever it sits outside your search scope. It looks roundabout, but running the distribution pass first gets you there faster in the end.
The second is letting bundled context go stale. If you save search results to a file and feed that to the agent, later edits make the file drift from the real code. I generate bundled context on the spot, use it, and never save it, because saved context silently carries outdated code along.
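If a bundle does end up on disk anyway, a cheap guard is to compare modification times before reusing it. The sketch below is an assumption-laden fallback rather than the approach recommended here: `bundle_is_fresh` is a hypothetical helper that relies on the standard `find -newer` test, and it only catches edits that change a file's modification time under the directory you point it at.

```shell
# bundle_is_fresh BUNDLE_FILE SOURCE_DIR
# Hypothetical helper: succeeds (exit 0) only if no regular file under
# SOURCE_DIR has been modified more recently than BUNDLE_FILE.
bundle_is_fresh() {
  [ -z "$(find "$2" -type f -newer "$1" | head -n 1)" ]
}
```

A call such as `bundle_is_fresh context.txt src/billing || echo "regenerate the bundle"` (both paths illustrative) makes staleness visible instead of silent. Regenerating on the spot still avoids the problem entirely.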
Convert speed into accuracy
Faster substring search reads, on the surface, as a comfort improvement, but at its core it means the cost of narrowing has dropped. If the cost dropped, you can turn searches that used to be one-shot gambles into the round trip of casting wide and narrowing. That round trip is the foundation for handing agents exactly enough context in a large repo.
In running Dolice Labs, the shortcut to better agent accuracy was tidying the context I pass rather than waiting for a smarter model. Fast search makes that tidying almost free. Try it starting from the pass-1 distribution on your own large repository.