Designing MCP Tool Output So It Doesn't Flood Your Agent's Context
When a custom MCP server returns large results in one shot, your Antigravity agent quietly degrades. Field projection, pagination, resource_link, and an output budget keep context from overflowing — shown with concrete TypeScript and measured numbers.
You connect a custom MCP server to Antigravity, it works beautifully at first, and then one day — as your repo or data grows — the agent suddenly gets sloppy. I ran into exactly this as an indie developer automating article updates across the four Dolice tech blogs: once a tool that lists articles passed 800 entries, the agent started dropping instructions partway through a run. The cause was not model degradation or a broken prompt. The tool was returning every row in a single call.
Tool results land directly in the agent's context. So the output design of your MCP server is really the design of how much "thinking room" the agent has left. Here is how to build the server side so it survives when results grow large.
Large tool results bite quietly, not loudly
The tricky part is that an overloaded context usually throws no exception. The model reads what fits and silently pushes out older instructions and earlier tool results. The symptom shows up as a flaky bug: "it ignored the rule I just gave it," or "it skipped a step in the middle."
The naive version of my tool looked like this.
// ❌ Breaks as data grows: returns everything at once
server.registerTool(
  "list_articles",
  {
    description: "Return the article list for a site",
    inputSchema: { site: z.string() },
  },
  async ({ site }) => {
    const articles = await db.allArticles(site); // 800+ rows
    return {
      content: [{ type: "text", text: JSON.stringify(articles, null, 2) }],
    };
  }
);
If each row averages 600 tokens with its excerpt and tags, 800 rows come to roughly 480,000 tokens (600 × 800). That exceeds the context window of most models, and even where it fits, the agent's working headroom drops to nearly zero the moment the result loads.
First, measure how many tokens a tool result costs
Before redesigning anything, put a number on the current cost. Just logging the size of each tool result on the server already reveals which tools are heavy.
Use console.error, not console.log: over MCP's stdio transport, standard output is reserved for the protocol, and mixing logs into it corrupts the channel. This is the first trap people hit when writing a custom server, so always send debug output to stderr.
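A minimal sketch of this kind of logging might look like the following. The names estimateTokens and withTokenLogging are hypothetical, and the 4-characters-per-token ratio is only a rough assumption; swap in the tokenizer that matches your model for real measurements.

```typescript
// Rough estimate only: ~4 characters per token is an assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

type ToolResult = { content: { type: "text"; text: string }[] };

// Wraps a tool handler and logs the approximate size of every result.
function withTokenLogging<A>(
  toolName: string,
  handler: (args: A) => Promise<ToolResult>,
): (args: A) => Promise<ToolResult> {
  return async (args: A) => {
    const result = await handler(args);
    const tokens = result.content.reduce(
      (sum, item) => sum + estimateTokens(item.text),
      0,
    );
    // stderr only: stdout is reserved for the MCP stdio transport
    console.error(`[tool-size] ${toolName}: ~${tokens} tokens`);
    return result; // the result itself is passed through unchanged
  };
}
```

Wrapping each handler once at registration time is enough to get a per-tool size line for every call, without touching the tool logic.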
In my setup the measurement showed list_articles alone at ~480K tokens, with the next heaviest, search, around 60K. Fixing the top two or three heavy tools is usually enough to get the agent behaving reliably again.
Principle 1: Field projection
What an agent needs from a listing is usually the minimum required to decide which item to act on, not full bodies and metadata. So let the caller pick which fields it wants.
This alone drops a row from 600 tokens to about 30. But if the row count itself is large, projection still scales linearly. The next move is pagination.
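As a sketch of what field projection can look like, assuming a hypothetical Article shape and a default of slug plus title (neither is taken from the original server):

```typescript
// Hypothetical row shape for illustration.
type Article = {
  slug: string;
  title: string;
  excerpt: string;
  tags: string[];
  updatedAt: string;
};

type ArticleField = keyof Article;

// Assumed default: just enough to decide which article to act on.
const DEFAULT_FIELDS: ArticleField[] = ["slug", "title"];

// Keep only the requested fields from each row.
function project(
  rows: Article[],
  fields: ArticleField[] = DEFAULT_FIELDS,
): Record<string, unknown>[] {
  return rows.map((row) => {
    const out: Record<string, unknown> = {};
    for (const field of fields) {
      out[field] = row[field];
    }
    return out;
  });
}
```

In the tool's input schema, fields would be an optional array restricted to these names. Serializing the projected rows with plain JSON.stringify(rows), rather than the pretty-printed form used in the naive version, also avoids spending tokens on indentation.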
Principle 2: Pagination and continuation cursors
Always return a listing with an upper bound, and make the caller fetch the rest with a continuation token. The MCP spec itself uses a nextCursor pattern for pagination, and aligning with it meshes cleanly with how Antigravity behaves.
Returning total alongside the page matters more than it looks. The agent learns the overall count from page one, so instead of blindly paging through everything, it fetches only the range it needs.
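A minimal sketch of a bounded page with an offset cursor and a total, operating on an in-memory array for illustration. The helper names, the Page shape, and the base64-encoded JSON cursor format are assumptions; a real server would push the offset and limit into its query.

```typescript
const MAX_LIMIT = 20; // page cap

type Page<T> = { items: T[]; total: number; nextCursor: string | null };

// Opaque cursor wrapping a plain offset. The payload is ASCII-only JSON,
// so btoa/atob are sufficient here.
const encodeCursor = (offset: number): string =>
  btoa(JSON.stringify({ offset }));

const decodeCursor = (cursor?: string): number =>
  cursor ? (JSON.parse(atob(cursor)) as { offset: number }).offset : 0;

function paginate<T>(
  rows: T[],
  cursor: string | undefined,
  limit: number = MAX_LIMIT,
): Page<T> {
  // Clamp the requested limit to 1..MAX_LIMIT so a caller cannot ask for everything.
  const size = Math.min(Math.max(1, Math.floor(limit)), MAX_LIMIT);
  const start = decodeCursor(cursor);
  const items = rows.slice(start, start + size);
  const end = start + items.length;
  return {
    items,
    total: rows.length, // lets the caller plan how far it needs to page
    nextCursor: end < rows.length ? encodeCursor(end) : null,
  };
}
```

The caller passes nextCursor back unchanged to get the following page and stops when it is null.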
Principle 3: Return references, not contents, with resource_link
For large data like article bodies, returning a resource_link reference is far more effective than embedding the text. The agent scans the list and only later fetches the single item it actually wants to open. This is the opposite of "load everything into context at once" — it mirrors how a person scans filenames in a file browser before opening one.
The reference resolves through a separate resource handler (article://...). A 600-token body compresses to 30–40 tokens per link. Even twenty of them keep a page under 1,000 tokens.
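As a sketch, a listing that returns links instead of bodies could be built like this. The local ResourceLink type mirrors the resource_link content item in the MCP specification; the article://{site}/{slug} layout, the ArticleRef shape, and the text/markdown MIME type are assumptions for illustration, since only the article:// scheme is given above.

```typescript
// Local mirror of the MCP resource_link content item (subset of fields).
type ResourceLink = {
  type: "resource_link";
  uri: string;
  name: string;
  description?: string;
  mimeType?: string;
};

// Hypothetical minimal reference to an article.
type ArticleRef = { site: string; slug: string; title: string };

function toResourceLink(article: ArticleRef): ResourceLink {
  return {
    type: "resource_link",
    // Assumed URI layout: article://{site}/{slug}
    uri: `article://${article.site}/${encodeURIComponent(article.slug)}`,
    name: article.title,
    mimeType: "text/markdown", // assumption: bodies are stored as Markdown
  };
}

// Tool result for a listing: one link per article, no bodies.
function listAsLinks(articles: ArticleRef[]): { content: ResourceLink[] } {
  return { content: articles.map(toResourceLink) };
}
```

The body is only transferred when the agent reads one of these URIs through the resource handler.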
Server-side summarization is powerful but muddies the contract
It is tempting to think "if it's heavy, just summarize it." I don't make that the default. Summarization adds a model call, adds latency and cost, and ties the tool's reliability to the quality of the summary. When an agent expects "the listing tool returns facts" and you mix in interpreted prose, downstream judgment quietly drifts.
If you do add summarization, I'd only do it when all of these hold:
- The source is free text that resists structuring (logs, long documents)
- It lives under a separate tool name, distinct from the listing tool
- It always includes the supporting resource_link so the original is reachable
In other words, summarization is a separate service invoked on explicit request — not a replacement for listing. Listing and search tools stay light through projection and pagination.
Bake an output budget into the tool contract
Leave these techniques as implicit rules scattered across tools, and they eventually break down. I make "a single tool result must not exceed N tokens" explicit in both the input schema and the implementation.
// `enc` is assumed to be the tokenizer instance already used for the size logging (setup not shown)
const OUTPUT_BUDGET = 2000; // token ceiling (a shared tool contract)

function enforceBudget(payload: string, toolName: string): string {
  const tokens = enc.encode(payload).length;
  if (tokens <= OUTPUT_BUDGET) return payload;
  // Don't silently truncate: surface the overage and prompt a smaller call
  return JSON.stringify({
    error: "OUTPUT_BUDGET_EXCEEDED",
    tool: toolName,
    tokens,
    budget: OUTPUT_BUDGET,
    hint: "Lower limit or narrow fields and call again",
  });
}
Silently truncating an over-budget result makes the agent mistake partial data for the whole. That is the worst kind of failure. Instead, return the overage as a structured error with a hint on how to call again. Antigravity's agent reads the hint, lowers limit, and retries, converging on its own without a human stepping in.
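To make that convergence concrete, here is a small self-contained simulation. Everything in it is a stand-in: listTool imitates a listing tool with the budget check, approxTokens is a rough 4-characters-per-token assumption, and the halving policy in callUntilWithinBudget is just one plausible way a caller might react to the hint, not a description of how Antigravity actually retries.

```typescript
const BUDGET = 2000; // same ceiling as OUTPUT_BUDGET above

// Rough assumption: ~4 characters per token.
const approxTokens = (s: string): number => Math.ceil(s.length / 4);

type Row = { slug: string; title: string };

// Stand-in for the tool: returns the page, or the structured budget error.
function listTool(rows: Row[], limit: number): string {
  const payload = JSON.stringify({ items: rows.slice(0, limit), total: rows.length });
  const tokens = approxTokens(payload);
  if (tokens <= BUDGET) return payload;
  return JSON.stringify({
    error: "OUTPUT_BUDGET_EXCEEDED",
    tokens,
    budget: BUDGET,
    hint: "Lower limit or narrow fields and call again",
  });
}

// Stand-in for the caller: halve the limit until the result fits (or limit hits 1).
function callUntilWithinBudget(
  rows: Row[],
  startLimit: number,
): { limit: number; body: string } {
  let limit = startLimit;
  for (;;) {
    const body = listTool(rows, limit);
    const parsed = JSON.parse(body) as { error?: string };
    if (parsed.error !== "OUTPUT_BUDGET_EXCEEDED" || limit <= 1) {
      return { limit, body };
    }
    limit = Math.max(1, Math.floor(limit / 2));
  }
}
```

Because the error carries both tokens and budget, a smarter caller could scale limit proportionally instead of halving; the point is only that the error gives it enough information to recover without a human.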
Measured: applying this to a four-site automation tool
Here are before/after numbers from my own article-update automation after adding field projection, pagination, and an output budget to list_articles, with a page cap of 20.
| Metric | Before | After |
| --- | --- | --- |
| One list_articles result | ~480K tokens | ~1,400 tokens |
| Working headroom after listing | ~zero | over 90% of the window |
| "Drops an earlier instruction" frequency | ~1 in 3 runs | not reproduced |
| Round trips to open one article | 1 (fetch all) | 2–3 (list then resolve) |
Round trips go up, but each trip got lighter, so total latency actually dropped. More importantly, the instruction-dropping bug disappeared, which is what finally let me hand overnight batch runs to the agent with confidence.
Offset cursors break when data shifts between pages
For simplicity I used an offset-based cursor above, but one caveat shows up in long-running production. If an article is added or removed between fetching page one and page two, the offset shifts and you get duplicates or gaps. In my own automation, the moment a separate overnight task added an article, the agent nearly processed the same one twice.
If updates are frequent enough to cause real harm, switch from offset to a keyset cursor that stores "the key of the last row seen."
// Build the cursor from a stable key (updatedAt + slug)
const decodeKey = (c?: string) =>
  c ? JSON.parse(Buffer.from(c, "base64").toString()) : null;

async function fetchPageStable(site: string, cursor: string | undefined, limit: number) {
  const after = decodeKey(cursor); // { updatedAt, slug } or null
  const rows = await db.articlesAfter(site, after, limit + 1); // fetch one extra
  const hasMore = rows.length > limit;
  const page = rows.slice(0, limit);
  const last = page.at(-1);
  const next = hasMore && last
    ? Buffer.from(JSON.stringify({ updatedAt: last.updatedAt, slug: last.slug })).toString("base64")
    : null;
  return { page, next };
}
The key points: make the sort key unique (updatedAt alone drops rows that share a timestamp, so pair it with slug), and fetch limit + 1 rows to decide "is there more" in a single query. When listing consistency breaks, the agent quietly repeats work or misses an article that should exist. Pagination correctness, not just context size, is part of a tool's reliability.
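The query behind db.articlesAfter is not shown, so here is an in-memory stand-in that illustrates the comparison it needs to perform. The function takes the rows directly instead of a site name, and it assumes updatedAt is an ISO-8601 UTC string in a single consistent format, so that plain string comparison matches chronological order.

```typescript
type Key = { updatedAt: string; slug: string };
type Row = Key; // a real row would carry more fields

// Total order on (updatedAt, slug): slug breaks ties between equal timestamps.
function compareKey(a: Key, b: Key): number {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt < b.updatedAt ? -1 : 1;
  if (a.slug !== b.slug) return a.slug < b.slug ? -1 : 1;
  return 0;
}

// In-memory stand-in for db.articlesAfter:
// rows strictly after `after`, in ascending key order, at most `take` of them.
function articlesAfter(rows: Row[], after: Key | null, take: number): Row[] {
  return rows
    .slice() // copy so the caller's array is not reordered
    .sort(compareKey)
    .filter((row) => after === null || compareKey(row, after) > 0)
    .slice(0, take);
}
```

Because the filter is "strictly greater than the last key seen" rather than "skip N rows", a row inserted or deleted earlier in the order does not shift what the next page returns. In SQL the same idea is usually written as a comparison on the (updated_at, slug) pair with a matching ORDER BY and LIMIT.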
Where to start
You don't need to rebuild every tool at once. The first move is to add one line of console.error token logging to your running server and identify the heaviest tools. The culprit is usually one or two listing or search tools. Add field projection and bounded pagination there, replace heavy payloads like bodies with resource_link, and most of the agent's earlier reliability comes back.
Thanks for reading. I hope this gives the first nudge to anyone watching their agent go sluggish behind a custom tool.