Integrations · 2026-06-26 · Advanced

Designing MCP Tool Output So It Doesn't Flood Your Agent's Context

When a custom MCP server returns large results in one shot, your Antigravity agent quietly degrades. Field projection, pagination, resource_link, and an output budget keep context from overflowing — shown with concrete TypeScript and measured numbers.

You connect a custom MCP server to Antigravity, it works beautifully at first, and then one day — as your repo or data grows — the agent suddenly gets sloppy. I ran into exactly this as an indie developer automating article updates across the four Dolice tech blogs: once a tool that lists articles passed 800 entries, the agent started dropping instructions partway through a run. The cause was not model degradation or a broken prompt. The tool was returning every row in a single call.

Tool results land directly in the agent's context. So the output design of your MCP server is really the design of how much "thinking room" the agent has left. Here is how to build the server side so it survives when results grow large.

Large tool results bite quietly, not loudly

The tricky part is that an overloaded context usually throws no exception. The model reads what fits and silently pushes out older instructions and earlier tool results. The symptom shows up as a flaky bug: "it ignored the rule I just gave it," or "it skipped a step in the middle."

The naive version of my tool looked like this.

// ❌ Breaks as data grows: returns everything at once
server.registerTool(
  "list_articles",
  {
    description: "Return the article list for a site",
    inputSchema: { site: z.string() },
  },
  async ({ site }) => {
    const articles = await db.allArticles(site); // 800+ rows
    return {
      content: [{ type: "text", text: JSON.stringify(articles, null, 2) }],
    };
  }
);

If each row averages 600 tokens with its excerpt and tags, 800 rows is roughly 480,000 tokens (800 × 600). That exceeds the context window of most models. Even where it fits, the agent's working headroom would drop to nearly zero the moment the result loaded.
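The arithmetic is worth writing down once, because it makes the overflow concrete. Here is a minimal sketch; the function names and the 200,000-token window are hypothetical values chosen for illustration, not measurements from my setup.

```typescript
// Back-of-envelope estimate of what one tool result costs in context.
// All inputs are illustrative assumptions.
function estimateResultTokens(rows: number, avgTokensPerRow: number): number {
  return rows * avgTokensPerRow;
}

// Working room left after the result is loaded; a negative value means overflow.
function headroomAfter(contextWindow: number, resultTokens: number): number {
  return contextWindow - resultTokens;
}

const estimatedResultTokens = estimateResultTokens(800, 600);
const assumedWindow = 200_000; // hypothetical context window size
const remaining = headroomAfter(assumedWindow, estimatedResultTokens);

console.error(
  `result: ${estimatedResultTokens} tokens, headroom: ${remaining} tokens`
);
```

With these inputs the result alone is more than twice the assumed window, before any system prompt, instructions, or earlier tool results are counted.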

First, measure how many tokens a tool result costs

Before redesigning anything, put a number on the current cost. Just logging the size of each tool result on the server already reveals which tools are heavy.

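import { encoding_for_model } from "tiktoken";

// OpenAI tokenizer used as a stand-in. Counts for other models will differ,
// but relative sizes are enough to spot the heavy tools.
const enc = encoding_for_model("gpt-4o");

// Log the token and character size of a tool result and return the token count.
function logResultCost(toolName: string, payload: string): number {
  const tokens = enc.encode(payload).length;
  // stderr only: stdout carries the protocol on the stdio transport
  console.error(`[mcp] ${toolName} -> ${tokens} tokens (${payload.length} chars)`);
  return tokens;
}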

Use console.error, not console.log: over MCP's stdio transport, standard output is reserved for the protocol, and mixing logs into it corrupts the channel. This is the first trap people hit when writing a custom server, so always send debug output to stderr.
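One way to guard against a stray console.log slipping in, from your own code or from a dependency, is to reroute it to stderr at startup. This is a sketch of that idea under my own naming; redirectConsoleLogToStderr is a hypothetical helper, not part of any MCP SDK.

```typescript
// Route console.log calls to stderr so they cannot end up on stdout,
// which the stdio transport uses for protocol messages.
// Returns a function that restores the original console.log.
function redirectConsoleLogToStderr(): () => void {
  const originalLog = console.log;
  console.log = (...args: unknown[]): void => {
    console.error(...args);
  };
  return () => {
    console.log = originalLog;
  };
}

// Intended usage (assumption): call once at startup, before connecting
// the transport.
//   const restoreLog = redirectConsoleLogToStderr();
```

It does not catch code that writes to process.stdout directly, so treat it as a safety net rather than a guarantee.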

In my setup the measurement showed list_articles alone at ~480K tokens, with the next heaviest, search, around 60K. Fixing the top two or three heavy tools is usually enough to make the agent behave reliably again.
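To find those top two or three tools without scrolling through logs, the per-call counts can be tallied and sorted. The sketch below assumes token counts are supplied by whatever measurement you already have. ToolCostTracker and the get_article tool are hypothetical names, and the figures recorded at the bottom simply reuse the numbers above plus one invented value.

```typescript
interface ToolStats {
  calls: number;
  totalTokens: number;
  maxTokens: number;
}

// Accumulates token costs per tool and reports the heaviest tools first.
class ToolCostTracker {
  private readonly stats = new Map<string, ToolStats>();

  // Record the token cost of one tool result.
  record(toolName: string, tokens: number): void {
    const s = this.stats.get(toolName) ?? { calls: 0, totalTokens: 0, maxTokens: 0 };
    s.calls += 1;
    s.totalTokens += tokens;
    s.maxTokens = Math.max(s.maxTokens, tokens);
    this.stats.set(toolName, s);
  }

  // Tools sorted by their largest single result, descending.
  heaviest(limit: number): Array<{ tool: string; calls: number; maxTokens: number; avgTokens: number }> {
    return Array.from(this.stats.entries())
      .map(([tool, s]) => ({
        tool,
        calls: s.calls,
        maxTokens: s.maxTokens,
        avgTokens: Math.round(s.totalTokens / s.calls),
      }))
      .sort((a, b) => b.maxTokens - a.maxTokens)
      .slice(0, limit);
  }
}

const tracker = new ToolCostTracker();
tracker.record("list_articles", 480_000);
tracker.record("search", 60_000);
tracker.record("get_article", 1_200); // hypothetical third tool
console.error(tracker.heaviest(2));
```

Sorting by the largest single result rather than the average is a deliberate choice here: one oversized response is enough to push instructions out of context, even if the tool is usually cheap.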

What you'll learn

  • Pinpoint why a custom MCP server returning hundreds of thousands of tokens was degrading your agent, and cut each page under 2,000 tokens with pagination and continuation cursors
  • Switch your tool contract to 'return references, not contents' using resource_link and field projection, with TypeScript you can drop into your own server
  • Bake an output budget into your tool's input schema so it never overflows context even when called from Antigravity Managed Agents or the CLI, reproduced across four sites of automation
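As a rough illustration of how pagination, field projection, and an output budget fit together, here is a self-contained sketch. It is not the implementation from my server: the Article shape, the offset-based cursor format, the 50-row cap, and the chars/4 token estimate are all assumptions made for the example. In a real server this logic would sit inside the tool handler.

```typescript
interface Article { id: number; title: string; excerpt: string; tags: string[]; }
type ArticleField = keyof Article;

interface PageRequest {
  cursor?: string;         // continuation cursor from the previous page
  limit?: number;          // max rows per page
  fields?: ArticleField[]; // field projection
  maxTokens?: number;      // output budget for this page
}
interface Page { items: Array<Record<string, unknown>>; nextCursor: string | null; }

const MAX_LIMIT = 50;          // hypothetical hard cap on rows
const DEFAULT_BUDGET = 2_000;  // hypothetical per-page token budget

// Crude estimate (about 4 characters per token); an assumption, not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function encodeCursor(offset: number): string {
  return `offset:${offset}`;
}

function decodeCursor(cursor: string | undefined): number {
  if (cursor === undefined) return 0;
  const match = /^offset:(\d+)$/.exec(cursor);
  if (!match) throw new Error(`invalid cursor: ${cursor}`);
  return Number(match[1]);
}

// Keep only the requested fields, in the requested order.
function project(article: Article, fields: ArticleField[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const f of fields) out[f] = article[f];
  return out;
}

function listArticlesPage(all: Article[], req: PageRequest): Page {
  const limit = Math.min(Math.max(req.limit ?? 20, 1), MAX_LIMIT);
  const budget = req.maxTokens ?? DEFAULT_BUDGET;
  const fields: ArticleField[] = req.fields ?? ["id", "title"];
  const items: Array<Record<string, unknown>> = [];
  let used = 0;
  let next = decodeCursor(req.cursor);

  while (next < all.length && items.length < limit) {
    const article = all[next];
    if (article === undefined) break;
    const item = project(article, fields);
    const cost = estimateTokens(JSON.stringify(item));
    // Always return at least one row so the caller can make progress.
    if (items.length > 0 && used + cost > budget) break;
    items.push(item);
    used += cost;
    next += 1;
  }
  return { items, nextCursor: next < all.length ? encodeCursor(next) : null };
}

// Sample data: 120 articles with a 2,000-character excerpt each.
const sampleArticles: Article[] = Array.from({ length: 120 }, (_, i) => ({
  id: i,
  title: `Article ${i}`,
  excerpt: "x".repeat(2000),
  tags: ["mcp"],
}));

const firstPage = listArticlesPage(sampleArticles, { limit: 50 });
console.error(firstPage.items.length, firstPage.nextCursor);
```

The agent keeps calling with the returned nextCursor until it is null. Because the budget is checked per row, asking for heavy fields such as excerpt simply yields fewer rows per page instead of a larger response.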