
Integrations · 2026-06-26 · Advanced

Designing MCP Tool Output So It Doesn't Flood Your Agent's Context

When a custom MCP server returns large results in one shot, your Antigravity agent quietly degrades. Field projection, pagination, resource_link, and an output budget keep context from overflowing — shown with concrete TypeScript and measured numbers.

MCP · antigravity · agents · context · TypeScript

✦ Premium Article

You connect a custom MCP server to Antigravity, it works beautifully at first, and then one day — as your repo or data grows — the agent suddenly gets sloppy. I ran into exactly this as an indie developer automating article updates across the four Dolice tech blogs: once a tool that lists articles passed 800 entries, the agent started dropping instructions partway through a run. The cause was not model degradation or a broken prompt. The tool was returning every row in a single call.

Tool results land directly in the agent's context. So the output design of your MCP server is really the design of how much "thinking room" the agent has left. Here is how to build the server side so it survives when results grow large.

Large tool results bite quietly, not loudly

The tricky part is that an overloaded context usually throws no exception. The model reads what fits and silently pushes out older instructions and earlier tool results. The symptom shows up as a flaky bug: "it ignored the rule I just gave it," or "it skipped a step in the middle."

The naive version of my tool looked like this.

// ❌ Breaks as data grows: returns everything at once
server.registerTool(
  "list_articles",
  {
    description: "Return the article list for a site",
    inputSchema: { site: z.string() },
  },
  async ({ site }) => {
    const articles = await db.allArticles(site); // 800+ rows
    return {
      content: [{ type: "text", text: JSON.stringify(articles, null, 2) }],
    };
  }
);

If each row averages 600 tokens with its excerpt and tags, 800 rows is roughly 480,000 tokens. That exceeds many models' context windows outright, and even where it technically fits, it leaves the agent with little or no working headroom the moment it loads.

First, measure how many tokens a tool result costs

Before redesigning anything, put a number on the current cost. Just logging the size of each tool result on the server already reveals which tools are heavy.

import { encoding_for_model } from "tiktoken";
 
const enc = encoding_for_model("gpt-4o"); // approximation: an OpenAI tokenizer, not necessarily your agent model's, but good enough to rank tools
 
function logResultCost(toolName: string, payload: string) {
  const tokens = enc.encode(payload).length;
  console.error(`[mcp] ${toolName} -> ${tokens} tokens (${payload.length} chars)`);
  return tokens;
}

Use console.error, not console.log: over MCP's stdio transport, standard output is reserved for the protocol, and mixing logs into it corrupts the channel. This is the first trap people hit when writing a custom server, so always send debug output to stderr.

In my setup the measurement showed list_articles alone at ~480K tokens, with the next heaviest, search, at around 60K. Fixing the top two or three heavy tools is usually enough to restore the agent's normal behavior.
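If you would rather not add a tokenizer dependency while hunting for heavy tools, a wrapper like the following is one way to log every result in one place. It is a minimal sketch under assumptions: `withCostLogging` and `heaviestTools` are hypothetical helper names, the 4-characters-per-token ratio is a rough heuristic rather than any model's real tokenizer, and only `text` parts are counted. The two fake tools at the bottom exist only to show the ranking.

```typescript
// Rough heuristic (assumption): about 4 characters per token for English text and JSON.
// Swap in a real tokenizer (e.g. tiktoken, as above) when you need closer numbers.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Largest single result seen per tool name, so the heaviest tools stand out.
const maxCost = new Map<string, number>();

// Wraps any handler whose result has an MCP-style `content` array and logs its size.
// Rest parameters forward whatever arguments the caller supplies, unchanged.
function withCostLogging<P extends unknown[], R extends { content: Array<{ type: string }> }>(
  toolName: string,
  handler: (...params: P) => Promise<R>,
): (...params: P) => Promise<R> {
  return async (...params: P): Promise<R> => {
    const result = await handler(...params);
    // Only text parts are measured; resource_link and other parts are skipped.
    const text = result.content
      .map((c) => (c.type === "text" ? (c as { type: string; text?: string }).text ?? "" : ""))
      .join("");
    const tokens = estimateTokens(text);
    maxCost.set(toolName, Math.max(maxCost.get(toolName) ?? 0, tokens));
    // stderr, never stdout: stdout carries the MCP stdio protocol.
    console.error(`[mcp] ${toolName} -> ~${tokens} tokens (${text.length} chars)`);
    return result;
  };
}

// The n tools with the largest single result seen so far, heaviest first.
function heaviestTools(n: number): Array<[string, number]> {
  return Array.from(maxCost.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}

// Demo with two fake tools of very different sizes.
const fakeList = withCostLogging("list_articles", async (_args: { site: string }) => ({
  content: [{ type: "text", text: "x".repeat(40000) }],
}));
const fakeSearch = withCostLogging("search", async (_args: { q: string }) => ({
  content: [{ type: "text", text: "y".repeat(4000) }],
}));
const demo = Promise.all([fakeList({ site: "blog-a" }), fakeSearch({ q: "mcp" })]).then(() =>
  heaviestTools(2),
);
```

In a real server you would pass `withCostLogging("list_articles", handler)` where you currently pass `handler` to `server.registerTool`; depending on your SDK version you may need to adjust the result type so it matches the SDK's own tool-result type.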

WHAT YOU'LL LEARN

✦Pinpoint why a custom MCP server returning hundreds of thousands of tokens was degrading your agent, and cut each page under 2,000 tokens with pagination and continuation cursors

✦Switch your tool contract to 'return references, not contents' using resource_link and field projection, with TypeScript you can drop into your own server

✦Bake an output budget into your tool's input schema so it never overflows context, even when called from Antigravity Managed Agents or the CLI, as reproduced in automation across four sites


Principle 1: Stop returning everything (field projection)

What an agent needs from a listing is usually the minimum required to decide which item to act on — not full bodies and metadata. So let the caller pick which fields it wants.

const FIELDS = ["slug", "title", "updatedAt", "premium"] as const;
 
server.registerTool(
  "list_articles",
  {
    description: "Return a summary list; fields narrows the columns returned",
    inputSchema: {
      site: z.string(),
      fields: z.array(z.enum(FIELDS)).default(["slug", "title"]),
    },
  },
  async ({ site, fields }) => {
    const rows = await db.allArticles(site);
    const projected = rows.map((r) =>
      Object.fromEntries(fields.map((f) => [f, r[f]]))
    );
    return { content: [{ type: "text", text: JSON.stringify(projected) }] };
  }
);

This alone drops a row from 600 tokens to about 30. But the total still grows linearly with the number of rows, so a large table stays expensive. The next move is pagination.
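The arithmetic behind that point is worth spelling out. The sketch below uses the article's own per-row estimates (600 tokens full, about 30 projected) and adds `project`, a hypothetical generic version of the `Object.fromEntries` projection above whose return type carries only the requested keys. The `Article` interface is an assumed shape, not the author's actual schema.

```typescript
// Assumed row shape, mirroring the fields mentioned in this article.
interface Article {
  slug: string;
  title: string;
  updatedAt: string;
  premium: boolean;
  excerpt: string;
  tags: string[];
}

// Typed projection: each output row contains only the keys listed in `fields`.
function project<T, K extends keyof T>(rows: T[], fields: readonly K[]): Pick<T, K>[] {
  return rows.map((row) => {
    const out = {} as Pick<T, K>;
    for (const f of fields) out[f] = row[f];
    return out;
  });
}

// Cost model using the per-row estimates from the text.
const FULL_ROW_TOKENS = 600;     // slug, title, excerpt, tags, metadata
const PROJECTED_ROW_TOKENS = 30; // slug + title only
const ROWS = 800;
const PAGE = 20;

const fullCost = ROWS * FULL_ROW_TOKENS;           // 480,000: grows with ROWS
const projectedCost = ROWS * PROJECTED_ROW_TOKENS; // 24,000: smaller, still grows with ROWS
const pagedCost = PAGE * PROJECTED_ROW_TOKENS;     // 600: bounded by PAGE, independent of ROWS
```

Projection divides the cost by a constant factor; only a page bound removes `ROWS` from the equation, which is why the two techniques are used together.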

Principle 2: Pagination and continuation cursors

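Always return a listing with an upper bound, and make the caller fetch the rest with a continuation token. The MCP spec itself paginates its list operations (such as tools/list and resources/list) with an opaque nextCursor, so mirroring that shape in tool results keeps one familiar convention, and in my setup it meshed cleanly with how Antigravity behaves.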

const PAGE_LIMIT = 20;
 
server.registerTool(
  "list_articles",
  {
    description: "Return articles one page at a time; pass cursor for more",
    inputSchema: {
      site: z.string(),
      cursor: z.string().optional(),
      limit: z.number().int().min(1).max(50).default(PAGE_LIMIT),
    },
  },
  async ({ site, cursor, limit }) => {
    const offset = cursor ? decodeCursor(cursor) : 0;
    const rows = await db.allArticles(site);
    const page = rows.slice(offset, offset + limit);
    const next = offset + limit < rows.length
      ? encodeCursor(offset + limit)
      : null;
 
    const summary = page.map((r) => ({ slug: r.slug, title: r.title }));
    return {
      content: [{
        type: "text",
        text: JSON.stringify({
          items: summary,
          nextCursor: next,
          total: rows.length,
        }),
      }],
    };
  }
);
 
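// base64 keeps the cursor opaque to callers; it is trivially decodable, so it is not a security boundary
const encodeCursor = (n: number) => Buffer.from(String(n)).toString("base64");
const decodeCursor = (c: string) => {
  const n = Number(Buffer.from(c, "base64").toString());
  // Reject malformed or tampered cursors instead of paging from NaN
  if (!Number.isSafeInteger(n) || n < 0) throw new Error(`Invalid cursor: ${c}`);
  return n;
};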

Returning total alongside the page matters more than it looks. The agent learns the overall count from page one, so instead of blindly paging through everything, it fetches only the range it needs.
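Here is a self-contained sketch of the same offset pagination with the cursor validated up front, so a malformed value is rejected rather than paged from, plus a caller loop that stops as soon as the wanted item turns up. The `o:<offset>` cursor format and the function names are assumptions for illustration; like base64, the format is opaque to the caller but not secret.

```typescript
interface Page<T> {
  items: T[];
  nextCursor: string | null;
  total: number;
}

// Hypothetical cursor format: "o:" followed by the offset of the next row.
const encodeOffset = (offset: number): string => `o:${offset}`;

function decodeOffset(cursor: string | undefined, total: number): number {
  if (cursor === undefined) return 0;
  const match = /^o:(\d+)$/.exec(cursor);
  const offset = match ? Number(match[1]) : NaN;
  // Reject malformed or out-of-range cursors instead of silently restarting at page one.
  if (!Number.isSafeInteger(offset) || offset > total) {
    throw new Error(`Invalid cursor: ${cursor}`);
  }
  return offset;
}

function paginate<T>(rows: T[], cursor: string | undefined, limit: number): Page<T> {
  if (!Number.isInteger(limit) || limit < 1) throw new Error("limit must be a positive integer");
  const offset = decodeOffset(cursor, rows.length);
  const items = rows.slice(offset, offset + limit);
  const end = offset + items.length;
  return { items, nextCursor: end < rows.length ? encodeOffset(end) : null, total: rows.length };
}

// How a caller uses the cursor: fetch page by page and stop once the target is found.
function findWithPaging<T>(
  rows: T[],
  limit: number,
  isTarget: (item: T) => boolean,
): { found: T | undefined; calls: number } {
  let cursor: string | undefined;
  let calls = 0;
  do {
    const page = paginate(rows, cursor, limit);
    calls++;
    const hit = page.items.find(isTarget);
    if (hit !== undefined) return { found: hit, calls };
    cursor = page.nextCursor ?? undefined;
  } while (cursor !== undefined);
  return { found: undefined, calls };
}
```

With 45 rows and a limit of 20, finding row 25 takes two calls instead of loading all 45, which is the "fetch only the range it needs" behavior described above.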

Principle 3: Return references, not contents, with resource_link

For large data like article bodies, returning a resource_link reference is far more effective than embedding the text. The agent scans the list and only later fetches the single item it actually wants to open. This is the opposite of "load everything into context at once" — it mirrors how a person scans filenames in a file browser before opening one.

async ({ site, cursor, limit }) => {
  const { page, next, total } = await fetchPage(site, cursor, limit); // Principle 2's paging logic, factored into a helper
 
  const links = page.map((r) => ({
    type: "resource_link" as const,
    uri: `article://${site}/${r.slug}`,
    name: r.title,
    description: `${r.updatedAt} ${r.premium ? "[premium]" : ""}`.trim(),
    mimeType: "text/markdown",
  }));
 
  return {
    content: [
      { type: "text", text: JSON.stringify({ nextCursor: next, total }) },
      ...links,
    ],
  };
}

The reference resolves through a separate resource handler (article://...). A 600-token body compresses to 30–40 tokens per link. Even twenty of them keep a page under 1,000 tokens.
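The resource side might look like the following minimal sketch, assuming an in-memory store and hypothetical `parseArticleUri` and `readArticle` functions. The return value follows the shape of an MCP `resources/read` result (a `contents` array whose entries carry `uri`, `mimeType`, and `text`); with the TypeScript SDK you would typically wire this to a resource template such as `article://{site}/{slug}`, but check the SDK documentation for the exact registration call.

```typescript
// Hypothetical in-memory store: site -> slug -> markdown body.
const store = new Map<string, Map<string, string>>([
  ["blog-a", new Map([["mcp-output", "# Designing MCP tool output\n\nBody goes here."]])],
]);

// Shape of an MCP resources/read result: a list of contents, each tagged with its URI.
interface ReadResult {
  contents: Array<{ uri: string; mimeType: string; text: string }>;
}

// Parse article://<site>/<slug>; anything else is rejected rather than guessed at.
function parseArticleUri(uri: string): { site: string; slug: string } | null {
  const match = /^article:\/\/([^/]+)\/([^/]+)$/.exec(uri);
  return match ? { site: decodeURIComponent(match[1]), slug: decodeURIComponent(match[2]) } : null;
}

// Resolve one link to its full body; only this call pays the body's token cost.
function readArticle(uri: string): ReadResult {
  const ref = parseArticleUri(uri);
  const body = ref ? store.get(ref.site)?.get(ref.slug) : undefined;
  if (body === undefined) throw new Error(`Unknown resource: ${uri}`);
  return { contents: [{ uri, mimeType: "text/markdown", text: body }] };
}
```

Keeping the URI format identical to the one the listing tool emits is the whole contract: the agent copies a `uri` from a link and gets back exactly one body.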

Server-side summarization is powerful but muddies the contract

It is tempting to think "if it's heavy, just summarize it." I don't make that the default. Summarization adds a model call, adds latency and cost, and ties the tool's reliability to the quality of the summary. When an agent expects "the listing tool returns facts" and you mix in interpreted prose, downstream judgment quietly drifts.

If you do add summarization, I'd only do it when all of these hold:

The source is free text that resists structuring (logs, long documents)
It lives under a separate tool name, distinct from the listing tool
It always includes the supporting resource_link so the original is reachable

In other words, summarization is a separate service invoked on explicit request — not a replacement for listing. Listing and search tools stay light through projection and pagination.
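One way to enforce the third condition in code is to make the summary tool unable to return a summary without its sources. In this sketch the summarizer is injected as a plain async function (an assumption; in practice it would wrap a model call), and `summarizeWithSources`, the `log://` URIs, and the labeling text are hypothetical.

```typescript
interface TextContent {
  type: "text";
  text: string;
}
interface ResourceLink {
  type: "resource_link";
  uri: string;
  name: string;
  mimeType?: string;
}
interface SummaryResult {
  content: Array<TextContent | ResourceLink>;
}
interface Source {
  uri: string;
  name: string;
  text: string;
}

// Injected summarizer: a model call in practice (assumption, not a real API).
type Summarizer = (input: string) => Promise<string>;

// Backing function for a separately named tool (e.g. a hypothetical "summarize_logs").
async function summarizeWithSources(sources: Source[], summarize: Summarizer): Promise<SummaryResult> {
  // No sources means no links, so refuse rather than return an unanchored summary.
  if (sources.length === 0) throw new Error("summarizeWithSources needs at least one source");
  const summary = await summarize(sources.map((s) => s.text).join("\n\n"));
  const links: ResourceLink[] = sources.map((s) => ({
    type: "resource_link" as const,
    uri: s.uri,
    name: s.name,
    mimeType: "text/plain",
  }));
  return {
    content: [
      // Mark the text as interpretation so it is not confused with listing facts.
      { type: "text", text: `Summary (model-generated; check the linked sources):\n${summary}` },
      ...links,
    ],
  };
}
```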

Bake an output budget into the tool contract

Leave these techniques as implicit rules scattered across tools, and they eventually break down. I make "a single tool result must not exceed N tokens" explicit in both the input schema and the implementation.

const OUTPUT_BUDGET = 2000; // token ceiling (a shared tool contract)
 
function enforceBudget(payload: string, toolName: string): string {
  const tokens = enc.encode(payload).length;
  if (tokens <= OUTPUT_BUDGET) return payload;
  // Don't silently truncate — surface the overage and prompt a smaller call
  return JSON.stringify({
    error: "OUTPUT_BUDGET_EXCEEDED",
    tool: toolName,
    tokens,
    budget: OUTPUT_BUDGET,
    hint: "Lower limit or narrow fields and call again",
  });
}

Silently truncating an over-budget result makes the agent mistake partial data for the whole. That is the worst kind of failure. Instead, return the overage as a structured error with a hint on how to call again. Antigravity's agent reads the hint, lowers limit, and retries, converging on its own without a human stepping in.
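Here is a self-contained sketch of that loop from both sides. `listWithBudget` clamps `limit` to the same kind of bound an input schema would declare and returns the overage with `isError: true`, which is how MCP tool results flag an execution error the model can see; `callUntilFits` simulates a caller that halves `limit` until the result fits. The halving strategy, the names, and the 4-characters-per-token estimate are assumptions, not a description of how Antigravity's agent actually retries.

```typescript
// Token estimate (assumption): ~4 characters per token, standing in for a tiktoken count.
const estimateTokens = (s: string): number => Math.ceil(s.length / 4);
const OUTPUT_BUDGET_TOKENS = 2000; // same ceiling as OUTPUT_BUDGET above
const MAX_LIMIT = 50;              // mirrors a schema bound like z.number().int().max(50)

interface TextResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}

function listWithBudget(rows: string[], limit: number): TextResult {
  // Clamp defensively even though the schema should already reject bad values.
  const n = Math.min(Math.max(1, Math.floor(limit)), MAX_LIMIT);
  const text = JSON.stringify({ items: rows.slice(0, n), total: rows.length });
  const tokens = estimateTokens(text);
  if (tokens <= OUTPUT_BUDGET_TOKENS) return { content: [{ type: "text", text }] };
  return {
    isError: true, // a tool-level error the model sees and can correct, not a protocol error
    content: [{
      type: "text",
      text: JSON.stringify({
        error: "OUTPUT_BUDGET_EXCEEDED",
        tokens,
        budget: OUTPUT_BUDGET_TOKENS,
        limit: n,
        hint: "Lower limit or narrow fields and call again",
      }),
    }],
  };
}

// Simulates a well-behaved caller: on a budget error, halve limit and retry.
function callUntilFits(rows: string[], startLimit: number): { limit: number; attempts: number } {
  let limit = Math.min(Math.max(1, Math.floor(startLimit)), MAX_LIMIT);
  for (let attempts = 1; ; attempts++) {
    if (!listWithBudget(rows, limit).isError) return { limit, attempts };
    if (limit <= 1) throw new Error("A single row already exceeds the budget");
    limit = Math.floor(limit / 2);
  }
}
```

With 100 rows of 500 characters each, this estimate rejects limits of 50 and 25 and converges at 12 on the third attempt; the final guard covers the case where no smaller call can ever fit.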

Measured: applying this to a four-site automation tool

Here are before/after numbers from my own article-update automation after adding field projection, pagination, and an output budget to list_articles, with a page cap of 20.

Metric	Before	After
One `list_articles` result	~480K tokens	~1,400 tokens
Working headroom after listing	~zero	over 90% of the window
"Drops an earlier instruction" frequency	~1 in 3 runs	not reproduced
Round trips to open one article	1 (fetch all)	2–3 (list then resolve)

Round trips go up, but each trip got lighter, so total latency actually dropped. More importantly, the instruction-dropping bug disappeared, which is what finally let me hand overnight batch runs to the agent with confidence.

Offset cursors break when data shifts between pages

For simplicity I used an offset-based cursor above, but one caveat shows up in long-running production. If an article is added or removed between fetching page one and page two, the offset shifts and you get duplicates or gaps. In my own automation, the moment a separate overnight task added an article, the agent nearly processed the same one twice.

If updates are frequent enough to cause real harm, switch from offset to a keyset cursor that stores "the key of the last row seen."

// Build the cursor from a stable key (updatedAt + slug)
const decodeKey = (c?: string) =>
  c ? JSON.parse(Buffer.from(c, "base64").toString()) : null;
 
async function fetchPageStable(site: string, cursor: string | undefined, limit: number) {
  const after = decodeKey(cursor); // { updatedAt, slug } or null
  const rows = await db.articlesAfter(site, after, limit + 1); // fetch one extra; must sort by (updatedAt, slug) in a fixed direction
  const hasMore = rows.length > limit;
  const page = rows.slice(0, limit);
  const last = page.at(-1);
  const next = hasMore && last
    ? Buffer.from(JSON.stringify({ updatedAt: last.updatedAt, slug: last.slug })).toString("base64")
    : null;
  return { page, next };
}

The key points: make the sort key unique (updatedAt alone drops rows that share a timestamp, so pair it with slug), and fetch limit + 1 rows to decide "is there more" in a single query. When listing consistency breaks, the agent quietly repeats work or misses an article that should exist. Pagination correctness, not just context size, is part of a tool's reliability.
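The difference is easy to reproduce in memory. The sketch below sorts newest first with `slug` as a tie-breaker, uses a hypothetical in-memory `articlesAfter` in place of the `db.articlesAfter` call above, and lets a new article arrive between page one and page two. Offset paging hands back `d` a second time; keyset paging continues where it left off, and the new article is simply picked up on the next run. The sort direction and data are assumptions chosen to mirror the overnight-task scenario.

```typescript
// ISO-8601 UTC timestamps in one fixed format compare correctly as strings.
interface Row {
  slug: string;
  updatedAt: string;
}

// Newest first; slug breaks ties so no two rows share a position.
function compareDesc(a: Row, b: Row): number {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt < b.updatedAt ? 1 : -1;
  if (a.slug !== b.slug) return a.slug < b.slug ? 1 : -1;
  return 0;
}

// In-memory stand-in for db.articlesAfter: up to n rows strictly after `after`.
function articlesAfter(rows: Row[], after: Row | null, n: number): Row[] {
  const sorted = rows.slice().sort(compareDesc);
  const remaining = after === null ? sorted : sorted.filter((r) => compareDesc(r, after) > 0);
  return remaining.slice(0, n);
}

function keysetPage(rows: Row[], after: Row | null, limit: number): { page: Row[]; next: Row | null } {
  const fetched = articlesAfter(rows, after, limit + 1); // one extra row answers "is there more?"
  const page = fetched.slice(0, limit);
  const last = page[page.length - 1];
  const next = fetched.length > limit && last ? { updatedAt: last.updatedAt, slug: last.slug } : null;
  return { page, next };
}

// Offset paging over the same ordering, for comparison.
function offsetPage(rows: Row[], offset: number, limit: number): Row[] {
  return rows.slice().sort(compareDesc).slice(offset, offset + limit);
}
```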

Where to start

You don't need to rebuild every tool at once. Start by adding one line of console.error token logging to your running server to identify the heaviest tools. The culprit is usually one or two listing or search tools. Add field projection and bounded pagination there, replace heavy payloads such as bodies with resource_link, and most of the agent's reliability comes back.

If you want to go deeper into tool boundaries and permission scoping, the custom MCP server production guide and least-privilege allowlist design for MCP tools are the next footholds. For the broader cost picture, prompt cache and context cost strategy for agents helps you decide where to set the output budget.

Thanks for reading. I hope this gives the first nudge to anyone watching their agent go sluggish behind a custom tool.

Thank You for Reading

Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.