Feeding Spec PDFs to Antigravity 2.1.4: A Practical Attachment Workflow
Field notes on using the PDF attachment added in Antigravity 2.1.4 as a spec-driven implementation workflow: supported formats, the scanned-PDF trap, token cost, and how to verify the output.
The night I noticed the payment provider only shipped its REST reference as a PDF, I was rebuilding the billing layer of an app I run as an indie developer.
The endpoint list, the request body types, the error-code table — all of it lived inside tables in a PDF, and I was retyping it line by line into the chat box. One slip in the transcription, and the agent would generate a client using a field name that does not exist.
Once Antigravity 2.1.4 let me attach the PDF directly, that retyping disappeared. But you cannot just throw any file at it and expect a clever read. After a few days, the ways of handing over a PDF that work and the ways that don't separated cleanly, so here is where I draw the line.
Which PDFs the attachment actually handles
The short version: attachment only works for PDFs that carry a text layer.
PDFs come in two broad kinds. In one, the characters are embedded as text data. In the other, a page was scanned from paper and the content is just an image. The agent reads the first kind directly. The second is, from the agent's point of view, a single picture — it cannot make out the table rules or the numbers.
A PDF exported from official API docs, an OpenAPI printout, a spec sheet from a design tool — these almost always carry a text layer. Old internal documents scanned from paper, or a procedure doc that is really pasted screenshots, are often image PDFs, and attaching them rarely gives the accuracy you hoped for.
Before attaching, you can confirm which kind it is with a single command.
# Check whether a text layer exists.
# Uses pdftotext, included in poppler-utils.
pdftotext spec.pdf - | head -c 400
# If nothing comes out, or only a scatter of symbols, it's an image PDF
# -> attaching it as-is won't be readable; you'll need OCR.
If even a few hundred characters of clean text come back, attach it as-is. If it's empty, run the OCR step below or hunt down the source data (an HTML or text version), which is usually faster.
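If you check PDFs often, the "clean text or scatter of symbols" judgment can be roughed out in code. The following is a minimal sketch, not part of Antigravity: the function names and the thresholds (200 visible characters, 60% letters or digits) are my own assumptions and will need tuning for your documents. It relies only on the `-l` (last page) option of pdftotext.

```python
import shutil
import subprocess


def looks_like_text(sample: str, min_chars: int = 200, min_ratio: float = 0.6) -> bool:
    """Heuristic: enough visible characters, most of them letters or digits.

    The thresholds are assumptions, not values from any tool's documentation.
    """
    visible = [ch for ch in sample if not ch.isspace()]
    if len(visible) < min_chars:
        return False
    readable = sum(1 for ch in visible if ch.isalnum())
    return readable / len(visible) >= min_ratio


def has_text_layer(pdf_path: str, first_pages: int = 3) -> bool:
    """Extract the first few pages with pdftotext and apply the heuristic."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install poppler-utils")
    result = subprocess.run(
        ["pdftotext", "-l", str(first_pages), pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return looks_like_text(result.stdout)
```

Calling `has_text_layer("spec.pdf")` returns False for a PDF that yields nothing or mostly symbols. Treat that as a prompt to look at the file yourself, not as a verdict.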
The basic attachment flow
The operation itself is simple. Drag the PDF onto the chat input, or pick it from the attach icon. You can attach several at once.
After attaching, always tell the agent which part of the PDF to look at. Skip this and the agent tries to take in the whole document, returning answers pulled off course by unrelated pages.
Refer to the attached payment-api.pdf.
Target only the "Create Charge" endpoint on pages 14-18,
and write the request body type as a TypeScript interface.
Do not add fields that are not written in the PDF.
That last sentence, "do not add unwritten fields," goes into nearly every prompt of mine. A type lifted from a spec becomes a defect exactly as written if it carries a field the spec doesn't, so I'd rather shut down the agent's well-meaning autocompletion.
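Because the same scoping and the same closing sentence recur in every prompt, a tiny template helper keeps them from being forgotten. This is only a sketch of my habit in code form. `build_scoped_prompt` and its parameters are hypothetical names, and the output is plain text to paste into the chat input.

```python
def build_scoped_prompt(
    pdf_name: str, target: str, first_page: int, last_page: int, task: str
) -> str:
    """Build a prompt that names the file, the page range, and the task."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("page range must satisfy 1 <= first_page <= last_page")
    if first_page == last_page:
        pages = f"page {first_page}"
    else:
        pages = f"pages {first_page}-{last_page}"
    return "\n".join(
        [
            f"Refer to the attached {pdf_name}.",
            f'Target only "{target}" on {pages},',
            f"and {task}.",
            # Always last, so the agent does not fill in unwritten fields.
            "Do not add fields that are not written in the PDF.",
        ]
    )


prompt = build_scoped_prompt(
    "payment-api.pdf",
    "Create Charge",
    14,
    18,
    "write the request body type as a TypeScript interface",
)
```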
Here is the flow from when I actually built the payment client, laid out so you can reproduce it.
1. Run the text check, then attach (the pdftotext step above)
2. Hand it to the agent with the pages and the task scoped down
3. Generate "types only" first, and eyeball them against the PDF
4. Once the types are firm, generate the function bodies that use them
5. Pass the error-code table as a separate instruction to build the exception mapping
Locking the types down first is the key. Ask it to "build the whole client" up front, and the types, transport, and error handling all spill out together, leaving you unsure which part came from the PDF and which was guessed.
The generated type settles into something like this.
// Type lifted from the Create Charge definition on PDF pages 14-18.
// Required/optional follows the PDF's "Required" column.
export interface CreateChargeRequest {
  amount: number; // smallest currency unit (integer yen as-is)
  currency: "jpy" | "usd"; // from the PDF's Supported currencies table
  customerId: string;
  description?: string; // Optional in the PDF
  metadata?: Record<string, string>;
}

export interface CreateChargeResponse {
  id: string;
  status: "succeeded" | "pending" | "failed";
  createdAt: string; // ISO 8601, stated in the PDF's Notes column
}
With the type held fixed, I then ask for the function body. Because the type is settled, the agent can concentrate on the transport, and the output wavers far less.
export async function createCharge(
  req: CreateChargeRequest,
  apiKey: string,
): Promise<CreateChargeResponse> {
  const res = await fetch("https://api.example-pay.test/v1/charges", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(req),
  });
  if (!res.ok) {
    // The error-code table is mapped separately in the next step.
    throw new Error(`charge failed: ${res.status}`);
  }
  return (await res.json()) as CreateChargeResponse;
}
The scanned-image PDF trap
This is what tripped me up first. I attached an image PDF assuming it was readable, and the type that came back was full of names that aren't in the PDF. The agent can't lift characters off an image, so it was guessing from context at what it couldn't see.
There are two fixes: find the original text data, or add a text layer with OCR. The latter was easy with ocrmypdf.
# Add a text layer to an image PDF.
# Specify the language pack if Japanese is present.
ocrmypdf -l jpn+eng scanned-spec.pdf searchable-spec.pdf
# Confirm text is now extractable.
pdftotext searchable-spec.pdf - | head -c 400
OCR doesn't recover table structure perfectly. Digits and symbols can shift, so when I hand over a spec that went through OCR, I eyeball the numeric ranges and enums in the types with extra care. It's less about avoiding the trap and more about raising the inspection density once I know the trap is there.
Don't hand over a large PDF whole
I once attached a 100+ page spec in full, and it was a mistake. Responses slowed, most of the context filled with irrelevant pages, and attention to the few pages that mattered thinned out. The token cost is not trivial either.
Now I cut out only the sections I need before attaching. Splitting is more reliable left to a script that takes a page range than done by hand.
# A small script to cut a spec into chapters.
# pip install pypdf
from pypdf import PdfReader, PdfWriter


def extract_pages(src: str, start: int, end: int, dst: str) -> None:
    """Write pages start..end (1-based, inclusive) to dst."""
    reader = PdfReader(src)
    writer = PdfWriter()
    for i in range(start - 1, end):
        writer.add_page(reader.pages[i])
    with open(dst, "wb") as f:
        writer.write(f)


# Example: cut out just the Create Charge chapter.
extract_pages("payment-api.pdf", 14, 18, "charge-section.pdf")
In practice, handing over just the five pages I needed gave a visibly faster response than the full document, and the generated type was more accurate too. Keeping irrelevant information out of the context seems to translate directly into precision.
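Cutting by page range assumes you already know the range. When the table of contents is missing or wrong, a rough text search over the pages can suggest where a section lives. The sketch below is an assumption-laden helper, not something the spec or Antigravity provides: `pages_mentioning` and `load_page_texts` are hypothetical names, and the match is a plain case-insensitive substring test that will also hit passing mentions of the phrase.

```python
def load_page_texts(pdf_path: str) -> list[str]:
    """Extract the text of every page (needs: pip install pypdf)."""
    from pypdf import PdfReader  # imported lazily; only needed for real PDFs

    return [page.extract_text() or "" for page in PdfReader(pdf_path).pages]


def pages_mentioning(page_texts: list[str], phrase: str) -> list[int]:
    """Return 1-based page numbers whose text contains phrase.

    Matching is case-insensitive and ignores differences in whitespace,
    since extracted text often breaks lines in the middle of a heading.
    """
    needle = " ".join(phrase.lower().split())
    if not needle:
        return []
    hits = []
    for number, text in enumerate(page_texts, start=1):
        normalized = " ".join(text.lower().split())
        if needle in normalized:
            hits.append(number)
    return hits
```

With a real file, `pages_mentioning(load_page_texts("payment-api.pdf"), "Create Charge")` returns candidate page numbers. I would still open the PDF to confirm the range before cutting.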
Build a way to verify the output
To avoid over-trusting what's generated, I keep verification in two layers.
The first is making the agent state its source pages. Ask it to "note, for each field's type, which PDF page the entry is based on," and the cross-checking effort drops sharply. A field for which it can't write a page number is a sign that the field isn't in the PDF — likely a guess.
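The page citations can also be spot-checked mechanically. The sketch below assumes you have copied the agent's citations into a dict of field name to page number, and that you have the text of each page as a list of strings (for example from pypdf's `extract_text()`). Both the function name and the data shape are my own, and the check is only a substring match: a spec that writes `customer_id` where the type says `customerId` is flagged even though nothing is wrong.

```python
from typing import Optional


def unverified_fields(
    citations: dict[str, Optional[int]], page_texts: list[str]
) -> list[str]:
    """Return fields whose citation cannot be confirmed.

    A field is flagged when it has no page number, when the page is out
    of range, or when the field name does not appear on the cited page.
    """
    suspects = []
    for field, page in citations.items():
        if page is None or not 1 <= page <= len(page_texts):
            suspects.append(field)
            continue
        if field.lower() not in page_texts[page - 1].lower():
            suspects.append(field)
    return suspects
```

Anything this returns is a candidate for a closer look by eye. An empty result does not prove that the type or the required/optional flag is right.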
The second is a mechanical check. I confirm the generated type passes a minimal test on my end.
// Minimal confirmation that the generated type is as expected.
import type { CreateChargeRequest } from "./payment-client";

const sample: CreateChargeRequest = {
  amount: 1000,
  currency: "jpy",
  customerId: "cus_test_001",
  // description and metadata are optional, so they should be omittable.
};

// If the agent made description required, the assignment above fails to compile.
console.log("type check OK:", sample.amount);
Whether this sample compiles is enough to catch the common slip of turning an optional field into a required one. A type lifted from a spec is safer with this kind of small guard in front of it.
Where I've landed
After a few days, my conclusion is that PDF attachment is at its strongest when you hand over a text-layer spec, section by section, while making the agent cite its source pages.
Conversely, passing an image PDF whole, without saying which pages to read, trades the retyping effort for the effort of hunting down invisible errors later. The more convenient the feature, the more the result depends on how you hand the file over.
If you're staring down a spec PDF the same way, I hope this gives you one starting point to work from. Thank you for reading.