Pass Your Agent's Structured Output Downstream With Schema Validation and Bounded Repair
Put a safety boundary between the JSON an Antigravity agent returns and your downstream automation before a malformed response causes an incident: JSON Schema validation plus a turn-limited repair loop, with implementation code.
Ask an agent to "return the next article candidates as JSON" and most of the time you get clean JSON. Most of the time. About once a week, a preamble sneaks in, a trailing comma appears, or level comes back as "beginner-intermediate", a value that does not exist.
The problem is that the one broken case flows into downstream automation. I run several sites by myself, with a pipeline that takes the agent's output and assembles MDX from it. If the JSON fails to parse, that whole generation is wasted. However capable the model, these failures never drop to zero unless you validate at the output boundary.
Agent output is "untrusted input"
The first principle is to treat agent output exactly like input from the outside world.
Nobody writes a web form value to the database without validating it. Yet generated output tends to get passed downstream unchecked. Because the text is fluent, it is easy to assume the structure is correct too, but fluency and correctness are different things. Place a single validation layer at the boundary, and downstream code can be written on the assumption that "only the correct shape arrives."
Make the "shape" a contract with JSON Schema
First, write the shape the downstream expects as a schema. Not vague prose inside the prompt, but a contract a machine can judge.
enum pins level to three values, and pattern keeps periods and uppercase letters out of the slug. additionalProperties: False matters too: if the model helpfully adds an extra key, the output is rejected. A schema turns the "request" in your prompt into a "promise" that you can check mechanically.
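The schema itself does not appear in this text, so here is a minimal sketch of what the description above implies, written as a Python dict (hence False rather than false). The field names slug, title, and level, the exact slug regex, and the names ARTICLE_CANDIDATE_SCHEMA and slug_matches are assumptions for illustration. Only the three level values and the constraints described above come from the article.

```python
import re

# Hypothetical schema for one article candidate (field names are assumptions).
ARTICLE_CANDIDATE_SCHEMA = {
    "type": "object",
    "properties": {
        # lowercase letters, digits, and single hyphens only: no periods, no uppercase
        "slug": {"type": "string", "pattern": r"^[a-z0-9]+(?:-[a-z0-9]+)*$"},
        "title": {"type": "string", "minLength": 1},
        # enum pins level to exactly three values
        "level": {"type": "string", "enum": ["beginner", "intermediate", "advanced"]},
    },
    "required": ["slug", "title", "level"],
    # reject any key the model adds on its own
    "additionalProperties": False,
}


def slug_matches(slug: str) -> bool:
    """Check a slug against the schema's pattern (JSON Schema patterns use search semantics)."""
    pattern = ARTICLE_CANDIDATE_SCHEMA["properties"]["slug"]["pattern"]
    return re.search(pattern, slug) is not None
```

A dict of this shape is what would be passed as schema to a validator such as jsonschema's Draft7Validator.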
Have it fix the output before you throw it away
When validation fails, giving up and discarding the whole generation is wasteful. In most cases the breakage is minor. So you return the failure details to the model and ask it for a fix, a bounded number of times.
import json

from jsonschema import Draft7Validator


class SchemaRepairFailed(Exception):
    """Raised when output still fails validation after max_repair repair attempts."""

    def __init__(self, errors: list[str]):
        super().__init__("; ".join(errors))
        self.errors = errors  # kept so the caller can log them


def validate_or_repair(agent, prompt: str, schema: dict, max_repair: int = 2) -> dict:
    """Self-heal with a turn limit until validation passes."""
    validator = Draft7Validator(schema)
    raw = agent.run(prompt)

    for attempt in range(max_repair + 1):
        text = extract_json(raw)  # strip surrounding prose (defined in the next section)
        try:
            data = json.loads(text)
        except json.JSONDecodeError as e:
            errors = [f"JSON syntax error: {e.msg} (pos {e.pos})"]
        else:
            errors = [f"{list(err.path)}: {err.message}" for err in validator.iter_errors(data)]

        if not errors:
            return data  # passed validation
        if attempt == max_repair:
            break  # limit reached: do not ask again

        # Tell it the failure specifically and ask for a fix
        raw = agent.run(
            "Your previous output was invalid for the following reasons. "
            "Return JSON only, conforming to the schema.\n"
            + "\n".join(f"- {m}" for m in errors)
            + f"\n\nPrevious output:\n{raw}"
        )

    raise SchemaRepairFailed(errors)
Two things matter. The first is returning specific error messages. Saying only "invalid" fixes nothing. Say "level is beginner-intermediate, but allowed values are beginner / intermediate / advanced" and the next attempt almost always succeeds. The second is capping the number of attempts with max_repair. Without a cap, the loop keeps retrying output that will never validate, and the only thing that grows is cost.
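To make the cap concrete, here is a small self-contained sketch that counts agent calls. It is a simplification under stated assumptions: ScriptedAgent is a hypothetical stand-in that replays canned outputs, check_level is a toy replacement for full schema validation, and run_with_bounded_repair mirrors only the loop shape, not the article's exact function.

```python
import json

ALLOWED_LEVELS = ("beginner", "intermediate", "advanced")


class ScriptedAgent:
    """Hypothetical stand-in for a real agent: replays canned outputs and counts calls."""

    def __init__(self, outputs):
        self._outputs = list(outputs)
        self.calls = 0

    def run(self, prompt: str) -> str:
        self.calls += 1
        # once the script runs out, keep returning the last canned output
        return self._outputs[min(self.calls, len(self._outputs)) - 1]


def check_level(raw: str) -> list[str]:
    """Toy validator: returns specific error strings, or an empty list if valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"JSON syntax error: {e.msg} (pos {e.pos})"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    level = data.get("level")
    if level not in ALLOWED_LEVELS:
        allowed = " / ".join(ALLOWED_LEVELS)
        return [f"level is {level!r}, but allowed values are {allowed}"]
    return []


def run_with_bounded_repair(agent, prompt: str, max_repair: int = 2):
    """Return (data or None, number of agent calls); never exceeds max_repair + 1 calls."""
    raw = agent.run(prompt)
    for attempt in range(max_repair + 1):
        errors = check_level(raw)
        if not errors:
            return json.loads(raw), agent.calls
        if attempt < max_repair:
            raw = agent.run("Fix these problems and return JSON only:\n" + "\n".join(errors))
    return None, agent.calls


healing = ScriptedAgent(['{"level": "beginner-intermediate"}', '{"level": "intermediate"}'])
stubborn = ScriptedAgent(["this will never be JSON"])
healed_result = run_with_bounded_repair(healing, "plan the next articles")
stubborn_result = run_with_bounded_repair(stubborn, "plan the next articles")
```

With max_repair at 2, the agent that heals stops after two calls, while the one that never produces valid JSON is cut off after three calls in total.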
A small pre-process before the repair
Before entering the repair loop, handle mechanically fixable breakage in a pre-processing step. This visibly reduces how often repair is needed.
import re


def extract_json(raw: str) -> str:
    """Strip code fences and preamble, extract the JSON body only."""
    # remove ```json ... ``` fences
    fenced = re.search(r"```(?:json)?\s*(\{.*?\}|\[.*?\])\s*```", raw, re.DOTALL)
    if fenced:
        return fenced.group(1)
    # naively cut from the first { to the last } (requires the } to come after the {)
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        return raw[start : end + 1]
    return raw.strip()
In my setup, about 60% of validation failures were "prose attached before or after the JSON." By my rough estimate, adding this pre-process alone cut the cases that reach repair to under a third. Whatever can be stripped mechanically is faster and more reliable to strip mechanically than to ask the model to fix.
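Trailing commas, mentioned at the start as another common breakage, can be handled in the same mechanical spirit. The following is a sketch, not part of the original pre-process: strip_trailing_commas is a hypothetical helper that drops a comma sitting directly before a closing } or ], while leaving commas inside string values untouched.

```python
import json


def strip_trailing_commas(text: str) -> str:
    """Remove commas that directly precede } or ], ignoring anything inside strings."""
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False  # this character was escaped, so it cannot end the string
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
            out.append(ch)
            continue
        if ch in "}]":
            # look back past whitespace for a dangling comma and delete it
            i = len(out) - 1
            while i >= 0 and out[i].isspace():
                i -= 1
            if i >= 0 and out[i] == ",":
                del out[i]
        out.append(ch)
    return "".join(out)


broken = '{"tags": ["json", "schema",], "note": "keep this,}",}'
fixed = json.loads(strip_trailing_commas(broken))
```

Running a step like this before json.loads means a stray comma does not cost a repair turn. Whether it is worth adding depends on how often that breakage shows up in your own logs.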
What to keep when it fails
Sometimes repair runs to the limit and the output still does not validate. How you design for that moment decides how calmly the system runs.
On failure I do three things. First, skip that generation without halting the downstream; one failure must not bring down the whole overnight batch. Second, log the last raw output and the validation errors, so that the next morning I can see what happened within five minutes. Third, alert if the same input fails three times in a row, to separate transient noise from a structural problem.
import structlog  # assumption: the key-value logging call below matches structlog's API

log = structlog.get_logger()


def safe_plan(agent, prompt, schema):
    try:
        return validate_or_repair(agent, prompt, schema)
    except SchemaRepairFailed as e:
        log.warning("schema repair failed", errors=e.errors, prompt=prompt[:200])
        return None  # the caller sees None and skips
Return None and skip, or raise and halt: choose according to the nature of the process. For something like a draft, where dropping one is fine because there will be a next, skip. For something like payments, where nothing may be dropped, halt. Even with the same repair mechanism, the failure behavior should change with the use case.
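The third step above, alerting when the same input fails three times in a row, is not covered by safe_plan. One way it could be tracked is sketched below. FailureStreaks and input_key are hypothetical names, and keying on a hash of the prompt is an assumption about what "the same input" means in your pipeline.

```python
import hashlib
from collections import defaultdict


def input_key(prompt: str) -> str:
    """Stable short key for a prompt, so identical inputs share one streak."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]


class FailureStreaks:
    """Count consecutive failures per input and signal when a threshold is reached."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self._streaks = defaultdict(int)

    def record(self, key: str, ok: bool) -> bool:
        """Record one outcome; return True only on the failure that hits the threshold."""
        if ok:
            self._streaks.pop(key, None)  # any success resets the streak
            return False
        self._streaks[key] += 1
        return self._streaks[key] == self.threshold


tracker = FailureStreaks(threshold=3)
key = input_key("plan the next articles")
alerts = [tracker.record(key, ok=False) for _ in range(4)]
```

The caller would call record after every run and send an alert when it returns True. Returning True only once per streak keeps a persistently failing input from alerting on every later run.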
How far to fix automatically, and where to hand off to a human
The repair loop is handy but not omnipotent. Raising the limit raises the chance of a fix, yet output that is not fixed in three tries is usually not fixed in five either. Typically the instruction in the prompt is vague, or the schema itself does not match reality. The cause is often outside the model.
So I fix max_repair at 2. If the output is not fixed in two attempts, I take that as a signal to revisit the schema or the prompt, not to raise the repair count. Auto-repair is for "silently fixing light breakage," not for "papering over design flaws."
I use this same boundary design unchanged when I have an agent assemble config values for an AdMob app. If you want to run an agent stably over the long haul, not over-trusting its output matters as much as making it smarter.
Thank you for reading. I hope this helps you place a boundary that keeps your downstream from breaking.