Your Antigravity Custom Tools Don't Break by Design — They Break on Re-execution: Field Notes on Idempotency and Error Contracts
Once you add a custom tool to an Antigravity agent, the real production problem is re-execution and duplicated side effects. Here are the idempotency keys, error contracts, health gates, and tool-sprawl checks that actually held up in practice.
An agent usually behaves beautifully right after you add a custom tool. The trouble shows up a few days later, when that same tool starts getting called more often than you expected. As an indie developer, I run a few automation agents across my own projects, and on one of them I added a single tool that talked to an internal API. The first week was flawless. The second week, I noticed the failure for the first time in the worst possible form: the same invoice had been created twice. The cause was not the tool's implementation. The agent had read a timeout as a failure and re-called an operation that had, in fact, already succeeded.
Articles on designing custom tools tend to stop at schemas and permission boundaries. But what actually matters in operation is resilience to re-execution — the property that nothing breaks when the same tool is called twice. These are the implementations and decision rules that earned their keep from that angle.
Where duplicate side effects come from
There are more paths to a double call than you'd think. A few recur.
One: the tool actually succeeds, but the response times out, and the agent decides it failed and retries. Two: midway through a long task, context gets compacted, the agent forgets a call it already made, and reconstructs it. Three: in a parallel-agent setup, two workers pick up the same task.
What they share is how they look from the tool's side: two near-identical calls arriving within a short window. That means the defense can live in the tool's arguments.
Idempotency keys — never run the same operation twice
For write and destructive tools, make an idempotency key mandatory. The key is a string that identifies one logical operation, the unit that should run exactly once. The caller (the agent) generates it; the tool stores it. When a second call arrives with the same key, the tool returns the first result instead of re-executing.
import time


class IdempotencyStore:
    """Holds idempotency keys and first results with a TTL.

    In production, back this with a shared store like Redis so parallel
    workers share it.
    """

    def __init__(self, ttl_seconds: int = 86400):
        self._store: dict[str, tuple[float, dict]] = {}
        self._ttl = ttl_seconds

    def get(self, key: str) -> dict | None:
        entry = self._store.get(key)
        if not entry:
            return None
        ts, result = entry
        if time.time() - ts > self._ttl:
            # Expired: forget the key so the operation may run again
            self._store.pop(key, None)
            return None
        return result

    def put(self, key: str, result: dict) -> None:
        self._store[key] = (time.time(), result)


def create_invoice(args: dict, idem: IdempotencyStore) -> dict:
    key = args.get("idempotency_key")
    if not key:
        return {"ok": False, "error": {
            "code": "MISSING_IDEMPOTENCY_KEY",
            "message": "Write operations require an idempotency_key.",
            "retryable": False}}
    cached = idem.get(key)
    if cached is not None:
        # On the second call onward, cause no side effect; return the first result
        return {**cached, "replayed": True}
    # Not atomic: two parallel workers can both miss the cache here. With a
    # shared store, reserve the key with an atomic set-if-absent first.
    # _do_create_invoice is the real side-effecting call, defined elsewhere.
    result = {"ok": True, "data": _do_create_invoice(args)}
    idem.put(key, result)
    return result
The point is that the caller generates the key. You could derive a key from a hash of the arguments on the tool side, but then you can't distinguish a legitimate "I deliberately want to create this twice" from an accidental re-execution. Make it a contract that the agent issues a unique key per operation and reuses the same key when retrying, and the two cases separate cleanly.
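A minimal sketch of the caller side of that contract, assuming the agent runtime lets you wrap tool calls in ordinary Python. `call_with_retries`, the `TIMEOUT` code, and the `max_attempts` default are hypothetical names introduced for illustration, not part of any Antigravity API. The key is generated once per logical operation and every retry reuses it, so a call that succeeded but timed out gets replayed instead of repeated.

```python
import uuid
from typing import Callable


def call_with_retries(tool: Callable[[dict], dict], args: dict,
                      max_attempts: int = 3) -> dict:
    """Call a write tool, reusing one idempotency key across every attempt."""
    # One key per logical operation, generated once, before the first attempt.
    key = str(uuid.uuid4())
    payload = {**args, "idempotency_key": key}
    result: dict = {}
    for _ in range(max_attempts):
        try:
            result = tool(payload)
        except TimeoutError:
            # The call may have succeeded on the other side; retry with the SAME key.
            result = {"ok": False, "error": {
                "code": "TIMEOUT",  # hypothetical code, used only in this sketch
                "message": "No response before the deadline.",
                "retryable": True}}
            continue
        # Stop on success, or on an error the tool marked as not retryable.
        if result.get("ok") or not (result.get("error") or {}).get("retryable"):
            break
    return result
```

A deliberate second invoice is then simply a second call to `call_with_retries`, which mints a fresh key.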
In the schema, make idempotency_key a required argument on write tools, and put the contract in the description: this operation is once-only; reuse the same key to retry.
CREATE_INVOICE_SCHEMA = {
    "name": "create_invoice",
    "description": "Creates an invoice. Idempotent. Reuse the same idempotency_key when retrying.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "amount_cents": {"type": "integer", "minimum": 1},
            "idempotency_key": {
                "type": "string",
                "description": "A string uniquely identifying this invoice creation. Reuse the same value on retry.",
            },
        },
        "required": ["customer_id", "amount_cents", "idempotency_key"],
    },
}
Set the TTL by the nature of the operation. For things where "doing the same operation again tomorrow is almost certainly an accident" — invoices, payments — go 24 hours or more. For something like a notification, where you only need to block short-window duplicates, a few minutes is plenty. I give the destructive ones the longest windows.
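One way to keep those windows in one place is a small per-tool table. This is a sketch only: the tool names other than `create_invoice`, the exact durations, and the `ttl_for` helper are illustrative assumptions, chosen to match the rule of thumb above rather than taken from a real deployment.

```python
# Hypothetical per-tool idempotency windows, in seconds (illustrative values).
IDEMPOTENCY_TTL_SECONDS = {
    "create_invoice": 7 * 24 * 3600,   # financial write: long window
    "charge_payment": 7 * 24 * 3600,   # financial write: long window
    "send_notification": 5 * 60,       # only short-window duplicates matter
}

# Fallback for write tools that have not been classified yet.
DEFAULT_TTL_SECONDS = 24 * 3600


def ttl_for(tool_name: str) -> int:
    """Return the idempotency TTL for a tool, falling back to 24 hours."""
    return IDEMPOTENCY_TTL_SECONDS.get(tool_name, DEFAULT_TTL_SECONDS)
```

Each write tool can then construct its store with `IdempotencyStore(ttl_seconds=ttl_for("create_invoice"))`.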
Error contracts — return failures in a form the agent can act on
Re-execution resilience and error handling are the same problem from two sides, because whether the agent mistakes a call for a failure and retries depends on the shape of the error you return. Hand back a raw stack trace and the agent can't tell whether it's retryable or fatal, so it tends to make the worst choice: just call again.
Return tool errors in a structure the agent can branch on. At minimum, fix three fields: an error code, a human-readable message, and whether it's retryable.
# adapter, TransientAPIError, and NotFoundError are defined elsewhere in the project.
def get_order_status(order_id: str) -> dict:
    try:
        return {"ok": True, "data": adapter.get_status(order_id)}
    except ValueError as e:
        # Bad arguments never succeed on retry
        return {"ok": False, "error": {
            "code": "INVALID_INPUT",
            "message": str(e),
            "retryable": False}}
    except TransientAPIError:
        # Likely to clear up with time, so tell the agent how long to wait
        return {"ok": False, "error": {
            "code": "UPSTREAM_UNAVAILABLE",
            "message": "The order API is temporarily unresponsive.",
            "retryable": True,
            "retry_after_seconds": 30}}
    except NotFoundError:
        return {"ok": False, "error": {
            "code": "NOT_FOUND",
            "message": f"Order {order_id} does not exist.",
            "retryable": False}}
The trick is to deliberately grow the paths that return retryable: False. Invalid input, a missing resource, insufficient permission — none of these change no matter how many times you call. Just telling the agent clearly that these are not retryable cuts pointless re-calls visibly. Reserve retryable: True for things with a real prospect of clearing up given time, like a transient downstream outage, and attach retry_after_seconds so the agent doesn't immediately hammer it again.
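To make the contract concrete from the consuming side, here is a sketch of the branching it enables. An agent normally does this through its prompt rather than through code, so treat `next_step`, the attempt cap, and the one-second default wait as hypothetical illustrations of the intended behavior.

```python
def next_step(result: dict, attempt: int, max_attempts: int = 3) -> tuple[str, float]:
    """Map a tool result to (action, wait_seconds).

    action is one of "done", "retry", or "stop".
    """
    if result.get("ok"):
        return ("done", 0.0)
    error = result.get("error") or {}
    # Anything not explicitly marked retryable is treated as final.
    if not error.get("retryable", False):
        return ("stop", 0.0)
    # Retryable, but the attempt budget is spent.
    if attempt >= max_attempts:
        return ("stop", 0.0)
    # Honor the tool's grace period; 1 second is an assumed default.
    return ("retry", float(error.get("retry_after_seconds", 1)))
```

Treating a missing `retryable` field as "stop" is the conservative choice: an error the tool did not classify should not trigger another call.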
Keep the set of error codes common across your tools. The table below maps the codes I actually use to the behavior I expect from the agent.
| Error code | Meaning | retryable | Expected agent behavior |
| --- | --- | --- | --- |
| INVALID_INPUT | Bad arguments | false | Fix the arguments first. Don't retry with the same ones |
| NOT_FOUND | Target doesn't exist | false | Identify the target by another route |
| PERMISSION_DENIED | Insufficient rights | false | Don't call; defer to a human |
| UPSTREAM_UNAVAILABLE | Transient downstream outage | true | Retry after retry_after_seconds |
| RATE_LIMITED | Too many calls | true | Retry after the grace period; reduce parallelism |
| CONFLICT | State conflict | false | Re-fetch the latest state before deciding |
Drop this table straight into your descriptions or system prompt and the agent's retry behavior settles down. A shared code system also pays off because you no longer have to re-teach behavior from scratch every time you add a tool.
Health gates — stop querying a downed dependency
Custom tools often depend on an external API or DB, and while that dependency is down, an agent that keeps querying it wastes both wall-clock time and tokens. Cache the dependency's health for a short window and put a gate in front that returns UPSTREAM_UNAVAILABLE immediately when it's unhealthy.
import time


class ToolHealth:
    def __init__(self, check_interval: int = 30):
        self._last: dict[str, float] = {}
        self._ok: dict[str, bool] = {}
        self._interval = check_interval

    def is_healthy(self, tool: str, checker) -> bool:
        now = time.monotonic()
        last = self._last.get(tool)
        # Trust the cache only after a real check has run. time.monotonic() has
        # an arbitrary origin, so defaulting to 0 could skip the first check.
        if last is not None and now - last < self._interval:
            return self._ok[tool]  # return the cache
        try:
            ok = checker()
        except Exception:
            # A checker that raises counts as unhealthy
            ok = False
        self._last[tool] = now
        self._ok[tool] = ok
        return ok
Running the real check once every 30 seconds and returning the cache in between is naive but works well. A tool whose health check is failing gets rejected the moment the agent calls it, as a retryable error — a lightweight circuit breaker. Make that error retryable: True too, so the agent resumes naturally once the dependency recovers.
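A sketch of how the gate might sit in front of a tool. To keep the block runnable on its own it carries a compact stand-in for the `ToolHealth` class (`HealthCache`); that class, the `health_gated` decorator, and the 30-second `retry_after_seconds` default are assumptions made for illustration.

```python
import time
from functools import wraps
from typing import Callable


class HealthCache:
    """Compact stand-in for ToolHealth so this sketch is self-contained."""

    def __init__(self, check_interval: float = 30.0):
        self._interval = check_interval
        self._state: dict[str, tuple[float, bool]] = {}

    def is_healthy(self, tool: str, checker: Callable[[], bool]) -> bool:
        now = time.monotonic()
        cached = self._state.get(tool)
        if cached is not None and now - cached[0] < self._interval:
            return cached[1]
        try:
            ok = bool(checker())
        except Exception:
            ok = False
        self._state[tool] = (now, ok)
        return ok


def health_gated(tool_name: str, checker: Callable[[], bool],
                 health: HealthCache, retry_after_seconds: int = 30):
    """Decorator: short-circuit with UPSTREAM_UNAVAILABLE while unhealthy."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if not health.is_healthy(tool_name, checker):
                # Reject without touching the dependency at all.
                return {"ok": False, "error": {
                    "code": "UPSTREAM_UNAVAILABLE",
                    "message": f"{tool_name} is failing its health check.",
                    "retryable": True,
                    "retry_after_seconds": retry_after_seconds}}
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

Because the rejection is `retryable: True` with a grace period, the agent backs off and tries again later, and the next real check after recovery lets calls through.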
Without a call log, you notice the accident too late
The only reason I caught the double invoice at all was that I logged tool calls in structured form. In production, recording every call with at least these seven fields makes both catching accidents and tracing their causes far easier.
import json
import logging


def log_tool_call(session_id, tool, args, result, elapsed_ms):
    logging.info(json.dumps({
        "session_id": session_id,
        "tool": tool,
        "args": _mask_secrets(args),  # mask sensitive values
        "ok": result.get("ok"),
        "error_code": (result.get("error") or {}).get("code"),
        "replayed": result.get("replayed", False),  # was this an idempotent replay?
        "elapsed_ms": elapsed_ms,
    }))
Keeping replayed pays off in a quiet way. You can see how often the idempotency key rejected a second execution, which tells you quantitatively how much the agent is re-calling the same operation on retry. A tool whose replay rate is climbing is diagnosing itself: either its timeout setting or its error contract is too loose. I glance at this log once a week and work through the tools with the highest replay rates first.
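The weekly check can be a few lines over the log. This sketch assumes each log line is exactly the JSON record written by `log_tool_call`, with no timestamp or level prefix; `replay_rates` is a hypothetical helper name.

```python
import json
from collections import Counter
from typing import Iterable


def replay_rates(log_lines: Iterable[str]) -> list[tuple[str, float]]:
    """Return (tool, replay_rate) pairs, highest replay rate first."""
    calls: Counter = Counter()
    replays: Counter = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON records
        if not isinstance(record, dict) or record.get("tool") is None:
            continue
        tool = record["tool"]
        calls[tool] += 1
        if record.get("replayed"):
            replays[tool] += 1
    rates = [(tool, replays[tool] / count) for tool, count in calls.items()]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)
```

The tools at the top of the returned list are the ones whose timeout or error contract deserves a look first.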
Ask one more question before you add a tool
One last piece of operational wisdom from a different angle than re-execution. The more custom tools you add, the more you degrade the agent's selection accuracy. When similarly named tools line up, the agent hesitates over which to call and picks the wrong one more often.
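A rough way to spot lookalike names before they reach the agent is to compare every pair in the tool registry. This is a heuristic sketch, not a measure of what the agent actually confuses: `similar_tool_names` is a hypothetical helper, the 0.8 threshold is an arbitrary starting point, and it uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_tool_names(names: list[str],
                       threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, ratio) for pairs at or above the threshold."""
    flagged = []
    for a, b in combinations(sorted(names), 2):
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged
```

Any pair this flags is a candidate for merging into one tool with an extra argument, or for renaming so the difference is obvious.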
I add a criterion to the add-a-tool decision that's separate from whether the design is good. Can't a combination of the standard tools solve it? Wouldn't one extra argument on an existing tool do? And can I answer, on the spot, "is this tool safe to call twice?" If I can't answer the third one immediately, it isn't time to add it yet. That's the instinct that hardened in operation.
Start with read-only tools that return deterministic results; add write tools in stages, only once their idempotency keys and error contracts are in place. Hold that pace and you can extend functionality without breaking the "using the agent makes life easier" experience. The value of a custom tool lives not in how many you've added, but in a design that doesn't break when you add them.