Your Antigravity Custom Tools Don't Break by Design — They Break on Re-execution: Field Notes on Idempotency and Error Contracts

Once you add a custom tool to an Antigravity agent, the real production problem is re-execution and duplicated side effects. Here are the idempotency keys, error contracts, health gates, and tool-sprawl checks that actually held up in practice.

An agent usually behaves beautifully right after you add a custom tool. The trouble shows up a few days later, when that same tool starts getting called more often than you expected. As an indie developer, I run a few automation agents across my own projects, and on one of them I added a single tool that talked to an internal API. The first week was flawless. The second week, I noticed the failure for the first time in the worst possible form: the same invoice had been created twice. The cause was not the tool's implementation. The agent had read a timeout as a failure and re-called an operation that had, in fact, already succeeded.

Articles on designing custom tools tend to stop at schemas and permission boundaries. But what actually matters in operation is resilience to re-execution — the property that nothing breaks when the same tool is called twice. These are the implementations and decision rules that earned their keep from that angle.

Where duplicate side effects come from

There are more paths to a double call than you'd think. A few recur.

One: the tool actually succeeds, but the response times out, and the agent decides it failed and retries. Two: midway through a long task, context gets compacted, the agent forgets a call it already made, and reconstructs it. Three: in a parallel-agent setup, two workers pick up the same task.

What they share is how they look from the tool's side: two near-identical calls arriving within a short window. That means the defense can live in the tool itself, keyed on the arguments it receives.
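The first path is easy to reproduce in a few lines. The sketch below is a hypothetical stand-in, not Antigravity's actual retry logic: `FlakyInvoiceAPI` is an invented backend that commits the write and then loses the first response, and `naive_retry` is an assumed caller that treats any timeout as "nothing happened."

```python
from typing import Optional


class FlakyInvoiceAPI:
    """Hypothetical backend: the first call commits, then the response is lost."""

    def __init__(self) -> None:
        self.invoices: list = []
        self._calls = 0

    def create(self, customer_id: str, amount_cents: int) -> dict:
        self._calls += 1
        invoice = {"id": len(self.invoices) + 1,
                   "customer_id": customer_id,
                   "amount_cents": amount_cents}
        self.invoices.append(invoice)       # the side effect has already happened
        if self._calls == 1:
            raise TimeoutError("response lost after commit")
        return invoice


def naive_retry(api: FlakyInvoiceAPI, attempts: int = 2) -> Optional[dict]:
    """Assumed caller behavior: a timeout is read as a failure and retried."""
    for _ in range(attempts):
        try:
            return api.create("cust_1", 5000)
        except TimeoutError:
            continue
    return None


api = FlakyInvoiceAPI()
naive_retry(api)
duplicates = len(api.invoices)   # two invoices for one intended operation
```

Nothing in the backend is wrong here; the duplicate comes entirely from the caller's reading of the timeout.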

Idempotency keys — never run the same operation twice

For write and destructive tools, make an idempotency key mandatory. The key is a string that identifies one intended operation, the unit that should run exactly once. The caller (the agent) generates it; the tool stores it. When a second call arrives with the same key, the tool returns the first result instead of re-executing.

import time
 
class IdempotencyStore:
    """Holds idempotency keys and first results with a TTL.
    In production, back this with a shared store like Redis so parallel workers share it."""
    def __init__(self, ttl_seconds: int = 86400):
        self._store: dict[str, tuple[float, dict]] = {}
        self._ttl = ttl_seconds
 
    def get(self, key: str) -> dict | None:
        entry = self._store.get(key)
        if not entry:
            return None
        ts, result = entry
        if time.time() - ts > self._ttl:
            self._store.pop(key, None)
            return None
        return result
 
    def put(self, key: str, result: dict) -> None:
        self._store[key] = (time.time(), result)
 
 
def create_invoice(args: dict, idem: IdempotencyStore) -> dict:
    key = args.get("idempotency_key")
    if not key:
        return {"ok": False, "error": {"code": "MISSING_IDEMPOTENCY_KEY",
                "message": "Write operations require an idempotency_key.", "retryable": False}}
    cached = idem.get(key)
    if cached is not None:
        # On the second call onward, cause no side effect; return the first result
        return {**cached, "replayed": True}
    # _do_create_invoice is the real side-effecting call (assumed to be defined elsewhere).
    # If it raises, nothing is stored, so a retry with the same key runs it again.
    result = {"ok": True, "data": _do_create_invoice(args)}
    idem.put(key, result)
    return result
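One caveat for the parallel-worker case: the lookup and the write above are two separate steps, so two workers arriving at the same moment can both miss the cache and both execute. A minimal in-process sketch of closing that gap follows. `AtomicIdempotencyStore` and `run_once` are hypothetical names, TTL handling is left out for brevity, and a shared store would need its own atomic set-if-absent operation to get the same guarantee across processes.

```python
import threading
from typing import Callable


class AtomicIdempotencyStore:
    """Hypothetical variant: check, execute, and store under one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: dict = {}

    def run_once(self, key: str, operation: Callable[[], dict]) -> dict:
        # Holding the lock across check + execute + store serializes callers,
        # so a second caller with the same key always sees the first result.
        with self._lock:
            if key in self._results:
                return {**self._results[key], "replayed": True}
            result = operation()
            self._results[key] = result
            return result


created: list = []


def fake_create() -> dict:
    created.append("invoice")           # stands in for the real side effect
    return {"ok": True, "data": {"invoice_id": len(created)}}


store = AtomicIdempotencyStore()
threads = [threading.Thread(target=store.run_once, args=("inv-42", fake_create))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A single lock serializes every key, which is simple but slow under load; a per-key lock or a store-level reservation would be the next step if that matters.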

The point is that the caller generates the key. You could derive a key from a hash of the arguments on the tool side, but then you can't distinguish a legitimate "I deliberately want to create this twice" from an accidental re-execution. Make it a contract that the agent issues a unique key per operation and reuses the same key when retrying, and the two cases separate cleanly.
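On the caller side, that contract might look like the sketch below. It is an assumption about how a harness around the agent could behave, not Antigravity's API: `call_with_retry` and `flaky_tool` are hypothetical, and the key is a plain `uuid4` string generated once per intended operation.

```python
import uuid
from typing import Callable


def call_with_retry(tool: Callable[[dict], dict], args: dict,
                    max_attempts: int = 3) -> dict:
    """Generate one key per intended operation and reuse it on every attempt."""
    key = str(uuid.uuid4())                 # one key for the whole operation
    last_error = None
    for _ in range(max_attempts):
        try:
            return tool({**args, "idempotency_key": key})
        except TimeoutError as exc:         # outcome unknown: retry with the SAME key
            last_error = exc
    return {"ok": False, "error": {"code": "UPSTREAM_UNAVAILABLE",
            "message": str(last_error), "retryable": True}}


# Hypothetical tool: executes, loses the first response, deduplicates by key.
seen: dict = {}
side_effects: list = []
attempts = {"n": 0}


def flaky_tool(args: dict) -> dict:
    attempts["n"] += 1
    key = args["idempotency_key"]
    if key in seen:
        return {**seen[key], "replayed": True}
    side_effects.append(key)                # the real write would happen here
    seen[key] = {"ok": True, "data": {"invoice_id": 1}}
    if attempts["n"] == 1:
        raise TimeoutError("response lost")
    return seen[key]


result = call_with_retry(flaky_tool, {"customer_id": "cust_1", "amount_cents": 5000})
```

A deliberate second invoice is then just a second call to `call_with_retry`, which gets a fresh key and is never confused with a retry.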

In the schema, make idempotency_key a required argument on write tools, and put the contract in the description: this operation is once-only; reuse the same key to retry.

CREATE_INVOICE_SCHEMA = {
    "name": "create_invoice",
    "description": "Creates an invoice. Idempotent. Reuse the same idempotency_key when retrying.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "amount_cents": {"type": "integer", "minimum": 1},
            "idempotency_key": {
                "type": "string",
                "description": "A string uniquely identifying this invoice creation. Reuse the same value on retry.",
            },
        },
        "required": ["customer_id", "amount_cents", "idempotency_key"],
    },
}
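If the runtime does not enforce the schema for you (an assumption worth verifying for your setup), a small pre-check can reject calls that omit required fields before any side effect runs. `check_required` is a hypothetical helper that reads only the `required` list; it is not a substitute for a full JSON Schema validator.

```python
from typing import Optional

# Trimmed copy of the schema so this sketch runs on its own.
SCHEMA = {
    "name": "create_invoice",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "amount_cents": {"type": "integer", "minimum": 1},
            "idempotency_key": {"type": "string"},
        },
        "required": ["customer_id", "amount_cents", "idempotency_key"],
    },
}


def check_required(schema: dict, args: dict) -> Optional[dict]:
    """Return an error dict if a required argument is missing or empty, else None."""
    required = schema["parameters"].get("required", [])
    missing = [name for name in required if args.get(name) in (None, "")]
    if missing:
        return {"ok": False, "error": {
            "code": "INVALID_INPUT",
            "message": "Missing required arguments: " + ", ".join(missing),
            "retryable": False}}
    return None


rejected = check_required(SCHEMA, {"customer_id": "cust_1", "amount_cents": 5000})
accepted = check_required(SCHEMA, {"customer_id": "cust_1", "amount_cents": 5000,
                                   "idempotency_key": "inv-2024-001"})
```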

Set the TTL by the nature of the operation. For things where "doing the same operation again tomorrow is almost certainly an accident" — invoices, payments — go 24 hours or more. For something like a notification, where you only need to block short-window duplicates, a few minutes is plenty. I give the destructive ones the longest windows.
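One way to keep those windows in a single place is a per-tool table. The sketch below is illustrative only: the tool names other than `create_invoice` and all of the durations are assumptions chosen to match the ordering described above, not recommended values.

```python
# Illustrative TTLs in seconds; tune each one to the operation it protects.
TTL_SECONDS = {
    "delete_customer": 7 * 24 * 3600,   # destructive: longest window (assumed value)
    "create_invoice": 24 * 3600,        # a repeat tomorrow is almost certainly an accident
    "send_notification": 5 * 60,        # only short-window duplicates matter
}
DEFAULT_TTL_SECONDS = 24 * 3600         # err on the long side for unknown write tools


def ttl_for(tool_name: str) -> int:
    """Look up the idempotency window for a tool, falling back to the default."""
    return TTL_SECONDS.get(tool_name, DEFAULT_TTL_SECONDS)
```

The returned value can be passed as `ttl_seconds` when constructing the store for that tool.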

Error contracts — return failures in a form the agent can act on

Re-execution resilience and error handling are the same problem from two sides, because whether the agent mistakes a call for a failure and retries depends on the shape of the error you return. Hand back a raw stack trace and the agent can't tell whether it's retryable or fatal, so it tends to make the worst choice: just call again.

Return tool errors in a structure the agent can branch on. At minimum, fix three fields: an error code, a human-readable message, and whether it's retryable.

# adapter, TransientAPIError, and NotFoundError are assumed to be defined elsewhere.
def get_order_status(order_id: str) -> dict:
    try:
        return {"ok": True, "data": adapter.get_status(order_id)}
    except ValueError as e:
        return {"ok": False, "error": {
            "code": "INVALID_INPUT", "message": str(e), "retryable": False}}
    except TransientAPIError:
        return {"ok": False, "error": {
            "code": "UPSTREAM_UNAVAILABLE",
            "message": "The order API is temporarily unresponsive.",
            "retryable": True, "retry_after_seconds": 30}}
    except NotFoundError:
        return {"ok": False, "error": {
            "code": "NOT_FOUND",
            "message": f"Order {order_id} does not exist.",
            "retryable": False}}

The trick is to deliberately grow the set of paths that return retryable: False. Invalid input, a missing resource, insufficient permission: none of these change no matter how many times you call. Telling the agent clearly that they are not retryable noticeably cuts pointless re-calls. Reserve retryable: True for failures with a real prospect of clearing up given time, like a transient downstream outage, and attach retry_after_seconds so the agent doesn't immediately hammer the dependency again.
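How a caller might branch on that contract can be sketched like this. `next_action` is a hypothetical decision helper for a harness or a test, not something the agent runtime is known to expose, and the action names and `RETRY_BUDGET_EXHAUSTED` are made up for illustration.

```python
def next_action(result: dict, attempt: int, max_attempts: int = 3) -> dict:
    """Map a tool result to continue / retry / stop using only the error contract."""
    if result.get("ok"):
        return {"action": "continue"}
    error = result.get("error") or {}
    if not error.get("retryable", False):
        # Non-retryable: calling again with the same arguments cannot help.
        return {"action": "stop", "reason": error.get("code", "UNKNOWN")}
    if attempt >= max_attempts:
        return {"action": "stop", "reason": "RETRY_BUDGET_EXHAUSTED"}
    return {"action": "retry",
            "wait_seconds": error.get("retry_after_seconds", 0)}


not_found = {"ok": False, "error": {"code": "NOT_FOUND",
             "message": "Order 42 does not exist.", "retryable": False}}
unavailable = {"ok": False, "error": {"code": "UPSTREAM_UNAVAILABLE",
               "message": "The order API is temporarily unresponsive.",
               "retryable": True, "retry_after_seconds": 30}}

first = next_action(not_found, attempt=1)
second = next_action(unavailable, attempt=1)
third = next_action(unavailable, attempt=3)
```

Treating a missing `retryable` field as False is a deliberate default here: an error that doesn't say it is retryable should not trigger another call.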

Keep the set of error codes common across your tools. The table below maps the codes I actually use to the behavior I expect from the agent.

Error code	Meaning	retryable	Expected agent behavior
`INVALID_INPUT`	Bad arguments	false	Fix the arguments first. Don't retry with the same ones
`NOT_FOUND`	Target doesn't exist	false	Identify the target by another route
`PERMISSION_DENIED`	Insufficient rights	false	Don't call; defer to a human
`UPSTREAM_UNAVAILABLE`	Transient downstream outage	true	Retry after `retry_after_seconds`
`RATE_LIMITED`	Too many calls	true	Retry after the grace period; reduce parallelism
`CONFLICT`	State conflict	false	Re-fetch the latest state before deciding

Drop this table straight into your descriptions or system prompt and the agent's retry behavior settles down. A shared code system also pays off because you no longer have to re-teach behavior from scratch every time you add a tool.
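To keep the codes from drifting between tools, the table can live in code as a single registry that both builds error responses and renders the prompt text. This is a sketch under assumptions: `ERROR_CODES`, `make_error`, and `render_for_prompt` are hypothetical names, and the rendered format is just one plausible layout.

```python
# code -> (retryable, expected agent behavior), copied from the table above
ERROR_CODES = {
    "INVALID_INPUT": (False, "Fix the arguments first. Don't retry with the same ones."),
    "NOT_FOUND": (False, "Identify the target by another route."),
    "PERMISSION_DENIED": (False, "Don't call; defer to a human."),
    "UPSTREAM_UNAVAILABLE": (True, "Retry after retry_after_seconds."),
    "RATE_LIMITED": (True, "Retry after the grace period; reduce parallelism."),
    "CONFLICT": (False, "Re-fetch the latest state before deciding."),
}


def make_error(code: str, message: str, retry_after_seconds=None) -> dict:
    """Build an error response; unknown codes raise KeyError so typos fail fast."""
    retryable, _ = ERROR_CODES[code]
    error = {"code": code, "message": message, "retryable": retryable}
    if retryable and retry_after_seconds is not None:
        error["retry_after_seconds"] = retry_after_seconds
    return {"ok": False, "error": error}


def render_for_prompt() -> str:
    """Render the registry as plain lines suitable for a description or system prompt."""
    return "\n".join(
        f"- {code} (retryable={str(retryable).lower()}): {guidance}"
        for code, (retryable, guidance) in ERROR_CODES.items()
    )


prompt_text = render_for_prompt()
```

Because `retryable` is looked up rather than passed in, no tool can accidentally mark `NOT_FOUND` as retryable.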

Health gates — stop querying a downed dependency

Custom tools often depend on an external API or DB, and while that dependency is down, an agent that keeps querying it wastes both wall-clock time and tokens. Cache the dependency's health for a short window and put a gate in front that returns UPSTREAM_UNAVAILABLE immediately when it's unhealthy.

class ToolHealth:
    def __init__(self, check_interval: int = 30):
        self._last: dict[str, float] = {}
        self._ok: dict[str, bool] = {}
        self._interval = check_interval
 
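    def is_healthy(self, tool: str, checker) -> bool:
        now = time.monotonic()
        last = self._last.get(tool)
        # Always run the real check on first use: time.monotonic() has an
        # undefined reference point, so a default of 0 could skip it.
        if last is not None and now - last < self._interval:
            return self._ok[tool]   # return the cache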
        try:
            ok = checker()
        except Exception:
            ok = False
        self._last[tool] = now
        self._ok[tool] = ok
        return ok

Running the real check once every 30 seconds and returning the cache in between is naive but works well. A tool whose health check is failing gets rejected the moment the agent calls it, as a retryable error — a lightweight circuit breaker. Make that error retryable: True too, so the agent resumes naturally once the dependency recovers.
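Wiring the gate in front of a tool can be as small as a wrapper. In this sketch `with_health_gate` is a hypothetical helper that takes any zero-argument health predicate, and the stub tool and the `state` flag exist only to make the example runnable.

```python
from typing import Callable


def with_health_gate(tool: Callable[..., dict],
                     is_healthy: Callable[[], bool],
                     retry_after_seconds: int = 30) -> Callable[..., dict]:
    """Wrap a tool so it fails fast with a retryable error while unhealthy."""
    def gated(*args, **kwargs) -> dict:
        if not is_healthy():
            return {"ok": False, "error": {
                "code": "UPSTREAM_UNAVAILABLE",
                "message": "Dependency health check is failing; call skipped.",
                "retryable": True,
                "retry_after_seconds": retry_after_seconds}}
        return tool(*args, **kwargs)
    return gated


calls: list = []


def get_order_status(order_id: str) -> dict:
    calls.append(order_id)                  # stands in for the real upstream call
    return {"ok": True, "data": {"order_id": order_id, "status": "shipped"}}


state = {"up": False}                       # hypothetical health flag for the demo
gated_status = with_health_gate(get_order_status, lambda: state["up"])

while_down = gated_status("ord_1")          # rejected without touching the upstream
state["up"] = True
after_recovery = gated_status("ord_1")      # passes through once healthy
```

With the ToolHealth class above, the predicate could be something like `lambda: health.is_healthy("order_api", ping_order_api)`, where `ping_order_api` is whatever cheap check your dependency offers.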

Without a call log, you notice the accident too late

The reason I caught the double invoice at all was that I logged tool calls in structured form. In production, keeping every call with at least these seven fields makes both catching problems early and tracing their causes far easier.

import json, logging
 
def log_tool_call(session_id, tool, args, result, elapsed_ms):
    logging.info(json.dumps({
        "session_id": session_id,
        "tool": tool,
        "args": _mask_secrets(args),          # mask sensitive values
        "ok": result.get("ok"),
        "error_code": (result.get("error") or {}).get("code"),
        "replayed": result.get("replayed", False),  # was this an idempotent replay?
        "elapsed_ms": elapsed_ms,
    }))

Keeping replayed pays off in a quiet way. You can see how often the idempotency key rejected a second execution, which tells you quantitatively how much the agent is re-calling the same operation on retry. A tool whose replay rate is climbing is diagnosing itself: either its timeout setting or its error contract is too loose. I glance at this log once a week and work through the tools with the highest replay rates first.
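The weekly pass can be a few lines over the log. This sketch assumes each line is the bare JSON object written by log_tool_call, with no logger prefix in front of it; `replay_rates` and the sample lines are hypothetical.

```python
import json
from collections import Counter


def replay_rates(log_lines) -> dict:
    """Return {tool: fraction of calls that were idempotent replays}."""
    total: Counter = Counter()
    replayed: Counter = Counter()
    for line in log_lines:
        record = json.loads(line)
        total[record["tool"]] += 1
        if record.get("replayed"):
            replayed[record["tool"]] += 1
    return {tool: replayed[tool] / total[tool] for tool in total}


sample_log = [
    json.dumps({"tool": "create_invoice", "ok": True, "replayed": False}),
    json.dumps({"tool": "create_invoice", "ok": True, "replayed": True}),
    json.dumps({"tool": "create_invoice", "ok": True, "replayed": False}),
    json.dumps({"tool": "create_invoice", "ok": True, "replayed": False}),
    json.dumps({"tool": "send_notification", "ok": True, "replayed": False}),
    json.dumps({"tool": "send_notification", "ok": True, "replayed": False}),
]

rates = replay_rates(sample_log)
# Highest replay rate first: these are the tools to look at this week.
worst_first = sorted(rates, key=rates.get, reverse=True)
```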

Ask one more question before you add a tool

One last piece of operational wisdom from a different angle than re-execution. The more custom tools you add, the more you degrade the agent's selection accuracy. When similarly named tools line up, the agent hesitates over which to call and picks the wrong one more often.
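A rough way to spot look-alike names before they pile up is a pairwise similarity pass. The sketch below uses the standard library's `difflib.SequenceMatcher`; `similar_tool_names`, the 0.8 threshold, and the tool names are assumptions, and name similarity is only a proxy, since overlapping descriptions can confuse an agent just as easily.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_tool_names(names, threshold: float = 0.8) -> list:
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    pairs = []
    for a, b in combinations(sorted(names), 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


tools = ["get_order_status", "get_orders_status",
         "create_invoice", "send_notification"]
flagged = similar_tool_names(tools)
```

A flagged pair is a prompt to ask whether the two tools should be merged or renamed, not an automatic verdict.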

So before adding a tool I ask three questions that are separate from whether its design is good. Could a combination of the standard tools solve this? Would one extra argument on an existing tool do? And can I answer, on the spot, "is this tool safe to call twice?" If I can't answer the third one immediately, it isn't time to add the tool yet. That instinct hardened in operation.

Start with read-only tools that return deterministic results; add write tools in stages, only once their idempotency keys and error contracts are in place. Hold that pace and you can extend functionality without breaking the "using the agent makes life easier" experience. The value of a custom tool lives not in how many you've added, but in a design that doesn't break when you add them.
