Antigravity × Durable Execution: Designing Fault-Tolerant Long-Running AI Tasks
An implementation guide for putting Antigravity agents to work on long-running jobs using Durable Execution — covering checkpointing, idempotency, and automatic retries, plus three real incidents from a 50M-download indie app and seven production pitfalls the official docs never spell out.
One morning the pipeline that aggregates yesterday's AdMob revenue died ten minutes from the finish line — an hour and a half of work, gone. I gave up on the spot and re-kicked the same job from my phone on the train into the office. I've been shipping indie apps since 2014, and the family of wallpaper and ambient apps I run has now passed 50 million downloads. Every morning that revenue gets reconciled between AdMob, App Store Connect, and Google Play. Before I introduced Durable Execution, the "almost made it" failure mode was just brutal: AdMob rate limits, Cloudflare Workers timeouts, the occasional Supabase blip — each of them meant starting from scratch.
I'm Masaki Hirokawa (@dolice). Alongside my artist practice, I've been pushing Antigravity's agent features into long-running jobs in my indie developer workflow, and Durable Execution has been the design pattern that pays back hardest. This guide walks through the implementation code, three real incidents from running a 50M-download portfolio, and seven production pitfalls the official docs never spell out — the kind of material I think actually earns being behind the membership wall.
The Three Principles That Make Durable Execution Work
Before any code, the three principles. Everything else falls out of these — and they map cleanly onto Antigravity agent design too.
Checkpointing for State Persistence
Every time a workflow step completes successfully, its result is saved to durable storage: a database, a queue, Cloudflare KV, anything that survives a restart. If the process crashes, the next run resumes from the most recent checkpoint instead of replaying every step. For my AdMob aggregation pipeline, this single change shrank average recovery time from 47 minutes to 4 minutes, roughly a 12x improvement.
Idempotency by Design
Operations must produce the same result whether they're called once or ten times. Without this, retries lead to duplicate writes, duplicate emails, double-charged customers. Missing idempotency in payment or notification paths is the kind of bug you only catch by getting publicly embarrassed.
Automatic Retry with Backoff
Transient failures (network timeouts, rate limits) should be retried with exponential backoff. Permanent failures should bubble up. The distinction matters: for AdMob, UNAVAILABLE and RESOURCE_EXHAUSTED are retryable, but PERMISSION_DENIED should fail fast — don't burn quota on something that will never succeed.
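A minimal sketch of that retry policy, assuming the API client surfaces the status as a string `code` property on the thrown error. `ApiError`, `withRetry`, and the option names are hypothetical, not part of any SDK.

```typescript
// Status codes treated as transient above; anything else fails fast.
const RETRYABLE_CODES = new Set(["UNAVAILABLE", "RESOURCE_EXHAUSTED"]);

// Hypothetical error shape (assumption): the status arrives as `code`.
class ApiError extends Error {
  readonly code: string;
  constructor(code: string, message?: string) {
    super(message ?? code);
    this.code = code;
    this.name = "ApiError";
  }
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay before retry number `retry` (1-based): base * 2^(retry - 1), capped.
function backoffDelay(retry: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** (retry - 1), opts.maxDelayMs);
}

// Duck-typed so it also works for errors that are not ApiError instances.
function isRetryable(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return typeof code === "string" && RETRYABLE_CODES.has(code);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  operation: () => Promise<T>,
  opts: RetryOptions
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Permanent errors and exhausted retry budgets bubble up unchanged.
      if (!isRetryable(err) || attempt >= opts.maxAttempts) throw err;
      await sleep(backoffDelay(attempt, opts));
    }
  }
}
```

The sketch leaves out jitter to keep the delays deterministic. When several workers retry against the same quota, adding a random component to each delay is the usual next step.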
Implementing the Minimal Durable Workflow in Antigravity
Let's build a TypeScript Durable Execution skeleton that fetches data from an external API, transforms it, and persists the result.
Each step saves a checkpoint to the filesystem on success. When the process restarts, completed steps are skipped automatically and execution resumes from where it stopped. In a serverless environment local FS is ephemeral, so swap the storage backend for KV or a relational DB before going to production.
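A minimal sketch of such a skeleton, assuming a single JSON checkpoint file on local disk. The names `executeStep`, `saveCheckpoint`, `state.data`, and `state.lastCheckpoint` match the ones this article refers to later; the `DurableWorkflow` class name, the `completedSteps` field, and the file layout are assumptions for illustration.

```typescript
import * as fs from "node:fs";

// Persisted checkpoint shape. `completedSteps` and `updatedAt` are assumptions.
interface WorkflowState {
  completedSteps: string[];
  data: Record<string, unknown>;
  lastCheckpoint: string | null;
  updatedAt: string; // ISO 8601, UTC
}

class DurableWorkflow {
  private readonly checkpointPath: string;
  private state: WorkflowState;

  constructor(checkpointPath: string) {
    this.checkpointPath = checkpointPath;
    this.state = this.loadCheckpoint();
  }

  // Resume from the previous run's checkpoint if one exists.
  private loadCheckpoint(): WorkflowState {
    if (fs.existsSync(this.checkpointPath)) {
      const raw = fs.readFileSync(this.checkpointPath, "utf8");
      return JSON.parse(raw) as WorkflowState;
    }
    return {
      completedSteps: [],
      data: {},
      lastCheckpoint: null,
      updatedAt: new Date().toISOString(),
    };
  }

  // Write to a temp file, then rename, so a crash mid-write
  // cannot leave a half-written checkpoint behind.
  private saveCheckpoint(stepName: string): void {
    this.state.lastCheckpoint = stepName;
    this.state.updatedAt = new Date().toISOString();
    const tmpPath = `${this.checkpointPath}.tmp`;
    fs.writeFileSync(tmpPath, JSON.stringify(this.state));
    fs.renameSync(tmpPath, this.checkpointPath);
  }

  // Run `fn` once. On later runs, return the stored result without calling it.
  // Results must be JSON-serializable because they round-trip through the file.
  async executeStep<T>(
    stepName: string,
    fn: (data: Record<string, unknown>) => Promise<T>
  ): Promise<T> {
    if (this.state.completedSteps.includes(stepName)) {
      return this.state.data[stepName] as T;
    }
    const result = await fn(this.state.data);
    this.state.data[stepName] = result;
    this.state.completedSteps.push(stepName);
    this.saveCheckpoint(stepName);
    return result;
  }
}
```

A run would chain calls such as `await wf.executeStep("fetch", ...)`, then `"transform"`, then `"persist"`. If the process dies inside `"transform"`, constructing a new `DurableWorkflow` with the same path returns the stored `"fetch"` result and re-runs only `"transform"` and what follows.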
Three Real Incidents the AdMob Pipeline Survived Thanks to Durable Execution
Time for the actual war stories. The revenue analytics pipeline for my wallpaper apps (50M+ downloads cumulative) kicks off every morning at 04:30 JST and runs the following:
1. Fetch country-level eCPM, impressions, and revenue from the AdMob Reporting API (~11 minutes)
2. Fetch install counts from App Store Connect and Google Play Console (~6 minutes)
3. Upsert into the Supabase daily_revenue table (~2 minutes)
4. Run anomaly detection (flag any country that moved more than ±30% week-over-week)
5. Post the daily report to Slack
Under the old architecture, a failure in any single step meant a full rerun: 47 minutes. With Durable Execution, only the failed step and the steps downstream of it re-run, and average recovery time drops to around 4 minutes. Three incidents I remember vividly:
Incident 1: AdMob RESOURCE_EXHAUSTED at the 36-Minute Mark
The AdMob Reporting API returns RESOURCE_EXHAUSTED when the daily quota gets hit. We were 36 minutes in, Step 1 almost done. Under the old design this would have been a 70-plus-minute disaster. With Durable Execution, the checkpoint was sliced per country chunk, so I just re-fetched the 12 missing countries in the next quota window — total downtime 8 minutes.
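A minimal sketch of per-country chunk checkpointing, assuming each country can be fetched independently. `CountryReport`, `ChunkStore`, and `fetchAllCountries` are hypothetical names, and a `Map` stands in for whatever durable store holds the chunks.

```typescript
// Hypothetical row shape for one country's report (assumption).
interface CountryReport {
  country: string;
  revenue: number;
}

// Chunk-level checkpoint store keyed by country code.
// A Map stands in for KV or a database table here.
type ChunkStore = Map<string, CountryReport>;

async function fetchAllCountries(
  countries: string[],
  store: ChunkStore,
  fetchCountry: (country: string) => Promise<CountryReport>
): Promise<CountryReport[]> {
  for (const country of countries) {
    // Chunk already finished in an earlier run: skip the API call.
    if (store.has(country)) continue;
    // Checkpoint as soon as the chunk lands, so a later failure keeps it.
    store.set(country, await fetchCountry(country));
  }
  return countries.map((c) => store.get(c)!);
}
```

If the quota runs out partway through, the error propagates, but every country fetched so far stays in the store. The next run calls the API only for the countries that are still missing.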
Incident 2: App Store Connect JWT Expiry
The JWT that authenticates App Store Connect API calls can live for at most 20 minutes. I forgot to refresh it and Step 2 died with a 401 partway through. Because checkpoints were keyed per app ID, I just regenerated the token and re-fetched the remaining apps: a 16-minute interruption, not 90 minutes. Lesson learned: token generation is now its own first executeStep, and the token gets stored in state.data so the rest of the workflow shares it.
Incident 3: Supabase Connection Pool Exhaustion
I'd cranked up parallelism on the anomaly detection job (Step 4) and exhausted the Supabase connection pool. Old world: rerun from Step 1. With Durable Execution, Steps 1–3 stayed cached, I dropped Step 4's parallelism from 12 to 4, and the entire recovery took 3 minutes.
None of these are the kind of failures you'll find in the SDK README. They're the failures you only meet after you go live. How you carve up checkpoint granularity ahead of time will change your recovery time by a factor of 10.
Trigger.dev for Production-Grade Durable Execution
In production, rolling your own checkpoint manager is a tax you'll regret. Trigger.dev is a TypeScript-native Durable Execution platform that pairs beautifully with Antigravity. For my own setup, I/O-heavy jobs like AdMob aggregation live on Trigger.dev, while lightweight anomaly batches run on Cloudflare Workers + KV. Picking the right tool per job matters more than picking the "best" one overall.
With Trigger.dev, function results are persisted automatically — no hand-rolled checkpoint code. If the server restarts, execution resumes right after the last successful step. As a rule of thumb from running this stack as an indie developer: once a job crosses 30 minutes of runtime or 5+ distinct external API calls, switching from a custom loop to Trigger.dev / Inngest / Temporal pays back in development time almost immediately.
Idempotency Patterns That Actually Hold Up
The single most overlooked piece of Durable Execution is idempotency. Without it, retries silently double everything.
```typescript
// idempotency-patterns.ts
// Practical idempotency patterns

// Stand-in for the real Stripe client so this file runs on its own.
// In production, replace it with an initialized client from the "stripe" package.
const stripe = {
  paymentIntents: { create: async (...args: any[]) => ({}) },
};

// Pattern 1: dedupe via idempotency key
// The Set lives in memory, so it only dedupes within a single process.
// Persist the keys (DB, KV) if retries can happen after a restart.
class IdempotentExecutor {
  private processedKeys = new Set<string>();

  async execute(
    idempotencyKey: string,
    operation: () => Promise<void>
  ): Promise<void> {
    if (this.processedKeys.has(idempotencyKey)) {
      console.log(`Already processed: ${idempotencyKey}`);
      return;
    }
    await operation();
    // Recorded only after success, so a failed operation can be retried
    this.processedKeys.add(idempotencyKey);
  }
}

// Pattern 2: idempotent DB writes via upsert
async function upsertRecord(db: any, record: { id: string; data: any }) {
  await db.query(
    `INSERT INTO records (id, data, updated_at)
     VALUES ($1, $2, NOW())
     ON CONFLICT (id) DO UPDATE SET data = $2, updated_at = NOW()`,
    [record.id, JSON.stringify(record.data)]
  );
  // Expected: one row inserted or updated, never duplicated
}

// Pattern 3: idempotent payments via transaction ID
async function processPayment(orderId: string, amount: number) {
  const transactionId = `txn_${orderId}_${amount}`;
  const payment = await stripe.paymentIntents.create(
    {
      amount: amount,
      currency: "jpy",
      metadata: { orderId },
    },
    {
      idempotencyKey: transactionId,
    }
  );
  // Same transactionId can be called repeatedly: charged once
  return payment;
}
```
In practice: I write AdMob revenue into Supabase keyed by (date, country, app_id) so reruns just upsert the same rows, and the Slack daily report is gated by a successful INSERT into report_sent_log keyed by (date). With those two patterns in place, double-notification incidents went to zero.
Integration Patterns with Antigravity Agents
Combining Antigravity's multi-agent features with Durable Execution lets you orchestrate multiple AI agents on long-running jobs without losing intermediate work.
Seven Production Pitfalls the Official Docs Won't Save You From
Here's the list of things I've actually tripped over — none of which the Trigger.dev / Inngest / Temporal docs surface clearly. Ordered roughly by how badly each one stings.
1. Checkpoint JSON Bloats and Slows Down Every Write
The first one bites quickly. If you stash all fetched records in state.data, a few thousand rows of AdMob data balloon the JSON to 30–80 MB and every checkpoint write costs you seconds. Store raw data in S3 / R2 and put only the storage URI in state.data. That single fix shrank Step 1 of my pipeline from 11 minutes to 7.
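A minimal sketch of the pointer-in-checkpoint idea, assuming a blob store with simple put/get semantics. `BlobStore`, `InMemoryBlobStore`, `DataRef`, and the `mem://` URI scheme are hypothetical; in production the interface would wrap an S3 or R2 client.

```typescript
// Hypothetical blob-store interface (assumption, not a real SDK).
interface BlobStore {
  put(key: string, body: string): Promise<string>; // resolves to the object's URI
  get(uri: string): Promise<string>;
}

// In-memory implementation so the sketch runs without cloud credentials.
class InMemoryBlobStore implements BlobStore {
  private objects = new Map<string, string>();

  async put(key: string, body: string): Promise<string> {
    const uri = `mem://${key}`;
    this.objects.set(uri, body);
    return uri;
  }

  async get(uri: string): Promise<string> {
    const body = this.objects.get(uri);
    if (body === undefined) throw new Error(`No object at ${uri}`);
    return body;
  }
}

// What goes into state.data: a pointer plus enough metadata to sanity-check it.
interface DataRef {
  uri: string;
  rowCount: number;
}

// Write the raw rows to blob storage and return only the small reference.
async function offloadRows(
  blobs: BlobStore,
  key: string,
  rows: unknown[]
): Promise<DataRef> {
  const uri = await blobs.put(key, JSON.stringify(rows));
  return { uri, rowCount: rows.length };
}

// Later steps resolve the reference back into rows when they need them.
async function loadRows<T>(blobs: BlobStore, ref: DataRef): Promise<T[]> {
  return JSON.parse(await blobs.get(ref.uri)) as T[];
}
```

The checkpoint then carries a few dozen bytes per dataset regardless of row count, and only the steps that need the rows pay to load them.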
2. "Retry Succeeded" Quietly Sends the Notification Twice
If the sequence is "step succeeds → Slack message sent → process restarts before the checkpoint is written," the retry sends the message again. Fix: gate the notification on a successful INSERT into a report_sent_log table keyed by (date). Without this, every rerun fires a duplicate daily report. I've seen this happen in my own logs and it's deeply embarrassing.
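A minimal sketch of that gate, assuming the log table rejects a second row for the same date. `SentLog`, `InMemorySentLog`, and `sendDailyReportOnce` are hypothetical names; the in-memory class stands in for a table with a unique key on the date column.

```typescript
// Hypothetical stand-in for report_sent_log with a unique key on `date`.
// tryInsert resolves to true only for the first insert of a given date.
interface SentLog {
  tryInsert(date: string): Promise<boolean>;
}

class InMemorySentLog implements SentLog {
  private dates = new Set<string>();

  async tryInsert(date: string): Promise<boolean> {
    if (this.dates.has(date)) return false;
    this.dates.add(date);
    return true;
  }
}

// Returns true if the report was sent by this call, false if it was skipped.
async function sendDailyReportOnce(
  log: SentLog,
  date: string,
  send: () => Promise<void>
): Promise<boolean> {
  // Claim the date first. A retry after a crash finds the row and skips the send.
  if (!(await log.tryInsert(date))) return false;
  await send();
  return true;
}
```

Claiming the date before sending gives at-most-once delivery: if `send` itself fails after the insert, that day's report is skipped rather than duplicated. Whether a missed report or a duplicate is the lesser evil depends on the channel, so treat the ordering as a design choice rather than a rule.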
3. Rate-Limit Retries Blast a Shared Quota
Exponential backoff on RESOURCE_EXHAUSTED is correct — but if multiple jobs share the same API token, retries from one can starve the others. For AdMob I now isolate the aggregation job into a separate Google Cloud project from the telemetry client. Per-job quotas, not per-account quotas.
4. Partial Recovery Corrupts the Most Recent Data
Stop at T1, resume at T2, and any relative query such as "yesterday" is re-evaluated at T2 instead of T1, so the resumed steps can read a different slice of data than the completed steps did. AdMob has a one-hour reporting lag, so if you resume at T2 and ask for "yesterday" without specifying an explicit time window, you'll get partial data and your anomaly detector will misfire. Always persist the explicit [from, to] window in the checkpoint and reuse it on resume.
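A minimal sketch of pinning the window, assuming "yesterday" means the previous UTC calendar day and that `to` is treated as an exclusive bound. `ReportWindow`, `resolveYesterdayUtc`, `getOrCreateWindow`, and the `reportWindow` key are hypothetical names.

```typescript
// Explicit bounds as ISO 8601 instants in UTC. `to` is exclusive (assumption).
interface ReportWindow {
  from: string;
  to: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Resolve "yesterday" (UTC calendar day) relative to `now`.
function resolveYesterdayUtc(now: Date): ReportWindow {
  const startOfToday = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate()
  );
  return {
    from: new Date(startOfToday - DAY_MS).toISOString(),
    to: new Date(startOfToday).toISOString(),
  };
}

// Resolve the window once, store it in the checkpoint data,
// and return the stored value on every later call, including resumes.
function getOrCreateWindow(
  data: Record<string, unknown>,
  now: Date
): ReportWindow {
  const existing = data["reportWindow"] as ReportWindow | undefined;
  if (existing) return existing;
  const created = resolveYesterdayUtc(now);
  data["reportWindow"] = created;
  return created;
}
```

Every API call in the workflow then takes `from` and `to` from the stored window, so a resume a day later still asks for the day the original run was started for.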
5. Time Drift — Mixing UTC and Local Time
This one I owned spectacularly. AdMob is UTC, App Store Connect and Google Play are both America/Los_Angeles, and the Slack post is JST. If you store "date" as a bare string with no timezone in the checkpoint, recovery can overwrite the wrong day's row. Always store ISO 8601 timestamps with an explicit timezone offset in the checkpoint and convert to the display timezone at the very end.
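A minimal sketch of converting at the edge, assuming the checkpoint stores a UTC instant and each consumer derives its own calendar date from it. `localDate` is a hypothetical helper built on the standard `Intl.DateTimeFormat` API.

```typescript
// Format an instant as YYYY-MM-DD in the given IANA timezone.
function localDate(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  const pick = (type: string) =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${pick("year")}-${pick("month")}-${pick("day")}`;
}

// The checkpoint keeps the unambiguous instant, never a bare date.
const storedInstant = "2025-01-15T03:00:00.000Z";
const instant = new Date(storedInstant);

// The same instant falls on different calendar days depending on the zone.
const utcDay = localDate(instant, "UTC");                  // "2025-01-15"
const pacificDay = localDate(instant, "America/Los_Angeles"); // "2025-01-14"
const tokyoDay = localDate(instant, "Asia/Tokyo");         // "2025-01-15"
```

The example instant is 03:00 UTC, which is still the previous evening in Los Angeles. That one-day gap is exactly how a bare date string ends up pointing at the wrong row.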
6. Cold Start Beats Your First Checkpoint Write
On Cloudflare Workers and AWS Lambda, the cold-start window sometimes consumes enough of the budget that the first KV / S3 PUT (your first checkpoint) times out. Make the very first step a lightweight "boot heartbeat" write. I explicitly write state.lastCheckpoint = "boot-ok" before any real work begins.
7. Monitoring Blind Spots
This is the scariest one. Durable Execution swallows failures, which makes it easy to silently get stuck on the same step forever. No errors, no progress. The cure: emit the timestamp of the last saveCheckpoint call as a Prometheus gauge or GA4 custom event, and alert if it doesn't move for 15 minutes. Adding that single alert is the difference between sleeping soundly through a nightly job and waking up to a four-day-old failure.
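A minimal sketch of the check such an alert would evaluate, assuming each workflow reports the time of its last checkpoint and whether it has finished. `Heartbeat` and `findStalled` are hypothetical names, and the metrics backend itself is out of scope here.

```typescript
// 15 minutes without a new checkpoint counts as stalled.
const STALL_THRESHOLD_MS = 15 * 60 * 1000;

// Hypothetical record emitted alongside each checkpoint write (assumption).
interface Heartbeat {
  workflowId: string;
  lastCheckpointAt: string; // ISO 8601 timestamp of the last saved checkpoint
  finished: boolean;        // finished workflows are never considered stalled
}

// Return the IDs of unfinished workflows whose last checkpoint is too old.
function findStalled(
  heartbeats: Heartbeat[],
  now: Date,
  thresholdMs: number = STALL_THRESHOLD_MS
): string[] {
  return heartbeats
    .filter(
      (h) =>
        !h.finished &&
        now.getTime() - Date.parse(h.lastCheckpointAt) > thresholdMs
    )
    .map((h) => h.workflowId);
}
```

The key property is that the check keys on the absence of progress, not on the presence of an error, so a job that is quietly looping on one step still trips it.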
How to Get Antigravity to Generate This Pattern Reliably
A final section on prompt design, because the value of Durable Execution multiplies when the AI agent generating the code understands the pattern. The four-step approach below is what got my AdMob pipeline shipped in roughly a month.
Step 1: Make Plan Mode Enumerate Failure Modes First
Don't ask for code yet. Ask: "List ten things that could go wrong with this job." Rate limits, token expiry, network drops, data shape changes, mid-deploy dependencies: let Antigravity map out the failure surface. Then pick the five to seven the implementation will actually handle, and put those in the spec.
Step 2: Lock Down Step Boundaries and Checkpoint Granularity Up Front
Ask: "Break this into at most five steps, with typed inputs and outputs for each." Without this, Antigravity tends to dump everything into one function. With typed step boundaries, checkpoint granularity falls out naturally.
Step 3: Specify Resume Behavior Per Step
"If Step 3 fails, retry only Step 3 — preserve Steps 1 and 2's results" beats the generic "make it durable" prompt by roughly 2x in output quality. One prompt per step, not one mega-prompt.
Step 4: Force Sandbox Failure-Injection Testing
Don't ship without it. Add a throwAt(step: number) debug flag and verify that resume from each step actually works. Skip this and the bill comes due in production a month later. Always.
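A minimal sketch of such a failure-injection check, assuming steps are numbered from zero and checkpoints are keyed by step index. `runPipeline`, `verifyResume`, and the in-memory `Checkpoints` map are hypothetical; only the `throwAt` flag comes from the text above.

```typescript
// In-memory stand-in for the checkpoint store: step index -> stored result.
type Checkpoints = Map<number, number>;

type Step = (prev: number) => Promise<number>;

// Run the steps in order, skipping any whose result is already checkpointed.
// `throwAt` is the debug flag: the step with that index throws before it runs.
async function runPipeline(
  steps: Step[],
  checkpoints: Checkpoints,
  throwAt?: number
): Promise<number> {
  let value = 0;
  for (let i = 0; i < steps.length; i++) {
    const cached = checkpoints.get(i);
    if (cached !== undefined) {
      value = cached;
      continue;
    }
    if (i === throwAt) throw new Error(`Injected failure at step ${i}`);
    value = await steps[i](value);
    checkpoints.set(i, value);
  }
  return value;
}

// For every step index: crash there once, resume, and check that the final
// result is correct and that no step body ran more than once in total.
async function verifyResume(stepCount: number): Promise<boolean> {
  for (let failAt = 0; failAt < stepCount; failAt++) {
    const runs = new Array<number>(stepCount).fill(0);
    const steps: Step[] = runs.map((_, i) => async (prev: number) => {
      runs[i]++;
      return prev + 1;
    });
    const checkpoints: Checkpoints = new Map();
    try {
      await runPipeline(steps, checkpoints, failAt);
    } catch {
      // Expected: the injected failure.
    }
    const result = await runPipeline(steps, checkpoints);
    if (result !== stepCount || runs.some((n) => n !== 1)) return false;
  }
  return true;
}
```

Counting executions per step matters as much as checking the final result: a resume that silently replays completed steps still produces the right answer while defeating the point of the checkpoints.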
In Closing
Durable Execution is the boring, load-bearing piece of running AI agents on long jobs. Since I moved my morning AdMob pipeline onto it, the number of nightly Slack alerts I get has dropped from roughly 12 a month to one. If you're looking for a first move, pick the longest job you currently run and carve in just three checkpoints. That alone changes how reruns feel.
I'm still tuning this stack as I run it. If you're an indie developer wrestling with the same long-running jobs, I hope this was useful. Thanks for reading.