Antigravity Lab
Integrations · 2026-06-14 · Advanced

Trusting Temporal Workflows in Production — Field Notes on Idempotency, Retry Triage, and Saga Compensation

Practical notes from running Temporal as a production backend: how to make activities idempotent for real, where to draw the line between retryable and fatal errors, how to keep Saga compensation from firing twice, and how to make it all observable—built with Antigravity in the loop.

Tags: Temporal, workflows, idempotency, Saga, retries, Antigravity, distributed systems


It started with a duplicate-charge ticket at midnight

A few weeks after moving a payment-and-provisioning flow onto Temporal, I got a ticket: "I was billed twice for the same order." The logs told a familiar story. The charge activity had been marked as timed out and retried, but the first attempt had actually succeeded; only its response never made it back.

Temporal is powerful because it treats the workflow code itself as durable execution state: if a worker dies, the workflow resumes from exactly the right point. That same strength cuts both ways. Activities can run more than once no matter how you write them; what depends on how you wrote them is whether that is harmless. The moment you forget that, side effects double up.

These are field notes from putting Temporal under a real backend, organized around four things I actually tripped on: idempotency, retry triage, Saga compensation, and observability. This isn't an introduction—it's meant for people already running Temporal who want fewer "we got burned here" moments. I build most of this with an Antigravity agent, so I'll also point out which decisions a human still needs to keep hold of.

Designing for at-least-once changes everything

The guarantee Temporal gives an activity is at-least-once, not exactly-once. Activities get a retry policy by default, so a timeout, a worker crash, or a transient network failure is enough for Temporal to run the activity again.
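For a sense of what "run the activity again" means in practice, here is a sketch of the retry schedule implied by Temporal's documented default activity retry policy: a 1-second initial interval, a backoff coefficient of 2, a maximum interval of 100 × the initial interval, and unlimited attempts. `RetryPolicySketch`, `retryDelayMs`, and `retrySchedule` are hypothetical helpers with plain millisecond fields, not the SDK's `RetryPolicy` type, and the sketch ignores timeouts such as scheduleToCloseTimeout that can end retries earlier.

```typescript
// Hypothetical, simplified mirror of an activity retry policy (plain milliseconds).
interface RetryPolicySketch {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number; // 0 means unlimited, as in Temporal's default
}

// Assumed values taken from Temporal's documented defaults for activities.
const DEFAULT_POLICY: RetryPolicySketch = {
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 100 * 1000, // 100 × initialInterval
  maximumAttempts: 0,
};

// Delay before the retry that follows failed attempt `attempt` (1-based),
// growing exponentially and capped at maximumIntervalMs.
function retryDelayMs(policy: RetryPolicySketch, attempt: number): number {
  const raw = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, attempt - 1);
  return Math.min(raw, policy.maximumIntervalMs);
}

// Delays for up to `n` retries; stops early once maximumAttempts is reached.
function retrySchedule(policy: RetryPolicySketch, n: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= n; attempt++) {
    // The attempt numbered maximumAttempts is the last one: no retry after it.
    if (policy.maximumAttempts > 0 && attempt >= policy.maximumAttempts) break;
    delays.push(retryDelayMs(policy, attempt));
  }
  return delays;
}
```

With no attempt limit, retries never stop on their own, which is exactly why a side-effecting activity that is not idempotent is dangerous.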

The subtle part is that a retry can happen not because the activity failed, but because it succeeded and the result didn't come back. If a payment API completes the work but the connection drops before the response returns, Temporal sees a failure and tries again. So every activity with a side effect has to be idempotent, full stop.

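How you achieve that depends on the side effect. For writes to your own database, a unique constraint on the business key (here, the order ID) plus INSERT ... ON CONFLICT DO NOTHING absorbs duplicates; without that unique constraint, ON CONFLICT has nothing to conflict on and the duplicate row simply goes in. For an external API, riding on the provider's idempotency-key feature is the reliable path.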
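The lost-response case, and how a unique key absorbs it, can be modeled in a few lines. This is an in-memory sketch, not Temporal and not a real database: `deliverAtLeastOnce` is a hypothetical stand-in that simply runs the body twice, as happens when attempt 1 commits its side effect but its result never arrives, and `ChargesTable` is a hypothetical class mimicking a table with a unique constraint on `order_id` written via INSERT ... ON CONFLICT DO NOTHING.

```typescript
// Hypothetical in-memory stand-in for a table with UNIQUE(order_id).
class ChargesTable {
  private rows = new Map<string, number>(); // order_id -> amount

  // Mimics INSERT ... ON CONFLICT (order_id) DO NOTHING; false means "absorbed".
  insertOnConflictDoNothing(orderId: string, amount: number): boolean {
    if (this.rows.has(orderId)) return false;
    this.rows.set(orderId, amount);
    return true;
  }

  get rowCount(): number {
    return this.rows.size;
  }
}

// Hypothetical stand-in for at-least-once delivery in the lost-response case.
function deliverAtLeastOnce(body: () => void): void {
  body(); // attempt 1: the side effect commits, but the response is lost
  body(); // attempt 2: the retry runs the same body again
}

function compareRowCounts(): { naive: number; deduped: number } {
  // Non-idempotent write: a plain append, like an INSERT with no unique key.
  const naiveRows: number[] = [];
  deliverAtLeastOnce(() => {
    naiveRows.push(1000);
  });

  // Idempotent write: the unique order ID turns the retry into a no-op.
  const table = new ChargesTable();
  deliverAtLeastOnce(() => {
    table.insertOnConflictDoNothing('order-123', 1000);
  });

  return { naive: naiveRows.length, deduped: table.rowCount };
}
```

The naive write ends up with two rows for one order, while the keyed write keeps exactly one; the retry itself is identical in both cases.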

// src/temporal/activities/billing.ts
import { ApplicationFailure } from '@temporalio/activity';
import { db, charges } from '../../db';
import { stripe } from '../../lib/stripe';
 
interface ChargeInput {
  orderId: string;   // a stable ID already fixed by the workflow
  customerId: string;
  amount: number;
  currency: string;
}
 
/**
 * Charge activity. Idempotency is enforced in two layers:
 *   1) Stripe's idempotencyKey makes the provider process one request once
 *   2) Our own charges table records the result under a unique key,
 *      so a re-run trusts the record over re-calling the API
 */
export async function chargeCustomer(input: ChargeInput): Promise<string> {
  const { orderId, customerId, amount, currency } = input;
 
  // Check our own record first. If a success already exists, return without calling out.
  const existing = await db.query.charges.findFirst({
    where: (c, { eq }) => eq(c.orderId, orderId),
  });
  if (existing?.status === 'succeeded') {
    return existing.stripeChargeId;
  }
 
  // Use orderId itself as the idempotency key—retries won't double-charge.
  const intent = await stripe.paymentIntents.create(
    { amount, currency, customer: customerId, confirm: true },
    { idempotencyKey: `charge-${orderId}` },
  );
 
  if (intent.status !== 'succeeded') {
    // A business failure; retrying won't change the outcome, so make it non-retryable.
    throw ApplicationFailure.create({
      message: `Payment did not complete: ${intent.status}`,
      type: 'PaymentNotCompleted',
      nonRetryable: true,
    });
  }
 
  await db
    .insert(charges)
    .values({ orderId, stripeChargeId: intent.id, status: 'succeeded', amount })
    .onConflictDoNothing();
 
  return intent.id;
}

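What makes this work is that the idempotency key is derived from a stable ID fixed by the workflow. If you call crypto.randomUUID() inside the activity and use that as the key, every retry gets a new key and idempotency collapses. Generate the ID in the workflow body (or take it from the workflow input) and pass it to the activity as an argument. The activity's arguments are recorded in the workflow's event history when the activity is scheduled, so every retry attempt receives the same value; and because workflows replay deterministically, an ID generated in workflow code with a deterministic API such as uuid4() from @temporalio/workflow comes out the same on replay.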
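The difference between a key minted inside the activity and one derived from an argument is easy to reproduce with a fake. In this sketch, `FakeProvider` is a hypothetical stand-in that mimics the idempotency-key behavior described above (a repeated key returns the saved result instead of charging again), and `retry` is a hypothetical, minimal stand-in for Temporal's retry loop that re-invokes the activity with the same arguments; neither is a real SDK.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical provider that honors idempotency keys: the first request with a
// key does the work, later requests with that key get the saved result back.
class FakeProvider {
  private saved = new Map<string, string>();
  charges = 0;

  charge(idempotencyKey: string): string {
    const hit = this.saved.get(idempotencyKey);
    if (hit !== undefined) return hit; // repeated key: no second charge
    this.charges += 1;
    const id = `pi_${this.charges}`;
    this.saved.set(idempotencyKey, id);
    return id;
  }
}

type ChargeActivity = (orderId: string, attempt: number) => string;

function makeChargeActivity(provider: FakeProvider, stableKey: boolean): ChargeActivity {
  return (orderId, attempt) => {
    const key = stableKey
      ? `charge-${orderId}` // derived from the argument: same on every attempt
      : `charge-${randomUUID()}`; // anti-pattern: a new key on every attempt
    const id = provider.charge(key);
    if (attempt === 1) {
      // Simulate the lost response: the charge went through, the caller sees a failure.
      throw new Error('timeout: charge succeeded but the response was lost');
    }
    return id;
  };
}

// Re-invokes the activity with the SAME orderId until it returns or attempts run out.
function retry(activity: ChargeActivity, orderId: string, maxAttempts = 3): string {
  let lastError: unknown = undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return activity(orderId, attempt);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

function chargesAfterRetry(stableKey: boolean): number {
  const provider = new FakeProvider();
  retry(makeChargeActivity(provider, stableKey), 'order-123');
  return provider.charges;
}
```

With the key minted inside the activity, one lost response turns into two charges; with the key derived from the argument, the retry replays the saved result and the customer is charged once.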

My midnight duplicate charge was exactly this: the "check our own record first" step was missing, and the key was being generated inside the activity. Leaning on Stripe's key alone wasn't enough once the key itself wasn't stable.

