◈ Agents & Manager / 2026-06-18 / Advanced

When Your Antigravity Agent's Usage Ledger Quietly Drifts From Stripe's Bill — Field Notes on Idempotency, Late Events, and Reconciliation

Usage-based billing for Antigravity agents fails silently when your internal usage ledger and Stripe's Meter Events aggregation drift apart. Field notes on idempotency keys, absorbing late events, the 35-day window, and a daily reconciliation job.



You Find Out at Invoice Time

The scariest failure in running a usage-billed agent isn't a crash. A crash you notice. It's the slow divergence between what your own dashboard says a customer used and what Stripe finalizes on the invoice at month end.

Nothing stops. One customer gets billed for more than they ran; another gets undercharged. You learn about it when a support ticket arrives — never before.

These are field notes on keeping your internal usage ledger and Stripe's meter aggregation in agreement when you bill Antigravity AgentKit 2.0 agent execution through Stripe Meter Events. We'll work through idempotent metering, absorbing late events, handling the month boundary, and a daily reconciliation job — closing the paths where drift creeps in, one at a time.

As an indie developer, I've put agent features behind usage billing across a few of my own projects, and I've come to think the hard part isn't the billing itself. It's staying in a state where you can prove the billing is correct. Transparency isn't a feature you ship; it's something you defend in operations.

Drift Is Born in Three Layers

When the ledger and Stripe's totals disagree, the cause is almost always in one of three layers. Mapping them first tells the reconciliation job what to look at.

Layer	Typical drift	Direction
Measurement	Retries or parallel runs record the same step twice	Overbilling
Delivery	Send fails and is dropped, or buffer overflows and loses events	Underbilling
Aggregation boundary	35-day window exceeded, or month-boundary misfiling	Underbilling

Overbilling destroys trust in one stroke; underbilling quietly erodes margin. Neither is acceptable. The reconciliation job's role is to compare, at each of these layers, "what the internal side knows" against "what Stripe accepted."




First, Pin the Unit to One Thing

You can't reason about drift until the unit of measurement is stable; if the unit shifts, reconciliation is impossible. You can meter agent usage by steps, output tokens, or wall-clock time, but no single unit gives you both close cost correlation and easy explainability.

What I settled on in practice is steps as the base unit, with only heavy operations (image generation, large-scale search) split into a separate token-based meter. You explain everything to users with one word — "steps" — and absorb cost variance by billing heavy work separately.

The crucial part: keep that conversion rule identical on both the internal ledger and the Stripe meter. If the ledger counts a heavy operation as three steps, send the same post-conversion value to the meter. Apply the conversion on only one side and reconciliation will never balance.
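One way to keep the rule identical on both sides is to route every measurement through a single conversion function and reuse its output for both the ledger row and the meter payload. The sketch below is illustrative only: `STEP_WEIGHTS`, `toBillableSteps`, `buildUsageRecords`, and the operation names are hypothetical, and only the "heavy operation counts as three steps" figure comes from the example above.

```javascript
// Hypothetical weights: how many billable steps one operation counts as.
// Only the "3 steps for a heavy operation" figure comes from the article.
const STEP_WEIGHTS = new Map([
  ['default', 1],
  ['heavy', 3],
]);

// Single source of truth for the conversion rule.
function toBillableSteps(operationKind, rawSteps) {
  const weight = STEP_WEIGHTS.get(operationKind) ?? STEP_WEIGHTS.get('default');
  return rawSteps * weight;
}

// Build both records from the same converted value so they cannot disagree.
function buildUsageRecords({ identifier, customerId, operationKind, rawSteps, occurredAt }) {
  const value = toBillableSteps(operationKind, rawSteps);
  return {
    ledgerRow: {
      identifier,
      customer_id: customerId,
      event_name: 'agent_steps',
      value,
      occurred_at: occurredAt,
    },
    meterEvent: {
      event_name: 'agent_steps',
      payload: { stripe_customer_id: customerId, value: String(value) },
      identifier,
      timestamp: occurredAt,
    },
  };
}
```

Because the conversion happens once, before either record exists, there is no code path where the ledger holds a raw value and the meter holds a converted one.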

Decide the Idempotency Key by Execution ID × Step Number

The entrance to overbilling is almost always retries. When the network wobbles, the meter send is retried and the same step gets counted twice. Stripe Meter Events deduplicates when the identifier (idempotency key) matches, so make that key deterministic and unguessable.

import crypto from 'node:crypto';
 
function meterIdentifier(executionId, stepNumber) {
  // Give executionId UUIDv4-or-better entropy. A guessable value like
  // timestamp + userId can collide across parallel executions.
  return crypto
    .createHash('sha256')
    .update(`${executionId}:${stepNumber}`)
    .digest('hex')
    .slice(0, 40);
}

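The same execution and the same step always produce the same key, so a retry with that key is counted once rather than twice. One caveat: Stripe documents identifier uniqueness as enforced within a rolling window of at least 24 hours, so treat it as protection against retries rather than a permanent guarantee, and keep a unique constraint in your own ledger as the long-term guard. Conversely, weak executionId generation lets a different execution produce the same key and "swallow" one of them, which is underbilling. Don't cut corners on execution-ID minting.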
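To make the swallowing concrete, here is a small simulation. It is a sketch under assumptions: `weakExecutionId` is a hypothetical minting scheme (user ID plus a second-resolution timestamp), and `makeDedupingMeter` is an in-memory stand-in for any store that deduplicates on the identifier. The hash is left out because it does not change the outcome; identical inputs give identical keys either way.

```javascript
// Hypothetical weak scheme: two executions started by the same user within
// the same second receive the same ID.
function weakExecutionId(userId, startedAtSeconds) {
  return `${userId}-${startedAtSeconds}`;
}

// In-memory stand-in for a deduplicating store.
// record() returns true if the event was counted, false if it was dropped.
function makeDedupingMeter() {
  const seen = new Set();
  let total = 0;
  return {
    record(executionId, stepNumber, value) {
      const key = `${executionId}:${stepNumber}`; // the material that would be hashed
      if (seen.has(key)) return false; // looks like a retry, so it is dropped
      seen.add(key);
      total += value;
      return true;
    },
    total: () => total,
  };
}

const meter = makeDedupingMeter();
const runA = weakExecutionId('user_1', 1750000000);
const runB = weakExecutionId('user_1', 1750000000); // parallel run, same second

const countedA = meter.record(runA, 1, 1);
const countedB = meter.record(runB, 1, 1); // a different step in reality, but the key matches

console.log(countedA, countedB, meter.total()); // true false 1
```

Two real steps ran, one was billed. Nothing errored, which is why this failure only shows up in reconciliation.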

Then store that identifier in the internal ledger as its primary key. A unique constraint on the same key means the ledger and the meter deduplicate on the same unit, which makes later reconciliation straightforward.

CREATE TABLE usage_ledger (
  identifier   TEXT PRIMARY KEY,      -- identical to meterIdentifier
  customer_id  TEXT NOT NULL,
  event_name   TEXT NOT NULL,
  value        INTEGER NOT NULL,
  occurred_at  INTEGER NOT NULL,      -- event occurrence (UNIX seconds)
  sent_at      INTEGER,               -- Stripe-accepted time. NULL = unsent
  billing_month TEXT NOT NULL         -- 'YYYY-MM' derived from occurred_at
);

Rows whose sent_at is still NULL are exactly the underbilling-in-waiting. The reconciliation job keys off this column.

Assume Latency, Insert a Buffer

Sending the meter event the instant an execution finishes is ideal, but long-running or offline executions can't deliver it to Stripe right away. Decouple the send from the execution path: treat the write to the ledger as canonical, and hand delivery to an async worker.

// Execution side: once the ledger write succeeds, measurement is "final"
async function recordUsage(db, { executionId, stepNumber, customerId, value, occurredAt }) {
  const identifier = meterIdentifier(executionId, stepNumber);
  const month = new Date(occurredAt * 1000).toISOString().slice(0, 7);
  await db.prepare(
    `INSERT INTO usage_ledger (identifier, customer_id, event_name, value, occurred_at, billing_month)
     VALUES (?, ?, 'agent_steps', ?, ?, ?)
     ON CONFLICT(identifier) DO NOTHING`
  ).bind(identifier, customerId, value, occurredAt, month).run();
}
 
// Worker side: pick up unsent rows, send to Stripe, stamp sent_at on success
async function flushLedger(db, stripe) {
  const { results } = await db.prepare(
    `SELECT * FROM usage_ledger WHERE sent_at IS NULL ORDER BY occurred_at LIMIT 200`
  ).all();
  for (const row of results) {
    try {
      await stripe.billing.meterEvents.create({
        event_name: row.event_name,
        payload: { stripe_customer_id: row.customer_id, value: String(row.value) },
        identifier: row.identifier,
        timestamp: row.occurred_at,
      });
      await db.prepare(`UPDATE usage_ledger SET sent_at = ? WHERE identifier = ?`)
        .bind(Math.floor(Date.now() / 1000), row.identifier).run();
    } catch (e) {
      // Leave sent_at unstamped on failure. The idempotency key makes a resend safe.
      console.error('meter flush failed', row.identifier, e.message);
    }
  }
}

This way the execution side never depends on Stripe's availability. If Stripe is down, measurement stays in the ledger and the worker sends it in order once it recovers. Sorting by occurred_at keeps the month-boundary logic below clean.

The 35-Day Window and the Month Boundary

Stripe Meter Events, by default, only accepts events that occurred within the past 35 days. If your buffer backs up past that window, the send appears to succeed but never lands in the aggregation — the worst kind of silent underbilling.

Two defenses. First, raise an alert when an unsent row's occurred_at falls more than 30 days behind now, and clear it before the window runs out. Second, unify month attribution on occurred_at — decide the month by occurrence time, not send time. That's why the ledger derives billing_month from the occurrence time.
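The first defense can be sketched as a pure function over unsent ledger rows. This is an illustration, not a definitive implementation: `findWindowRisk` and its option names are hypothetical, and it approximates the window as 35 × 24 hours, which may differ slightly from how Stripe counts calendar days.

```javascript
const DAY_SECONDS = 86400;

// Return unsent rows older than alertAfterDays, together with the number of
// days left before they age out of the windowDays acceptance window.
function findWindowRisk(rows, nowSeconds, { alertAfterDays = 30, windowDays = 35 } = {}) {
  return rows
    .filter((row) => row.sent_at == null) // unsent only
    .map((row) => {
      const ageDays = (nowSeconds - row.occurred_at) / DAY_SECONDS;
      return { identifier: row.identifier, ageDays, daysLeft: windowDays - ageDays };
    })
    .filter((risk) => risk.ageDays > alertAfterDays)
    .sort((a, b) => a.daysLeft - b.daysLeft); // most urgent first
}

// Example: only unsent rows past the 30-day mark are reported.
const now = 1750000000;
const atRisk = findWindowRisk(
  [
    { identifier: 'a', occurred_at: now - 31 * DAY_SECONDS, sent_at: null },
    { identifier: 'b', occurred_at: now - 10 * DAY_SECONDS, sent_at: null },
    { identifier: 'c', occurred_at: now - 34 * DAY_SECONDS, sent_at: now - 33 * DAY_SECONDS },
    { identifier: 'd', occurred_at: now - 34 * DAY_SECONDS, sent_at: null },
  ],
  now
);
console.log(atRisk.map((r) => r.identifier)); // [ 'd', 'a' ]
```

Sorting by `daysLeft` puts the rows closest to the cutoff at the top of the alert, which is the order you would want to clear them in.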

Even for an execution that starts at 23:59 on the last day and finishes at 00:03 on the first, each step's month is fixed by its own occurrence time, so attribution is never ambiguous. A delayed send still lands in the right month because Stripe uses the timestamp you pass. Lean on send time instead, and you create after-the-fact deltas on an already-finalized prior-month invoice, which is the thing accounting hates most.
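The boundary case is easy to check in isolation. The helper below mirrors the month derivation used when writing the ledger; the name `billingMonth` is hypothetical. It assumes months are cut in UTC, because `toISOString()` always reports UTC. If your billing periods are anchored differently, the boundary would need to follow that anchor instead.

```javascript
// 'YYYY-MM' in UTC, derived from the event's occurrence time (UNIX seconds).
function billingMonth(occurredAtSeconds) {
  return new Date(occurredAtSeconds * 1000).toISOString().slice(0, 7);
}

// One execution that straddles the month boundary.
const lastStepOfJune = Date.UTC(2026, 5, 30, 23, 59, 0) / 1000;
const firstStepOfJuly = Date.UTC(2026, 6, 1, 0, 3, 0) / 1000;

console.log(billingMonth(lastStepOfJune)); // 2026-06
console.log(billingMonth(firstStepOfJuly)); // 2026-07
```

The two steps belong to the same execution but to different months, and neither result changes if the send is delayed by hours or days.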

The Daily Reconciliation Job — Catch Only Drift Past the Threshold

This is the heart of operations. Every day, pull the internal ledger's month-to-date total and Stripe's meter summary, and compare them. Demanding an exact match means a few in-flight events trip an alert every single day until you're numb to it, so set a tolerance and surface only meaningful drift.

async function reconcile(db, stripe, { meterId, month, customerId }) {
  // 1) Internal ledger: sent total for the month
  const ledger = await db.prepare(
    `SELECT COALESCE(SUM(value), 0) AS total
       FROM usage_ledger
      WHERE customer_id = ? AND billing_month = ? AND sent_at IS NOT NULL`
  ).bind(customerId, month).first();

  // 2) Stripe meter summary (month total).
  //    start_time and end_time must sit on minute boundaries, so floor "now"
  //    to the minute. Cap the range at the start of the next month so that
  //    reconciling a past month does not pull in later usage.
  const [year, monthNumber] = month.split('-').map(Number);
  const start = Date.UTC(year, monthNumber - 1, 1) / 1000;
  const nextMonthStart = Date.UTC(year, monthNumber, 1) / 1000;
  const end = Math.min(Math.floor(Date.now() / 60000) * 60, nextMonthStart);
  //    No value_grouping_window: a single summary covers the whole range, so
  //    the default page size cannot truncate the total.
  const summaries = await stripe.billing.meters.listEventSummaries(meterId, {
    customer: customerId,
    start_time: start,
    end_time: end,
  });
  const stripeTotal = summaries.data.reduce((s, x) => s + x.aggregated_value, 0);

  // 3) Evaluate drift. Threshold on both absolute and relative difference.
  const drift = ledger.total - stripeTotal;
  const ratio = stripeTotal === 0 ? (ledger.total === 0 ? 0 : 1) : drift / stripeTotal;
  const significant = Math.abs(drift) > 50 || Math.abs(ratio) > 0.02;

  return { customerId, month, ledgerTotal: ledger.total, stripeTotal, drift, ratio, significant };
}

Watching both the absolute difference (over 50 steps) and the relative one (over 2%) matters because the size of a "difference worth caring about" scales with usage. A 50-step gap on a customer running 100k steps a month is noise (0.05%); the same 50-step gap on a 300-step customer is roughly a 17% over- or under-charge. An absolute threshold alone misses small accounts, where a gap under 50 steps can still be a large share of the bill; a relative threshold alone misses large accounts, where a gap under 2% can still be many steps.
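The two regimes can be checked with the same arithmetic the reconciliation job uses. This is a standalone sketch; `evaluateDrift` and its option names are hypothetical, and the defaults are the 50-step and 2% figures from the text.

```javascript
// Same drift arithmetic as the reconciliation job, extracted as a pure function.
function evaluateDrift(ledgerTotal, stripeTotal, { maxAbs = 50, maxRatio = 0.02 } = {}) {
  const drift = ledgerTotal - stripeTotal;
  const ratio = stripeTotal === 0 ? (ledgerTotal === 0 ? 0 : 1) : drift / stripeTotal;
  const significant = Math.abs(drift) > maxAbs || Math.abs(ratio) > maxRatio;
  return { drift, ratio, significant };
}

// 50-step gap on a 100k-step customer: 0.05%, within both thresholds.
console.log(evaluateDrift(100050, 100000).significant); // false

// 50-step gap on a 300-step customer: about 16.7%, caught by the relative threshold.
console.log(evaluateDrift(350, 300).significant); // true

// 40-step gap on a 300-step customer: under 50 steps, still caught by the ratio.
console.log(evaluateDrift(340, 300).significant); // true

// 1,000-step gap on a 100k-step customer: only 1%, caught by the absolute threshold.
console.log(evaluateDrift(101000, 100000).significant); // true
```

The last two cases are the ones a single threshold would let through: the third slips under an absolute-only check, the fourth under a relative-only check.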

Surface only customers where significant is set in the daily report, and cross-check against the backlog of unsent rows (sent_at IS NULL). That tells you whether the drift is "temporary, from send latency" or "genuinely lost measurement." The former clears by the next day; the latter needs a fix.
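A minimal sketch of that report step follows. It assumes the reconciliation results are already collected in an array and that the unsent backlog has been summarized per customer. `buildDailyReport` and the shape of `backlog` are hypothetical. The function deliberately does not classify the drift; it only puts the two numbers side by side so a person can judge.

```javascript
// results: reconciliation outputs, one per customer
//          ({ customerId, month, drift, ratio, significant, ... }).
// backlog: hypothetical Map of customerId -> { count, value } summarizing
//          ledger rows where sent_at IS NULL.
function buildDailyReport(results, backlog) {
  return results
    .filter((r) => r.significant) // surface only drift past the threshold
    .map((r) => {
      const unsent = backlog.get(r.customerId) ?? { count: 0, value: 0 };
      return {
        customerId: r.customerId,
        month: r.month,
        drift: r.drift,
        ratio: r.ratio,
        unsentCount: unsent.count,
        unsentValue: unsent.value,
      };
    })
    .sort((a, b) => Math.abs(b.drift) - Math.abs(a.drift)); // largest gap first
}

const report = buildDailyReport(
  [
    { customerId: 'cus_a', month: '2026-06', drift: 3, ratio: 0.001, significant: false },
    { customerId: 'cus_b', month: '2026-06', drift: -80, ratio: -0.04, significant: true },
    { customerId: 'cus_c', month: '2026-06', drift: 400, ratio: 0.1, significant: true },
  ],
  new Map([['cus_b', { count: 12, value: 80 }]])
);
console.log(report.map((r) => r.customerId)); // [ 'cus_c', 'cus_b' ]
```

A flagged customer with a non-empty backlog is a candidate for "temporary"; a flagged customer with an empty backlog is where to start looking for lost measurement.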

The Audit Trail Is the Premise of Billing

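Usage billing lives or dies on whether you can show a breakdown the instant a user says "this month is too high." Keep the identifier, occurrence time, step value, the tool used, and the execution ID in the internal ledger, and that record is your audit trail. The table shown earlier would need extra columns for the tool and the execution ID, since the hashed identifier cannot be reversed to recover them.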

For support, one small feature that exports the month's ledger as a per-day, per-agent CSV ends the explanation in a single round trip. In my experience, most complaints settle the moment the breakdown is on screen. People distrust an amount nobody can explain more than they distrust the amount itself.
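Such an export can be very small. The sketch below assumes ledger rows carry a hypothetical `agent_id` column (the table shown earlier would need it added) and that agent IDs contain no commas or quotes, so no CSV escaping is done. Days are cut in UTC.

```javascript
// rows: ledger rows for one customer and one month.
// Returns CSV text with one line per (day, agent) pair and the summed steps.
function ledgerToDailyCsv(rows) {
  const totals = new Map(); // "YYYY-MM-DD,agent_id" -> summed value
  for (const row of rows) {
    const day = new Date(row.occurred_at * 1000).toISOString().slice(0, 10);
    const key = `${day},${row.agent_id}`;
    totals.set(key, (totals.get(key) ?? 0) + row.value);
  }
  const lines = [...totals.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // by day, then agent
    .map(([key, value]) => `${key},${value}`);
  return ['day,agent_id,steps', ...lines].join('\n');
}

const csv = ledgerToDailyCsv([
  { occurred_at: Date.UTC(2026, 5, 1, 10) / 1000, agent_id: 'researcher', value: 3 },
  { occurred_at: Date.UTC(2026, 5, 2, 9) / 1000, agent_id: 'writer', value: 4 },
  { occurred_at: Date.UTC(2026, 5, 1, 18) / 1000, agent_id: 'researcher', value: 2 },
]);
console.log(csv);
// day,agent_id,steps
// 2026-06-01,researcher,5
// 2026-06-02,writer,4
```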

Keep the history of drift the reconciliation job flagged, too. Being able to answer "what was that gap last month?" later is what compounds internal trust in the metering platform.

The Next Step

If you have an agent running today, start by adding a sent_at column to your ledger and writing one query that counts rows still unsent after 24 hours. If that count isn't zero, you already have the seed of underbilling.
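That starter query might look like the following. It is a sketch that assumes the `usage_ledger` table from earlier; `staleUnsentQuery` is a hypothetical helper that only builds the SQL and its bound parameter, so the cutoff arithmetic can be checked without a database.

```javascript
// Build the query and its bound parameter for "unsent for more than N hours".
function staleUnsentQuery(nowSeconds, olderThanHours = 24) {
  const cutoff = nowSeconds - olderThanHours * 3600;
  return {
    sql: `SELECT COUNT(*) AS stale_count
            FROM usage_ledger
           WHERE sent_at IS NULL AND occurred_at < ?`,
    params: [cutoff],
  };
}

// Usage with the prepare/bind style used earlier in this article:
//   const { sql, params } = staleUnsentQuery(Math.floor(Date.now() / 1000));
//   const { stale_count } = await db.prepare(sql).bind(...params).first();
```

Run it once by hand before wiring it into any schedule; the first result tells you whether there is already something to fix.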

You can begin the real reconciliation job with a loose threshold — say 5% relative — and tighten it later. Watching the daily report until you learn where your service's "normal jitter" lives, then narrowing in, avoids needless alert fatigue. Correctness in metering isn't built once and forgotten; I treat it as something you keep verifying, a little every day.
