When a Cloud Nightly Batch Drifts From Yesterday's Result — An Input Contract and Snapshot Design for Reproducibility

When you push a batch to a cloud ephemeral worker via the Managed Agents API, the environment assumptions you took for granted locally vanish. Here is a three-layer design — environment snapshot, input contract, seed pinning — that keeps the same input producing the same result.

This is about moving a batch I had been running on my own machine to a cloud worker via the Managed Agents API. The first few days were comfortable: it did not occupy my local CPU, it ran overnight on its own, and the report was ready by morning. But after about a week I noticed something odd. The input was supposedly identical, yet the report's structure differed between yesterday and today.

An ephemeral worker spins up in a fresh environment every time and disappears when done. That is a design benefit, but the flip side is that "everything that was obvious locally" disappears too. The config files, environment variables, caches, and small bits of input state that lived on my machine are not carried over in the cloud.

Here I share how I separated the causes of drift in a cloud batch and protected reproducibility across three layers: input contract, environment snapshot, and seed pinning. This batch also rolls up AdMob mediation numbers alongside each site's weekly report. I want the rollup to make progress between App Store and Google Play releases without me stopping to babysit it — which is exactly why daily drift was unacceptable. This is the practical assembly I arrived at as an indie developer auto-generating reports across several sites.

First, split "drift" into four causes

"It is not reproducible" cannot be fixed as a single lump. I split the cause into four and closed them one at a time.

Environment differences. The ephemeral worker's runtime and library versions can differ subtly from one spin-up to the next.
Missing implicit context. Files and data I could reference locally were never handed to the cloud.
Model updates. An alias like gemini-3.5-flash can have its underlying model quietly swapped.
The model's inherent non-determinism. Temperature and sampling jitter remain.

These four call for entirely different fixes. Environment differences are handled by snapshots, implicit context by the input contract, model updates by version pinning, and non-determinism by fixing seed and temperature. Mix them together and each gets only half-fixed.

Input contract: turn implicit context into explicit arguments

What helped most was pinning the input contract as JSON. A cloud worker can assume nothing about "what should be on the machine." So I write everything the job needs into one contract object and pass only that.

// Input contract — state everything this job requires
interface BatchInputContract {
  contractVersion: "1";          // version of the contract itself
  task: "weekly-report";
  // Pass data as content or an immutable pointer, never as a loose reference
  inputs: {
    siteId: string;
    periodStart: string;         // ISO 8601 — no vague "last week"
    periodEnd: string;
    datasetUri: string;          // fixed URI on immutable object storage
    datasetSha256: string;       // hash to verify after fetch
  };
  // Specify the model by a pinned version, not an alias
  model: {
    id: "gemini-3.5-flash";
    pinnedVersion: string;       // e.g. "gemini-3.5-flash-002"
    temperature: 0;
    seed: number;
  };
  // Pin the prompt template by version as well
  promptTemplateId: string;      // e.g. "weekly-report@7"
}

Three points matter. Pass the period as absolute timestamps, never as a relative expression like "last week." Pass data as an immutable URI you can verify by hash, not as a mutable reference. Specify the model by a pinned version, not an alias. These alone sharply raised the odds that the same contract yields the same result.
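
The three rules can also be checked mechanically before a job is submitted. The following is a minimal sketch, not the validator I actually run: `validateContract`, the flattened `ContractFields` shape, and the error strings are hypothetical names introduced for illustration. It assumes periods are full ISO 8601 timestamps with an explicit offset, and that any string different from the bare alias counts as a pinned version.

```typescript
// Hypothetical pre-submit check for the three contract rules.
// ContractFields is a flattened subset of the contract, used only for this sketch.
interface ContractFields {
  periodStart: string;
  periodEnd: string;
  datasetSha256: string;
  modelId: string;
  pinnedVersion: string;
}

// Full ISO 8601 timestamp with an explicit "Z" or numeric offset.
const ISO_WITH_OFFSET =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;
const SHA256_HEX = /^[0-9a-f]{64}$/;

// Returns a list of problems; an empty list means the contract passes.
function validateContract(c: ContractFields): string[] {
  const errors: string[] = [];
  for (const key of ["periodStart", "periodEnd"] as const) {
    if (!ISO_WITH_OFFSET.test(c[key]) || Number.isNaN(Date.parse(c[key]))) {
      errors.push(`${key} must be an absolute ISO 8601 timestamp with offset`);
    }
  }
  // Only compare the two timestamps when both parsed cleanly.
  if (errors.length === 0 && Date.parse(c.periodStart) >= Date.parse(c.periodEnd)) {
    errors.push("periodStart must be before periodEnd");
  }
  if (!SHA256_HEX.test(c.datasetSha256)) {
    errors.push("datasetSha256 must be 64 lowercase hex characters");
  }
  if (c.pinnedVersion === "" || c.pinnedVersion === c.modelId) {
    errors.push("pinnedVersion must name a concrete version, not the alias");
  }
  return errors;
}
```

Returning a list rather than throwing on the first problem means a rejected contract reports all of its issues at once, which is handy when a contract is generated by another script.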

The dataset hash check especially mattered. Even if datasetUri is the same, the result changes if its contents were swapped. The worker verifies datasetSha256 right after fetching and halts the job on mismatch. This catches the hardest-to-notice drift — "the input changed without anyone noticing."
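
The verification step itself is small. Here is a sketch of how it might look, assuming the dataset has already been fetched into memory as bytes; `verifyDataset` and `DatasetMismatchError` are hypothetical names, and a large dataset would more likely be hashed as a stream.

```typescript
import { createHash } from "node:crypto";

// Hypothetical error type so callers can tell "input changed" from other failures.
class DatasetMismatchError extends Error {
  readonly expected: string;
  readonly actual: string;
  constructor(expected: string, actual: string) {
    super(`dataset hash mismatch: expected ${expected}, got ${actual}`);
    this.name = "DatasetMismatchError";
    this.expected = expected;
    this.actual = actual;
  }
}

// Hash the fetched bytes and halt (throw) when they differ from the contract.
function verifyDataset(bytes: Uint8Array, expectedSha256: string): void {
  const actual = createHash("sha256").update(bytes).digest("hex");
  if (actual !== expectedSha256.toLowerCase()) {
    throw new DatasetMismatchError(expectedSha256, actual);
  }
}
```

A dedicated error type lets the job runner report "the input changed" separately from network or parsing failures, so the halt is not retried blindly.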

Environment snapshot: stamp what it ran on onto the artifact

Even with fixed inputs, the result drifts if the environment underneath differs each time. Since you cannot always fully pin the environment on an ephemeral worker, at least stamp "what it ran on" onto the artifact.

import { createHash } from "node:crypto";
 
// Build a digest of the execution environment
function environmentDigest(): { digest: string; detail: Record<string, string> } {
  const detail: Record<string, string> = {
    runtime: process.version,                      // Node version
    platform: `${process.platform}-${process.arch}`,
    // Pinned snapshot of dependencies (lockfile hash)
    lockfileSha: process.env.LOCKFILE_SHA ?? "unknown",
    // Worker image tag (injected by Managed Agents)
    workerImage: process.env.AGY_WORKER_IMAGE ?? "unknown",
    tz: Intl.DateTimeFormat().resolvedOptions().timeZone,
  };
  const digest = createHash("sha256")
    .update(JSON.stringify(detail))
    .digest("hex")
    .slice(0, 16);
  return { digest, detail };
}

I include the time zone because it caused real harm. My machine was on Japan time, but the cloud worker came up in UTC. A process using "today's date" sometimes crossed the date boundary and picked up the previous day's data. I made the worker's time zone explicit and computed dates from the contract's periodStart / periodEnd, eliminating that class of boundary accident.
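
To illustrate computing dates from the contract instead of the worker's clock, here is a small sketch. `utcDaysInPeriod` is a hypothetical helper, and it assumes the data is bucketed by UTC calendar day and that the period is half-open (start included, end excluded); adjust both if your data is bucketed differently.

```typescript
// Enumerate the calendar days covered by [periodStart, periodEnd) in UTC.
// Uses only the contract's timestamps, never the worker's clock or time zone.
function utcDaysInPeriod(periodStart: string, periodEnd: string): string[] {
  const start = Date.parse(periodStart);
  const end = Date.parse(periodEnd);
  if (Number.isNaN(start) || Number.isNaN(end) || start >= end) {
    throw new RangeError("period must be two valid ISO 8601 timestamps, start < end");
  }
  const DAY_MS = 24 * 60 * 60 * 1000;
  const days: string[] = [];
  // Truncate the start to midnight UTC, then step one day at a time.
  for (let t = Math.floor(start / DAY_MS) * DAY_MS; t < end; t += DAY_MS) {
    days.push(new Date(t).toISOString().slice(0, 10));
  }
  return days;
}

// Same result on a JST laptop and a UTC worker:
// utcDaysInPeriod("2025-06-02T00:00:00Z", "2025-06-05T00:00:00Z")
// returns ["2025-06-02", "2025-06-03", "2025-06-04"]
```

Because nothing in the function reads `new Date()` with no arguments or a local-time getter, the output depends only on the two strings in the contract.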

Manifest: bundle input, environment, and output into one

Once the input contract and environment digest are in place, I bundle them with the output into one manifest placed next to the artifact. Reproducibility is the state where you can confirm "same input and same environment yields the same output." The manifest is the ledger that makes that confirmation possible.

interface RunManifest {
  runId: string;
  contractHash: string;          // sha256 of the whole input contract
  environment: ReturnType<typeof environmentDigest>;
  modelResolved: {
    requested: string;           // "gemini-3.5-flash"
    served: string;              // version the API actually responded with
  };
  outputSha256: string;          // hash of the generated artifact
  startedAt: string;
  finishedAt: string;
}
 
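// Minimal storage interface assumed by this example; swap in your object-storage client
interface ObjectStore {
  put(key: string, body: string): Promise<void>;
}

async function writeManifest(m: RunManifest, store: ObjectStore): Promise<void> {
  // Place manifest.json in the same location as the artifact
  await store.put(`reports/${m.runId}/manifest.json`,
    JSON.stringify(m, null, 2));
}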

Always recording modelResolved.served was the crux. Even when you request the alias gemini-3.5-flash, the version the API actually responded with is in the response metadata. Record it, and when you notice "the output differs from last week," the first thing to check — "did the model change underneath?" — is visible at a glance. In fact, most non-reproducible output traced back not to input or environment, but to this quiet model swap.
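
With two manifests side by side, the triage can be written down as a function. This is a sketch under assumptions rather than a fixed procedure: `ManifestSummary` is a hypothetical flattened view of the manifest (contract hash, environment digest, served model version, output hash), and `diagnoseDrift` names only the first layer that differs. It checks the contract first because comparing runs of different contracts is not meaningful, then the served model, then the environment.

```typescript
// Hypothetical flattened view of the manifest fields needed for triage.
interface ManifestSummary {
  contractHash: string;
  environmentDigest: string;
  modelServed: string;
  outputSha256: string;
}

type DriftCause = "none" | "input" | "model" | "environment" | "non-determinism";

// Compare two runs and name the first layer that differs.
function diagnoseDrift(prev: ManifestSummary, curr: ManifestSummary): DriftCause {
  if (prev.outputSha256 === curr.outputSha256) return "none";
  if (prev.contractHash !== curr.contractHash) return "input";
  if (prev.modelServed !== curr.modelServed) return "model";
  if (prev.environmentDigest !== curr.environmentDigest) return "environment";
  // Same contract, model, and environment: what remains is sampling jitter.
  return "non-determinism";
}
```

More than one layer can change between two runs; the function reports the earliest one in the check order, so after fixing it the comparison is worth running again.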

How much non-determinism to allow

Even at temperature 0 with a fixed seed, a generative model's output is not guaranteed to be fully deterministic. Cloud-side parallelism and routing can leave slight jitter. Chasing "perfect reproduction" here costs more than it returns.

I redefined reproducibility as "no meaningful difference" rather than "byte-for-byte identical." It is enough that the report's numbers, conclusions, and structure are the same; wording variance is allowed. Matching outputSha256 in the manifest is ideal, but when it does not match, I separately check whether the extracted key metrics agree.

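// Key metrics extracted from a report (only the fields compared below)
interface ReportMetrics {
  totalClicks: number;
  topQuery: string;
  ctr: number;
}

// Verify semantic reproducibility even when the strict hash diverges
function semanticMatch(a: ReportMetrics, b: ReportMetrics): boolean {
  return a.totalClicks === b.totalClicks
    && a.topQuery === b.topQuery
    && Math.abs(a.ctr - b.ctr) < 0.001;   // tolerate display-digit rounding
}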

After drawing this line, operations became realistic. The share of weekly reports judged "non-reproducible" dropped from about 15% before the fix to under 1%. The remaining cases were ones where the data had genuinely changed, which means the check is correctly detecting real differences.

One gotcha persisted in production: raising worker concurrency slightly increases non-determinism. When the same job is distributed across different workers, routing differences introduce small variations in the response. I recommend pinning concurrency to 1 for jobs where reproducibility matters. Being able to choose between speed and reproducibility per job makes operations easier.
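
How concurrency is configured on the Managed Agents side is not shown here. As a rough illustration of choosing the limit per job, the sketch below caps how many jobs the submitting client keeps in flight; `runWithConcurrency` is a hypothetical helper, and a limit of 1 runs the jobs strictly one after another.

```typescript
// Run async jobs with at most `limit` in flight.
// limit = 1 runs them strictly in submission order.
async function runWithConcurrency<T>(
  jobs: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: T[] = new Array(jobs.length);
  let next = 0;
  // Each lane pulls the next unclaimed job until none remain.
  async function lane(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  }
  const lanes = Array.from({ length: Math.min(limit, jobs.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

Results come back in submission order regardless of the limit, so a job list can move between a fast setting and a reproducible one without changing the code that consumes the output.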

Whether to push to the cloud at all

Finally, the decision of whether to push to the cloud in the first place. Ephemeral workers are convenient, but they require this kind of work to preserve reproducibility. If your local CPU has headroom and the batch input is small, running it locally instead of forcing it to the cloud can be cheaper and faster.

What I push to the cloud is only batches that take too long locally or that I want to run several at once. On top of that, I always attach the three pieces above — input contract, environment snapshot, and manifest. These three are the foundation that lets you later ask "why did the result change?" Reproducibility is not a constraint that sacrifices speed; I see it as the investment that turns cloud execution into something you can trust.

I hope this helps with the triage if you have just moved a batch to the cloud and are wrestling with drift.

Making re-runs safe

Once you have reproducibility in place, safe re-runs come as a byproduct. When a job fails, re-submitting the same contract (and therefore the same contract hash) runs it under the same conditions as before. That is the foundation of idempotency.

I key the artifact's storage location on the manifest's contractHash. If an artifact for the same contract hash already exists, a re-run does not overwrite it; the new output is stored alongside it as an additional generation. That way I can later line up "last week's output" against "the re-run output" and confirm on the spot whether there was any drift.
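
For the contract hash to work as a storage key, the same contract must always hash to the same value, whatever order its keys were written in. The sketch below shows one way to do that, assuming the contract contains only plain JSON values. `canonicalJson`, `contractHash`, and `artifactKey` are hypothetical helpers, and the `reports/<hash>/gen-NNNN/` layout is an illustrative choice, not the exact layout I use.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted so that key order never changes the hash.
// Assumes plain JSON values (no undefined, functions, or class instances).
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// sha256 over the canonical form of the whole input contract.
function contractHash(contract: unknown): string {
  return createHash("sha256").update(canonicalJson(contract)).digest("hex");
}

// Hypothetical key layout: one prefix per contract, one numbered generation per run.
function artifactKey(hash: string, existingGenerations: number): string {
  const gen = String(existingGenerations + 1).padStart(4, "0");
  return `reports/${hash}/gen-${gen}/report.json`;
}
```

Zero-padded generation numbers keep the keys in run order when the storage lists them lexicographically, which makes "previous generation versus latest" a simple adjacent comparison.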

One thing to watch: re-verify datasetSha256 before every re-run. If the cause of failure was a change in the data, the hash will differ, so even the same contract halts at verification. The halt itself is the correct signal that "the input changed." Not letting re-runs succeed carelessly is, in the end, what protects reproducibility.
