When a Timed-Out Unattended Agent Leaves a Half-Written File Behind

When a scheduled agent gets killed on timeout, it can leave a half-written file that silently poisons the next stage. Here is the atomic write, stale-temp cleanup, and post-write content assertion I use to keep unattended pipelines from breaking.


One morning I checked the output of an agent I run on a schedule, and part of the generated file was cut off mid-content. The log said the run had failed, yet the truncated file was still sitting in the directory, and the next build step had happily picked it up as valid input. The cause was clear: the moment the run hit its timeout and the process was killed with SIGTERM, the agent had been in the middle of writing that file.

As an indie developer, I run several content pipelines on a nightly schedule and hand the work to agents while I sleep. When I drive things interactively, I notice if something stalls — I'm looking at the screen. Unattended, a different failure mode appears, one that never existed in interactive runs: a half-finished state lingers quietly, and nothing flags it before the next stage trusts it. The more you wire the Antigravity 2.0 CLI into cron-style unattended runs, the harder this is to avoid.

Here is how I settled on eliminating that half-written file with an atomic write, plus the two companion traps it forces you to confront: stale fixed-name temp files, and verification that an exit code simply cannot give you.

What a Timed-Out Agent Leaves Behind

When you run an unattended agent with a time limit, the platform sends the process a signal once the limit is reached. The problem is that you have no control over what the agent was doing at that exact instant.

Three kinds of debris typically remain.

What is left	What happens	Downstream impact
Half-written output file	Hit by SIGTERM mid-write; only the start is on disk	The next step treats it as valid input and propagates a broken artifact
Fixed-name temp file	A temp left over from a prior failure is read as-is this time	The prior run's content is silently mixed into this run's output
Lock file	A lock meant to be cleared on clean exit survives	The next start misreads it as "already running" and skips

The second is the nastiest. A half-written file (the first row) looks wrong the moment you read it, so you notice. But a stale fixed-name temp wears a mask of plausible correctness: the file count checks pass, the integrity checks pass, and you don't catch it until you compare the content line by line. I once nearly published an output that had a paragraph from a different article bleed into it through exactly this path.

Why an Exit Code Alone Cannot Protect You

Leaning your unattended pipeline's defenses on the exit code is a natural instinct. But the exit code fails to capture, in whole or in part, each of the following cases.

Half-written file: the process exits non-zero, yet the file already partially exists. Stopping the downstream "because it failed" never cleans up the file itself.
Stale temp contamination: this run exits zero. It merely read last time's debris, so the exit code raises no alarm.
Empty file: if the destination is opened but the process dies before content is written, a zero-byte file can be left behind, and any downstream check that only tests for the file's existence treats it as success.

So the defense needs two independent axes. One is atomicity: never expose a write in its half-finished state. The other is a content assertion — independent of the exit code — that confirms what landed is actually what this run intended.
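
The gap between exit status and on-disk state is easy to reproduce. The sketch below is a simulation, not the pipeline described here: `simulated_writer` is a hypothetical stand-in for an agent that writes straight to the destination and dies partway, returning 143 (the status a shell conventionally reports for a process killed by SIGTERM).

```bash
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical stand-in for an agent killed mid-write: it puts the first half
# of a JSON document directly at the destination, then fails with 143
# (128 + 15, the conventional status for death by SIGTERM).
simulated_writer() {
  printf '{"title": "draft", "body": "first half' > "$1"
  return 143
}

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
dest="$workdir/result.json"

status=0
simulated_writer "$dest" || status=$?

# The status says "failed", yet the debris is still on disk for the next stage.
if [ -e "$dest" ]; then exists=yes; else exists=no; fi
bytes="$(wc -c < "$dest" | tr -d ' ')"
echo "status=$status exists=$exists bytes=$bytes"
```

A downstream step that only checks "does result.json exist" would accept this file even though the run reported failure, which is why the exit code cannot be the only line of defense.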


Make the Write Atomic — Temp and Rename

The most reliable atomic write on a filesystem is the old discipline: write everything into a differently named temp file, then rename it onto the destination path. A rename (mv) on the same filesystem is atomic, so an observer of the destination path only ever sees either "fully written" or "not there yet." The in-between state is never visible from outside.

Wired into an unattended bash run, it looks like this.

#!/usr/bin/env bash
set -euo pipefail

# Destination, and a unique temp in the SAME directory
DEST="out/articles/result.json"
RUN_ID="$(date +%s)-$$"                 # seconds + PID = unique per run
TMP="$(dirname "$DEST")/.tmp.${RUN_ID}.json"
mkdir -p "$(dirname "$DEST")"

# Clean up the temp on normal exit, error, Ctrl-C, or SIGTERM
cleanup() { rm -f "$TMP"; }
trap cleanup EXIT
trap 'exit 130' INT      # exiting from a signal trap fires the EXIT trap,
trap 'exit 143' TERM     # so the script stops instead of carrying on

# Have the agent write into the temp
agy run generate-article --out "$TMP"

# Validate what landed BEFORE publishing atomically
[ -s "$TMP" ] || { echo "empty output" >&2; exit 1; }   # reject zero-byte
python3 -c "import json,sys; json.load(open(sys.argv[1]))" "$TMP"   # reject broken JSON

mv -f "$TMP" "$DEST"     # rename on same FS is atomic
trap - EXIT
echo "published: $DEST"

Three things carry the weight here. First, have the agent write into the temp; let it write the destination directly and that path is corrupted the instant it is killed. Second, the EXIT trap removes the temp on a normal exit, on an error, and on INT or TERM, because the signal traps call exit and that in turn fires the EXIT trap; the timeout's SIGTERM is covered here too. Without the explicit exit, bash would run the handler and then continue with the next line of the script. Third, validate size and content right before mv, and only proceed to publish if validation passes.

In Node the idea is identical — slip an fsync in before the rename.

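import { open, rename, rm } from "node:fs/promises";
import { dirname, join } from "node:path";

async function atomicWrite(dest, data) {
  const runId = `${Date.now()}-${process.pid}`;
  const tmp = join(dirname(dest), `.tmp.${runId}`);
  try {
    // "wx" fails if the temp already exists, so we never reuse old debris
    const fh = await open(tmp, "wx");
    try {
      await fh.writeFile(data, "utf8");
      // Flush the OS buffer to disk before renaming; syncing the same
      // writable handle avoids reopening the file read-only
      await fh.sync();
    } finally {
      await fh.close();
    }
    await rename(tmp, dest);   // atomic on the same FS
  } catch (err) {
    await rm(tmp, { force: true });   // do not leave the temp behind on failure
    throw err;
  }
}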

The reason for fsync (fh.sync() in Node) is that even though rename itself is atomic, a power loss could in theory leave you with "the rename committed but the file's data has not reached the disk yet." For something running unattended around the clock, I prefer not to skip it.
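
The bash version has no equivalent flush step. One way to add it, sketched here on the assumption that python3 is on the PATH (the bash script already leans on it for JSON validation), is a small helper that fsyncs the temp file before the rename and the containing directory after it, so the new directory entry is durable as well. `fsync_path` is a hypothetical helper name, and opening a directory read-only for fsync as written is Linux/macOS behavior.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: flush one file or directory to disk via os.fsync.
fsync_path() {
  python3 -c '
import os, sys
fd = os.open(sys.argv[1], os.O_RDONLY)
try:
    os.fsync(fd)
finally:
    os.close(fd)
' "$1"
}

workdir="$(mktemp -d)"
dest="$workdir/result.json"
tmp="$workdir/.tmp.$(date +%s)-$$.json"

printf '{"ok": true}\n' > "$tmp"
fsync_path "$tmp"                  # data reaches disk before the name changes
mv -f "$tmp" "$dest"               # atomic rename within one directory
fsync_path "$(dirname "$dest")"    # persist the directory entry as well
```

Spawning an interpreter per write costs a few tens of milliseconds, which is usually negligible for a nightly job but worth knowing about if the write sits in a tight loop.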

Two Disciplines to Avoid Stepping on Last Run's Debris

Atomic writes eliminate the half-written file, but stale fixed-name temp contamination needs separate discipline. After getting burned by it, I now keep two rules without exception.

1. Never give a temp file a fixed name

A fixed name like /tmp/insert.txt is a breeding ground: the debris from a run that failed to write gets read verbatim next time. Give each run a unique name (seconds + PID, or a name that includes the target slug) as the code above does, and there is structurally no way for this run to pick up last run's temp. Also keep the temp in the same directory as the destination. A rename cannot span two filesystems (say /tmp and your working directory): the underlying rename call fails with EXDEV, mv falls back to copy-then-delete, and the atomicity is gone.

2. Sweep old temps at startup

The trap handles the common case, but if the process is force-killed with SIGKILL (kill -9), even the trap does not run. So I add a safety net that sweeps my own temps older than a threshold at startup.

# At startup: remove abandoned temps older than 1 hour (only my naming scheme)
find out/articles -name '.tmp.*' -mmin +60 -delete 2>/dev/null || true

Restricting the sweep to "only files matching my own naming scheme" is essential. A broad pattern risks deleting an in-progress file from another agent working in the same directory.
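
Both rules can be exercised together in a throwaway directory. This is a minimal sketch, not the production script: `make_temp` and `sweep_stale_temps` are hypothetical helper names, and it swaps in mktemp for the seconds-plus-PID scheme as another way to get a unique name. The `.tmp.` prefix mirrors the naming scheme used above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helpers.
make_temp() {            # unique temp in the SAME directory as the destination
  mktemp "$(dirname "$1")/.tmp.XXXXXX"
}
sweep_stale_temps() {    # delete only our own temps older than N minutes
  find "$1" -maxdepth 1 -type f -name '.tmp.*' -mmin "+$2" -delete
}

workdir="$(mktemp -d)"
dest="$workdir/result.json"

# Simulate debris: an abandoned temp and an unrelated file, both with old mtimes.
stale="$workdir/.tmp.abandoned"
other="$workdir/notes.txt"
touch -t 202001010000 "$stale" "$other"

fresh="$(make_temp "$dest")"       # this run's temp, created just now
sweep_stale_temps "$workdir" 60    # removes $stale only
```

The old unrelated file survives because it does not match the pattern, and the fresh temp survives because it is younger than the threshold; only the abandoned temp is removed.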

Separate the Post-Write Content Assertion From the Exit Code

The final defense is content verification independent of the exit code. Before publishing with mv, confirm that what landed is what this run intended. Beyond rejecting zero-byte or broken syntax, I check run-specific markers: that this run's slug appears in the body, and that the previous run's slug did not sneak in.

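# Just before publish: confirm this run's slug is present and no prior-run trace
# (-F matches the slug literally; an empty PREV_SLUG would otherwise match every line)
grep -qF -- "$SLUG" "$TMP" || { echo "self-marker missing" >&2; exit 1; }
if [ -n "${PREV_SLUG:-}" ] && grep -qF -- "$PREV_SLUG" "$TMP"; then
  echo "previous-run contamination" >&2; exit 1
fi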

There is one more operational call worth making. Do not fold verification and publishing into a single step. If you write "verify, and on pass mv in the same flow," a mid-step failure in the verification can be swallowed, and you proceed to publish unverified content. I keep verification as a gate I confirm has passed, then publish as a separate step. It is unglamorous, but across hundreds of unattended runs that separation earns its keep.
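
One concrete way a verification failure gets swallowed in bash: `set -e` is ignored inside a function that runs on the left side of `&&` (or as an `if` condition), so a check that fails midway no longer stops the function. The sketch below illustrates that mechanism with hypothetical functions `verify` and `verify_strict`; it is an illustration of the trap, not the gate used in the pipeline above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical verifier that relies on `set -e` to stop at the first failure.
verify() {
  [ -s "$1" ]                   # check 1: non-empty
  grep -qF -- "$2" "$1"         # check 2: this run's marker is present
  echo "verify finished" >&2    # last command succeeds, so it sets the status
}

workdir="$(mktemp -d)"
tmp="$workdir/.tmp.demo"
: > "$tmp"                      # an empty, invalid artifact

# Folded form: errexit is ignored inside verify here, both failed checks are
# passed over, and the echo's status 0 lets the right-hand side run.
folded=blocked
verify "$tmp" "my-slug" && folded=published

# Separated gate: each check returns explicitly, and the verdict is recorded
# before any publish step is considered.
verify_strict() {
  [ -s "$1" ] || return 1
  grep -qF -- "$2" "$1" || return 1
}
gate=fail
if verify_strict "$tmp" "my-slug"; then gate=pass; fi

echo "folded=$folded gate=$gate"   # prints: folded=published gate=fail
```

The folded form would have published an empty file. Recording the gate result explicitly, then publishing only when it reads pass, keeps the two steps from hiding each other's failures.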

Where to Draw the Line

Almost none of this is needed for interactive runs. If you are watching the screen, you spot a half-written file immediately and delete it by hand. Atomic writes and content assertions truly pay off when the run is unattended and the output becomes the input of a later automated stage. Pipelines that chain artifacts through the Antigravity CLI or scheduled tasks are exactly that case.

Conversely, for a one-shot where a human inspects the result before moving on, all of this is overkill. My rule for whether to spend the cost is simple: does a later stage trust this output before anyone looks at it? Building pipelines as an indie developer, the temptation is to harden everything — but put the hardening in the wrong place and you have only added complexity.

Pick one artifact — the most downstream one nobody inspects — and replace just its write with temp-plus-rename. That alone should sharply cut the "wake up to a broken artifact" class of incident.
