When a Timed-Out Unattended Agent Leaves a Half-Written File Behind
When a scheduled agent gets killed on timeout, it can leave a half-written file that silently poisons the next stage. Here is the atomic write, stale-temp cleanup, and post-write content assertion I use to keep unattended pipelines from breaking.
One morning I checked the output of an agent I run on a schedule, and part of the generated file was cut off mid-content. The log said the run had failed, yet the truncated file was still sitting in the directory, and the next build step had happily picked it up as valid input. The cause was clear: the moment the run hit its timeout and the process was killed with SIGTERM, the agent had been in the middle of writing that file.
As an indie developer, I run several content pipelines on a nightly schedule and hand the work to agents while I sleep. When I drive things interactively, I notice if something stalls — I'm looking at the screen. Unattended, a different failure mode appears, one that never existed in interactive runs: a half-finished state lingers quietly, and nothing flags it before the next stage trusts it. The more you wire the Antigravity 2.0 CLI into cron-style unattended runs, the harder this is to avoid.
Here is how I settled on eliminating that half-written file with an atomic write, plus the two companion traps it forces you to confront: stale fixed-name temp files, and verification that an exit code simply cannot give you.
What a Timed-Out Agent Leaves Behind
When you run an unattended agent with a time limit, the platform sends the process a signal once the limit is reached. The problem is that you have no control over what the agent was doing at that exact instant.
Three kinds of debris typically remain.
| What is left | What happens | Downstream impact |
| --- | --- | --- |
| Half-written output file | Hit by SIGTERM mid-write; only the start is on disk | The next step treats it as valid input and propagates a broken artifact |
| Fixed-name temp file | A temp left over from a prior failure is read as-is this time | The prior run's content is silently mixed into this run's output |
| Lock file | A lock meant to be cleared on clean exit survives | The next start misreads it as "already running" and skips |
The second is the nastiest. A half-written file (the first row) looks wrong the moment you read it, so you notice. But a stale fixed-name temp wears a mask of plausible correctness: the file count checks pass, the integrity checks pass, and you don't catch it until you compare the content line by line. I once nearly published an output that had a paragraph from a different article bleed into it through exactly this path.
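The third row, the surviving lock, can be told apart from a live one if the lock records who owns it. Here is a minimal sketch, assuming the lock file contains the PID of the process that created it. The article does not say how its lock is built, so `lock_is_stale` and that file layout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the lock at $1 looks stale,
# 1 when there is no lock or its owner still appears to be running.
lock_is_stale() {
  local lock="$1" pid
  [ -f "$lock" ] || return 1             # no lock file: nothing is stale
  pid="$(cat "$lock" 2>/dev/null)"
  case "$pid" in
    ''|*[!0-9]*) return 0 ;;             # empty or non-numeric: treat as stale
  esac
  # kill -0 delivers no signal; it only reports whether this user could
  # signal that PID. A PID owned by another user also fails this check.
  if kill -0 "$pid" 2>/dev/null; then
    return 1                             # owner is alive: the lock is live
  fi
  return 0
}
```

A start script could call `lock_is_stale "$LOCK" && rm -f "$LOCK"` before deciding to skip. PIDs can be reused, so this is a heuristic and not a guarantee.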
Why an Exit Code Alone Cannot Protect You
Leaning your unattended pipeline's defenses on the exit code is a natural instinct. But none of the three cases above is fully captured by it.
Half-written file: the process exits non-zero, yet the file already partially exists. Stopping the downstream "because it failed" never cleans up the file itself.
Stale temp contamination: this run exits zero. It merely read last time's debris, so the exit code raises no alarm.
Empty file: if the destination is opened but the process dies before content is written, a zero-byte file can be left behind and treated as success.
So the defense needs two independent axes. One is atomicity: never expose a write in its half-finished state. The other is a content assertion — independent of the exit code — that confirms what landed is actually what this run intended.
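The two axes can be sketched as a small wrapper that judges a run on both its exit status and the state of the artifact. This is an illustration under assumptions, not the pipeline's actual code: `run_and_judge` is a hypothetical name, and the producer is whatever command writes the output path passed as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a producer command, then judge the run on
# two independent axes: its exit status and the artifact it left behind.
run_and_judge() {
  local out="$1"; shift
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    # A failed run may still have left a partial file; remove it so no
    # later stage can pick it up.
    rm -f "$out"
    echo "producer failed (exit $status); partial output removed" >&2
    return 1
  fi
  if [ ! -s "$out" ]; then
    # Exit code said success, but the artifact is missing or zero bytes.
    echo "producer exited 0 but $out is missing or empty" >&2
    return 1
  fi
  return 0
}
```

This only covers the first and third bullets. Catching stale-temp contamination needs a check on the content itself, because in that case both the exit status and the file size look healthy.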
The most reliable atomic write on a filesystem is the old discipline: write everything into a differently named temp file, then rename it onto the destination path. A rename (mv) on the same filesystem is atomic, so an observer of the destination path only ever sees either "fully written" or "not there yet." The in-between state is never visible from outside.
Wired into an unattended bash run, it looks like this.
```shell
#!/usr/bin/env bash
set -euo pipefail

# Destination, and a unique temp in the SAME directory
DEST="out/articles/result.json"
RUN_ID="$(date +%s)-$$"   # seconds + PID = unique per run
TMP="$(dirname "$DEST")/.tmp.${RUN_ID}.json"

# Always clean up the temp, however we die
cleanup() { rm -f "$TMP"; }
trap cleanup EXIT INT TERM

# Have the agent write into the temp
agy run generate-article --out "$TMP"

# Validate what landed BEFORE publishing atomically
[ -s "$TMP" ] || { echo "empty output"; exit 1; }                   # reject zero-byte
python3 -c "import json,sys; json.load(open(sys.argv[1]))" "$TMP"   # reject broken JSON

mv -f "$TMP" "$DEST"   # rename on same FS is atomic
trap - EXIT
echo "published: $DEST"
```
Three things carry the weight here. First, have the agent write into the temp: if it writes the destination directly, that path is corrupted the instant the agent is killed. Second, trap ... EXIT INT TERM removes the temp no matter how the run ends; the timeout's SIGTERM is caught here too. Third, validate size and content right before mv, and only proceed to publish if validation passes.
In Node the idea is identical — slip an fsync in before the rename.
```js
import { open, rename } from "node:fs/promises";
import { dirname, join } from "node:path";

async function atomicWrite(dest, data) {
  const runId = `${Date.now()}-${process.pid}`;
  const tmp = join(dirname(dest), `.tmp.${runId}`);

  // Write through one handle so the same descriptor can be fsynced
  const fh = await open(tmp, "w");
  try {
    await fh.writeFile(data, "utf8");
    await fh.sync(); // flush the OS buffer to disk before renaming
  } finally {
    await fh.close();
  }

  await rename(tmp, dest); // atomic on the same FS
}
```
The reason for fsync (fh.sync() in Node) is that even though rename itself is atomic, a power loss could in theory leave you with "the rename committed but the file's data has not reached the disk yet." For something running unattended around the clock, I prefer not to skip it.
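The bash script earlier in the article has no matching flush. One way to add it is sketched below, assuming GNU coreutils 8.24 or later, where `sync` accepts file operands and flushes only those files. Other `sync` implementations may behave differently. `flush_file` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flush one file's data to disk before it is renamed.
# Assumes GNU coreutils sync (8.24+), which takes file operands.
flush_file() {
  local f="$1"
  [ -f "$f" ] || { echo "flush_file: no such file: $f" >&2; return 1; }
  sync "$f"
}
```

In the script it would sit between validation and publish: `flush_file "$TMP" && mv -f "$TMP" "$DEST"`.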
Two Disciplines to Avoid Stepping on Last Run's Debris
Atomic writes eliminate the half-written file, but stale fixed-name temp contamination needs separate discipline. After getting burned by it, I now keep two rules without exception.
1. Never give a temp file a fixed name
A fixed name like /tmp/insert.txt is a breeding ground: the debris from a run that failed to write gets read verbatim next time. Give each run a unique name (seconds + PID, or a name that includes the target slug) as the code above does, and there is structurally no way for this run to pick up last run's temp. Also keep the temp in the same directory as the destination. Spanning two filesystems (say /tmp and your working directory) breaks the guarantee: rename(2) fails across filesystems, so mv falls back to copy-then-delete and the atomicity is gone.
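If seconds plus PID feels too predictable, `mktemp` can generate the unique part. A minimal sketch, with `make_temp_beside` as a hypothetical helper name; it keeps the `.tmp.` prefix so the temp still matches the article's naming scheme.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a unique, empty temp file next to the
# destination so the final rename stays on one filesystem.
make_temp_beside() {
  local dest="$1" dir
  dir="$(dirname "$dest")"
  mkdir -p "$dir" || return 1
  # mktemp replaces the trailing X's with random characters and creates
  # the file itself, so two concurrent runs cannot end up with the same name.
  mktemp "$dir/.tmp.$(basename "$dest").XXXXXX"
}
```

Usage would be `TMP="$(make_temp_beside "$DEST")"`. Note that `mktemp` creates the file with mode 0600, and the rename carries that mode to the destination, so add a `chmod` before publishing if other users need to read the result.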
2. Sweep old temps at startup
The trap handles the common case, but if the process is force-killed with SIGKILL (kill -9), even the trap does not run. So I add a safety net that sweeps my own temps older than a threshold at startup.
```shell
# At startup: remove abandoned temps older than 1 hour (only my naming scheme)
find out/articles -name '.tmp.*' -mmin +60 -delete 2>/dev/null || true
```
Restricting the sweep to "only files matching my own naming scheme" is essential. A broad pattern risks deleting an in-progress file from another agent working in the same directory.
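The same sweep can be narrowed further. The sketch below wraps it in a hypothetical `sweep_stale_temps` function that stays in one directory and touches only regular files; the 60-minute default mirrors the threshold above, and the options used are supported by GNU and BSD `find`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete only this pipeline's abandoned temps.
#   $1 = directory to sweep, $2 = age threshold in minutes (default 60)
sweep_stale_temps() {
  local dir="$1" max_age_min="${2:-60}"
  [ -d "$dir" ] || return 0              # nothing to sweep yet
  # -maxdepth 1 keeps the sweep out of subdirectories; -type f skips
  # directories and symlinks that happen to match the pattern.
  find "$dir" -maxdepth 1 -type f -name '.tmp.*' \
       -mmin "+$max_age_min" -delete
}
```

Calling `sweep_stale_temps out/articles 60` at startup leaves fresh temps and every file outside the `.tmp.` scheme alone.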
Separate the Post-Write Content Assertion From the Exit Code
The final defense is content verification independent of the exit code. Before publishing with mv, confirm that what landed is what this run intended. Beyond rejecting zero-byte or broken syntax, I check run-specific markers: that this run's slug appears in the body, and that the previous run's slug did not sneak in.
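```shell
# Just before publish: confirm this run's slug is present and no prior-run trace
grep -qF -- "$SLUG" "$TMP" || { echo "self-marker missing"; exit 1; }
# An empty PREV_SLUG would match every line, so skip the check in that case
if [ -n "${PREV_SLUG:-}" ] && grep -qF -- "$PREV_SLUG" "$TMP"; then
  echo "previous-run contamination"; exit 1
fi
```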
There is one more operational call worth making. Do not fold verification and publishing into a single step. If you write "verify, and on pass mv in the same flow," a mid-step failure in the verification can be swallowed, and you proceed to publish unverified content. I keep verification as a gate I confirm has passed, then publish as a separate step. It is unglamorous, but across hundreds of unattended runs that separation earns its keep.
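One way to make that separation concrete is to have verification leave a record that publishing must find. The sketch below is an assumption about how such a gate could look, not the article's own implementation: `verify_artifact` and `publish_artifact` are hypothetical names, and the `.verified` marker holding a `cksum` of the temp file is an invented convention.

```shell
#!/usr/bin/env bash
# Hypothetical two-step gate. verify_artifact records a pass by writing
# the temp file's checksum to a marker; publish_artifact refuses to
# rename unless that marker matches the file it is about to publish.
verify_artifact() {
  local tmp="$1" slug="$2" prev_slug="${3:-}"
  rm -f "${tmp}.verified"                # drop any earlier verdict first
  [ -s "$tmp" ] || { echo "empty output" >&2; return 1; }
  [ -n "$slug" ] || { echo "no slug given" >&2; return 1; }
  grep -qF -- "$slug" "$tmp" || { echo "self-marker missing" >&2; return 1; }
  if [ -n "$prev_slug" ] && grep -qF -- "$prev_slug" "$tmp"; then
    echo "previous-run contamination" >&2
    return 1
  fi
  cksum < "$tmp" > "${tmp}.verified"     # record exactly what passed
}

publish_artifact() {
  local tmp="$1" dest="$2"
  [ -f "${tmp}.verified" ] || { echo "not verified: $tmp" >&2; return 1; }
  [ "$(cksum < "$tmp")" = "$(cat "${tmp}.verified")" ] \
    || { echo "content changed after verification" >&2; return 1; }
  mv -f "$tmp" "$dest" && rm -f "${tmp}.verified"
}
```

Run as two separate steps, `verify_artifact "$TMP" "$SLUG" "$PREV_SLUG"` and then `publish_artifact "$TMP" "$DEST"`, a verification that dies halfway leaves no marker, so the publish step has nothing to act on.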
Where to Draw the Line
Almost none of this is needed for interactive runs. If you are watching the screen, you spot a half-written file immediately and delete it by hand. Atomic writes and content assertions truly pay off when the run is unattended and the output becomes the input of a later automated stage. Pipelines that chain artifacts through the Antigravity CLI or scheduled tasks are exactly that case.
Conversely, for a one-shot where a human inspects the result before moving on, all of this is overkill. My rule for whether to spend the cost is simple: does a later stage trust this output before anyone looks at it? Building pipelines as an indie developer, the temptation is to harden everything — but put the hardening in the wrong place and you have only added complexity.
Pick one artifact — the most downstream one nobody inspects — and replace just its write with temp-plus-rename. That alone should sharply cut the "wake up to a broken artifact" class of incident.
Thank You for Reading
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.