When Background Agents Run Twice — Stopping Double Execution with Leases and Fencing Tokens

The same scheduled job fires from two machines at once and they overwrite each other's output. Here is how to stop that failure mode at the root in Antigravity 2.0 background agents, using leases and fencing tokens, with working code.

One morning, an artifact that should have been freshly generated was half-overwritten with stale content. The logs told the story: the same scheduled job had fired from two machines at nearly the same minute, and one started writing before the other had finished.

As an indie developer at Dolice Labs, I run several blog operations on background agents. The intent was redundancy — while one machine sleeps, another picks up the work. But in the instant both were awake, both grabbed the same job. This is not a "forgot to take the lock" story. It happens even when you do take the lock. Let me walk through why, and how to stop it.

Why mutual exclusion alone does not stop double execution

The intuition is simple: take a lock at the start of the job, release it at the end. In the world of background agents, that premise quietly collapses.

A process can hold a lock and then stall for a long time: a long garbage-collection pause, a laptop going to sleep, a slow model call. Meanwhile the lock's TTL expires and another machine legitimately acquires it. The first machine wakes up still believing it owns the lock and begins writing. At that moment two processes each believe they are the lock holder.

So the problem is not acquiring exclusivity; it is being unable to guarantee continued possession. Miss this distinction and every patch — longer TTLs, more heartbeats — only lowers the probability instead of eliminating it.
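
The stall scenario above can be replayed in a few lines. This is a minimal sketch with a simulated clock, not the production setup: `TtlLock`, `simulate_stall`, and the machine names `mac-a` and `mac-b` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtlLock:
    """A plain TTL lock: whoever asks after expiry gets it. No tokens."""
    ttl: float
    holder: Optional[str] = None
    expires_at: float = 0.0

    def acquire(self, who: str, now: float) -> bool:
        if self.holder is not None and now < self.expires_at:
            return False          # someone else still holds a live lock
        self.holder, self.expires_at = who, now + self.ttl
        return True


def simulate_stall() -> list:
    lock = TtlLock(ttl=120)
    artifact = []                 # stands in for the shared output file

    a_thinks_it_owns = lock.acquire("mac-a", now=0)
    # mac-a stalls here (sleep, GC pause, slow model call) ...
    b_thinks_it_owns = lock.acquire("mac-b", now=200)   # TTL ran out at t=120
    if b_thinks_it_owns:
        artifact.append("mac-b: fresh")
    # ... mac-a wakes at t=300 and never re-checks the lock before writing
    if a_thinks_it_owns:
        artifact.append("mac-a: stale")
    return artifact


print(simulate_stall())   # ['mac-b: fresh', 'mac-a: stale']
```

Both acquisitions were legitimate when they happened, and the stale write still lands last. Nothing in the lock itself can prevent that, because the lock is never consulted at write time.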

The idea of a lease

Reframe the lock as a lease: time-bounded ownership that is always assumed to expire. The holder must explicitly renew it before it lapses. The moment renewal stops, ownership is considered surrendered automatically.

The key property is that each time a lease is granted, the issued fencing token increases monotonically. Every acquisition produces a strictly larger token. That lets you distinguish an "old owner" from a "new owner" right before a write, using nothing but a numeric comparison.

Aspect	Plain mutual-exclusion lock	Lease + fencing token
Recovery from a stall	The old holder writes anyway	The old token is rejected at the write site
Ownership decision	Relies on "I think I hold it"	Decided mechanically by number size
Effect of clock skew	Breaks if TTL judgment drifts	Order is set by the token, not the clock
Required assumption	Everyone behaves honestly	The write target can verify the token
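
The right-hand column can be sketched in memory before touching any files. This is an illustrative model under assumed names (`LeaseManager`, `FencedStore`), using the same simulated clock idea: the manager hands out strictly increasing tokens, and the store, not the writer, decides whether a write is allowed.

```python
class LeaseManager:
    """Grants time-bounded leases; every grant gets a strictly larger token."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.token = 0
        self.expires_at = 0.0

    def acquire(self, now: float):
        if now < self.expires_at:
            return None               # lease still live, nothing granted
        self.token += 1               # monotonic: never reused, never lowered
        self.expires_at = now + self.ttl
        return self.token


class FencedStore:
    """The write target remembers the largest token it has accepted."""

    def __init__(self):
        self.last_token = 0
        self.value = None

    def write(self, token: int, value: str) -> bool:
        if token < self.last_token:
            return False              # old owner: rejected by comparison alone
        self.last_token = token
        self.value = value
        return True


leases = LeaseManager(ttl=120)
store = FencedStore()

token_a = leases.acquire(now=0)       # first grant
token_b = leases.acquire(now=200)     # granted after the first lease expired

accepted_b = store.write(token_b, "fresh")
accepted_a = store.write(token_a, "stale")   # machine A returning from a stall
print(token_a, token_b, accepted_b, accepted_a, store.value)
# 1 2 True False fresh
```

The stalled machine's opinion about its own ownership never enters the decision. Only the two integers do.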

Implementing lease acquire and renew

First, represent the lease on shared storage. For clarity I show a file-based version (a shared directory or a small KV). The essentials: increase the token monotonically on acquire, and keep renewing via heartbeat while you hold it.

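#!/usr/bin/env bash
# acquire_lease.sh: acquire a lease; print the token on success, exit 11 if not acquired
set -euo pipefail

LEASE_DIR="${LEASE_DIR:-/shared/leases}"
JOB="${1:?usage: acquire_lease.sh JOB}"   # job name
HOLDER="$(hostname)-$$"        # machine + PID of this script
TTL_SEC="${TTL_SEC:-120}"      # lease lifetime
NOW="$(date +%s)"

LEASE="${LEASE_DIR}/${JOB}.lease"
TOKENF="${LEASE_DIR}/${JOB}.token"
MUTEX="${LEASE_DIR}/${JOB}.mutex"
mkdir -p "$LEASE_DIR"

# Serialize the check, the token bump, and the lease write. Without this, two
# acquirers can both read the same token and both "win" with the same number.
# mkdir fails if the directory exists (assumes the shared filesystem makes mkdir atomic).
# Caveat: an acquirer that crashes in here leaves the mutex behind; remove it by hand.
if ! mkdir "$MUTEX" 2>/dev/null; then
  echo "another acquirer is mid-acquire" >&2
  exit 11
fi
trap 'rmdir "$MUTEX" 2>/dev/null || true' EXIT

# Is an existing lease still alive?
if [ -f "$LEASE" ]; then
  EXP="$(awk -F= '/^expires=/{print $2}' "$LEASE")"
  if [ "${EXP:-0}" -gt "$NOW" ]; then
    echo "lease held until ${EXP}, now ${NOW}" >&2
    exit 11                    # still valid -> give up acquiring
  fi
fi

# It had expired, so acquire. Increase the token monotonically (this is the point).
TOKEN="$(( $(cat "$TOKENF" 2>/dev/null || echo 0) + 1 ))"
echo "$TOKEN" > "${TOKENF}.tmp"
mv -f "${TOKENF}.tmp" "$TOKENF"  # never leave a half-written token file

cat > "${LEASE}.tmp" << LEASEDATA
holder=${HOLDER}
token=${TOKEN}
expires=$(( NOW + TTL_SEC ))
LEASEDATA
mv -f "${LEASE}.tmp" "$LEASE"   # atomic swap

echo "$TOKEN"                    # return the acquired token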

The atomic swap via mv -f and the monotonic token file are what matter. The token is "how many times the lease has been freshly granted," and it never decreases.

While holding, a separate process (or a background loop in the same one) sends heartbeats, pushing expires forward.

#!/usr/bin/env bash
# renew_lease.sh: extend the TTL only if you are still the holder
set -euo pipefail
LEASE_DIR="${LEASE_DIR:-/shared/leases}"   # same default as acquire_lease.sh
LEASE="${LEASE_DIR}/$1.lease"; HOLDER="$2"; MY_TOKEN="$3"; TTL_SEC="${TTL_SEC:-120}"

# A missing lease file means there is nothing left to renew
[ -f "$LEASE" ] || { echo "lost lease (file missing)" >&2; exit 12; }

CUR_HOLDER="$(awk -F= '/^holder=/{print $2}' "$LEASE")"
CUR_TOKEN="$(awk -F= '/^token=/{print $2}'  "$LEASE")"

# If either the holder or token differs, you no longer own it
if [ "$CUR_HOLDER" != "$HOLDER" ] || [ "$CUR_TOKEN" != "$MY_TOKEN" ]; then
  echo "lost lease (holder/token changed)" >&2
  exit 12
fi

# Per-process temp name so a concurrent acquire cannot clobber it
cat > "${LEASE}.tmp.$$" << LEASEDATA
holder=${HOLDER}
token=${MY_TOKEN}
expires=$(( $(date +%s) + TTL_SEC ))
LEASEDATA
mv -f "${LEASE}.tmp.$$" "$LEASE"

When renewal returns exit 12, stop the work right there. Writing an artifact after losing the lease is exactly the moment double execution corrupts output.
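
For jobs driven from Python rather than bash, "stop the work right there" can be sketched as a heartbeat thread that raises a flag, plus a job loop that checks the flag between steps. `start_heartbeat` and `run_steps` are hypothetical helpers; `renew` is any callable that returns False once the lease is lost, for example a wrapper that runs renew_lease.sh and checks for exit code 0.

```python
import threading
from typing import Callable, Iterable


def start_heartbeat(renew: Callable[[], bool], interval: float):
    """Call renew() every `interval` seconds; set `lost` on the first failure."""
    lost = threading.Event()
    stop = threading.Event()

    def loop():
        # stop.wait() returns False on timeout (time to renew), True once stopped
        while not stop.wait(interval):
            if not renew():
                lost.set()
                return

    threading.Thread(target=loop, daemon=True).start()
    return lost, stop


def run_steps(steps: Iterable[Callable[[], None]], lost: threading.Event) -> int:
    """Run steps in order, abandoning the job as soon as the lease is lost."""
    done = 0
    for step in steps:
        if lost.is_set():
            break                 # lease gone: do not start another step
        step()
        done += 1
    return done
```

The flag only narrows the window. A stall can still begin after the last check, which is why the write itself has to be verified as well.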

Verifying the fencing token at the write site

This is the heart of the design. Holding the lease does not mean "you may write." Write only when the target confirms your token is at least the largest it has seen. That reliably rejects an old holder returning from a stall.

# fenced_write.py: write an artifact with token verification
import argparse, os, sys, tempfile

def fenced_write(target_path: str, token: int, payload: bytes) -> bool:
    """Write only if token is >= the largest token observed so far."""
    guard = target_path + ".fence"          # records the last token written
    last = 0
    if os.path.exists(guard):
        with open(guard) as g:
            last = int(g.read().strip() or "0")

    if token < last:
        # Reject a write from an old holder (a machine returning from a stall)
        raise PermissionError(f"stale token {token} < last {last}; refusing write")

    # Advance the fence first, then atomically replace the body
    with open(guard, "w") as g:
        g.write(str(token))
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target_path) or ".")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    os.replace(tmp, target_path)            # atomic swap
    return True

if __name__ == "__main__":
    # Command-line entry point for the job wrapper: --token and --target are
    # flags, and the payload is read from stdin.
    parser = argparse.ArgumentParser()
    parser.add_argument("--token", type=int, required=True)
    parser.add_argument("--target", required=True)
    args = parser.parse_args()
    fenced_write(args.target, args.token, sys.stdin.buffer.read())

By rejecting token < last, even a stale machine that still believes it is the holder cannot land its write. Deciding ownership at the instant of the write, using only a numeric comparison, is the strength of this pattern.
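
One gap remains in a read-compare-write sequence: two writers can both read the fence before either updates it. A possible hardening, sketched here under the assumption that the storage honors advisory locks, is to hold an exclusive `fcntl.flock` on the fence file for the whole sequence. `fenced_write_locked` is a hypothetical variant name. `flock` is Unix-only and is not reliably honored on every network share, so treat this as something to verify on your own storage, not a guarantee.

```python
import fcntl
import os
import tempfile


def fenced_write_locked(target_path: str, token: int, payload: bytes) -> None:
    """Compare, advance the fence, and replace the body under one exclusive lock."""
    guard = target_path + ".fence"
    directory = os.path.dirname(target_path) or "."
    # Open (or create) the fence file without truncating it, then lock it.
    fd = os.open(guard, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as g:
        fcntl.flock(g, fcntl.LOCK_EX)       # released when the file is closed
        last = int(g.read().strip() or "0")
        if token < last:
            raise PermissionError(f"stale token {token} < last {last}")
        # Advance the fence first, as in the unlocked version.
        g.seek(0)
        g.truncate()
        g.write(str(token))
        g.flush()
        os.fsync(g.fileno())
        # Then atomically replace the artifact body while still holding the lock.
        tmp_fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(tmp_fd, "wb") as f:
            f.write(payload)
        os.replace(tmp, target_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "today.json")
        fenced_write_locked(target, 2, b"fresh")
        try:
            fenced_write_locked(target, 1, b"stale")
        except PermissionError as err:
            print("rejected:", err)
        with open(target, "rb") as f:
            print(f.read())
```

Run directly, the demo writes with token 2, then shows token 1 being refused while the artifact keeps the fresh content.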

Wrapping the whole job into one flow

Lining up acquire, renew, and verified write, the agent's job body wraps like this.

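export LEASE_DIR=/shared/leases   # exported so acquire and renew see the same directory
TOKEN="$(./acquire_lease.sh daily-publish)" || {
  rc=$?
  # 11 means a live lease exists; anything else is a real failure
  [ "$rc" -eq 11 ] && { echo "another holder is running; exiting cleanly"; exit 0; }
  exit "$rc"; }
# acquire_lease.sh names the holder after its own PID, so read the name back
# from the lease instead of rebuilding it from this shell's $$
HOLDER="$(awk -F= '/^holder=/{print $2}' "$LEASE_DIR/daily-publish.lease")"
DRAFT="$(mktemp)"                 # private staging file for the agent's output

# heartbeat in the background; if renewal fails, signal the main job to stop
( while sleep 45; do
    ./renew_lease.sh daily-publish "$HOLDER" "$TOKEN" || { kill -TERM "$$"; exit; }
  done ) &
HB=$!
trap 'kill "$HB" 2>/dev/null || true; rm -f "$DRAFT"' EXIT

# do the agent's real work here (generate, format); assumed to print the artifact to stdout
run_agent_job > "$DRAFT"

# the artifact write must go through token verification (payload on stdin)
python3 fenced_write.py --token "$TOKEN" --target /shared/out/today.json < "$DRAFT"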

If acquisition fails, exit cleanly (exit 0). It simply was not your turn, so there is no reason to raise an error. An operation where error alerts keep ringing buries the truly abnormal ones.
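
If a supervisor script decides what to alert on, the same idea can be expressed as a small mapping from exit code to severity. This is a sketch: the codes 11 and 12 come from the scripts above, while the function name `classify` and the severity labels are assumptions made for illustration.

```python
NOT_MY_TURN = 11   # acquire_lease.sh: a live lease already exists
LOST_LEASE = 12    # renew_lease.sh: holder or token changed


def classify(returncode: int) -> str:
    """Map a job's exit code to how loudly it should be reported."""
    if returncode == 0:
        return "ok"
    if returncode == NOT_MY_TURN:
        return "skip"     # expected on the standby machine; log it, no alert
    if returncode == LOST_LEASE:
        return "warn"     # work was abandoned mid-run; worth a look, not a page
    return "alert"        # anything else is genuinely abnormal


print([classify(code) for code in (0, 11, 12, 1)])
# ['ok', 'skip', 'warn', 'alert']
```

Keeping "not my turn" out of the alert channel is what lets the remaining alerts mean something.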

A small judgment that paid off in solo operation

Running scheduled jobs across two Macs taught me how much it matters to place the verification gate as far downstream as possible. Early on I was content with a lock at job start and did nothing right before the write. Accidents always happen in the long gap between start and write.

The other lesson: keep the fence file (.fence) in the same place and with the same permissions as the artifact. Put it elsewhere and one side may sync while the other does not, rewinding the token memory and turning the verification itself into a lie. In the Dolice Labs setup, things stabilized once I always co-located artifact, fence, and lease on the same shared target, with a small rule to exclude only the fence from backups.

Deciding how far to take it

Not every job needs this. For idempotent read-only jobs, or ones where "last writer wins" is fine, a lease is overkill. The cost is worth it when three conditions overlap: artifacts that corrupt if written halfway, schedules that can fire from two or more machines, and devices where stalls and sleep happen daily.

Conversely, when all three are present and you run on a plain lock, the accident is only a matter of time. Rather than lowering the probability, make old writes structurally unreachable — that, I believe, is the role of leases and fencing tokens.

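As a first step, add fence verification to just the single artifact whose corruption hurts the most. Run it overnight, watch the number in .fence climb quietly as new leases write, and check the log for stale-token rejections. Each rejection is a double execution that would otherwise have landed, so you will learn whether it had been happening at all.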
