Antigravity Lab · Agents & Manager · 2026-06-13 · Advanced

Instruction Drift in Scheduled Agents — A Three-Layer Design for Keeping Definitions, Docs, and Reality Aligned

Scheduled agents keep logging success even after their instructions diverge from reality. Here is the three-layer drift-detection design — definition, documentation, reality — I built after silent failures in my own operations.



I was reading back through the logs of my overnight scheduled runs when something caught my eye. As an indie developer I have Antigravity agents run recurring jobs against several of my apps — dependency updates, crash-report triage, that sort of thing — and one task whose runbook said "runs twice a week" was, according to the scheduler, running every single day.

I had changed the frequency myself a few weeks earlier. I updated the scheduler. I forgot to update the runbook. And because the task printed a cheerful success log every night, nothing ever prompted me to look.

That particular mismatch was harmless. But once I started digging, I found two uglier ones. A data file the runbook referenced had been silently reading as empty ever since a folder rename. And one "zombie task" was still running on schedule even though its instruction document had been deleted in a reorganization.

A scheduled agent does not stop when its instructions diverge from reality. Now that Antigravity 2.0 has made scheduled and background agents an everyday tool, I have come to treat this "instruction drift" as a design problem that needs detection machinery — not a writing problem that careful documentation will solve.

Instructions start aging the moment you write them

After running these pipelines for a while, I noticed that drift arrives through essentially three paths.

1. Stranded definition changes. You change a schedule's frequency, time, or enabled state, but the runbook or AGENTS.md keeps describing the old behavior. The point of the change was the behavior itself, so syncing the docs gets deferred, and in my experience deferred syncs almost never happen.

2. Moved or renamed references. A data file or sub-document the runbook reads gets relocated during a refactor. The scary part is that many shell-based procedures do not stop on an empty read. By default a shell pipeline reports only the exit status of its last command, so depending on how pipes and redirects are arranged, a failed read upstream is masked and the whole job still "succeeds." And agents lean toward completing the work with whatever information is at hand rather than reporting that something is missing.

3. Vanished documents. You consolidate instruction documents and an old schedule definition survives on its own — a zombie task. The mirror image also appears: orphan documents that no definition references anymore. Each instance is small, but uninventoried they accumulate until you can no longer say how trustworthy the operation is.

What all three share is this: a success log proves nothing about alignment. An agent does its best with the situation it is given, so it produces plausible output even with missing references and stale instructions. "It's running, so it's fine" is exactly the assumption this problem exploits.
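One way to take the second path off the table is to make every read of a referenced file fail loudly. The following is a minimal sketch, not the checker from my own pipeline: the names `read_required` and `MissingReferenceError` are hypothetical, and it assumes the referenced files are UTF-8 text.

```python
from pathlib import Path


class MissingReferenceError(RuntimeError):
    """Raised when a file the runbook depends on is absent or empty."""


def read_required(path: str) -> str:
    """Return the file's text, refusing to continue on a missing or empty file."""
    p = Path(path)
    if not p.is_file():
        # A renamed folder lands here instead of yielding an empty string.
        raise MissingReferenceError(f"reference not found: {path}")
    text = p.read_text(encoding="utf-8")
    if not text.strip():
        # Whitespace-only content is treated the same as an empty file.
        raise MissingReferenceError(f"reference is empty: {path}")
    return text
```

A job that loads its inputs through a wrapper like this ends with an error when a reference goes missing, so the success log at least stops covering for the empty read.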

Definition, documentation, reality — a three-layer model

To design countermeasures, I split the operation into three layers.

  • Layer 1: Definition — what the scheduler actually runs, when, and whether it is enabled
  • Layer 2: Documentation — what AGENTS.md, runbooks, and procedures claim
  • Layer 3: Reality — what execution logs and artifacts show

Integrity checking decomposes into the three pairwise comparisons: definition versus documentation, documentation versus reality, definition versus reality. My "twice a week versus daily" was a definition–documentation mismatch; the empty reads were documentation–reality; the zombie task was a missing link between definition and documentation.
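The definition-versus-documentation comparison from the "twice a week versus daily" case can be sketched as follows. This is an illustration under narrow assumptions: `runs_per_week` and `check_frequency` are hypothetical names, the cron expression has five fields with one fixed minute and hour, day-of-month and month are both `*`, and the documented figure has already been pulled out of the runbook by some other means.

```python
from typing import Optional


def runs_per_week(cron: str) -> int:
    """Count weekly runs implied by a simple five-field cron expression."""
    minute, hour, dom, month, dow = cron.split()
    if dom != "*" or month != "*":
        raise ValueError("sketch only handles '*' for day-of-month and month")
    if not (minute.isdigit() and hour.isdigit()):
        raise ValueError("sketch assumes one fixed time per day")
    if dow == "*":
        return 7
    days = set()
    for part in dow.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            days.update(d % 7 for d in range(lo, hi + 1))
        else:
            days.add(int(part) % 7)  # cron accepts both 0 and 7 for Sunday
    return len(days)


def check_frequency(name: str, cron: str, documented_per_week: int) -> Optional[str]:
    """Return a mismatch message, or None when definition and docs agree."""
    actual = runs_per_week(cron)
    if actual != documented_per_week:
        return (f"{name}: scheduler runs {actual}x/week, "
                f"runbook says {documented_per_week}x/week")
    return None
```

Called with a daily schedule and a documented frequency of 2, `check_frequency` returns a message instead of `None`, which is the signal a nightly success log never gave me.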

Before comparing anything, one design decision matters: decide which layer is canonical. I treat the definition (layer 1) as truth and documentation as subordinate. The scheduler is what actually runs; it does not lie. Without a declared source of truth, every discovered mismatch triggers a debate about which side to fix.

Then make the canonical layer machine-readable in one place. If frequencies exist only inside a scheduler's admin UI, cross-checking cannot be automated, so I keep a tasks.yaml in the repository as the single source of truth.

# tasks.yaml — the canonical record of scheduled runs (frequencies live here and nowhere else)
tasks:
  - name: nightly-dependency-update
    schedule: "0 3 * * *"        # daily at 03:00
    doc: docs/agents/nightly-dependency-update.md
    refs:
      - data/allowlist.json
      - docs/shared/update-policy.md
 
  - name: crash-triage
    schedule: "0 6 * * 1,4"      # Mon & Thu at 06:00
    doc: docs/agents/crash-triage.md
    refs:
      - data/crash-thresholds.yaml

The important fields are doc and refs. Recording which document a task obeys and which files it reads lets the checker script walk all three layers from a single starting point.
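To show what walking the layers from that starting point might look like, here is a minimal sketch of such a checker. It is an assumption-laden illustration: `check_tasks` is a hypothetical name, the task list is passed in already parsed (for example via PyYAML's `yaml.safe_load`, if that library is installed), and instruction documents are assumed to live as `.md` files directly under `docs/agents`, as in the sample above.

```python
from pathlib import Path


def check_tasks(tasks: list[dict], root: str, docs_dir: str = "docs/agents") -> list[str]:
    """Report zombie tasks, missing or empty refs, and orphan documents."""
    base = Path(root)
    problems = []
    claimed_docs = set()

    for task in tasks:
        name = task["name"]
        doc = task.get("doc")
        if doc:
            claimed_docs.add(Path(doc).as_posix())
        # A definition whose instruction document is gone is a zombie task.
        if not doc or not (base / doc).is_file():
            problems.append(f"{name}: zombie task, doc missing ({doc})")
        for ref in task.get("refs", []):
            target = base / ref
            if not target.is_file():
                problems.append(f"{name}: ref missing ({ref})")
            elif target.stat().st_size == 0:
                problems.append(f"{name}: ref empty ({ref})")

    # The mirror image: documents that no definition points at.
    docs_path = base / docs_dir
    if docs_path.is_dir():
        for found in sorted(docs_path.glob("*.md")):
            rel = found.relative_to(base).as_posix()
            if rel not in claimed_docs:
                problems.append(f"orphan document: {rel}")
    return problems
```

An empty list means the definition and documentation layers line up on disk; anything else is a short list of lines a human can review.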
