Keep a Tamper-Evident Audit Log of Your Autonomous Agent's Actions

To trace and verify later the decisions and actions an Antigravity agent takes autonomously, design an append-only audit log whose hash chain detects tampering. A working implementation is included.

Once an agent starts acting autonomously, a morning comes when you ask yourself: "Who made this change?" The answer is "the agent," but the problem is what comes after. When, on what input, by what reasoning, did it execute what? Unless you can retrace that precisely later, you cannot hand off autonomous operation with peace of mind.

I run several sites on my own, with an agent that generates articles overnight, runs them through quality gates, and pushes the result without my involvement. That is convenient, but when something goes wrong I cannot find the cause unless I can reconstruct when the problem started and why. This is exactly where ordinary app logs fall short.

App logs and audit logs serve different purposes

An app log is for a developer to understand behavior. It can be deleted once debugging is done, and its format changes casually. An audit log, by contrast, is for verifying after the fact, from a third party's viewpoint, that "it really did behave that way."

This difference maps straight to the requirements. An audit log is append-only; you must not rewrite past entries. Order must be guaranteed, omission must be detectable, and you must be able to confirm later that it has not been tampered with. Since the agent records its own actions, you want to technically guarantee that "the record has not been conveniently rewritten afterward."

Detect tampering with a hash chain

The core of tamper detection is weaving "the previous entry's hash" into each entry. Same idea as a blockchain, but you need no distributed consensus. A single file works perfectly well.

import hashlib
import json
import time
 
GENESIS = "0" * 64
 
def make_entry(prev_hash: str, payload: dict) -> dict:
    entry = {
        "ts": time.time(),
        "prev_hash": prev_hash,
        "payload": payload,
    }
    # compute the hash over the whole entry (including prev_hash)
    serialized = json.dumps(entry, sort_keys=True, ensure_ascii=False)
    entry["hash"] = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    return entry

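Including prev_hash in the hashed data is the crux. If someone rewrites an entry in the middle, its recomputed hash no longer matches the stored one; if they also recompute that hash, it conflicts with the prev_hash held by the next entry. Either way, the point where the chain breaks tells you where the tampering happened. To hide an edit, an attacker would have to recompute every later hash as well, so it helps to keep a copy of the latest hash somewhere the agent cannot write.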
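The effect can be seen in a small in-memory sketch. It is an illustration under assumptions, not the article's own code: the helper names entry_hash, build_chain, and first_break are hypothetical, and a fixed counter replaces time.time() so the result is reproducible. The serialization matches make_entry above.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64

def entry_hash(entry: dict) -> str:
    """Hash every field except the stored "hash" itself."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    serialized = json.dumps(body, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

def build_chain(payloads: list[dict]) -> list[dict]:
    """Link entries so each one stores the hash of the entry before it."""
    chain, prev = [], GENESIS
    for i, payload in enumerate(payloads):
        # A fixed counter stands in for time.time() to keep the demo reproducible.
        entry = {"ts": float(i), "prev_hash": prev, "payload": payload}
        entry["hash"] = entry_hash(entry)
        chain.append(entry)
        prev = entry["hash"]
    return chain

def first_break(chain: list[dict]) -> Optional[int]:
    """Return the index of the first inconsistent entry, or None if intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev or entry_hash(entry) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None

chain = build_chain(
    [{"action": "chose_topic"}, {"action": "passed_gate"}, {"action": "pushed"}]
)
intact = first_break(chain)

# Case 1: edit entry 1 but leave its stored hash alone.
# Entry 1 no longer matches its own hash.
edited = json.loads(json.dumps(chain))  # deep copy via JSON round trip
edited[1]["payload"]["action"] = "skipped_gate"
break_without_rehash = first_break(edited)

# Case 2: also recompute entry 1's hash.
# Entry 1 now looks fine, but entry 2's prev_hash no longer matches.
edited[1]["hash"] = entry_hash(edited[1])
break_with_rehash = first_break(edited)
```

In the first case the break is reported at the edited entry itself; in the second it moves to the entry right after it, which is why both checks are needed.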

Write it append-only

Writing is just adding one JSON Lines row at a time to the end of the file. Read the previous row's hash and pass it into the next entry.

from pathlib import Path
 
class AuditLog:
    def __init__(self, path: str):
        self.path = Path(path)
        self.path.touch(exist_ok=True)
 
    def _last_hash(self) -> str:
        last = GENESIS
        with self.path.open("r", encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    last = json.loads(line)["hash"]
        return last
 
    def append(self, action: str, **fields) -> str:
        entry = make_entry(self._last_hash(), {"action": action, **fields})
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
        return entry["hash"]

Call append at each step of the agent, at milestones like "chose a topic," "passed the gate," and "pushed." What matters is recording the input and the result of a decision as a pair. Rather than just "pushed," record which commit SHA was pushed on the basis of which gate results, so that you can trace causality later.

Verification is a single pass

The value of an audit log is that you can verify it later. Verification is just recomputing hashes from the start and confirming the chain.

def verify(path: str) -> tuple[bool, int]:
    """Return (True, 0) if the chain is intact, else (False, line number of the first broken row)."""
    prev = GENESIS
    with open(path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            if not line.strip():
                continue
            entry = json.loads(line)
            if entry["prev_hash"] != prev:
                return False, i  # chain broken = omission or reorder
 
            stored = entry.pop("hash")
            recomputed = hashlib.sha256(
                json.dumps(entry, sort_keys=True, ensure_ascii=False).encode("utf-8")
            ).hexdigest()
            if stored != recomputed:
                return False, i  # contents were rewritten
 
            prev = stored
    return True, 0

I wire this verify into my morning integrity check. Even at the scale of 10,000 rows it takes under a second. If something is broken, it returns the first broken line number, so I can investigate from there. In half a year of operation no tampering has occurred, but the very fact that "I can verify at any time" underpins my trust in autonomous operation.
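One way such a morning check might be wired up is sketched below. The wrapper name morning_check, its exit codes, and the file name audit.jsonl are assumptions for illustration, not part of the original setup. The block restates the verification logic compactly so that it runs on its own, and it treats a row that cannot be parsed (for example, one cut short by a crash) as a separate failure, since verify raises in that case rather than returning False.

```python
import hashlib
import json
import os
import tempfile

GENESIS = "0" * 64

def entry_hash(entry: dict) -> str:
    """SHA-256 over every field except the stored hash, serialized as in make_entry."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(
        json.dumps(body, sort_keys=True, ensure_ascii=False).encode("utf-8")
    ).hexdigest()

def verify(path: str) -> tuple[bool, int]:
    """Compact restatement of the single-pass check so this sketch runs alone."""
    prev = GENESIS
    with open(path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            if not line.strip():
                continue
            entry = json.loads(line)
            if entry["prev_hash"] != prev or entry_hash(entry) != entry["hash"]:
                return False, i
            prev = entry["hash"]
    return True, 0

def morning_check(path: str) -> int:
    """Hypothetical wrapper: map the verify result to a process exit code."""
    try:
        ok, line_no = verify(path)
    except (ValueError, KeyError, TypeError) as exc:
        # A half-written or non-JSON row raises instead of returning False.
        print(f"audit log unreadable: {path}: {exc!r}")
        return 2
    if ok:
        print(f"audit log OK: {path}")
        return 0
    print(f"audit log BROKEN at line {line_no}: {path}")
    return 1

def write_demo_log(path: str, actions: list[str]) -> None:
    """Write a small valid chain with fixed timestamps, for demonstration only."""
    prev = GENESIS
    with open(path, "w", encoding="utf-8") as f:
        for n, action in enumerate(actions):
            entry = {"ts": float(n), "prev_hash": prev, "payload": {"action": action}}
            entry["hash"] = entry_hash(entry)
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            prev = entry["hash"]

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "audit.jsonl")
    write_demo_log(log_path, ["chose_topic", "passed_gate", "pushed"])
    code_clean = morning_check(log_path)

    # Simulate an after-the-fact edit of the second row.
    with open(log_path, encoding="utf-8") as f:
        text = f.read()
    with open(log_path, "w", encoding="utf-8") as f:
        f.write(text.replace("passed_gate", "skipped_gate"))
    code_tampered = morning_check(log_path)

    # Simulate a crash in the middle of a write: the last row is cut short.
    write_demo_log(log_path, ["chose_topic"])
    with open(log_path, "a", encoding="utf-8") as f:
        f.write('{"ts": 1.0, "prev_ha')
    code_truncated = morning_check(log_path)
```

In a scheduled job, passing the return value to sys.exit lets the scheduler or CI step flag a non-zero result without parsing the output.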

What to record, and what not to

The thorny part of an audit log is the scope of recording. Too much and cost and PII risk grow; too little and you cannot trace later.

I use three criteria. First, record only decision branch points: not every loop iteration, but the milestones of "which option, and why." Second, do not keep the body itself; keep the body's hash and metadata. Putting full article text into the audit log bloats it and makes later deletion hard. Third, sensitive values such as email addresses and tokens are either hashed or never enter the payload at all.

audit.append(
    "article_pushed",
    site="antigravitylab",
    commit_sha="a1b2c3d",
    gate_results={"article_gate": "pass", "templating_gate": "pass"},
    body_sha256=hashlib.sha256(body.encode()).hexdigest(),  # don't keep the body
)

Keep only body_sha256 and you can later check whether the body pushed at that time is identical to the one now in the repo. You can prove identity without hoarding the full text.
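That later comparison could look like the following minimal sketch. The function names body_sha256 and body_matches are hypothetical, and the entry layout is assumed to follow the payload shown above; UTF-8 is assumed as the encoding on both sides.

```python
import hashlib

def body_sha256(body: str) -> str:
    """Hex SHA-256 of the body text, encoded as UTF-8."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def body_matches(entry: dict, current_body: str) -> bool:
    """True if current_body hashes to the body_sha256 recorded in the entry."""
    return entry["payload"]["body_sha256"] == body_sha256(current_body)

# A logged entry, reduced to the fields this check needs.
original = "# Title\n\nFirst paragraph.\n"
logged_entry = {
    "payload": {
        "action": "article_pushed",
        "body_sha256": body_sha256(original),
    }
}

same_body = body_matches(logged_entry, original)
# Dropping the trailing newline is enough to change the hash.
changed_body = body_matches(logged_entry, "# Title\n\nFirst paragraph.")
```

The comparison is byte-exact, so a difference in line endings or a trailing newline counts as a mismatch. If the repo normalizes text on checkout, hash the same normalized form on both sides.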

The pitfalls of rotation and long-term retention

An append-only log grows without limit if left alone. Rotation is necessary, but splitting it naively breaks the chain.

So when I switch files, I carry the last hash of the old file into the prev_hash of the new file's first entry. Even splitting files monthly, the hash chain stays connected as one line across months. Verification scans each file in order and also checks that the hash is continuous at file boundaries.

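def rotate(old_path: str, new_path: str) -> None:
    # The new file's first entry takes the old file's last hash as its prev_hash.
    last = AuditLog(old_path)._last_hash()
    entry = make_entry(last, {"action": "rotation_marker"})
    with open(new_path, "x", encoding="utf-8") as f:  # "x": fail if the file already exists
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")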

Without this step, the chain snaps at each rotation and tamper detection loses its meaning. In my first design the chain broke at the start of each month, and verification always reported a mismatch at the head of the new file. Only when the design covers the file boundary as well is the audit log complete.
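Verification across rotated files might be sketched as follows. This assumes the layout described above, where the first entry of each new file stores the previous file's last hash in its prev_hash, and that files are passed oldest first. The names verify_files and write_segment and the file names are hypothetical.

```python
import hashlib
import json
import os
import tempfile

GENESIS = "0" * 64

def entry_hash(entry: dict) -> str:
    """SHA-256 over every field except the stored hash, serialized as in make_entry."""
    body = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(
        json.dumps(body, sort_keys=True, ensure_ascii=False).encode("utf-8")
    ).hexdigest()

def verify_files(paths: list[str]) -> tuple[bool, str, int]:
    """Verify rotated files, oldest first, as one continuous chain.

    Returns (True, "", 0) if intact, else (False, path, line number of the
    first broken row). The running hash is carried across file boundaries.
    """
    prev = GENESIS
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            for i, line in enumerate(f, start=1):
                if not line.strip():
                    continue
                entry = json.loads(line)
                if entry["prev_hash"] != prev or entry_hash(entry) != entry["hash"]:
                    return False, path, i
                prev = entry["hash"]
    return True, "", 0

def write_segment(path: str, prev: str, actions: list[str]) -> str:
    """Demo helper: write one file continuing from prev, return its last hash."""
    with open(path, "w", encoding="utf-8") as f:
        for n, action in enumerate(actions):
            entry = {"ts": float(n), "prev_hash": prev, "payload": {"action": action}}
            entry["hash"] = entry_hash(entry)
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            prev = entry["hash"]
    return prev

with tempfile.TemporaryDirectory() as tmp:
    jan = os.path.join(tmp, "audit-01.jsonl")
    feb = os.path.join(tmp, "audit-02.jsonl")
    last_jan = write_segment(jan, GENESIS, ["chose_topic", "pushed"])
    write_segment(feb, last_jan, ["rotation_marker", "chose_topic"])

    whole_chain = verify_files([jan, feb])
    # Leaving out the older file breaks the chain at the first row of the newer one.
    missing_month = verify_files([feb])
```

Because the running hash is carried from one file into the next, a missing or reordered file shows up as a break at the first row of the file that follows it.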

Trust in autonomous operation comes from verifiability

If you want to delegate more to an agent, you need to build in, as a mechanism, as much accountability as you delegate. A tamper-evident audit log is that foundation. When something happens, you reconstruct "when, what, why" from a single chain, and you can check that the record has not been rewritten since. This reassurance is what gives you the courage to delegate greater autonomy.

I use the same audit log for automatic updates to the apps I distribute on AdMob. I believe that investing in making an agent's actions verifiable after the fact matters as much as investing in making it smarter, and that it is the condition for delegating with lasting peace of mind.

Thank you for staying with me to the end. I hope it supports those taking a step toward autonomous operation.
