One morning I opened the site and noticed that not a single new article had appeared.
The previous night's background agent log clearly said "Done." Not one error. And yet the article that should have been added was simply not there.
Green log, empty result. As an indie developer running the Dolice Labs sites on autopilot, I have run into this mismatch a few times since I started running background and scheduled agents in earnest on Antigravity 2.0. Today I want to write about what it really is, and the operating habit I settled on to handle it.
"Done" Does Not Mean the Work Succeeded
There is a premise to accept first. When an agent says "Done," it means "I reached the end of the steps I was given," not "the output landed where it was supposed to."
The two rarely need separating while you work interactively. Sitting in front of the screen, you can confirm on the spot whether a file changed or a commit was added.
In background execution, though, the two quietly come apart. The agent runs the steps and judges "success" by the fact that no command returned a non-zero exit code. When a command in the middle "did not error but did nothing," that no-op gets recorded as a success.
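A minimal sketch of that gap, under stated assumptions: the helper names run_and_require_change and replace_title, and the file post.md in the usage note, are hypothetical and exist only for illustration. It shows a substitution that exits 0 while changing nothing, and a wrapper that compares a checksum of the file before and after the step.

```shell
# Hypothetical helper: run a command, then fail if the file is byte-for-byte unchanged.
# Usage: run_and_require_change <file> <command> [args...]
run_and_require_change() {
  local file="$1"; shift
  local before after
  before=$(cksum < "$file")
  "$@" || return 1                  # a real error still counts as a failure
  after=$(cksum < "$file")
  if [ "$before" = "$after" ]; then
    echo "no-op: $file was not modified" >&2
    return 1
  fi
}

# Hypothetical step: replace text in a file.
# $1 = file, $2 = old text, $3 = new text (assumed free of '/' and regex metacharacters).
# When the old text is absent, sed still exits 0 and the file stays as it was.
replace_title() {
  sed "s/$2/$3/" "$1" > "$1.new" && mv "$1.new" "$1"
}
```

Called as run_and_require_change post.md replace_title post.md "Draft" "Final", the wrapper turns a silent no-op into a non-zero exit.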
The gap opens precisely during the hours when nobody is watching.
Three Paths That Slip Through Quietly
Every pitfall I actually fell into was a "failure that throws no exception." Here are three common ones.
The first is the path where a Git commit never gets created. A freshly cloned repository in a clean environment often has user.email and user.name unset, and git commit in that state can stop without creating a commit. If that failure is swallowed and the procedure carries on, the following git push has nothing to send, so it returns exit code 0. From the agent's point of view, the sequence ended cleanly. But nothing was added on the remote.
The second is the path where a fixed temporary file path is reused. If you write intermediate output to a hard-coded name like /tmp/insert.txt, a failed write leaves last run's leftovers to be read as-is. No error appears. Old content you never produced this time simply blends in.
The third is the path where a failed patch gets swallowed. When an agent fails to apply a diff to a file, it may treat the hunk mismatch as a mere warning and move on. In its own mind it "edited" the file, so it reports done, while the file stays exactly as it was before.
What the three share: in every case "thought I did it" and "the actual state" diverge, and nothing turns red on the spot.
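For the third path, when the edited file is already tracked by Git, one independent check is to ask Git whether the working copy differs from the last commit. This is a minimal sketch: the helper name require_worktree_change is hypothetical, and it does not cover untracked files.

```shell
# Hypothetical helper: succeed only if the tracked file differs from HEAD.
# Usage: require_worktree_change <file>
require_worktree_change() {
  local file="$1"
  # `git diff --quiet` exits 0 when there is no difference and 1 when there is one.
  if git diff --quiet HEAD -- "$file"; then
    echo "edit did not land: $file is identical to HEAD" >&2
    return 1
  fi
}
```

Run right after the edit step and before the commit, it fails when the "edited" file is still identical to what was committed.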
Look at Ground Truth, Not the Self-Report
The remedy is simple. Stop trusting the agent's report, and insert one independent check at the very end. The report and the check are meaningless unless they come from different actors.
For Git, the most reliable approach is to compare the local and remote commit hashes. After the push, mechanically inspect whether the two match.
# Always pin identity before committing
git -c user.email="you@example.com" \
    -c user.name="Your Name" \
    commit -m "Add: new article"
LOCAL=$(git rev-parse HEAD)
git push origin main
# Ask the remote directly for the tip of main
REMOTE=$(git ls-remote origin refs/heads/main | cut -f1)
if [ "$LOCAL" != "$REMOTE" ]; then
  echo "❌ push did not land (local=$LOCAL remote=$REMOTE)"
  exit 1
fi
echo "✅ landed on the remote: $LOCAL"

The key here is that we are not looking at git push's exit code. The exit code only tells us "did the command avoid returning an error." What we want to know is "is the remote tip now the commit I just made locally," and that can only be confirmed by a hash match.
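One case the hash match alone does not cover: if git commit created nothing, the local HEAD is still the old commit, which already equals the remote tip, so the comparison passes. A minimal sketch of an extra guard follows. The helper name commit_and_require_new_head is hypothetical, and the identity values are the same placeholders used above. It judges the commit by whether HEAD moved, not by the exit code.

```shell
# Hypothetical helper: commit staged changes and confirm that HEAD actually moved.
# Prints the new commit hash on success.
commit_and_require_new_head() {
  local message="$1"
  local before after
  # Empty string when the repository has no commits yet
  before=$(git rev-parse --verify -q HEAD || true)
  # The exit code is deliberately not the verdict; output goes to stderr
  git -c user.email="you@example.com" -c user.name="Your Name" \
      commit -q -m "$message" >&2 || true
  after=$(git rev-parse --verify -q HEAD || true)
  if [ -z "$after" ] || [ "$before" = "$after" ]; then
    echo "no new commit was created (HEAD=${after:-none})" >&2
    return 1
  fi
  echo "$after"
}
```

The hash it prints can then serve as the local side of the post-push comparison.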
For file outputs, peek into the artifact itself and confirm that a marker unique to this run is present.
# Avoid fixed names for temp files; make each one unique
TMP="$HOME/work/insert-${SLUG}-$(date +%s).txt"
# After generating, check that this article's own marker really made it in
if ! grep -q "$SLUG" "$TARGET_FILE"; then
  echo "❌ $SLUG not found in the output; suspect leftover contamination or a failed write"
  exit 1
fi

Simply changing the fixed name /tmp/insert.txt to a unique name closes the path where last run's leftovers blend in. Then grepping for the marker captures the success of the write itself at the same time.
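As an alternative to building the name from a timestamp, mktemp can create the unique file for you. The sketch below is one possible arrangement, not the exact script I run: the helper name write_with_marker_check is hypothetical, the slug is assumed to be safe to use in a file name, and the generated content is assumed to arrive on standard input. It checks the marker in this run's temp file before appending, so an empty or stale write never reaches the target.

```shell
# Hypothetical helper: stage generated content in a fresh temp file, verify it, then append.
# Usage: <generator> | write_with_marker_check <slug> <target-file>
write_with_marker_check() {
  local slug="$1" target="$2" tmp
  tmp=$(mktemp "${TMPDIR:-/tmp}/insert-${slug}-XXXXXX") || return 1
  cat > "$tmp"
  # The marker must be in what this run produced, not just somewhere in the target
  if ! grep -qF -- "$slug" "$tmp"; then
    echo "$slug not found in this run's output; nothing was appended" >&2
    rm -f "$tmp"
    return 1
  fi
  cat "$tmp" >> "$target"
  rm -f "$tmp"
  # Final ground-truth check on the artifact itself
  grep -qF -- "$slug" "$target"
}
```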
The official docs do not state that an agent judges success by each step's exit code. But run unattended for long enough, and situations will arise where evidence that thin cannot catch a "thought I did it."
Automate Verification Precisely Because It Runs in the Background
Interactively, the human was the final verifier. In the background, nobody plays that role. So the verification step itself has to be built into the agent's procedure.
In indie development I am the only operator, and the last verifier. So I made a habit of always writing the "make it" instruction and the "confirm it" instruction separately. Within the same single run, I place an independent check command after generation, and if the check fails, the run stops right there. With this, the mismatch of a green log and an empty result becomes, at least, a red log and an empty result. Just having failure show up as failure makes pinpointing the cause far easier.
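The separation can be sketched as a tiny runner. This is an illustration under assumptions, not a feature of any agent platform: the name make_then_confirm is hypothetical, and both steps are passed in as command strings. The make step's exit code is reported but never decides the verdict; only the confirm step does.

```shell
# Hypothetical runner: "make it" and "confirm it" are two separate commands.
# Usage: make_then_confirm "<make command>" "<confirm command>"
make_then_confirm() {
  local make_cmd="$1" confirm_cmd="$2" make_status
  sh -c "$make_cmd"
  make_status=$?                    # recorded for the log, not used as the verdict
  if ! sh -c "$confirm_cmd"; then
    echo "FAILED: confirm step rejected the result (make exited $make_status)" >&2
    return 1
  fi
  echo "OK: result confirmed (make exited $make_status)"
}
```

A call such as make_then_confirm "./generate.sh" "test -s out/article.html", with both paths being placeholders, ends red whenever the artifact is missing or empty, whatever the generator reported.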
Verification does not need to be a heavy mechanism. Three lines to check a hash match, one line to grep a marker, and the reliability of unattended operation changes considerably. Rather than building an elaborate monitoring platform, adding one small check at the tail of each task has, in my hands, kept working far longer.
The Next Step
Pick one background task you have running now, and add just one line at its end that checks whether the artifact really exists. For Git, a post-push hash match; for files, a grep of a unique marker.
That alone changes the weight of the word "Done." The agent's report is about the steps; verification is about the result. Keeping these two separate is, I believe, the foundation for running unattended over the long haul.
I hope this serves as an entry point to verification for anyone else troubled by quiet slip-throughs in their automation.