Articles/Integrations

⬡ Integrations/2026-06-19Intermediate

Running Antigravity CLI Unattended: Notify Only on Real Failures

A small wrapper for scheduled Antigravity CLI runs that stays silent on success and alerts you only on failures a human needs to act on, covering exit codes, transient-error triage, and duplicate suppression.

Antigravity CLI⁹ automation⁵⁴ notifications scheduled tasks² shell scripting

✦ Premium Article

A few nights ago, my own automation kept me from sleeping. A scheduled job stalled for just a moment, and my phone buzzed again and again. When I opened the logs the next morning, exactly one failure had needed my attention. Everything else was a transient hiccup that had cleared itself within minutes.

When Gemini CLI was retired on June 18 and I moved my unattended scripts over to the Go-rewritten Antigravity CLI, all of those scripts had to be ported at once. The new CLI is fast and starts lightly. But the question of what to notify on is still mine to design, regardless of which CLI sits underneath. Today I want to share the small mechanism I use to stay silent on success and quietly surface only the failures that actually need a person.

Over-alerting is as dangerous as under-alerting

When I first started building unattended jobs, I set them to "notify on everything." Success and failure alike landed in my pocket. It felt reassuring; in practice it was the opposite.

Once your eyes adjust to a constant stream of success pings, the single failure buried among them slips past. During a stretch when I was running automated updates for four blogs alongside AdMob revenue checks for several apps, dozens of notifications piled up each day, and the one anomaly that mattered scrolled off into the distance.

The value of a notification lives not in volume but in trust — the trust that when it arrives, you will stop what you're doing. So the design starts with subtraction, not addition. Decide first what you will never send, then ring only what's left.

Anchor silence-on-success in the exit code

The first foundation is the Antigravity CLI's exit code. Launched from a shell, the CLI returns 0 on success and non-zero on failure. That value is the primary gate that decides whether to notify at all.

Collect stdout and stderr into a single log file. Whether you're reading the failure later or assembling the body of an alert, that one log is what you'll lean on.

#!/usr/bin/env bash
set -uo pipefail
 
LOG_DIR="${HOME}/.agy-runs"
STATE_DIR="${HOME}/.agy-state"
mkdir -p "$LOG_DIR" "$STATE_DIR"
 
# Take the first argument as the task name; pass the rest through to the CLI
TASK="${1:?task name required}"
shift
 
LOG="${LOG_DIR}/${TASK}-$(date +%Y%m%d-%H%M%S).log"
 
# Run the Antigravity CLI, funneling all output into the log
agy run "$@" >"$LOG" 2>&1
CODE=$?
 
echo "task=${TASK} exit=${CODE} log=${LOG}"

The deliberate choice here is to not use set -e. If the whole script halted the moment the CLI failed, you'd never reach the part that classifies the failure and decides whether to notify. A failure isn't something to halt on — it's something to observe and handle. Keep only set -uo pipefail and capture the exit code yourself.

✦

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN

✦An unattended-run wrapper built on exit codes and logs that stays silent on success

✦Classification logic that separates transient rate limits and auth expiry from failures worth a human's time

✦Duplicate suppression that won't re-alert the same failure for 6 hours, preventing alert fatigue

Secure payment via Stripe · Cancel anytime

✦

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

Unlock all articles with Membership →

Separate "transient hiccups" from "failures worth a human"

A non-zero exit code doesn't always mean a person is needed. Rate limits and brief network drops are the kind of wobble that passes on a retry a few minutes later. An expired auth token or a syntax error, by contrast, won't fix itself no matter how many times the job runs.

So I sort failures by character into three kinds: transient, auth, and the ones worth reading.

classify() {
  local code="$1" log="$2"
 
  # Exit code 0 is success. Not a notification candidate
  [ "$code" -eq 0 ] && { echo "ok"; return; }
 
  # Transient: rate limits, timeouts, brief disconnects
  if grep -qiE 'rate limit|429|timeout|temporarily|ETIMEDOUT|ECONNRESET|50[23]' "$log"; then
    echo "transient"; return
  fi
 
  # Auth: can't recover unattended; needs a human to re-login
  if grep -qiE '401|unauthorized|token .*(expired|invalid)|re-?auth' "$log"; then
    echo "auth"; return
  fi
 
  # Everything else is a failure you should read in the morning
  echo "actionable"
}

This sorting doesn't need to be perfect. Rough is fine at the start. As you run it, every time you find a real failure you mistakenly filed as transient — or the reverse — you grow the grep patterns one at a time. In my case, over about half a year of running this, the patterns grew to roughly a dozen.

Laying the handling out per class makes the intent easier to see.

Failure kind	Typical cause	Handling when unattended
ok	Normal exit (code 0)	Silence. Reset the transient counter
transient	Rate limit, timeout, brief disconnect	Skip by default. Notify only after 3 in a row
auth	Token expired, re-login required	Notify immediately, suppressed for 6 hours
actionable	Syntax error, unexpected exception	Notify immediately; read it in the morning

Ring on transient failures only when they persist

If you notify on the very first transient failure, you slide right back to over-ringing. Ignore the wobble, but do learn when the wobble is continuing. Express that line with a consecutive-count counter.

# Decide whether transient failures are persisting. True only at 3 in a row
transient_persisted() {
  local task="$1"
  local f="${STATE_DIR}/${task}.transient"
  local n
  n="$(cat "$f" 2>/dev/null || echo 0)"
  n=$((n + 1))
  echo "$n" > "$f"
  [ "$n" -ge 3 ]
}
 
# Reset the transient counter on success
reset_transient() {
  rm -f "${STATE_DIR}/${1}.transient"
}

A single success in between wipes the counter. So a one-off stall is quietly let go, and only when the job stalls three times in a row does the signal — "maybe this isn't just a wobble" — reach your hand.

Don't ring the same failure twice or three times

Failures that reproduce until a human fixes them — auth expiry, a syntax error — will generate a notification on every run if left alone. Ten identical alerts in one night is noise. So take a "fingerprint" of the failure and suppress the same fingerprint for a set window.

# Fingerprint: kind + tail of the log, hashed short
should_notify() {
  local task="$1" kind="$2" log="$3"
  local fp f now last
  fp="$(printf '%s|%s' "$kind" "$(tail -n 20 "$log")" | shasum | cut -c1-12)"
  f="${STATE_DIR}/${task}.${fp}"
  now="$(date +%s)"
  last="$(cat "$f" 2>/dev/null || echo 0)"
 
  # Don't re-notify the same fingerprint within 6 hours (21600 seconds)
  if [ "$((now - last))" -lt 21600 ]; then
    return 1
  fi
  echo "$now" > "$f"
  return 0
}

Including the log tail in the fingerprint means that even if the error type matches, a different target file is treated as distinct. The six-hour window is simply tuned to how often I run jobs. Shorten it for hourly jobs, lengthen it for daily ones — set it to your own schedule.

Write the alert body for your half-asleep self

The person an alert reaches is usually a slightly earlier version of you. So that the situation can be taken in at a glance even with the context forgotten, pack the body with only the hostname, task name, failure kind, and the log tail — kept short.

notify() {
  local task="$1" kind="$2" log="$3"
  local tail_text
  tail_text="$(tail -n 8 "$log" | sed 's/"/\\"/g')"
 
  # Pass WEBHOOK_URL via env (e.g. a Slack Incoming Webhook)
  curl -fsS -X POST "$WEBHOOK_URL" \
    -H 'Content-Type: application/json' \
    -d "$(printf '{"text":"[%s] %s failed (%s)\n%s"}' \
          "$(hostname)" "$task" "$kind" "$tail_text")" \
    >/dev/null || echo "notify itself failed" >&2
}

Finally, thread the parts into one flow.

run_and_watch() {
  local task="$1"; shift
  local log code kind
  log="${LOG_DIR}/${task}-$(date +%Y%m%d-%H%M%S).log"
 
  agy run "$@" >"$log" 2>&1
  code=$?
  kind="$(classify "$code" "$log")"
 
  if [ "$kind" = "ok" ]; then
    reset_transient "$task"
    exit 0
  fi
 
  # Transient: only proceed once it has happened 3 times in a row
  if [ "$kind" = "transient" ] && ! transient_persisted "$task"; then
    exit 0
  fi
 
  if should_notify "$task" "$kind" "$log"; then
    notify "$task" "$kind" "$log"
  fi
  exit "$code"
}
 
run_and_watch "$@"

After that, call it from cron or macOS launchd as run_and_watch.sh blog-update agy run --task publish. On a successful night, nothing arrives. That quiet is the proof the mechanism is working.

Verify the alert path without waiting for a real failure

The scariest thing about a notification setup is that the alert fails to fire exactly when you need it. But you can't confirm anything while waiting for a real failure to happen. So run a small command that fails on purpose in place of the CLI, and verify just the path first.

# Inject a command that always exits non-zero in place of agy run to test the path
WEBHOOK_URL="$WEBHOOK_URL" \
  run_and_watch "smoke-test" bash -c 'echo "401 unauthorized: token expired" >&2; exit 1'

With 401 in the log, the classifier lands on auth, and exactly one notification fires after passing duplicate suppression. If it reaches your phone, you've confirmed that a real failure will travel the same path reliably. As an indie developer, I've made this smoke test a rule before putting any new task on the schedule. Running it through once changes how safe unattended operation feels.

Start small, and grow the number of silent nights

You don't have to aim for perfect classification from day one. What I'd suggest is starting with just the exit code and duplicate suppression, and running it. Add the transient logic later — grow a pattern only after you see an alert that fired by mistake. Through that repetition, the mechanism settles into your own way of working.

I've come to measure an unattended job not by the hours it ran, but by the number of nights it stayed silent. Tonight, pick one job from your schedule and wrap it in this quiet mechanism. Tomorrow morning should be a little calmer.

Thank You for Reading

Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.