Rotate Keys Without Stopping an Unattended Agent: An Overlap-Window Design
API keys and tokens are worth rotating on a schedule before they leak. But an unattended agent goes quietly dead the moment auth breaks during the swap. As an indie developer running several sites on autopilot, I lay out an overlap-window design that rotates keys without downtime.
I once rotated a key and found, the next morning, that the nightly run had quietly stopped. To prevent a leak I had revoked the old API key, but a gap of a few minutes opened between revocation and distributing the new one — and the job that ran at exactly that time fell over with an auth error. I noticed only because the article that should have been generated was not there.
Rotating keys on a schedule is the right habit. The problem is that, for an agent running unattended, the moment of the swap is its weakest point. With a human nearby, you see the 401 and fix it by hand; an agent running at night or while you are away simply stops, unable to tell anyone its auth broke. Here I describe a design that rotates keys without stopping the agent.
Stop treating the swap as an instant
The first idea to drop is the instant swap: delete the old key, drop in the new one. No matter how fast you do it, a time gap always opens between revocation and distribution. If a job lands in that gap, it fails.
What you need instead is an overlap window in which both the old and new keys are valid for a while. Enable the new key first, make the readers accept either one, then move the writers to the new key, and only at the end retire the old one. The idea is to replace the instant swap with a gentle migration.
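To make "readers accept either one" concrete, here is a minimal sketch of a reader-side check. The function name `accepts_key` and its argument order are my own illustration, not part of any tool mentioned in this article, and a real service should compare secrets in constant time rather than with a plain string test.

```shell
# accepts_key PRESENTED PRIMARY [SECONDARY]
# Hypothetical reader-side check: succeed if the presented key matches the
# primary key, or the secondary key while one is still configured.
accepts_key() {
  local presented=$1 primary=$2 secondary=${3:-}
  # Never accept an empty credential, even if a slot happens to be empty.
  [ -n "$presented" ] || return 1
  [ "$presented" = "$primary" ] && return 0
  # The secondary slot is only populated during the overlap window.
  [ -n "$secondary" ] && [ "$presented" = "$secondary" ]
}

# During the overlap both keys pass; after retirement the secondary slot is
# left empty and only the primary key passes.
accepts_key "old-key" "new-key" "old-key" && echo "old key accepted during overlap"
accepts_key "old-key" "new-key" "" || echo "old key rejected after retirement"
```

Leaving the secondary slot empty outside the overlap means retirement is a config change on the reader, not a code change.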
Rotate in four stages
I split key rotation into four stages.
1. Issue: create the new key, but let nobody use it yet
2. Accept: bring every reader to a state where it authenticates with either the old or new key
3. Cut over: move the key that writers (the agent itself) use to the new one
4. Retire: revoke the old key
The crux is always separating stages 2 and 3. Switch the writers before the readers accept both, and you reproduce exactly the gap from the opening. Do acceptance first, and the cutover can happen at any time without opening a hole.
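One way to enforce that ordering is a guard that refuses to cut over until every reader reports that it accepts both keys. This is a sketch under assumptions: the status labels `dual` and `old-only` are hypothetical, and how you collect them from your readers depends on your setup.

```shell
# ready_for_cutover STATUS...
# Hypothetical guard: succeed only if at least one reader status was given
# and every one of them is "dual" (reader accepts old and new keys).
ready_for_cutover() {
  [ "$#" -gt 0 ] || return 1
  local status
  for status in "$@"; do
    [ "$status" = "dual" ] || return 1
  done
  return 0
}

if ready_for_cutover dual dual old-only; then
  echo "all readers accept both keys; safe to cut over"
else
  echo "at least one reader still accepts only the old key; do not cut over"
fi
```

Treating an empty list as "not ready" is deliberate: if status collection itself breaks, the guard fails closed.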
# secrets.yaml: make the overlap window explicit
api_key:
  primary: "${API_KEY_NEW}"                  # used after cutover
  secondary: "${API_KEY_OLD}"                # accepted only during the overlap
  overlap_until: "2026-06-29T15:00+09:00"    # retire at this time
Carrying overlap_until in config lets you decide retirement by a timestamp rather than a hunch. I keep this window at 48 hours: long enough to span a weekend and let a longer job run to completion, yet short enough not to keep the old key alive too long.
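A retirement guard driven by that timestamp could look like the sketch below. The helper names `to_epoch` and `overlap_expired` are hypothetical, and `to_epoch` assumes GNU `date`, which can parse the ISO 8601 form used in `overlap_until`.

```shell
# to_epoch TIMESTAMP
# Hypothetical helper: convert an ISO 8601 timestamp to epoch seconds.
# Assumes GNU date; BSD/macOS date needs a different invocation.
to_epoch() {
  date -d "$1" +%s
}

# overlap_expired UNTIL_EPOCH [NOW_EPOCH]
# Succeed once the overlap deadline has been reached. NOW_EPOCH defaults to
# the current time and exists mainly so the check can be tested.
overlap_expired() {
  local until_epoch=$1 now_epoch="${2:-$(date +%s)}"
  [ "$now_epoch" -ge "$until_epoch" ]
}

# Intended use (not executed here):
#   overlap_expired "$(to_epoch "2026-06-29T15:00+09:00")" \
#     && secret-cli revoke api_key.secondary
```

Comparing epoch seconds sidesteps time zone mistakes: the `+09:00` offset is resolved once, at conversion time.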
Manual swaps tip toward either forgetting or rushing. I built a mechanism that rotates automatically every 90 days.
#!/usr/bin/env bash
# rotate-secret.sh: rotate a key with an overlap window
set -euo pipefail

# Issue: create the new key
NEW=$(secret-cli generate --scope agent-api)

# 1) Accept: make readers dual-capable, then distribute
secret-cli set api_key.secondary "$(secret-cli get api_key.primary)"
secret-cli set api_key.primary "$NEW"
secret-cli publish --wait   # wait for every reader to pick it up

# 2) Cut over: verify reachability with the new key (do not retire if it fails)
agy auth check --key "$NEW" || { echo "❌ new key unreachable; aborting retirement" >&2; exit 1; }

# 3) Schedule retirement: revoke the old key in 48 hours
at "now + 48 hours" <<< "secret-cli revoke api_key.secondary"
echo "✅ rotation started; retirement at $(TZ=Asia/Tokyo date -d '+48 hours' '+%m/%d %H:%M')"   # date -d is GNU date syntax
The key point is verifying reachability with agy auth check before scheduling retirement. Delete the old key while the new one does not work, and you drop into the worst state where neither works. If the check fails, do not proceed to retirement — stop right there.
Three pitfalls that become incidents when unattended
Even when the rotation is built to this design, there are pitfalls you hit precisely because no one is watching. Here are three I actually fell into.
First, caches hold onto the old key. If an edge cache or runtime keeps a key for some time, it keeps sending the old key in production even though the config has switched. I avoid this by always making the overlap window longer than the cache retention.
Second, long in-flight jobs. A job running at the moment of cutover cannot change its key midway and tries to finish on the old one. With an overlap window the old key still works, so you do not have to kill the job. This is the biggest reason to avoid an instant swap.
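The first two pitfalls both put a floor under the overlap window: it has to outlast the longest cache retention and the longest job. A minimal sketch of that sizing check follows; the function names are my own, and the 24-hour cache, 6-hour job, and 2-hour margin are hypothetical numbers chosen for illustration, not measurements from my setup.

```shell
# min_overlap_hours CACHE_HOURS LONGEST_JOB_HOURS MARGIN_HOURS
# Print the smallest overlap window that outlasts both the cache retention
# and the longest in-flight job, plus a safety margin.
min_overlap_hours() {
  local cache_hours=$1 longest_job_hours=$2 margin_hours=$3
  local floor=$cache_hours
  if [ "$longest_job_hours" -gt "$floor" ]; then
    floor=$longest_job_hours
  fi
  echo $(( floor + margin_hours ))
}

# overlap_is_long_enough OVERLAP_HOURS CACHE_HOURS LONGEST_JOB_HOURS MARGIN_HOURS
overlap_is_long_enough() {
  local overlap_hours=$1
  shift
  [ "$overlap_hours" -ge "$(min_overlap_hours "$@")" ]
}

min_overlap_hours 24 6 2   # prints 26
overlap_is_long_enough 48 24 6 2 && echo "a 48-hour window covers these numbers"
```

Running a check like this whenever a cache TTL or a job's runtime changes catches the case where the window silently becomes too short.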
Third, confusing API keys with re-auth tokens. The two differ in lifespan and in how you swap them. You rotate an API key on your own schedule, but a CLI re-auth token expires on a schedule you do not control and is revoked whether or not you are ready. I rotate the former actively with this design and renew the latter ahead of its deadline, as a separate task. Manage them together as one kind of "key" and one of them will trip you up every time.
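For the token side, "renew ahead of the deadline" comes down to a lead-time check. The sketch below is an assumption-laden illustration: `token_needs_renewal` is a hypothetical helper, and how you read a token's expiry depends on the CLI that issued it.

```shell
# token_needs_renewal EXPIRES_EPOCH LEAD_DAYS [NOW_EPOCH]
# Succeed when the token expires within LEAD_DAYS (or has already expired).
# NOW_EPOCH defaults to the current time and exists mainly for testing.
token_needs_renewal() {
  local expires_epoch=$1 lead_days=$2 now_epoch="${3:-$(date +%s)}"
  [ $(( expires_epoch - now_epoch )) -le $(( lead_days * 86400 )) ]
}

# Example with fixed numbers: the token expires 10 days after "now".
now=1000000
expires=$(( now + 10 * 86400 ))
token_needs_renewal "$expires" 14 "$now" && echo "inside the 14-day lead: renew now"
token_needs_renewal "$expires" 7 "$now" || echo "outside the 7-day lead: nothing to do yet"
```

Keeping this check separate from the key rotation script mirrors the point above: the two have different clocks.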
How to choose the rotation interval
I did not pick 90 days with conviction from the start. Shorter raises resilience to leaks, but every rotation brings overlap-window management and makes operations heavier. Longer is easier, but it widens the window in which a leaked key keeps being used unnoticed. As the midpoint of this tug-of-war, I simply began at 90 days.
After running it for about half a year, a 90-day rotation came to about five minutes of real work each time, so four times a year was no burden. Trying to rotate auxiliary keys on the same 90-day cycle, however, bloated the tracking sheet as the number of keys grew and actually led to missed rotations. So I rotate only the high-importance keys every 90 days, and the rest every six months or only when there are signs of a leak.
When deciding the interval, what I watch is the average time to notice if a key leaks. For a key where monitoring catches a leak within hours, a longer interval still limits the damage. Conversely, the fewer the means to notice, the more it makes sense to shorten the interval and narrow the window. The interval is not a uniform rule but something I draw key by key, from detection speed and operational weight.
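One rough way to put numbers on this trade-off: a leaked key stays usable until you notice and revoke it, or until the next scheduled rotation retires it, whichever comes first. The sketch below is my own simplification, not a standard formula. It takes the worst case where the key leaks right after rotation, and it ignores the extra hours of the overlap window.

```shell
# worst_case_exposure_hours INTERVAL_DAYS DETECT_HOURS
# Print the smaller of the rotation interval (in hours) and the time it
# takes to detect a leak: a rough upper bound on how long a leaked key works.
worst_case_exposure_hours() {
  local interval_hours=$(( $1 * 24 )) detect_hours=$2
  if [ "$detect_hours" -lt "$interval_hours" ]; then
    echo "$detect_hours"
  else
    echo "$interval_hours"
  fi
}

worst_case_exposure_hours 90 6        # prints 6: detection bounds the damage
worst_case_exposure_hours 90 100000   # prints 2160: only rotation bounds it
```

With fast detection the interval barely moves the result, which is why a longer interval is tolerable for well-monitored keys and not for unmonitored ones.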
Where I draw the line as an indie developer
As an indie developer handling AdMob, App Store, and Google Play credentials, I constantly run into the bind of "I want to rotate but I cannot stop." I split the handling by importance.
Keys on paths where a stoppage hits revenue directly are always rotated with an overlap window. Auxiliary keys whose failure causes little harm I swap simply, tolerating a small gap. Rotating everything with the most careful design makes the operation itself heavy and unsustainable. Keeping it to a weight you can sustain is part of the design too.
When in doubt, write out once what stops and who is inconvenienced the instant a given key breaks. If the inconvenienced party is your readers or your own revenue, do not skimp on the overlap window.
Your next step
First, list the keys your agent uses and note the date you last rotated each. You will likely find blank dates, or keys untouched for years. Then rotate just the single most important one through these four stages. Once you have the feel of inserting an overlap window, you can spread the same design to the rest.
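The inventory can start as a plain text file. Here is a minimal sketch that flags keys past their interval or with no recorded date. The file format, the key names, and the function `report_rotation_status` are all hypothetical, and the date parsing assumes a `date` that accepts `-d YYYY-MM-DD` (GNU date does; BSD/macOS date does not).

```shell
# Inventory format (hypothetical): NAME LAST_ROTATED INTERVAL_DAYS per line.
# LAST_ROTATED is YYYY-MM-DD (UTC), or "-" when the date is unknown.

# report_rotation_status TODAY_EPOCH < inventory
# Print one line per key that is overdue or has never been rotated.
report_rotation_status() {
  local today_epoch=$1 name last interval last_epoch age_days
  while read -r name last interval; do
    case "$name" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    if [ "$last" = "-" ]; then
      printf '%s never rotated\n' "$name"
      continue
    fi
    last_epoch=$(date -u -d "$last" +%s) || return 1
    age_days=$(( (today_epoch - last_epoch) / 86400 ))
    if [ "$age_days" -ge "$interval" ]; then
      printf '%s overdue by %d days\n' "$name" "$(( age_days - interval ))"
    fi
  done
}

# Example: check a two-key inventory against the current time.
report_rotation_status "$(date +%s)" <<'EOF'
# name        last_rotated  interval_days
revenue-api   2026-01-01    90
legacy-key    -             90
EOF
```

Keeping the interval per key in the same file matches the earlier point that the interval is drawn key by key rather than set as one uniform rule.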