Stop Treating Dependency Updates as a Monthly Chore — Weekly Agent Runs with Semver Risk Triage and Verification Gates
Move from batch-updating 47 stale packages at once to a weekly agent-driven routine: semver-based risk tiers, a playbook YAML, hallucination-proof changelog reports, and a lockfile diff gate.
Maintaining several Next.js repositories alone means dependency updates pile up quietly. At one point I ran npm outdated and stared at 47 stale packages. As an indie developer I had settled into a "monthly update day" where I bumped everything at once — and I now think that was the most fragile way to do it.
When you update in one batch and the build breaks, debugging starts with "which of the 47 caused it?" Bisecting works, but the time cost makes you postpone updates, which makes the next batch even bigger. So I restructured the whole thing: split updates into small weekly bundles, and hand the per-bundle work to an Antigravity agent. This article walks through the classification rules, the playbook, and the verification gates, with the actual implementation I run.
Batch Updates Fail Because of Undiagnosability, Not Size
It helps to be precise about what is wrong with batch updates. A large diff is fine if tests pass. The real problem is that when something fails, the number of suspects scales with the batch size, so the cost of failure grows with every postponed week.
Weekly bundles fix exactly that. Keep each bundle at five to eight packages, and — crucially — bundle only packages from the same risk tier. If a patch-only bundle fails, the suspect list is already narrow: it is almost certainly lockfile resolution or a peerDependencies chain, not an API change, because you never mixed majors in.
Even with an agent doing the work, bundle design stays a human job. The agent executes fast, but if you cut the bundles wrong you are back to the same diagnosis problem you had with batches.
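The bundling rule above (same tier only, a handful of packages per bundle) can be sketched as a small function. This is a minimal illustration, not the tooling used in this article: the name `planBundles`, the `{ name, tier }` input shape, and the default cap of 8 are all assumptions.

```javascript
// Hypothetical helper: split pending updates into same-tier bundles.
// The { name, tier } shape and the cap of 8 are assumptions for illustration.
function planBundles(updates, maxSize = 8) {
  // Group by tier first so that a bundle never mixes risk tiers.
  const byTier = new Map();
  for (const update of updates) {
    if (!byTier.has(update.tier)) byTier.set(update.tier, []);
    byTier.get(update.tier).push(update.name);
  }

  const bundles = [];
  for (const [tier, names] of byTier) {
    // Slice each tier into chunks of at most maxSize packages.
    for (let i = 0; i < names.length; i += maxSize) {
      bundles.push({ tier, packages: names.slice(i, i + maxSize) });
    }
  }
  return bundles;
}
```

Plain chunking can leave a very small trailing bundle (nine packages become eight plus one). Whether to rebalance that is exactly the kind of bundle-design decision that stays with a human.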
Three Risk Tiers
I classify every pending update along two axes: the semver distance, and whether the package affects the build or the runtime.
| Tier | Condition | Handling |
| --- | --- | --- |
| Tier 1 (automatic) | Patch updates, or minor updates of devDependencies | Agent updates, verifies, and commits unattended |
| Tier 2 (semi-automatic) | Minor updates of runtime dependencies; majors of type definitions and build tooling | Agent prepares changelog summaries and verification; a human merges |
| Tier 3 (human) | Framework majors (Next.js, React, etc.); anything touching auth or payments | Agent only writes a research memo; a human does the work |
Two details matter. First, separate devDependencies from runtime dependencies: an ESLint plugin moving a minor version cannot change production behavior, but a runtime minor can. Second, some packages go to Tier 3 regardless of semver. I review Stripe SDK updates by hand even for patches — a mistake there is measured in money, not build minutes.
Manual classification does not survive contact with a busy week, so a small script turns npm outdated --json into tiered output. This is the entry point of every weekly run.
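The script itself is not reproduced in this text, so the following is only a sketch of how such a triage could work, built from the tier table above. The entries in `ALWAYS_HUMAN` and `FRAMEWORKS` are example values, the function names are hypothetical, and two rules are assumptions the table does not spell out: majors of devDependencies are treated as "build tooling" (Tier 2), and any other major goes to Tier 3.

```javascript
// Sketch only: names and rules marked "assumption" are not from the original script.
const ALWAYS_HUMAN = ["stripe", "next-auth"];      // assumption: example entries
const FRAMEWORKS = ["next", "react", "react-dom"]; // assumption: example entries

// Returns "major" | "minor" | "patch" | "none" | "unknown" for two version strings.
function semverDistance(current, latest) {
  const parse = (v) => /^(\d+)\.(\d+)\.(\d+)/.exec(v ?? "");
  const a = parse(current);
  const b = parse(latest);
  if (!a || !b) return "unknown";
  if (a[1] !== b[1]) return "major";
  if (a[2] !== b[2]) return "minor";
  if (a[3] !== b[3]) return "patch";
  return "none";
}

// outdated: parsed output of `npm outdated --json`; pkg: parsed package.json.
function triage(outdated, pkg) {
  const devNames = new Set(Object.keys(pkg.devDependencies ?? {}));
  const result = { tier1: [], tier2: [], tier3: [] };

  for (const [name, info] of Object.entries(outdated)) {
    const distance = semverDistance(info.current, info.latest);
    if (distance === "none") continue;
    const isDev = devNames.has(name);

    let tier;
    if (ALWAYS_HUMAN.includes(name) || distance === "unknown") {
      tier = "tier3"; // semver is ignored here; unparseable versions also stop for a human
    } else if (distance === "patch") {
      tier = "tier1";
    } else if (distance === "minor") {
      tier = isDev ? "tier1" : "tier2";
    } else if (FRAMEWORKS.includes(name)) {
      tier = "tier3";
    } else {
      // assumption: dev-only majors count as type definitions / build tooling
      tier = isDev || name.startsWith("@types/") ? "tier2" : "tier3";
    }
    result[tier].push({ name, current: info.current, latest: info.latest, distance });
  }
  return result;
}
```

The returned object has the same `tier1` / `tier2` / `tier3` arrays of `{ name, ... }` entries that the lockfile gate later reads from `.dep-triage.json`. A wrapper that shells out to `npm outdated --json` should expect a non-zero exit code whenever anything is outdated and not treat it as a failure.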
Running it writes .dep-triage.json and prints only the counts (on my repositories a typical week looks like Tier1: 6 / Tier2: 3 / Tier3: 2). The agent reads that file and touches Tier 1 and Tier 2 only. The ALWAYS_HUMAN list is the safety valve of this whole setup — whenever a payments or auth package enters the project, it must be added here too.
The agent's instructions live in a YAML playbook inside the repository, not in a prompt I retype. Fixing them in a file prevents the slow paraphrase-drift that scheduled runs otherwise accumulate.
    # .antigravity/playbooks/weekly-deps.yaml
    task: weekly-dependency-update
    input: .dep-triage.json
    scope:
      allowed:
        - "Update package.json and the lockfile (tier1 and tier2 only)"
        - "Run npm run build / npm run test and record results"
        - "Summarize the official changelog for each tier2 package"
      forbidden:
        - "Touching any tier3 package"
        - "Editing application source code to force a failing build to pass"
        - "Describing changelog content from guesswork when none can be found"
    procedure:
      - "Apply tier1 as one bundle via npm install <name>@<latest>, then build and test"
      - "On failure, bisect the tier1 bundle; commit the passing half, record the rest"
      - "Apply tier2 one package at a time, verifying the build after each"
    report: .antigravity/reports/weekly-deps-{date}.md
The second forbidden line earns its keep. Agents are loyal to "make the build pass," and when a dependency bump triggers a type error, they will happily rewrite application code to get green. That exceeds the scope of a dependency update, and if review misses it, a behavior change slips in disguised as maintenance. The correct output for an update that will not pass is a report saying so — and that had to be stated explicitly.
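A rule like this can also be backed by a mechanical check on which files a run actually changed. The sketch below is an assumption layered on top of the playbook, not part of the setup described here: `findOutOfScopeChanges` and the allow-list are hypothetical, and the list would need adjusting for a repository that uses a different lockfile or report path.

```javascript
// Hypothetical post-run check: a dependency update should only touch these paths.
const ALLOWED_EXACT = new Set(["package.json", "package-lock.json", ".dep-triage.json"]);
const ALLOWED_PREFIXES = [".antigravity/reports/"];

// changedPaths: repository-relative paths, e.g. the lines of `git diff --name-only`.
// Returns every path that falls outside the allow-list.
function findOutOfScopeChanges(changedPaths) {
  return changedPaths
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .filter(
      (p) => !ALLOWED_EXACT.has(p) && !ALLOWED_PREFIXES.some((prefix) => p.startsWith(prefix))
    );
}
```

A non-empty result would mean the agent edited something beyond the dependency manifest and its own reports, which is a reason to stop and report rather than commit.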
Hallucination-Proofing the Changelog Reports
For Tier 2, a human merges, so the agent prepares changelog summaries. This failed on me once: the agent produced a plausible summary of a release note that did not exist.
The fix was a format requirement: every summary must include the source URL and at least one verbatim quote from the original text. A fabricated quote is exposed the moment you open the URL. Combined with the "no guesswork" rule in the playbook, the report template is fixed as:
    ### <package>@<version>
    - source: <URL of the changelog consulted>
    - quote: "<one verbatim line from the original>"
    - summary: <breaking changes and impact on this repo, two sentences max>
    - verdict: merge / hold / needs-human
Since adopting this, I have stopped re-reading everything myself out of distrust — which had defeated the point of delegating. A mechanically checkable constraint (verbatimness of the quote) doubles as a hallucination detector.
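The "mechanically checkable" part can be illustrated with a small validator. This is a sketch under assumptions: `checkReportEntry` is a hypothetical name, the entry is assumed to be already parsed into `{ source, quote, verdict }`, and `sourceText` is assumed to be the text retrieved from the source URL by some other step.

```javascript
const VERDICTS = new Set(["merge", "hold", "needs-human"]);

// Collapse whitespace so line wrapping alone does not break the comparison.
const normalize = (s) => s.replace(/\s+/g, " ").trim();

// Returns a list of problems; an empty list means the entry passes the format rules.
function checkReportEntry(entry, sourceText) {
  const problems = [];

  let url = null;
  try {
    url = new URL(entry.source);
  } catch {
    // Unparseable or missing URL: reported below.
  }
  if (!url || !["http:", "https:"].includes(url.protocol)) {
    problems.push("source is not an http(s) URL");
  }

  const quote = normalize(entry.quote ?? "");
  if (quote.length === 0) {
    problems.push("quote is missing");
  } else if (!normalize(sourceText).includes(quote)) {
    problems.push("quote not found verbatim in source");
  }

  if (!VERDICTS.has(entry.verdict)) {
    problems.push("verdict must be merge, hold, or needs-human");
  }
  return problems;
}
```

Exact matching is deliberately strict. If the changelog is fetched as HTML or rendered Markdown, markup inside the quoted line can cause false alarms, so the comparison is most reliable against the same raw text the agent consulted.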
Verification Gates — Read the Lockfile Diff by Range, Not by Line Count
Before any commit, the agent must pass three gates: build, tests, and a lockfile diff range check. The third is the one people skip, and the one I value most.
If a bundle of six patch updates produces thousands of changed lockfile lines, transitive resolution has moved substantially. That is not automatically bad, but it breaks the "small bundle" premise, so the unattended run should stop rather than commit. A simple check suffices:
    # Extract target package names from .dep-triage.json, then check whether
    # the lockfile diff contains version changes for unrelated packages
    TARGETS=$(node -e "const t=require('./.dep-triage.json');console.log([...t.tier1,...t.tier2].map(e=>e.name).join('|'))")
    UNRELATED=$(git diff package-lock.json | grep -E '^\+\s+"node_modules/' | grep -vE "node_modules/(${TARGETS})[/\"]" | wc -l)
    if [ "$UNRELATED" -gt 40 ]; then
      echo "⚠️ ${UNRELATED} lines of unrelated resolution changes. Aborting unattended commit"
      exit 1
    fi
The threshold of 40 lines is an observed value from my repositories, not a universal constant. For the first few weeks I ran the gate in warn-only mode, watched what normal weekly diffs looked like, and then set the cutoff. That order — observe first, then pick the threshold — transfers to any repository.
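The same gate can also be expressed by comparing parsed lockfiles instead of grepping diff lines. The grep version keys on added `"node_modules/..."` lines, so a package whose entry stays in place and only changes its version may not be counted; a parsed comparison sees it. The sketch below assumes lockfileVersion 2 or 3 (a top-level `packages` map), and `unrelatedLockfileChanges` is a hypothetical name.

```javascript
// Assumes lockfileVersion 2 or 3, where `packages` is keyed by install path
// such as "node_modules/foo" or "node_modules/a/node_modules/@scope/b".
function packageNameOf(installPath) {
  const marker = "node_modules/";
  const idx = installPath.lastIndexOf(marker);
  return idx === -1 ? installPath : installPath.slice(idx + marker.length);
}

// before / after: parsed package-lock.json objects; targetNames: packages the
// bundle was supposed to update. Returns every other package whose resolved
// version was added, removed, or changed.
function unrelatedLockfileChanges(before, after, targetNames) {
  const targets = new Set(targetNames);
  const oldPkgs = before.packages ?? {};
  const newPkgs = after.packages ?? {};
  const paths = new Set([...Object.keys(oldPkgs), ...Object.keys(newPkgs)]);

  const changes = [];
  for (const path of paths) {
    if (path === "") continue; // the root project entry
    const from = oldPkgs[path]?.version ?? null;
    const to = newPkgs[path]?.version ?? null;
    if (from === to) continue;
    const name = packageNameOf(path);
    if (!targets.has(name)) changes.push({ path, name, from, to });
  }
  return changes;
}
```

In practice `before` could come from `git show HEAD:package-lock.json` and `after` from the working tree. This counts packages rather than diff lines, so a line-based threshold such as 40 does not carry over directly; the observe-first approach applies here as well.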
Running several repositories at once also revives the classic lockfile write contention problem. My arrangement is serial across repositories, parallel only for read-only research within one, for the same reasons described in "When Parallel Agents Corrupt Your Lockfile, Serialize Just the Install Step."
Putting It on a Schedule — Double Starts and Empty Weeks
The run is pinned to Friday evenings. The moment you schedule anything, double starts and re-runs appear, and for dependency updates they are uniquely dangerous: applying the same bundle twice corrupts the lockfile. I reused, unchanged, the lock-file-based overlap guard from "Scheduled Agents That Fire Twice — Designing for Idempotency and Re-runs."
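The referenced guard is not shown in this article. As a rough idea of what a lock-file-based overlap guard can look like, here is a minimal sketch; `acquireRunLock` is a hypothetical name and the behavior is an assumption, not the implementation from that other article. It is written as CommonJS so it can be pasted into a plain .js file.

```javascript
const fs = require("node:fs");

// Hypothetical guard: returns a release function if the lock was acquired,
// or null if another run already holds it.
function acquireRunLock(lockPath) {
  let fd;
  try {
    // "wx" creates the file and fails with EEXIST if it already exists,
    // so creation doubles as the "is anyone else running?" test.
    fd = fs.openSync(lockPath, "wx");
  } catch (err) {
    if (err.code === "EEXIST") return null;
    throw err;
  }
  // Record who holds the lock to make a stale file easier to diagnose.
  fs.writeSync(fd, `${process.pid} ${new Date().toISOString()}\n`);
  fs.closeSync(fd);
  return () => fs.rmSync(lockPath, { force: true });
}
```

A scheduled run would call this first and exit immediately on `null`. A run that crashes without releasing leaves the file behind, so a real setup also needs a policy for stale locks, which this sketch does not attempt.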
One more lesson from actually operating this: empty weeks matter. Even when zero updates are pending, the triage output and the report get written. If a week has no report, you cannot later distinguish "did not run" from "ran and found nothing." Unattended operation starts with leaving evidence of success every single time.
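The empty-week rule amounts to making the report writer unconditional. A minimal sketch, assuming the triage shape used elsewhere in this article (`tier1` / `tier2` / `tier3` arrays) and the report path from the playbook; `renderWeeklyReport` and the wording of the report lines are hypothetical.

```javascript
// Builds the report for a run. Always returns a path and a body,
// including for weeks in which nothing was pending.
function renderWeeklyReport(date, triage) {
  const counts = ["tier1", "tier2", "tier3"].map((tier) => (triage[tier] ?? []).length);
  const total = counts.reduce((sum, n) => sum + n, 0);

  const lines = [
    `# Weekly dependency report ${date}`,
    `Tier1: ${counts[0]} / Tier2: ${counts[1]} / Tier3: ${counts[2]}`,
  ];
  if (total === 0) {
    // Explicit marker so "ran and found nothing" is distinguishable from "did not run".
    lines.push("Ran successfully; no updates pending this week.");
  }
  return {
    path: `.antigravity/reports/weekly-deps-${date}.md`,
    body: lines.join("\n") + "\n",
  };
}
```

The caller writes `body` to `path` on every run, so a missing file for a given Friday can only mean the run never happened.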
Six Weeks of Numbers
Across four of my repositories (all Next.js + TypeScript), six weeks after switching: the backlog of 47 stale packages was essentially cleared by week six at a pace of five to nine per week (four Tier 3 items remain by choice). A weekly run takes the agent 20–30 minutes; my own review of the Tier 2 reports takes about five. The monthly batch used to cost me two to three hours and still broke sometimes. Honestly, the bigger change is not the hours saved but that failure diagnosis stopped being part of the routine.
The unattended gates halted the run twice during that period: once on the lockfile range check (a transitive esbuild move), once on a test failure (a timezone-dependent flaky test unrelated to the update). Both stops were the system working as intended. Tilting toward "stop when in doubt," false positives included, is the right balance for unattended runs.
Where to Start — Just the Triage Script
You do not need the full weekly unattended pipeline on day one. Run dep-triage.mjs once against your own repository and see how your 47 (or more) stale packages split across the three tiers. If Tier 1 dominates, that bundle is something you can hand to an agent today. The classification is the skeleton; the automation is layered on top of it a piece at a time — that is what these six weeks taught me.