Antigravity Lab | Agents & Manager | 2026-07-04 | Advanced

When Nobody Reads Your AI Code Reviewer Anymore — Field Notes on Measuring Actioned-Rate

Our production AI code-review agent quietly went hollow over six months. When the team started silently resolving every comment, we instrumented actioned-rate and false-positive rate to bring it back. These are the field notes.



When we first shipped the agent, everyone replied to its review comments. Six months later I opened a PR one morning and froze. All eleven comments the agent had left were marked resolved, in silence.

No discussion. No fix. No reply. Just quietly closed. The comments still looked as reasonable as they had six months earlier, yet nobody was reading them.

This was not a broken tool. It was a team that had been trained, by sheer volume, to close everything on sight. Even a solo indie developer runs into this. It is a quiet kind of hollowing-out that you cannot see without numbers. These are the field notes on catching that signal through measurement and bringing the agent back to life.

"Running" and "Working" Are Different Metrics

For a long time we judged the health of the review agent by whether it was running. Green CI, comments posted, therefore healthy. Or so we thought.

What actually matters is whether a posted comment led to action. Unless you separate these, hollowing-out stays invisible forever.

| Dimension | "Running" metric | "Working" metric |
|---|---|---|
| Uptime | CI success rate, comment count | - |
| Acceptance | - | Actioned-rate (comments that led to a fix or discussion) |
| Precision | - | False-positive rate (closed as wontfix / not-applicable) |
| Load | - | Median comments per PR |

You can inflate comment count endlessly. The more you inflate it, the lower the actioned-rate falls. Miss that inverse relationship and one day the whole team stops reading.
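
As a rough sketch of how the "working" metrics in the table might be computed, assume each agent comment has already been reduced to one record holding a PR number, an outcome, and an optional dismissal reason. The record shape and the name `working_metrics` are hypothetical and not taken from the original tooling.

```python
from collections import Counter
from statistics import median

# Hypothetical record shape, one dict per agent comment:
#   "pr"        - PR number
#   "outcome"   - "actioned", "discussed", or "ignored"
#   "dismissal" - "wontfix", "not-applicable", or None
def working_metrics(records):
    if not records:
        return {"actioned_rate": 0.0, "false_positive_rate": 0.0,
                "median_comments_per_pr": 0}
    total = len(records)
    outcomes = Counter(r["outcome"] for r in records)
    # Actioned-rate counts comments that led to a fix or a discussion.
    acted = outcomes["actioned"] + outcomes["discussed"]
    # False positives are comments closed as wontfix / not-applicable.
    false_pos = sum(
        1 for r in records
        if r.get("dismissal") in ("wontfix", "not-applicable")
    )
    # Load: how many agent comments each PR received.
    per_pr = Counter(r["pr"] for r in records)
    return {
        "actioned_rate": acted / total,
        "false_positive_rate": false_pos / total,
        "median_comments_per_pr": median(per_pr.values()),
    }

sample = [
    {"pr": 1, "outcome": "actioned", "dismissal": None},
    {"pr": 1, "outcome": "ignored", "dismissal": "wontfix"},
    {"pr": 1, "outcome": "ignored", "dismissal": None},
    {"pr": 2, "outcome": "discussed", "dismissal": None},
]
metrics = working_metrics(sample)
print(metrics)
```

Note that the median here only covers PRs that received at least one agent comment. Tracking it next to the actioned-rate over time is what makes the inverse relationship visible.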

Harvesting Actioned-Rate From PR Events

The first thing you need is a way to follow what happened to a comment afterward. GitHub keeps the resolution state of each review thread, along with any replies. We harvest those.

Starting from each comment the agent posted, we classify the human behavior that followed into three buckets. A follow-up commit means actioned, a reply thread means discussed, and resolution with neither means ignored.

# collect_actioned_rate.py
# Measure whether the agent's review comments led to action
import os
from collections import Counter

import requests
 
REPO = os.environ["REPO"]           # e.g. "owner/name"
BOT = os.environ["BOT_LOGIN"]       # the review agent's account
TOKEN = os.environ["GITHUB_TOKEN"]
H = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/vnd.github+json"}
 
def paged(url, params=None):
    params = dict(params or {}, per_page=100)
    while url:
        r = requests.get(url, headers=H, params=params, timeout=30)
        r.raise_for_status()
        yield from r.json()
        url = r.links.get("next", {}).get("url")
        params = None  # the next URL already carries the query
 
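def classify(comment):
    # Expects two derived fields that the GitHub API does not return as-is
    # and that must be computed beforehand:
    #   reply_count        - number of human replies in the thread
    #   path_touched_after - True if a later commit touched the same file
    # discussed: a human reply exists in the thread (checked first, so a
    #            reply wins even when a follow-up commit also exists)
    # actioned:  a change commit touched the same file after the comment
    # ignored:   neither; assumes only resolved threads are passed in
    if comment["reply_count"] > 0:
        return "discussed"
    if comment["path_touched_after"]:
        return "actioned"
    return "ignored"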
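
The `classify` function reads `reply_count` and `path_touched_after`, which are not fields of a raw GitHub review comment. One way they might be derived is sketched below. It assumes review comments carry `id`, `in_reply_to_id`, `user.login`, `path`, and `created_at` as in the REST payload, that replies point at the thread's first comment, and that commits have been pre-flattened into a hypothetical `{"date", "files"}` shape. The function name `enrich` and the login `review-bot` are illustrative only.

```python
from datetime import datetime

def parse_ts(ts):
    # GitHub timestamps look like "2026-01-01T12:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def enrich(comments, commits, bot_login):
    """Attach reply_count and path_touched_after to top-level bot comments.

    comments: review comment dicts as described above
    commits:  hypothetical flattened dicts {"date": iso_str, "files": [paths]}
    """
    replies = {}
    for c in comments:
        parent = c.get("in_reply_to_id")
        # Count only human replies; the bot answering itself is not discussion.
        if parent is not None and c["user"]["login"] != bot_login:
            replies[parent] = replies.get(parent, 0) + 1

    enriched = []
    for c in comments:
        # Keep only comments that the agent posted to open a thread.
        if c["user"]["login"] != bot_login or c.get("in_reply_to_id") is not None:
            continue
        posted = parse_ts(c["created_at"])
        touched = any(
            parse_ts(k["date"]) > posted and c["path"] in k["files"]
            for k in commits
        )
        enriched.append({
            **c,
            "reply_count": replies.get(c["id"], 0),
            "path_touched_after": touched,
        })
    return enriched

comments = [
    {"id": 10, "user": {"login": "review-bot"}, "path": "app.py",
     "created_at": "2026-01-01T10:00:00Z"},
    {"id": 11, "in_reply_to_id": 10, "user": {"login": "alice"},
     "path": "app.py", "created_at": "2026-01-01T11:00:00Z"},
    {"id": 12, "user": {"login": "review-bot"}, "path": "db.py",
     "created_at": "2026-01-01T10:05:00Z"},
]
commits = [{"date": "2026-01-01T12:00:00Z", "files": ["db.py"]}]
rows = enrich(comments, commits, "review-bot")
```

Matching on the whole file is a coarse proxy: a later commit to the same path may have nothing to do with the comment, so the resulting actioned count is best read as an upper bound.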

The key is not to condemn ignored on sight. An INFO-level note deserves to be ignored. It only means something once tied to severity. That is what the next section pulls apart.
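
To make the severity point concrete, here is a minimal sketch that breaks the ignored share down per severity level. It assumes each classified record also carries a `severity` label such as `INFO` or `ERROR`. That field, the label values, and the function name are hypothetical, and this is an illustration of the idea rather than the rule the team actually used.

```python
from collections import Counter, defaultdict

def ignored_rate_by_severity(records):
    # Tally outcomes separately for each severity level.
    buckets = defaultdict(Counter)
    for r in records:
        buckets[r["severity"]][r["outcome"]] += 1
    # Share of comments at each level that were resolved without action.
    return {
        sev: counts["ignored"] / sum(counts.values())
        for sev, counts in buckets.items()
    }

records = [
    {"severity": "INFO", "outcome": "ignored"},
    {"severity": "INFO", "outcome": "ignored"},
    {"severity": "ERROR", "outcome": "actioned"},
    {"severity": "ERROR", "outcome": "ignored"},
]
rates = ignored_rate_by_severity(records)
print(rates)
```

A high ignored share on `INFO` is expected and harmless. The same share on the highest severity level is the signal worth investigating.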
