Feeding axe-core Findings to an Antigravity Agent: An Accessibility Fix Loop in CI
Detect accessibility violations with axe-core, hand them to an Antigravity agent as fixable tasks, and wire detection, triage, fixing, and re-verification into a single CI loop. Includes a diff gate that blocks only new violations and a scoping design that keeps the agent from breaking screen readers.
A few days after shipping an updated settings screen for one of my wallpaper apps, a VoiceOver user wrote in: "the save button just reads as 'button' — I can't tell what it does." I had forgotten an accessibilityLabel on an icon-only round button. On a single screen you catch that by eye, but as an indie developer juggling several apps and a few websites at once, a routine of screen-reading every screen before every release simply does not survive contact with reality.
So this piece builds a loop. We detect accessibility violations mechanically with axe-core, hand the results to an Antigravity agent as concrete things to fix, and run detection, triage, fixing, and re-verification as one cycle inside CI. The division of labor is the point: tools decide pass/fail, the agent proposes fixes, and CI applies the brakes. The examples target a web frontend (a Next.js app like the Dolice Lab sites), but the same shape carries over to automated accessibility checks on mobile.
Why manual accessibility review never sticks
Accessibility defects don't blank the screen the way a functional bug does. Visually, everything looks fine. That is exactly why they pile up quietly: button text whose contrast ratio sits just under the 4.5:1 minimum, an input with no label, a custom dialog that focus never reaches. Each one is small on its own, and each one tends to get skipped when the visual review runs out of energy at the end of the day.
The mistake I made for a long time was "fix it all before release." The backlog ballooned to a few hundred items, and the sheer size made it psychologically impossible to start. Debt is far more manageable when you stop adding to it than when you try to pay it all down at once. Putting axe-core in CI is how you hand that "stop adding" job to a machine.
Detection alone isn't enough, though. Handing a reviewer a list of violations still leaves the "okay, but how do I fix it" burden. That's where an Antigravity agent earns its place — drafting the fix per violation type so review can actually move forward.
The loop at a glance: who detects, who fixes, who blocks
Fix the roles up front and you won't get lost. Three actors share the work.
axe-core (detector): scans the rendered DOM and reports violated rules, each tagged with the WCAG success criteria it covers, as machine-readable JSON. For a given DOM the verdict is deterministic and doesn't drift.
Antigravity agent (fixer): reads the violation JSON and the relevant source, then drafts fixes. Its job is explanation and proposal, never the pass/fail call.
GitHub Actions diff gate (brake): compares against a baseline and blocks the PR only when violations have increased. Existing debt goes to a separate queue; new regressions are stopped cold.
Keep detection and the brake in deterministic tools, and confine the non-deterministic part — coming up with the fix — to the agent. That's how you keep CI trustworthy. Let the agent own the verdict and the same PR will return different results run to run, and the gate stops being a gate.
Step 1: Wire axe-core into Playwright and emit machine-readable violations
Start by emitting violations as JSON files. With @axe-core/playwright you scan the actually-rendered page, which means you catch dynamically injected elements that static analysis would miss.
```typescript
// tests/a11y/scan.spec.ts
// Scan key pages with axe-core and write violations to results/ as JSON.
// Relative paths resolve against `baseURL` in playwright.config.ts, so the app must be reachable there.
import { test } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";
import { mkdirSync, writeFileSync } from "node:fs";

// Targets. Starting from the pages that matter on your main paths is realistic.
const PAGES = ["/", "/articles", "/membership", "/support"];

test.describe("accessibility scan", () => {
  for (const path of PAGES) {
    test(`scan ${path}`, async ({ page }) => {
      await page.goto(path, { waitUntil: "networkidle" });

      const results = await new AxeBuilder({ page })
        // Limit to WCAG 2.0/2.1/2.2 Level A and AA. Adding AAA explodes the noise.
        .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa", "wcag22aa"])
        .analyze();

      mkdirSync("results", { recursive: true });
      // "/" -> "root", "/articles" -> "articles"
      const safe = path.replace(/\W+/g, "_").replace(/^_+|_+$/g, "") || "root";
      writeFileSync(`results/${safe}.json`, JSON.stringify(results.violations, null, 2));
    });
  }
});
```
The waitUntil: "networkidle" is there for a reason. If you scan before client-side modals or toasts have been injected, you miss violations that genuinely exist. I lost half a day to a "doesn't reproduce locally, breaks in production" screen-reader bug before I remembered this.
At this stage, don't produce a verdict — just accumulate JSON. The judgment is centralized in the diff gate later.
Step 2: Shape the violation JSON into tasks an agent can fix
axe-core's raw output is dense, and handing it to an agent verbatim blurs the focus. So filter by impact and reshape it into fix tasks tied to elements. Early on, I strongly recommend limiting to critical and serious only. On my Lab sites, the full set was 240 findings; narrowing to those two tiers brought it down to 38 — roughly 84% of the findings deferred, and finally a number you can act on.
```javascript
// scripts/build-fix-tasks.mjs
// Read results/*.json and reshape only critical/serious violations into fix tasks.
import { readdirSync, readFileSync, writeFileSync } from "node:fs";

const ALLOW = new Set(["critical", "serious"]);
const OUTPUT = "fix-tasks.json";
const tasks = [];

// Skip our own output so a second local run doesn't parse tasks as violations.
const files = readdirSync("results").filter((f) => f.endsWith(".json") && f !== OUTPUT);

for (const file of files) {
  const violations = JSON.parse(readFileSync(`results/${file}`, "utf8"));
  for (const v of violations) {
    if (!ALLOW.has(v.impact)) continue;
    for (const node of v.nodes) {
      tasks.push({
        rule: v.id, // e.g. "button-name"
        impact: v.impact,
        help: v.help, // one-line human-readable summary
        selector: node.target.join(" "), // CSS selector for the element
        html: node.html.slice(0, 200), // a fragment to anchor the fix
        fixHint: node.failureSummary, // axe's own summary of how to fix
      });
    }
  }
}

// Sorting by rule keeps each kind together, so the agent can "fix all of one kind" at once.
tasks.sort((a, b) => a.rule.localeCompare(b.rule));
writeFileSync(`results/${OUTPUT}`, JSON.stringify(tasks, null, 2));
console.log(`fixable tasks: ${tasks.length}`);
```
The failureSummary field carries axe-core's own description of how to fix the issue, and keeping it noticeably lowers the odds of an off-target fix. Passing the selector and HTML fragment alongside it largely determines how accurately the agent can find the right element in the next step.
Step 3: Constrain the agent's scope before it fixes anything
This is the crux. Left to its own devices, an Antigravity agent will "helpfully" sprinkle in ARIA attributes. But unnecessary role or aria-label values break assistive-technology output rather than help it. I once let an agent add role="button" to an element that already wrapped a native <button>, and it produced a regression where the control was announced twice.
So spell out the scope constraints in an AGENTS.md at the repo root. The agent reads this before it works.
```markdown
# AGENTS.md: constraints for accessibility fix tasks

## Scope
- Fix only the violations listed in `results/fix-tasks.json`
- One PR handles a single rule (e.g. button-name) at a time

## Fix priority
1. Prefer native elements/attributes (e.g. a visible label or aria-label on icon buttons)
2. Add minimal ARIA only when native isn't enough
3. Adding or overriding `role` is forbidden; escalate to a human instead

## Forbidden
- Do not change appearance (layout, color, spacing)
- Do not change colors for color-contrast issues; report them as a task instead
- Do not touch elements that don't match the selector
```
There's intent behind routing color-contrast to "report, don't fix." Color is a design decision, and a machine quietly bumping the lightness erodes brand consistency. In my own setup, contrast violations are excluded from the agent's scope and collected separately as issues that need a design call. I run the apps the same way — deciding by eye, side by side with the App Store screenshots.
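One way to keep contrast out of the agent's hands is to split the task list before the agent ever sees it. A minimal sketch, assuming the task shape from Step 2 and treating `color-contrast` as the only design-only rule; `partitionTasks` and `DESIGN_ONLY_RULES` are hypothetical names, not part of axe-core or Antigravity.

```typescript
// Minimal task shape from results/fix-tasks.json (Step 2); extra fields pass through untouched.
interface ContrastSplitTask {
  rule: string;
  selector: string;
}

// Assumption: color-contrast is the only rule that needs a design call in this setup.
const DESIGN_ONLY_RULES = new Set<string>(["color-contrast"]);

// Split tasks into what the agent may fix and what goes to a human design queue.
function partitionTasks<T extends ContrastSplitTask>(tasks: T[]): { agent: T[]; design: T[] } {
  const agent: T[] = [];
  const design: T[] = [];
  for (const task of tasks) {
    if (DESIGN_ONLY_RULES.has(task.rule)) {
      design.push(task);
    } else {
      agent.push(task);
    }
  }
  return { agent, design };
}
```

You would hand `agent` to the agent and file `design` as issues. If you do, keep the gate counting the unsplit list, or contrast debt quietly drops out of view.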
The instruction to the agent can be a single sentence anchored on the task file: "From results/fix-tasks.json, fix only the items whose rule is button-name, following the constraints in AGENTS.md, and add a one-line rationale to each fix." Name the scope explicitly every time.
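If you issue that instruction often, generating it from the task file keeps the named scope and the target list in sync. A sketch, assuming the Step 2 task fields; `buildScopedPrompt` is a hypothetical helper, and how the text reaches the agent depends on your setup.

```typescript
// Fields this sketch reads from each task in results/fix-tasks.json (Step 2).
interface PromptTask {
  rule: string;
  selector: string;
  fixHint: string;
}

// Hypothetical helper: build a one-rule instruction that names its scope and lists only that rule's targets.
function buildScopedPrompt(tasks: PromptTask[], rule: string): string {
  const targets = tasks.filter((t) => t.rule === rule);
  if (targets.length === 0) {
    return `No ${rule} tasks in results/fix-tasks.json; nothing to fix.`;
  }
  // failureSummary spans several indented lines; collapse it to one line per target.
  const lines = targets.map((t, i) => `${i + 1}. ${t.selector}: ${t.fixHint.replace(/\s+/g, " ").trim()}`);
  return [
    `From results/fix-tasks.json, fix only the items whose rule is ${rule}, ` +
      "following the constraints in AGENTS.md, and add a one-line rationale to each fix.",
    `Targets (${targets.length}):`,
    ...lines,
  ].join("\n");
}
```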
Step 4: Build a diff gate that blocks only new violations
If you introduce a "must be zero violations" rule while dozens of debt items remain, every PR goes red from day one and people stop looking. The realistic move is a diff gate that compares against the main baseline and blocks only when new violations appear.
```yaml
# .github/workflows/a11y.yml
name: accessibility-gate
on: pull_request

jobs:
  a11y:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      # Scan the PR branch (assumes playwright.config starts or points at the app)
      - run: npx playwright test tests/a11y
      - run: node scripts/build-fix-tasks.mjs
      # Compare against the baseline count committed on main and judge only the increase
      - name: Compare against baseline
        run: |
          CURRENT=$(node -e "console.log(require('./results/fix-tasks.json').length)")
          BASE=$(cat .a11y-baseline 2>/dev/null || echo 0)
          echo "baseline=$BASE current=$CURRENT"
          if [ "$CURRENT" -gt "$BASE" ]; then
            echo "::error::$((CURRENT - BASE)) new accessibility violation(s) introduced"
            exit 1
          fi
```
.a11y-baseline records the current critical/serious count (38 in this example). Each time you pay down a debt item, lower this number; the ratchet keeps the count from creeping back up. Since putting this in place, I've held new violations at 0 per week. Lock in "don't add more" first, then let the agent's fix PRs draw down the existing debt a little at a time: a two-stage approach.
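Comparing counts has one blind spot: a PR that fixes one violation and introduces a different one leaves the total unchanged, so the gate passes. If that matters, one option is to commit fingerprints instead of a number and fail on any fingerprint the baseline doesn't contain. A sketch, assuming rule plus selector as the fingerprint and a hypothetical committed JSON array of fingerprints in place of `.a11y-baseline`; selectors can shift when markup changes, so expect the occasional false positive that needs a baseline refresh.

```typescript
// Only the fields needed to identify a violation across runs.
interface GateTask {
  rule: string;
  selector: string;
}

// Assumption: rule + selector is stable enough to mean "the same violation" between runs.
function fingerprint(task: GateTask): string {
  return `${task.rule}::${task.selector}`;
}

// Return current tasks whose fingerprint is absent from the committed baseline list.
function newViolations(current: GateTask[], baseline: string[]): GateTask[] {
  const known = new Set(baseline);
  return current.filter((task) => !known.has(fingerprint(task)));
}
```

With a baseline of `button-name::#save` and `image-alt::img.hero`, a PR that fixes the image and adds an unlabeled `#email` input keeps the count at two, yet `newViolations` still returns the `label` task.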
Automating the part that posts the gate result and fix-tasks.json as a PR comment speeds up the first response in review. The comment-assembly approach in Automating Pull Request Review with Antigravity and GitHub drops straight in here.
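Whatever tool posts the comment, the body itself can be plain Markdown built from `fix-tasks.json`. A sketch, assuming the Step 2 task fields and the same count comparison as the workflow; `renderSummary` is a hypothetical name, and the posting step (for example through the GitHub API) is left out.

```typescript
// Fields this sketch reads from each task in results/fix-tasks.json (Step 2).
interface CommentTask {
  rule: string;
  impact: string;
}

// Hypothetical helper: gate verdict plus a per-rule Markdown table, most frequent rule first.
function renderSummary(tasks: CommentTask[], base: number): string {
  const counts = new Map<string, { impact: string; n: number }>();
  for (const t of tasks) {
    // Keeps the impact of the first task seen for each rule.
    const entry = counts.get(t.rule) ?? { impact: t.impact, n: 0 };
    entry.n += 1;
    counts.set(t.rule, entry);
  }
  const rows = Array.from(counts.entries())
    .sort((a, b) => b[1].n - a[1].n || a[0].localeCompare(b[0]))
    .map(([rule, e]) => `| ${rule} | ${e.impact} | ${e.n} |`);
  const verdict = tasks.length > base ? "blocked" : "ok";
  return [
    `**Accessibility gate: ${verdict}** (baseline ${base}, current ${tasks.length})`,
    "",
    "| Rule | Impact | Count |",
    "| --- | --- | --- |",
    ...rows,
  ].join("\n");
}
```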
Things that tripped me up in practice
1. Scanning before dynamic content lands
Even with the networkidle from Step 1, elements that only appear after a user action (the contents of an opened menu, say) won't be caught. For important interactive elements, open them with page.click() first and scan that state separately.
2. The agent over-adding ARIA
Even with the AGENTS.md constraints, complex violations push it toward over-fixing. In review of fix PRs, I check every added attribute one by one for whether it's truly needed. If a native element suffices, subtract the ARIA.
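Part of that one-by-one review can be mechanized. A narrow sketch, assuming you run it over the added lines of a fix PR: it flags only ARIA that restates a native element's implicit role, such as `<button role="button">`. It would not have caught my nested double-announcement case, so it complements the human pass rather than replacing it. `findRedundantRoles` and its role table are illustrative and cover only a handful of elements.

```typescript
// Implicit ARIA roles of a few native elements (a deliberately small subset, not the full mapping).
const IMPLICIT_ROLES = new Map<string, string>([
  ["button", "button"],
  ["nav", "navigation"],
  ["main", "main"],
  ["ul", "list"],
  ["ol", "list"],
  ["li", "listitem"],
]);

// Flag lowercase (native) opening tags whose role attribute restates the implicit role,
// e.g. <button role="button">. Capitalized JSX components are skipped on purpose.
function findRedundantRoles(markup: string): string[] {
  const hits: string[] = [];
  const tagRe = /<([a-z][a-z0-9]*)\b[^>]*\srole=["']([^"']+)["'][^>]*>/g;
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(markup)) !== null) {
    if (IMPLICIT_ROLES.get(m[1]) === m[2].trim().toLowerCase()) {
      hits.push(m[0]);
    }
  }
  return hits;
}
```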
3. Forgetting to update the baseline, so debt gets locked in
Fix a violation but forget to lower .a11y-baseline, and the freed slot quietly lets the next new violation through without the gate noticing. Put "update baseline" on the fix-PR checklist.
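The checklist item can also be backed by the gate itself. A minimal sketch, assuming the same two numbers the workflow already reads: besides failing on an increase, it reports the exact value to write back when the count drops. Whether `lower-baseline` fails the PR or only annotates it is a policy choice on your side; `ratchet` is a hypothetical name.

```typescript
interface RatchetVerdict {
  status: "fail" | "lower-baseline" | "ok";
  message: string;
}

// Compare the critical/serious count with the committed .a11y-baseline value.
function ratchet(base: number, current: number): RatchetVerdict {
  if (current > base) {
    return { status: "fail", message: `${current - base} new violation(s) introduced` };
  }
  if (current < base) {
    // Debt went down: report the exact value to commit so the freed slot can't be refilled.
    return { status: "lower-baseline", message: `lower .a11y-baseline from ${base} to ${current}` };
  }
  return { status: "ok", message: `no change (${current})` };
}
```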
4. Letting the machine touch color-contrast and wrecking the look
As above, color is decided in design, not code. axe-core's findings are valuable, but unless a human owns the fix, your brand colors drift murkier by degrees.
5. Targeting every page at once and bogging down CI
Scanning all pages from the start stretches Playwright's runtime and the per-PR wait becomes a drag. Start with the key pages on your main paths and widen once it's stable.
The smallest first step you can take this weekend
We built this in four stages, but you don't need all of it on day one. My suggestion is to stand up just Step 1 — the "scan that only emits JSON" — against your four most important pages, and watch the counts for two weeks.
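For those two weeks, a small tally over the Step 1 output is enough to see the pattern. A sketch, assuming the per-page files have already been parsed into an object keyed by page name (file reading is left out to keep it dependency-free). It counts affected nodes, matching how Step 2 creates one task per node, but before any impact filter; `tally` is a hypothetical name.

```typescript
// Minimal slice of an axe-core violation as written to results/<page>.json in Step 1.
interface AxeViolation {
  id: string;
  nodes: unknown[];
}

// Count affected nodes per rule and per page, so trends are visible week to week.
function tally(byPage: Record<string, AxeViolation[]>): { rules: Map<string, number>; pages: Map<string, number> } {
  const rules = new Map<string, number>();
  const pages = new Map<string, number>();
  for (const [page, violations] of Object.entries(byPage)) {
    for (const v of violations) {
      const n = v.nodes.length;
      rules.set(v.id, (rules.get(v.id) ?? 0) + n);
      pages.set(page, (pages.get(page) ?? 0) + n);
    }
  }
  return { rules, pages };
}
```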
In that window you'll see which rule violations dominate and which screens break most easily. Once the pattern is clear, layer on the Step 2 filter and the gate, and you'll avoid over-engineering. Accessibility isn't something you perfect in one pass; you put the "don't add more" machinery in place first, then pay it down a little at a time. If this gives you the first nudge on a problem you've been circling, I'll be glad.
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.