When CI Passes but App Review Rejects Your Screenshots — Field Notes on Measuring the Freshness of Your Validation Rules
Store asset validation can pass in CI and still get rejected in review, because the rules themselves go stale. Move store specs into a freshness-dated contract file, then add locale overflow checks and perceptual diffs.
The night before submission, every store asset check in CI was green. The next morning, App Store Connect returned a metadata rejection caused by screenshots. I read the logs three times. The validation script kept insisting that every file had passed.
The generation pipeline wasn't the problem. The validation rules themselves had gone stale.
Screenshot size requirements change quietly a few times a year. Yet most validation scripts bake the store spec of the day they were written into the code as constants like 1170x2532. From that moment on, a green build no longer means "this meets the store's requirements." It means "this meets the requirements as they existed when someone wrote this script." The gap between the two is invisible until you actually submit.
As an indie developer shipping apps in multiple locales for years, I hit exactly this rejection right before a release deadline. I suspected the generation pipeline immediately — it never occurred to me to suspect the validator. Since then I structure store asset validation in two tiers: one layer that validates the assets, and one layer that validates the freshness of the validation rules themselves.
This article walks through that two-tier setup on GitHub Actions, with code you can run.
Why a Green Build Stops Being Trustworthy
It helps to see why this failure mode is so hard to catch.
| Layer | Who changes it | How often | Detectable by plain CI? |
| --- | --- | --- | --- |
| Asset generation (capture and processing) | Your own code | Per commit | Yes (classic validation) |
| Validation rules (sizes, formats, count limits) | Apple / Google | A few times a year, unannounced | No, not as-is |
| Per-locale captions | Translation updates | Irregular | No, not without rendering |
The first layer is what conventional CI protects. The second and third change outside your repository, which fundamentally clashes with commit-triggered CI: when the outside world moves, no build runs — and when a build does run, it passes against outdated rules.
Closing the gap takes three small tools rather than a big platform:
1. Pull store specs out of the code into a contract file that carries freshness metadata
2. Have CI check that freshness on every run, and fail once the contract is past its shelf life
3. Compare rendered per-locale output against golden images with a perceptual diff
Move Store Specs into a Freshness-Dated Contract File
Evict the rules from code constants into JSON, and record when and against what they were last verified.
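As a rough illustration, the contract might look like the object below, written in JavaScript so its shape can be checked before anything else reads it. Only verifiedAt, staleAfterDays, and specVersion are names used in this article; verifiedAgainst, the screenshots layout, the validateSpecShape helper, and every number are assumptions made for the sketch.

```javascript
// Hypothetical contents of store-spec.json, shown as a JS object.
// All values are placeholders: reconcile against the official pages yourself.
const exampleSpec = {
  specVersion: '2025-01-example', // assumed label, bumped on each reconciliation
  verifiedAt: '2025-01-15', // date a human last checked the official pages
  staleAfterDays: 45, // shelf life that CI enforces
  verifiedAgainst: ['<official spec page URL>'], // assumed field: what was checked
  screenshots: {
    // assumed layout: one entry per device class
    'iphone-portrait': { width: 1170, height: 2532, format: 'png' },
  },
};

// Returns a list of problems with the contract's shape (empty when it looks sane).
function validateSpecShape(spec) {
  const errors = [];
  if (typeof spec.specVersion !== 'string' || spec.specVersion === '') {
    errors.push('specVersion must be a non-empty string');
  }
  if (Number.isNaN(Date.parse(spec.verifiedAt))) {
    errors.push('verifiedAt must be a parseable date');
  }
  if (!Number.isInteger(spec.staleAfterDays) || spec.staleAfterDays <= 0) {
    errors.push('staleAfterDays must be a positive integer');
  }
  for (const [name, rule] of Object.entries(spec.screenshots || {})) {
    if (!Number.isInteger(rule.width) || !Number.isInteger(rule.height)) {
      errors.push(`${name}: width and height must be integers`);
    }
  }
  return errors;
}
```

Checking the shape separately keeps a typo in the contract from being mistaken for a stale contract.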
The key field is staleAfterDays. A machine cannot judge whether the spec contents are still correct — but it can absolutely judge how many days have passed since a human last reconciled them against the official page. So the validator checks the contract before it checks a single image.
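    // scripts/check-spec-freshness.js
    // Fails the build when store-spec.json has not been re-verified recently.
    const spec = require('../store-spec.json');

    const MS_PER_DAY = 86400000;
    const verifiedAt = new Date(spec.verifiedAt);

    // A missing or malformed field would make the comparison below false
    // and let a broken contract pass silently, so reject it up front.
    if (Number.isNaN(verifiedAt.getTime()) || !Number.isFinite(spec.staleAfterDays)) {
      console.error('❌ store-spec.json needs a valid verifiedAt date and a numeric staleAfterDays.');
      process.exit(1);
    }

    const ageDays = Math.floor((Date.now() - verifiedAt.getTime()) / MS_PER_DAY);

    if (ageDays > spec.staleAfterDays) {
      console.error(
        `❌ store-spec.json was last verified ${ageDays} days ago (limit: ${spec.staleAfterDays}). ` +
          `Reconcile it against the official spec pages and update verifiedAt.`
      );
      process.exit(1);
    }

    console.log(`✅ Spec freshness OK (${ageDays} days since verification)`);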
Why fail the build instead of warning? Because warnings get read for about a week, and then they become scenery. I use a 45-day limit: Apple's size requirement changes tend to land roughly on a quarterly cadence, so I set the shelf life slightly shorter than a quarter. The renewal itself is a ten-minute job — open the official pages, reconcile, bump verifiedAt. Ten minutes on a schedule is a cheap trade for structurally eliminating deadline-night rejections.
One caution: the size numbers in this article are examples. The entire point of a contract file is that you build the first version by reconciling against the official specs yourself, not by copying numbers from a blog post — including this one.
Detect Per-Locale Caption Overflow After Rendering
The second blind spot is caption overflow. A line that fits in Japanese grows by 40 percent in German and escapes the safe area. Generation succeeds, size validation passes, and sometimes review passes too — which is arguably worse than a rejection, because the broken artwork simply goes live on your store page.
If you composite captions yourself, you can measure the rendered text width at composition time.
One measured threshold does a lot of work here. In my own asset sets, English captions averaged around 1.2 times the width of the Japanese layout they were designed against, with long lines approaching 1.5 times. So I reserve a safe width of roughly the maximum expected Japanese width times 1.5, and any line that still overflows gets shortened on the translation side. Shrinking the font is the other option, but caption sizes that vary by locale make the store listing look uneven, so I prefer cutting words.
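A minimal sketch of that measurement follows. The function names and the injected measureWidth callback are assumptions for illustration. In a canvas-based compositor the callback could wrap the 2D context's measureText, but any renderer that reports a pixel width would do.

```javascript
// Headroom over the widest Japanese line, following the 1.5x figure above.
const SAFE_WIDTH_FACTOR = 1.5;

// Derive the safe width from the base-locale captions the layout was designed for.
function safeWidthFor(baseCaptions, measureWidth) {
  const widest = Math.max(...baseCaptions.map(measureWidth));
  return widest * SAFE_WIDTH_FACTOR;
}

// Return one record per caption whose rendered width exceeds the safe width.
function findOverflows(captionsByLocale, measureWidth, safeWidth) {
  const overflows = [];
  for (const [locale, captions] of Object.entries(captionsByLocale)) {
    captions.forEach((text, index) => {
      const width = measureWidth(text);
      if (width > safeWidth) {
        overflows.push({ locale, index, text, width, safeWidth });
      }
    });
  }
  return overflows;
}

// Stand-in measurer for demonstration only: 40 px per non-Latin character,
// 20 px per Latin one. A real pipeline would measure with the actual font.
const fakeMeasure = (text) =>
  [...text].reduce((w, ch) => w + (ch.codePointAt(0) > 0xff ? 40 : 20), 0);

const captions = {
  ja: ['写真を整理'],
  de: ['Fotos automatisch sortieren lassen'],
};
const safeWidth = safeWidthFor(captions.ja, fakeMeasure);
for (const o of findOverflows(captions, fakeMeasure, safeWidth)) {
  console.log(`${o.locale}[${o.index}] is ${o.width}px wide, limit ${o.safeWidth}px: "${o.text}"`);
}
```

In CI, a non-empty result would fail the job, and the offending lines go back to translation for shortening.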
Catch Silent Degradation with Perceptual Diffs Against Golden Images
The third layer catches assets whose dimensions and file sizes are correct but whose contents are broken. The incident that sold me on this: a CI runner image update changed CJK font resolution, and captions silently rendered in a fallback font. Size validation passed, of course. A human would notice in a second; the machine said nothing.
Keep the approved assets in the repository as goldens, and diff every generation against them.
The 2 percent limit is an empirical line. Recompression and anti-aliasing jitter stay around 0.5 percent; font substitutions and layout breakage show up above 5 percent. The threshold sits between the two. When you change the design on purpose, you let the diff fail, review it, and update the golden — and that golden update, going through code review, becomes your change history for free.
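To show how the threshold is applied, here is a simplified stand-in for a perceptual diff: it counts the pixels whose brightness differs noticeably between two decoded RGBA buffers. It is not a full perceptual metric and will miss changes that alter hue but not brightness, so a dedicated image-diff library is the better choice in practice. The function names and the LUMA_TOLERANCE value are assumptions, and PNG decoding is left out.

```javascript
const MAX_DIFF_RATIO = 0.02; // the 2 percent limit discussed above
const LUMA_TOLERANCE = 16; // assumed: ignore brightness deltas of 16/255 or less

// Approximate perceived brightness with Rec. 601 luma weights.
function luma(data, offset) {
  return 0.299 * data[offset] + 0.587 * data[offset + 1] + 0.114 * data[offset + 2];
}

// Fraction of pixels that differ noticeably between two RGBA buffers.
function diffRatio(golden, candidate, width, height) {
  const expectedLength = width * height * 4;
  if (golden.length !== expectedLength || candidate.length !== expectedLength) {
    throw new Error('Both buffers must be RGBA data matching width x height');
  }
  let changed = 0;
  for (let offset = 0; offset < expectedLength; offset += 4) {
    const delta = Math.abs(luma(golden, offset) - luma(candidate, offset));
    if (delta > LUMA_TOLERANCE) changed += 1;
  }
  return changed / (width * height);
}

// Compare a freshly generated asset with its approved golden.
function checkAgainstGolden(golden, candidate, width, height) {
  const ratio = diffRatio(golden, candidate, width, height);
  return { ratio, ok: ratio <= MAX_DIFF_RATIO };
}
```

The dimension check throws rather than returning a ratio, because a size mismatch is a different failure from a content change and should be reported as such.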
Bind Each Asset Set to a Build Number
The last piece shortens the investigation when a rejection does happen. Emit a manifest with every generation, tying together the commit, the contract version, and the build number that produced the set.
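A manifest builder along these lines would be enough. This is a sketch under assumptions: the field names other than specVersion are illustrative, and the per-file SHA-256 hashes are an addition that lets you confirm later which files were actually in the set.

```javascript
const crypto = require('crypto');

// Build a manifest tying an asset set to the commit, build, and contract
// version that produced it. `files` maps a relative file name to its
// contents as a Buffer.
function buildManifest({ commit, buildNumber, specVersion, specVerifiedAt, files }) {
  const assets = Object.keys(files)
    .sort() // stable order keeps manifest diffs readable
    .map((name) => ({
      name,
      sha256: crypto.createHash('sha256').update(files[name]).digest('hex'),
    }));
  return {
    generatedAt: new Date().toISOString(),
    commit,
    buildNumber,
    specVersion,
    specVerifiedAt,
    assets,
  };
}
```

On GitHub Actions the commit can be read from the GITHUB_SHA environment variable. GITHUB_RUN_NUMBER could stand in for the build number if your app build number is not available at that point. The result would typically be written next to the assets as manifest.json.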
Rejection emails rarely say more than which screenshots offended, so root-causing depends entirely on your own records. Open the manifest.json of the submitted set: an old specVersion points to contract staleness, while a recent commit points to a layout change. The triage becomes a straight line. Investigations that used to cost me the better part of an hour of scrolling through generation logs now take minutes.
Wiring this into GitHub Actions is unglamorous: freshness check → generation → caption measurement → dimension validation → perceptual diff → manifest, in series. The only deliberate choice is putting the freshness check before generation — if the rules are rotten, everything downstream is theater.
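One way to express that series is a single Node entry point that the workflow calls, sketched below. Separate workflow steps would work equally well. Every script path except check-spec-freshness.js is hypothetical, as are the runPipeline and runNodeScript names.

```javascript
const { spawnSync } = require('child_process');

// Order matters: the freshness check comes first, so stale rules stop the
// run before any asset is generated or validated against them.
const STEPS = [
  ['freshness check', 'scripts/check-spec-freshness.js'],
  ['generation', 'scripts/generate-assets.js'], // hypothetical path
  ['caption measurement', 'scripts/measure-captions.js'], // hypothetical path
  ['dimension validation', 'scripts/validate-dimensions.js'], // hypothetical path
  ['perceptual diff', 'scripts/diff-goldens.js'], // hypothetical path
  ['manifest', 'scripts/write-manifest.js'], // hypothetical path
];

// Run one script with the current Node binary; true means exit code 0.
function runNodeScript(script) {
  const result = spawnSync(process.execPath, [script], { stdio: 'inherit' });
  return result.status === 0;
}

// Run the steps in series and stop at the first failure.
function runPipeline(steps, runStep = runNodeScript) {
  for (const [name, script] of steps) {
    if (!runStep(script)) {
      return { ok: false, failedStep: name };
    }
  }
  return { ok: true, failedStep: null };
}
```

The entry point would call runPipeline(STEPS), print failedStep, and exit non-zero when ok is false. Injecting runStep keeps the ordering logic testable without running the real scripts.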
What to Do Next
You don't need all of this at once. The highest-leverage piece is the contract file plus the staleness gate: if you already have a validation script, evict its constants into JSON and add check-spec-freshness.js at the front. Start there, confirm a rejection-free release, then layer in caption measurement and perceptual diffs.
If you've ever lost an evening to a deadline-night rejection, I hope these notes save you the next one.
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.