Design Check — Phase 0 Spike Findings
Status: Partial — architecture validated and shipped, calibration data pending
Spike duration: 2026-04-29 → 2026-05-01 (engine + route shipped)
Live since: 2026-05-04 (Guardify Brand Bot Design Check surface)
Author: Eric Downs (Technical Director, Grain & Mortar)
Sibling docs: PRD.md §8 · PRODUCT-EXTENSIONS.md Design Check
What this doc is
The Phase 0 spike asked: can we build a credible image-input brand-compliance check on top of the same MCP-as-product architecture that powers Brand Check, with predictable cost shape and reasonable accuracy? This doc captures the spike answer (yes for the architecture; accuracy still awaits calibration) and the open calibration questions that need real fixtures to close (pending Nicholas's Guardify asset bundle + a small G&M ad set).
Treat this as a living doc until Phase 1 rollout is decided. When
calibration data arrives, the "Pending calibration" sections fill in
and the doc moves to archived/ as the locked-in spike record.
TL;DR
- Architecture works. The four-component pipeline (colors / OCR / vision / brand-check reuse) composes cleanly via Promise.allSettled in lib/design-check/engine.ts. One failing component does not sink the whole run; partial results surface in warnings. Live in prod for Guardify since 2026-05-04, with no architectural changes since.
- Cost shape is workable but variable. Observed range: ~$0.013 on text-light images to ~$0.027 on text-dense reference assets. The ~2x spread comes from Anthropic token usage, mostly on the OCR call (~$0.005 → ~$0.014) with a smaller rise on the vision call (~$0.008 → ~$0.013). Color complexity adds no cost, since color extraction runs locally.
- Engine reuse paid off. OCR'd text feeds straight into runBrandCheck, so the same naming / banned-terms / contractions / numbers / jargon rules that power the Slack message shortcut apply unchanged to image-extracted copy. One rule engine, two input modes: the design discipline from PRODUCT-EXTENSIONS landed.
- Phase 1 rollout decision is blocked on Guardify-specific threshold tuning. Phase 0 proves the pipeline; Phase 1 needs per-tenant calibration on real ad fixtures before pricing or surface extension can be locked.
Architecture
DesignCheckRoute (/api/design-check/[tenantId])
↓ multipart upload
checkDesign(buffer, content, meta) ← lib/design-check/engine.ts
↓ Promise.allSettled: colors / OCR / vision run in parallel; runBrandCheck waits on OCR text
├── extractColors(buffer) ← lib/design-check/colors.ts
│ sharp dominant-color extraction + ΔE vs tenant palette
│ local compute, NO Anthropic cost
│
├── extractText(buffer) ← lib/design-check/ocr.ts
│ Anthropic vision OCR, callType=design_check_ocr
│ returns { text, confidence }
│
├── assessVision(buffer, context) ← lib/design-check/vision.ts
│ Anthropic vision call with brand context in system prompt
│ callType=design_check_vision
│ returns 5-dimension scores (palette / typography / tone / messaging / overall)
│
└── runBrandCheck(ocrText, ...) ← lib/brand-check/engine.ts (REUSE)
deterministic rule pass over OCR'd text
no LLM cost
↓
DesignCheckResult { colorScore, ocrText, brandCheckResult,
visionAssessment, overallScore, cost, warnings }
Two callType labels (design_check_ocr and design_check_vision)
keep the cost split visible in /admin/usage per surface. colorScore
and brandCheckResult are zero-cost — both run locally over the image
buffer or OCR'd text.
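The composition above can be sketched as follows. This is a hypothetical, simplified version: the component signatures (a bare color score, a list of violation strings) are assumptions, not the real exports of lib/design-check/*.ts or lib/brand-check/engine.ts. It shows the two properties the doc relies on: a rejected component becomes a warning instead of failing the run, and runBrandCheck is chained onto the OCR promise because it needs the OCR text as input.

```typescript
// Hypothetical sketch: signatures are simplified assumptions, not the real
// exports of lib/design-check/*.ts or lib/brand-check/engine.ts.
interface OcrResult { text: string; confidence: number }
interface VisionScores { palette: number; typography: number; tone: number; messaging: number; overall: number }

interface Components {
  extractColors(buf: Uint8Array): Promise<number>; // color score
  extractText(buf: Uint8Array): Promise<OcrResult>;
  assessVision(buf: Uint8Array): Promise<VisionScores>;
  runBrandCheck(text: string): Promise<string[]>; // rule violations
}

interface SketchResult {
  colorScore?: number;
  ocrText?: string;
  visionAssessment?: VisionScores;
  brandCheckViolations?: string[];
  warnings: string[];
}

// Unwraps a settled result, recording a warning on rejection instead of throwing.
function take<T>(r: PromiseSettledResult<T>, label: string, warnings: string[]): T | undefined {
  if (r.status === "fulfilled") return r.value;
  warnings.push(`${label} failed: ${String(r.reason)}`);
  return undefined;
}

async function checkDesignSketch(buf: Uint8Array, c: Components): Promise<SketchResult> {
  const ocr = c.extractText(buf);
  // runBrandCheck needs OCR output, so it is chained onto the OCR promise;
  // if OCR rejects, this rejects too and both surface as warnings.
  const brand = ocr.then((o) => c.runBrandCheck(o.text));

  const [colors, ocrRes, vision, brandRes] = await Promise.allSettled([
    c.extractColors(buf),
    ocr,
    c.assessVision(buf),
    brand,
  ]);

  const warnings: string[] = [];
  return {
    colorScore: take(colors, "colors", warnings),
    ocrText: take(ocrRes, "ocr", warnings)?.text,
    visionAssessment: take(vision, "vision", warnings),
    brandCheckViolations: take(brandRes, "brand-check", warnings),
    warnings,
  };
}
```

With this shape, a vision timeout still returns color, OCR, and brand-check results plus one "vision failed" warning, which matches the partial-results behavior described in the TL;DR.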
Cost shape (measured in production)
| Image type | OCR cost (USD) | Vision cost (USD) | Total per image (USD) | Source |
|---|---|---|---|---|
| Text-light brand asset (placeholder) | ~$0.005 | ~$0.008 | ~$0.013 | Original landing-page claim |
| Text-dense reference (Guardify colors page screenshot, 889 KB PNG) | ~$0.014 | ~$0.013 | ~$0.0271 | QA pass 2026-05-05 |
Implication: the original "$0.014 per image" landing-page copy underestimated cost on text-dense uploads by ~2x. Customer-facing copy on Guardify Brand Bot updated 2026-05-05 (PR guardify-brand-bot#1) to show "$0.01–$0.03 per image, depending on how much text the asset contains."
For Phase 1 pricing, this means the per-image cost should be modeled against the upper bound on each tenant's typical asset mix, not the optimistic baseline.
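To make "model against the upper bound" concrete, here is a minimal sketch using only the two per-image totals from the cost table. The helper names and the single text-dense-share parameter are hypothetical, and two data points are not a fitted cost model; real tenant mixes would need per-fixture measurements.

```typescript
// Per-image totals from the cost table (USD). Two data points only, so treat
// these as rough anchors rather than a fitted model.
const COST_TEXT_LIGHT = 0.013;
const COST_TEXT_DENSE = 0.0271;

// Hypothetical helper: blended per-image cost for an assumed asset mix.
// textDenseShare is the fraction (0..1) of uploads that are text-dense; clamped.
function blendedCostPerImage(textDenseShare: number): number {
  const s = Math.min(1, Math.max(0, textDenseShare));
  return s * COST_TEXT_DENSE + (1 - s) * COST_TEXT_LIGHT;
}

// Hypothetical helper: upper-bound monthly cost, pricing every upload as text-dense.
function upperBoundMonthlyCost(imagesPerMonth: number): number {
  return imagesPerMonth * COST_TEXT_DENSE;
}
```

For example, 1,000 uploads a month comes to about $27 at the upper bound versus about $13 at the text-light baseline, which is the gap a flat-bundle price would need to absorb.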
What's working today
- Tenant onboarding is fast. Issuing a design_check API key via /admin/api-keys and dropping it into the tenant's frontend Vercel env is the entire onboarding path. Guardify's Design Check surface went live 30 minutes after the key was issued.
- Subscription gate inherited cleanly. /api/design-check returns 402 (Payment Required) if the tenant is paused or canceled, mirroring the MCP route. No new billing logic needed.
- Per-API-key rate limit (PR #37) protects the cap. A token bucket on the design-check route prevents a misconfigured client from draining the monthly cap in a runaway loop.
- IP forensics on every check (PR #40) means abuse investigation is one query away.
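The two route gates above can be sketched together. This is a hypothetical illustration, not the code from PR #37 or the route handler: the status names, bucket capacity and refill rate, the 429 code for rate-limited requests, and the choice to check the subscription before spending a token are all assumptions. Only the 402-on-paused/canceled behavior comes from this doc.

```typescript
// Hypothetical sketch: names, parameters, and the 429 code are assumptions;
// only 402 for paused/canceled tenants is documented.
type SubscriptionStatus = "active" | "paused" | "canceled";

interface Bucket { tokens: number; lastRefillMs: number }

class TokenBucketLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity; // maximum burst per API key
    this.refillPerSecond = refillPerSecond; // sustained requests per second
  }

  // Refills the key's bucket for elapsed time, then spends one token if available.
  tryConsume(apiKey: string, nowMs: number = Date.now()): boolean {
    const b = this.buckets.get(apiKey) ?? { tokens: this.capacity, lastRefillMs: nowMs };
    const elapsedSec = Math.max(0, nowMs - b.lastRefillMs) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSecond);
    b.lastRefillMs = nowMs;
    this.buckets.set(apiKey, b);
    if (b.tokens < 1) return false;
    b.tokens -= 1;
    return true;
  }
}

// Subscription gate runs first, so paused/canceled tenants never spend tokens.
function gateRequest(
  status: SubscriptionStatus,
  apiKey: string,
  limiter: TokenBucketLimiter,
  nowMs?: number,
): 200 | 402 | 429 {
  if (status === "paused" || status === "canceled") return 402;
  return limiter.tryConsume(apiKey, nowMs) ? 200 : 429;
}
```

The in-memory Map keeps the sketch self-contained, but serverless instances on Vercel do not share memory, so a production limiter would need its buckets in a shared store.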
Pending calibration
The pieces below need real ad fixtures to close. Until they do, Phase 1 rollout decisions (Slack image-message shortcut, pricing model, widening to CF tenant) should hold.
G&M tenant — needs 5–10 ad PNGs
- 2-3 clearly on-brand assets (recent client work, hero images, campaign tiles)
- 2-3 borderline assets (intentionally edge-of-brand — wrong-weight type, off-palette accent, etc.)
- 2-3 clearly off-brand assets (competitor work, generic stock, obvious palette mismatch)
For each fixture: capture color score, OCR text + confidence, vision score across all 5 dimensions, brand-check rule results, total cost. Land the numbers in a comparison table here. Calibration loop: adjust ΔE thresholds, vision system prompt, brand-check rule strictness based on observed false positive / false negative rates.
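A hypothetical sketch of the calibration loop's bookkeeping, assuming each fixture run is reduced to one overall score compared against a single pass threshold. It uses the convention that "positive" means "flagged as off-brand": a false positive is an on-brand fixture that fails, a false negative is an off-brand fixture that passes. Borderline fixtures have no ground truth, so they get their own pass rate. Field names and the threshold are assumptions.

```typescript
// Hypothetical sketch: labels mirror the on-brand / borderline / off-brand
// buckets above; field names and the pass threshold are assumptions.
type FixtureLabel = "on-brand" | "borderline" | "off-brand";

interface FixtureRun {
  name: string;
  label: FixtureLabel;
  overallScore: number; // 0..1 in this sketch
  costUsd: number;
}

interface CalibrationSummary {
  falsePositiveRate: number; // on-brand fixtures flagged (failed)
  falseNegativeRate: number; // off-brand fixtures that passed
  borderlinePassRate: number; // no ground truth, reported separately
  totalCostUsd: number;
}

function summarize(runs: FixtureRun[], passThreshold: number): CalibrationSummary {
  const passes = (r: FixtureRun) => r.overallScore >= passThreshold;
  // Share of fixtures with `label` matching `pred`; NaN when the group is empty.
  const rate = (label: FixtureLabel, pred: (r: FixtureRun) => boolean): number => {
    const group = runs.filter((r) => r.label === label);
    return group.length === 0 ? NaN : group.filter(pred).length / group.length;
  };
  return {
    falsePositiveRate: rate("on-brand", (r) => !passes(r)),
    falseNegativeRate: rate("off-brand", passes),
    borderlinePassRate: rate("borderline", passes),
    totalCostUsd: runs.reduce((sum, r) => sum + r.costUsd, 0),
  };
}
```

Calling summarize at a few candidate thresholds over the same fixture runs gives the false positive / false negative trade-off that the threshold and prompt adjustments are meant to move.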
Guardify tenant — needs 3–5 deliberately-off-brand variants from Nicholas
Per the 2026-05-04 strategy lock with Eric: Nicholas Petersen / less.is authors deliberately off-brand variants per category (wrong-color logo, stroke-around-logo, pill button vs 4px, non-Guardify gradient, wrong typeface, off-brand photography, improper hexagon overlay), each labeled with the rule(s) it breaks. These become the canonical match-rate fixture set the QA framework runs against, producing the defensible "matches with N% consistency" claim.
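One candidate definition of the match-rate number, sketched under assumptions: each labeled variant is run through Design Check several times (the vision call is an LLM, so repeated runs can disagree), and a run "matches" when every rule the variant was labeled with appears among the flagged rules. The rule IDs and interface are hypothetical, not the QA framework's schema.

```typescript
// Hypothetical sketch: one possible definition of "matches with N% consistency".
// Rule IDs are illustrative slugs, not real rule identifiers.
interface LabeledVariant {
  id: string;
  brokenRules: string[]; // labels authored with the variant
  runs: string[][]; // flagged rule IDs from each repeated Design Check run
}

// Share of (variant, run) pairs where all labeled rules were flagged.
// Extra flags are not penalized; a stricter version would require set equality.
function matchRate(variants: LabeledVariant[]): number {
  let total = 0;
  let matched = 0;
  for (const v of variants) {
    for (const flagged of v.runs) {
      total++;
      const flaggedSet = new Set(flagged);
      if (v.brokenRules.every((rule) => flaggedSet.has(rule))) matched++;
    }
  }
  return total === 0 ? NaN : matched / total;
}
```

Whether over-flagging should count against the claim is itself a calibration decision; this version only measures recall of the labeled rules.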
Asset thread blocked on Nicholas as of 2026-05-05 (Todoist
6gX55HvgWX7vrJC7).
Open threshold questions
- Color ΔE pass/fail boundary. Current implementation surfaces per-color ΔE without a hard pass/fail. Does Phase 1 collapse this to a binary, or keep the per-color scores and let the consumer decide?
- Vision-score weighting. All 5 dimensions currently average equally into overallScore. Should palette fit weigh more (it's the most testable)? Should typography weigh less (lowest-accuracy detection)?
- Brand-check rule strictness on OCR'd text. OCR text is noisier than user-pasted copy. Run a few G&M ad PNGs and compare the brand-check results on OCR text vs the same copy pasted directly: does the rule engine need OCR-aware tolerance?
- Warnings vs errors. When OCR returns low confidence, do downstream checks run anyway with a warning, or skip with an explicit "OCR too noisy to lint" message? Today they run.
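Two of these questions, the ΔE boundary and vision-score weighting, are easier to discuss against a concrete sketch. It uses CIE76 ΔE (Euclidean distance in CIELAB after a standard sRGB → XYZ (D65) → Lab conversion) for simplicity; lib/design-check/colors.ts may use a different formula such as CIEDE2000. The optional threshold shows how a binary pass/fail can sit on top of the raw per-color ΔE without discarding it. Function names and the weights are hypothetical placeholders, not tuned values.

```typescript
// Hypothetical sketch: CIE76 is assumed; colors.ts may use another ΔE formula.
type Lab = [number, number, number];

function hexToLab(hex: string): Lab {
  const rgb = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16) / 255);
  // sRGB companding -> linear RGB
  const [r, g, b] = rgb.map((c) => (c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4));
  // Linear RGB -> XYZ (D65), normalized by the D65 reference white
  const x = (0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047;
  const y = 0.2126729 * r + 0.7151522 * g + 0.072175 * b;
  const z = (0.0193339 * r + 0.119192 * g + 0.9503041 * b) / 1.08883;
  const f = (t: number) => (t > (6 / 29) ** 3 ? Math.cbrt(t) : t / (3 * (6 / 29) ** 2) + 4 / 29);
  return [116 * f(y) - 16, 500 * (f(x) - f(y)), 200 * (f(y) - f(z))];
}

function deltaE76(hexA: string, hexB: string): number {
  const [l1, a1, b1] = hexToLab(hexA);
  const [l2, a2, b2] = hexToLab(hexB);
  return Math.hypot(l1 - l2, a1 - a2, b1 - b2);
}

// Keeps the raw ΔE to the nearest palette color; `pass` is set only when a threshold is given.
function nearestPaletteColor(hex: string, palette: string[], threshold?: number) {
  let best = { paletteHex: palette[0], deltaE: Infinity };
  for (const p of palette) {
    const d = deltaE76(hex, p);
    if (d < best.deltaE) best = { paletteHex: p, deltaE: d };
  }
  return { ...best, pass: threshold === undefined ? undefined : best.deltaE <= threshold };
}

// Weighted mean over the dimensions present in `scores`; equal weights give a plain average.
function weightedOverall(scores: Record<string, number>, weights: Record<string, number>): number {
  let num = 0;
  let den = 0;
  for (const [dim, w] of Object.entries(weights)) {
    const s = scores[dim];
    if (s === undefined) continue; // skip dimensions the vision call did not return
    num += w * s;
    den += w;
  }
  return den === 0 ? NaN : num / den;
}
```

Keeping `deltaE` alongside the optional `pass` flag is one way to answer the first question with "both": consumers that want a binary get one, and calibration still sees the continuous distance.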
Phase 1 rollout decision (gated on calibration above)
Three open scope questions, locked once calibration data lands:
- Surface widening. Does Design Check ship as a Slack message shortcut on a message with an image, alongside (or instead of) the portal upload page? If so, what does the Slack rendering look like? The PR for that needs a docs/SLACK-UX.md design pass.
- Multi-tenant rollout. Once Guardify-calibrated, is it a one-line opt-in for CF and G&M (issue key + drop in env), or are per-tenant rule files needed?
- Pricing. Bundle into the brand-bot retainer (current default), or per-image upcharge above some threshold? Cost variability (~2x) makes flat-bundle the safer near-term posture.
Cards
- Discovery (parent): Todoist 6gWJ32QRVqjRh727
- Phase 0 spike (G&M): Todoist 6gWJ4qJ25j28vF6f (engine + route shipped; parent card can move to Complete)
- Closeout (fixtures + benchmark + this doc): Todoist 6gWMvCCPwCxr6qFf (partial doc shipped 2026-05-05; calibration sections await fixtures)
- Guardify calibration set: Todoist 6gX55HvgWX7vrJC7 (blocked on Nicholas's asset bundle)
- Phase 1 rollout decision: Todoist 6gX55Jjfx5GC5Ww7 (gated on calibration above)
Related
- Engine code: lib/design-check/{colors,ocr,vision,engine}.ts
- Route handler: app/api/design-check/[tenantId]/route.ts
- Live consumer: guardify-brand-bot.vercel.app/design-check
- QA log of the cost-shape measurement: ~/Projects/brand-guide-mcp/docs/qa/chat-surface/runs/2026-05-05-pre-tuning.md (search "$0.0271")