Design Check — Phase 0 Spike Findings

Status: Partial — architecture validated and shipped, calibration data pending
Spike duration: 2026-04-29 → 2026-05-01 (engine + route shipped)
Live since: 2026-05-04 (Guardify Brand Bot Design Check surface)
Author: Eric Downs (Technical Director, Grain & Mortar)
Sibling docs: PRD.md §8 · PRODUCT-EXTENSIONS.md Design Check


What this doc is

The Phase 0 spike asked: can we build a credible image-input brand-compliance check on top of the same MCP-as-product architecture that powers Brand Check, with predictable cost shape and reasonable accuracy? This doc captures the spike answer (yes) and the open calibration questions that need real fixtures to close (pending Nicholas's Guardify asset bundle + a small G&M ad set).

Treat this as a living doc until Phase 1 rollout is decided. When calibration data arrives, the "Pending calibration" sections fill in and the doc moves to archived/ as the locked-in spike record.

TL;DR

Architecture

DesignCheckRoute (/api/design-check/[tenantId])
  ↓ multipart upload
checkDesign(buffer, content, meta)  ← lib/design-check/engine.ts
  ↓ Promise.allSettled — all four run in parallel
  ├── extractColors(buffer)         ← lib/design-check/colors.ts
  │   sharp dominant-color extraction + ΔE vs tenant palette
  │   local compute, NO Anthropic cost
  │
  ├── extractText(buffer)           ← lib/design-check/ocr.ts
  │   Anthropic vision OCR, callType=design_check_ocr
  │   returns { text, confidence }
  │
  ├── assessVision(buffer, context) ← lib/design-check/vision.ts
  │   Anthropic vision call with brand context in system prompt
  │   callType=design_check_vision
  │   returns 5-dimension scores (palette / typography / tone / messaging / overall)
  │
  └── runBrandCheck(ocrText, ...)   ← lib/brand-check/engine.ts (REUSE)
       deterministic rule pass over OCR'd text
       no LLM cost
  ↓
DesignCheckResult { colorScore, ocrText, brandCheckResult,
                    visionAssessment, overallScore, cost, warnings }

Two callType labels (design_check_ocr and design_check_vision) keep the cost split visible in /admin/usage per surface. colorScore and brandCheckResult are zero-cost — both run locally over the image buffer or OCR'd text.
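A minimal sketch of how that orchestration could look in engine.ts, assuming hypothetical helper signatures and placeholder types. One detail is an assumption on my part: because runBrandCheck consumes the OCR'd text, the sketch chains that branch off a shared OCR promise so all four checks still settle together under Promise.allSettled; the shipped implementation may sequence this differently.

```ts
import { extractColors } from "./colors";               // local sharp compute, no LLM cost
import { extractText } from "./ocr";                     // callType=design_check_ocr
import { assessVision } from "./vision";                 // callType=design_check_vision
import { runBrandCheck } from "../brand-check/engine";   // deterministic rule pass, no LLM cost

// Placeholder types for illustration only.
type BrandContext = Record<string, unknown>;
type UploadMeta = { tenantId: string; filename: string };

export async function checkDesign(buffer: Buffer, context: BrandContext, meta: UploadMeta) {
  const warnings: string[] = [];

  // Share the OCR promise so the brand-check branch can chain off it while
  // still settling alongside the other three checks.
  const ocrPromise = extractText(buffer);

  const [colors, ocr, vision, brandCheck] = await Promise.allSettled([
    extractColors(buffer),
    ocrPromise,
    assessVision(buffer, context),
    ocrPromise.then((r) => runBrandCheck(r.text, context)),
  ]);

  // A rejected branch becomes a warning instead of failing the whole check.
  const unwrap = <T>(result: PromiseSettledResult<T>, label: string): T | null => {
    if (result.status === "fulfilled") return result.value;
    warnings.push(`${label} failed: ${String(result.reason)}`);
    return null;
  };

  return {
    colorScore: unwrap(colors, "color extraction"),
    ocrText: unwrap(ocr, "OCR")?.text ?? null,
    brandCheckResult: unwrap(brandCheck, "brand check"),
    visionAssessment: unwrap(vision, "vision assessment"),
    // overallScore and cost aggregation omitted; weighting is still an open
    // threshold question below.
    warnings,
  };
}
```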

Cost shape (measured in production)

| Image type | OCR cost | Vision cost | Total | Source |
| --- | --- | --- | --- | --- |
| Text-light brand asset (placeholder) | ~$0.005 | ~$0.008 | ~$0.013 | Original landing-page claim |
| Text-dense reference (Guardify colors page screenshot, 889 KB PNG) | ~$0.014 | ~$0.013 | ~$0.0271 | QA pass 2026-05-05 |

Implication: the original "$0.014 per image" landing-page copy underestimated cost on text-dense uploads by ~2x. Customer-facing copy on Guardify Brand Bot updated 2026-05-05 (PR guardify-brand-bot#1) to show "$0.01–$0.03 per image, depending on how much text the asset contains."

For Phase 1 pricing, this means the per-image cost should be modeled against the upper bound on each tenant's typical asset mix, not the optimistic baseline.
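As a rough illustration of that modeling, using the measured per-image bounds from the table above; the 70/30 text-dense vs text-light asset mix below is a made-up example, not a real tenant profile.

```ts
// Per-image costs from the table above; the asset-mix split is illustrative only.
const PER_IMAGE_COST_USD = { textLight: 0.013, textDense: 0.027 };

function modelMonthlyCost(imagesPerMonth: number, textDenseShare = 0.7) {
  const blended =
    textDenseShare * PER_IMAGE_COST_USD.textDense +
    (1 - textDenseShare) * PER_IMAGE_COST_USD.textLight;
  return {
    expected: imagesPerMonth * blended,                         // blended average
    upperBound: imagesPerMonth * PER_IMAGE_COST_USD.textDense,  // what pricing should budget against
  };
}

// e.g. 500 images/month → expected ≈ $11.40, upper bound ≈ $13.50
```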

What's working today

Pending calibration

The pieces below need real ad fixtures to close. Until they do, the Phase 1 rollout decisions (Slack image-message shortcut, pricing model, widening to the CF tenant) stay on hold.

G&M tenant — needs 5–10 ad PNGs

For each fixture, capture the color score, OCR text + confidence, vision scores across all 5 dimensions, brand-check rule results, and total cost, then land the numbers in a comparison table here. Calibration loop: adjust ΔE thresholds, the vision system prompt, and brand-check rule strictness based on observed false-positive / false-negative rates.
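One possible shape for those per-fixture rows, mirroring the capture list above; the field names are assumptions, not an existing type in the repo.

```ts
// Hypothetical record for one calibration fixture.
interface CalibrationFixtureResult {
  fixture: string;                           // e.g. "gm-ad-03.png" (made-up name)
  colorScore: number;                        // ΔE-based palette score
  ocr: { text: string; confidence: number };
  vision: {                                  // 5-dimension vision assessment
    palette: number;
    typography: number;
    tone: number;
    messaging: number;
    overall: number;
  };
  brandCheckViolations: string[];            // rule IDs flagged on the OCR'd text
  totalCostUsd: number;
  expectedOutcome: "pass" | "fail";          // ground-truth label driving the calibration loop
}
```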

Guardify tenant — needs 3–5 deliberately-off-brand variants from Nicholas

Per the 2026-05-04 strategy lock with Eric: Nicholas Petersen / less.is authors deliberately-off-brand variants per category (wrong-color logo, stroke-around-logo, pill button vs 4px, non-Guardify gradient, wrong typeface, off-brand photography, improper hexagon overlay). Each labeled with the rule(s) it breaks. Becomes the canonical match-rate fixture set the QA framework runs against, producing the defensible "matches with N% consistency" claim.
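A minimal sketch of how the match-rate number behind that claim could be computed over the labeled fixture set; the field names and the definition of a match (every deliberately broken rule detected) are assumptions, not the QA framework's actual logic.

```ts
// Hypothetical match-rate computation over labeled off-brand fixtures.
interface LabeledFixture {
  name: string;
  brokenRules: string[];    // rule IDs the variant was authored to break
  detectedRules: string[];  // rule IDs Design Check actually flagged
}

// A fixture counts as matched when every deliberately broken rule was detected.
function matchRatePercent(fixtures: LabeledFixture[]): number {
  const matched = fixtures.filter((f) =>
    f.brokenRules.every((rule) => f.detectedRules.includes(rule))
  ).length;
  return fixtures.length === 0 ? 0 : (matched / fixtures.length) * 100;
}
// → the "matches with N% consistency" figure the QA framework would report
```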

Asset thread blocked on Nicholas as of 2026-05-05 (Todoist 6gX55HvgWX7vrJC7).

Open threshold questions

  1. Color ΔE pass/fail boundary. Current implementation surfaces per-color ΔE without a hard pass/fail. Does Phase 1 collapse this to a binary, or keep the per-color scores and let the consumer decide?
  2. Vision-score weighting. All 5 dimensions currently average equally into overallScore. Should palette-fit weight more (it's the most testable)? Should typography weight less (lowest-accuracy detection)? A weighting sketch follows this list.
  3. Brand-check rule strictness on OCR'd text. OCR text is noisier than user-pasted copy. Run a few G&M ad PNGs and compare the brand-check results on OCR text vs the same copy pasted directly — does the rule engine need OCR-aware tolerance?
  4. Warnings vs errors. When OCR returns low confidence, do downstream checks run anyway with a warning, or skip with an explicit "OCR too noisy to lint" message? Today they run.
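For question 2, a small sketch of what a weighted overallScore could look like; the equal weights reproduce today's behavior, and the commented alternative is only an example of the re-weighting being discussed, not a recommendation.

```ts
// Hypothetical weighting for the 5-dimension vision scores.
type VisionScores = {
  palette: number;
  typography: number;
  tone: number;
  messaging: number;
  overall: number;
};

// Equal weights reproduce today's behavior.
const EQUAL_WEIGHTS: Record<keyof VisionScores, number> = {
  palette: 1, typography: 1, tone: 1, messaging: 1, overall: 1,
};
// Example re-weighting per question 2: palette up (most testable),
// typography down (lowest-accuracy detection).
// const TUNED_WEIGHTS = { ...EQUAL_WEIGHTS, palette: 2, typography: 0.5 };

function weightedOverall(scores: VisionScores, weights = EQUAL_WEIGHTS): number {
  const dims = Object.keys(weights) as (keyof VisionScores)[];
  const totalWeight = dims.reduce((sum, d) => sum + weights[d], 0);
  return dims.reduce((sum, d) => sum + scores[d] * weights[d], 0) / totalWeight;
}
```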

Phase 1 rollout decision (gated on calibration above)

Three open scope questions, locked once calibration data lands:

  1. Surface widening. Does Design Check ship as a Slack message-shortcut on a message-with-image, alongside (or instead of) the portal upload page? If yes — what does the Slack rendering look like? The PR for that needs a docs/SLACK-UX.md design pass.
  2. Multi-tenant rollout. Once Guardify-calibrated, is it a one-line opt-in for CF and G&M (issue key + drop in env), or are per-tenant rule files needed?
  3. Pricing. Bundle into the brand-bot retainer (current default), or per-image upcharge above some threshold? Cost variability (~2x) makes flat-bundle the safer near-term posture.

Cards