Design Check — Phase 0 Spike Findings

Status: Partial — architecture validated and shipped, calibration data pending
Spike duration: 2026-04-29 → 2026-05-01 (engine + route shipped)
Live since: 2026-05-04 (Guardify Brand Bot Design Check surface)
Author: Eric Downs (Technical Director, Grain & Mortar)
Sibling docs: PRD.md §8 · PRODUCT-EXTENSIONS.md Design Check


What this doc is

The Phase 0 spike asked: can we build a credible image-input brand-compliance check on top of the same MCP-as-product architecture that powers Brand Check, with predictable cost shape and reasonable accuracy? This doc captures the spike answer (yes) and the open calibration questions that need real fixtures to close (pending Nicholas's Guardify asset bundle + a small G&M ad set).

Treat this as a living doc until Phase 1 rollout is decided. When calibration data arrives, the "Pending calibration" sections fill in and the doc moves to archived/ as the locked-in spike record.

TL;DR

Architecture

DesignCheckRoute (/api/design-check/[tenantId])
  ↓ multipart upload
checkDesign(buffer, content, meta)  ← lib/design-check/engine.ts
  ↓ Promise.allSettled — all four run in parallel
  ├── extractColors(buffer)         ← lib/design-check/colors.ts
  │   sharp dominant-color extraction + ΔE vs tenant palette
  │   local compute, NO Anthropic cost
  │
  ├── extractText(buffer)           ← lib/design-check/ocr.ts
  │   Anthropic vision OCR, callType=design_check_ocr
  │   returns { text, confidence }
  │
  ├── assessVision(buffer, context) ← lib/design-check/vision.ts
  │   Anthropic vision call with brand context in system prompt
  │   callType=design_check_vision
  │   returns 5-dimension scores (palette / typography / tone / messaging / overall)
  │
  └── runBrandCheck(ocrText, ...)   ← lib/brand-check/engine.ts (REUSE)
       deterministic rule pass over OCR'd text
       no LLM cost
  ↓
DesignCheckResult { colorScore, ocrText, brandCheckResult,
                    visionAssessment, overallScore, cost, warnings }

Two callType labels (design_check_ocr and design_check_vision) keep the cost split visible in /admin/usage per surface. colorScore and brandCheckResult are zero-cost — both run locally over the image buffer or OCR'd text.
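A minimal sketch of how that orchestration could look in engine.ts, assuming hypothetical helper signatures and placeholder types. One detail is an assumption on my part: because runBrandCheck consumes the OCR'd text, the sketch chains that branch off a shared OCR promise so all four checks still settle together under Promise.allSettled; the shipped implementation may sequence this differently.

```ts
import { extractColors } from "./colors";               // local sharp compute, no LLM cost
import { extractText } from "./ocr";                     // callType=design_check_ocr
import { assessVision } from "./vision";                 // callType=design_check_vision
import { runBrandCheck } from "../brand-check/engine";   // deterministic rule pass, no LLM cost

// Placeholder types for illustration only.
type BrandContext = Record<string, unknown>;
type UploadMeta = { tenantId: string; filename: string };

export async function checkDesign(buffer: Buffer, context: BrandContext, meta: UploadMeta) {
  const warnings: string[] = [];

  // Share the OCR promise so the brand-check branch can chain off it while
  // still settling alongside the other three checks.
  const ocrPromise = extractText(buffer);

  const [colors, ocr, vision, brandCheck] = await Promise.allSettled([
    extractColors(buffer),
    ocrPromise,
    assessVision(buffer, context),
    ocrPromise.then((r) => runBrandCheck(r.text, context)),
  ]);

  // A rejected branch becomes a warning instead of failing the whole check.
  const unwrap = <T>(result: PromiseSettledResult<T>, label: string): T | null => {
    if (result.status === "fulfilled") return result.value;
    warnings.push(`${label} failed: ${String(result.reason)}`);
    return null;
  };

  return {
    colorScore: unwrap(colors, "color extraction"),
    ocrText: unwrap(ocr, "OCR")?.text ?? null,
    brandCheckResult: unwrap(brandCheck, "brand check"),
    visionAssessment: unwrap(vision, "vision assessment"),
    // overallScore and cost aggregation omitted; weighting is still an open
    // threshold question below.
    warnings,
  };
}
```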

Cost shape (measured in production)

| Image type | OCR cost | Vision cost | Total | Source |
| --- | --- | --- | --- | --- |
| Text-light brand asset (placeholder) | ~$0.005 | ~$0.008 | ~$0.013 | Original landing-page claim |
| Text-dense reference (Guardify colors page screenshot, 889 KB PNG) | ~$0.014 | ~$0.013 | ~$0.0271 | QA pass 2026-05-05 |

Implication: the original "$0.014 per image" landing-page copy underestimated cost on text-dense uploads by ~2x. Customer-facing copy on Guardify Brand Bot updated 2026-05-05 (PR guardify-brand-bot#1) to show "$0.01–$0.03 per image, depending on how much text the asset contains."

For Phase 1 pricing, this means the per-image cost should be modeled against the upper bound on each tenant's typical asset mix, not the optimistic baseline.
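As a rough illustration of that modeling, using the measured per-image bounds from the table above; the 70/30 text-dense vs text-light asset mix below is a made-up example, not a real tenant profile.

```ts
// Per-image costs from the table above; the asset-mix split is illustrative only.
const PER_IMAGE_COST_USD = { textLight: 0.013, textDense: 0.027 };

function modelMonthlyCost(imagesPerMonth: number, textDenseShare = 0.7) {
  const blended =
    textDenseShare * PER_IMAGE_COST_USD.textDense +
    (1 - textDenseShare) * PER_IMAGE_COST_USD.textLight;
  return {
    expected: imagesPerMonth * blended,                         // blended average
    upperBound: imagesPerMonth * PER_IMAGE_COST_USD.textDense,  // what pricing should budget against
  };
}

// e.g. 500 images/month → expected ≈ $11.40, upper bound ≈ $13.50
```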

What's working today

Pending calibration

The pieces below need real ad fixtures to close. Until they do, the Phase 1 rollout decisions (Slack image-message shortcut, pricing model, widening to the CF tenant) stay on hold.

G&M tenant — needs 5–10 ad PNGs

For each fixture, capture the color score, OCR text + confidence, vision scores across all 5 dimensions, brand-check rule results, and total cost, then land the numbers in a comparison table here. Calibration loop: adjust ΔE thresholds, the vision system prompt, and brand-check rule strictness based on observed false-positive / false-negative rates.
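One possible shape for those per-fixture rows, mirroring the capture list above; the field names are assumptions, not an existing type in the repo.

```ts
// Hypothetical record for one calibration fixture.
interface CalibrationFixtureResult {
  fixture: string;                           // e.g. "gm-ad-03.png" (made-up name)
  colorScore: number;                        // ΔE-based palette score
  ocr: { text: string; confidence: number };
  vision: {                                  // 5-dimension vision assessment
    palette: number;
    typography: number;
    tone: number;
    messaging: number;
    overall: number;
  };
  brandCheckViolations: string[];            // rule IDs flagged on the OCR'd text
  totalCostUsd: number;
  expectedOutcome: "pass" | "fail";          // ground-truth label driving the calibration loop
}
```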

Guardify tenant — needs 3–5 deliberately-off-brand variants from Nicholas

Per the 2026-05-04 strategy lock with Eric: Nicholas Petersen / less.is authors deliberately-off-brand variants per category (wrong-color logo, stroke-around-logo, pill button vs 4px, non-Guardify gradient, wrong typeface, off-brand photography, improper hexagon overlay). Each labeled with the rule(s) it breaks. Becomes the canonical match-rate fixture set the QA framework runs against, producing the defensible "matches with N% consistency" claim.
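A minimal sketch of how the match-rate number behind that claim could be computed over the labeled fixture set; the field names and the definition of a match (every deliberately broken rule detected) are assumptions, not the QA framework's actual logic.

```ts
// Hypothetical match-rate computation over labeled off-brand fixtures.
interface LabeledFixture {
  name: string;
  brokenRules: string[];    // rule IDs the variant was authored to break
  detectedRules: string[];  // rule IDs Design Check actually flagged
}

// A fixture counts as matched when every deliberately broken rule was detected.
function matchRatePercent(fixtures: LabeledFixture[]): number {
  const matched = fixtures.filter((f) =>
    f.brokenRules.every((rule) => f.detectedRules.includes(rule))
  ).length;
  return fixtures.length === 0 ? 0 : (matched / fixtures.length) * 100;
}
// → the "matches with N% consistency" figure the QA framework would report
```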

Asset thread blocked on Nicholas as of 2026-05-05 (Todoist 6gX55HvgWX7vrJC7).

Open threshold questions

  1. Color ΔE pass/fail boundary. Current implementation surfaces per-color ΔE without a hard pass/fail. Does Phase 1 collapse this to a binary, or keep the per-color scores and let the consumer decide?
  2. Vision-score weighting. All 5 dimensions currently average equally into overallScore. Should palette-fit weight more (it's the most testable)? Should typography weight less (lowest-accuracy detection)? A weighting sketch follows this list.
  3. Brand-check rule strictness on OCR'd text. OCR text is noisier than user-pasted copy. Run a few G&M ad PNGs and compare the brand-check results on OCR text vs the same copy pasted directly — does the rule engine need OCR-aware tolerance?
  4. Warnings vs errors. When OCR returns low confidence, do downstream checks run anyway with a warning, or skip with an explicit "OCR too noisy to lint" message? Today they run.
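For question 2, a small sketch of what a weighted overallScore could look like; the equal weights reproduce today's behavior, and the commented alternative is only an example of the re-weighting being discussed, not a recommendation.

```ts
// Hypothetical weighting for the 5-dimension vision scores.
type VisionScores = {
  palette: number;
  typography: number;
  tone: number;
  messaging: number;
  overall: number;
};

// Equal weights reproduce today's behavior.
const EQUAL_WEIGHTS: Record<keyof VisionScores, number> = {
  palette: 1, typography: 1, tone: 1, messaging: 1, overall: 1,
};
// Example re-weighting per question 2: palette up (most testable),
// typography down (lowest-accuracy detection).
// const TUNED_WEIGHTS = { ...EQUAL_WEIGHTS, palette: 2, typography: 0.5 };

function weightedOverall(scores: VisionScores, weights = EQUAL_WEIGHTS): number {
  const dims = Object.keys(weights) as (keyof VisionScores)[];
  const totalWeight = dims.reduce((sum, d) => sum + weights[d], 0);
  return dims.reduce((sum, d) => sum + scores[d] * weights[d], 0) / totalWeight;
}
```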

Phase 1 rollout decision (gated on calibration above)

Three open scope questions, locked once calibration data lands:

  1. Surface widening. Does Design Check ship as a Slack message-shortcut on a message-with-image, alongside (or instead of) the portal upload page? If yes — what does the Slack rendering look like? The PR for that needs a docs/SLACK-UX.md design pass.
  2. Multi-tenant rollout. Once Guardify-calibrated, is it a one-line opt-in for CF and G&M (issue key + drop in env), or are per-tenant rule files needed?
  3. Pricing. Bundle into the brand-bot retainer (current default), or per-image upcharge above some threshold? Cost variability (~2x) makes flat-bundle the safer near-term posture.

Cards