Cloudflare Browser Rendering — Crawl Reference

What This Is

We used Cloudflare's Browser Rendering /crawl API to scrape the existing Squarespace site at rootstructureomaha.com before migrating to WordPress. This captured all text content, image URLs, and page structure.

API Details

Free Plan Limits

Resource             Limit
Crawl jobs/day       5
Pages per crawl      100
Browser time         10 min/day
REST API requests    6/min
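A client-side throttle helps keep scripted calls under the 6 requests/min REST limit. Below is a minimal sliding-window sketch. The class name SlidingWindowLimiter is hypothetical. It only paces calls made from this one process and does not read any quota information from Cloudflare.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls per period seconds (hypothetical helper).

    Defaults mirror the free-plan REST limit above (6 requests/min).
    This only paces our own calls; it does not ask Cloudflare for quota.
    """

    def __init__(self, max_calls: int = 6, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of calls still inside the window

    def wait_time(self) -> float:
        """Seconds to wait before another call is allowed (0.0 if a slot is free)."""
        now = self.clock()
        # Forget calls that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a slot is free, then record this call."""
        delay = self.wait_time()
        while delay > 0:
            sleep(delay)
            delay = self.wait_time()
        self.calls.append(self.clock())
```

Call limiter.acquire() immediately before each API request. The clock and sleep functions are injectable so the pacing can be checked without actually waiting.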

Other Useful Endpoints

Endpoint    Purpose                                  Notes
/json       AI-powered structured data extraction    Pass a prompt + JSON schema, get clean data back. Needs the Browser Rendering - Edit token permission.
/scrape     CSS selector-based scraping              Target specific elements, e.g. meta[name="description"]
/content    Full raw HTML                            Returns the complete <head> with all meta tags
/crawl      Multi-page crawl                         What we used. Follows links; returns markdown, HTML, or JSON
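The following sketch illustrates the request shape by preparing a /scrape call for the meta description without sending it. It assumes the REST base URL https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering and a request body of the form {"url": ..., "elements": [{"selector": ...}]}. Confirm both against the current Browser Rendering docs. The function name build_scrape_request is hypothetical.

```python
import json
import urllib.request

# Assumed REST base; check against the Browser Rendering docs.
API_BASE = "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering"


def build_scrape_request(account_id: str, token: str, page_url: str,
                         selectors: list[str]) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /scrape (hypothetical helper)."""
    body = {
        "url": page_url,
        # One entry per CSS selector to extract.
        "elements": [{"selector": s} for s in selectors],
    }
    return urllib.request.Request(
        url=API_BASE.format(account_id=account_id) + "/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # token needs Browser Rendering - Edit
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen(req). The shape of the JSON response is not assumed here.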

Token Permissions

The stored token at ~/.cloudflare/credentials has zone management permissions only. All Browser Rendering endpoints (/json, /scrape, /content, /crawl) require a token with the Browser Rendering - Edit permission, which must be added in the Cloudflare dashboard.

Data Locations

Folder                                                          Contents
~/.claude/project-notes/root-structure/rootstructure-crawl/     Raw crawl markdown files (13 pages)
~/.claude/project-notes/root-structure/rootstructure-images/    All images, organized by category (48 files, 51 MB)
~/.claude/project-notes/root-structure/rootstructure-seo/       SEO metadata per page (JSON + CSV)

Replicating for Other Clients

This workflow can be used for any Squarespace-to-WordPress migration:

  1. Crawl — Hit /crawl with the client's domain, get markdown content
  2. Extract images — Parse image URLs from markdown, download with curl
  3. Extract SEO — Fetch raw HTML, parse <head> meta tags (or use /json with a prompt if token has permissions)
  4. Import to WP — Browser-based PHP script using wp_insert_post() + media_handle_sideload()
  5. Set Yoast — Write to _yoast_wpseo_* post meta and wpseo_* term meta
  6. Create redirects — Old URL paths → new WordPress paths via Redirection plugin
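Step 2 can be sketched in Python as follows. The sketch assumes the crawl output uses standard markdown image syntax, ![alt](url "optional title"). It also assumes that dropping the query string returns the full-size original. Squarespace CDN URLs often carry a ?format=... resizing parameter, so verify this assumption on a sample image. The function names are hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit

# Markdown image: ![alt](url) or ![alt](url "title"), optionally with <url>.
IMAGE_RE = re.compile(r'!\[([^\]]*)\]\(\s*<?([^\s)>]+)>?(?:\s+"[^"]*")?\s*\)')


def extract_image_urls(markdown: str) -> list[str]:
    """Unique image URLs in first-seen order, with query strings removed."""
    seen: dict[str, None] = {}  # dict keeps insertion order, acts as an ordered set
    for _alt, url in IMAGE_RE.findall(markdown):
        parts = urlsplit(url)
        # Assumption: the bare URL (no ?format=...) serves the original file.
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        seen.setdefault(clean, None)
    return list(seen)


def local_filename(url: str) -> str:
    """Last path segment of the URL, used as the download filename."""
    return PurePosixPath(urlsplit(url).path).name
```

Each resulting URL can then be downloaded with curl as in step 2, e.g. curl -L -o <filename> <url>, using local_filename(url) for the name.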
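For steps 3 and 5, Python's standard html.parser can parse the <head> of each page's raw HTML (for example, the HTML returned by /content) and map it onto Yoast post meta. This sketch fills only _yoast_wpseo_title and _yoast_wpseo_metadesc. The class and function names are hypothetical. The other tags it collects (og:*, canonical) stay in the parser's meta dict for the per-page JSON/CSV and are not written to Yoast. The old canonical URL points at the Squarespace domain, so copying it into WordPress would likely be wrong.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collect <title> text, name/property meta tags, and the canonical link inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.title = ""
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif not self.in_head:
            return  # ignore anything outside <head>
        elif tag == "title":
            self.in_title = True
        elif tag == "meta":
            # <meta name="description"> and <meta property="og:title"> both land here.
            key = a.get("name") or a.get("property")
            if key and a.get("content") is not None:
                self.meta.setdefault(key.lower(), a["content"])
        elif tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.meta.setdefault("canonical", a.get("href") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "head":
            self.in_head = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def to_yoast_meta(html: str) -> dict[str, str]:
    """Map a page's <head> to Yoast post meta keys (title and description only)."""
    parser = HeadMetaParser()
    parser.feed(html)
    parser.close()
    out: dict[str, str] = {}
    title = parser.title.strip()
    if title:
        out["_yoast_wpseo_title"] = title
    if parser.meta.get("description"):
        out["_yoast_wpseo_metadesc"] = parser.meta["description"]
    return out
```

In the PHP import script, each key/value pair would then become post meta on the newly inserted post.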
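For step 6, the following sketch turns an old-to-new mapping into redirect rows. Source paths are normalized by removing the query, fragment, and trailing slash. Targets are kept as written, since WordPress permalinks usually end in a slash. Pages whose path did not change are skipped. The two-column source,target CSV layout is an assumption, so check it against the Redirection plugin's import screen before relying on it. The function names and example paths are hypothetical.

```python
import csv
import io
from urllib.parse import urlsplit


def normalize_path(url_or_path: str) -> str:
    """Reduce a full URL or bare path to '/path' without query, fragment, or trailing slash."""
    path = urlsplit(url_or_path).path or "/"
    if not path.startswith("/"):
        path = "/" + path
    if len(path) > 1:
        path = path.rstrip("/") or "/"
    return path


def redirect_rows(mapping: dict[str, str]) -> list[tuple[str, str]]:
    """Build sorted (source, target) pairs, skipping pages whose path did not change."""
    rows = []
    for old, new in mapping.items():
        source = normalize_path(old)
        target = urlsplit(new).path or "/"  # keep WordPress's trailing slash as written
        if source != normalize_path(target):
            rows.append((source, target))
    return sorted(rows)


def rows_to_csv(rows: list[tuple[str, str]]) -> str:
    """Serialize rows as two-column CSV (assumed import layout: source,target)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```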