A search-focused web console and API proxy for exploring Common Crawl index data.
commoncrawl.cc makes Common Crawl index data easier to explore from the browser. It combines a fast web UI with a typed API proxy so you can inspect captures, timelines, and raw responses without manually stitching together index endpoints.
Example search workspace exploring github.blog/* snapshots, timeline metadata, and capture inspection.
Common Crawl is incredibly useful, but its index APIs are still fairly low-level for day-to-day exploration. commoncrawl.cc aims to provide a cleaner workflow for developers, researchers, SEO teams, archivists, and data engineers who need to:
- search snapshot history for a URL
- inspect capture timelines
- fetch raw capture responses
- experiment from a browser instead of ad-hoc scripts
- build against a typed OpenAPI surface
- Search-focused UI for Common Crawl index exploration
- Snapshot, timeline, and capture inspection workflows
- Raw response preview for capture debugging
- Cloudflare Worker API proxy for index.commoncrawl.org
- Generated OpenAPI spec and typed web client
- MSW-backed local mocking for frontend development
- Cloudflare-based deployment workflow for API and web
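As a rough illustration of what a snapshot search involves underneath, the sketch below builds a query against the upstream index at index.commoncrawl.org and parses its newline-delimited JSON output. The function and type names are illustrative, the crawl ID is only an example, and the commoncrawl.cc API's own routes may differ from the upstream ones shown here.

```typescript
// Subset of the fields returned by the Common Crawl index in JSON output mode.
// All values arrive as strings; other fields are omitted for brevity.
interface CaptureRecord {
  url: string;
  timestamp: string; // 14-digit UTC timestamp, e.g. "20240301123456"
  status: string;
  filename: string;
  offset: string;
  length: string;
}

// Build an upstream index query for one crawl (e.g. "CC-MAIN-2024-10").
function buildIndexQuery(crawlId: string, urlPattern: string): string {
  const endpoint = new URL(`https://index.commoncrawl.org/${crawlId}-index`);
  endpoint.searchParams.set("url", urlPattern);
  endpoint.searchParams.set("output", "json");
  return endpoint.toString();
}

// The index answers with one JSON object per line, so parse line by line.
function parseCaptures(body: string): CaptureRecord[] {
  return body
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as CaptureRecord);
}
```

A caller could pass the result of `buildIndexQuery("CC-MAIN-2024-10", "github.blog/*")` to `fetch`, read the body with `text()`, and hand it to `parseCaptures`.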
- Web: https://commoncrawl.cc
- API: https://api.commoncrawl.cc
- OpenAPI: https://api.commoncrawl.cc/openapi.json
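Because the spec is published as plain JSON, the available operations can be listed without extra tooling. This is a minimal sketch that assumes only the standard OpenAPI `paths` layout; `listOperations` and `printOperations` are hypothetical helper names, not part of the project.

```typescript
// Minimal view of an OpenAPI document: only the `paths` object is needed here.
type OpenApiDocument = {
  paths?: Record<string, Record<string, unknown>>;
};

// HTTP methods that can appear as keys of an OpenAPI path item.
const HTTP_METHODS = ["get", "put", "post", "delete", "patch", "head", "options", "trace"];

// Return entries such as "GET /some/path" for every operation in the spec.
function listOperations(spec: OpenApiDocument): string[] {
  const operations: string[] = [];
  for (const [path, item] of Object.entries(spec.paths ?? {})) {
    for (const method of HTTP_METHODS) {
      if (method in item) operations.push(`${method.toUpperCase()} ${path}`);
    }
  }
  return operations;
}

// Fetch the published spec and print its operations.
async function printOperations(): Promise<void> {
  const res = await fetch("https://api.commoncrawl.cc/openapi.json");
  if (!res.ok) throw new Error(`Spec request failed with status ${res.status}`);
  const spec = (await res.json()) as OpenApiDocument;
  for (const operation of listOperations(spec)) console.log(operation);
}
```

Calling `printOperations()` from a script or browser console prints the current API surface, which is a quick way to check what the typed client covers.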
commoncrawl.cc is maintained as an independent open source project. Sponsorship helps fund ongoing maintenance, UX improvements, API hardening, documentation, and the time required to keep the project useful and free for the community.
If your company uses Common Crawl for search, SEO, archival, research, data enrichment, or LLM pipelines, sponsoring this project is a practical way to support the tooling around that ecosystem.
No sponsors yet — your company can become the founding sponsor.
- Top README placement
- Sponsor section placement
- Acknowledgement and support
A dedicated sponsor kit with tiers, logo guidelines, and company contact details can be added as the sponsorship program evolves.
- packages/web: Preact + Vite frontend for search and capture exploration
- packages/api: Cloudflare Worker proxy and OpenAPI source
Browser UI (packages/web)
-> API proxy (packages/api)
-> index.commoncrawl.org
The web app consumes generated API clients based on the Worker's exported OpenAPI spec. That keeps the frontend and proxy contract aligned.
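To make the proxy hop concrete, here is a simplified sketch of a Worker-style handler that forwards a request to index.commoncrawl.org. It uses only the standard Fetch API rather than Hono, and the pass-through path mapping, the CORS header, and the names `toUpstreamUrl` and `handleRequest` are assumptions for illustration; the actual routes are defined in packages/api.

```typescript
const UPSTREAM_ORIGIN = "https://index.commoncrawl.org";

// Map an incoming proxy URL onto the upstream index, keeping path and query.
function toUpstreamUrl(incoming: string): string {
  const source = new URL(incoming);
  return new URL(source.pathname + source.search, UPSTREAM_ORIGIN).toString();
}

// Forward GET requests upstream and relay the response to the browser.
async function handleRequest(request: Request): Promise<Response> {
  if (request.method !== "GET") {
    return new Response("Method not allowed", { status: 405 });
  }

  const upstream = await fetch(toUpstreamUrl(request.url), {
    headers: { accept: "application/json" },
  });

  // Relay only the content type, and allow browser clients to read the response.
  const headers = new Headers();
  const contentType = upstream.headers.get("content-type");
  if (contentType) headers.set("content-type", contentType);
  headers.set("access-control-allow-origin", "*");

  return new Response(upstream.body, { status: upstream.status, headers });
}
```

In a Cloudflare Worker using module syntax, a handler like this would be exposed as the `fetch` method of the default export.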
pnpm install
cp packages/web/.env.example packages/web/.env
pnpm --filter @commoncrawl.cc/api dev
pnpm --filter @commoncrawl.cc/web dev
Then open the web app at the local URL printed by the web dev server.
The web app expects the API at http://localhost:8787 by default.
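A default like this is commonly resolved in one small helper. The sketch below is hypothetical: the variable name `VITE_API_BASE_URL` is an assumption (check packages/web/.env.example for the real key), and only the fallback value comes from this README.

```typescript
// Fallback used when no API base URL is configured (value taken from this README).
const DEFAULT_API_BASE_URL = "http://localhost:8787";

// Pick the configured base URL if present, otherwise the local default.
// `VITE_API_BASE_URL` is a hypothetical variable name used for illustration.
function resolveApiBaseUrl(env: Record<string, string | undefined>): string {
  const configured = env["VITE_API_BASE_URL"]?.trim();
  const base = configured ? configured : DEFAULT_API_BASE_URL;
  // Strip trailing slashes so request paths can be appended consistently.
  return base.replace(/\/+$/, "");
}
```

In a Vite app the argument would typically be `import.meta.env`, so pointing the frontend at a different API only requires editing the `.env` file.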
pnpm --filter @commoncrawl.cc/api build
pnpm --filter @commoncrawl.cc/web build
pnpm --filter @commoncrawl.cc/web test
pnpm lint
pnpm fmt:check
pnpm openapi:sync
The openapi:sync script exports the API's OpenAPI spec and regenerates the typed web client.
- Preact
- Vite
- preact-iso
- Hono
- Cloudflare Workers
- Cloudflare Pages
- Orval
- MSW
- pnpm workspace
Issues and pull requests are welcome. If you find rough edges in the search workflow, timeline view, replay behavior, or API contract, feedback is especially valuable.