commoncrawl.cc

A search-focused web console and API proxy for exploring Common Crawl index data.

commoncrawl.cc makes Common Crawl index data easier to explore from the browser. It combines a fast web UI with a typed API proxy so you can inspect captures, timelines, and raw responses without manually stitching together index endpoints.

Example search workspace exploring github.blog/* snapshots, timeline metadata, and capture inspection.

Why this project exists

Common Crawl is a valuable public dataset, but its index APIs are low-level for day-to-day exploration: each crawl exposes its own index endpoint, and results come back as raw index records. commoncrawl.cc aims to provide a cleaner workflow for developers, researchers, SEO teams, archivists, and data engineers who need to:

search snapshot history for a URL
inspect capture timelines
fetch raw capture responses
experiment from a browser instead of ad-hoc scripts
build against a typed OpenAPI surface
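
To make the "low-level" point concrete, here is a minimal sketch of the kind of glue code this project is meant to replace. It assumes the upstream Common Crawl index is queried with `output=json`, which returns one JSON object per line with string fields such as `timestamp`, `filename`, `offset`, and `length`. The type and function names are illustrative only and are not part of commoncrawl.cc.

```typescript
// Illustrative shape of one upstream index record (field values arrive as strings).
interface CaptureRecord {
  url: string;
  timestamp: string; // YYYYMMDDhhmmss
  status?: string;
  filename: string;
  offset: string;
  length: string;
}

// Parse a newline-delimited JSON body into records, skipping blank lines.
function parseCaptureLines(body: string): CaptureRecord[] {
  return body
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as CaptureRecord);
}

// Count captures per year to get a rough timeline.
function capturesPerYear(records: CaptureRecord[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    const year = record.timestamp.slice(0, 4);
    counts.set(year, (counts.get(year) ?? 0) + 1);
  }
  return counts;
}

// Build the HTTP Range header value that selects one capture inside its archive file.
function byteRange(record: CaptureRecord): string {
  const start = Number(record.offset);
  const end = start + Number(record.length) - 1; // Range ends are inclusive
  return `bytes=${start}-${end}`;
}
```

Keeping parsing, grouping, and range arithmetic as separate pure functions makes each step easy to check without network access.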

Features

Search-focused UI for Common Crawl index exploration
Snapshot, timeline, and capture inspection workflows
Raw response preview for capture debugging
Cloudflare Worker API proxy for index.commoncrawl.org
Generated OpenAPI spec and typed web client
MSW-backed local mocking for frontend development
Cloudflare-based deployment workflow for API and web

Live endpoints

Web: https://commoncrawl.cc
API: https://api.commoncrawl.cc
OpenAPI: https://api.commoncrawl.cc/openapi.json
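
Since the available routes are described by the published OpenAPI document rather than listed here, one way to discover them is to read that document directly. The sketch below is an assumption-laden helper, not project code: it only relies on the standard OpenAPI layout (`paths` mapping each path to its HTTP methods), and the names `listOperations` and `listLiveOperations` are hypothetical.

```typescript
// Minimal slice of an OpenAPI document: path -> method -> operation object.
type OpenApiDoc = {
  paths?: Record<string, Record<string, unknown>>;
};

const HTTP_METHODS = ["get", "put", "post", "delete", "patch", "head", "options"];

// Return sorted "METHOD /path" strings for every operation in the document.
// Non-method keys on a path item (for example "parameters") are ignored.
function listOperations(doc: OpenApiDoc): string[] {
  const operations: string[] = [];
  for (const [path, item] of Object.entries(doc.paths ?? {})) {
    for (const method of Object.keys(item)) {
      if (HTTP_METHODS.includes(method.toLowerCase())) {
        operations.push(`${method.toUpperCase()} ${path}`);
      }
    }
  }
  return operations.sort();
}

// Fetch the published spec and list its operations (requires network access).
async function listLiveOperations(
  specUrl: string = "https://api.commoncrawl.cc/openapi.json",
): Promise<string[]> {
  const response = await fetch(specUrl);
  if (!response.ok) {
    throw new Error(`Spec request failed with status ${response.status}`);
  }
  return listOperations((await response.json()) as OpenApiDoc);
}
```

Calling `listLiveOperations()` against the live endpoint prints whatever the deployed API currently exposes, so no route names need to be hard-coded.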

Sponsors

commoncrawl.cc is maintained as an independent open source project. Sponsorship helps fund ongoing maintenance, UX improvements, API hardening, documentation, and the time required to keep the project useful and free for the community.

If your company uses Common Crawl for search, SEO, archival, research, data enrichment, or LLM pipelines, sponsoring this project is a practical way to support the tooling around that ecosystem.

No sponsors yet — your company can become the founding sponsor.

Sponsor visibility

Packages

packages/web — Preact + Vite frontend for search and capture exploration
packages/api — Cloudflare Worker proxy and OpenAPI source

Architecture

Browser UI (packages/web)
  -> API proxy (packages/api)
    -> index.commoncrawl.org

The web app consumes generated API clients based on the Worker's exported OpenAPI spec. That keeps the frontend and proxy contract aligned.
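
The proxy hop in the diagram above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's Worker: the real proxy is built with Hono and exposes its own typed routes, whereas this version uses only standard `URL`, `Request`, and `Response` objects and assumes a plain pass-through of path and query string. `toUpstreamUrl` and `handleRequest` are hypothetical names.

```typescript
const UPSTREAM_ORIGIN = "https://index.commoncrawl.org";

// Map an incoming proxy request URL onto the upstream origin,
// preserving the path and query string. The host is always the
// upstream host, whatever the incoming path looks like.
function toUpstreamUrl(requestUrl: string): string {
  const incoming = new URL(requestUrl);
  const upstream = new URL(UPSTREAM_ORIGIN);
  upstream.pathname = incoming.pathname;
  upstream.search = incoming.search;
  return upstream.toString();
}

// Worker-style handler: forward GET requests upstream and relay the response.
async function handleRequest(request: Request): Promise<Response> {
  if (request.method !== "GET") {
    return new Response("Method not allowed", { status: 405 });
  }
  const upstreamResponse = await fetch(toUpstreamUrl(request.url));
  return new Response(upstreamResponse.body, {
    status: upstreamResponse.status,
    headers: {
      "content-type": upstreamResponse.headers.get("content-type") ?? "text/plain",
    },
  });
}
```

Assigning `pathname` and `search` onto a fixed upstream URL, rather than resolving the incoming path as a relative URL, keeps a crafted path from redirecting the request to another host.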

Quick start

1) Install dependencies

pnpm install

2) Configure the web app

cp packages/web/.env.example packages/web/.env

3) Start the API

pnpm --filter @commoncrawl.cc/api dev

4) Start the web app

pnpm --filter @commoncrawl.cc/web dev

Then open:

http://localhost:3000

The web app expects the API at http://localhost:8787 by default.
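
A default like this is typically resolved from the environment with a fallback. The sketch below shows one way that could look; the variable name `VITE_API_BASE_URL` is a hypothetical placeholder (the real name is whatever `packages/web/.env.example` defines), and `resolveApiBaseUrl` is not a function from the project.

```typescript
const DEFAULT_API_BASE_URL = "http://localhost:8787";

// Resolve the API base URL from an env-like object, falling back to the
// local default. VITE_API_BASE_URL is a hypothetical name used for illustration.
function resolveApiBaseUrl(env: Record<string, string | undefined>): string {
  const configured = env["VITE_API_BASE_URL"]?.trim();
  const base = configured && configured.length > 0 ? configured : DEFAULT_API_BASE_URL;
  return base.replace(/\/+$/, ""); // drop trailing slashes so paths join cleanly
}

// Join the resolved base URL with an API path.
function apiUrl(env: Record<string, string | undefined>, path: string): string {
  const normalizedPath = path.startsWith("/") ? path : `/${path}`;
  return `${resolveApiBaseUrl(env)}${normalizedPath}`;
}
```

In a Vite app the env-like object would usually be `import.meta.env`; passing it in as a parameter keeps the helper testable outside the bundler.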

Development

Build

pnpm --filter @commoncrawl.cc/api build
pnpm --filter @commoncrawl.cc/web build

Test

pnpm --filter @commoncrawl.cc/web test

Lint and format

pnpm lint
pnpm fmt:check

Sync OpenAPI artifacts

pnpm openapi:sync

This exports the API OpenAPI spec and regenerates the typed web client.
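
For readers unfamiliar with Orval, client generation is driven by a config file along these lines. This is a generic sketch, not the repository's actual configuration: the project key `commoncrawl`, both file paths, and the choice of the `fetch` client are assumptions made for illustration.

```typescript
import { defineConfig } from "orval";

export default defineConfig({
  // "commoncrawl" is an arbitrary project key chosen for this example.
  commoncrawl: {
    input: {
      // Hypothetical location of the spec exported from the API package.
      target: "../api/openapi.json",
    },
    output: {
      // Hypothetical location of the generated typed client in the web package.
      target: "./src/api/generated.ts",
      client: "fetch",
    },
  },
});
```

With a config like this, regenerating the client after every spec export is what keeps the frontend types in step with the proxy contract.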

Tech stack

Preact
Vite
preact-iso
Hono
Cloudflare Workers
Cloudflare Pages
Orval
MSW
pnpm workspace

Contributing

Issues and pull requests are welcome. If you find rough edges in the search workflow, timeline view, replay behavior, or API contract, feedback is especially valuable.

License

MIT
