Forked from karpathy. Cooked with Codex.
Live demo US data: karpathy.ai/jobs
[need to edit this] The BLS OOH covers 342 occupations spanning every sector of the US economy, with detailed data on job duties, work environment, education requirements, pay, and employment projections. We scraped all of it, scored each occupation's AI exposure using an LLM, and built an interactive treemap visualization.
- Scrape (
scrape.py) — Playwright (non-headless, BLS blocks bots) downloads raw HTML for all 342 occupation pages intohtml/. - Parse (
parse_detail.py,process.py) — BeautifulSoup converts raw HTML into clean Markdown files inpages/. - Tabulate (
make_csv.py) — Extracts structured fields (pay, education, job count, growth outlook, SOC code) intooccupations.csv. - Score (
score.py) — Sends each occupation's Markdown description to an LLM (Gemini Flash via OpenRouter) with a scoring rubric. Each occupation gets an AI Exposure score from 0-10 with a rationale. Results saved toscores.json. - Build site data (
build_site_data.py) — Merges CSV stats and AI exposure scores into a compactsite/data.jsonfor the frontend. - Website (
site/index.html) — Interactive treemap visualization where area = employment and color = AI exposure (green to red).
| File | Description |
|---|---|
occupations.json |
Master list of 342 occupations with title, URL, category, slug |
occupations.csv |
Summary stats: pay, education, job count, growth projections |
scores.json |
AI exposure scores (0-10) with rationales for all 342 occupations |
prompt.md |
All data in a single file, designed to be pasted into an LLM for analysis |
html/ |
Raw HTML pages from BLS (source of truth, ~40MB) |
pages/ |
Clean Markdown versions of each occupation page |
site/ |
Static website (treemap visualization) |
Each occupation is scored on a single AI Exposure axis from 0 to 10, measuring how much AI will reshape that occupation. The score considers both direct automation (AI doing the work) and indirect effects (AI making workers so productive that fewer are needed).
A key signal is whether the job's work product is fundamentally digital — if the job can be done entirely from a home office on a computer, AI exposure is inherently high. Conversely, jobs requiring physical presence, manual skill, or real-time human interaction have a natural barrier.
Calibration examples from the dataset:
[need to confirm this]
| Score | Meaning | Examples |
|---|---|---|
| 0-1 | Minimal | Roofers, janitors, construction laborers |
| 2-3 | Low | Electricians, plumbers, nurses aides, firefighters |
| 4-5 | Moderate | Registered nurses, retail workers, physicians |
| 6-7 | High | Teachers, managers, accountants, engineers |
| 8-9 | Very high | Software developers, paralegals, data analysts, editors |
| 10 | Maximum | Medical transcriptionists |
Average exposure across all 342 occupations: 5.3/10.
The main visualization is an interactive treemap where:
- Area of each rectangle is proportional to employment (number of jobs)
- Color indicates AI exposure on a green (safe) to red (exposed) scale
- Layout groups occupations by BLS category
- Hover shows detailed tooltip with pay, jobs, outlook, education, exposure score, and LLM rationale
prompt.md packages all the data — aggregate statistics, tier breakdowns, exposure by pay/education, BLS growth projections, and all 342 occupations with their scores and rationales — into a single file (~45K tokens) designed to be pasted into an LLM. This lets you have a data-grounded conversation about AI's impact on the job market without needing to run any code. Regenerate it with uv run python make_prompt.py.
uv sync
uv run playwright install chromium
Requires an OpenRouter API key in .env:
OPENROUTER_API_KEY=your_key_here
# Scrape BLS pages (only needed once, results are cached in html/)
uv run python scrape.py
# Generate Markdown from HTML
uv run python process.py
# Generate CSV summary
uv run python make_csv.py
# Score AI exposure (uses OpenRouter API)
uv run python score.py
# Build website data
uv run python build_site_data.py
# Serve the site locally
cd site && python -m http.server 8000