AI Exposure of the Canadian Job Market

Forked from karpathy. Cooked with Codex.

What's here

[need to edit this] The BLS OOH covers 342 occupations spanning every sector of the US economy, with detailed data on job duties, work environment, education requirements, pay, and employment projections. We scraped all of it, scored each occupation's AI exposure using an LLM, and built an interactive treemap visualization.

Data pipeline

Scrape (scrape.py) — Playwright (non-headless, BLS blocks bots) downloads raw HTML for all 342 occupation pages into html/.
Parse (parse_detail.py, process.py) — BeautifulSoup converts raw HTML into clean Markdown files in pages/.
Tabulate (make_csv.py) — Extracts structured fields (pay, education, job count, growth outlook, SOC code) into occupations.csv.
Score (score.py) — Sends each occupation's Markdown description to an LLM (Gemini Flash via OpenRouter) with a scoring rubric. Each occupation gets an AI Exposure score from 0-10 with a rationale. Results saved to scores.json.
Build site data (build_site_data.py) — Merges CSV stats and AI exposure scores into a compact site/data.json for the frontend.
Website (site/index.html) — Interactive treemap visualization where area = employment and color = AI exposure (green to red).

Key files

File	Description
`occupations.json`	Master list of 342 occupations with title, URL, category, slug
`occupations.csv`	Summary stats: pay, education, job count, growth projections
`scores.json`	AI exposure scores (0-10) with rationales for all 342 occupations
`prompt.md`	All data in a single file, designed to be pasted into an LLM for analysis
`html/`	Raw HTML pages from BLS (source of truth, ~40MB)
`pages/`	Clean Markdown versions of each occupation page
`site/`	Static website (treemap visualization)

AI exposure scoring

Each occupation is scored on a single AI Exposure axis from 0 to 10, measuring how much AI will reshape that occupation. The score considers both direct automation (AI doing the work) and indirect effects (AI making workers so productive that fewer are needed).

A key signal is whether the job's work product is fundamentally digital — if the job can be done entirely from a home office on a computer, AI exposure is inherently high. Conversely, jobs requiring physical presence, manual skill, or real-time human interaction have a natural barrier.

Calibration examples from the dataset:

[need to confirm this]

Score	Meaning	Examples
0-1	Minimal	Roofers, janitors, construction laborers
2-3	Low	Electricians, plumbers, nurses aides, firefighters
4-5	Moderate	Registered nurses, retail workers, physicians
6-7	High	Teachers, managers, accountants, engineers
8-9	Very high	Software developers, paralegals, data analysts, editors
10	Maximum	Medical transcriptionists

Average exposure across all 342 occupations: 5.3/10.

Visualization

The main visualization is an interactive treemap where:

Area of each rectangle is proportional to employment (number of jobs)
Color indicates AI exposure on a green (safe) to red (exposed) scale
Layout groups occupations by BLS category
Hover shows detailed tooltip with pay, jobs, outlook, education, exposure score, and LLM rationale

LLM prompt

prompt.md packages all the data — aggregate statistics, tier breakdowns, exposure by pay/education, BLS growth projections, and all 342 occupations with their scores and rationales — into a single file (~45K tokens) designed to be pasted into an LLM. This lets you have a data-grounded conversation about AI's impact on the job market without needing to run any code. Regenerate it with uv run python make_prompt.py.

Setup

uv sync
uv run playwright install chromium

Requires an OpenRouter API key in .env:

OPENROUTER_API_KEY=your_key_here

Usage

# Scrape BLS pages (only needed once, results are cached in html/)
uv run python scrape.py

# Generate Markdown from HTML
uv run python process.py

# Generate CSV summary
uv run python make_csv.py

# Score AI exposure (uses OpenRouter API)
uv run python score.py

# Build website data
uv run python build_site_data.py

# Serve the site locally
cd site && python -m http.server 8000

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Exposure of the Canadian Job Market

What's here

Data pipeline

Key files

AI exposure scoring

Visualization

LLM prompt

Setup

Usage

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
html		html
site		site
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
build_site_data.py		build_site_data.py
jobs.png		jobs.png
make_csv.py		make_csv.py
make_prompt.py		make_prompt.py
occupational_outlook_handbook.html		occupational_outlook_handbook.html
occupations.csv		occupations.csv
occupations.json		occupations.json
parse_detail.py		parse_detail.py
parse_occupations.py		parse_occupations.py
process.py		process.py
prompt.md		prompt.md
pyproject.toml		pyproject.toml
score.py		score.py
scores.json		scores.json
scrape.py		scrape.py
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

AI Exposure of the Canadian Job Market

What's here

Data pipeline

Key files

AI exposure scoring

Visualization

LLM prompt

Setup

Usage

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages