Skip to content

immanuelgiulea/jobsCanada

 
 

Repository files navigation

AI Exposure of the Canadian Job Market

Forked from karpathy. Cooked with Codex.

Live demo US data: karpathy.ai/jobs

AI Exposure Treemap

What's here

[need to edit this] The BLS OOH covers 342 occupations spanning every sector of the US economy, with detailed data on job duties, work environment, education requirements, pay, and employment projections. We scraped all of it, scored each occupation's AI exposure using an LLM, and built an interactive treemap visualization.

Data pipeline

  1. Scrape (scrape.py) — Playwright (non-headless, BLS blocks bots) downloads raw HTML for all 342 occupation pages into html/.
  2. Parse (parse_detail.py, process.py) — BeautifulSoup converts raw HTML into clean Markdown files in pages/.
  3. Tabulate (make_csv.py) — Extracts structured fields (pay, education, job count, growth outlook, SOC code) into occupations.csv.
  4. Score (score.py) — Sends each occupation's Markdown description to an LLM (Gemini Flash via OpenRouter) with a scoring rubric. Each occupation gets an AI Exposure score from 0-10 with a rationale. Results saved to scores.json.
  5. Build site data (build_site_data.py) — Merges CSV stats and AI exposure scores into a compact site/data.json for the frontend.
  6. Website (site/index.html) — Interactive treemap visualization where area = employment and color = AI exposure (green to red).

Key files

File Description
occupations.json Master list of 342 occupations with title, URL, category, slug
occupations.csv Summary stats: pay, education, job count, growth projections
scores.json AI exposure scores (0-10) with rationales for all 342 occupations
prompt.md All data in a single file, designed to be pasted into an LLM for analysis
html/ Raw HTML pages from BLS (source of truth, ~40MB)
pages/ Clean Markdown versions of each occupation page
site/ Static website (treemap visualization)

AI exposure scoring

Each occupation is scored on a single AI Exposure axis from 0 to 10, measuring how much AI will reshape that occupation. The score considers both direct automation (AI doing the work) and indirect effects (AI making workers so productive that fewer are needed).

A key signal is whether the job's work product is fundamentally digital — if the job can be done entirely from a home office on a computer, AI exposure is inherently high. Conversely, jobs requiring physical presence, manual skill, or real-time human interaction have a natural barrier.

Calibration examples from the dataset:

[need to confirm this]

Score Meaning Examples
0-1 Minimal Roofers, janitors, construction laborers
2-3 Low Electricians, plumbers, nurses aides, firefighters
4-5 Moderate Registered nurses, retail workers, physicians
6-7 High Teachers, managers, accountants, engineers
8-9 Very high Software developers, paralegals, data analysts, editors
10 Maximum Medical transcriptionists

Average exposure across all 342 occupations: 5.3/10.

Visualization

The main visualization is an interactive treemap where:

  • Area of each rectangle is proportional to employment (number of jobs)
  • Color indicates AI exposure on a green (safe) to red (exposed) scale
  • Layout groups occupations by BLS category
  • Hover shows detailed tooltip with pay, jobs, outlook, education, exposure score, and LLM rationale

LLM prompt

prompt.md packages all the data — aggregate statistics, tier breakdowns, exposure by pay/education, BLS growth projections, and all 342 occupations with their scores and rationales — into a single file (~45K tokens) designed to be pasted into an LLM. This lets you have a data-grounded conversation about AI's impact on the job market without needing to run any code. Regenerate it with uv run python make_prompt.py.

Setup

uv sync
uv run playwright install chromium

Requires an OpenRouter API key in .env:

OPENROUTER_API_KEY=your_key_here

Usage

# Scrape BLS pages (only needed once, results are cached in html/)
uv run python scrape.py

# Generate Markdown from HTML
uv run python process.py

# Generate CSV summary
uv run python make_csv.py

# Score AI exposure (uses OpenRouter API)
uv run python score.py

# Build website data
uv run python build_site_data.py

# Serve the site locally
cd site && python -m http.server 8000

About

Analyzing how susceptible every occupation in the Canadian economy is to AI and automation, using data from the StatsCan

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • HTML 99.9%
  • Python 0.1%