Commit d2f5dd6

chekos and claude authored
Replace matplotlib with Altair and add live charts to docs (#305)
* Replace matplotlib with Altair across all docs and add rendered Vega-Lite charts

  Switch all visualization examples from matplotlib to Altair for a more
  declarative, interactive charting experience. Add mkdocs-charts-plugin to
  render Vega-Lite specs directly in the docs so readers can see live,
  interactive charts with tooltips, not just code blocks.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address PR review: add admonitions, DC data, and altair dep

  - Add `altair` to docs optional dependencies in pyproject.toml
  - Add DC (id: 11) to choropleth datasets in quickstart, decennial, and
    population-estimates pages that were missing it
  - Add `!!! example` admonitions above all rendered vegalite charts to
    clarify they show state-level sample data, not the exact output of the
    code above them
  - Update chart titles to remove "(sample data)" suffix now that
    admonitions explain the context

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update uv.lock with mkdocs-charts-plugin and altair deps

  The lockfile was out of sync with pyproject.toml after adding
  mkdocs-charts-plugin and altair to the docs extras. ReadTheDocs uses
  `uv sync --frozen` which requires the lockfile to match.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add CLAUDE.md with project context for Claude Code

  Documents project structure, common commands, dependency management
  (including the uv.lock gotcha with ReadTheDocs), docs conventions, CI
  pipeline, and test markers.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix dot density map: extract geometry coords for Altair

  Altair cannot resolve nested attribute access like geometry.x in field
  references. Extract lon/lat into separate columns before passing to
  alt.Chart().

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 1f68b42 commit d2f5dd6
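
The last item in the commit message (flattening geometry coordinates before charting) can be illustrated with a minimal sketch. It is not the code from the commit: `Point` is a hypothetical stand-in for a shapely point, and `flatten_points` and the sample records are invented for illustration. The idea is only that each record ends up with plain top-level `lon` and `lat` fields that a chart encoding can reference by name.

```python
from typing import NamedTuple


class Point(NamedTuple):
    """Hypothetical stand-in for a shapely Point; only .x and .y are used."""

    x: float
    y: float


def flatten_points(records):
    """Copy each record, replacing the nested geometry with flat lon/lat fields."""
    flat = []
    for rec in records:
        # Keep every field except the nested geometry object.
        row = {key: value for key, value in rec.items() if key != "geometry"}
        row["lon"] = rec["geometry"].x
        row["lat"] = rec["geometry"].y
        flat.append(row)
    return flat


# Illustrative records only; real dot-density points would come from the data.
dots = [
    {"group": "A", "geometry": Point(-118.25, 34.05)},
    {"group": "B", "geometry": Point(-118.40, 33.95)},
]
flat_dots = flatten_points(dots)
```

With a GeoDataFrame of point geometries, the equivalent step is usually `gdf["lon"] = gdf.geometry.x` and `gdf["lat"] = gdf.geometry.y`, after which the chart can encode `lon:Q` and `lat:Q` directly.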

12 files changed: +856 -66 lines changed

CLAUDE.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+# CLAUDE.md
+
+Project-level context for Claude Code sessions working on PyPUMS.
+
+## Project overview
+
+PyPUMS is a Python interface to the US Census Bureau API. It provides functions for
+ACS, Decennial Census, PUMS microdata, population estimates, and migration flows.
+The package returns pandas DataFrames (or GeoDataFrames when `geometry=True`).
+
+- **Package source:** `pypums/`
+- **Tests:** `tests/`
+- **Docs:** `docs/` (MkDocs Material, hosted on ReadTheDocs)
+- **CLI:** `pypums/cli.py` (Typer)
+
+## Common commands
+
+```bash
+# Run tests (uses doctest + pytest)
+uv run pytest
+
+# Run a specific test file
+uv run pytest tests/test_get_acs.py
+
+# Lint and format
+uv run ruff check --fix .
+uv run ruff format .
+
+# Build docs locally
+uv run mkdocs build --strict
+
+# Serve docs locally (live reload)
+uv run mkdocs serve
+```
+
+## Dependencies and lockfile
+
+This project uses **uv** for dependency management.
+
+**Important:** When you add or change dependencies in `pyproject.toml`, you **must**
+run `uv lock` and commit the updated `uv.lock`. ReadTheDocs uses
+`uv sync --frozen --extra docs` which requires the lockfile to be in sync with
+`pyproject.toml`. Forgetting to update `uv.lock` will cause the docs build to fail
+on CI.
+
+Optional dependency groups:
+- `spatial` — geopandas (for `geometry=True` support)
+- `test` — pytest
+- `docs` — mkdocs-material, mkdocstrings, mkdocs-charts-plugin, altair, etc.
+
+## Code style
+
+- **Linter/formatter:** Ruff (config in `pyproject.toml`)
+- **Target:** Python 3.10+
+- **Pre-commit hooks:** trailing whitespace, end-of-file fixer, check-yaml, check-toml,
+  ruff-check, ruff-format
+
+## Docs
+
+The documentation uses **MkDocs Material** with these notable plugins/extensions:
+
+- `mkdocs-charts-plugin` with `pymdownx.superfences` custom fences for rendering
+  **Vega-Lite** charts inline. Use ` ```vegalite ` fenced code blocks with a JSON
+  Vega-Lite spec to render interactive charts directly in the docs.
+- `mkdocstrings[python]` for API reference (numpy-style docstrings).
+- `mkdocs-typer` for CLI reference auto-generation.
+- Visualizations use **Altair** (not matplotlib). Code examples should use Altair, and
+  rendered previews use Vega-Lite JSON specs in `vegalite` fenced blocks.
+
+Rendered Vega-Lite charts in docs should:
+- Use inline `"values"` data (not live API calls) so they render without a backend.
+- Include all 50 states + DC for choropleth maps.
+- Be wrapped in `!!! example "Interactive preview"` admonitions when the rendered chart
+  shows a different geographic scale than the code example (e.g., state-level preview
+  for a tract-level code example).
+
+## CI pipeline
+
+- **GitHub Actions:** lint (ruff), tests (Python 3.10-3.13)
+- **ReadTheDocs:** builds docs from `.readthedocs.yaml` using `uv sync --frozen --extra docs`
+- **Pre-commit hooks:** run on every commit (trailing whitespace, ruff, etc.)
+
+## Test markers
+
+```
+phase0 — Foundation (API infra, geography, FIPS, cache)
+phase1 — Core data functions (get_acs, get_decennial, load_variables)
+phase2 — MOE, spatial, enhanced PUMS
+phase3 — Estimates, flows, survey
+integration — Requires real Census API key and network
+spatial — Requires geopandas
+```
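
The "inline `values`" convention described in CLAUDE.md can be sketched in Python. This is an illustration rather than tooling from the repository: `vegalite_block`, `US_TOPOJSON_URL`, and `SAMPLE_ROWS` are hypothetical names, and the spec layout simply mirrors the previews added in this commit. The sketch builds a Vega-Lite spec whose data rows are embedded in the spec itself, then wraps the JSON in a `vegalite` fence.

```python
import json

# Static TopoJSON file referenced by the previews in this commit.
US_TOPOJSON_URL = "https://cdn.jsdelivr.net/npm/vega-datasets@v2.7.0/data/us-10m.json"

# Two rows copied from the quickstart preview; a real chart would list every state.
SAMPLE_ROWS = [
    {"id": 11, "estimate": 671803, "name": "District of Columbia"},
    {"id": 56, "estimate": 576851, "name": "Wyoming"},
]


def vegalite_block(rows, value_field, title):
    """Return a fenced vegalite block whose data rows are embedded inline."""
    spec = {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "data": {
            "url": US_TOPOJSON_URL,
            "format": {"type": "topojson", "feature": "states"},
        },
        # Join the inline rows onto the map features by their shared id.
        "transform": [
            {
                "lookup": "id",
                "from": {
                    "data": {"values": rows},
                    "key": "id",
                    "fields": [value_field, "name"],
                },
            }
        ],
        "projection": {"type": "albersUsa"},
        "mark": {"type": "geoshape", "stroke": "white", "strokeWidth": 0.5},
        "encoding": {"color": {"field": value_field, "type": "quantitative"}},
    }
    fence = "`" * 3  # built at runtime so this example does not nest literal fences
    return f"{fence}vegalite\n{json.dumps(spec, indent=2)}\n{fence}"


preview = vegalite_block(SAMPLE_ROWS, "estimate", "ACS Total Population by State")
```

Because the rows travel inside the spec, the rendered chart needs no Census API key or backend at docs build time; only the static TopoJSON file is fetched by the browser.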

docs/getting-started/quickstart.md

Lines changed: 111 additions & 9 deletions
@@ -84,22 +84,124 @@ print(type(la_poverty)) # <class 'geopandas.geodataframe.GeoDataFrame'>
 4. **county** -- `"037"` is the FIPS code for Los Angeles County. Use `pypums.datasets.fips.lookup_fips(state="California", county="Los Angeles County")` to look up codes.
 5. **geometry** -- When `True`, PyPUMS fetches TIGER/Line cartographic boundary shapefiles and merges them with the data. The result is a `GeoDataFrame` with a `geometry` column.

-Now plot it:
+Now plot it with [Altair](https://altair-viz.github.io/):

 ```python
-la_poverty.plot(
-    column="estimate",
-    cmap="YlOrRd",
-    legend=True,
-    figsize=(12, 10),
-    edgecolor="0.8",
-    linewidth=0.3,
-    missing_kwds={"color": "lightgrey"},
+import altair as alt
+
+alt.Chart(la_poverty).mark_geoshape(
+    stroke="white", strokeWidth=0.3,
+).encode(
+    color=alt.Color(
+        "estimate:Q",
+        scale=alt.Scale(scheme="yelloworangered"),
+        legend=alt.Legend(title="Below Poverty Level"),
+    ),
+    tooltip=["NAME:N", alt.Tooltip("estimate:Q", format=",")],
+).project("albersUsa").properties(
+    width=600, height=500,
+    title="Poverty by Census Tract, Los Angeles County",
 )
 ```

 The resulting map shows poverty counts by Census tract across Los Angeles County, with darker shades indicating higher counts.

+!!! example "Interactive preview — state-level choropleth with ACS population data"
+    The chart below shows a state-level choropleth to demonstrate what
+    `geometry=True` output looks like when plotted with Altair. Your actual
+    tract-level map will have much finer geographic detail.
+
+```vegalite
+{
+  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
+  "width": 600,
+  "height": 400,
+  "title": "ACS Total Population by State",
+  "data": {
+    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v2.7.0/data/us-10m.json",
+    "format": {"type": "topojson", "feature": "states"}
+  },
+  "transform": [
+    {
+      "lookup": "id",
+      "from": {
+        "data": {
+          "values": [
+            {"id": 1, "estimate": 5074296, "name": "Alabama"},
+            {"id": 2, "estimate": 733583, "name": "Alaska"},
+            {"id": 4, "estimate": 7359197, "name": "Arizona"},
+            {"id": 5, "estimate": 3045637, "name": "Arkansas"},
+            {"id": 6, "estimate": 39029342, "name": "California"},
+            {"id": 8, "estimate": 5839926, "name": "Colorado"},
+            {"id": 9, "estimate": 3626205, "name": "Connecticut"},
+            {"id": 10, "estimate": 1018396, "name": "Delaware"},
+            {"id": 11, "estimate": 671803, "name": "District of Columbia"},
+            {"id": 12, "estimate": 22244823, "name": "Florida"},
+            {"id": 13, "estimate": 10912876, "name": "Georgia"},
+            {"id": 15, "estimate": 1440196, "name": "Hawaii"},
+            {"id": 16, "estimate": 1939033, "name": "Idaho"},
+            {"id": 17, "estimate": 12582032, "name": "Illinois"},
+            {"id": 18, "estimate": 6833037, "name": "Indiana"},
+            {"id": 19, "estimate": 3200517, "name": "Iowa"},
+            {"id": 20, "estimate": 2937150, "name": "Kansas"},
+            {"id": 21, "estimate": 4512310, "name": "Kentucky"},
+            {"id": 22, "estimate": 4590241, "name": "Louisiana"},
+            {"id": 23, "estimate": 1385340, "name": "Maine"},
+            {"id": 24, "estimate": 6164660, "name": "Maryland"},
+            {"id": 25, "estimate": 6981974, "name": "Massachusetts"},
+            {"id": 26, "estimate": 10034113, "name": "Michigan"},
+            {"id": 27, "estimate": 5717184, "name": "Minnesota"},
+            {"id": 28, "estimate": 2940057, "name": "Mississippi"},
+            {"id": 29, "estimate": 6177957, "name": "Missouri"},
+            {"id": 30, "estimate": 1122867, "name": "Montana"},
+            {"id": 31, "estimate": 1967923, "name": "Nebraska"},
+            {"id": 32, "estimate": 3177772, "name": "Nevada"},
+            {"id": 33, "estimate": 1395231, "name": "New Hampshire"},
+            {"id": 34, "estimate": 9261699, "name": "New Jersey"},
+            {"id": 35, "estimate": 2113344, "name": "New Mexico"},
+            {"id": 36, "estimate": 19677151, "name": "New York"},
+            {"id": 37, "estimate": 10698973, "name": "North Carolina"},
+            {"id": 38, "estimate": 779261, "name": "North Dakota"},
+            {"id": 39, "estimate": 11756058, "name": "Ohio"},
+            {"id": 40, "estimate": 4019800, "name": "Oklahoma"},
+            {"id": 41, "estimate": 4240137, "name": "Oregon"},
+            {"id": 42, "estimate": 12972008, "name": "Pennsylvania"},
+            {"id": 44, "estimate": 1093734, "name": "Rhode Island"},
+            {"id": 45, "estimate": 5282634, "name": "South Carolina"},
+            {"id": 46, "estimate": 909824, "name": "South Dakota"},
+            {"id": 47, "estimate": 7051339, "name": "Tennessee"},
+            {"id": 48, "estimate": 30029572, "name": "Texas"},
+            {"id": 49, "estimate": 3380800, "name": "Utah"},
+            {"id": 50, "estimate": 647064, "name": "Vermont"},
+            {"id": 51, "estimate": 8642274, "name": "Virginia"},
+            {"id": 53, "estimate": 7785786, "name": "Washington"},
+            {"id": 54, "estimate": 1775156, "name": "West Virginia"},
+            {"id": 55, "estimate": 5892539, "name": "Wisconsin"},
+            {"id": 56, "estimate": 576851, "name": "Wyoming"}
+          ]
+        },
+        "key": "id",
+        "fields": ["estimate", "name"]
+      }
+    }
+  ],
+  "projection": {"type": "albersUsa"},
+  "mark": {"type": "geoshape", "stroke": "white", "strokeWidth": 0.5},
+  "encoding": {
+    "color": {
+      "field": "estimate",
+      "type": "quantitative",
+      "scale": {"scheme": "yelloworangered"},
+      "legend": {"title": "Population", "format": ","}
+    },
+    "tooltip": [
+      {"field": "name", "type": "nominal", "title": "State"},
+      {"field": "estimate", "type": "quantitative", "title": "Population", "format": ","}
+    ]
+  }
+}
+```
+
 !!! info "What is a TIGER/Line shapefile?"
     The Census Bureau publishes free geographic boundary files called
     TIGER/Line shapefiles. When you set `geometry=True`, PyPUMS automatically
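
The commit message notes that DC (id 11) had been missing from some choropleth datasets. A small check along the following lines could catch that kind of gap before a chart is published. It is a sketch only: `missing_state_ids` and the sample data are hypothetical and not part of PyPUMS. It relies on the fact that state FIPS codes run from 1 to 56 with five unused numbers.

```python
# FIPS state codes run from 1 to 56 with five unused numbers (3, 7, 14, 43, 52),
# which leaves 51 codes: the 50 states plus DC (11).
UNUSED_FIPS = {3, 7, 14, 43, 52}
EXPECTED_IDS = frozenset(range(1, 57)) - UNUSED_FIPS


def missing_state_ids(rows):
    """Return the sorted FIPS ids that are expected but absent from `rows`."""
    present = {row["id"] for row in rows}
    return sorted(EXPECTED_IDS - present)


# Hypothetical dataset that forgot DC, mirroring the gap fixed in this commit.
rows_without_dc = [{"id": fips} for fips in EXPECTED_IDS if fips != 11]
print(missing_state_ids(rows_without_dc))  # [11]
```

Running the same function over the `values` array of a preview spec should return an empty list when all 50 states and DC are present.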

docs/guides/acs-data.md

Lines changed: 7 additions & 2 deletions
@@ -127,7 +127,7 @@ The `output` parameter controls the shape of the returned DataFrame.
 === "Tidy (default)"

     Each row is one geography-variable combination. This is ideal for
-    plotting with libraries like Altair, Plotly, or seaborn.
+    plotting with [Altair](https://altair-viz.github.io/).

     ```python
     df_tidy = pypums.get_acs(
@@ -383,7 +383,12 @@ ca_counties_geo = pypums.get_acs(
 print(type(ca_counties_geo))
 # <class 'geopandas.geodataframe.GeoDataFrame'>

-ca_counties_geo.plot(column="estimate", legend=True)
+import altair as alt
+
+alt.Chart(ca_counties_geo).mark_geoshape(stroke="white", strokeWidth=0.5).encode(
+    color=alt.Color("estimate:Q", legend=alt.Legend(title="Population")),
+    tooltip=["NAME:N", alt.Tooltip("estimate:Q", format=",")],
+).project("albersUsa").properties(width=500, height=400)
 ```

 !!! note "Optional dependency"

docs/guides/decennial-data.md

Lines changed: 108 additions & 2 deletions
@@ -225,7 +225,108 @@ county_geo = pypums.get_decennial(
     year=2020,
     geometry=True,
 )
-county_geo.plot(column="value", legend=True)
+import altair as alt
+
+alt.Chart(county_geo).mark_geoshape(stroke="white", strokeWidth=0.5).encode(
+    color=alt.Color("value:Q", legend=alt.Legend(title="Population")),
+    tooltip=["NAME:N", alt.Tooltip("value:Q", format=",")],
+).project("albersUsa").properties(width=500, height=400)
+```
+
+!!! example "Interactive preview — state-level 2020 Census population"
+    The code above fetches county-level data for Illinois. The chart below
+    uses state-level data to demonstrate what `geometry=True` + Altair
+    looks like at a broader geographic scale.
+
+```vegalite
+{
+  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
+  "width": 500,
+  "height": 350,
+  "title": "2020 Census Population by State",
+  "data": {
+    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v2.7.0/data/us-10m.json",
+    "format": {"type": "topojson", "feature": "states"}
+  },
+  "transform": [
+    {
+      "lookup": "id",
+      "from": {
+        "data": {
+          "values": [
+            {"id": 1, "value": 5024279, "name": "Alabama"},
+            {"id": 2, "value": 733391, "name": "Alaska"},
+            {"id": 4, "value": 7151502, "name": "Arizona"},
+            {"id": 5, "value": 3011524, "name": "Arkansas"},
+            {"id": 6, "value": 39538223, "name": "California"},
+            {"id": 8, "value": 5773714, "name": "Colorado"},
+            {"id": 9, "value": 3605944, "name": "Connecticut"},
+            {"id": 10, "value": 989948, "name": "Delaware"},
+            {"id": 11, "value": 689545, "name": "District of Columbia"},
+            {"id": 12, "value": 21538187, "name": "Florida"},
+            {"id": 13, "value": 10711908, "name": "Georgia"},
+            {"id": 15, "value": 1455271, "name": "Hawaii"},
+            {"id": 16, "value": 1839106, "name": "Idaho"},
+            {"id": 17, "value": 12812508, "name": "Illinois"},
+            {"id": 18, "value": 6785528, "name": "Indiana"},
+            {"id": 19, "value": 3190369, "name": "Iowa"},
+            {"id": 20, "value": 2937880, "name": "Kansas"},
+            {"id": 21, "value": 4505836, "name": "Kentucky"},
+            {"id": 22, "value": 4657757, "name": "Louisiana"},
+            {"id": 23, "value": 1362359, "name": "Maine"},
+            {"id": 24, "value": 6177224, "name": "Maryland"},
+            {"id": 25, "value": 7029917, "name": "Massachusetts"},
+            {"id": 26, "value": 10077331, "name": "Michigan"},
+            {"id": 27, "value": 5706494, "name": "Minnesota"},
+            {"id": 28, "value": 2961279, "name": "Mississippi"},
+            {"id": 29, "value": 6154913, "name": "Missouri"},
+            {"id": 30, "value": 1084225, "name": "Montana"},
+            {"id": 31, "value": 1961504, "name": "Nebraska"},
+            {"id": 32, "value": 3104614, "name": "Nevada"},
+            {"id": 33, "value": 1377529, "name": "New Hampshire"},
+            {"id": 34, "value": 9288994, "name": "New Jersey"},
+            {"id": 35, "value": 2117522, "name": "New Mexico"},
+            {"id": 36, "value": 20201249, "name": "New York"},
+            {"id": 37, "value": 10439388, "name": "North Carolina"},
+            {"id": 38, "value": 779094, "name": "North Dakota"},
+            {"id": 39, "value": 11799448, "name": "Ohio"},
+            {"id": 40, "value": 3959353, "name": "Oklahoma"},
+            {"id": 41, "value": 4237256, "name": "Oregon"},
+            {"id": 42, "value": 13002700, "name": "Pennsylvania"},
+            {"id": 44, "value": 1097379, "name": "Rhode Island"},
+            {"id": 45, "value": 5118425, "name": "South Carolina"},
+            {"id": 46, "value": 886667, "name": "South Dakota"},
+            {"id": 47, "value": 6910840, "name": "Tennessee"},
+            {"id": 48, "value": 29145505, "name": "Texas"},
+            {"id": 49, "value": 3271616, "name": "Utah"},
+            {"id": 50, "value": 643077, "name": "Vermont"},
+            {"id": 51, "value": 8631393, "name": "Virginia"},
+            {"id": 53, "value": 7705281, "name": "Washington"},
+            {"id": 54, "value": 1793716, "name": "West Virginia"},
+            {"id": 55, "value": 5893718, "name": "Wisconsin"},
+            {"id": 56, "value": 576851, "name": "Wyoming"}
+          ]
+        },
+        "key": "id",
+        "fields": ["value", "name"]
+      }
+    }
+  ],
+  "projection": {"type": "albersUsa"},
+  "mark": {"type": "geoshape", "stroke": "white", "strokeWidth": 0.5},
+  "encoding": {
+    "color": {
+      "field": "value",
+      "type": "quantitative",
+      "scale": {"scheme": "blues"},
+      "legend": {"title": "Population", "format": ","}
+    },
+    "tooltip": [
+      {"field": "name", "type": "nominal", "title": "State"},
+      {"field": "value", "type": "quantitative", "title": "Population", "format": ","}
+    ]
+  }
+}
 ```

 !!! note "Optional dependency"
@@ -381,7 +482,12 @@ pop_map = pypums.get_decennial(
     year=2020,
     geometry=True,
 )
-pop_map.plot(column="value", legend=True, cmap="YlOrRd")
+import altair as alt
+
+alt.Chart(pop_map).mark_geoshape(stroke="white", strokeWidth=0.3).encode(
+    color=alt.Color("value:Q", scale=alt.Scale(scheme="yelloworangered")),
+    tooltip=["NAME:N", alt.Tooltip("value:Q", format=",")],
+).project("albersUsa").properties(width=500, height=400)
 ```

 ### Comparing 2010 and 2020
