Commit 999d696
Add sparse matrix builder for local area calibration - SNAP targets (#456)
* Add sparse matrix builder for local area calibration
Core components:
- sparse_matrix_builder.py: Database-driven approach for building calibration matrices
- calibration_utils.py: Shared utilities (cache clearing, constraints, geo helpers)
- matrix_tracer.py: Debugging utility for tracing through sparse matrices
- create_stratified_cps.py: Create stratified sample preserving high-income households
- test_sparse_matrix_builder.py: 6 verification tests for matrix correctness
Data pipeline changes:
- Add GEO_STACKING env var to cps.py and puf.py for geo-stacking data generation
- Add GEO_STACKING_MODE env var to extended_cps.py
- Add CPS_2024_Full, PUF_2023, ExtendedCPS_2023 classes
- Add policy_data.db download to prerequisites
- Add 'make data-geo' target for geo-stacking data pipeline
CI/CD:
- Add geo-stacking dataset build step to workflow
- Add sparse matrix builder test step after geo data generation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
* Add changelog entry and format code
* Refactor tests and fix enum encoding, minimize PR scope
- Move sparse matrix tests to tests/test_local_area_calibration/
- Split large test file into focused modules (column indexing, same-state, cross-state, geo masking)
- Fix small_enhanced_cps.py enum encoding (decode_to_str before astype)
- Fix create_stratified_cps.py to use local storage instead of HuggingFace
- Remove CPS_2024_Full to keep PR minimal
- Revert ExtendedCPS_2024 to use CPS_2024
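The enum-encoding fix above (decode to strings before `astype`) can be illustrated with plain NumPy. This is only a sketch of the failure mode, not the code in small_enhanced_cps.py: the `NAMES` lookup table and the integer `codes` array are hypothetical stand-ins for an enum-valued variable, and indexing into `NAMES` plays the role of the decode step.

```python
import numpy as np

# Hypothetical enum: integer codes index into a table of member names.
NAMES = np.array(["SINGLE", "JOINT", "SEPARATE"])
codes = np.array([0, 2, 1, 0])

# Casting the raw codes to bytes stores the indices, not the member names.
wrong = codes.astype("S")

# Decoding to strings first, then casting, stores the names themselves.
right = NAMES[codes].astype("S")

print(wrong.tolist())
print(right.tolist())
```

Casting the raw codes silently produces a valid but meaningless string column, which is presumably why the order of the two steps matters.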
* Rename GEO_STACKING to LOCAL_AREA_CALIBRATION and restore tracer functionality
- Rename GEO_STACKING to LOCAL_AREA_CALIBRATION in cps.py, puf.py, extended_cps.py
- Rename data-geo to data-local-area in Makefile and workflow
- Add create_target_groups function to calibration_utils.py
- Enhance MatrixTracer with get_group_rows method and variable_desc in row catalog
- Add TARGET GROUPS section to print_matrix_structure output
- Add local_area_calibration_setup.ipynb documentation notebook
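A minimal sketch of how an environment flag such as LOCAL_AREA_CALIBRATION might gate the data pipeline. The commit does not say which values the flag accepts or how cps.py, puf.py, and extended_cps.py read it, so the helper name and the set of truthy strings below are assumptions.

```python
import os


def local_area_calibration_enabled(environ=None):
    """Return True when the flag is set to a truthy string.

    The accepted values ("1", "true", "yes") are an assumption,
    not taken from the commit.
    """
    env = os.environ if environ is None else environ
    value = env.get("LOCAL_AREA_CALIBRATION", "")
    return value.strip().lower() in {"1", "true", "yes"}


# Passing a plain dict keeps the check easy to exercise without
# touching the real process environment.
print(local_area_calibration_enabled({"LOCAL_AREA_CALIBRATION": "true"}))
print(local_area_calibration_enabled({}))
```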
* Clear notebook outputs for Myst compatibility
* Pin mystmd>=1.7.0 to fix notebook rendering in docs
* Add logging for constraint evaluation failures and document CD GEOID format
- Replace silent exception catch with debug logging for constraint evaluation
- Add comment explaining CD GEOID format (SSCCC where SS=state FIPS)
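The change from a silent exception catch to debug logging can be sketched as follows. The function name, the operator table, and the logger name are hypothetical; the actual constraint evaluation in this commit is not shown on this page.

```python
import logging
import operator

logger = logging.getLogger("local_area_calibration")

# Hypothetical mapping from constraint operators to comparison functions.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def evaluate_constraint(values, op, threshold):
    """Return a list of booleans, or None if evaluation fails."""
    try:
        compare = _OPS[op]
        return [compare(v, threshold) for v in values]
    except Exception as exc:
        # Previously this kind of failure would have been swallowed;
        # logging at debug level keeps it visible without raising.
        logger.debug(
            "Constraint evaluation failed (%s %r): %s", op, threshold, exc
        )
        return None
```

Returning `None` keeps the caller's control flow unchanged, while running with `logging.basicConfig(level=logging.DEBUG)` surfaces which constraints were skipped and why.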
---------
Co-authored-by: Claude <[email protected]>
Co-authored-by: Max Ghenis <[email protected]>
1 parent accd1a1 commit 999d696
File tree
22 files changed, +2134 -7 lines changed
- .github/workflows
- docs
- policyengine_us_data
- datasets
- cps
- local_area_calibration
- puf
- storage
- tests/test_local_area_calibration