Commit 999d696

baogorek, claude, and MaxGhenis authored
Add sparse matrix builder for local area calibration - SNAP targets (#456)
* Add sparse matrix builder for local area calibration

  Core components:
  - sparse_matrix_builder.py: Database-driven approach for building calibration matrices
  - calibration_utils.py: Shared utilities (cache clearing, constraints, geo helpers)
  - matrix_tracer.py: Debugging utility for tracing through sparse matrices
  - create_stratified_cps.py: Create stratified sample preserving high-income households
  - test_sparse_matrix_builder.py: 6 verification tests for matrix correctness

  Data pipeline changes:
  - Add GEO_STACKING env var to cps.py and puf.py for geo-stacking data generation
  - Add GEO_STACKING_MODE env var to extended_cps.py
  - Add CPS_2024_Full, PUF_2023, ExtendedCPS_2023 classes
  - Add policy_data.db download to prerequisites
  - Add 'make data-geo' target for geo-stacking data pipeline

  CI/CD:
  - Add geo-stacking dataset build step to workflow
  - Add sparse matrix builder test step after geo data generation

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* Add changelog entry and format code

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* Refactor tests and fix enum encoding, minimize PR scope
  - Move sparse matrix tests to tests/test_local_area_calibration/
  - Split large test file into focused modules (column indexing, same-state, cross-state, geo masking)
  - Fix small_enhanced_cps.py enum encoding (decode_to_str before astype)
  - Fix create_stratified_cps.py to use local storage instead of HuggingFace
  - Remove CPS_2024_Full to keep PR minimal
  - Revert ExtendedCPS_2024 to use CPS_2024

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* Rename GEO_STACKING to LOCAL_AREA_CALIBRATION and restore tracer functionality
  - Rename GEO_STACKING to LOCAL_AREA_CALIBRATION in cps.py, puf.py, extended_cps.py
  - Rename data-geo to data-local-area in Makefile and workflow
  - Add create_target_groups function to calibration_utils.py
  - Enhance MatrixTracer with get_group_rows method and variable_desc in row catalog
  - Add TARGET GROUPS section to print_matrix_structure output
  - Add local_area_calibration_setup.ipynb documentation notebook

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* Clear notebook outputs for Myst compatibility

* Pin mystmd>=1.7.0 to fix notebook rendering in docs

* Add logging for constraint evaluation failures and document CD GEOID format
  - Replace silent exception catch with debug logging for constraint evaluation
  - Add comment explaining CD GEOID format (SSCCC where SS=state FIPS)

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Max Ghenis <[email protected]>
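
The commit message describes a sparse calibration matrix with geographic masking, but sparse_matrix_builder.py itself is not shown on this page. As a rough illustration only, the idea might be sketched as below: rows are calibration targets, columns are household records, and only non-zero entries are stored. The function names, the state-based mask, the dictionary layout, and the sample data are all hypothetical; the real builder is described as database-driven and may differ substantially.

```python
def build_sparse_entries(households, targets):
    """Return {(row, col): value}, storing only non-zero matrix entries.

    Rows index calibration targets; columns index household records.
    (Hypothetical layout, not the repository's actual data model.)
    """
    entries = {}
    for row, target in enumerate(targets):
        for col, household in enumerate(households):
            # Geographic mask: a household contributes only to targets
            # in its own state (an assumption made for this sketch).
            if household["state"] != target["state"]:
                continue
            value = household.get(target["variable"], 0.0)
            if value != 0.0:
                entries[(row, col)] = value
    return entries


def weighted_totals(entries, weights, n_rows):
    """Multiply the sparse matrix by a weight vector, one total per target."""
    totals = [0.0] * n_rows
    for (row, col), value in entries.items():
        totals[row] += value * weights[col]
    return totals


# Made-up example data: three households, two state-level SNAP targets.
households = [
    {"state": "CA", "snap": 1200.0},
    {"state": "CA", "snap": 0.0},
    {"state": "NY", "snap": 800.0},
]
targets = [
    {"state": "CA", "variable": "snap"},
    {"state": "NY", "variable": "snap"},
]
entries = build_sparse_entries(households, targets)
totals = weighted_totals(entries, [2.0, 3.0, 1.5], len(targets))
```

Calibration would then adjust the weights so that each weighted total approaches its target value; that step is outside this sketch.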
1 parent accd1a1 commit 999d696

22 files changed: +2134 -7 lines changed

.github/workflows/reusable_test.yaml

Lines changed: 12 additions & 0 deletions
@@ -75,6 +75,18 @@ jobs:
           TEST_LITE: ${{ !inputs.upload_data }}
           PYTHON_LOG_LEVEL: INFO
 
+      - name: Build datasets for local area calibration
+        if: inputs.full_suite
+        run: |
+          LOCAL_AREA_CALIBRATION=true python policyengine_us_data/datasets/cps/cps.py
+          LOCAL_AREA_CALIBRATION=true python policyengine_us_data/datasets/puf/puf.py
+          LOCAL_AREA_CALIBRATION=true python policyengine_us_data/datasets/cps/extended_cps.py
+          python policyengine_us_data/datasets/cps/local_area_calibration/create_stratified_cps.py 10500
+
+      - name: Run local area calibration tests
+        if: inputs.full_suite
+        run: pytest policyengine_us_data/tests/test_local_area_calibration/ -v
+
       - name: Save calibration log
         if: inputs.full_suite
         uses: actions/upload-artifact@v4
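
The workflow step above sets `LOCAL_AREA_CALIBRATION=true` before running each dataset script. How cps.py, puf.py, and extended_cps.py read the variable is not shown in this diff; a minimal sketch of one common approach, assuming a hypothetical helper `env_flag` and an assumed set of accepted truthy strings, could look like this:

```python
import os


def env_flag(name, environ=None):
    """Return True if the named environment variable holds a truthy string.

    The accepted values ("1", "true", "yes") are an assumption for this
    sketch; the diff only shows the value "true" being used.
    """
    environ = os.environ if environ is None else environ
    return environ.get(name, "").strip().lower() in {"1", "true", "yes"}


# Hypothetical usage inside a dataset script:
if env_flag("LOCAL_AREA_CALIBRATION"):
    print("Building datasets for local area calibration")
```

Passing the environment as an optional argument keeps the helper easy to test without mutating `os.environ`.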

Makefile

Lines changed: 6 additions & 0 deletions
@@ -74,6 +74,12 @@ data:
 	mv policyengine_us_data/storage/enhanced_cps_2024.h5 policyengine_us_data/storage/dense_enhanced_cps_2024.h5
 	cp policyengine_us_data/storage/sparse_enhanced_cps_2024.h5 policyengine_us_data/storage/enhanced_cps_2024.h5
 
+data-local-area: data
+	LOCAL_AREA_CALIBRATION=true python policyengine_us_data/datasets/cps/cps.py
+	LOCAL_AREA_CALIBRATION=true python policyengine_us_data/datasets/puf/puf.py
+	LOCAL_AREA_CALIBRATION=true python policyengine_us_data/datasets/cps/extended_cps.py
+	python policyengine_us_data/datasets/cps/local_area_calibration/create_stratified_cps.py 10500
+
 clean:
 	rm -f policyengine_us_data/storage/*.h5
 	rm -f policyengine_us_data/storage/*.db
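
The `data-local-area` target above ends by running create_stratified_cps.py with the argument `10500`, and the commit message says that script creates a stratified sample preserving high-income households. The script is not shown here, so the following is only a sketch of the general technique: keep every record above an income threshold and uniformly subsample the rest, scaling weights so the sampled stratum still represents its original count. Treating `10500` as a target sample size is a guess, and the function name, threshold, field names, and reweighting rule are all hypothetical.

```python
import random


def stratified_sample(households, target_size, income_threshold, seed=0):
    """Keep all high-income households; subsample and reweight the rest."""
    high = [h for h in households if h["income"] >= income_threshold]
    rest = [h for h in households if h["income"] < income_threshold]

    # Fill the remaining slots from the lower-income stratum.
    n_rest = max(0, min(len(rest), target_size - len(high)))
    sampled = random.Random(seed).sample(rest, n_rest)

    # Scale sampled weights so the stratum's total weight is preserved.
    scale = len(rest) / n_rest if n_rest else 0.0
    result = [dict(h) for h in high]
    result += [{**h, "weight": h["weight"] * scale} for h in sampled]
    return result


# Made-up data: 100 households with unit weights, 5 at or above the threshold.
households = [{"income": i * 1000, "weight": 1.0} for i in range(100)]
sample = stratified_sample(households, target_size=25, income_threshold=95_000)
```

With equal starting weights this keeps the total weight unchanged; with unequal weights a weight-aware scale factor would be needed instead.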

changelog_entry.yaml

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+- bump: minor
+  changes:
+    added:
+      - Sparse matrix builder for local area calibration with database-driven constraints
+      - Local area calibration data pipeline (make data-local-area)
+      - ExtendedCPS_2023 and PUF_2023 dataset classes
+      - Stratified CPS sampling to preserve high-income households
+      - Matrix verification tests for local area calibration
