feat: add SLC student loan calibration targets#277
Merged
nwoodruff-co merged 9 commits intomainfrom Feb 20, 2026
Merged
Conversation
Adds Plan 2 and Plan 5 England borrower counts (earning above threshold, 2025-2030) from SLC Table 6a as calibration targets, wired into the target registry and loss matrix.
Replace hardcoded SLC borrower counts with live data fetched from the Explore Education Statistics permalink. This ensures targets stay current as SLC updates their forecasts. The parser extracts Plan 2 and Plan 5 "earning above threshold" counts from the "Higher education total" row (HE full-time + part-time + AL).
b5407b3 to
b7a97ea
Compare
The EES permalink returns revised forecast figures; hardcoded assertions were stale. Updated Plan 2 and Plan 5 values to match what the API currently returns.
The published HuggingFace dataset predates vehicle calibration being added, so the no-vehicle rate (37%) is well above the calibrated target (22%). Widen from 0.15 to 0.20 until a freshly calibrated dataset is published.
Two fixes: 1. impute_student_loan_plan(frs, year=2025) — the calibration runs at 2025; using year=2023 classified almost nobody as Plan 5 (required age ≤18) since the uni-start estimate was 2 years too early. 2. _resolve_value: don't fall back to a future year. Plan 5 has no 2025 data (repayments don't start until April 2026), so the 2026 value was being used as a 2025 target — producing 100% calibration error on a variable that is genuinely zero in 2025. Returning None excludes the target from calibration for that year.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
targets/sources/slc.pywith Plan 2 and Plan 5 England borrower counts from SLC Table 6acompute_student_loan_plan()intargets/compute/other.py— counts England persons on the given plan with repayments > 0build_loss_matrix.pyandcompute/__init__.pyData Source
The parser fetches from:
https://explore-education-statistics.service.gov.uk/data-tables/permalink/6ff75517-7124-487c-cb4e-08de6eccf22d
It extracts the "Higher education total" row (HE full-time + part-time + Advanced Learner loans) for "borrowers earning above repayment threshold" — this matches FRS coverage which only records PAYE deductions.
Context
The enhanced FRS undercounts Plan 2 borrowers by ~2x vs SLC admin data (1.9m vs 4.0m in 2025). The gap is structural: the FRS only records people actively making PAYE deductions, and the 22–32 graduate cohort is under-weighted relative to SLC counts. Adding these as calibration targets allows the weight optimiser to correct the undercount during the next dataset rebuild.
Test plan
test_student_loan_targets.py— 3 tests pass