
Commit 8ebd063

Fix NEED calibration to use gross income for income band assignment (#288)
* Fix NEED calibration to use gross income for income band assignment

NEED 2023 income bands use Experian modelled gross household income, not net income. The previous code used hbai_household_net_income, which misallocated households across bands (especially at the extremes: too few in £100k+ bands). This switches to household_gross_income (LCFS P344p) for both the LCFS training calibration and the FRS 4D raking step.

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix LCFS column name: p344p is lowercase

Co-Authored-By: Claude <noreply@anthropic.com>

* Use gross income in calibration test to match raking

The test was evaluating NEED band fit using hbai_household_net_income while the raking now targets household_gross_income.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
1 parent 8e82888 commit 8ebd063


3 files changed: +16 -3 lines

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Use gross household income (LCFS P344p / FRS household_gross_income) instead of HBAI net income when assigning households to NEED 2023 income bands for energy consumption calibration. NEED uses Experian modelled gross income, so the previous use of net income misallocated households across bands.
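To make the misallocation concrete, here is a minimal sketch with made-up households. It assumes bands are half-open [lower, upper) intervals. Only the under-£15k and £15k-£20k edges appear in this diff, so the higher edges (including the £100k+ band that the commit message mentions) are illustrative assumptions and not the actual NEED band list. Net income is typically below gross income because it deducts taxes, so banding on net income pushes households down into lower bands.

```python
import numpy as np
import pandas as pd

# Illustrative edges: only 0 / 15k / 20k come from the diff; the rest are assumed.
EDGES = [0, 15_000, 20_000, 50_000, 100_000, np.inf]
LABELS = ["under_15k", "15k_20k", "20k_50k", "50k_100k", "100k_plus"]


def assign_band(income: pd.Series) -> pd.Series:
    """Map annual household income to a band label (lower edge inclusive)."""
    return pd.cut(income, bins=EDGES, labels=LABELS, right=False)


# Made-up households: net income is below gross income in every row.
households = pd.DataFrame({
    "household_gross_income": [12_000, 18_000, 60_000, 110_000, 140_000],
    "hbai_household_net_income": [11_500, 16_000, 45_000, 78_000, 98_000],
})

gross_bands = assign_band(households["household_gross_income"])
net_bands = assign_band(households["hbai_household_net_income"])
print(pd.DataFrame({"gross": gross_bands, "net": net_bands}))
print((gross_bands == "100k_plus").sum(), (net_bands == "100k_plus").sum())  # prints: 2 0
```

With net income, both high earners fall into the band below, which matches the under-population of the £100k+ bands described in the commit message.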

policyengine_uk_data/datasets/imputations/consumption.py

Lines changed: 14 additions & 2 deletions
@@ -13,6 +13,9 @@
 and demographics, matching the strong drivers in NEED admin data.
 - Imputed totals are calibrated to NEED 2023 mean kWh targets by income band,
 converted to spend using Ofgem Q4 2023 unit rates (Oct 2023 price cap).
+NEED income bands use Experian modelled gross household income, so calibration
+matches against gross income (LCFS P344p / FRS household_gross_income) rather
+than HBAI net income.
 """
 
 import pandas as pd
@@ -75,6 +78,7 @@
 "G019": "is_child",
 "Gorx": "region",
 "P389p": "hbai_household_net_income",
+"p344p": "household_gross_income",
 "weighta": "household_weight",
 }
 PERSON_LCF_RENAMES = {
@@ -146,6 +150,7 @@
 OFGEM_Q4_2023_GAS_RATE = 6.89 / 100 # £/kWh (Oct 2023 price cap)
 
 # NEED 2023 mean kWh by income band (Table 11b gas, Table 12b electricity)
+# Income bands are gross household income (Experian modelled data)
 NEED_INCOME_BANDS = [
 (0, 15_000, "under_15k", 7_755, 2_412), # gas kWh, elec kWh
 (15_000, 20_000, "15k_20k", 9_196, 2_700),
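Each `NEED_INCOME_BANDS` entry appears to be `(lower, upper, label, gas kWh, elec kWh)`, going by the inline comment. The sketch below shows one way to turn that tuple list into per-household targets. It uses only the two rows visible in the diff (the real list continues beyond them), and it assumes half-open [lower, upper) bands. `NEED_INCOME_BANDS_EXCERPT` and `need_targets` are hypothetical names, not part of the codebase.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt: only the first two rows of NEED_INCOME_BANDS are shown in the diff.
# Layout per the inline comment: (lower, upper, label, gas kWh, elec kWh).
NEED_INCOME_BANDS_EXCERPT = [
    (0, 15_000, "under_15k", 7_755, 2_412),
    (15_000, 20_000, "15k_20k", 9_196, 2_700),
]


def need_targets(income, bands=NEED_INCOME_BANDS_EXCERPT) -> pd.DataFrame:
    """Look up NEED mean gas/electricity kWh for each household's income band.

    Bands are treated as [lower, upper) (an assumption); incomes outside every
    band get a missing label and NaN targets.
    """
    income = np.asarray(income, dtype=float)
    label = np.full(income.shape, None, dtype=object)
    gas = np.full(income.shape, np.nan)
    elec = np.full(income.shape, np.nan)
    for lower, upper, name, gas_kwh, elec_kwh in bands:
        in_band = (income >= lower) & (income < upper)
        label[in_band] = name
        gas[in_band] = gas_kwh
        elec[in_band] = elec_kwh
    return pd.DataFrame({"band": label, "gas_kwh": gas, "elec_kwh": elec})


# Annual gross incomes; the third falls outside the two-row excerpt.
targets = need_targets([9_000, 17_500, 25_000])
print(targets)
```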
@@ -336,11 +341,14 @@ def _derive_energy_from_lcfs(household: pd.DataFrame) -> pd.DataFrame:
 
 
 def _calibrate_energy_to_need(
-household: pd.DataFrame, income_col: str = "hbai_household_net_income"
+household: pd.DataFrame, income_col: str = "household_gross_income"
 ) -> pd.DataFrame:
 """
 Rescale imputed electricity and gas spend to match NEED 2023 income-band means.
 
+NEED 2023 income bands use Experian modelled gross household income, so we
+match against gross income rather than HBAI net income.
+
 For each NEED income band, computes the ratio of the NEED-implied mean spend
 to the LCFS-derived mean spend and applies it multiplicatively. This preserves
 within-band distributional shape while anchoring the level to admin data.
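The docstring above describes a per-band multiplicative rescaling. Below is a minimal sketch of that idea, assuming a household-weighted mean within each band and a "NEED-implied mean spend" equal to the band's mean kWh times the unit rate. `calibrate_to_band_means` is a hypothetical stand-in, not the body of `_calibrate_energy_to_need`. The gas rate is the `OFGEM_Q4_2023_GAS_RATE` value from the diff, and the kWh targets are the two NEED gas figures shown there. The households are made up.

```python
import numpy as np


def calibrate_to_band_means(spend, income, weight, bands, unit_rate):
    """Rescale spend so each band's weighted mean equals target_kwh * unit_rate.

    bands: iterable of (lower, upper, target_mean_kwh), treated as [lower, upper).
    Households outside every band, or in bands with no weight, are left unchanged.
    """
    spend = np.asarray(spend, dtype=float).copy()  # do not mutate the caller's array
    income = np.asarray(income, dtype=float)
    weight = np.asarray(weight, dtype=float)
    for lower, upper, target_kwh in bands:
        in_band = (income >= lower) & (income < upper)
        band_weight = weight[in_band]
        if band_weight.sum() == 0:
            continue  # no households in this band
        current_mean = np.average(spend[in_band], weights=band_weight)
        if current_mean <= 0:
            continue  # cannot rescale a zero or negative mean
        # One ratio per band: relative differences within the band are preserved.
        spend[in_band] *= target_kwh * unit_rate / current_mean
    return spend


GAS_RATE = 6.89 / 100  # £/kWh, the OFGEM_Q4_2023_GAS_RATE value in the diff
income = np.array([10_000, 12_000, 16_000, 19_000])  # made-up gross incomes
weight = np.array([1.0, 3.0, 2.0, 2.0])
gas_spend = np.array([400.0, 500.0, 600.0, 700.0])  # made-up imputed £/year

calibrated = calibrate_to_band_means(
    gas_spend, income, weight,
    bands=[(0, 15_000, 7_755), (15_000, 20_000, 9_196)],
    unit_rate=GAS_RATE,
)
print(calibrated)
```

Because each band gets a single factor, the ratio between two households in the same band is unchanged. Switching `income` from net to gross changes which households share a factor, and that is the whole effect of this commit on the 1D step.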
@@ -471,6 +479,7 @@ def generate_lcfs_table(lcfs_person: pd.DataFrame, lcfs_household: pd.DataFrame)
 # Annualise weekly LCFS values (× 52)
 annualise = list(CONSUMPTION_VARIABLE_RENAMES.values()) + [
 "hbai_household_net_income",
+"household_gross_income",
 "electricity_consumption",
 "gas_consumption",
 ]
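The new `household_gross_income` column joins the annualise list because LCFS values are weekly while the NEED band edges are annual. Below is a minimal sketch of that step, with made-up weekly values. `annualise` here is a hypothetical helper, not the list variable of the same name in the diff.

```python
import pandas as pd

WEEKS_PER_YEAR = 52  # the diff annualises weekly LCFS values with x 52


def annualise(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df with the given weekly columns converted to annual values."""
    out = df.copy()
    out[columns] = out[columns] * WEEKS_PER_YEAR
    return out


# Made-up weekly LCFS-style values.
weekly = pd.DataFrame({
    "household_gross_income": [300.0, 1_000.0, 2_500.0],
    "gas_consumption": [10.0, 15.0, 20.0],
})
annual = annualise(weekly, ["household_gross_income", "gas_consumption"])

# Compared against annual band edges, unannualised weekly incomes all look "under £15k".
print((weekly["household_gross_income"] < 15_000).all())  # prints: True
print(annual["household_gross_income"].tolist())  # prints: [15600.0, 52000.0, 130000.0]
```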
@@ -516,6 +525,7 @@ def uprate_lcfs_table(household: pd.DataFrame, time_period: str) -> pd.DataFrame
 # Uprate income predictor so training distribution matches FRS target year
 for col in [
 "hbai_household_net_income",
+"household_gross_income",
 "employment_income",
 "self_employment_income",
 "private_pension_income",
@@ -584,7 +594,9 @@ def impute_consumption(dataset: UKSingleYearDataset) -> UKSingleYearDataset:
 # This is a 4-dimensional raking (vs the 1D income-band calibration on LCFS
 # training data in _calibrate_energy_to_need) because the FRS has the full
 # set of housing/demographic variables needed for multi-margin calibration.
-income = input_df["hbai_household_net_income"].values
+# NEED income bands use Experian modelled gross income, so we use
+# household_gross_income rather than hbai_household_net_income.
+income = sim.calculate("household_gross_income", map_to="household").values
 tenure = sim.calculate("tenure_type", map_to="household").values
 accomm = sim.calculate("accommodation_type", map_to="household").values
 region = sim.calculate("region", map_to="household").values
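The comment above describes a 4-dimensional raking over income band, tenure, accommodation type and region. The diff does not show the raking loop itself, so the following is only one plausible form, in the spirit of iterative proportional fitting. It assumes that the imputed kWh values (not the household weights) are rescaled, and that each margin has a target weighted mean per category. `rake_values` and all data below are hypothetical. The example uses two margins for brevity. Adding accommodation type and region would mean adding entries to `margins` and `targets`.

```python
import numpy as np


def rake_values(values, weight, margins, targets, max_iter=100, tol=1e-8):
    """Iteratively rescale values so weighted category means match each margin's targets.

    margins: margin name -> per-household category array (e.g. income band, tenure).
    targets: margin name -> {category: target weighted mean}.
    """
    values = np.asarray(values, dtype=float).copy()
    weight = np.asarray(weight, dtype=float)
    for _ in range(max_iter):
        max_change = 0.0
        for name, categories in margins.items():
            for category, target in targets[name].items():
                mask = categories == category
                if weight[mask].sum() == 0:
                    continue
                current = np.average(values[mask], weights=weight[mask])
                if current <= 0:
                    continue
                factor = target / current
                values[mask] *= factor  # multiplicative adjustment for this category
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # every margin already matched on this pass
            break
    return values


def weighted_means(x, categories, weight):
    """Weighted mean of x within each category."""
    return {
        c: np.average(x[categories == c], weights=weight[categories == c])
        for c in np.unique(categories)
    }


rng = np.random.default_rng(0)
n = 200
band = rng.choice(np.array(["low", "mid", "high"]), size=n)  # e.g. gross-income band
tenure = rng.choice(np.array(["owner", "renter"]), size=n)
weight = rng.uniform(0.5, 2.0, size=n)
imputed = rng.uniform(5_000, 15_000, size=n)  # made-up imputed kWh

# Build targets from a hypothetical "true" kWh vector so the margins are mutually consistent.
true_kwh = imputed * np.where(band == "high", 1.3, 1.0) * np.where(tenure == "renter", 0.8, 1.0)
margins = {"income_band": band, "tenure": tenure}
targets = {name: weighted_means(true_kwh, cats, weight) for name, cats in margins.items()}

raked = rake_values(imputed, weight, margins, targets)
```

Raking cycles through the margins, so it can satisfy several one-way targets at once where a single band-level ratio could not. If the targets were mutually inconsistent, the loop would stop at `max_iter` without matching every margin exactly.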

policyengine_uk_data/tests/test_energy_calibration.py

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ def arrays(imputed):
 sim = Microsimulation(dataset=imputed)
 return dict(
 income=sim.calculate(
-"hbai_household_net_income", map_to="household", period=2023
+"household_gross_income", map_to="household", period=2023
 ).values,
 tenure=sim.calculate("tenure_type", map_to="household", period=2023).values,
 accomm=sim.calculate(
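The test fix matters because a band-fit check has to bin households with the same income measure that the raking targeted. Otherwise a perfectly calibrated dataset looks miscalibrated. The sketch below illustrates this with made-up data and a hypothetical `band_fit_errors` helper (not the repository's test code). It assumes half-open bands and uses the two NEED gas targets shown in the diff.

```python
import numpy as np


def band_fit_errors(kwh, income, weight, bands):
    """Relative error of weighted mean kWh against the target in each (lower, upper, target) band."""
    kwh, income, weight = (np.asarray(a, dtype=float) for a in (kwh, income, weight))
    errors = {}
    for lower, upper, target in bands:
        in_band = (income >= lower) & (income < upper)
        if weight[in_band].sum() == 0:
            continue
        mean = np.average(kwh[in_band], weights=weight[in_band])
        errors[(lower, upper)] = abs(mean / target - 1)
    return errors


bands = [(0, 15_000, 7_755), (15_000, 20_000, 9_196)]
gross = np.array([12_000, 14_000, 16_000, 19_000])  # income used by the raking
net = np.array([11_000, 13_000, 14_500, 17_000])  # hypothetical net incomes
weight = np.ones(4)
# kWh already raked so that the gross-income bands hit their targets exactly.
kwh = np.array([7_755, 7_755, 9_196, 9_196], dtype=float)

gross_err = band_fit_errors(kwh, gross, weight, bands)
net_err = band_fit_errors(kwh, net, weight, bands)
print(max(gross_err.values()), max(net_err.values()))
```

Evaluated on gross income, both bands fit exactly. On net income, the third household drops into the under-£15k band and that band's mean misses its target by about 6%.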
