Description
**Is your feature request related to a problem? Please describe.**
At the moment we use group averages to estimate the start year of missing powerplants.
```python
# Impute using country averages.
averages = (
    prepared_df.groupby(["country_id", "category", "technology", "status"])[
        "start_year"
    ]
    .transform("mean")
    .round()
)
```

This works well enough for brownfield modelling (and is used in other libraries, such as powerplantmatching).
However, it can produce odd histograms when a group has little data, since many plants collapse onto a single imputed year.
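To illustrate the problem, here is a minimal end-to-end sketch of the current approach (the toy `prepared_df` and its values are invented for this example; only the groupby/transform pattern comes from the snippet above):

```python
import pandas as pd

# Toy data: one group with two known start years and two missing ones.
prepared_df = pd.DataFrame({
    "country_id": ["DE"] * 4,
    "category": ["fossil"] * 4,
    "technology": ["coal"] * 4,
    "status": ["operating"] * 4,
    "start_year": [1990.0, 2010.0, None, None],
})

averages = (
    prepared_df.groupby(["country_id", "category", "technology", "status"])[
        "start_year"
    ]
    .transform("mean")
    .round()
)

# Fill only the missing start years with the group average.
prepared_df["start_year"] = prepared_df["start_year"].fillna(averages)
print(prepared_df["start_year"].tolist())
# → [1990.0, 2010.0, 2000.0, 2000.0]: both missing plants land on the same year,
# which shows up as a spike in the start-year histogram.
```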
**Describe the solution you'd like**
Although these are guesses / heuristics, we should strive for smoother trends to make them more realistic.
My current idea is to use a greedy algorithm that assigns start years to powerplants based on historical capacity statistics.
Logic:
- Retired powerplants are assigned to years before the `DATASET_YEAR` (the harmonised year in which the powerplant data was published).
- Operating powerplants are assigned to years that ensure they are still operational at the `DATASET_YEAR`.
- Planned powerplants are distributed around the weighted mean (or median?) of future projects.
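The status rules above could be encoded as per-status candidate year ranges. This is an illustrative sketch only: `DATASET_YEAR`, the 40-year lifetime, and the exact bounds are assumptions, not values from the issue.

```python
DATASET_YEAR = 2024  # hypothetical harmonised publication year (assumption)

def candidate_years(status: str, lifetime: int = 40) -> range:
    """Years in which a plant with an unknown start year could plausibly have started."""
    if status == "retired":
        # Must have started (and retired) before the dataset year.
        return range(DATASET_YEAR - lifetime, DATASET_YEAR)
    if status == "operating":
        # Must still be operational at the dataset year.
        return range(DATASET_YEAR - lifetime + 1, DATASET_YEAR + 1)
    if status == "planned":
        # Would be distributed around the weighted mean of known future projects;
        # a flat future window is used here for simplicity.
        return range(DATASET_YEAR + 1, DATASET_YEAR + lifetime)
    raise ValueError(f"unknown status: {status!r}")
```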
To distribute these powerplants smoothly across years, we could use greedy approaches.
- Let $N_t$ = national capacity of (country, category) in year $t$
- Let $C_t$ = capacity from plants with known start / end year in year $t$
- Let $\Delta N_t = N_t - N_{t-1}$ and $\Delta C_t = C_t - C_{t-1}$
- Headroom is $Err_t = \Delta N_t - \Delta C_t$
Sort plants with missing years (largest first), then pick the year with the largest headroom (or smallest residual
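The greedy step could look roughly like this. The national and known capacity series are toy numbers, and the headroom definition follows the bullets above ($Err_t = \Delta N_t - \Delta C_t$, recomputed as each plant is placed):

```python
# Toy series: N_t (national capacity) and C_t (capacity with known start year).
national = {2018: 100.0, 2019: 120.0, 2020: 150.0, 2021: 160.0}
known = {2018: 80.0, 2019: 90.0, 2020: 100.0, 2021: 105.0}
missing_plants = [30.0, 10.0, 5.0]  # capacities of plants with unknown start year

years = sorted(national)[1:]  # years for which a year-on-year delta is defined

def headroom(year: int, assigned: dict[int, float]) -> float:
    """Err_t = dN_t - dC_t, counting capacity already assigned to this year."""
    d_national = national[year] - national[year - 1]
    d_known = known[year] - known[year - 1] + assigned.get(year, 0.0)
    return d_national - d_known

assigned: dict[int, float] = {}
assignments = []
# Greedy: largest plant first, placed in the year with the most headroom.
for capacity in sorted(missing_plants, reverse=True):
    best_year = max(years, key=lambda y: headroom(y, assigned))
    assigned[best_year] = assigned.get(best_year, 0.0) + capacity
    assignments.append((capacity, best_year))

print(assignments)
# → [(30.0, 2020), (10.0, 2019), (5.0, 2021)]
```

Recomputing headroom after each placement is what makes the result spread across years instead of piling every plant into the single best year.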