Description
**Is your feature request related to a problem? Please describe.**
At the moment we use group averages to estimate the start year of missing powerplants.
```python
# Impute using country averages.
averages = (
    prepared_df.groupby(["country_id", "category", "technology", "status"])[
        "start_year"
    ]
    .transform("mean")
    .round()
)
```

This works well enough for brownfield modelling (and is used in other libraries, such as powerplantmatching).
However, it can produce odd histograms when a group has little data, since many plants collapse onto a single imputed year.
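To illustrate the problem, here is a minimal end-to-end sketch of the current approach (the toy `prepared_df` and its values are invented for this example; only the groupby/transform pattern comes from the snippet above):

```python
import pandas as pd

# Toy data: one group with two known start years and two missing ones.
prepared_df = pd.DataFrame({
    "country_id": ["DE"] * 4,
    "category": ["fossil"] * 4,
    "technology": ["coal"] * 4,
    "status": ["operating"] * 4,
    "start_year": [1990.0, 2010.0, None, None],
})

averages = (
    prepared_df.groupby(["country_id", "category", "technology", "status"])[
        "start_year"
    ]
    .transform("mean")
    .round()
)

# Fill only the missing start years with the group average.
prepared_df["start_year"] = prepared_df["start_year"].fillna(averages)
print(prepared_df["start_year"].tolist())
# → [1990.0, 2010.0, 2000.0, 2000.0]: both missing plants land on the same year,
# which shows up as a spike in the start-year histogram.
```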
**Describe the solution you'd like**
Although these are guesses / heuristics, we should strive for smoother trends to make them more realistic.
My current idea is to use a greedy algorithm that assigns start years to powerplants based on historical capacity statistics.
Logic:
- Retired powerplants are assigned to years before the `DATASET_YEAR` (the harmonised year in which the powerplant data was published).
- Operating powerplants are assigned to years that ensure they are still operational at the `DATASET_YEAR`.
- Planned powerplants are distributed around the weighted mean (or median?) of future projects.
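The status rules above could be encoded as per-status candidate year ranges. This is an illustrative sketch only: `DATASET_YEAR`, the 40-year lifetime, and the exact bounds are assumptions, not values from the issue.

```python
DATASET_YEAR = 2024  # hypothetical harmonised publication year (assumption)

def candidate_years(status: str, lifetime: int = 40) -> range:
    """Years in which a plant with an unknown start year could plausibly have started."""
    if status == "retired":
        # Must have started (and retired) before the dataset year.
        return range(DATASET_YEAR - lifetime, DATASET_YEAR)
    if status == "operating":
        # Must still be operational at the dataset year.
        return range(DATASET_YEAR - lifetime + 1, DATASET_YEAR + 1)
    if status == "planned":
        # Would be distributed around the weighted mean of known future projects;
        # a flat future window is used here for simplicity.
        return range(DATASET_YEAR + 1, DATASET_YEAR + lifetime)
    raise ValueError(f"unknown status: {status!r}")
```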
To distribute these powerplants smoothly across years, we could use greedy approaches.
- Let $N_t$ = national capacity of (country, category) in year $t$
- Let $C_t$ = capacity from plants with known start / end year in year $t$
- Let $\Delta N_t = N_t - N_{t-1}$ and $\Delta C_t = C_t - C_{t-1}$
- Headroom is $Err_t = \Delta N_t - \Delta C_t$
Sort plants with missing years (largest first), then pick the year with the largest headroom (or smallest residual
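The greedy step could look roughly like this. The national and known capacity series are toy numbers, and the headroom definition follows the bullets above ($Err_t = \Delta N_t - \Delta C_t$, recomputed as each plant is placed):

```python
# Toy series: N_t (national capacity) and C_t (capacity with known start year).
national = {2018: 100.0, 2019: 120.0, 2020: 150.0, 2021: 160.0}
known = {2018: 80.0, 2019: 90.0, 2020: 100.0, 2021: 105.0}
missing_plants = [30.0, 10.0, 5.0]  # capacities of plants with unknown start year

years = sorted(national)[1:]  # years for which a year-on-year delta is defined

def headroom(year: int, assigned: dict[int, float]) -> float:
    """Err_t = dN_t - dC_t, counting capacity already assigned to this year."""
    d_national = national[year] - national[year - 1]
    d_known = known[year] - known[year - 1] + assigned.get(year, 0.0)
    return d_national - d_known

assigned: dict[int, float] = {}
assignments = []
# Greedy: largest plant first, placed in the year with the most headroom.
for capacity in sorted(missing_plants, reverse=True):
    best_year = max(years, key=lambda y: headroom(y, assigned))
    assigned[best_year] = assigned.get(best_year, 0.0) + capacity
    assignments.append((capacity, best_year))

print(assignments)
# → [(30.0, 2020), (10.0, 2019), (5.0, 2021)]
```

Recomputing headroom after each placement is what makes the result spread across years instead of piling every plant into the single best year.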