Skip to content

Latest commit

 

History

History
166 lines (116 loc) · 12.9 KB

File metadata and controls

166 lines (116 loc) · 12.9 KB

TabAdjust — Experimental Setup & Results

This document summarizes the 12 experiments I ran to evaluate model-based forecast error adjustment for PV generation. We compare rule-based OCF adjustments against learned adjusters (TabPFN variants and XGBoost, plus AutoGluon where applicable). All metrics reported are MAE (lower is better).


1) Experimental Setup

The design structure follows two broad families of experiments:

  • Daily retraining (transductive) for Experiments 1–2, where both TabPFN and XGBoost are retrained each day on a rolling past-week window and evaluated on the current day. Features are TabPFN-TS derived, with and without classical lags.
  • Fixed splits (tabular) for Experiments 3–12, where XGB/AutoGluon are trained on a fixed historical split (varies per experiment) and evaluated on a fixed future window, while TabPFN(Reg) always uses a rolling 15-day lookback and evaluates on the current day within the same evaluation range. Some variants restrict to 04:30–21:00 local hours to focus on daylight.

Experiment grid (what changed each time)

Ex. No. Data Description Feature Space Train Period for XGB, AG Test Period for XGB, AG Train Period for TabPFN Test Period for TabPFN MAE PERIOD
1 Retrain TabPFN, XGB on full data for each date TabPFN-TS features, no lags Past Week Current Day Past Week Current Day Aug 2024 – May 2025
2 Retrain TabPFN, XGB on full data for each date TabPFN-TS features + basic time-series + 7-day lags (actual, forecast_err) Past Week Current Day Past Week Current Day Aug 2024 – May 2025
3 72% train / 28% test for XGB & AG; TabPFN uses rolling Basic time-series + 7-day lags, no TabPFN-TS Aug 2024 – Dec 2024 Jan – Feb 2025 Past 15 Days Current Day Jan – Feb 2025
4 Time-filtered (04:30–21:00); split same as Exp. 3 Exactly same as Exp. 3 Aug 2024 – Dec 2024 Jan – Feb 2025 Past 15 Days Current Day Jan – Feb 2025
5 75% train / 25% test for XGB & AG; TabPFN rolling Exactly same as Exp. 3 Aug 2024 – Jan 2025 Feb – Mar 2025 Past 15 Days Current Day Feb – Mar 2025
6 Time-filtered (04:30–21:00); split same as Exp. 5 Exactly same as Exp. 3 Aug 2024 – Jan 2025 Feb – Mar 2025 Past 15 Days Current Day Feb – Mar 2025
7 77% train / 23% test for XGB & AG; TabPFN rolling Exactly same as Exp. 3 Aug 2024 – Feb 2025 Mar – Apr 2025 Past 15 Days Current Day Mar – Apr 2025
8 Time-filtered (04:30–21:00); split same as Exp. 7 Exactly same as Exp. 3 Aug 2024 – Feb 2025 Mar – Apr 2025 Past 15 Days Current Day Mar – Apr 2025
9 80% train / 20% test for XGB & AG; TabPFN rolling Exactly same as Exp. 3 Aug 2024 – Mar 2025 Apr – May 2025 Past 15 Days Current Day Apr – May 2025
10 Time-filtered (04:30–21:00); split same as Exp. 9 Exactly same as Exp. 3 Aug 2024 – Mar 2025 Apr – May 2025 Past 15 Days Current Day Apr – May 2025
11 90% train / 10% test for XGB & AG; TabPFN rolling Exactly same as Exp. 3 Aug 2024 – Apr 2025 May 2025 Past 15 Days Current Day May 2025
12 Time-filtered (04:30–21:00); split same as Exp. 11 Exactly same as Exp. 3 Aug 2024 – Apr 2025 May 2025 Past 15 Days Current Day May 2025

Evaluation
For each experiment MAE is computed for Baseline (uncorrected forecast), OCF (rule-based adjustment), and one or more learned adjusters: XGB, XGB (Tuned), AutoGluon (30-min), and TabPFN(Reg) with 7d/15d/30d lookback variants when applicable. Here, XGB (Tuned) refers when using the following hyperparameters:

    # capacity / smoothness
    n_estimators=1000,
    learning_rate=0.03,
    max_depth=6,                # keep shallow to avoid overfit on tiny windows
    grow_policy="lossguide",
    max_leaves=256,             # more leaves -> finer predictions than depth alone
    min_child_weight=1,
    gamma=0.0,

    # regularization
    reg_lambda=1.0,
    reg_alpha=0.0,

    # randomness to generalize
    subsample=0.8,
    colsample_bytree=0.8,

    # histogram tree & finer bins → less quantization
    tree_method="hist",
    max_bin=512,                # ↑ increases unique preds (uses more RAM)

    # objective & metrics
    objective="reg:absoluteerror",  # robust to outliers (MAE)
    eval_metric=["mae"],

    # categorical & reproducibility
    enable_categorical=True,
    random_state=42,
    n_jobs=-1,

2) Results Summary (MAE)

MAE across the 12 experiments (numbers rounded as provided). “Lower is better.”

Ex. No. MAE PERIOD BASELINE OCF XGB XGB (Tuned) AG (30 mins) TABPFN (30d) TABPFN (7d) TABPFN (15d)
1 Aug 2024 – May 2025 6.84 6.61 8.65 6.65
2 Aug 2024 – May 2025 6.77 6.61 7.70 6.76
3 Jan – Feb 2025 5.172467 5.298359 5.926043 5.143177 6.536967 4.842079 5.136870 4.902161
4 Jan – Feb 2025 6.809789 6.975531 8.046942 6.645694 9.231337 6.371601 6.874030 6.425804
5 Feb – Mar 2025 6.950974 6.333376 8.478267 7.338660 9.774701 6.073727 6.496556 6.015113
6 Feb – Mar 2025 9.869959 8.993008 13.002496 9.501256 13.432441 8.624719 9.488458 8.611495
7 Mar – Apr 2025 9.250524 7.816344 12.069821 9.183803 12.729360 7.151459 7.650773 7.311987
8 Mar – Apr 2025 13.104448 11.072765 13.775450 11.126643 15.534543 10.128430 10.789141 10.479442
9 Apr – May 2025 9.975280 8.634488 10.672900 8.443876 11.090249 8.071589 8.668379 8.285999
10 Apr – May 2025 14.050214 12.161703 12.959799 11.910252 14.335486 11.290498 12.058067 11.684221
11 May 2025 9.940009 8.758474 8.872421 8.319686 9.102085 8.728788 9.478281 8.928846
12 May 2025 14.021048 12.354414 12.950353 11.650162 12.864719 12.208591 13.524115 12.526149

High-level takeaways

  • OCF beats Baseline consistently, confirming the value of simple, horizon×hour error climatology.
  • TabPFN(Reg) is often the best or strongly competitive—especially with the 30-day or 15-day lookback (Exps. 3–10). Gains over OCF of ~5–9% are typical in these windows.
  • XGB (Tuned) becomes competitive in late windows (Exps. 11–12), showing ~5–6% improvements over OCF there; earlier ranges sometimes underperform OCF (e.g., Exps. 1–2, 5, 7).
  • AutoGluon on raw 30-min period underperforms in these particular splits; it may benefit from more targeted feature/ensemble constraints and training for a longer duration.
  • Filtering to 04:30–21:00 (Exps. 4, 6, 8, 10, 12) slightly changes relative standings but largely preserves the trend: OCF < TabPFN(Reg) ≲ XGB(Tuned) depending on month.

3) Visual Summary (What the plots tell us)

A) Improvement % over OCF per Experiment (Best TabPFN vs Best XGB)

Improvement over OCF — % by Experiment

This bar chart compares, for each experiment, the best TabPFN variant and the best XGB variant against OCF in percent improvement. Bars above zero indicate an MAE reduction relative to OCF (good), below zero indicate underperformance.
Reading it, you’ll see TabPFN delivers positive improvements in many experiments, while XGB’s performance is more variable, occasionally trailing OCF in earlier ranges but improving later.

B) Best of TabPFN vs Best of XGB vs OCF vs Baseline (MAE per Experiment)

Best-of-Best MAE vs OCF vs Baseline

This line plot shows the absolute MAE for Baseline, OCF, Best TabPFN, and Best XGB across experiments. OCF sits below Baseline (as expected), and the ML adjusters often sit below OCF, with TabPFN frequently leading in mid-range experiments. The star marker highlights the winner per experiment. You can track where XGB (Tuned) catches up or wins in later windows.

C) Best Model per Experiment with Improvement over OCF

Best Model per Experiment (MAE + % over OCF)

Each bar represents the winning model’s MAE in that experiment; the label on top shows how much it improved over OCF (in %). Colors indicate which model won (OCF, TabPFN-30d, TabPFN-15d, or XGB-Tuned). This is a fast way to internalize who wins when and by how much.

Some more insights into Experiments

D) MAE by Time-of-Day (Jan–Feb 2025)

MAE by Time of Day — Jan–Feb 2025

This plot compares hourly MAE profiles for Baseline, OCF, XGB (Untuned), AutoGluon, and TabPFN. All methods show near-zero error at night, rising through the morning ramp, peaking around solar noon, then tapering. OCF consistently lowers daytime MAE relative to Baseline. TabPFN generally tracks or beats OCF across daylight hours, particularly around the mid-day peak where corrections matter most. AG runs higher in this setup, while XGB sits between OCF and TabPFN depending on hour.


E) % MAE Reduction by Horizon (Jan–Feb 2025, vs OCF)

MAE Reduction by Horizon — Jan–Feb 2025

Bars show percentage MAE change vs OCF for each horizon (30–480 minutes). Positive values mean the learner improves upon OCF. TabPFN achieves consistent positive reductions across most horizons, often +6% to +12%. XGB is negative vs OCF at many horizons in this configuration, while AG is notably lower (more negative), indicating it underperforms OCF with the defaults used here. This horizon view is useful for selecting which adjuster to trust at specific lead-times.


F) Average Actual vs Corrected Forecasts by Time-of-Day (Jan–Feb 2025)

Average Actual vs Corrected — Jan–Feb 2025

Each panel overlays the actual curve with the forecast and the adjusted forecast for a given method. OCF pulls the forecast closer to actual during daylight; TabPFN typically yields the closest alignment at mid-day and during the ramp-up/ramp-down periods, indicating it learns state-dependent biases better than a static climatology. XGB provides partial correction; AG remains looser in this experiment family.

Why Untuned XGB does not outperform the baseline, OCF as TabPFN does?

  1. Features are coarse (e.g., integer horizons, hourly time, few lags), lots of rows look identical to a tree -> same leaves -> same preds. TabPFN digests richer patterns from continuous inputs better.

  2. Underfitting from conservative settings Small n_estimators, shallow max_depth, large min_child_weight/gamma, or strong reg_lambda/reg_alpha reduce model flexibility -> fewer unique preds and higher MAE.

  3. We use XGBoost's default settings with n_estimators=50 -> Xgboost's hist algorithm bins continuous features -> additional discretization before the tree split search, increasing the chance of identical leaf assignments as discussed in 1.

  4. The corrected forecast from XGB shown below clearly tells that XGB fails to recognize patterns where actual_pv_generation_MW > 50. However a similar plot for TabPFN shows that it can model the distribution given an optimal setting and also benifits from the bayesian approach.

Similar plots can be found for each experiment (3,12) in the final_results directory.


4) Interpretation & Practical Guidance

  • If you need a robust default correction, OCF is easy to implement and already gives a consistent gain over Baseline.
  • If you can accommodate a lightweight per-horizon daily retrain, TabPFN(Reg) with a 15–30 day lookback is a strong choice across months, often beating OCF by 5–9% MAE.
  • XGB (Tuned) can be favorable in certain months (e.g., May 2025), suggesting value in season-aware retuning or regime-specific ensembling.
  • Restricting to daylight hours (04:30–21:00) does not reverse conclusions but can stabilize MAE and clarify relative differences. The jupyter notebooks further are worth looking since they also contain insights on the hourly, and horizon wise predictions.

5) Notes & Limitations

  • All the final runs on NVIDIA-RTX2080 keeping in mind the limitation of TabPFN on the number of samples (10000).
  • AutoGluon settings here are conservative; specialized presets or tabular feature pruning may close the gap.
  • For industrial use, using a hybrid: OCF as a safe fallback with a learned tabular adjuster selected per month/season based on recent validation MAE can yield the best results. However, it requires more experimentation on a production level compared this short project.