TabAdjust — Experimental Setup & Results

This document summarizes the 12 experiments I ran to evaluate model-based forecast error adjustment for PV generation. We compare rule-based OCF adjustments against learned adjusters (TabPFN variants and XGBoost, plus AutoGluon where applicable). All metrics reported are MAE (lower is better).

1) Experimental Setup

The design structure follows two broad families of experiments:

Daily retraining (transductive) for Experiments 1–2, where both TabPFN and XGBoost are retrained each day on a rolling past-week window and evaluated on the current day. Features are TabPFN-TS derived, with and without classical lags.
Fixed splits (tabular) for Experiments 3–12, where XGB/AutoGluon are trained on a fixed historical split (varies per experiment) and evaluated on a fixed future window, while TabPFN(Reg) always uses a rolling 15-day lookback and evaluates on the current day within the same evaluation range. Some variants restrict to 04:30–21:00 local hours to focus on daylight.

Experiment grid (what changed each time)

Ex. No.	Data Description	Feature Space	Train Period for XGB, AG	Test Period for XGB, AG	Train Period for TabPFN	Test Period for TabPFN	MAE PERIOD
1	Retrain TabPFN, XGB on full data for each date	TabPFN-TS features, no lags	Past Week	Current Day	Past Week	Current Day	Aug 2024 – May 2025
2	Retrain TabPFN, XGB on full data for each date	TabPFN-TS features + basic time-series + 7-day lags (actual, forecast_err)	Past Week	Current Day	Past Week	Current Day	Aug 2024 – May 2025
3	72% train / 28% test for XGB & AG; TabPFN uses rolling	Basic time-series + 7-day lags, no TabPFN-TS	Aug 2024 – Dec 2024	Jan – Feb 2025	Past 15 Days	Current Day	Jan – Feb 2025
4	Time-filtered (04:30–21:00); split same as Exp. 3	Exactly same as Exp. 3	Aug 2024 – Dec 2024	Jan – Feb 2025	Past 15 Days	Current Day	Jan – Feb 2025
5	75% train / 25% test for XGB & AG; TabPFN rolling	Exactly same as Exp. 3	Aug 2024 – Jan 2025	Feb – Mar 2025	Past 15 Days	Current Day	Feb – Mar 2025
6	Time-filtered (04:30–21:00); split same as Exp. 5	Exactly same as Exp. 3	Aug 2024 – Jan 2025	Feb – Mar 2025	Past 15 Days	Current Day	Feb – Mar 2025
7	77% train / 23% test for XGB & AG; TabPFN rolling	Exactly same as Exp. 3	Aug 2024 – Feb 2025	Mar – Apr 2025	Past 15 Days	Current Day	Mar – Apr 2025
8	Time-filtered (04:30–21:00); split same as Exp. 7	Exactly same as Exp. 3	Aug 2024 – Feb 2025	Mar – Apr 2025	Past 15 Days	Current Day	Mar – Apr 2025
9	80% train / 20% test for XGB & AG; TabPFN rolling	Exactly same as Exp. 3	Aug 2024 – Mar 2025	Apr – May 2025	Past 15 Days	Current Day	Apr – May 2025
10	Time-filtered (04:30–21:00); split same as Exp. 9	Exactly same as Exp. 3	Aug 2024 – Mar 2025	Apr – May 2025	Past 15 Days	Current Day	Apr – May 2025
11	90% train / 10% test for XGB & AG; TabPFN rolling	Exactly same as Exp. 3	Aug 2024 – Apr 2025	May 2025	Past 15 Days	Current Day	May 2025
12	Time-filtered (04:30–21:00); split same as Exp. 11	Exactly same as Exp. 3	Aug 2024 – Apr 2025	May 2025	Past 15 Days	Current Day	May 2025

Evaluation
For each experiment MAE is computed for Baseline (uncorrected forecast), OCF (rule-based adjustment), and one or more learned adjusters: XGB, XGB (Tuned), AutoGluon (30-min), and TabPFN(Reg) with 7d/15d/30d lookback variants when applicable. Here, XGB (Tuned) refers when using the following hyperparameters:

    # capacity / smoothness
    n_estimators=1000,
    learning_rate=0.03,
    max_depth=6,                # keep shallow to avoid overfit on tiny windows
    grow_policy="lossguide",
    max_leaves=256,             # more leaves -> finer predictions than depth alone
    min_child_weight=1,
    gamma=0.0,

    # regularization
    reg_lambda=1.0,
    reg_alpha=0.0,

    # randomness to generalize
    subsample=0.8,
    colsample_bytree=0.8,

    # histogram tree & finer bins → less quantization
    tree_method="hist",
    max_bin=512,                # ↑ increases unique preds (uses more RAM)

    # objective & metrics
    objective="reg:absoluteerror",  # robust to outliers (MAE)
    eval_metric=["mae"],

    # categorical & reproducibility
    enable_categorical=True,
    random_state=42,
    n_jobs=-1,

2) Results Summary (MAE)

MAE across the 12 experiments (numbers rounded as provided). “Lower is better.”

Ex. No.	MAE PERIOD	BASELINE	OCF	XGB	XGB (Tuned)	AG (30 mins)	TABPFN (30d)	TABPFN (7d)	TABPFN (15d)
1	Aug 2024 – May 2025	6.84	6.61	8.65	–	–	–	6.65	–
2	Aug 2024 – May 2025	6.77	6.61	7.70	–	–	–	6.76	–
3	Jan – Feb 2025	5.172467	5.298359	5.926043	5.143177	6.536967	4.842079	5.136870	4.902161
4	Jan – Feb 2025	6.809789	6.975531	8.046942	6.645694	9.231337	6.371601	6.874030	6.425804
5	Feb – Mar 2025	6.950974	6.333376	8.478267	7.338660	9.774701	6.073727	6.496556	6.015113
6	Feb – Mar 2025	9.869959	8.993008	13.002496	9.501256	13.432441	8.624719	9.488458	8.611495
7	Mar – Apr 2025	9.250524	7.816344	12.069821	9.183803	12.729360	7.151459	7.650773	7.311987
8	Mar – Apr 2025	13.104448	11.072765	13.775450	11.126643	15.534543	10.128430	10.789141	10.479442
9	Apr – May 2025	9.975280	8.634488	10.672900	8.443876	11.090249	8.071589	8.668379	8.285999
10	Apr – May 2025	14.050214	12.161703	12.959799	11.910252	14.335486	11.290498	12.058067	11.684221
11	May 2025	9.940009	8.758474	8.872421	8.319686	9.102085	8.728788	9.478281	8.928846
12	May 2025	14.021048	12.354414	12.950353	11.650162	12.864719	12.208591	13.524115	12.526149

High-level takeaways

OCF beats Baseline consistently, confirming the value of simple, horizon×hour error climatology.
TabPFN(Reg) is often the best or strongly competitive—especially with the 30-day or 15-day lookback (Exps. 3–10). Gains over OCF of ~5–9% are typical in these windows.
XGB (Tuned) becomes competitive in late windows (Exps. 11–12), showing ~5–6% improvements over OCF there; earlier ranges sometimes underperform OCF (e.g., Exps. 1–2, 5, 7).
AutoGluon on raw 30-min period underperforms in these particular splits; it may benefit from more targeted feature/ensemble constraints and training for a longer duration.
Filtering to 04:30–21:00 (Exps. 4, 6, 8, 10, 12) slightly changes relative standings but largely preserves the trend: OCF < TabPFN(Reg) ≲ XGB(Tuned) depending on month.

3) Visual Summary (What the plots tell us)

A) Improvement % over OCF per Experiment (Best TabPFN vs Best XGB)

This bar chart compares, for each experiment, the best TabPFN variant and the best XGB variant against OCF in percent improvement. Bars above zero indicate an MAE reduction relative to OCF (good), below zero indicate underperformance.
Reading it, you’ll see TabPFN delivers positive improvements in many experiments, while XGB’s performance is more variable, occasionally trailing OCF in earlier ranges but improving later.

B) Best of TabPFN vs Best of XGB vs OCF vs Baseline (MAE per Experiment)

This line plot shows the absolute MAE for Baseline, OCF, Best TabPFN, and Best XGB across experiments. OCF sits below Baseline (as expected), and the ML adjusters often sit below OCF, with TabPFN frequently leading in mid-range experiments. The star marker highlights the winner per experiment. You can track where XGB (Tuned) catches up or wins in later windows.

C) Best Model per Experiment with Improvement over OCF

Each bar represents the winning model’s MAE in that experiment; the label on top shows how much it improved over OCF (in %). Colors indicate which model won (OCF, TabPFN-30d, TabPFN-15d, or XGB-Tuned). This is a fast way to internalize who wins when and by how much.

Some more insights into Experiments

D) MAE by Time-of-Day (Jan–Feb 2025)

This plot compares hourly MAE profiles for Baseline, OCF, XGB (Untuned), AutoGluon, and TabPFN. All methods show near-zero error at night, rising through the morning ramp, peaking around solar noon, then tapering. OCF consistently lowers daytime MAE relative to Baseline. TabPFN generally tracks or beats OCF across daylight hours, particularly around the mid-day peak where corrections matter most. AG runs higher in this setup, while XGB sits between OCF and TabPFN depending on hour.

E) % MAE Reduction by Horizon (Jan–Feb 2025, vs OCF)

Bars show percentage MAE change vs OCF for each horizon (30–480 minutes). Positive values mean the learner improves upon OCF. TabPFN achieves consistent positive reductions across most horizons, often +6% to +12%. XGB is negative vs OCF at many horizons in this configuration, while AG is notably lower (more negative), indicating it underperforms OCF with the defaults used here. This horizon view is useful for selecting which adjuster to trust at specific lead-times.

F) Average Actual vs Corrected Forecasts by Time-of-Day (Jan–Feb 2025)

Each panel overlays the actual curve with the forecast and the adjusted forecast for a given method. OCF pulls the forecast closer to actual during daylight; TabPFN typically yields the closest alignment at mid-day and during the ramp-up/ramp-down periods, indicating it learns state-dependent biases better than a static climatology. XGB provides partial correction; AG remains looser in this experiment family.

Why Untuned XGB does not outperform the baseline, OCF as TabPFN does?

Features are coarse (e.g., integer horizons, hourly time, few lags), lots of rows look identical to a tree -> same leaves -> same preds. TabPFN digests richer patterns from continuous inputs better.
Underfitting from conservative settings Small n_estimators, shallow max_depth, large min_child_weight/gamma, or strong reg_lambda/reg_alpha reduce model flexibility -> fewer unique preds and higher MAE.
We use XGBoost's default settings with n_estimators=50 -> Xgboost's hist algorithm bins continuous features -> additional discretization before the tree split search, increasing the chance of identical leaf assignments as discussed in 1.
The corrected forecast from XGB shown below clearly tells that XGB fails to recognize patterns where actual_pv_generation_MW > 50. However a similar plot for TabPFN shows that it can model the distribution given an optimal setting and also benifits from the bayesian approach.

Similar plots can be found for each experiment (3,12) in the `final_results` directory.

4) Interpretation & Practical Guidance

If you need a robust default correction, OCF is easy to implement and already gives a consistent gain over Baseline.
If you can accommodate a lightweight per-horizon daily retrain, TabPFN(Reg) with a 15–30 day lookback is a strong choice across months, often beating OCF by 5–9% MAE.
XGB (Tuned) can be favorable in certain months (e.g., May 2025), suggesting value in season-aware retuning or regime-specific ensembling.
Restricting to daylight hours (04:30–21:00) does not reverse conclusions but can stabilize MAE and clarify relative differences. The jupyter notebooks further are worth looking since they also contain insights on the hourly, and horizon wise predictions.

5) Notes & Limitations

All the final runs on NVIDIA-RTX2080 keeping in mind the limitation of TabPFN on the number of samples (10000).
AutoGluon settings here are conservative; specialized presets or tabular feature pruning may close the gap.
For industrial use, using a hybrid: OCF as a safe fallback with a learned tabular adjuster selected per month/season based on recent validation MAE can yield the best results. However, it requires more experimentation on a production level compared this short project.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TabAdjust — Experimental Setup & Results

1) Experimental Setup

Experiment grid (what changed each time)

2) Results Summary (MAE)

3) Visual Summary (What the plots tell us)

A) Improvement % over OCF per Experiment (Best TabPFN vs Best XGB)

B) Best of TabPFN vs Best of XGB vs OCF vs Baseline (MAE per Experiment)

C) Best Model per Experiment with Improvement over OCF

Some more insights into Experiments

D) MAE by Time-of-Day (Jan–Feb 2025)

E) % MAE Reduction by Horizon (Jan–Feb 2025, vs OCF)

F) Average Actual vs Corrected Forecasts by Time-of-Day (Jan–Feb 2025)

Why Untuned XGB does not outperform the baseline, OCF as TabPFN does?

Similar plots can be found for each experiment (3,12) in the `final_results` directory.

4) Interpretation & Practical Guidance

5) Notes & Limitations

FilesExpand file tree

analysis.md

Latest commit

History

analysis.md

File metadata and controls

TabAdjust — Experimental Setup & Results

1) Experimental Setup

Experiment grid (what changed each time)

2) Results Summary (MAE)

3) Visual Summary (What the plots tell us)

A) Improvement % over OCF per Experiment (Best TabPFN vs Best XGB)

B) Best of TabPFN vs Best of XGB vs OCF vs Baseline (MAE per Experiment)

C) Best Model per Experiment with Improvement over OCF

Some more insights into Experiments

D) MAE by Time-of-Day (Jan–Feb 2025)

E) % MAE Reduction by Horizon (Jan–Feb 2025, vs OCF)

F) Average Actual vs Corrected Forecasts by Time-of-Day (Jan–Feb 2025)

Why Untuned XGB does not outperform the baseline, OCF as TabPFN does?

Similar plots can be found for each experiment (3,12) in the final_results directory.

4) Interpretation & Practical Guidance

5) Notes & Limitations

Similar plots can be found for each experiment (3,12) in the `final_results` directory.