@@ -15,6 +15,31 @@ on our benchmarks.
 
 ![Meta-test benchmark results](./figures/meta-test_benchmark_results.png)
 
+## When (not) to use pytabkit
+
+- **To get the best possible results**:
+  - In general, we recommend AutoGluon for the best possible results,
+    though it does not include all the models from pytabkit.
+    It will probably include RealMLP in the upcoming 1.4 version.
+  - To get the best possible results from `pytabkit`,
+    we recommend using
+    `Ensemble_HPO_Classifier(n_cv=8, use_full_caruana_ensembling=True, use_tabarena_spaces=True, n_hpo_steps=50)`
+    with a `val_metric_name` corresponding to your target metric
+    (e.g., `class_error`, `cross_entropy`, `brier`, `1-auc_ovr`), or the corresponding `Regressor`.
+    (This can take a very long time to fit.)
+  - For a single model only, we recommend
+    `RealMLP_HPO_Classifier(n_cv=8, hpo_space_name='tabarena', use_caruana_ensembling=True, n_hyperopt_steps=50)`,
+    again with `val_metric_name` as above, or the corresponding `Regressor`.
+- **Models**: [TabArena](https://github.com/AutoGluon/tabrepo)
+  also includes some newer models such as RealMLP and TabM,
+  with more general preprocessing (missing numerical values, text, etc.),
+  as well as very good boosted-tree implementations.
+  `pytabkit` is currently still easier to use
+  and supports vectorized cross-validation for RealMLP,
+  which can significantly speed up training.
+- **Benchmarking**: While pytabkit can be useful for quick benchmarking during development,
+  we recommend [TabArena](https://github.com/AutoGluon/tabrepo) for method evaluation.
+
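The `use_full_caruana_ensembling=True` option above refers to Caruana-style greedy weighted ensembling. As a rough illustration of the idea (this is a self-contained sketch, not pytabkit's actual implementation; the function names and the squared-error loss are our own choices), greedy selection repeatedly adds the model, with replacement, that most reduces the validation loss of the running average, and the final weights are the relative selection counts:

```python
# Greedy ensemble selection (Caruana-style), sketched in pure Python.
# NOT pytabkit's implementation: an illustrative toy version.

def val_loss(pred, y):
    # mean squared error of predicted probabilities against 0/1 labels
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def greedy_ensemble_weights(model_preds, y, n_steps=10):
    """model_preds: dict of model name -> validation predictions (class-1 probs)."""
    counts = {name: 0 for name in model_preds}
    ensemble = [0.0] * len(y)
    for step in range(1, n_steps + 1):
        # try adding each model (with replacement) and keep the one
        # that minimizes the validation loss of the running average
        best_name, best_loss = None, float("inf")
        for name, pred in model_preds.items():
            cand = [(e * (step - 1) + p) / step for e, p in zip(ensemble, pred)]
            loss = val_loss(cand, y)
            if loss < best_loss:
                best_name, best_loss = name, loss
        counts[best_name] += 1
        best = model_preds[best_name]
        ensemble = [(e * (step - 1) + p) / step for e, p in zip(ensemble, best)]
    # final weights are the relative selection counts
    return {name: c / n_steps for name, c in counts.items()}

# toy validation set: model "b" tracks the labels well, "a" does not
y = [0, 1, 1, 0]
preds = {"a": [0.9, 0.2, 0.4, 0.6], "b": [0.1, 0.9, 0.8, 0.2]}
weights = greedy_ensemble_weights(preds, y, n_steps=10)
```

On this toy data the well-calibrated model receives all the weight; with more comparable models, the weights spread out.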
 ## Installation (new in 1.4.0: optional model dependencies)
 
 ```bash
@@ -171,6 +196,20 @@ and https://docs.ray.io/en/latest/cluster/vms/user-guides/community/slurm.html
 
 ## Releases (see git tags)
 
+- v1.5.0:
+  - added an `n_repeats` parameter to the scikit-learn interfaces for repeated cross-validation
+  - the HPO sklearn interfaces (the ones using random search)
+    can now do weighted ensembling instead by setting `use_caruana_ensembling=True`.
+    Removed `RealMLP_Ensemble_Classifier` and `RealMLP_Ensemble_Regressor` from v1.4.2,
+    since this feature makes them redundant.
+  - renamed the `space` parameter of the GBDT HPO interfaces
+    to `hpo_space_name`, so it now also works with non-TPE versions.
+  - added new [TabArena](https://tabarena.ai) search spaces for boosted trees (not TPE),
+    which should be almost equivalent to the ones from TabArena
+    except for the early-stopping logic.
+  - TabM now supports `val_metric_name` for early stopping on different metrics.
+  - fixed issues #20 and #21 regarding HPO
+  - small updates for the ["Rethinking Early Stopping" paper](https://arxiv.org/abs/2501.19195)
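The new `n_repeats` parameter performs repeated cross-validation: conceptually, repeated k-fold reshuffles the data and runs k-fold again, so each sample lands in a validation fold once per repeat. A standalone sketch of the split generation (our own illustrative helper, not pytabkit's internal code):

```python
import random

def repeated_kfold_indices(n_samples, n_cv, n_repeats, seed=0):
    """Yield (train_idx, val_idx) pairs for n_repeats rounds of n_cv-fold CV."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # fresh shuffle for each repeat
        for fold in range(n_cv):
            val = idx[fold::n_cv]  # every n_cv-th sample, starting at `fold`
            train = [i for i in idx if i not in set(val)]
            yield train, val

# 2 repeats x 3 folds = 6 (train, val) splits over 12 samples
splits = list(repeated_kfold_indices(n_samples=12, n_cv=3, n_repeats=2))
```

Averaging scores over all `n_repeats * n_cv` splits reduces the variance of the validation estimate compared to a single k-fold run.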
 - v1.4.2:
   - fixed handling of custom `val_metric_name` in HPO models and `Ensemble_TD_Regressor`.
   - if `tmp_folder` is specified in HPO models,
@@ -246,7 +285,7 @@ and https://docs.ray.io/en/latest/cluster/vms/user-guides/community/slurm.html
   Add time limit for RealMLP,
   add support for `lightning` (but also still allowing `pytorch-lightning`),
   making skorch a lazy import, removed msgpack_numpy dependency.
-- v1.0.0: Release for the NeurIPS version and arXiv v2.
+- v1.0.0: Release for the NeurIPS version and arXiv v2+v3.
   - More baselines (MLP-PLR, FT-Transformer, TabR-HPO, RF-HPO),
     also some unpolished internal interfaces for other methods,
     esp. the ones in AutoGluon.