
Commit 630470d (1 parent: 8a03aeb)

v1.4.2: updates to HPO and ensembling

File tree

9 files changed: +263 −74 lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -1,12 +1,16 @@
 *.pyc
 *.pdf
 *.zip
+*.ckpt

 experiments/*/
+experiments/trace.json
 !experiments/meta_hpo
 !experiments/prototypes
 public_export
 dist
+files
+plots

 docs/build
 docs/source/modules.rst

README.md

Lines changed: 14 additions & 3 deletions
@@ -80,12 +80,13 @@ Our ML models are available in up to three variants, all with best-epoch selecti

 - library defaults (D)
 - our tuned defaults (TD)
-- random search hyperparameter optimization (HPO), sometimes also tree parzen estimator (HPO-TPE)
+- random search hyperparameter optimization (HPO),
+  sometimes also tree parzen estimator (HPO-TPE) or weighted ensembling (Ensemble)

 We provide the following ML models:

-- **RealMLP** (TD, HPO): Our new neural net models with tuned defaults (TD)
-  or random search hyperparameter optimization (HPO)
+- **RealMLP** (TD, HPO, Ensemble): Our new neural net models with tuned defaults (TD),
+  random search hyperparameter optimization (HPO), or ensembling
 - **XGB**, **LGBM**, **CatBoost** (D, TD, HPO, HPO-TPE): Interfaces for gradient-boosted
   tree libraries XGBoost, LightGBM, CatBoost
 - **MLP**, **ResNet**, **FTT** (D, HPO): Models
@@ -170,6 +171,16 @@ and https://docs.ray.io/en/latest/cluster/vms/user-guides/community/slurm.html

 ## Releases (see git tags)

+- v1.4.2:
+  - fixed handling of custom `val_metric_name` in HPO models and `Ensemble_TD_Regressor`.
+  - if `tmp_folder` is specified in HPO models,
+    save each model to disk immediately instead of holding all of them in memory.
+    This can considerably reduce RAM/VRAM usage.
+    In this case, pickled HPO models will still rely on the models stored in the `tmp_folder`.
+  - We now provide `RealMLP_Ensemble_Classifier` and `RealMLP_Ensemble_Regressor`,
+    which use weighted ensembling and usually perform better than HPO
+    (but have slower inference time). We recommend using the new `hpo_space_name='tabarena'`
+    for best results.
 - v1.4.1:
   - moved dill to optional dependencies
   - updated TabM code to a newer version:
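
To illustrate the interface described in the v1.4.2 notes, here is a minimal usage sketch (not part of this commit; it assumes `RealMLP_Ensemble_Classifier` is importable alongside the existing HPO interfaces and accepts `hpo_space_name` and `tmp_folder` keyword arguments as described above):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# assumed import location, analogous to RealMLP_HPO_Classifier
from pytabkit.models.sklearn.sklearn_interfaces import RealMLP_Ensemble_Classifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# tmp_folder stores the fitted sub-models on disk instead of keeping them in RAM;
# it must not be deleted while the (possibly pickled) estimator is still in use.
clf = RealMLP_Ensemble_Classifier(hpo_space_name='tabarena', tmp_folder='tmp/realmlp_ensemble')
clf.fit(X_train, y_train)
print(clf.predict(X_test)[:5])
```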

docs/source/models/01_sklearn_interfaces.rst

Lines changed: 14 additions & 5 deletions
@@ -9,17 +9,24 @@ and categorical features in the ``fit`` method:

 .. autofunction:: pytabkit.models.sklearn.sklearn_base.AlgInterfaceEstimator.fit

+Important: For HPO and ensemble interfaces, it is recommended to set ``tmp_folder``
+to allow these methods to store fitted models on disk instead of holding them in RAM.
+This means that ``tmp_folder`` should not be deleted while the associated interface
+still exists (even when it is pickled).

 RealMLP
 -------

-For RealMLP, we provide TD (tuned default)
-and HPO (hyperparameter optimization with random search) variants:
+For RealMLP, we provide TD (tuned default),
+HPO (hyperparameter optimization with random search),
+and Ensemble (weighted ensembling of random search configurations) variants:

 - RealMLP_TD_Classifier
 - RealMLP_TD_Regressor
 - RealMLP_HPO_Classifier
 - RealMLP_HPO_Regressor
+- RealMLP_Ensemble_Classifier
+- RealMLP_Ensemble_Regressor

 While the TD variants have good defaults,
 they provide the option to override any hyperparameters.
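
For example, overriding individual hyperparameters of a TD model might look like the following sketch (not from the docs; parameter names such as `n_epochs` and `p_drop` are assumptions taken from the HPO search-space code elsewhere in this commit):

```python
from sklearn.datasets import load_iris

from pytabkit.models.sklearn.sklearn_interfaces import RealMLP_TD_Classifier

X, y = load_iris(return_X_y=True)

# start from the tuned defaults, but override a few hyperparameters
clf = RealMLP_TD_Classifier(n_epochs=512, p_drop=0.2)
clf.fit(X, y)
print(clf.predict(X[:5]))
```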
@@ -32,7 +39,7 @@ and ``verbosity`` may be ignored by some of the methods.

 .. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.RealMLP_TD_Classifier.__init__

-For the HPO variants, we currently only provide few options:
+For the HPO and Ensemble variants, we currently only provide a few options:

 .. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.RealMLP_HPO_Classifier.__init__

@@ -74,8 +81,8 @@ with our scikit-learn interfaces,
 although in this case the validation sets are not used.
 The respective classes are called
 ``RF_SKL_Classifier`` and ``MLP_SKL_Classifier`` etc.
-We also provide our ``Ensemble_TD_Classifier``,
-a weighted ensemble of our TD models (and similar for regression).
+We also provide our ``Ensemble_TD_Classifier`` and ``Ensemble_HPO_Classifier``,
+weighted ensembles of our TD / HPO models (and similar for regression).

 ..
     test
@@ -97,6 +104,8 @@ can be saved using pickle-like modules.
 With standard pickling,
 a model trained on a GPU will be restored to use the same GPU,
 and fail to load if the GPU is not present.
+(Note that dill fails to save torch models in newer torch versions,
+while pickle can still save them.)

 The following code allows to load GPU-trained models to the CPU,
 but fails to run predict() due to pytorch-lightning device issues.
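
As a reference point for the pickling discussion, a minimal round-trip could look as follows (a sketch only; it uses a CPU-trained model to avoid the GPU-restore issue mentioned above, and the `device` argument is an assumption):

```python
import pickle

from sklearn.datasets import load_iris
from pytabkit.models.sklearn.sklearn_interfaces import RealMLP_TD_Classifier

X, y = load_iris(return_X_y=True)

clf = RealMLP_TD_Classifier(device='cpu')  # 'device' argument assumed; CPU-trained models restore anywhere
clf.fit(X, y)

with open('realmlp_td.pkl', 'wb') as f:
    pickle.dump(clf, f)

with open('realmlp_td.pkl', 'rb') as f:
    clf_restored = pickle.load(f)

print(clf_restored.predict(X)[:5])
```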

pytabkit/__about__.py

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
 #
 # SPDX-License-Identifier: Apache-2.0

-__version__ = "1.4.1"
+__version__ = "1.4.2"

pytabkit/bench/run/results.py

Lines changed: 8 additions & 0 deletions
@@ -75,6 +75,14 @@ def load(path: Path, load_other: bool = True, load_preds: bool = True):
         rm.metrics_dict = utils.deserialize(path / 'metrics.yaml', use_yaml=True)
         if load_other:
             rm.other_dict = utils.deserialize(path / 'other.msgpack.gz', use_msgpack=True, compressed=True)
+            for mode in ['cv', 'refit']:
+                if mode in rm.other_dict and 'y_preds' in rm.other_dict[mode]:
+                    # other_dict was created by old code and still contains y_preds
+                    if mode == 'cv':
+                        rm.y_preds_cv = rm.other_dict[mode]['y_preds']
+                    else:
+                        rm.y_preds_refit = rm.other_dict[mode]['y_preds']
+
         if load_preds:
             if utils.existsFile(path / 'y_preds_cv.npz'):
                 rm.y_preds_cv = np.load(path / 'y_preds_cv.npz')['y_preds']

pytabkit/models/alg_interfaces/ensemble_interfaces.py

Lines changed: 55 additions & 38 deletions
@@ -1,16 +1,17 @@
+import copy
 from pathlib import Path
-from typing import List, Optional, Dict, Any, Union
+from typing import List, Optional, Dict

 import numpy as np
 import torch

-from pytabkit.models import utils
 from pytabkit.models.alg_interfaces.alg_interfaces import SingleSplitAlgInterface, AlgInterface
 from pytabkit.models.alg_interfaces.base import SplitIdxs, InterfaceResources, RequiredResources
 from pytabkit.models.data.data import DictDataset, TaskType
 from pytabkit.models.torch_utils import cat_if_necessary
 from pytabkit.models.training.logging import Logger
 from pytabkit.models.training.metrics import Metrics
+from pytabkit.models.utils import ObjectLoadingContext


 class WeightedPrediction:
@@ -28,25 +29,6 @@ def predict_for_weights(self, weights: np.ndarray):
         return weighted_sum


-class ObjectLoadingContext:
-    def __init__(self, obj: Any, filename: Optional[Union[str, Path]] = None):
-        self.obj = obj
-        self.filename = filename
-        self.saved = False
-
-    def __enter__(self) -> Any:
-        # use pickle since it works better with torch than dill
-        if self.saved:
-            self.obj = utils.deserialize(self.filename, use_pickle=True)
-        return self.obj
-
-    def __exit__(self, type, value, traceback) -> None:
-        if self.filename is not None:
-            utils.serialize(self.filename, self.obj, use_pickle=True)
-            self.saved = True
-            del self.obj
-
-
 class CaruanaEnsembleAlgInterface(SingleSplitAlgInterface):
     """
     Following a simple variant of Caruana et al. (2004), "Ensemble selection from libraries of models"
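
The `ObjectLoadingContext` helper removed here was moved to `pytabkit.models.utils` (see the import change above). A rough usage sketch of the pattern, with the behaviour inferred from the removed code rather than from separate documentation: on exit the wrapped object is pickled to `filename` and released from memory, and the next enter reloads it from disk.

```python
from pathlib import Path

from pytabkit.models.utils import ObjectLoadingContext

# wrap some object that should not stay in RAM between uses
ctx = ObjectLoadingContext({'weights': [1, 2, 3]}, filename=Path('model_0.pkl'))

with ctx as obj:            # first enter: returns the in-memory object
    obj['weights'].append(4)
# on exit: the object is pickled to model_0.pkl and dropped from memory

with ctx as obj:            # later enters: reloaded from model_0.pkl
    print(obj['weights'])   # [1, 2, 3, 4]
```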
@@ -65,10 +47,15 @@ def get_refit_interface(self, n_refit: int, fit_params: Optional[List[Dict]] = N

     def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_resources: InterfaceResources,
             logger: Logger, tmp_folders: List[Optional[Path]], name: str) -> None:
+        assert len(idxs_list) == 1
+
+        # if tmp_folders is specified, then models will be saved there instead of holding all of them in memory
         tmp_folder = tmp_folders[0]
         self.alg_contexts_ = [ObjectLoadingContext(ai, None if tmp_folder is None else tmp_folder / f'model_{i}') for
                               i, ai in enumerate(self.alg_interfaces)]
-        self.alg_interfaces = None  # allow not holding all of them later, to free GPU memory
+        # store copies here, but the ones that will actually be trained are in alg_contexts_
+        # this means the fitted models do not have to be held in RAM all the time
+        self.alg_interfaces = copy.deepcopy(self.alg_interfaces)

         sub_fit_params = []

@@ -94,7 +81,7 @@ def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_resources:
         if val_metric_name is None:
             val_metric_name = Metrics.default_val_metric_name(task_type=self.task_type)

-        n_caruana_steps = self.config.get('n_caruana_steps', 40)  # default value is taken from TaskRepo paper (IIRC)
+        n_caruana_steps = self.config.get('n_caruana_steps', 40)  # default value is taken from TabRepo paper (IIRC)

         y_preds_oob_list = []
         for alg_idx, alg_ctx in enumerate(self.alg_contexts_):
@@ -114,6 +101,8 @@ def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_resources:

         wp = WeightedPrediction(y_preds_oob_list, self.task_type)

+        allow_negative_weights = self.config.get('allow_negative_weights', False)
+
         for step_idx in range(n_caruana_steps):
             best_step_weights = None
             best_step_loss = np.inf
@@ -129,6 +118,21 @@ def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_resources:

                 weights[weight_idx] -= 1

+                # negative weights option:
+                # check sum(weights) >= 2, allowing for floating-point errors
+                if allow_negative_weights and np.sum(weights) >= 1.5:
+                    weights[weight_idx] -= 1
+
+                    y_pred_oob = wp.predict_for_weights(weights)
+                    loss = Metrics.apply(y_pred_oob, y_oob, val_metric_name).item()
+                    # print(f'{weights=}, {loss=}')
+                    if loss < best_step_loss:
+                        best_step_loss = loss
+                        best_step_weights = np.copy(weights)
+
+                    weights[weight_idx] += 1
+
             if best_step_loss < best_loss:
                 best_loss = best_step_loss
                 best_weights = np.copy(best_step_weights)
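
The loop above is a variant of greedy ensemble selection (Caruana et al. 2004). For readers unfamiliar with the method, here is a self-contained numpy sketch of the basic non-negative version, using a squared-error loss in place of the library's `val_metric_name` machinery (illustrative only, not the pytabkit implementation):

```python
import numpy as np

def greedy_ensemble_weights(y_preds_oob: list, y_oob: np.ndarray, n_steps: int = 40) -> np.ndarray:
    """Greedy Caruana-style selection: repeatedly add (with replacement) the model
    whose inclusion most improves the validation loss of the averaged prediction."""
    preds = np.stack(y_preds_oob, axis=0)           # shape (n_models, n_samples)
    weights = np.zeros(len(y_preds_oob), dtype=np.int64)
    best_weights, best_loss = np.copy(weights), np.inf

    for _ in range(n_steps):
        step_best_loss, step_best_idx = np.inf, 0
        for i in range(len(y_preds_oob)):
            weights[i] += 1
            y_pred = (weights @ preds) / weights.sum()   # weighted average prediction
            loss = np.mean((y_pred - y_oob) ** 2)        # squared-error loss as a stand-in
            if loss < step_best_loss:
                step_best_loss, step_best_idx = loss, i
            weights[i] -= 1
        weights[step_best_idx] += 1                      # greedily add the most helpful model
        if step_best_loss < best_loss:
            best_loss, best_weights = step_best_loss, np.copy(weights)

    return best_weights / best_weights.sum()

# toy usage: three "models" with different noise levels predicting the same target
rng = np.random.default_rng(0)
y = rng.normal(size=100)
preds = [y + rng.normal(scale=s, size=100) for s in (0.1, 0.5, 1.0)]
print(greedy_ensemble_weights(preds, y))  # most weight goes to the most accurate model
```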
@@ -179,13 +183,22 @@ def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_resources:
             logger: Logger, tmp_folders: List[Optional[Path]], name: str) -> None:
         assert len(idxs_list) == 1

+        # if tmp_folders is specified, then models will be saved there instead of holding all of them in memory
+        tmp_folder = tmp_folders[0]
+        self.alg_contexts_ = [ObjectLoadingContext(ai, None if tmp_folder is None else tmp_folder / f'model_{i}') for
+                              i, ai in enumerate(self.alg_interfaces)]
+        # store copies here, but the ones that will actually be trained are in alg_contexts_
+        # this means the fitted models do not have to be held in RAM all the time
+        self.alg_interfaces = copy.deepcopy(self.alg_interfaces)
+
         if self.fit_params is not None:
             # this is the refit stage, there is no validation data set to determine the best model on,
             # instead the best model index is already in fit_params
             best_alg_idx = self.fit_params[0]['best_alg_idx']
             sub_tmp_folders = [tmp_folder / str(best_alg_idx) if tmp_folder is not None else None for tmp_folder in
                                tmp_folders]
-            self.alg_interfaces[best_alg_idx].fit(ds, idxs_list, interface_resources, logger, sub_tmp_folders,
+            with self.alg_contexts_[best_alg_idx] as alg_interface:
+                alg_interface.fit(ds, idxs_list, interface_resources, logger, sub_tmp_folders,
                                   name + f'sub-alg-{best_alg_idx}')

             return
@@ -206,28 +219,32 @@ def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_resources:

         best_alg_idx = 0
         best_alg_loss = np.inf
+        best_sub_fit_params = None

-        for alg_idx, alg_interface in enumerate(self.alg_interfaces):
-            sub_tmp_folders = [tmp_folder / str(alg_idx) if tmp_folder is not None else None for tmp_folder in
-                               tmp_folders]
-            alg_interface.fit(ds, idxs_list, interface_resources, logger, sub_tmp_folders, name + f'sub-alg-{alg_idx}')
-            y_preds = alg_interface.predict(ds)
-            # get out-of-bag predictions
-            y_pred_oob = cat_if_necessary([y_preds[j, idxs_list[0].val_idxs[j]]
-                                           for j in range(idxs_list[0].val_idxs.shape[0])], dim=0)
-            loss = Metrics.apply(y_pred_oob, y_oob, val_metric_name).item()
-            if loss < best_alg_loss:
-                best_alg_loss = loss
-                best_alg_idx = alg_idx
+        for alg_idx, alg_ctx in enumerate(self.alg_contexts_):
+            with alg_ctx as alg_interface:
+                sub_tmp_folders = [tmp_folder / str(alg_idx) if tmp_folder is not None else None for tmp_folder in
+                                   tmp_folders]
+                alg_interface.fit(ds, idxs_list, interface_resources, logger, sub_tmp_folders, name + f'sub-alg-{alg_idx}')
+                y_preds = alg_interface.predict(ds)
+                # get out-of-bag predictions
+                y_pred_oob = cat_if_necessary([y_preds[j, idxs_list[0].val_idxs[j]]
+                                               for j in range(idxs_list[0].val_idxs.shape[0])], dim=0)
+                loss = Metrics.apply(y_pred_oob, y_oob, val_metric_name).item()
+                if loss < best_alg_loss:
+                    best_alg_loss = loss
+                    best_alg_idx = alg_idx
+                    best_sub_fit_params = alg_interface.get_fit_params()[0]

         self.fit_params = [dict(best_alg_idx=best_alg_idx,
-                                sub_fit_params=self.alg_interfaces[best_alg_idx].get_fit_params()[0])]
+                                sub_fit_params=best_sub_fit_params)]
         logger.log(2, f'Best algorithm has index {best_alg_idx}')
         logger.log(2, f'Algorithm selection fit parameters: {self.fit_params[0]}')

     def predict(self, ds: DictDataset) -> torch.Tensor:
         alg_idx = self.fit_params[0]['best_alg_idx']
-        return self.alg_interfaces[alg_idx].predict(ds)
+        with self.alg_contexts_[alg_idx] as alg_interface:
+            return alg_interface.predict(ds)

     def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: int, n_splits: int,
                                split_seeds: List[int], n_train: int) -> RequiredResources:

pytabkit/models/alg_interfaces/nn_interfaces.py

Lines changed: 40 additions & 1 deletion
@@ -337,7 +337,8 @@ def __init__(self, is_classification: bool, hpo_space_name: str = 'default', **c
     def sample_params(self, seed: int) -> Dict[str, Any]:
         assert self.hpo_space_name in ['default', 'clr', 'moresigma', 'moresigmadim', 'moresigmadimreg',
                                        'moresigmadimsize', 'moresigmadimlr', 'probclass', 'probclass-mlp', 'large',
-                                       'alt1', 'alt2', 'alt3', 'alt4', 'alt5', 'alt6', 'alt7', 'alt8', 'alt9', 'alt10']
+                                       'alt1', 'alt2', 'alt3', 'alt4', 'alt5', 'alt6', 'alt7', 'alt8', 'alt9', 'alt10',
+                                       'tabarena']
         rng = np.random.default_rng(seed=seed)

         if self.hpo_space_name == 'probclass-mlp':
@@ -620,6 +621,43 @@ def sample_params(self, seed: int) -> Dict[str, Any]:
                 'scale_lr_factor': np.exp(rng.uniform(np.log(2.0), np.log(10.0))),
                 'p_drop_sched': 'flat_cos',
             }
+        elif self.hpo_space_name == 'tabarena':
+            # common search space
+            params = {
+                'n_hidden_layers': rng.integers(2, 4, endpoint=True),
+                'hidden_sizes': 'rectangular',
+                'hidden_width': rng.choice([256, 384, 512]),
+                'p_drop': rng.uniform(0.0, 0.5),
+                'act': 'mish',
+                'plr_sigma': np.exp(rng.uniform(np.log(1e-2), np.log(50))),
+                'sq_mom': 1.0 - np.exp(rng.uniform(np.log(5e-3), np.log(5e-2))),
+                'plr_lr_factor': np.exp(rng.uniform(np.log(5e-2), np.log(3e-1))),
+                'scale_lr_factor': np.exp(rng.uniform(np.log(2.0), np.log(10.0))),
+                'first_layer_lr_factor': np.exp(rng.uniform(np.log(0.3), np.log(1.5))),
+                'ls_eps_sched': 'coslog4',
+                'ls_eps': np.exp(rng.uniform(np.log(5e-3), np.log(1e-1))),
+                'p_drop_sched': 'flat_cos',
+                'lr': np.exp(rng.uniform(np.log(2e-2), np.log(3e-1))),
+                'wd': np.exp(rng.uniform(np.log(1e-3), np.log(5e-2))),
+                'use_ls': rng.choice(["auto", True]),  # use label smoothing (will be ignored for regression)
+            }
+
+            if rng.uniform(0.0, 1.0) > 0.5:
+                # large configs
+                params['plr_hidden_1'] = rng.choice([8, 16, 32, 64])
+                params['plr_hidden_2'] = rng.choice([8, 16, 32, 64])
+                params['n_epochs'] = rng.choice([256, 512])
+                params['use_early_stopping'] = True
+
+                # set in the defaults of RealMLP in TabArena
+                params['early_stopping_multiplicative_patience'] = 3
+                params['early_stopping_additive_patience'] = 40
+            else:
+                # default values, used here to always set the same set of parameters
+                params['plr_hidden_1'] = 16
+                params['plr_hidden_2'] = 4
+                params['n_epochs'] = 256
+                params['use_early_stopping'] = False

         # print(f'{params=}')

@@ -651,6 +689,7 @@ def _create_sub_interface(self, ds: DictDataset, seed: int):
         # params = utils.update_dict(self.fit_params[0], self.config)
         if 'n_epochs' in self.config:
             params['n_epochs'] = self.config['n_epochs']
+        self.fit_params[0] = params
         return NNAlgInterface(fit_params=None, **params)

     def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_resources: InterfaceResources,
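
A side note on the `'tabarena'` search space added above: expressions of the form `np.exp(rng.uniform(np.log(a), np.log(b)))` draw samples log-uniformly between `a` and `b`, i.e. uniformly across orders of magnitude. A quick self-contained check (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# log-uniform sample between 1e-3 and 5e-2, as used for 'wd' in the 'tabarena' space:
# sample uniformly on [log(a), log(b)], then exponentiate back
samples = np.exp(rng.uniform(np.log(1e-3), np.log(5e-2), size=100_000))

print(samples.min(), samples.max())   # all values lie within [1e-3, 5e-2]
print(np.median(samples))             # close to sqrt(1e-3 * 5e-2) ≈ 7.1e-3, the geometric mean of the bounds
```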
