Description
I'm currently working through a project with a time dependency, so I'm using expanding-window folds (e.g. fold 1 covers 2018-2021 with test set 2022, fold 2 covers 2018-2022 with test set 2023, etc.). When running a hyperparameter optimization, to my understanding the code averages the evaluation metric across folds, weighting each fold equally, even though the folds contain different numbers of records. In this case, it would probably make more sense to weight each fold's contribution to the evaluation metric in proportion to its number of instances.
An example might be:
| Fold | N   | Metric |
|------|-----|--------|
| 1    | 100 | 0.80   |
| 2    | 200 | 0.85   |
| 3    | 250 | 0.90   |
| 4    | 300 | 0.95   |
We should weight fold 4 more heavily than fold 1 when choosing hyperparameters. Under equal weighting the average is 0.875, whereas weighting proportional to N gives 0.894.
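For reference, both averages above can be reproduced in base R:

```r
# Equal vs. N-weighted averaging of the example fold metrics
n      <- c(100, 200, 250, 300)
metric <- c(0.80, 0.85, 0.90, 0.95)

mean(metric)              # 0.875  (equal weighting)
weighted.mean(metric, n)  # ~0.894 (weighted by fold size)
```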
I'm happy to work on this myself and make a PR, just wanted to confirm it would be a supported feature before I invested time into it.
As far as implementation, I think the straightforward way would be to pass a vector to the various tune functions (`tune_bayes()` or `tune_grid()`) indicating per-fold weights, and possibly add a helper function that creates that vector from the fold sizes, which seems like the most obvious case where you would use this (see the sketch below). I could also see some argument for putting it into the `rset` object, but that might be a bit over-engineered.
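To make the helper idea a bit more concrete, here is a rough sketch. The `fold_weights` argument name and the way it is passed to `tune_grid()` are hypothetical and only illustrate the proposed interface; the rsample calls (`assessment()`, the `splits` list-column) are existing API.

```r
library(rsample)

# Hypothetical helper: per-fold weights proportional to the number of
# rows in each fold's assessment (test) set.
assessment_weights <- function(resamples) {
  n <- vapply(resamples$splits, function(s) nrow(assessment(s)), integer(1))
  n / sum(n)
}

# Usage sketch -- `fold_weights` does not exist in tune today; it is
# only meant to show where the weight vector would be supplied:
# tune_grid(wflow, resamples = folds, grid = grid,
#           fold_weights = assessment_weights(folds))
```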