Motivation
bofire currently lacks proper support for experimental data recorded as a "timeseries", i.e. data where the output values of the same experiment are measured at several (e.g. 10) timepoints. In that case, cross-validation must not split a single trajectory so that part of it ends up in the training set and part in the test set at the same time. This PR adds an `is_timeseries` flag to the numerical input feature. When the flag is enabled for a feature, the experiments must contain a `_trajectory_id` column that defines which experiment (trajectory) each row belongs to. `cross_validate` then automatically performs the appropriate GroupKFold split instead of a simple random split.
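To illustrate the splitting behaviour described above, here is a minimal sketch that uses scikit-learn's `GroupKFold` directly. It is not the bofire implementation; the toy data and the variable names (`trajectory_id`, `X`, `y`) are assumptions made for illustration only.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data (assumed for illustration): 4 trajectories with 3 timepoints each.
trajectory_id = np.repeat([0, 1, 2, 3], 3)             # plays the role of _trajectory_id
temperature = np.repeat([20.0, 30.0, 40.0, 50.0], 3)   # ordinary input feature
time = np.tile([0.0, 1.0, 2.0], 4)                     # the timeseries feature
X = np.column_stack([temperature, time])
y = 0.1 * temperature + time

# Group-aware split: all rows that share a trajectory id stay in the same fold.
cv = GroupKFold(n_splits=2)
folds = list(cv.split(X, y, groups=trajectory_id))

for train_idx, test_idx in folds:
    train_ids = set(trajectory_id[train_idx].tolist())
    test_ids = set(trajectory_id[test_idx].tolist())
    # No trajectory is present in both train and test.
    assert train_ids.isdisjoint(test_ids)
```

With a plain random split, rows of the same trajectory could land on both sides, which tends to make the test error look better than it is.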
There is also a convenience utility function that automatically creates the `_trajectory_id` column from a dataframe by checking for equivalent values in all other input features (up to some eps).
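The idea behind such a utility can be sketched roughly as follows. The function name `infer_trajectory_id`, its parameters, and the default `eps` are hypothetical and are not the API added in this PR; the sketch only shows one way to group rows whose non-timeseries inputs agree within a tolerance.

```python
import numpy as np
import pandas as pd


def infer_trajectory_id(experiments, input_keys, timeseries_key, eps=1e-6):
    """Hypothetical helper: assign the same id to rows whose other inputs match within eps."""
    other_keys = [key for key in input_keys if key != timeseries_key]
    values = experiments[other_keys].to_numpy(dtype=float)

    representatives = []  # first row seen for each trajectory
    ids = np.empty(len(experiments), dtype=int)
    for i, row in enumerate(values):
        for traj_id, rep in enumerate(representatives):
            # Same trajectory if every non-timeseries input agrees up to eps.
            if np.all(np.abs(row - rep) <= eps):
                ids[i] = traj_id
                break
        else:
            # No existing trajectory matches, so start a new one.
            representatives.append(row)
            ids[i] = len(representatives) - 1
    return pd.Series(ids, index=experiments.index, name="_trajectory_id")


# Usage with assumed column names.
df = pd.DataFrame(
    {
        "temperature": [20.0, 20.0, 30.0, 30.0000001, 20.0],
        "time": [0.0, 1.0, 0.0, 1.0, 2.0],
    }
)
df["_trajectory_id"] = infer_trajectory_id(
    df, input_keys=["temperature", "time"], timeseries_key="time"
)
```

Comparing each row against the first row of every known trajectory keeps the sketch short. It is quadratic in the worst case, so a real implementation may prefer rounding and grouping for large dataframes.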
Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan