@jkeupp
Contributor

@jkeupp jkeupp commented Nov 17, 2025

Motivation

bofire lacks proper support for experimental data that is recorded as a "timeseries", e.g. when the output values of the same experiment are determined at 10 different timepoints. Whenever this is the case, cross-validation must not split the data such that parts of a single trajectory end up in train and test at the same time. This PR adds a flag
is_timeseries to the numerical input feature, which, if enabled for a feature, requires a _trajectory_id column in experiments to define which experiment each row belongs to.
cross_validate then automatically performs the appropriate GroupKFold split instead of a simple random split.
There is also a convenience util function that automatically creates the _trajectory_id column from a dataframe by checking for equivalent values in all other input features (up to some eps).
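
To illustrate the idea described above, here is a minimal, standard-library-only sketch. It is not the code from this PR: the function names `infer_trajectory_ids` and `group_k_fold`, the `time_key` parameter, and the example column names are hypothetical, and the simple fold assignment only stands in for a GroupKFold-style split. It shows rows being grouped into trajectories when all non-time inputs agree within `eps`, and folds being built so that no trajectory is shared between train and test.

```python
from typing import Dict, List, Sequence, Tuple


def infer_trajectory_ids(
    rows: Sequence[Dict[str, float]],
    time_key: str,
    eps: float = 1e-6,
) -> List[int]:
    """Give rows the same id when all inputs except `time_key` agree within eps.

    Hypothetical helper for illustration; assumes every row has the same keys.
    """
    representatives: List[Dict[str, float]] = []  # first row seen per trajectory
    ids: List[int] = []
    for row in rows:
        others = {k: v for k, v in row.items() if k != time_key}
        for idx, rep in enumerate(representatives):
            if all(abs(others[k] - rep[k]) <= eps for k in rep):
                ids.append(idx)
                break
        else:
            # No existing trajectory matches, so this row starts a new one.
            representatives.append(others)
            ids.append(len(representatives) - 1)
    return ids


def group_k_fold(
    groups: Sequence[int], n_splits: int
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, test_idx) pairs; each group is entirely on one side."""
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of trajectories")
    folds = []
    for fold in range(n_splits):
        # Every n_splits-th group goes into the test set of this fold.
        test_groups = set(unique[fold::n_splits])
        test = [i for i, g in enumerate(groups) if g in test_groups]
        train = [i for i, g in enumerate(groups) if g not in test_groups]
        folds.append((train, test))
    return folds


# Three experiments, each measured at two timepoints (made-up example values).
experiments = [
    {"temperature": 50.0, "pressure": 1.0, "time": 0.0},
    {"temperature": 50.0, "pressure": 1.0, "time": 5.0},
    {"temperature": 60.0, "pressure": 1.0, "time": 0.0},
    {"temperature": 60.0, "pressure": 1.0, "time": 5.0},
    {"temperature": 70.0, "pressure": 2.0, "time": 0.0},
    {"temperature": 70.0, "pressure": 2.0, "time": 5.0},
]
trajectory_ids = infer_trajectory_ids(experiments, time_key="time")
splits = group_k_fold(trajectory_ids, n_splits=3)
```

With a plain random split, the two timepoints of one experiment could land on opposite sides of the split and leak information. Grouping by trajectory id rules that out by construction.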

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

  • The respective cases when cross_validate should behave differently are tested explicitly.
  • specs tests were fixed to accommodate the new is_timeseries flag on the input feature.
  • timeseries utils tests were added.

@jduerholt
Contributor

Just tell me when it is ready for review ;)

@jkeupp
Contributor Author

jkeupp commented Nov 30, 2025

hmm, yes... We left off discussing whether the group split column would also make sense stand-alone, without the time-series feature itself. I think that makes sense, but I've not yet updated any code. It'll probably take a while.

2 participants