Context
In an MLJ pipeline, components are trained in sequence. If a component is mutated (replaced, or its hyperparameters changed), retraining the pipeline triggers retraining of only that component and the components downstream of it. However, whenever the training rows change, as happens in cross-validation, the entire pipeline is retrained. For data hygiene reasons this is typically the desired behaviour, but it can lead to long compute times.
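To make the retraining semantics concrete, here is a small illustration; the dataset, the atomic model, and the `decision_tree_classifier` field name are chosen only for the sake of example:

```julia
using MLJ  # assumes MLJDecisionTreeInterface is in the environment

Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
X, y = @load_iris

pipe = Standardizer() |> Tree()
mach = machine(pipe, X, y)
fit!(mach, rows=1:120)   # trains both components

# Mutating a hyperparameter retrains only the tree (and anything downstream):
pipe.decision_tree_classifier.max_depth = 3
fit!(mach, rows=1:120)

# Changing the training rows, as cross-validation does, retrains everything:
fit!(mach, rows=31:150)
```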
For a similar example, consider cross-validation of a model wrapped by `TunedModel`, with cross-validation specified as part of the resampling strategy. In this case we get nested cross-validation, again appropriate for data hygiene reasons, but this can be expensive.
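A sketch of that scenario, with illustrative choices of model, range and measure, might look like this:

```julia
using MLJ

Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
X, y = @load_iris

tree = Tree()
r = range(tree, :max_depth, lower=1, upper=6)

# Inner cross-validation, as part of the tuning strategy:
tuned = TunedModel(model=tree, tuning=Grid(), resampling=CV(nfolds=3),
                   range=r, measure=log_loss)

# Outer cross-validation of the self-tuning model gives nested CV:
evaluate(tuned, X, y, resampling=CV(nfolds=5), measure=log_loss)
```

Each of the five outer folds triggers a full grid search with three-fold resampling, so the atomic model is trained many times over.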
Tuning or cross-validating a model `Stack` poses similar challenges.
Proposal
As a practical tool to mitigate such expensive computations, I propose adding a `Freezable` model wrapper with this property: `model = Freezable(atomic_model, frozen=true)` behaves just like `atomic_model`, except that a call to train it is a no-op, unless the model has never been trained or `frozen=false`.
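For concreteness, here is how the wrapper might be used; `Freezable` does not exist yet, and the constructor and pipeline details below are only a sketch:

```julia
using MLJ

Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
X, y = @load_iris

# Proposed wrapper (hypothetical): train once, then skip retraining.
frozen_tree = Freezable(Tree(), frozen=true)

pipe = Standardizer() |> frozen_tree
mach = machine(pipe, X, y)

fit!(mach, rows=1:120)    # never trained before, so the tree trains as usual
fit!(mach, rows=31:150)   # rows changed, but the frozen component is a no-op

frozen_tree.frozen = false
fit!(mach, rows=31:150)   # unfrozen: ordinary retraining behaviour resumes
```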
Thoughts, anyone?