Description
Right now, we don't have any machinery for stateful objects. The extreme case of this is scikit-learn estimators.
This isn't very important for users like SciPy who don't have much stateful API, but to push this towards scikit-learn, we would need a very clear story around this.
sklearn estimators have:
- An __init__ which sets hyperparameters only.
- Typically a fit() that gets the arguments and creates internal state.
- Other methods that use or modify the internal state.
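For concreteness, that shape can be sketched without scikit-learn itself. The class name MeanCenterer and its with_mean hyperparameter are made up for illustration; only the split between __init__, fit() and the other methods mirrors the convention described above.

```python
class MeanCenterer:
    """Hypothetical estimator following the scikit-learn convention."""

    def __init__(self, with_mean=True):
        # __init__ only stores hyperparameters; no data is touched here.
        self.with_mean = with_mean

    def fit(self, X):
        # fit() receives the data and creates the internal state.
        # The trailing underscore marks a fitted attribute, as in scikit-learn.
        self.mean_ = sum(X) / len(X) if self.with_mean else 0.0
        return self

    def transform(self, X):
        # Other methods rely on the state created by fit().
        return [x - self.mean_ for x in X]


est = MeanCenterer().fit([1.0, 2.0, 3.0])
centered = est.transform([2.0, 4.0])
```

The point for dispatching is that mean_ is created by one call and consumed by another, so whichever implementation wrote it must also be able to read it.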
Normalizing the state is plausible, but seems hard. Typically, once we have picked an implementation, we have to stick with it for all other method calls, because otherwise we cannot guarantee compatibility. [1]
To make matters even more confusing, one could do something like:
    with backend_opts(...):
        est = Estimator(...)

    est.fit()  # do we remember the options from when it was created or here?
(Not that I think the above is very worrying either way.)
This probably means that we need a concept of "stateful methods" for such classes. Once one has been called, we need to remember the backend and always re-use it, as well as pass in the state. [2]
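A minimal sketch of what "remember the backend and pass in the state" could look like, purely as an illustration: Backend, _make_backend, DEFAULT, OTHER and the backend= keyword are all hypothetical names, not an existing API.

```python
class Backend:
    """Hypothetical backend bundling implementations of the stateful methods."""

    def __init__(self, name, fit, transform):
        self.name = name
        self.fit = fit
        self.transform = transform


def _make_backend(name):
    # Each backend stores its state in its own (incompatible) format,
    # here simply a dict keyed by the backend name.
    def fit(X):
        return {name: sum(X) / len(X)}

    def transform(state, X):
        mean = state[name]  # KeyError if the state came from another backend
        return [x - mean for x in X]

    return Backend(name, fit, transform)


DEFAULT = _make_backend("default")
OTHER = _make_backend("other")


class Estimator:
    def __init__(self):
        self._backend = None  # chosen on the first stateful call
        self._state = None    # opaque, backend-specific

    def fit(self, X, backend=DEFAULT):
        # Stateful method: remember which backend produced the state.
        self._backend = backend
        self._state = backend.fit(X)
        return self

    def transform(self, X):
        # Stateful method: re-use the remembered backend and pass the state in.
        if self._backend is None:
            raise RuntimeError("call fit() first")
        return self._backend.transform(self._state, X)


est = Estimator().fit([1.0, 3.0], backend=OTHER)
result = est.transform([2.0, 5.0])
```

Here transform() never consults the currently active backend; it only uses the one recorded by fit(), which is the compatibility guarantee discussed above.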
However, I need to have a clearer picture to really make up my mind about what the story is here!
There is also the possibility that estimators make .fit() just call a single hidden _fit_internal() function, to normalize things a bit first. [3]
The next piece of the puzzle is that scikit-learn would want a way to convert from an estimator fitted in a backend to a "normal" one (and in the opposite direction).
I expect this can be handled with two stateful methods of the kind described above.
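As a rough sketch of those two conversion methods, assuming made-up state formats (a plain dict for the "normal" estimator, a tagged tuple for the backend); the function names and the "other-backend" tag are hypothetical:

```python
def to_backend_state(normal_state):
    # "Normal" -> backend: repackage the fitted values in the backend's format.
    return ("other-backend", normal_state["mean"])


def to_normal_state(backend_state):
    # Backend -> "normal": only accept state that this backend produced.
    tag, mean = backend_state
    if tag != "other-backend":
        raise TypeError(f"cannot convert state from {tag!r}")
    return {"mean": mean}


normal = {"mean": 2.0}
round_trip = to_normal_state(to_backend_state(normal))
```

Both functions take state in and hand state back, so they fit the same "remember the backend, pass in the state" pattern as fit() and the other methods.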
Footnotes
1. This is actually slightly different for estimators that are Array API enabled. While their state differs in which types it uses, it always has the same structure.
2. One could, in theory, see the state as part of the supported types. Then plugins could effectively indicate that they "understand" another implementation's state object. (We throw all types into one bin, but that bin could contain an estimator state object next to the arrays.)
3. I would personally like to avoid this for simplicity on the scikit-learn side, but it might just be a lot clearer, at least for certain estimators.