Description
H2O version, Operating System and Environment
- H2O-3: developer build from current master (build project version reported by local run: 3.47.0.99999)
- OS: macOS (Apple Silicon)
- Python: 3.9
- scikit-learn: latest 1.x tested = 1.6.1
- NumPy/SciPy in test env: numpy==2.0.2, scipy==1.13.1
- Not running on K8s/Hadoop.
Actual behavior
h2o.sklearn generic estimators can lose or misreport their classifier-vs-regressor semantics in newer scikit-learn API paths, notably clone, tags, and type-dispatch integration.
This shows up as inconsistent results from sklearn interoperability checks on wrappers constructed with estimator_type.
Expected behavior
- Generic wrappers such as H2OGradientBoostingEstimator(estimator_type='classifier') should consistently behave as classifiers in sklearn type checks.
- Clone should preserve estimator type semantics.
- On sklearn versions exposing the new tags API, wrapper tags should expose the correct estimator type.
Steps to reproduce
- Use current master h2o-py with scikit-learn 1.6.1.
- Create a generic estimator with explicit estimator_type:

    import h2o.sklearn as hs
    from sklearn.base import clone, is_classifier

    est = hs.H2OGradientBoostingEstimator(estimator_type="classifier")

- Clone and check sklearn semantics:

    cloned = clone(est)
    assert is_classifier(cloned)

- (For sklearn 1.6+) verify tags:

    tags = est.__sklearn_tags__()
    assert getattr(tags, "estimator_type", None) == "classifier"

Prior to the fix, estimator type semantics may not be reliably preserved in these integration paths.
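For checking the same thing across scikit-learn versions, the two lookups from the steps above can be folded into one helper. This is only a minimal sketch: `resolve_estimator_type`, `TagsStyle`, and `LegacyStyle` are hypothetical names used for illustration and are not part of h2o or scikit-learn. It assumes only that newer estimators expose `__sklearn_tags__()` with an `estimator_type` field and that older ones expose the legacy `_estimator_type` attribute.

```python
from types import SimpleNamespace


def resolve_estimator_type(est):
    """Return the declared estimator type ('classifier', 'regressor', ...) or None.

    Hypothetical helper: prefers the tags API (scikit-learn 1.6+) and falls
    back to the legacy `_estimator_type` attribute.
    """
    tags_fn = getattr(est, "__sklearn_tags__", None)
    if callable(tags_fn):
        etype = getattr(tags_fn(), "estimator_type", None)
        if etype is not None:
            return etype
    return getattr(est, "_estimator_type", None)


# Stand-in objects so the sketch runs without h2o or scikit-learn installed.
class TagsStyle:
    def __sklearn_tags__(self):
        return SimpleNamespace(estimator_type="classifier")


class LegacyStyle:
    _estimator_type = "regressor"


print(resolve_estimator_type(TagsStyle()))
print(resolve_estimator_type(LegacyStyle()))
```

With h2o-py and scikit-learn 1.6.1 installed, calling `resolve_estimator_type(clone(est))` on the estimator from the steps above would be expected to return "classifier" once the behavior matches the expected behavior described in this issue.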
Additional context
This is tied to scikit-learn tag API evolution introduced in scikit-learn 1.6.0 (__sklearn_tags__ with public Tags objects).
No project dependency pin changes are required by the fix itself; this is wrapper behavior/compatibility logic plus regression tests.
Fix includes updates in:
- h2o-py/h2o/sklearn/wrapper.py
- h2o-py/tests/testdir_sklearn/pyunit_sklearn_api.py
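To illustrate the kind of wrapper behavior this issue is about, here is a rough sketch of one way a generic wrapper could keep its estimator type through a clone-style rebuild and report it through both the legacy attribute and the tags API. It is not the code in wrapper.py. `SketchEstimator`, `FakeTags`, `rebuild_from_params`, and the `ntrees` parameter are hypothetical stand-ins, and `rebuild_from_params` only mimics the core idea of sklearn's clone (constructing a fresh instance from `get_params()`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTags:
    """Stand-in for scikit-learn's Tags object; only the field used here."""
    estimator_type: Optional[str] = None


class SketchEstimator:
    """Hypothetical generic wrapper, not the h2o.sklearn implementation."""

    def __init__(self, estimator_type=None, ntrees=50):
        # Store constructor arguments unmodified so they round-trip via get_params.
        self.estimator_type = estimator_type
        self.ntrees = ntrees

    def get_params(self, deep=True):
        return {"estimator_type": self.estimator_type, "ntrees": self.ntrees}

    @property
    def _estimator_type(self):
        # Legacy attribute read by older scikit-learn type checks.
        return self.estimator_type

    def __sklearn_tags__(self):
        # With real scikit-learn this would start from super().__sklearn_tags__()
        # and then set tags.estimator_type before returning.
        return FakeTags(estimator_type=self.estimator_type)


def rebuild_from_params(est):
    """Mimic the core of clone: build a fresh instance from get_params()."""
    return type(est)(**est.get_params(deep=False))


original = SketchEstimator(estimator_type="classifier")
rebuilt = rebuild_from_params(original)
print(rebuilt._estimator_type, rebuilt.__sklearn_tags__().estimator_type)
```

The design point is that both views of the type derive from the single constructor parameter, so anything that rebuilds the object from its parameters cannot leave the two out of sync.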