Python sklearn wrappers: fix estimator_type/clone/tag compatibility for scikit-learn 1.6+ #16779

@zazulam

H2O version, Operating System and Environment

  • H2O-3: developer build from current master (build project version reported by local run: 3.47.0.99999)
  • OS: macOS (Apple Silicon)
  • Python: 3.9
  • scikit-learn: 1.6.1 (latest 1.x release tested)
  • NumPy/SciPy in test env: numpy==2.0.2, scipy==1.13.1
  • Not running on K8s/Hadoop.

Actual behavior
The generic estimators in h2o.sklearn can lose or misreport whether they are classifiers or regressors on newer scikit-learn API paths, notably clone, tags, and type-dispatch integration.
As a result, wrappers configured through estimator_type behave inconsistently in scikit-learn interoperability checks.

Expected behavior

  • Generic wrappers such as H2OGradientBoostingEstimator(estimator_type='classifier') should consistently behave as classifiers in sklearn type checks.
  • Clone should preserve estimator type semantics.
  • On sklearn versions exposing the new tags API, wrapper tags should expose the correct estimator type.
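The expected behavior above can be sketched without H2O or scikit-learn installed. This is a minimal illustration, not the actual h2o.sklearn implementation: StandInTags, GenericWrapper, clone_like, and the ntrees parameter are all hypothetical stand-ins. The assumption is that a clone rebuilds the estimator from its constructor parameters, so keeping estimator_type as a plain constructor parameter lets both the legacy _estimator_type attribute and the tags path report the same value after cloning.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandInTags:
    """Hypothetical stand-in for a tags object; only one field is modeled."""
    estimator_type: Optional[str] = None


class GenericWrapper:
    """Hypothetical generic wrapper, not the real h2o.sklearn class."""

    def __init__(self, estimator_type=None, ntrees=50):
        # Store constructor arguments unmodified so a clone can replay them.
        self.estimator_type = estimator_type
        self.ntrees = ntrees  # illustrative extra parameter

    def get_params(self, deep=True):
        # Report every constructor parameter, including estimator_type.
        return {"estimator_type": self.estimator_type, "ntrees": self.ntrees}

    @property
    def _estimator_type(self):
        # Legacy attribute consulted by older type checks.
        return self.estimator_type

    def __sklearn_tags__(self):
        # Newer path: report the type through a tags object.
        return StandInTags(estimator_type=self.estimator_type)


def clone_like(estimator):
    """Simplified imitation of cloning: rebuild from get_params()."""
    return type(estimator)(**estimator.get_params(deep=False))


est = GenericWrapper(estimator_type="classifier")
cloned = clone_like(est)
```

In a real scikit-learn estimator, __sklearn_tags__ would normally start from the parent class's tags and set estimator_type on that object rather than build a new one; the stand-in dataclass only keeps the sketch dependency-free.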

Steps to reproduce

  1. Use current master h2o-py with scikit-learn 1.6.1.
  2. Create a generic estimator with explicit estimator_type:
    import h2o.sklearn as hs
    from sklearn.base import clone, is_classifier
    est = hs.H2OGradientBoostingEstimator(estimator_type="classifier")
  3. Clone and check sklearn semantics:
    cloned = clone(est)
    assert is_classifier(cloned)
  4. (For sklearn 1.6+) verify tags:
    tags = est.__sklearn_tags__()
    assert getattr(tags, "estimator_type", None) == "classifier"

Before the fix, the assertions in steps 3 and 4 are not guaranteed to pass, because estimator type semantics may not be reliably preserved in these integration paths.

Additional context

This is tied to the estimator tags API change introduced in scikit-learn 1.6.0 (__sklearn_tags__ returning public Tags objects).
The fix itself requires no changes to the project's dependency pins; it consists of wrapper compatibility logic plus regression tests.
The fix includes updates in:

  • h2o-py/h2o/sklearn/wrapper.py
  • h2o-py/tests/testdir_sklearn/pyunit_sklearn_api.py
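
Because the tags API only applies from scikit-learn 1.6.0 onward, a regression test may want to gate the tags assertion on the installed version. The helper below is a hypothetical sketch of such a gate (supports_sklearn_tags is not a function from h2o-py or scikit-learn); it compares only the major and minor components, so pre-release suffixes such as rc1 are ignored.

```python
import re


def supports_sklearn_tags(version: str) -> bool:
    """Return True if `version` denotes scikit-learn 1.6 or newer.

    Hypothetical helper: only the leading "major.minor" is inspected.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison keeps 1.10 ordered after 1.6, unlike string comparison.
    return (major, minor) >= (1, 6)


# Typical use in a test, assuming scikit-learn is installed:
#   import sklearn
#   if supports_sklearn_tags(sklearn.__version__):
#       ...check est.__sklearn_tags__()...
```

Feature detection, for example checking whether the estimator has a __sklearn_tags__ attribute, is an alternative that avoids parsing version strings and may be more robust against development builds.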
