[enhancement] Enable Array API in ensemble algos #2201
/intelci: run
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
    for i, v in enumerate(class_weights):
        expanded_class_weight[y_store_unique_indices == i] *= v
Copilot
AI
Dec 7, 2025
The comment on line 688 warns about O(n*m) complexity. This nested iteration over classes and samples could be a significant performance bottleneck for datasets with many classes. Consider adding a more explicit warning in the docstring or raising a warning at runtime when the number of classes exceeds a threshold (e.g., >100).
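The concern above can be illustrated with a minimal NumPy sketch. It assumes the expanded weights start at 1.0 and that `y_store_unique_indices` holds integer class indices in `[0, n_classes)`; the function names and the `warn_above` threshold are hypothetical and are not taken from the PR. The first function mirrors the reviewed loop and adds the suggested runtime warning; the second shows a single gather that gives the same result under those assumptions.

```python
import warnings

import numpy as np


def expand_class_weight_loop(class_weights, y_indices, warn_above=100):
    """Loop form: one boolean mask per class, so O(n_samples * n_classes)."""
    class_weights = np.asarray(class_weights, dtype=np.float64)
    y_indices = np.asarray(y_indices)
    if class_weights.shape[0] > warn_above:
        # Hypothetical threshold, following the reviewer's ">100" example.
        warnings.warn(
            "class weight expansion scales with n_samples * n_classes",
            RuntimeWarning,
            stacklevel=2,
        )
    expanded = np.ones(y_indices.shape[0], dtype=np.float64)
    for i, v in enumerate(class_weights):
        expanded[y_indices == i] *= v
    return expanded


def expand_class_weight_gather(class_weights, y_indices):
    """Gather form: index the weight vector by class index, O(n_samples)."""
    class_weights = np.asarray(class_weights, dtype=np.float64)
    return class_weights[np.asarray(y_indices)]
```

Whether the gather form fits the PR depends on which indexing operations the targeted array API namespaces support, which the conversation does not state.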
    dtype=[xp.float64, xp.float32],
    ensure_all_finite=not sklearn_check_version(
        "1.4"
    ),  # completed in offload check
Copilot
AI
Dec 7, 2025
The comment 'completed in offload check' is unclear about where and how the finite check is completed. This should reference the specific location (e.g., line numbers or function name) where the check occurs to aid future maintenance.
    - ),  # completed in offload check
    + ),  # finite check is performed in support_input_format() in onedal._device_offload
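The gating expression in the reviewed snippet can be read as follows: the validation call checks for finite values itself only on older scikit-learn versions, and skips the check on newer ones because it is done elsewhere. A minimal sketch of that logic, assuming `sklearn_check_version(v)` returns True when the installed scikit-learn is at least version `v`. The helpers `version_at_least` and `validation_kwargs` are hypothetical stand-ins written for this illustration, not functions from the repository.

```python
import re


def version_at_least(installed: str, required: str) -> bool:
    """Compare the leading major.minor numbers of two version strings."""

    def major_minor(version: str) -> tuple:
        # Keep the first two numeric components, e.g. "1.4.dev0" -> (1, 4).
        numbers = re.findall(r"\d+", version)
        return tuple(int(n) for n in numbers[:2])

    return major_minor(installed) >= major_minor(required)


def validation_kwargs(installed_sklearn: str) -> dict:
    """Build the keyword argument the way the reviewed line does."""
    return {
        # True (check here) below 1.4, False (checked elsewhere) from 1.4 on.
        "ensure_all_finite": not version_at_least(installed_sklearn, "1.4"),
    }
```

Under these assumptions, `validation_kwargs("1.3.2")` enables the check and `validation_kwargs("1.6.0")` disables it.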
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
    - tests/test_common.py::test_estimators[ExtraTreesClassifier()-check_sample_weights_invariance(kind=ones)]
    - tests/test_common.py::test_estimators[ExtraTreesClassifier()-check_sample_weights_invariance(kind=zeros)]
    - tests/test_common.py::test_estimators[ExtraTreesRegressor()-check_sample_weights_invariance(kind=ones)]
    - ensemble/tests/test_forest.py::test_min_weight_fraction_leaf
CC @Alexandr-Solovev - this test in particular is very straightforward and not expected to fail, yet it does here.
Description
This PR refactors the Ensemble algorithms (RandomForestRegressor, RandomForestClassifier, ExtraTreesRegressor and ExtraTreesClassifier) to follow repository standards and add array API support. This reduced the code by 500+ lines and required the following changes:
- BaseEstimator inheritance from onedal ensemble estimators
- __init__ signatures to remove sklearn conformant kwargs in onedal ensemble estimators
- random_state use from onedal estimators
- class_count kwarg to fit as calculating it in Python is scikit-learn conformance (oneDAL expects it a priori)
- oneDAL for use by Classifiers and Regressors
- _create_model function
- predict method
- ForestRegressor and ForestClassifier objects to minimize maintenance
- max_samples to observations_per_tree_fraction to follow oneDAL values
- _save_attributes method to be specific to Classifiers vs Regressors
- _onedal_fit_ready, _onedal_cpu_supported and _onedal_gpu_supported to reduce code duplication via inheritance and make array API enabled
- enable_array_api decorators to public-facing estimators
- _check_parameters function behind sklearn_check_version for future removal
- min_impurity_split which was removed in sklearn 0.25
- _validate_y_class_weight method designed specifically for sklearnex estimators (missing some functionality which is irrelevant to the sklearnex estimator)
- check_n_features from sklearnex.utils.validation as it is no longer necessary
- sample_weight checks for sparsity (blocked by _check_sample_weight)

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).
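One of the changes listed in the description relates scikit-learn's max_samples to oneDAL's observations_per_tree_fraction. The PR's actual conversion is not shown in this conversation, so the following is only a sketch of how such a mapping could look, assuming scikit-learn's documented semantics for max_samples (None uses all samples, an int is a sample count, a float is a fraction in (0.0, 1.0]). The helper name `observations_fraction` is hypothetical.

```python
from typing import Optional, Union


def observations_fraction(
    max_samples: Optional[Union[int, float]], n_samples: int
) -> float:
    """Convert a scikit-learn style max_samples into a per-tree fraction."""
    if max_samples is None:
        # None means every tree draws n_samples observations.
        return 1.0
    if isinstance(max_samples, bool):
        raise TypeError("max_samples must be None, an int, or a float")
    if isinstance(max_samples, int):
        if not 1 <= max_samples <= n_samples:
            raise ValueError("integer max_samples must be in [1, n_samples]")
        return max_samples / n_samples
    if isinstance(max_samples, float):
        if not 0.0 < max_samples <= 1.0:
            raise ValueError("float max_samples must be in (0.0, 1.0]")
        return max_samples
    raise TypeError("max_samples must be None, an int, or a float")
```

For example, `observations_fraction(25, 100)` gives 0.25, while `observations_fraction(0.5, 100)` passes the fraction through unchanged.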
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing