Conversation

icfaust
Contributor

@icfaust icfaust commented Dec 3, 2024

Description

Enables array API zero-copy dispatching for EmpiricalCovariance and IncrementalEmpiricalCovariance. This required the following changes:

  • Added array API-enabled log_likelihood and pinvh functions (as they are unavailable in sklearn) to sklearnex.utils._array_api
  • Fixed a bug in sklearnex.preview.covariance.EmpiricalCovariance.mahalanobis.__doc__
  • Implemented array API-enabled mahalanobis, score, and error_norm methods; these are likely to change due to the nature of how we support dpnp and dpctl
  • Added array API enabling from [enhancement] simplify array_api enabling tags via wrapper #2566
  • Added check_is_fitted (which is missing even from sklearn)
  • Added get_namespace and moved namespace support away from numpy
  • Created a get_precision function that uses the internal pinvh. This is important for array API conformance, where attributes will no longer only be numpy arrays
  • Moved sklearn conformance of return values to the sklearnex estimators instead of onedal
  • General fixes for array_api_strict inputs (ellipsis use)
  • Moved validate_params before fit's dispatch (this will be set as a design rule in a follow-up PR)
  • Dealt with an issue where pairwise_distances kwargs and support_input_format do not interact well; a follow-up development ticket for fixing this issue will be made
  • Set proper array API conformance for return types of oneDAL tables using return_type_constructor in IncrementalEmpiricalCovariance

The PR should start as a draft, then move to the ready-for-review state after CI has passed and all applicable checkboxes are checked.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, a docs-only PR doesn't require the performance checkboxes, while a PR with any change to actual code should keep them and justify how the change is expected to affect performance (or the justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes, or created a separate PR with the update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added the respective label(s) to the PR if I have permission to do so.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended the testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with the measured data, if a performance change is expected.
  • I have provided justification for why performance has changed, or why changes are not expected.
  • I have provided justification for why quality metrics have changed, or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@icfaust icfaust changed the title from "[Enhancement] WIP new finite checking in EmpericialCovariance" to "[Enhancement] WIP new finite checking in EmpiricialCovariance" Dec 3, 2024
@icfaust icfaust changed the title from "[Enhancement] WIP new finite checking in EmpiricialCovariance" to "[Enhancement] WIP new finite checking in EmpiricialCovariance and IncrementalEmpiricalCovariance" Dec 3, 2024
@icfaust
Contributor Author

icfaust commented Dec 4, 2024

/intelci: run

@icfaust
Contributor Author

icfaust commented Dec 4, 2024

/intelci: run

@icfaust
Contributor Author

icfaust commented Dec 4, 2024

This PR now depends on the developments in #2096 (SPMD testing requires array API bypassing for oneDAL offloading).

@icfaust
Contributor Author

icfaust commented Aug 8, 2025

/intelci: run

@icfaust
Contributor Author

icfaust commented Aug 10, 2025

/intelci: run

@icfaust
Contributor Author

icfaust commented Aug 11, 2025

/intelci: run

@icfaust
Contributor Author

icfaust commented Aug 11, 2025

/intelci: run

@icfaust
Contributor Author

icfaust commented Aug 11, 2025

/intelci: run

@icfaust
Contributor Author

icfaust commented Aug 11, 2025

/intelci: run

with config_context(array_api_dispatch=True):
est.fit(X_df)

with pytest.raises(TypeError, match="Multiple namespaces for array inputs: .*"):
Contributor

Would this work correctly if put under a config context with array_api_dispatch=True?

Contributor Author

@icfaust icfaust Aug 12, 2025

I used sklearn's Ridge with numpy and torch as an example of what to expect (https://scikit-learn.org/stable/modules/array_api.html#input-and-output-array-type-handling):
[screenshot of the resulting error]
Attempting to use any non-numpy input after fitting with array_api_dispatch=True will lead to some sort of error associated with the fitted framework, since get_namespace and validate_data default to forcing data to numpy (https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_array_api.py#L394), and the comparison of numpy to the array API framework will then fail.

If we were to use array_api_dispatch=True throughout, it would error at this point in an external package if get_namespace is used: https://github.com/data-apis/array-api-compat/blob/main/array_api_compat/common/_helpers.py#L665

Contributor

Isn't the idea of array API support in stock sklearn to make it work in such situations (e.g. fitting on a torch array, then predicting on a different kind of array)?

Contributor

Actually no, I see that sklearn throws the same error.

@icfaust
Contributor Author

icfaust commented Aug 12, 2025

/intelci: run

@icfaust
Contributor Author

icfaust commented Aug 13, 2025

Private CI failure comes from an infrastructure timeout.

@icfaust icfaust merged commit b450725 into uxlfoundation:main Aug 13, 2025
29 of 31 checks passed