Add NearestNeighbors SPMD API #2557

ethanglaser · 2025-06-17T19:16:07Z

Description

Adds SPMD NearestNeighbors to the API, with the necessary modifications to sklearnex/spmd and onedal/spmd, along with minor revisions in onedal4py and oneDAL itself (uxlfoundation/oneDAL#3262, which is a prerequisite for merging this). A test is also added for validation.

Full list of changes:

Support raw inputs for kneighbors function
Remove weights from NearestNeighbors class (sklearn does not support this nor does it logically make sense)
Add NearestNeighbors to API for sklearnex and onedal spmd modules
Enable spmd usage of kneighbors in all knn classes (added storage of queue from fit to use if X is None in kneighbors())
Revert incorrect usage of _assert_unordered_allclose for _spmd_assert_allclose in neighbors comparisons in spmd test
Add gold and synthetic tests for spmd NearestNeighbors, including a large test that revealed a SYCL event issue in oneDAL, which has been addressed in uxlfoundation/oneDAL#3262 (Support spmd knn search)
Added test scope for kneighbors with X=None (this would have failed previously)
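
The item about storing the queue from fit for use when kneighbors() is called with X=None can be pictured with a toy, pure-Python sketch. Everything in it is hypothetical (the ToyNearestNeighbors class, the `_queue` and `last_queue_` attributes, the string standing in for a queue); it is not the onedal implementation. It only mirrors the scikit-learn convention that kneighbors() without X queries the training points and leaves each point out of its own neighbor list.

```python
import math


class ToyNearestNeighbors:
    """Brute-force stand-in; not the onedal or sklearnex class."""

    def __init__(self, n_neighbors=2):
        self.n_neighbors = n_neighbors

    def fit(self, X, queue=None):
        self._fit_X = [tuple(row) for row in X]
        # Remember the queue used for fit so a later query without X can reuse it.
        self._queue = queue
        return self

    def kneighbors(self, X=None, queue=None):
        query_is_train = X is None
        if query_is_train:
            X = self._fit_X
            # There is no input to take a queue from, so fall back to the stored one.
            queue = self._queue if queue is None else queue
        self.last_queue_ = queue
        dists, indcs = [], []
        for qi, q in enumerate(X):
            candidates = sorted(
                (math.dist(tuple(q), p), pi)
                for pi, p in enumerate(self._fit_X)
                # When querying the training set, skip the point itself.
                if not (query_is_train and pi == qi)
            )
            top = candidates[: self.n_neighbors]
            dists.append([d for d, _ in top])
            indcs.append([i for _, i in top])
        return dists, indcs


model = ToyNearestNeighbors(n_neighbors=1).fit([[0.0], [1.0], [3.0]], queue="gpu-queue")
dists, indcs = model.kneighbors()  # no X: the training points are queried
```

In this sketch the call without X still has a queue to work with because fit recorded it, which is the behavior the list item describes.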

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
I have provided justification why performance has changed or why changes are not expected.
I have provided justification why quality metrics have changed or why changes are not expected.
I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

codecov · 2025-06-17T20:03:43Z

Codecov Report

❌ Patch coverage is 50.00000% with 12 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| onedal/spmd/neighbors/neighbors.py | 35.29% | 11 Missing ⚠️ |
| onedal/neighbors/neighbors.py | 80.00% | 0 Missing and 1 partial ⚠️ |

| Flag | Coverage Δ | |
|---|---|---|
| azure | `80.55% <50.00%> (-0.13%)` | ⬇️ |
| github | `73.17% <80.00%> (?)` | |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| onedal/spmd/neighbors/__init__.py | `100.00% <100.00%> (ø)` | |
| onedal/neighbors/neighbors.py | `82.58% <80.00%> (-0.16%)` | ⬇️ |
| onedal/spmd/neighbors/neighbors.py | `54.71% <35.29%> (-9.18%)` | ⬇️ |

... and 41 files with indirect coverage changes


ethanglaser · 2025-06-17T22:23:08Z

Combined CI: http://intel-ci.intel.com/f04c5cd4-6c88-f193-9c31-a4bf010d0e2d

sklearnex/spmd/neighbors/tests/test_neighbors_spmd.py

ethanglaser · 2025-09-26T00:31:30Z

CI (with uxlfoundation/oneDAL#3262): http://intel-ci.intel.com/f09afe5b-dc4b-f1cf-81b6-d4f5ef20c6a0

sklearnex/spmd/neighbors/tests/test_neighbors_spmd.py

icfaust

Due to time constraints I have tried to push most of my changes into follow-on PRs. There are some nagging issues we should talk through, but I see no reason to hold off getting this in.

onedal/neighbors/neighbors.py

onedal/spmd/neighbors/neighbors.py

icfaust · 2025-09-30T08:50:16Z

sklearnex/spmd/neighbors/tests/test_neighbors_spmd.py

+    )
+
+    # Run each estimator without an input to kneighbors() and ensure functionality and equivalence
+    for CurrentEstimator in [KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors]:


I see why this was done, but it is a bit painful to analyze if there is a failure. Ideally it would be parametrized, but that really isn't possible because of the way it is imported. It would be worth adding some sort of message to identify which CurrentEstimator failed (rather than having to dig through the pytest log for the current value of CurrentEstimator).

Yeah - I am pretty open to ideas on this one. The loop is great because I run the exact same test on all 3 classes, but you are correct that analysis on a fail is trickier. I think scikit-learn may do things like this, I could check how they do it.

I guess it's easier there because in sklearn they import at the top of the file.
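
One way to keep the single loop while making failures easier to attribute is to iterate over estimator names and attach the name to any failure. The sketch below is a hypothetical illustration: run_for_each, the stand-in classes, and the check function are all made up here and do not use the sklearnex SPMD estimators or the real assertions from the test. With pytest, a similar effect might be achieved by parametrizing over the name strings and resolving the class inside the test body after the import.

```python
def run_for_each(estimators, check):
    """Run check(cls) for every (name, cls) pair and report failures by name."""
    failures = []
    for name, cls in estimators.items():
        try:
            check(cls)
        except AssertionError as exc:
            # Prefix the estimator name so it does not have to be dug out of the log.
            failures.append(f"{name}: {exc}")
    if failures:
        raise AssertionError("; ".join(failures))


# Stand-in classes; in the real test these would be the three imported estimators.
class GoodEstimator:
    def kneighbors(self):
        return [[0.0]], [[0]]


class BadEstimator:
    def kneighbors(self):
        return [[1.0]], [[0]]


def check(cls):
    dists, _ = cls().kneighbors()
    assert dists == [[0.0]], f"unexpected distances {dists}"


message = ""
try:
    run_for_each({"GoodEstimator": GoodEstimator, "BadEstimator": BadEstimator}, check)
except AssertionError as exc:
    message = str(exc)  # names the failing estimator directly
```

Collecting failures instead of stopping at the first one also means a single run reports every estimator that misbehaves.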

icfaust · 2025-09-30T08:50:52Z

sklearnex/spmd/neighbors/tests/test_neighbors_spmd.py

+    spmd_dists, spmd_indcs = spmd_model.kneighbors(local_dpt_X_train)
+    batch_dists, batch_indcs = batch_model.kneighbors(X_train)
+
+    tol = 0.005 if dtype == np.float32 else 1e-6


Yikes on this float32 setting. Any info on it? Especially because there is a skip associated with it above (meaning an even worse value occurs?)

It's true, and good observation. It's pretty tricky because this assert all close functionality will fail even if a single element is not within the threshold, hence why it is so loose - it would be nice if there was some sort of customization of that.

It's possible that we could still run the indices check for this case, but distances are more fragile.

This is not the only place in the spmd test scope where drastically loosened thresholds are needed for float32 tests to pass, though.
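
The customization wished for in this thread, a comparison that tolerates a small share of out-of-threshold elements instead of failing on a single one, could look roughly like the following. This is a hypothetical helper (allclose_fraction and its max_mismatch parameter are invented for illustration) written in plain Python; it is not part of the SPMD test utilities or of NumPy, and math.isclose applies its relative tolerance symmetrically, which differs slightly from NumPy's convention.

```python
import math


def allclose_fraction(actual, expected, rtol=1e-6, atol=0.0, max_mismatch=0.0):
    """Return True if at most `max_mismatch` (a fraction) of elements differ.

    With max_mismatch=0.0 this is a strict element-wise check.
    """
    flat_a = [v for row in actual for v in row]
    flat_e = [v for row in expected for v in row]
    if len(flat_a) != len(flat_e):
        return False
    bad = sum(
        not math.isclose(a, e, rel_tol=rtol, abs_tol=atol)
        for a, e in zip(flat_a, flat_e)
    )
    return bad <= max_mismatch * len(flat_a)


expected = [[1.0, 2.0], [3.0, 4.0]]
actual = [[1.0, 2.0], [3.0, 4.02]]  # one of four elements is off by 0.5%

strict = allclose_fraction(actual, expected)                      # single outlier fails
relaxed = allclose_fraction(actual, expected, max_mismatch=0.25)  # one in four allowed
```

A check of this shape would let the per-element tolerance stay tight for float32 while still accepting a bounded number of outliers, at the cost of hiding where those outliers occur.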

ethanglaser · 2025-09-30T15:06:54Z

Final CI: http://intel-ci.intel.com/f09e0f06-8f7b-f1e2-8c25-a4bf010d0e2d

ethanglaser added 3 commits June 17, 2025 12:12

Add NearestNeighbors SPMD API

7213c80

black format

7aea5a6

Merge branch 'main' into dev/eglaser-knn-spmd-search

fa48719

ethanglaser mentioned this pull request Jun 17, 2025

Support spmd knn search uxlfoundation/oneDAL#3262

Merged

9 tasks

ethanglaser added the enhancement and distributed labels Jun 17, 2025

ethanglaser commented Jun 25, 2025

sklearnex/spmd/neighbors/tests/test_neighbors_spmd.py (outdated, resolved)

ethanglaser added 2 commits June 25, 2025 15:10

extend gold data to have multiple rows per rank

3765c6c

Merge branch 'uxlfoundation:main' into dev/eglaser-knn-spmd-search

b3c66af

ethanglaser commented Jun 25, 2025

sklearnex/spmd/neighbors/tests/test_neighbors_spmd.py (outdated, resolved)

ethanglaser and others added 8 commits June 25, 2025 15:15

formatting

ced4aca

Merge branch 'uxlfoundation:main' into dev/eglaser-knn-spmd-search

72fc707

Merge branch 'uxlfoundation:main' into dev/eglaser-knn-spmd-search

ca9408b

Merge branch 'uxlfoundation:main' into dev/eglaser-knn-spmd-search

e8c1ed9

Merge branch 'uxlfoundation:main' into dev/eglaser-knn-spmd-search

eca8bff

Merge branch 'main' into dev/eglaser-knn-spmd-search

47a7d93

raw inputs support for kneighbors

a56ee49

Merge branch 'uxlfoundation:main' into dev/eglaser-knn-spmd-search

544cca3

ethanglaser commented Sep 11, 2025

sklearnex/spmd/neighbors/tests/test_neighbors_spmd.py (resolved)

ethanglaser added 2 commits September 11, 2025 11:15

Reduce rows of synthetic large test

8532fe3

Merge branch 'uxlfoundation:main' into dev/eglaser-knn-spmd-search

b9eb2df

ethanglaser marked this pull request as ready for review September 12, 2025 14:01

ethanglaser requested review from Alexsandruss, Vika-F, david-cortes-intel and icfaust as code owners September 12, 2025 14:01

ethanglaser and others added 2 commits September 17, 2025 16:29

Merge branch 'uxlfoundation:main' into dev/eglaser-knn-spmd-search

ae7ade0

update search size and only use _spmd_assert_allclose

19cde34

support empty kneighbors()

ffba570

ethanglaser commented Sep 26, 2025

sklearnex/spmd/neighbors/tests/test_neighbors_spmd.py (outdated, resolved)

Update sklearnex/spmd/neighbors/tests/test_neighbors_spmd.py

0b7777e

icfaust approved these changes Sep 30, 2025


ethanglaser and others added 2 commits September 30, 2025 07:37

Merge branch 'uxlfoundation:main' into dev/eglaser-knn-spmd-search

5faaa8e

address comments

466e195

ethanglaser merged commit bae4afb into uxlfoundation:main Sep 30, 2025
30 checks passed
