-
Notifications
You must be signed in to change notification settings - Fork 183
FIX: Fix processing of sparse data frames #2826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
/intelci: run |
|
/intelci: run |
|
|
||
|
|
||
| def make_sparse_array(): | ||
| out = sp.random(50, 4, 0.5, format="csc", random_state=123) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why csc and not csr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To test that the format is converted as needed.
| assert not onedal.datatypes._data_conversion._convert_one_to_table.called | ||
|
|
||
|
|
||
| # Note that some estimators that are implemented through daal4py do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these tests then be in daal4py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They'd be harder to find and to connect among each other if they are all spread throughout different places.
Codecov Report❌ Patch coverage is
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 2 files with indirect coverage changes 🚀 New features to boost your workflow:
|
|
/intelci: run |
|
/azp run Nightly |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
CI failure is due to a bug in KMeans unrelated to this PR. |
Description
Many estimators check for sparse inputs using scipy's
issparse, but sklearn's data validators also take sparse data frames as sparse and convert them to COO/CSC/CSR internally, whereas sklearnex assumes that data frames are always dense.This PR fixes the sparsity checks to consider also sparse data frames, and adds tests along the way to ensure that they are processed in the right format.
Checklist:
Completeness and readability
Testing