[ENH] Nomogram: Support for sparse data #2197
Codecov Report
@@            Coverage Diff            @@
##           master    #2197    +/-   ##
==========================================
+ Coverage   67.64%   67.67%   +0.03%
==========================================
  Files         319      319
  Lines       54871    54926     +55
==========================================
+ Hits        37119    37173     +54
- Misses      17752    17753      +1

Continue to review full report at Codecov.
Branch updated from df7eb56 to f4fb3c1.
Orange/statistics/util.py (outdated)
    return np.prod(x.shape) != x.data.size


def _nan_min_max(x, axis=0, func=None):
None is not callable (L245)...
The default could be one of min/max, or func could be a required parameter.
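A minimal sketch of the second option, assuming the reduction is passed in as a required argument (np.nanmin or np.nanmax). The name `nan_min_max` and its signature are hypothetical, not the PR's final code; implicit zeros are treated as values, as they would be in the dense equivalent.

```python
import numpy as np
import scipy.sparse as sp


def nan_min_max(x, func, axis=0):
    """Hypothetical helper: apply func (np.nanmin or np.nanmax) to a dense or
    sparse matrix. func is required, so there is no None default to call."""
    if not sp.issparse(x):
        return func(x, axis=axis)

    if axis is None:
        values = x.data
        if x.nnz < np.prod(x.shape):  # unstored entries are zeros
            values = np.append(values, 0)
        return func(values)

    # Reduce over rows of a CSR matrix; for axis=0 the columns become rows.
    if axis == 0:
        x = x.T
    x = x.tocsr()
    result = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        row = x.data[x.indptr[i]:x.indptr[i + 1]]
        if row.size < x.shape[1]:  # this row has implicit zeros
            row = np.append(row, 0)
        result[i] = func(row)
    return result
```

For a sparse X, `nan_min_max(X, np.nanmin, axis=0)` should match `np.nanmin(X.toarray(), axis=0)`, including columns whose minimum is an implicit zero.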
Orange/statistics/util.py (outdated)
    if axis == 0:
        x = x.T

    # TODO check & transform to correct format
In what (incorrect) format is it now?
X is usually in CSR format, so when this is called with axis=0, x becomes CSC (due to the transpose), which isn't efficient for row slicing.
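One way the TODO could be resolved, sketched under the assumption that the caller needs row slices: convert back to CSR after transposing. `as_row_major` is a hypothetical helper name; `.T`, `.tocsr()` and `.format` are standard SciPy sparse attributes.

```python
import numpy as np
import scipy.sparse as sp


def as_row_major(x, axis):
    """Hypothetical helper: return a CSR matrix whose rows are the slices a
    reduction along `axis` iterates over (columns for axis=0, rows for axis=1)."""
    if axis == 0:
        x = x.T  # transposing CSR gives CSC, which is slow to slice by row
    return x.tocsr()  # no-op for CSR input; converts CSC back to CSR


X = sp.csr_matrix(np.array([[0., 1.], [2., 0.], [0., 3.]]))
print(X.T.format)                      # csc
print(as_row_major(X, axis=0).format)  # csr
```

The conversion costs one pass over the stored values, which is usually cheaper than repeatedly slicing rows out of a CSC matrix.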
Orange/statistics/util.py (outdated)
    if n_nans:
        return float('nan')
    else:
        n_values = np.prod(x.shape) - n_nans
Why - n_nans? Isn't it 0 in this else branch?
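A sketch of the point being made, assuming the function averages over all elements (no axis) and mirrors np.average, which returns nan when any value is nan. `sparse_average` is a hypothetical name; once the nan check has returned early, n_nans is 0 and every element, stored or implicit, counts.

```python
import numpy as np
import scipy.sparse as sp


def sparse_average(x):
    """Hypothetical sketch: average of all elements of a sparse matrix that,
    like np.average on an array containing nan, returns nan if any value is nan."""
    n_nans = np.isnan(x.data).sum()
    if n_nans:
        return float("nan")
    # Here n_nans == 0, so no subtraction is needed: all elements count.
    n_values = np.prod(x.shape)
    return x.data.sum() / n_values
```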
Orange/statistics/util.py (outdated)
        return np.unique(x, return_counts=return_counts)
    else:
        n_zeros = np.prod(x.shape) - x.data.size
        r = np.unique(x.data, return_counts=return_counts)
x.data can contain explicit zeros, right? E.g. make a CSR matrix and set a non-zero element to 0.
In that case you need to be careful about inserting another 0 below...
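A sketch of a sparse np.unique that handles this case, assuming implicit zeros should be merged into an existing explicit 0 rather than added as a second 0. `sparse_unique` is a hypothetical name, not the PR's code. It relies on `x.nnz` counting explicit zeros as stored values, so `np.prod(x.shape) - x.nnz` is the number of implicit zeros only.

```python
import numpy as np
import scipy.sparse as sp


def sparse_unique(x, return_counts=False):
    """Hypothetical sketch of np.unique for dense or sparse matrices that
    accounts for both implicit zeros and explicit zeros stored in x.data."""
    if not sp.issparse(x):
        return np.unique(x, return_counts=return_counts)

    implicit_zeros = np.prod(x.shape) - x.nnz
    if return_counts:
        values, counts = np.unique(x.data, return_counts=True)
    else:
        values, counts = np.unique(x.data), None

    if implicit_zeros:
        pos = np.searchsorted(values, 0)
        if pos < len(values) and values[pos] == 0:
            # An explicit 0 is already present: only bump its count.
            if counts is not None:
                counts[pos] += implicit_zeros
        else:
            values = np.insert(values, pos, 0)
            if counts is not None:
                counts = np.insert(counts, pos, implicit_zeros)
    return (values, counts) if return_counts else values
```

A matrix built with an explicit zero in `data` (for example via the `(data, indices, indptr)` constructor) should give the same values and counts as `np.unique(X.toarray(), return_counts=True)`.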
    """ Equivalent of np.unique that supports sparse or dense matrices. """
    if not sp.issparse(x):
        return np.unique(x, return_counts=return_counts)
    else:
The else is unnecessary here and in other functions that first check whether x is not sparse and return early.
It just adds an extra level of indentation to the whole body of the function.
Orange/statistics/util.py (outdated)
def _sparse_has_zeros(x):
    """ Check if sparse matrix contains any implicit zeros. """
    return np.prod(x.shape) != x.data.size
It is probably better to use x.nnz instead of x.data.size everywhere.
It looks like the spmatrix base class has nnz, so every sparse type should have it, while e.g. dok_matrix does not have .data.
Corrected. However, the methods still won't work for dok_matrix, since we rely on x.data elsewhere.
Compute values usually have a reference to the original variable, so SharedComputeValue should have one too.
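A self-contained illustration of the suggestion. `SharedComputeValueSketch` and `ScaledColumn` are hypothetical stand-ins, not Orange's actual classes or signatures; the only point shown is that the shared compute value stores a `variable` reference next to the shared computation, as other compute values do.

```python
import numpy as np


class SharedComputeValueSketch:
    """Hypothetical stand-in: compute_shared runs once per data table and its
    result is passed to compute() for the derived column."""

    def __init__(self, compute_shared, variable=None):
        self.compute_shared = compute_shared
        # Reference to the original variable, like other compute values keep.
        self.variable = variable

    def __call__(self, data):
        shared = self.compute_shared(data)
        return self.compute(data, shared)

    def compute(self, data, shared_data):
        raise NotImplementedError


class ScaledColumn(SharedComputeValueSketch):
    """Hypothetical example: one column divided by a table-wide shared value."""

    def __init__(self, compute_shared, column, variable=None):
        super().__init__(compute_shared, variable)
        self.column = column

    def compute(self, data, shared_data):
        return data[:, self.column] / shared_data
```

With the reference in place, code that inspects a derived column can trace it back to its source variable without special-casing shared compute values.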
@lanzagar I think all issues are addressed now. Please check again.
Branch updated from 2deeeab to 47b4187.
Issue
Fixes #2165.
Description of changes
- nanmin, nanmax, average, unique: equivalents of NumPy's functions that support sparse or dense matrices.
- reconstruct_domain method: 3.0 s -> 0.03 s
- calculate_log_reg_coefficients method: TLDW (minutes+) -> 1.5 s

Includes