[FIX] Impute: sparse by jerneju · Pull Request #2357 · biolab/orange3

jerneju · 2017-05-30T17:05:48Z

Issue

Description of changes

Work in progress.

Includes

Code changes
Tests
Documentation

nikicc · 2017-05-30T19:51:17Z

@jerneju I already did some debugging on this on Friday, and IMO the problem is in the file Orange/statistics/util.py, in the stats method, which for sparse input returns X.min and X.max. I think X.min and X.max don't handle missing values and return np.nan when some values are missing, while we should probably return the minimum and maximum among the defined values only. They should probably just be replaced with the nanmin and nanmax methods (L256–272 in the same file).
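To make the idea concrete, here is a minimal sketch of a min/max over defined values only. `sparse_nanmin_nanmax` is a hypothetical helper written for illustration, not the code in Orange/statistics/util.py, and it assumes that entries not stored in the sparse matrix are zeros.

```python
import numpy as np
import scipy.sparse as sp


def sparse_nanmin_nanmax(x):
    """Hypothetical helper: min and max over the defined (non-NaN) entries."""
    x = sp.csr_matrix(x)
    # Keep only the explicitly stored values that are not NaN.
    defined = x.data[~np.isnan(x.data)]
    # Entries that are not stored explicitly count as zeros.
    if x.nnz < np.prod(x.shape):
        defined = np.append(defined, 0.0)
    if defined.size == 0:
        # Every entry is missing, so there is nothing to compare.
        return np.nan, np.nan
    return defined.min(), defined.max()


with_zero = sp.csr_matrix(np.array([[0.0], [np.nan], [9.0]]))
without_zero = sp.csr_matrix(np.array([[np.nan], [2.0], [5.0]]))
range_with_zero = sparse_nanmin_nanmax(with_zero)
range_without_zero = sparse_nanmin_nanmax(without_zero)
```

The implicit zero has to be added back by hand because a sparse matrix stores only non-zero (and NaN) entries in `x.data`.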

jerneju · 2017-06-01T11:58:01Z

https://sentry.io/biolab/orange3/issues/284690104/

jerneju · 2017-06-01T11:58:11Z

https://sentry.io/biolab/orange3/issues/284690041/

jerneju · 2017-06-01T11:59:34Z

Well, additional issue:

nikicc · 2017-06-01T21:40:32Z

Orange/tests/test_util.py

+        """
+        x = np.array([[0], [np.nan], [9]])
+        x = sp.csr_matrix(x)
+        self.assertEqual(stats(x)[0][2], 3.)


This should be 4.5 not 3.
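A quick check on the dense equivalent of the test data, as a sketch: the variable names are made up for illustration. Averaging over the defined values gives 4.5, while treating the missing value as zero gives 3.

```python
import numpy as np

x_dense = np.array([[0], [np.nan], [9]])

# Mean over the defined values only (0 and 9).
mean_defined = np.nanmean(x_dense)

# Mean when the missing value is counted as a zero (0, 0 and 9).
mean_nan_as_zero = np.nan_to_num(x_dense).sum() / x_dense.size
```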

nikicc · 2017-06-01T21:40:36Z

Orange/statistics/util.py


-    n_values = np.prod(x.shape) - np.sum(np.isnan(x.data))
-    return np.nansum(x.data) / n_values
+    x.data = np.nan_to_num(x.data)


nan_to_num converts np.nans to zeros, which causes mean to also treat them as zeros. E.g. for the sparse array of [np.nan, np.nan, 1] this implementation returns 0.33 instead of 1.

What's wrong with the previous implementation?
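The difference can be sketched as follows. `nanmean_previous` copies the two removed lines; `nanmean_nan_to_num` is an assumption about how the new line is used (the converted data is averaged over all entries), since only that one line of the change is shown here.

```python
import numpy as np
import scipy.sparse as sp


def nanmean_previous(x):
    # Removed lines: divide the sum by the number of defined entries.
    n_values = np.prod(x.shape) - np.sum(np.isnan(x.data))
    return np.nansum(x.data) / n_values


def nanmean_nan_to_num(x):
    # Assumed usage of the added line: NaNs become zeros, then a plain mean.
    x = x.copy()
    x.data = np.nan_to_num(x.data)
    return x.mean()


x = sp.csr_matrix(np.array([[np.nan, np.nan, 1.0]]))
previous_result = nanmean_previous(x)
nan_to_num_result = nanmean_nan_to_num(x)
```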

nikicc · 2017-06-01T21:44:09Z

Orange/preprocess/impute.py

+        if not sp.issparse(c):
+            c = np.array(c, copy=True)
+        else:
+            c = c.copy()


Why do we need a copy? Doesn't toarray() already take care of this?
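For reference, a small sketch showing that toarray() returns a fresh dense array, so writing to it leaves the sparse original untouched. The sample column is made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

c = sp.csr_matrix(np.array([[0.0], [np.nan], [9.0]]))

# toarray() allocates a new dense array; no extra c.copy() is needed first.
dense = c.toarray()
dense[2, 0] = -1.0

original_value = c[2, 0]
```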

nikicc · 2017-06-01T21:44:36Z

Orange/preprocess/impute.py

+            c = np.array(c, copy=True)
+        else:
+            c = c.copy()
+            c = c.toarray().flatten()


Should we use ravel instead, which doesn't necessarily make another copy?
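A short sketch of the difference, on a made-up column: flatten always copies, while ravel returns a view when the memory layout allows it, as it does for the array returned by toarray() here.

```python
import numpy as np
import scipy.sparse as sp

c = sp.csr_matrix(np.array([[0.0], [np.nan], [9.0]]))
dense = c.toarray()

flat_copy = dense.flatten()  # always a new array
flat_view = dense.ravel()    # a view of `dense` for this contiguous array

copy_shares = np.shares_memory(dense, flat_copy)
view_shares = np.shares_memory(dense, flat_view)
```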

codecov-io · 2017-06-02T12:27:09Z

Codecov Report

Merging #2357 into master will decrease coverage by 0.03%.
The diff coverage is 85.41%.

@@            Coverage Diff             @@
##           master    #2357      +/-   ##
==========================================
- Coverage   73.41%   73.38%   -0.04%     
==========================================
  Files         317      317              
  Lines       55653    55664      +11     
==========================================
- Hits        40859    40850       -9     
- Misses      14794    14814      +20

nikicc · 2017-06-02T12:51:09Z

Orange/statistics/util.py



-def nanmean(x):
+def nanmean(x, axis=None):


What about:

def nanmean(x, axis=None):
    """ Equivalent of np.nanmean that supports sparse or dense matrices. """
    def nanmean_sparse(x):
        n_values = np.prod(x.shape) - np.sum(np.isnan(x.data))
        return np.nansum(x.data) / n_values

    if not sp.issparse(x):
        return np.nanmean(x, axis=axis)
    if axis is None:
        return nanmean_sparse(x)
    if axis in [0, 1]:
        arr = x if axis == 1 else x.T
        return np.array([nanmean_sparse(row) for row in arr])
    else:
        raise NotImplementedError
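One way to sanity-check this suggestion is to compare it with np.nanmean on the dense equivalent. The sketch below uses a hypothetical `nanmean_checked` with the same logic; it converts to CSR and indexes rows explicitly, which is an assumption made to keep the example independent of how sparse matrices are iterated.

```python
import numpy as np
import scipy.sparse as sp


def nanmean_checked(x, axis=None):
    """Hypothetical re-statement of the suggested nanmean, for checking only."""
    def nanmean_sparse(m):
        # Implicit zeros count as defined values; stored NaNs do not.
        n_values = np.prod(m.shape) - np.sum(np.isnan(m.data))
        return np.nansum(m.data) / n_values

    if not sp.issparse(x):
        return np.nanmean(x, axis=axis)
    if axis is None:
        return nanmean_sparse(x)
    if axis in (0, 1):
        # For axis=0, transpose so that each row is an original column.
        arr = sp.csr_matrix(x if axis == 1 else x.T)
        return np.array([nanmean_sparse(arr[i]) for i in range(arr.shape[0])])
    raise NotImplementedError


dense = np.array([[0.0, np.nan, 9.0],
                  [1.0, 2.0, np.nan]])
sparse = sp.csr_matrix(dense)

overall = nanmean_checked(sparse)
per_column = nanmean_checked(sparse, axis=0)
per_row = nanmean_checked(sparse, axis=1)
```

Rows or columns that contain only missing values are not handled here and would divide by zero.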

Well, I did some speed testing. The results are interesting and are listed below:

Ratio for axis 0 : 1.558
Ratio for axis 1 : 0.664

nikicc · 2017-06-02T12:55:13Z

Orange/preprocess/impute.py

+        if not sp.issparse(c):
+            c = np.array(c, copy=True)
+        else:
+            c = c.toarray().ravel()


What if we take only c.data here, so that we wouldn't need to densify the whole column? Consequently, we would need to set only c.data in L314.
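A minimal sketch of that idea, assuming the missing values are stored explicitly as NaN in c.data. `fill_stored_nans` is a hypothetical helper for illustration, not the code in Orange/preprocess/impute.py.

```python
import numpy as np
import scipy.sparse as sp


def fill_stored_nans(c, value):
    """Hypothetical helper: replace stored NaNs without densifying the column."""
    c = c.copy()                        # leave the caller's matrix untouched
    c.data[np.isnan(c.data)] = value    # only the stored entries are visited
    return c


column = sp.csr_matrix(np.array([[0.0], [np.nan], [9.0]]))
filled = fill_stored_nans(column, 4.5)
filled_values = filled.toarray().ravel()
```

The toarray() call at the end is only there to inspect the result; the replacement itself works on the stored values alone.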

nikicc added the DH2017 label Jun 1, 2017

nikicc suggested changes Jun 1, 2017

nikicc added this to the 3.4.3 milestone Jun 2, 2017

nikicc self-assigned this Jun 2, 2017

nikicc suggested changes Jun 2, 2017

jerneju added 5 commits June 2, 2017 16:30

[FIX] Impute/Stats: sparse support: mean

47de863

preprocess/impute: numpy -> np

66a5249

[FIX] Impute: sparse support: As a distinct value

54664c0

[FIX] Impute: sparse support: just error message

6053ed1

[FIX] Impute/Preprocess: sparse support: Random

cf584f5

nikicc changed the title ~~[WIP][FIX] Impute: sparse~~ [FIX] Impute: sparse Jun 2, 2017

nikicc approved these changes Jun 2, 2017

nikicc merged commit d79e46a into biolab:master Jun 2, 2017

jerneju deleted the sparse-impute branch June 5, 2017 07:12
