[MNT] Fix warnings in widget tests #3502
Codecov Report
@@ Coverage Diff @@
## master #3502 +/- ##
==========================================
+ Coverage 83.67% 83.68% +<.01%
==========================================
Files 370 370
Lines 66176 66293 +117
==========================================
+ Hits 55372 55476 +104
- Misses 10804 10817 +13
Codecov Report
@@ Coverage Diff @@
## master #3502 +/- ##
==========================================
- Coverage 83.59% 83.55% -0.05%
==========================================
Files 367 367
Lines 65570 65751 +181
==========================================
+ Hits 54814 54935 +121
- Misses 10756 10816 +60
  def test_report_widgets_evaluate(self):
      rep = OWReport.get_instance()
-     data = Table("zoo")
+     data = Table("titanic")
This is changed to avoid warning about <4 instances in each fold of CV.
zoo has a class value that appears only 4 times. If CV is run with k=3, everything should be fine.
Running CV with fewer folds would probably be good practice even when it is not needed, since it is just as good in almost all tests and 3x faster.
Switching to titanic instead is probably quite a slowdown, since it has 20x as many instances.
(I'm not sure this makes any difference here.)
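The fold-count argument can be checked with a few lines of plain Python. This is only a sketch: it assumes the warning comes from a stratified splitter that wants every class to have at least k members (which is how scikit-learn's `StratifiedKFold` phrases its warning), and the helper names and the toy labels are made up for illustration, not taken from zoo.

```python
from collections import Counter


def smallest_class_size(class_values):
    """Return the number of instances in the least populated class."""
    return min(Counter(class_values).values())


def folds_are_safe(class_values, k):
    """True if every class has at least k members, so a stratified
    k-fold split can place one instance of each class in every fold."""
    return smallest_class_size(class_values) >= k


# Toy labels (hypothetical): the rarest class appears 4 times.
labels = ["common"] * 40 + ["rare"] * 4

print(folds_are_safe(labels, 3))   # k=3 fits the rarest class
print(folds_are_safe(labels, 10))  # k=10 does not
```

Under that assumption, lowering k in the test keeps the small dataset and avoids the warning, at the cost of touching the CV settings rather than the data.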
is_enabled = self.data is not None and \
    not self.data.is_sparse() and \
    len(self.xy_model) > 2 and len(self.data[self.valid_data]) > 1 \
    and np.all(np.nan_to_num(np.nanstd(self.data.X, 0)) != 0)
At this point, I am starting to dislike filterwarnings more and more.
It may be the simplest solution and acceptable in some tests, but in actual code I think that in most cases it points to problems that we should either fix or explicitly check for, so that the warnings are not triggered. Filtering them out just makes it too easy to end up with bad code and harder to notice problems.
This line specifically is really interesting to take a closer look at.
Why would we trigger warnings in nanstd when the data is all-nan, just to change those nans to zeros and check that there are no zeros (together with checking that no std is zero)? Instead of adding 4 lines to filter these warnings (in the whole function), we can add 1 line with an explicit check that no column is all-nan before computing std, which also makes the code easier to read.
And then, looking at that line, we ask ourselves: why should a single feature with std=0 or all-nan prevent vizrank from being used at all? It would still be useful to score the other, normal features.
Also, the check tests the whole X, which is now wrong for 2 reasons: meta features have been added to the list and should be treated like other features; categorical vars are no longer used in the scatter plot and should be excluded from this check.
Conclusion: I think we should remove this change and fix this in a separate PR.
You're right, I removed this from the commit.
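For reference, the explicit check suggested in the review could look roughly like the sketch below. It is not the code from this PR or from the follow-up: the function name `usable_columns` is hypothetical, and it returns a per-column mask (so that one bad feature does not disable everything) rather than the single boolean used in `is_enabled`.

```python
import numpy as np


def usable_columns(X):
    """Return a boolean mask of columns that are not all-NaN and
    have a nonzero standard deviation (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Explicit check first, so nanstd never sees an all-NaN column
    # and therefore has no reason to emit a RuntimeWarning.
    not_all_nan = ~np.isnan(X).all(axis=0)
    usable = np.zeros(X.shape[1], dtype=bool)
    if not_all_nan.any():
        usable[not_all_nan] = np.nanstd(X[:, not_all_nan], axis=0) != 0
    return usable


X = np.array([[np.nan, 1.0, 2.0],
              [np.nan, 1.0, 3.0]])
print(usable_columns(X))  # all-NaN and constant columns are masked out
```

Whether the widget should require all columns to be usable or only some of them is the open design question raised above; the mask makes either choice a one-liner (`usable.all()` or `usable.any()`).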
-     (self.observed - self.expected) / np.sqrt(self.expected)
+ with np.errstate(divide="ignore", invalid="ignore"):
+     self.residuals = \
+         (self.observed - self.expected) / np.sqrt(self.expected)
Can we avoid dividing by zero instead of ignoring the warnings?
No, it's a vector. I could create a mask, but it would be annoying. It's simpler this way.
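For comparison, the mask-based alternative mentioned in the reply might be sketched as follows, using the `where` and `out` arguments of `np.divide`. The function name `pearson_residuals` is an assumption for illustration, and the behaviour is not identical to the `np.errstate` version: cells with a zero expected value all become NaN here, whereas plain division yields `inf` when the numerator is nonzero.

```python
import numpy as np


def pearson_residuals(observed, expected):
    """(observed - expected) / sqrt(expected), computed only where the
    denominator is positive; other cells are left as NaN.
    Assumes expected values are non-negative."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    root = np.sqrt(expected)
    residuals = np.full_like(root, np.nan)
    # The ufunc skips masked-out positions, so no division by zero happens.
    np.divide(observed - expected, root, out=residuals, where=root > 0)
    return residuals


print(pearson_residuals([4, 0, 3], [1, 0, 3]))
```

The `np.errstate` context manager in the diff is shorter and keeps NumPy's default `inf`/`nan` results, which is the trade-off the author is pointing at.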
…t; replace zoo with titanic to ensure enough instances in each class
…for hierarchical clustering
…eural networks and sgd
Orange/statistics/util.py
def _nan_min_max(x, func, axis=0):
    if not sp.issparse(x):
        return func(x, axis=axis)
    with warnings.catch_warnings():
This still changes the behaviour of nanmin/nanmax compared to numpy's version.
I thought we agreed that for numpy arrays our functions should be equivalent to np.nanmin/np.nanmax.
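One way to pin down "equivalent to np.nanmin/np.nanmax" for dense input is to compare both the values and the warnings a wrapper produces against NumPy's own function. The sketch below is illustrative only: `nanmin_dense` and `same_result_and_warnings` are hypothetical names, not Orange functions, and the sparse branch is left out entirely.

```python
import warnings

import numpy as np


def nanmin_dense(x, axis=0):
    """Hypothetical wrapper that simply delegates to NumPy for arrays."""
    return np.nanmin(x, axis=axis)


def same_result_and_warnings(wrapper, reference, x, axis=0):
    """Check that wrapper and reference agree on values and on the
    categories of the warnings they emit."""
    with warnings.catch_warnings(record=True) as got:
        warnings.simplefilter("always")
        a = wrapper(x, axis=axis)
    with warnings.catch_warnings(record=True) as expected:
        warnings.simplefilter("always")
        b = reference(x, axis=axis)
    same_values = np.allclose(a, b, equal_nan=True)
    same_warnings = ([w.category for w in got]
                     == [w.category for w in expected])
    return same_values and same_warnings


# First column is all-NaN, so np.nanmin emits a RuntimeWarning for it.
x = np.array([[np.nan, 1.0], [np.nan, 2.0]])
print(same_result_and_warnings(nanmin_dense, np.nanmin, x))
```

A wrapper that silences warnings internally would return the same values but fail this check, which is the behavioural difference the comment objects to.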
Issue
Logs are polluted with warnings. Some are relevant and are fixed, and some are expected by tests and should be silenced. Logs are also printed out instead of being tested.
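As an illustration of testing expected warnings and log messages instead of letting them print, the standard library's `assertWarns` and `assertLogs` can be combined as in the sketch below. The logger name, the function `load_value`, and the messages are all made up for the example and do not come from the Orange code base.

```python
import logging
import unittest
import warnings

log = logging.getLogger("example.widget")  # hypothetical logger name


def load_value(value):
    """Toy function that warns and logs when it falls back to a default."""
    if value is None:
        warnings.warn("missing value, using default", UserWarning)
        log.warning("fell back to default value")
        return 0
    return value


class TestLoadValue(unittest.TestCase):
    def test_fallback_warns_and_logs(self):
        # Both context managers capture the output, so nothing is printed,
        # and the test fails if the warning or the log record is missing.
        with self.assertLogs("example.widget", level="WARNING") as logs, \
                self.assertWarns(UserWarning):
            self.assertEqual(load_value(None), 0)
        self.assertEqual(len(logs.records), 1)

    def test_normal_value_is_returned(self):
        self.assertEqual(load_value(5), 5)
```

Asserting on the warning keeps the test log clean and also catches the case where an expected warning silently stops being raised.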
Description of changes
Includes