[FIX] Edit Domain (and perhaps other widgets) could cause missing data later in the workflow#4922
Conversation
02f4c70 to
1b33b03
Compare
1b33b03 to
6c1c324
Compare
Codecov Report
@@ Coverage Diff @@
## master #4922 +/- ##
==========================================
+ Coverage 84.12% 84.14% +0.02%
==========================================
Files 283 278 -5
Lines 57819 57045 -774
==========================================
- Hits 48640 48001 -639
+ Misses 9179 9044 -135 |
Orange/preprocess/transformation.py
Outdated
|
|
||
| return super().__eq__(other) \ | ||
| and len(self.lookup_table) == len(other.lookup_table) \ | ||
| and all(nan_equal(x, y) or np.isnan(x) and np.isnan(y) |
There was a problem hiding this comment.
nan_equal already handles nans so the second part after or is probably a leftover from before nan_equal was introduced?
| @@ -156,3 +160,17 @@ def transform(self, column): | |||
| column[mask] = 0 | |||
| values = self.lookup_table[column] | |||
There was a problem hiding this comment.
Noticed this in passing - the docs say self.lookup_table can also be a list or tuple, but this looks like it will not work for native python types (indexing with a np.ndarray)
Orange/preprocess/transformation.py
Outdated
| and len(self.lookup_table) == len(other.lookup_table) \ | ||
| and all(nan_equal(x, y) or np.isnan(x) and np.isnan(y) | ||
| for x, y in zip(self.lookup_table, other.lookup_table)) \ | ||
| and nan_equal(self.unknown, other.unknown) |
There was a problem hiding this comment.
I was surprised that we do not have a helper function somewhere in utils already for checking if contents of two iterables are the same that handles nans. Looks like something we would need in several places (maybe we just solve it locally every time :)).
Anyway, how about:
a = np.array(self.lookup_table)
b = np.array(other.lookup_table)
((a == b) | (np.isnan(a) & np.isnan(b))).all()
If the observation holds that the docs are incorrect and lookup_table already has to be a numpy array, then only the final line is needed.
There was a problem hiding this comment.
I changed the docstring: lookup_table must obviously be an array. As for checking, I went for np.allclose(..., ..., equal_nan=True).
| def __eq__(self, other): | ||
| return type(other) is type(self) and self.variable == other.variable |
There was a problem hiding this comment.
How about the default __eq__ for Transformation along these lines:
return type(other) is type(self) and vars(self) == vars(other)Seems that that is usually what we want and it would make a lot of trivial overloads of eq unnecessary?
__hash__ is a bit more tricky since class attributes can often be unhashable objects. We could leave it as it is or try hashing a sorted tuple of vars(self). The latter should work for simple cases like Indicator and at least fails more noticably if it should have been overloaded in a subclass but was not!
There was a problem hiding this comment.
I'm hesitant here. It looks like a good idea, but making the parent class to smart might shoot a derived class in the foot. Somebody could add an attribute without realizing it's used in comparisons. Maybe it's better to be explicit.
There was a problem hiding this comment.
You convinced me at first, but then again, you can make an equivalent claim for the other side:
Somebody could add an attribute without realizing it's NOT used in comparisons.
Maybe I preferred comparing everything (vs the minimum) because it behaves closer to how it was before - if some class doesn't overload eq it might erroneously return False (like up to now), but a True should be correct.
In the end it does not really matter, if something is not working like it should it's a bug to be fixed in either case.
I don't mind merging it with either default.
e58036f to
ba3e1c4
Compare
Issue
Fixes #4895.
Equality of variables also checks equality of their
compute_value. This can only work ifTransformationand derived classes have__eq__and__hash__.To fix #4895, it would suffice to change
oweditdomain.LookupMappingTransform. The PR does it for other classes too, to prevent similar future mishaps.Description of changes
Define
__eq__and__hash__where needed.Any add-ons that don't have will now misbehave in a different way -- before this,
__eq__always returnedFalse, even when transforms were the same. Now it will sometimes returnTrueeven though they may be different. This will happen if the class defines additional fields that are not checked by the base class.The base class can neither check children's extra attributes, neither can it raise an exception (e.g. in a meta class), because many derived classes (even some that define extra attributes) do not need
__eq__.Includes