[FIX] Concatenate: Fix wrong merging of categorical features #4425
VesnaT merged 4 commits into biolab:master
Codecov Report
@@            Coverage Diff             @@
##           master    #4425      +/-   ##
==========================================
- Coverage   87.46%   83.15%   -4.32%
==========================================
  Files         405      268     -137
  Lines       74135    53962   -20173
==========================================
- Hits        64841    44870   -19971
+ Misses       9294     9092     -202
Force-pushed: acde018 to 1424b4f
Orange/data/variable.py
Outdated
        compute_value=compute_value, name=name,
        number_of_decimals=kwargs.pop("number_of_decimals",
                                      self.number_of_decimals),
        **kwargs)
This makes it possible to create combinations of self._number_of_decimals, self.adjust_decimals and self._format_str that the number_of_decimals setter forbids.
Speaking of self.adjust_decimals: why do we need it?
You're right. This should go through the setter. I changed it to:
    def copy(self, compute_value=None, *, name=None, **kwargs):
        number_of_decimals = kwargs.pop("number_of_decimals", None)
        var = super().copy(compute_value=compute_value, name=name, **kwargs)
        if number_of_decimals is not None:
            var.number_of_decimals = number_of_decimals
        else:
            var._number_of_decimals = self._number_of_decimals
            var.adjust_decimals = self.adjust_decimals
            var.format_str = self._format_str
        return var
adjust_decimals is a flag that tells whether the number of decimals was fixed (e.g. through the setter) or is being adjusted to the largest number of decimals (while reading the file). See the function val_from_str_add_cont.
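To make the fixed-versus-adjusting distinction concrete, here is a minimal sketch. `ToyContinuous`, `observe` and `repr_val` are hypothetical names for illustration, not Orange's `ContinuousVariable`, and using a plain boolean for `adjust_decimals` is a simplifying assumption; the real attribute may be encoded differently.

```python
class ToyContinuous:
    """Hypothetical stand-in for a continuous variable; not Orange's class."""

    def __init__(self):
        self._number_of_decimals = 0
        self._format_str = "%.0f"
        self.adjust_decimals = True  # decimals may still grow while reading

    @property
    def number_of_decimals(self):
        return self._number_of_decimals

    @number_of_decimals.setter
    def number_of_decimals(self, n):
        # The setter updates all three attributes together and fixes the count.
        self._number_of_decimals = n
        self._format_str = f"%.{n}f"
        self.adjust_decimals = False

    def observe(self, text):
        """Parse a value; widen the display precision unless it is fixed."""
        if self.adjust_decimals:
            decimals = len(text.partition(".")[2])
            if decimals > self._number_of_decimals:
                self._number_of_decimals = decimals
                self._format_str = f"%.{decimals}f"
        return float(text)

    def copy(self, number_of_decimals=None):
        new = ToyContinuous()
        if number_of_decimals is not None:
            new.number_of_decimals = number_of_decimals  # goes through the setter
        else:
            # Copy an already consistent state attribute by attribute.
            new._number_of_decimals = self._number_of_decimals
            new.adjust_decimals = self.adjust_decimals
            new._format_str = self._format_str
        return new

    def repr_val(self, value):
        return self._format_str % value


var = ToyContinuous()
for raw in ("1.5", "2.25", "3"):
    var.observe(raw)
adjusted = var.copy()                   # still adjusting, two decimals so far
fixed = var.copy(number_of_decimals=1)  # fixed through the setter
```

In this sketch a plain copy keeps the "still adjusting" state, while a copy with an explicit `number_of_decimals` ends up fixed, because only the setter is allowed to change the three attributes together.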
            raise TypeError("values of DiscreteVariables must be strings")

        super().__init__(name, compute_value, sparse=sparse)
        self.values = values
I'd rather make self.values a property and check the consistency of self._value_index and self._value in its setter. It's too easy to end up in an inconsistent state with this API.
True. But this commit is from #4422, on which this PR was based. #4422 was already merged with this code.
I agree that values should be a property and it should also be a tuple, not a list. But if I remember correctly, I already tried to change this once but ran into problems. At any rate, it's not in this PR, but if you'd like to try it, I support it.
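For reference, a rough sketch of what such a property could look like. `ToyDiscrete` and `to_val` are hypothetical and this is not the code in Orange; storing the values as a tuple and rejecting duplicates are assumptions made for the example.

```python
class ToyDiscrete:
    """Hypothetical sketch; not Orange's DiscreteVariable."""

    def __init__(self, name, values=()):
        self.name = name
        self.values = values  # goes through the setter below

    @property
    def values(self):
        return self._values

    @values.setter
    def values(self, new_values):
        new_values = tuple(new_values)
        if not all(isinstance(v, str) for v in new_values):
            raise TypeError("values of DiscreteVariables must be strings")
        if len(set(new_values)) != len(new_values):
            raise ValueError("values must be unique")
        self._values = new_values
        # The index is rebuilt together with the values,
        # so the two cannot drift apart.
        self._value_index = {v: i for i, v in enumerate(new_values)}

    def to_val(self, s):
        return self._value_index[s]


var = ToyDiscrete("color", ["red", "green"])
var.values = ["green", "blue", "red"]
```

Because the stored values are a tuple, callers cannot append to them in place and bypass the index; every change has to go through the setter.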
        """
        if not isinstance(s, str):
            raise TypeError("values of DiscreteVariables must be strings")
        self._value_index[s] = len(self.values)
Since this is a public method, I'd add a check that s is not in self._value_index before inserting a new value.
Same as above. I agree, but this is from #4422, which is already merged.
I have now implemented this in #4450, and fixed some lint issues for good measure.
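The suggested guard could look roughly like the sketch below. `ToyVariable` is a hypothetical class, and raising `ValueError` on a duplicate is an assumed policy for the example; what #4450 actually does is not shown in this thread.

```python
class ToyVariable:
    """Hypothetical sketch of a categorical variable with a value index."""

    def __init__(self, values=()):
        self.values = list(values)
        self._value_index = {v: i for i, v in enumerate(self.values)}

    def add_value(self, s):
        if not isinstance(s, str):
            raise TypeError("values of DiscreteVariables must be strings")
        if s in self._value_index:
            # Without this check the index entry would be overwritten
            # and the value would appear twice in self.values.
            raise ValueError(f"Value {s!r} already exists")
        self._value_index[s] = len(self.values)
        self.values.append(s)


var = ToyVariable(["a"])
var.add_value("b")
try:
    var.add_value("a")
except ValueError as err:
    duplicate_error = str(err)
```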
Orange/widgets/data/owconcatenate.py
Outdated
            (table.domain for table in tables))
        domains = [table.domain for table in tables]
        oper = set.union if self.merge_type == OWConcatenate.MergeUnion \
            else set.intersection
Why pass the oper parameter? It can be set inside self.merge_domains.
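A sketch of that suggestion, with the set operation chosen inside the merging function instead of by the caller. The function below works on plain lists of attribute names; `MERGE_UNION` and `MERGE_INTERSECTION` are hypothetical stand-ins for the widget's constants, and the real `merge_domains` operates on Orange domains, so treat this only as an illustration of the structure.

```python
from functools import reduce

# Hypothetical stand-ins for the widget's merge-type constants.
MERGE_UNION, MERGE_INTERSECTION = 0, 1


def merge_domains(domains, merge_type):
    """Combine attribute names; the set operation is picked here."""
    oper = set.union if merge_type == MERGE_UNION else set.intersection
    selected = reduce(oper, (set(d) for d in domains))
    # Keep the order of first appearance so the result is deterministic.
    ordered = []
    for domain in domains:
        for name in domain:
            if name in selected and name not in ordered:
                ordered.append(name)
    return ordered


domains = [["a", "b", "c"], ["b", "c", "d"]]
union = merge_domains(domains, MERGE_UNION)
common = merge_domains(domains, MERGE_INTERSECTION)
```

The caller then only passes the merge type, and the choice between union and intersection stays in one place.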
Force-pushed: 1424b4f to 5c56b65
Force-pushed: 5c56b65 to c9acdf2
Force-pushed: c9acdf2 to b310fb7
@VesnaT, I added a test that tries different orders of adding signals: 98b0e15#diff-e8327cf214d283cae48e36d934820db9R351. Can you pull the PR and run the test? If it passes: can you modify it to replicate what you're doing in canvas, so that it will fail?
Issue
Fixes #4406. Also fixes part of #4382.
The problem was that, when merging domains, the attributes from the first domain were used, so values of the other tables' attributes that don't appear in the first table were turned into NaNs.
This fix applies only when no "primary table" is present. When it is present, #4422 suffices.
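The effect can be reproduced on plain lists. This is only an illustration of the symptom, assuming values are stored as indices into the attribute's value list; `remap` is a hypothetical helper, not the widget's code.

```python
import math

first_values = ["a", "b"]        # values of the attribute in the first table
second_column = ["b", "c", "c"]  # same-named attribute, data from the second table


def remap(column, target_values):
    """Map labels to indices in target_values; unknown labels become NaN."""
    index = {v: i for i, v in enumerate(target_values)}
    return [float(index[v]) if v in index else math.nan for v in column]


# "c" does not exist in the first table's attribute, so it is lost.
remapped = remap(second_column, first_values)
```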
Description of changes
When filtering for unique attributes, two (or more) attributes with different values are replaced by a new attribute.
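A minimal sketch of the idea, again on plain lists. It assumes the new attribute's values are the union of the originals in order of first appearance; the ordering the PR actually uses may differ, and `merged_values` and `remap` are hypothetical helpers.

```python
def merged_values(*value_lists):
    """Union of value lists, keeping the order of first appearance."""
    merged = []
    for values in value_lists:
        for value in values:
            if value not in merged:
                merged.append(value)
    return merged


def remap(column, target_values):
    """Map labels to their indices in the merged value list."""
    index = {v: i for i, v in enumerate(target_values)}
    return [index[v] for v in column]


values = merged_values(["a", "b"], ["b", "c"])
first = remap(["a", "b"], values)
second = remap(["b", "c", "c"], values)
```

With the merged value list, every label from either table has an index, so nothing has to be turned into NaN.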