[FIX] Fix classification trees for data with repeated feature values #6488
janezd merged 2 commits into biolab:master
Codecov Report
Coverage diff, master vs. #6488:

            master    #6488    +/-
Coverage    87.66%   87.66%
Files          321      321
Lines        69374    69374
Hits         60817    60817
Misses        8557     8557
Hmm, the trees are built differently. When the test fails, I see the following splitting process. And then my debugging code crashed because the petal_width split does not have a [...]. Furthermore, I see different scores for some other attributes (sepal length for the first and the second split).
During debugging, I saw that find_threshold_entropy skipped computing too many entropies: it skipped a candidate threshold if the next class value was the same or the next feature value was the same. That was a problem when feature and class values could both (interchangeably) repeat.
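The effect described in this comment can be reproduced with a small pure-Python sketch. This is not Orange's implementation: `entropy`, `best_threshold`, and the `skip_same_class` flag are hypothetical names introduced here only for illustration. The sketch assumes that a threshold is only meaningful between two distinct feature values, so skipping equal neighbouring feature values is safe, while additionally skipping on equal neighbouring class values can drop a valid cut when feature values repeat.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)


def best_threshold(xs, ys, n_classes, skip_same_class=False):
    """Return (information gain, threshold) of the best binary split, or None.

    Hypothetical illustration, not Orange's code. With skip_same_class=True
    the loop also skips candidates whose neighbouring class values are equal,
    which mimics the over-eager shortcut described in the comment above.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])  # stable sort by feature
    xs = [xs[i] for i in order]
    ys = [ys[i] for i in order]
    n = len(xs)

    total = [0] * n_classes
    for y in ys:
        total[y] += 1
    base = entropy(total)

    left = [0] * n_classes  # class counts of instances at or left of position i
    best = None
    for i in range(n - 1):
        left[ys[i]] += 1
        if xs[i] == xs[i + 1]:
            continue  # no threshold can separate equal feature values
        if skip_same_class and ys[i] == ys[i + 1]:
            continue  # unsafe when feature values repeat
        right = [t - l for t, l in zip(total, left)]
        n_left = i + 1
        gain = base - (n_left / n * entropy(left)
                       + (n - n_left) / n * entropy(right))
        threshold = (xs[i] + xs[i + 1]) / 2
        if best is None or gain > best[0]:
            best = (gain, threshold)
    return best


# Feature value 1 carries classes 1, 1, 0; feature value 2 carries only class 0.
xs = [1, 1, 1, 2, 2, 2]
ys = [1, 1, 0, 0, 0, 0]
print(best_threshold(xs, ys, 2, skip_same_class=True))   # None: the only valid cut is skipped
print(best_threshold(xs, ys, 2, skip_same_class=False))  # finds the cut at threshold 1.5
```

In this example the only place where the feature value changes happens to sit between two instances of the same class, so the shortcut finds no split at all. Which instances end up adjacent depends on how ties among equal feature values are ordered, so the result of the shortcut can also change with the sort order.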
Issue
I started seeing this in the GitHub tests (only on Ubuntu).
Description of changes
The bug ran deeper. See my last comment:
find_threshold_entropy skipped computing too many entropies.
Includes