[ENH] Box plot: Add 'order by importance' checkbox to groups #4055
Merged
VesnaT merged 1 commit into biolab:master on Nov 15, 2019
Codecov Report

| | master | #4055 | +/- |
|---|---|---|---|
| Coverage | 85.94% | 85.94% | +<.01% |
| Files | 393 | 393 | |
| Lines | 70033 | 70090 | +57 |
| Hits | 60187 | 60240 | +53 |
| Misses | 9846 | 9850 | +4 |
Gosh, I forgot what we decided in the end, so please forgive me if I'm wrong.
@BlazZupan and @lanzagar, please confirm functionality. I'll write/fix tests afterwards.
Comment by @BlazZupan: when stretching bars makes no sense (when there are no groups, or when the grouping variable is the same as the variable shown), bars should not be stretched, and the checkbox should be disabled. This does not belong to this PR and is implemented in #4176.
Box plot shows the distribution of some attribute and, if grouping is enabled, how the distribution of this attribute varies across groups. It thus conveys information about the conditional probability of the target variable given the value of the grouping variable.
If "Order by importance" is checked, the widget computes the chi-square or ANOVA statistic between each target variable and the currently selected grouping variable. This may help the user answer the question "If I divide the data into such and such groups, which is the attribute by which these groups differ most?"
However, since the widget shows the conditional probability of the target given the grouping variable, it might make more sense to sort grouping variables. This will allow the user to set the target (typically the outcome, the class) and see which (grouping) attribute is the most informative about this class. This is currently possible by setting the outcome as the grouping variable and sorting the variables whose distributions we're observing, but in this case the widget shows the wrong conditional probabilities.
Both ways make some sense, but I believe that sorting grouping variables makes more sense. One piece of circumstantial evidence for this: if we sort by variables, we compute a mixture of chi-square and ANOVA p-values and sort them. This is not wrong, since the p-values should be commensurable. If we sort by groups, we compute either chi-square (if the variable is discrete) or ANOVA (if it's numeric) for all grouping variables, because grouping variables are always discrete, so a single test is used throughout.
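To make the "sort by groups" option concrete, here is a minimal sketch of the idea, not the widget's actual code. It assumes SciPy is available; `group_pvalue` and `order_groups` are hypothetical names made up for this illustration, and the toy data is invented. For a fixed target it picks one test (chi-square for a discrete target, one-way ANOVA for a numeric one) and ranks the candidate grouping variables by p-value.

```python
from scipy.stats import chi2_contingency, f_oneway


def group_pvalue(values, groups, discrete):
    """p-value of the association between one target and one grouping variable."""
    levels = sorted(set(groups))
    if discrete:
        # Discrete target: chi-square test on the group-by-value contingency table.
        cats = sorted(set(values))
        table = [[sum(1 for v, g in zip(values, groups) if g == lev and v == cat)
                  for cat in cats]
                 for lev in levels]
        return chi2_contingency(table)[1]
    # Numeric target: one-way ANOVA across the groups.
    samples = [[v for v, g in zip(values, groups) if g == lev] for lev in levels]
    return f_oneway(*samples)[1]


def order_groups(values, candidates, discrete):
    """Names of candidate grouping variables, smallest p-value first."""
    return sorted(candidates,
                  key=lambda name: group_pvalue(values, candidates[name], discrete))


# Toy data (invented): a numeric target and two candidate grouping variables.
target = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
candidates = {
    "noise": ["x", "y", "x", "y", "x", "y", "x", "y"],
    "informative": ["a", "a", "a", "a", "b", "b", "b", "b"],
}
print(order_groups(target, candidates, discrete=False))  # ['informative', 'noise']
```

Because the target is fixed, every candidate is scored with the same kind of test, so no mixing of chi-square and ANOVA p-values is needed within one ordering.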
I changed the widget so that it can be tried out, but haven't thoroughly checked the code yet. I would appreciate some comments before jumping into a change that we might decide to revert the day after tomorrow.
@BlazZupan?