Added the boxplot outlier#10187
Added the boxplot outlier#10187Emmanuel-Afrifa wants to merge 9 commits intoIDEMSInternational:masterfrom
Conversation
Summary of ChangesHello @Emmanuel-Afrifa, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Boxplot dialog by integrating a new feature that allows users to identify and label outliers directly on their boxplots. The changes encompass both front-end additions, such as new input controls for specifying outlier variables, and robust back-end logic to dynamically construct and execute R code for outlier detection and visualization using popular R packages like Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces the functionality to display boxplot outliers, addressing issue #10126, through new UI controls, R functions, and updated R code generation. Critically, it introduces multiple R injection vulnerabilities by using unquoted user-controlled column names to construct R commands, potentially allowing arbitrary code execution. All column names in R scripts must be properly quoted with backticks. Furthermore, some commented-out code blocks should be addressed for better maintainability.
|
@Emmanuel-Afrifa can you paste the whole R code when you get an error in the future? Just makes it a tiny bit easier to look into. I'm having issues replicating this branch to look into it but that's probably just an internet thing my end. I can try again tomorrow |
|
@lilyclements Sorry about that. Kindly find the code below # Dialog: Boxplot Options
guinea_two_stations <- data_book$get_data_frame(data_name="guinea_two_stations")
outliers <- guinea_two_stations %>% dplyr::group_by(station) %>% dplyr::mutate(is_out=rain %in% grDevices::boxplot.stats(rain, coef=1.5)$out) %>% dplyr::filter(is_out) %>% dplyr::ungroup() %>% dplyr::mutate(station=instatExtras::make_factor(station))
guinea_two_stations <- data_book$get_data_frame(data_name="guinea_two_stations")
last_graph <- ggplot2::ggplot(guinea_two_stations, mapping=ggplot2::aes(y=rain, x=year, fill=station)) + ggplot2::geom_boxplot(outlier.colour="red") + theme_grey() + ggplot2::geom_text(data=outliers, ggplot2::aes(), hjust=-0.2, position=ggplot2::position_nudge(x=0.05), size=3)
data_book$add_object(data_name="guinea_two_stations", object_name="last_graph", object_type_label="graph", object_format="image", object=instatExtras::check_graph(graph_object=last_graph))
data_book$get_object_data(data_name="guinea_two_stations", object_name="last_graph", as_file=TRUE)
rm(list=c("last_graph", "guinea_two_stations", "outliers")) |
|
So your error says the issue is in Your are running: But in the Climatic Outliers dialog, when I have label as So you're just missing something from your (This looks like it is your error as your error says it requires the |
|
@lilyclements Please, I've fixed it. Thank you very much |
|
@Emmanuel-Afrifa great! Is this then ready for review? |
|
@lilyclements Yes, but one of my tests failed. It had something to do with the packages |
|
@Emmanuel-Afrifa don't worry about the pkg failure, I've noticed this on a few PRs. I've just messaged @ChrisMarsh82 in case he knows anything about this |
lilyclements
left a comment
There was a problem hiding this comment.
@Emmanuel-Afrifa very good and impressive! And very quick. Took me a lot of tries to get a bug, and I only managed to find a tiny one on the TestOK which wasn't even introduced by you. But, while you're on this PR, the ucrNudOutliers can be empty and OK is enabled --- this shouldn't be the case. Can you set that if we're under the "Boxplot" option and that ucrNud is empty then OK is disabled?
|
@lilyclements Please, I've updated the code. Please, can you check to see if everything is okay now? Thank you. |
lilyclements
left a comment
There was a problem hiding this comment.
@Emmanuel-Afrifa nice! @MeSophie can you review this as well? I Have a feeling you were somewhat involved in this, but I might be wrong!
|
@Emmanuel-Afrifa Good job! The dialogue is working well even when you change the data and options. The base buttons are also working well. Just a Small design change. Please can you fix this small layout issue with Single/Multiple Variables? Thank you. |
|
@Emmanuel-Afrifa I Fixed the |
|
@MeSophie Well noted. I'll work on it. Thank you. |
|
@Emmanuel-Afrifa how is this coming along? |
|
@lilyclements Oh, sorry... It skipped me. I thought I had already worked on it. I'll do that now. Thank you. |
|
@lilyclements @MeSophie Please, can you check if it's okay now? |
|
@MeSophie Sorry, I've not have not updated this branch with the changes you made to the receivers, yet... |
|
@MeSophie can you check the changes you suggested on the translations is OK now? (I don't know how to check this!) |
|
@MeSophie Please, how did you go about resizing the |
|
@Emmanuel-Afrifa that's a very nice addition. The only minor detail is that the Single Variable/Multiple Variable labels have been moved to the side.
|
@Emmanuel-Afrifa The easy way to fix it is to copy this |
|
@MeSophie Thank you very much, but still copying it from another dialog didn't work, so I manually set the size of the various receivers (single and multiple receiver) and the button in the code. I hope that's fine. @rdstern @lilyclements Please, can you review that all other changes are okay? This is how it looks on my end now
|
There was a problem hiding this comment.
@Emmanuel-Afrifa I almost just approved, but now I find another teeny problem. Can you see it?
Ok (and To Script) should be enabled when you even just have the Y variable.
So, your new "quiz question" is how did I do the dialog below, where it is enabled?
Answer, if I put in just the Y variable it is not enabled. But if I press the Outlier coefficent then it becomes enabled! It would be nice if it were simpler than that, and perhaps there is a problem with the current code for this to be the case?
|
@rdstern Please, I've made some changes |
rdstern
left a comment
There was a problem hiding this comment.
@Emmanuel-Afrifa it looks great now. I am approving
@Patowhiz please could you check, and hopefully merge. Maybe even make small improvements so you can merge, if needed.
Patowhiz
left a comment
There was a problem hiding this comment.
@rdstern have you tested the other options in the dialog? In particular, the Jitter and Violin plot. Would be good to confirm that the dialog size is according to the way you expect.
I have approved this. Once you confirm the above, I'll go ahead and merge.
|
@Emmanuel-Afrifa after @Patowhiz comments I have checked again. Here is my first example with the usual survey data. Note I reduced the outlier coefficient to 1, to get a few more outliers. Now I include the field number and don't get any label. What am I doing wrong?
|
|
@rdstern Well noted. I'll need a bit of @lilyclements's help to resolve this, though. @lilyclements, I believe the issue @rdstern reported may be coming from the snippet below, specifically the computation of the outliers, where the But this is also because, as you pointed out in the original issue, the variable being fed to the
# Dialog: Boxplot
survey <- data_book$get_data_frame(data_name="survey")
outliers <- survey %>% dplyr::group_by() %>% dplyr::mutate(is_out=yield %in% grDevices::boxplot.stats(yield, coef=1.0)$out) %>% dplyr::filter(is_out) %>% dplyr::ungroup() %>% dplyr::mutate()
survey <- data_book$get_data_frame(data_name="survey")
last_graph <- ggplot2::ggplot(survey, mapping=ggplot2::aes(y=yield, x=village)) + ggplot2::geom_boxplot(coef=1.0, outlier.colour="red") + theme_grey() + ggplot2::geom_text(data=outliers, ggplot2::aes(label=field), hjust=-0.2, position=ggplot2::position_nudge(x=0.05), size=3)
data_book$add_object(data_name="survey", object_name="last_graph", object_type_label="graph", object_format="image", object=instatExtras::check_graph(graph_object=last_graph))
data_book$get_object_data(data_name="survey", object_name="last_graph", as_file=TRUE)
rm(list=c("last_graph", "survey", "outliers")) |
|
@Emmanuel-Afrifa in your example there we want to What I am not understanding is why it is giving different answers if I do two factors. I can investigate this, but have a meeting now so didn't know if you'd instead want to dig into it! e.g., run this in R, and you can see that |
|
@Emmanuel-Afrifa interestingly, even running the usual code doesn't give the outlier for that point! So, I think for now we can just say that in the
|
|
@lilyclements So, if I get you correctly, instead of the |
|
@Emmanuel-Afrifa the group_by should take the variables in the two factor receivers (Factor and Second Factor) - if only one is filled, then it takes only one of them. |
|
@lilyclements Please, I've updated the code for the |
|
@Emmanuel-Afrifa great! @rdstern over to you to test |









Fixes #10126
@lilyclements @rdstern @berylwaswa This PR adds the boxplot outlier to the
Boxplotdialog.@lilyclements When I tried testing it with the same data I used to test the

Climatic Boxplotdialog, I had the error message below. Please, could you help clarify what might be the issue? Thank youDeveloper Testing Checklist