Added the boxplot outlier #10187

Open
Emmanuel-Afrifa wants to merge 9 commits into IDEMSInternational:master from Emmanuel-Afrifa:ordinary-boxplot-outlier

@Emmanuel-Afrifa
Collaborator

@Emmanuel-Afrifa Emmanuel-Afrifa commented Jan 21, 2026

Fixes #10126
@lilyclements @rdstern @berylwaswa This PR adds the boxplot outlier to the Boxplot dialog.

@lilyclements When I tried testing it with the same data I used to test the Climatic Boxplot dialog, I had the error message below. Please, could you help clarify what might be the issue? Thank you
[image: for lil]

Developer Testing Checklist

  • Runs without errors
  • OK disabled when dialog is incomplete or invalid
  • OK enabled only when required inputs are valid
  • Reset returns dialog to its default/sensible state
  • Invalid inputs are handled cleanly (e.g. negative, too-large, empty, impossible combos)
  • Running twice with different settings behaves consistently (e.g., open → run → close → reopen → change options checked → run again)
  • All AI/bot comments addressed (fixed, intentionally ignored with explanation, or queried)

@gemini-code-assist
Contributor

Summary of Changes

Hello @Emmanuel-Afrifa, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Boxplot dialog by integrating a new feature that allows users to identify and label outliers directly on their boxplots. The changes encompass both front-end additions, such as new input controls for specifying outlier variables, and robust back-end logic to dynamically construct and execute R code for outlier detection and visualization using popular R packages like dplyr and ggplot2. This provides users with more detailed insights into their data distributions directly within the application.

Highlights

  • Outlier Labeling Feature: Introduced the ability to label outliers directly on boxplots within the Boxplot dialog, providing enhanced data visualization capabilities.
  • New UI Controls: Added a 'Variable' label and a corresponding receiver control (ucrReceiverLabelOutliers) to the dialog, allowing users to select the specific variable whose outliers they wish to label.
  • Dynamic R Code Generation: Implemented extensive R code generation logic utilizing dplyr and ggplot2 functions (e.g., group_by, mutate, filter, geom_text) to dynamically identify and plot outliers based on user-defined selections and coefficients.
  • UI Layout Adjustments: Modified the layout, positioning, and sizing of various existing UI controls within the Boxplot dialog to seamlessly integrate the new outlier features and ensure proper display, potentially including improvements for DPI scaling.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the functionality to display boxplot outliers, addressing issue #10126, through new UI controls, R functions, and updated R code generation. Critically, it introduces multiple R injection vulnerabilities by using unquoted user-controlled column names to construct R commands, potentially allowing arbitrary code execution. All column names in R scripts must be properly quoted with backticks. Furthermore, some commented-out code blocks should be addressed for better maintainability.
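
The backtick-quoting point above can be illustrated with a small R sketch. It is not the code from this PR, since the dialog assembles its R commands elsewhere. `quote_column` and `build_mean_call` are hypothetical helpers, and stripping embedded backticks is a simplifying assumption rather than a complete sanitiser.

```r
# Hypothetical helper: wrap a column name in backticks so that names with
# spaces or special characters are treated as a single symbol.
# Assumption: embedded backticks are simply removed.
quote_column <- function(name) {
  paste0("`", gsub("`", "", name, fixed = TRUE), "`")
}

# Hypothetical helper: build an R command string that refers to a column.
build_mean_call <- function(data_name, column) {
  paste0("mean(", data_name, "$", quote_column(column), ")")
}

# A column name containing a space would not parse if left unquoted.
toy <- data.frame(`max temp` = c(20, 22, 24), check.names = FALSE)

cmd <- build_mean_call("toy", "max temp")
result <- eval(parse(text = cmd))
```

Here `cmd` is the string "mean(toy$`max temp`)", which parses and evaluates, whereas the same command without backticks would fail at the space.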

@lilyclements
Contributor

@Emmanuel-Afrifa can you paste the whole R code when you get an error in the future? It just makes it a tiny bit easier to look into. I'm having issues replicating this branch to look into it, but that's probably just an internet thing on my end. I can try again tomorrow.

@Emmanuel-Afrifa
Collaborator Author

@lilyclements Sorry about that. Kindly find the code below

# Dialog: Boxplot Options

guinea_two_stations <- data_book$get_data_frame(data_name="guinea_two_stations")
outliers <- guinea_two_stations %>% dplyr::group_by(station) %>% dplyr::mutate(is_out=rain %in% grDevices::boxplot.stats(rain, coef=1.5)$out) %>% dplyr::filter(is_out) %>% dplyr::ungroup() %>% dplyr::mutate(station=instatExtras::make_factor(station))

guinea_two_stations <- data_book$get_data_frame(data_name="guinea_two_stations")
last_graph <- ggplot2::ggplot(guinea_two_stations, mapping=ggplot2::aes(y=rain, x=year, fill=station)) + ggplot2::geom_boxplot(outlier.colour="red") + theme_grey() + ggplot2::geom_text(data=outliers, ggplot2::aes(), hjust=-0.2, position=ggplot2::position_nudge(x=0.05), size=3)
data_book$add_object(data_name="guinea_two_stations", object_name="last_graph", object_type_label="graph", object_format="image", object=instatExtras::check_graph(graph_object=last_graph))
data_book$get_object_data(data_name="guinea_two_stations", object_name="last_graph", as_file=TRUE)
rm(list=c("last_graph", "guinea_two_stations", "outliers"))

@lilyclements
Contributor

So your error says the issue is in geom_text, so let's compare your geom_text here to where we know it works (your climatic boxplot code)

You are running:

ggplot2::geom_text(data=outliers, ggplot2::aes(), hjust=-0.2, position=ggplot2::position_nudge(x=0.05), size=3)

But in the Climatic Outliers dialog, when I have label as station_id, it runs this:

ggplot2::geom_text(data=outliers, ggplot2::aes(label=station_id), hjust=-0.2, position=ggplot2::position_nudge(x=0.05), size=3)

So you're just missing something from your aes() arguments.

(This looks like it is your error as your error says it requires the label aesthetic)

@Emmanuel-Afrifa
Collaborator Author

@lilyclements Please, I've fixed it. Thank you very much

@lilyclements
Contributor

@Emmanuel-Afrifa great! Is this then ready for review?

@Emmanuel-Afrifa
Collaborator Author

@lilyclements Yes, but one of my tests failed. It had something to do with the packages

@lilyclements
Contributor

@Emmanuel-Afrifa don't worry about the pkg failure, I've noticed this on a few PRs. I've just messaged @ChrisMarsh82 in case he knows anything about this

Contributor

@lilyclements lilyclements left a comment

@Emmanuel-Afrifa very good and impressive! And very quick. Took me a lot of tries to get a bug, and I only managed to find a tiny one on the TestOK which wasn't even introduced by you. But, while you're on this PR, the ucrNudOutliers can be empty and OK is enabled --- this shouldn't be the case. Can you set that if we're under the "Boxplot" option and that ucrNud is empty then OK is disabled?

@Emmanuel-Afrifa
Collaborator Author

Emmanuel-Afrifa commented Jan 22, 2026

@lilyclements Please, I've updated the code. Please, can you check to see if everything is okay now? Thank you.

lilyclements previously approved these changes Jan 22, 2026
Contributor

@lilyclements lilyclements left a comment

@Emmanuel-Afrifa nice! @MeSophie can you review this as well? I have a feeling you were somewhat involved in this, but I might be wrong!

@MeSophie
Contributor

@Emmanuel-Afrifa Good job! The dialogue is working well even when you change the data and options. The base buttons are also working well. Just a small design change: please can you fix this small layout issue with Single/Multiple Variables? Thank you.
image

@MeSophie
Contributor

MeSophie commented Jan 23, 2026

@Emmanuel-Afrifa I fixed the Outlier Coefficient translation issue in Boxplot dialogs on PR #10193.
image
As you can see, the French text is too long. Could you also change the location of ucrNudOutlierCoefficient to 302; 333 to avoid the overlapping issue in the layout? Thank you.

@Emmanuel-Afrifa
Collaborator Author

@MeSophie Well noted. I'll work on it. Thank you.

@lilyclements
Contributor

@Emmanuel-Afrifa how is this coming along?

@Emmanuel-Afrifa
Collaborator Author

@lilyclements Oh, sorry... It slipped my mind. I thought I had already worked on it. I'll do that now. Thank you.

@Emmanuel-Afrifa
Collaborator Author

@lilyclements @MeSophie Please, can you check if it's okay now?

@Emmanuel-Afrifa
Collaborator Author

@MeSophie Sorry, I have not yet updated this branch with the changes you made to the receivers.

@lilyclements
Contributor

lilyclements commented Feb 10, 2026

@MeSophie can you check that the changes you suggested on the translations are OK now? (I don't know how to check this!)

lilyclements previously approved these changes Feb 10, 2026
@Emmanuel-Afrifa
Collaborator Author

@MeSophie Please, how did you go about resizing the ucrVariableAsFactor control?

@rdstern
Collaborator

rdstern commented Feb 10, 2026

@Emmanuel-Afrifa that's a very nice addition. The only minor detail is that the Single Variable/Multiple Variable labels have been moved to the side.

image

@MeSophie
Contributor

@MeSophie Please, how did you go about resizing the ucrVariableAsFactor control?

@Emmanuel-Afrifa The easy way to fix it is to copy the ucrVariableAsFactor from another dialog, rename it, and delete the previous one. You can keep the same size and, of course, the same location. I tried it and it works fine. I also tried enlarging the control, and it enlarges fine, but still without the right borders. I don't know what is causing this problem, and it only appears on this branch.

@Emmanuel-Afrifa
Collaborator Author

@MeSophie Thank you very much, but still copying it from another dialog didn't work, so I manually set the size of the various receivers (single and multiple receiver) and the button in the code. I hope that's fine.

@rdstern @lilyclements Please, can you review that all other changes are okay?

This is how it looks on my end now

bxx1 bxx2

Collaborator

@rdstern rdstern left a comment

@Emmanuel-Afrifa I almost just approved, but now I find another teeny problem. Can you see it?

Image

Ok (and To Script) should be enabled when you even just have the Y variable.

So, your new "quiz question" is how did I do the dialog below, where it is enabled?

Image

Answer: if I put in just the Y variable, it is not enabled. But if I press the Outlier coefficient, then it becomes enabled! It would be nice if it were simpler than that, and perhaps there is a problem with the current code for this to be the case?

@Emmanuel-Afrifa
Collaborator Author

Emmanuel-Afrifa commented Feb 11, 2026

@rdstern Please, I've made some changes to the OK test logic. Please, can you check that it behaves as expected now? Thank you.

Collaborator

@rdstern rdstern left a comment

@Emmanuel-Afrifa it looks great now. I am approving
@Patowhiz please could you check, and hopefully merge. Maybe even make small improvements so you can merge, if needed.

Patowhiz previously approved these changes Feb 16, 2026
Contributor

@Patowhiz Patowhiz left a comment

@rdstern have you tested the other options in the dialog? In particular, the Jitter and Violin plot. Would be good to confirm that the dialog size is according to the way you expect.

I have approved this. Once you confirm the above, I'll go ahead and merge.

@rdstern
Collaborator

rdstern commented Feb 16, 2026

@Emmanuel-Afrifa after @Patowhiz's comments I have checked again. Here is my first example with the usual survey data.
image

Note I reduced the outlier coefficient to 1, to get a few more outliers.

Now I include the field number and don't get any label. What am I doing wrong?

image

Collaborator

@rdstern rdstern left a comment

Reported a problem, so not approving anymore!

@Emmanuel-Afrifa
Collaborator Author

@rdstern Well noted. I'll need a bit of @lilyclements's help to resolve this, though.

@lilyclements, I believe the issue @rdstern reported may be coming from the snippet below, specifically the computation of the outliers, where the group_by() function is empty.

But this is also because, as you pointed out in the original issue, the variable being fed to the group_by() function and also to the mutate() function comes from the Second Factor receiver, which wasn't used here and thus explains why they're actually empty. Please, how could this be resolved? Thank you.

Image
# Dialog: Boxplot

survey <- data_book$get_data_frame(data_name="survey")
outliers <- survey %>% dplyr::group_by() %>% dplyr::mutate(is_out=yield %in% grDevices::boxplot.stats(yield, coef=1.0)$out) %>% dplyr::filter(is_out) %>% dplyr::ungroup() %>% dplyr::mutate()

survey <- data_book$get_data_frame(data_name="survey")
last_graph <- ggplot2::ggplot(survey, mapping=ggplot2::aes(y=yield, x=village)) + ggplot2::geom_boxplot(coef=1.0, outlier.colour="red") + theme_grey() + ggplot2::geom_text(data=outliers, ggplot2::aes(label=field), hjust=-0.2, position=ggplot2::position_nudge(x=0.05), size=3)
data_book$add_object(data_name="survey", object_name="last_graph", object_type_label="graph", object_format="image", object=instatExtras::check_graph(graph_object=last_graph))
data_book$get_object_data(data_name="survey", object_name="last_graph", as_file=TRUE)
rm(list=c("last_graph", "survey", "outliers"))

@lilyclements
Contributor

@Emmanuel-Afrifa in your example there we want to group_by village because village is a factor.

What I am not understanding is why it is giving different answers if I do two factors. I can investigate this, but have a meeting now so didn't know if you'd instead want to dig into it!

e.g., run this in R, and you can see that outliers doesn't give Kesen, but our plot does!

survey <- data_book$get_data_frame(data_name="survey")
outliers <- survey %>% dplyr::group_by(village, variety) %>%
  dplyr::mutate(is_out=yield %in% grDevices::boxplot.stats(yield, coef=1.0)$out) %>%
  dplyr::filter(is_out) %>%
  dplyr::ungroup()
outliers
survey <- data_book$get_data_frame(data_name="survey")
ggplot2::ggplot(survey, mapping=ggplot2::aes(y=yield, x=village, fill = variety)) +
  ggplot2::geom_boxplot(coef=1.0, outlier.colour="red") + theme_grey() +
  ggplot2::geom_text(data=outliers, ggplot2::aes(label=field), hjust=-0.2,
                     position=ggplot2::position_nudge(x=0.05), size=3)

@lilyclements
Contributor

@Emmanuel-Afrifa interestingly, even running the usual code doesn't give the outlier for that point! So, I think for now we can just say that in the group_by we have the variables in the factor receivers.

survey <- data_book$get_data_frame(data_name="survey")
ggplot2::ggplot(survey, mapping=ggplot2::aes(y=yield, x=village, fill = variety)) +
  ggplot2::geom_boxplot(coef=1.0, outlier.colour="red") + theme_grey() +
  ggplot2::stat_summary(aes(label=round(ggplot2::after_stat(y), 1)), geom="text", fun=\ (y) { o <- grDevices::boxplot.stats(y, coef=1) $ out ; if(length(o) == 0)  NaN else o } , hjust=-0.2)
image

@Emmanuel-Afrifa
Collaborator Author

Emmanuel-Afrifa commented Feb 27, 2026

@lilyclements So, if I get you correctly, instead of the group_by() function taking only the variable in the second factor receiver, it should take that of the first factor receiver too? That's if both factor receivers are filled, then group_by() gets two variables and if only one of them is filled, then group_by() gets only that? And I suppose the order should be the first factor variable and the second factor variable.

@lilyclements
Contributor

@Emmanuel-Afrifa the group_by should take the variables in the two factor receivers (Factor and Second Factor) - if only one is filled, then it takes only one of them.
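
A minimal sketch of the rule described above, assuming dplyr is installed. `find_outliers`, its arguments, and the toy data are hypothetical and only illustrate grouping by whichever factor columns are filled; this is not the code generated by the dialog.

```r
library(dplyr)

# Hypothetical helper: flag boxplot outliers of `value_col` within the groups
# defined by the supplied factor columns (none, one, or two of them).
find_outliers <- function(data, value_col, group_cols = character(0), coef = 1.5) {
  # Only group when at least one factor column has been supplied.
  if (length(group_cols) > 0) {
    data <- dplyr::group_by(data, dplyr::across(dplyr::all_of(group_cols)))
  }
  data %>%
    dplyr::mutate(is_out = .data[[value_col]] %in%
                    grDevices::boxplot.stats(.data[[value_col]], coef = coef)$out) %>%
    dplyr::filter(is_out) %>%
    dplyr::ungroup()
}

# Toy data (made up for illustration): two villages with different yield levels.
toy <- data.frame(
  field   = 1:12,
  village = rep(c("A", "B"), each = 6),
  yield   = c(10, 11, 12, 11, 10, 16, 30, 31, 32, 31, 30, 31)
)

grouped   <- find_outliers(toy, "yield", group_cols = "village")
ungrouped <- find_outliers(toy, "yield")
```

On this toy data the grouped call flags one row (field 6), while the ungrouped call flags none. That is consistent with labels going missing when group_by() is left empty but the boxplot itself is drawn per factor level.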

@Emmanuel-Afrifa
Collaborator Author

@lilyclements Please, I've updated the code for the group_by() function to take in the variables in the two factor receivers

@lilyclements
Contributor

@Emmanuel-Afrifa great! @rdstern over to you to test

Development

Successfully merging this pull request may close these issues.

Label outliers also in the ordinary boxplot

5 participants