Description
We need to reproduce SAS boxplot output, where percentiles use SAS's default PCTLDEF = 5. That definition corresponds to R's stats::quantile(type = 2). Currently, geom_boxplot()/stat_boxplot() compute the hinges and median with R's default quantile definition (type = 7), and there is no way to change it. This makes it impossible to match SAS results out of the box.
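To check the PCTLDEF = 5 / type = 2 correspondence directly, here is a small sketch that implements the PCTLDEF = 5 rule as described in the SAS documentation and compares it with quantile(type = 2). Write np = j + g with j the integer part and g the fractional part. If g > 0, take the order statistic x(j+1). If g = 0, average x(j) and x(j+1). At p = 0 and p = 1, return the minimum and maximum. The helper name pctldef5() is hypothetical, and the tolerance used to decide whether g = 0 is my own choice.

```r
# Hypothetical helper: SAS PCTLDEF = 5 ("empirical distribution function
# with averaging"), written out by hand for comparison with R's type 2.
pctldef5 <- function(x, probs) {
  x <- sort(x)
  n <- length(x)
  fuzz <- 4 * .Machine$double.eps  # tolerance for n * p landing on an integer
  vapply(probs, function(p) {
    if (p <= 0) return(x[1])       # minimum
    if (p >= 1) return(x[n])       # maximum
    np <- n * p
    j <- floor(np + fuzz)          # integer part of n * p
    g <- np - j                    # fractional part of n * p
    if (g > fuzz) {
      x[min(j + 1, n)]                        # g > 0: next order statistic
    } else {
      (x[max(j, 1)] + x[min(j + 1, n)]) / 2   # g = 0: average the neighbours
    }
  }, numeric(1))
}

y <- c(1, 2, 100, 101)
qs <- c(0, 0.25, 0.5, 0.75, 1)
pctldef5(y, qs)  # values: 1, 1.5, 51, 100.5, 101
all.equal(pctldef5(y, qs), quantile(y, qs, type = 2, names = FALSE))  # TRUE
```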
I created a fork with a minimal fix that adds a quantile_type parameter (default 7) to stat_boxplot() (and forwards it from geom_boxplot()), so users can request type = 2 when they need SAS parity.
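The following sketch shows where such a parameter would act. It is a minimal version of the per-group box statistics with the quantile type exposed. The helper box_stats() is hypothetical and is neither ggplot2's code nor the fork's. It assumes unweighted data and mirrors, as far as I understand it, the unweighted path of stat_boxplot(). That path computes five quantiles and flags points beyond coef × IQR from the hinges as outliers. When outliers exist, it pulls the whiskers in to the most extreme non-outlying points.

```r
# Hypothetical helper (not ggplot2's or the fork's actual code): sketch of
# the unweighted box statistics for one group, with the quantile type exposed.
# Weighted data is not handled here.
box_stats <- function(y, coef = 1.5, quantile_type = 7) {
  y <- y[!is.na(y)]
  qs <- c(0, 0.25, 0.5, 0.75, 1)
  stats <- stats::quantile(y, qs, type = quantile_type, names = FALSE)
  names(stats) <- c("ymin", "lower", "middle", "upper", "ymax")
  iqr <- stats[["upper"]] - stats[["lower"]]

  # Tukey fences: points more than coef * IQR beyond the hinges are outliers
  outliers <- y < stats[["lower"]] - coef * iqr | y > stats[["upper"]] + coef * iqr
  if (any(outliers)) {
    # Whiskers shrink to the most extreme non-outlying observations
    stats[c("ymin", "ymax")] <- range(c(stats[2:4], y[!outliers]))
  }
  list(stats = stats, outliers = y[outliers])
}

# Same data as the reprex in this issue
box_stats(c(1, 2, 100, 101), quantile_type = 7)$stats  # 1, 1.75, 51, 100.25, 101
box_stats(c(1, 2, 100, 101), quantile_type = 2)$stats  # 1, 1.5, 51, 100.5, 101
```

With the default quantile_type = 7, the sketch reproduces the current ggplot2 values from the reprex. Switching to 2 gives the SAS-like values.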
Although this change is small and narrowly scoped, it directly affects which points are classified as outliers. By default, points more than 1.5 × IQR beyond the hinges are drawn as outliers, and the hinges themselves depend on the quantile definition. This can produce noticeably different boxplots, especially for small samples. Since we're actively encouraging SAS users to adopt R/ggplot2, supporting SAS's default percentile definition (PCTLDEF = 5 → R type = 2) would be a valuable, low-risk quality-of-life improvement.
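As a worked illustration of the outlier effect, consider a hypothetical sample chosen for this purpose: y = (1, 2, 3, 4, 5, 6, 7, 12), with n = 8 and ggplot2's default coef = 1.5. Only the upper fence matters here. Under type 7, Q1 and Q3 are interpolated at h = (n − 1)p + 1. Under type 2, np = 2 and np = 6 are integers, so neighbouring order statistics are averaged:

$$
\begin{aligned}
\text{type 7:}\quad & Q_1 = 2.75,\; Q_3 = 6.25,\; \mathrm{IQR} = 3.5,\; Q_3 + 1.5\,\mathrm{IQR} = 11.5 < 12 \;\Rightarrow\; 12 \text{ is an outlier} \\
\text{type 2:}\quad & Q_1 = \tfrac{2 + 3}{2} = 2.5,\; Q_3 = \tfrac{6 + 7}{2} = 6.5,\; \mathrm{IQR} = 4,\; Q_3 + 1.5\,\mathrm{IQR} = 12.5 > 12 \;\Rightarrow\; \text{no outliers}
\end{aligned}
$$

With type 7, the upper whisker stops at 7 and 12 is drawn as a separate point. With type 2, the whisker extends to 12.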
Minimal reprex
The first reprex shows the mismatch between the current ggplot2 output (implicitly type = 7) and the SAS default (type = 2) on a small sample.
```r
library(ggplot2)
x <- data.frame(group = "A", y = c(1, 2, 100, 101))
qs <- c(0, 0.25, 0.5, 0.75, 1)
# What SAS default (PCTLDEF = 5) corresponds to in R
sas_like <- as.numeric(quantile(x$y, probs = qs, type = 2))
names(sas_like) <- c("ymin", "lower", "middle", "upper", "ymax")
# Current ggplot2 (implicitly type = 7)
p <- ggplot(x, aes(group, y)) + geom_boxplot()
built <- ggplot_build(p)$data[[1]]
current <- unlist(built[c("ymin", "lower", "middle", "upper", "ymax")])
list(current = current, sas_like = sas_like)
#> $current
#> ymin lower middle upper ymax
#> 1.00 1.75 51.00 100.25 101.00
#>
#> $sas_like
#> ymin lower middle upper ymax
#> 1.0 1.5 51.0 100.5 101.0
```

Created on 2026-03-12 with reprex v2.1.1
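Working the quartiles out by hand for the sorted sample (1, 2, 100, 101), with n = 4, shows where the two sets of numbers come from. Type 7 interpolates at h = (n − 1)p + 1. Type 2 writes np = j + g and averages the two neighbouring order statistics when g = 0:

$$
\begin{aligned}
p = 0.25:\quad & \text{type 7: } h = 1.75,\; Q = 1 + 0.75\,(2 - 1) = 1.75; && \text{type 2: } np = 1,\; Q = \tfrac{1 + 2}{2} = 1.5 \\
p = 0.50:\quad & \text{type 7: } h = 2.5,\; Q = 2 + 0.5\,(100 - 2) = 51; && \text{type 2: } np = 2,\; Q = \tfrac{2 + 100}{2} = 51 \\
p = 0.75:\quad & \text{type 7: } h = 3.25,\; Q = 100 + 0.25\,(101 - 100) = 100.25; && \text{type 2: } np = 3,\; Q = \tfrac{100 + 101}{2} = 100.5
\end{aligned}
$$

The two definitions agree on the median but disagree on both hinges. No point lies beyond the fences, so the whiskers are 1 and 101 in both cases.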
The second reprex shows how my fork resolves this with quantile_type = 2:
```r
# remotes::install_github("munztd0/ggplot2", ref = "feature/boxplot-quantile-type")
library(ggplot2)
x <- data.frame(group = "A", y = c(1, 2, 100, 101))
qs <- c(0, 0.25, 0.5, 0.75, 1)
# What SAS default (PCTLDEF = 5) corresponds to in R
sas_like <- as.numeric(quantile(x$y, probs = qs, type = 2))
names(sas_like) <- c("ymin", "lower", "middle", "upper", "ymax")
p2 <- ggplot(x, aes(group, y)) + geom_boxplot(quantile_type = 2)
built2 <- ggplot_build(p2)$data[[1]]
with_param <- unlist(built2[c("ymin", "lower", "middle", "upper", "ymax")])
all.equal(with_param[c("lower", "middle", "upper")], sas_like[c("lower", "middle", "upper")])
#> [1] TRUE
```

Created on 2026-03-12 with reprex v2.1.1