Allow choosing quantile definition in boxplots to match SAS default (PCTLDEF = 5, R type = 2) #6819

@munoztd0

We need to reproduce SAS boxplot output where percentiles use SAS’s default PCTLDEF = 5, which corresponds to R’s stats::quantile(type = 2). Currently, geom_boxplot()/stat_boxplot() use R’s default quantile definition (type = 7) for hinges/median, and there is no way to change it. This makes it impossible to match SAS results out-of-the-box.

I created a fork with a minimal fix that adds a quantile_type parameter (default 7) to stat_boxplot() (and forwards it from geom_boxplot()), so users can request type = 2 when they need SAS parity.

Although this change is technically small and narrowly scoped, it directly affects the hinges and therefore which points are classified as outliers, so it can produce visibly different boxplots. Since we’re actively encouraging SAS users to adopt R/ggplot2, supporting SAS’s default percentile definition (PCTLDEF = 5 → R type = 2) would be a valuable, low-risk quality-of-life improvement.
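
To illustrate the outlier point, here is a rough base-R sketch. It assumes the usual 1.5 × IQR fence rule, and `flag_outliers()` is a hypothetical helper written for this example, not a ggplot2 function. On this small sample the value 9 is flagged under type 7 but not under type 2:

```r
# Hypothetical helper (not part of ggplot2): flag points outside the
# coef * IQR fences, using a selectable quantile definition.
flag_outliers <- function(y, type = 7, coef = 1.5) {
  hinges <- quantile(y, probs = c(0.25, 0.75), type = type, names = FALSE)
  iqr <- diff(hinges)
  y < hinges[1] - coef * iqr | y > hinges[2] + coef * iqr
}

y <- c(1, 2, 3, 4, 5, 9)

# type 7: hinges 2.25 and 4.75, upper fence 8.5, so 9 is flagged
outliers_type7 <- y[flag_outliers(y, type = 7)]

# type 2: hinges 2 and 5, upper fence 9.5, so nothing is flagged
outliers_type2 <- y[flag_outliers(y, type = 2)]
```

The same data therefore yields one outlier point with the R default and none with the SAS-like definition.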

Minimal reprex

This one shows the mismatch between current ggplot2 output (implicitly type = 7) and SAS default (type = 2) on a small sample.

library(ggplot2)

x <- data.frame(group = "A", y = c(1, 2, 100, 101))
qs <- c(0, 0.25, 0.5, 0.75, 1)

# What SAS default (PCTLDEF = 5) corresponds to in R
sas_like <- as.numeric(quantile(x$y, probs = qs, type = 2))
names(sas_like) <- c("ymin", "lower", "middle", "upper", "ymax")

# Current ggplot2 (implicitly type = 7)
p <- ggplot(x, aes(group, y)) + geom_boxplot()
built <- ggplot_build(p)$data[[1]]
current <- unlist(built[c("ymin", "lower", "middle", "upper", "ymax")])

list(current = current, sas_like = sas_like)
#> $current
#>   ymin  lower middle  upper   ymax 
#>   1.00   1.75  51.00 100.25 101.00 
#> 
#> $sas_like
#>   ymin  lower middle  upper   ymax 
#>    1.0    1.5   51.0  100.5  101.0

Created on 2026-03-12 with reprex v2.1.1

And this one shows how my fork resolves it via quantile_type = 2:

# remotes::install_github("munoztd0/ggplot2", ref = "feature/boxplot-quantile-type")
library(ggplot2)

x <- data.frame(group = "A", y = c(1, 2, 100, 101))
qs <- c(0, 0.25, 0.5, 0.75, 1)

# What SAS default (PCTLDEF = 5) corresponds to in R
sas_like <- as.numeric(quantile(x$y, probs = qs, type = 2))
names(sas_like) <- c("ymin", "lower", "middle", "upper", "ymax")
p2 <- ggplot(x, aes(group, y)) + geom_boxplot(quantile_type = 2)
built2 <- ggplot_build(p2)$data[[1]]
with_param <- unlist(built2[c("ymin", "lower", "middle", "upper", "ymax")])
all.equal(with_param[c("lower", "middle", "upper")], sas_like[c("lower", "middle", "upper")])
#> [1] TRUE

Created on 2026-03-12 with reprex v2.1.1
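
For comparison, a possible workaround with released ggplot2 is to precompute the box statistics and draw them with `stat = "identity"`. This is only a sketch: `boxplot_stats()` is a hypothetical helper, it assumes 1.5 × IQR whiskers, and outlier points are not drawn. It is noticeably more verbose than a single `quantile_type` argument would be.

```r
# Hypothetical helper (not part of ggplot2): five boxplot statistics
# computed with a selectable quantile definition.
boxplot_stats <- function(y, type = 2, coef = 1.5) {
  q <- quantile(y, probs = c(0.25, 0.5, 0.75), type = type, names = FALSE)
  fence <- coef * (q[3] - q[1])
  # Whiskers extend to the most extreme points inside the fences
  inside <- y[y >= q[1] - fence & y <= q[3] + fence]
  data.frame(ymin = min(inside), lower = q[1], middle = q[2],
             upper = q[3], ymax = max(inside))
}

x <- data.frame(group = "A", y = c(1, 2, 100, 101))
box_stats <- data.frame(group = "A", boxplot_stats(x$y, type = 2))

# Draw the precomputed statistics; skipped if ggplot2 is not installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p3 <- ggplot(box_stats,
               aes(x = group, ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax)) +
    geom_boxplot(stat = "identity")
}
```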
