Commit cb1b403

authored
Merge pull request #212 from stan-dev/ppc-dens-overlay-group
add ppc_*_overlay_grouped functions
2 parents 4c10462 + fb6f559 commit cb1b403

16 files changed: +997 -83 lines changed

NAMESPACE

Lines changed: 2 additions & 0 deletions
@@ -104,7 +104,9 @@ export(ppc_boxplot)
 export(ppc_data)
 export(ppc_dens)
 export(ppc_dens_overlay)
+export(ppc_dens_overlay_grouped)
 export(ppc_ecdf_overlay)
+export(ppc_ecdf_overlay_grouped)
 export(ppc_error_binned)
 export(ppc_error_hist)
 export(ppc_error_hist_grouped)
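
The two new exports can be exercised with the example data that ships with bayesplot. The following is a minimal sketch that mirrors the calls added to the roxygen examples in this commit; it assumes a bayesplot version that includes this merge, and the object names `p_dens` and `p_ecdf` are illustrative only.

```r
library(bayesplot)

# Example data bundled with bayesplot
y     <- example_y_data()      # observed outcomes
yrep  <- example_yrep_draws()  # posterior predictive draws (one row per draw)
group <- example_group_data()  # grouping variable, same length as y

# Overlay only a small number of draws so the curves stay readable
p_dens <- ppc_dens_overlay_grouped(y, yrep[1:25, ], group = group)
p_ecdf <- ppc_ecdf_overlay_grouped(y, yrep[1:25, ], group = group)

# Printing a plot object draws one facet per level of `group`
# print(p_dens)
```

Both functions return ggplot objects, so further layers or theme adjustments can be added with `+` in the usual way.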

NEWS.md

Lines changed: 26 additions & 20 deletions
@@ -8,24 +8,30 @@
 * Items for next release go here
 -->
 
+* Added `ppc_dens_overlay_grouped()` and `ppc_ecdf_overlay_grouped()` for
+  plotting density and cumulative distributions of the posterior predictive
+  distribution (versus observed data) by group. (#212)
+
+* Fix bug in `color_scheme_view()` minimal theme (#213).
+
 * On the y axis, `ppc_loo_pit_qq(..., compare = "normal")` now plots standard
   normal quantiles calculated from the PIT values (instead of the standardized
   PIT values). (#240, #243, @fweber144)
 
 * New plotting function `ppc_km_overlay()` for outcome variables that are
-  right-censored. Empirical CCDF estimates of `yrep` are compared with the
+  right-censored. Empirical CCDF estimates of `yrep` are compared with the
   Kaplan-Meier estimate of `y`. (#233, #234, @fweber144)
 
-* CmdStanMCMC objects (from CmdStanR) can now be used with extractor
-  functions `nuts_params()`, `log_posterior()`, `rhat()`, and
+* CmdStanMCMC objects (from CmdStanR) can now be used with extractor
+  functions `nuts_params()`, `log_posterior()`, `rhat()`, and
   `neff_ratio()`. (#227)
 
 * Added missing `facet_args` argument to `mcmc_rank_overlay()`. (#221, @hhau)
 
-* Size of points and interval lines can set in
-  `mcmc_intervals(..., outer_size, inner_size, point_size)`. (#215, #228, #229)
-
-* `mcmc_areas()` tries to use less blank vertical blank space. (#218, #230)
+* Size of points and interval lines can set in
+  `mcmc_intervals(..., outer_size, inner_size, point_size)`. (#215, #228, #229)
+
+* `mcmc_areas()` tries to use less blank vertical blank space. (#218, #230)
 
 * `ppc_loo_pit_overlay()` now uses a boundary correction for an improved kernel
   density estimation. The new argument `boundary_correction` defaults to TRUE but
@@ -53,39 +59,39 @@ matrices also inheriting from "array" in R 4.0.
   examples. (#161, #183, #188)
 
 * Two new plots have been added for inspecting the distribution of ranks.
-  Rank histograms were introduced by the Stan team's [new paper on
+  Rank histograms were introduced by the Stan team's [new paper on
   MCMC diagnostics](https://arxiv.org/abs/1903.08008). (#178, #179)
 
   `mcmc_rank_hist()`: A traditional traceplot (`mcmc_trace()`) visualizes how
   sampled values the MCMC chains mix over the course of sampling. A rank
   histogram (`mcmc_rank_hist()`) visualizes how the *ranks* of values from the
   chains mix together. An ideal plot would show the ranks mixing or overlapping
   in a uniform distribution.
-
+
   `mcmc_rank_overlay()`: Instead of drawing each chain's histogram in a separate
   panel, this plot draws the top edge of the chains' histograms in a single
   panel.
-
+
 * Added `mcmc_trace_data()`, which returns the data used for plotting the trace
   plots and rank histograms. (Advances #97)
 
 * [ColorBrewer](http://colorbrewer2.org) palettes are now available as color
   schemes via
   [`color_scheme_set()`](https://mc-stan.org/bayesplot/reference/bayesplot-colors.html).
-  For example, `color_scheme_set("brewer-Spectral")` will use the Spectral
+  For example, `color_scheme_set("brewer-Spectral")` will use the Spectral
   palette. (#177, #190)
 
-* MCMC plots now also accept objects with an `as.array` method as
+* MCMC plots now also accept objects with an `as.array` method as
   input (e.g., stanfit objects). (#175, #184)
 
 * [`mcmc_trace()`](https://mc-stan.org/bayesplot/reference/MCMC-traces.html)
   gains an argument `iter1` which can be used to label the traceplot starting
   from the first iteration after warmup. (#14, #155, @mcol)
 
 * [`mcmc_areas()`](https://mc-stan.org/bayesplot/reference/MCMC-intervals.html)
-  gains an argument `area_method` which controls how to draw the density
-  curves. The default `"equal area"` constrains the heights so that the curves
-  have the same area. As a result, a narrow interval will appear as a spike
+  gains an argument `area_method` which controls how to draw the density
+  curves. The default `"equal area"` constrains the heights so that the curves
+  have the same area. As a result, a narrow interval will appear as a spike
   of density, while a wide, uncertain interval is spread thin over the _x_ axis.
   Alternatively `"equal height"` will set the maximum height on each curve to
   the same value. This works well when the intervals are about the same width.
@@ -112,12 +118,12 @@ matrices also inheriting from "array" in R 4.0.
 * The examples in
   [`?ppc_loo_pit_overlay()`](https://mc-stan.org/bayesplot/reference/PPC-loo.html)
   now work as expected. (#166, #167)
-
-* Added `"viridisD"` as an alternative name for `"viridis"` to the supported
+
+* Added `"viridisD"` as an alternative name for `"viridis"` to the supported
   colors.
 
-* Added `"viridisE"` (the [cividis](https://github.com/marcosci/cividis)
-  version of viridis) to the supported colors.
+* Added `"viridisE"` (the [cividis](https://github.com/marcosci/cividis)
+  version of viridis) to the supported colors.
 
 * `ppc_bars()` and `ppc_bars_grouped()` now allow negative integers as input.
   (#172, @jeffpollock9)
@@ -160,7 +166,7 @@ matrices also inheriting from "array" in R 4.0.
   gains an argument `discrete`, which is `FALSE` by default, but can be used
   to make the Geom more appropriate for discrete data. (#145)
 
-* [PPC intervals
+* [PPC intervals
   plots](https://mc-stan.org/bayesplot/reference/PPC-intervals.html) and [LOO
   predictive checks](https://mc-stan.org/bayesplot/reference/PPC-loo.html) now
   draw both an outer and an inner probability interval, which can be

R/bayesplot-colors.R

Lines changed: 2 additions & 1 deletion
@@ -230,8 +230,9 @@ plot_scheme <- function(scheme = NULL) {
     legend_none() +
     xaxis_text(
       face = "bold",
-      margin = margin(t = -3, b = 10),
+      margin = margin(t = -3, b = 10, unit = "pt"),
       angle = 0,
+      vjust = 1,
       debug = FALSE
     )
 }

R/ppc-distributions.R

Lines changed: 141 additions & 53 deletions
@@ -33,7 +33,8 @@
 #' `yrep` should therefore contain only a small number of rows. See the
 #' **Examples** section.
 #' }
-#' \item{`ppc_dens_overlay(), ppc_ecdf_overlay()`}{
+#' \item{`ppc_ecdf_overlay(), ppc_dens_overlay(),
+#' ppc_ecdf_overlay_grouped(), ppc_dens_overlay_grouped()`}{
 #' Kernel density or empirical CDF estimates of each dataset (row) in
 #' `yrep` are overlaid, with the distribution of `y` itself on top
 #' (and in a darker shade). When using `ppc_ecdf_overlay()` with discrete
@@ -58,6 +59,7 @@
 #' yrep <- example_yrep_draws()
 #' dim(yrep)
 #' ppc_dens_overlay(y, yrep[1:25, ])
+#'
 #' \donttest{
 #' # ppc_ecdf_overlay with continuous data (set discrete=TRUE if discrete data)
 #' ppc_ecdf_overlay(y, yrep[sample(nrow(yrep), 25), ])
@@ -85,6 +87,11 @@
 #' ppc_freqpoly_grouped(y, yrep[1:3,], group, freq = FALSE) + yaxis_text()
 #' }
 #'
+#' # density and distribution overlays by group
+#' ppc_dens_overlay_grouped(y, yrep[1:25, ], group = group)
+#'
+#' ppc_ecdf_overlay_grouped(y, yrep[1:25, ], group = group)
+#'
 #' # don't need to only use small number of rows for ppc_violin_grouped
 #' # (as it pools yrep draws within groups)
 #' color_scheme_set("gray")
@@ -265,15 +272,18 @@ ppc_dens <- function(y, yrep, ..., trim = FALSE, size = 0.5, alpha = 1) {
 #' @rdname PPC-distributions
 #' @export
 #' @template args-density-controls
-ppc_dens_overlay <- function(y, yrep, ...,
-                             size = 0.25,
-                             alpha = 0.7,
-                             trim = FALSE,
-                             bw = "nrd0",
-                             adjust = 1,
-                             kernel = "gaussian",
-                             n_dens = 1024) {
-
+ppc_dens_overlay <- function(
+  y,
+  yrep,
+  ...,
+  size = 0.25,
+  alpha = 0.7,
+  trim = FALSE,
+  bw = "nrd0",
+  adjust = 1,
+  kernel = "gaussian",
+  n_dens = 1024
+) {
   check_ignored_arguments(...)
   data <- ppc_data(y, yrep)
 
@@ -315,10 +325,46 @@ ppc_dens_overlay <- function(y, yrep, ...,
     yaxis_ticks(FALSE)
 }
 
+#' @rdname PPC-distributions
+#' @export
+#' @template args-density-controls
+ppc_dens_overlay_grouped <- function(
+  y,
+  yrep,
+  group,
+  ...,
+  size = 0.25,
+  alpha = 0.7,
+  trim = FALSE,
+  bw = "nrd0",
+  adjust = 1,
+  kernel = "gaussian",
+  n_dens = 1024
+) {
+  check_ignored_arguments(...)
 
-
-
-
+  p_overlay <- ppc_dens_overlay(
+    y = y,
+    yrep = yrep,
+    ...,
+    size = size,
+    alpha = alpha,
+    trim = trim,
+    bw = bw,
+    adjust = adjust,
+    kernel = kernel,
+    n_dens = n_dens
+  )
+  # Use + list(data) trick to replace the data in the plot. The layer-specific
+  # data in the y and yrep layers should be safe because they are
+  # specified using a function on the main plot data.
+  data <- ppc_data(y, yrep, group = group)
+  p_overlay <- p_overlay + list(data)
+
+  p_overlay +
+    facet_wrap("group") +
+    force_axes_in_facets()
+}
 
 #' @export
 #' @rdname PPC-distributions
@@ -327,48 +373,90 @@ ppc_dens_overlay <- function(y, yrep, ...,
 #' passed to [ggplot2::stat_ecdf()]. If `discrete` is set to
 #' `TRUE` then `geom="step"` is used.
 #' @param pad A logical scalar passed to [ggplot2::stat_ecdf()].
-ppc_ecdf_overlay <-
-  function(y,
-           yrep,
-           ...,
-           discrete = FALSE,
-           pad = TRUE,
-           size = 0.25,
-           alpha = 0.7) {
-    check_ignored_arguments(...)
-    data <- ppc_data(y, yrep)
+ppc_ecdf_overlay <- function(
+  y,
+  yrep,
+  ...,
+  discrete = FALSE,
+  pad = TRUE,
+  size = 0.25,
+  alpha = 0.7
+) {
+  check_ignored_arguments(...)
+  data <- ppc_data(y, yrep)
 
-    ggplot(data) +
-      aes_(x = ~ value) +
-      hline_at(
-        c(0, 0.5, 1),
-        size = c(0.2, 0.1, 0.2),
-        linetype = 2,
-        color = get_color("dh")
-      ) +
-      stat_ecdf(
-        data = function(x) dplyr::filter(x, !.data$is_y),
-        mapping = aes_(group = ~ rep_id, color = "yrep"),
-        geom = if (discrete) "step" else "line",
-        size = size,
-        alpha = alpha,
-        pad = pad
-      ) +
-      stat_ecdf(
-        data = function(x) dplyr::filter(x, .data$is_y),
-        mapping = aes_(color = "y"),
-        geom = if (discrete) "step" else "line",
-        size = 1,
-        pad = pad
-      ) +
-      scale_color_ppc_dist() +
-      xlab(y_label()) +
-      scale_y_continuous(breaks = c(0, 0.5, 1)) +
-      yaxis_title(FALSE) +
-      xaxis_title(FALSE) +
-      yaxis_ticks(FALSE) +
+  ggplot(data) +
+    aes_(x = ~ value) +
+    hline_at(
+      0.5,
+      size = 0.1,
+      linetype = 2,
+      color = get_color("dh")
+    ) +
+    hline_at(
+      c(0, 1),
+      size = 0.2,
+      linetype = 2,
+      color = get_color("dh")
+    ) +
+    stat_ecdf(
+      data = function(x) dplyr::filter(x, !.data$is_y),
+      mapping = aes_(group = ~ rep_id, color = "yrep"),
+      geom = if (discrete) "step" else "line",
+      size = size,
+      alpha = alpha,
+      pad = pad
+    ) +
+    stat_ecdf(
+      data = function(x) dplyr::filter(x, .data$is_y),
+      mapping = aes_(color = "y"),
+      geom = if (discrete) "step" else "line",
+      size = 1,
+      pad = pad
+    ) +
+    scale_color_ppc_dist() +
+    xlab(y_label()) +
+    scale_y_continuous(breaks = c(0, 0.5, 1)) +
+    yaxis_title(FALSE) +
+    xaxis_title(FALSE) +
+    yaxis_ticks(FALSE) +
     bayesplot_theme_get()
-  }
+}
+
+#' @export
+#' @rdname PPC-distributions
+ppc_ecdf_overlay_grouped <- function(
+  y,
+  yrep,
+  group,
+  ...,
+  discrete = FALSE,
+  pad = TRUE,
+  size = 0.25,
+  alpha = 0.7
+) {
+  check_ignored_arguments(...)
+
+  p_overlay <- ppc_ecdf_overlay(
+    y = y,
+    yrep = yrep,
+    ...,
+    discrete = discrete,
+    pad = pad,
+    size = size,
+    alpha = alpha
+  )
+
+  # Use + list(data) trick to replace the data in the plot
+  data <- ppc_data(y, yrep, group = group)
+  p_overlay <- p_overlay + list(data)
+
+  p_overlay +
+    facet_wrap("group") +
+    force_axes_in_facets()
+}
+
+
 
 #' @export
 #' @rdname PPC-distributions
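
The `+ list(data)` step used by both grouped functions can be illustrated outside bayesplot. The sketch below uses only ggplot2 and a hypothetical toy data frame (`make_data()` and its columns are invented for illustration, not taken from `ppc_data()`). It assumes, as the code comment in this commit states, that layers whose `data` argument is a function are re-evaluated against whatever the main plot data is at build time, so swapping the plot data for a version with a `group` column is enough to make faceting possible.

```r
library(ggplot2)

# Hypothetical toy data: four "observed" values (is_y = TRUE) and four
# "replicated" values (is_y = FALSE), optionally with a grouping column.
make_data <- function(with_group = FALSE) {
  d <- data.frame(
    value = c(1, 2, 3, 4, 2, 3, 4, 5),
    is_y  = rep(c(TRUE, FALSE), each = 4)
  )
  if (with_group) {
    d$group <- rep(c("a", "b"), times = 4)
  }
  d
}

# Ungrouped plot: each layer derives its data from the main plot data
# through a function, rather than storing its own copy.
p <- ggplot(make_data()) +
  aes(x = value) +
  stat_ecdf(data = function(x) x[!x$is_y, , drop = FALSE], colour = "grey60") +
  stat_ecdf(data = function(x) x[x$is_y, , drop = FALSE], colour = "black")

# Adding a list that contains a data frame replaces the main plot data;
# the layer functions above will then see the new `group` column.
p_grouped <- p + list(make_data(with_group = TRUE)) + facet_wrap("group")
```

The design keeps the grouped functions thin: all layer construction stays in the ungrouped function, and the grouped wrapper only swaps in data that carries the grouping variable before faceting.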
