Commit 2c1f729

incorporate some review feedback
1 parent 8bd02a5 commit 2c1f729

63 files changed, +978 -621 lines changed

arranging.Rmd

Lines changed: 8 additions & 8 deletions
@@ -9,7 +9,7 @@ Ideally, when displaying multiple related data views, they are linked through an

## Arranging plotly objects

-The `subplot()` function provides a flexible interface for merging multiple **plotly** objects into a single object. It is more flexible than most trellis display frameworks (e.g., **ggplot2**'s `facet_wrap()`) as you don't have to condition on a value of common variable in each display [@trellis]. Its capabilities and interface are similar to the `grid.arrange()` function from the **gridExtra** package, which allows you to arrange multiple **grid** grobs in a single view, effectively providing a way to arrange (possibly unrelated) **ggplot2** and/or **lattice** plots in a single view [@RCore]; [@gridExtra]; [@lattice]. Figure \@ref(fig:subplot-simple) shows the most simple way to use `subplot()` which is to directly supply plotly objects.
+The `subplot()` function provides a flexible interface for merging multiple **plotly** objects into a single object. It is more flexible than most trellis display frameworks (e.g., **ggplot2**'s `facet_wrap()`) as you don't have to condition on a value of common variable in each display [@trellis]. Its capabilities and interface are similar to the `grid.arrange()` function from the **gridExtra** package, which allows you to arrange multiple **grid** grobs in a single view, effectively providing a way to arrange (possibly unrelated) **ggplot2** and/or **lattice** plots in a single view [@RCore; @gridExtra; @lattice]. Figure \@ref(fig:subplot-simple) shows the most simple way to use `subplot()` which is to directly supply plotly objects.

```r
library(plotly)
@@ -111,10 +111,10 @@ barcharts <- lapply(vars, function(var) {
  layout(showlegend = FALSE, hovermode = "y",
    yaxis = list(showticklabels = FALSE))
})
-subplot(
-  subplot(barcharts, margin = 0.01), map,
-  nrows = 2, heights = c(0.3, 0.7), margin = 0.1
-)
+subplot(barcharts, margin = 0.01) %>%
+  subplot(map, nrows = 2, heights = c(0.3, 0.7), margin = 0.1) %>%
+  layout(legend = list(y = 1)) %>%
+  colorbar(y = 0.5)
```

```{r map-subplot, echo = FALSE, fig.cap = "(ref:map-subplot)"}
@@ -235,7 +235,7 @@ Since **plotly** objects are also **htmlwidgets**, any method that works for arr
2. Bootstrap's grid layout: Both the **crosstalk** and **shiny** packages provide ways to arrange numerous components via Bootstrap's (a popular HTML/CSS framework) [grid layout system](https://getbootstrap.com/docs/4.1/layout/grid/).
3. CSS flexbox: If you know some HTML and CSS, you can leverage [CSS flexbox](https://css-tricks.com/snippets/css/a-guide-to-flexbox/) to arrange components via the **htmltools** package.

-Although **flexdashboard** is a really excellent way to arrange web-based content generated from R, it can pay-off to know the other two approaches as their arrangement techniques are agnostic to an **rmarkdown** output format. In other words, approaches 2-3 can be used with used with any **rmarkdown** template^[Although HTML can not possibly render in a pdf or word document, **knitr** can automatically detect a non-HTML output format and embed a static image of the htmlwidget via the **webshot** package [@webshot].] or really _any_ framework for website generation. Although Bootstrap grid layout system (2) is expressive and intuitive, using it in a larger website that also uses a different HTML/CSS framework (e.g. Bulma, Skeleton, etc) can cause issues. In that case, CSS flexbox (3) is a light-weight (i.e., no external CSS/JS dependencies) alternative that is less likely to introduce undesirable side-effects.
+Although **flexdashboard** is a really excellent way to arrange web-based content generated from R, it can pay-off to know the other two approaches as their arrangement techniques are agnostic to an **rmarkdown** output format. In other words, approaches 2-3 can be used with any **rmarkdown** template^[Although HTML can not possibly render in a pdf or word document, **knitr** can automatically detect a non-HTML output format and embed a static image of the htmlwidget via the **webshot** package [@webshot].] or really _any_ framework for website generation. Although Bootstrap grid layout system (2) is expressive and intuitive, using it in a larger website that also uses a different HTML/CSS framework (e.g. Bulma, Skeleton, etc) can cause issues. In that case, CSS flexbox (3) is a light-weight (i.e., no external CSS/JS dependencies) alternative that is less likely to introduce undesirable side-effects.

### Flexdashboard

@@ -306,9 +306,9 @@ From the code example in Figure \@ref(fig:flexbox), you might notice that `displ

As we've already seen in Figures \@ref(fig:freqpoly-facet), \@ref(fig:trellis-txhousing), & \@ref(fig:subplot-trellis), the trellis (aka small multiple) display is an effective way to see how a conditional distribution behaves under different conditions. In other words, the trellis display helps us understand how patterns or structure in the data changes across groups. However, trellis displays do have a limitation: they don't scale very well to a large number of groups.

-Before trellis displays were formally introduced, @scagnostics-tukey proposed a solution to the problem of scatterplots not being able to scale to a large number of variables (i.e., it's time consuming to visualize 1000 scatterplots!). The proposed solution involved using quantitative measurements of various scatterplot characteristics (e.g. correlation, clumpiness, etc) to help summarise and guide attention towards 'interesting' scatterplots. This idea, coined scagnostics (short for scatterplot diagnostics), has since been made explicit, and many other similar applications have been explored, even techniques for time-series [@Wilkinson:2005b]; [@Wilkinson:2008]; [@Wilkinson:2012]. The idea of associating quantitative measures with a graphical display of data can be generalized to include more that just scatterplots, and in this more general case, these measures are sometimes referred to as cognostics.
+Before trellis displays were formally introduced, @scagnostics-tukey proposed a solution to the problem of scatterplots not being able to scale to a large number of variables (i.e., it's time consuming to visualize 1000 scatterplots!). The proposed solution involved using quantitative measurements of various scatterplot characteristics (e.g. correlation, clumpiness, etc) to help summarise and guide attention towards 'interesting' scatterplots. This idea, coined scagnostics (short for scatterplot diagnostics), has since been made explicit, and many other similar applications have been explored, even techniques for time-series [@Wilkinson:2005b; @Wilkinson:2008; @Wilkinson:2012]. The idea of associating quantitative measures with a graphical display of data can be generalized to include more that just scatterplots, and in this more general case, these measures are sometimes referred to as cognostics.

-In addition to being useful for navigating exploration of many variables, cognostics can also be useful for exploring many subsets of data. This idea has inspired work on more general divide & recombine technique(s) for working with navigating through many statistical artifacts [@divide-recombine]; [@RHIPE], including visualizations [@trelliscope]. The **trelliscope** package provides a system for computing arbitrary cognostics on each panel of a trellis display as well as an interactive graphical user interface for defining (and navigating through) interesting panels based on those cognostics [@trelliscope-pkg]. This system also allows users to define the graphical method for displaying each panel, so **plotly** graphs can easily be embedded. The **trelliscope** package is currently built upon **shiny**, but as Figure \@ref(fig:trelliscope) demonstrates, the **trelliscopejs** package provides lower-level tools that allow one to create trelliscope displays without **shiny** [@trelliscopejs].
+In addition to being useful for navigating exploration of many variables, cognostics can also be useful for exploring many subsets of data. This idea has inspired work on more general divide & recombine technique(s) for working with navigating through many statistical artifacts [@divide-recombine; @RHIPE], including visualizations [@trelliscope]. The **trelliscope** package provides a system for computing arbitrary cognostics on each panel of a trellis display as well as an interactive graphical user interface for defining (and navigating through) interesting panels based on those cognostics [@trelliscope-pkg]. This system also allows users to define the graphical method for displaying each panel, so **plotly** graphs can easily be embedded. The **trelliscope** package is currently built upon **shiny**, but as Figure \@ref(fig:trelliscope) demonstrates, the **trelliscopejs** package provides lower-level tools that allow one to create trelliscope displays without **shiny** [@trelliscopejs].

As the video behind Figure \@ref(fig:trelliscope) demonstrates, **trelliscopejs** provides two very powerful interactive techniques for surfacing 'interesting' panels: sorting and filtering. In this toy example, each panel represents a different country, and the life expentancy is plotted as a function of time. By default, **trelliscopejs** sorts panels by group alphabetically, which is why, on page load we see the first 12 countries (Afghanistan, Albania, Algeria, etc). By opening the sort menu, we can pick and sort by any cognostic for any variable in the dataset. If no cognostics are supplied (as it the case here), some sensible ones are computed and supplied for us (e.g., mean, median, var, max, min). In this case, since we are primarily interested in life expectancy, we sort by life expectancy. This simple task allows us to quickly see the countries with the best and worst average life expectancy, as well as how it has evolved over time. By combining sort with filter, we can surface countries that perform well/poorly under certain conditions. For example, Cuba, Uruguay, Taiwan have great life expectancy considering their GDP per capita. Also, within the Americas, Haiti, Bolivia, and Guatemala have the poorest life expectancy.
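
The sort-and-filter idea described above can be sketched in base R, independent of any trelliscope tooling. This is only an illustration under stated assumptions: the `life` data frame, its column names, and the threshold of 55 are hypothetical and are not taken from the commit or from **trelliscopejs** itself.

```r
# Hypothetical toy data: life expectancy over time for three made-up groups.
life <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(c(1990, 2000, 2010, 2020), times = 3),
  lifeExp = c(60, 62, 65, 67,  70, 71, 73, 76,  50, 55, 53, 58)
)

# Compute one row of summary statistics ("cognostics") per panel (country).
cogs <- do.call(rbind, lapply(split(life, life$country), function(d) {
  data.frame(
    country = d$country[1],
    mean    = mean(d$lifeExp),
    median  = median(d$lifeExp),
    var     = var(d$lifeExp),
    max     = max(d$lifeExp),
    min     = min(d$lifeExp)
  )
}))

# Sort: order panels by mean life expectancy, highest first.
sorted <- cogs[order(cogs$mean, decreasing = TRUE), ]

# Filter: keep only panels whose minimum exceeds a (hypothetical) threshold.
filtered <- sorted[sorted$min > 55, ]

as.character(sorted$country)
as.character(filtered$country)
```

Sorting and filtering on a small table of per-panel summaries is cheap even when there are thousands of panels, which is presumably why this style of navigation scales better than scanning every panel by eye.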

book.bib

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ @Book{lattice
}

@online{viridis,
-  author = {Berkeley Institute for Data Science},
+  author = {{Berkeley Institute for Data Science}},
  title = {mpl colormaps},
  year = 2016,
  url = {http://web.archive.org/web/20160601125258/http://bids.github.io/colormap/},

creating-bars.Rmd

Lines changed: 4 additions & 4 deletions
@@ -1,8 +1,8 @@
# Bars & histograms {#bars-histograms}

-The `add_bars()` and `add_histogram()` functions wrap the [bar](https://plot.ly/r/reference/#bar) and [histogram](https://plot.ly/r/reference/#histogram) plotly.js trace types. The main difference between them is that bar traces require bar heights (both `x` and `y`), whereas histogram traces require just a single variable, and plotly.js handles binning in the browser.^[This has some interesting applications for [linked highlighting](#linked-highlighting) as it allows for summary statistics to be computed on-the-fly based on a selection] And perhaps confusingly, both of these functions can be used to visualize the distribution of either a numeric or a discrete variable. So, essentially, the only difference between them is where the binning occurs.
+The `add_bars()` and `add_histogram()` functions wrap the [bar](https://plot.ly/r/reference/#bar) and [histogram](https://plot.ly/r/reference/#histogram) plotly.js trace types. The main difference between them is that bar traces require bar heights (both `x` and `y`), whereas histogram traces require just a single variable, and plotly.js handles binning in the browser.^[As we'll see in Section \@ref(graphical-queries), and specifically Figure \@ref(fig:txhousing-aggregates), using 'statistical' a trace type like `add_histogram()` enables statistical graphical queries.] And perhaps confusingly, both of these functions can be used to visualize the distribution of either a numeric or a discrete variable. So, essentially, the only difference between them is where the binning occurs.

-Figure \@ref(fig:bars-numeric) compares the default binning algorithm in plotly.js to a few different algorithms available in R via the `hist()` function. Although plotly.js has the ability to customize histogram bins via [`xbins`](https://plot.ly/r/reference/#histogram-xbins)/[`ybins`](https://plot.ly/r/reference/#histogram-ybins), R has diverse facilities for estimating the optimal number of bins in a histogram that we can easily leverage.^[Optimal in this context is the number of bins which minimizes the distance between the empirical histogram and the underlying density.] The `hist()` function alone allows us to reference 3 famous algorithms by name [@Sturges]; [@FD]; [@hist-scott], but there are also packages (e.g. the **histogram** package) which extend this interface to incorporate more methodology [@histogram]. The `price_hist()` function below wraps the `hist()` function to obtain the binning results, and map those bins to a plotly version of the histogram using `add_bars()`.
+Figure \@ref(fig:bars-numeric) compares the default binning algorithm in plotly.js to a few different algorithms available in R via the `hist()` function. Although plotly.js has the ability to customize histogram bins via [`xbins`](https://plot.ly/r/reference/#histogram-xbins)/[`ybins`](https://plot.ly/r/reference/#histogram-ybins), R has diverse facilities for estimating the optimal number of bins in a histogram that we can easily leverage.^[Optimal in this context is the number of bins which minimizes the distance between the empirical histogram and the underlying density.] The `hist()` function alone allows us to reference 3 famous algorithms by name [@Sturges; @FD; @hist-scott], but there are also packages (e.g. the **histogram** package) which extend this interface to incorporate more methodology [@histogram]. The `price_hist()` function below wraps the `hist()` function to obtain the binning results, and map those bins to a plotly version of the histogram using `add_bars()`.

```r
p1 <- plot_ly(diamonds, x = ~price) %>% add_histogram(name = "plotly.js")
@@ -22,7 +22,7 @@ subplot(
knitr::include_graphics("images/bars-numeric.svg")
```
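
Because most of the `price_hist()` definition is elided from this diff, the following is a minimal base-R sketch of the binning step that the `hist()` paragraph describes, not the author's implementation. The simulated vector `x` is an assumption standing in for a price-like variable.

```r
# Simulated, right-skewed values standing in for a price-like variable
# (an assumption for illustration only).
set.seed(1)
x <- rexp(1000, rate = 1 / 4000)

# Ask hist() for bins without drawing anything; `breaks` names the algorithm.
methods <- c(Sturges = "Sturges", FD = "FD", Scott = "Scott")
bins <- lapply(methods, function(method) {
  h <- hist(x, breaks = method, plot = FALSE)
  # Bin midpoints and counts are what a bar trace needs as x and y.
  data.frame(x = h$mids, y = h$counts)
})

# Every observation lands in exactly one bin, whichever method is used.
sapply(bins, function(b) sum(b$y))

# The number of bins differs by method.
sapply(bins, nrow)
```

Each data frame in `bins` could then be handed to something like `plot_ly(b, x = ~x, y = ~y) %>% add_bars()`, so that only the binned summary, rather than every raw observation, is sent to the browser.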

-Figure \@ref(fig:bars-discrete) demonstrates two ways of creating a basic bar chart. Although the visual results are the same, its worth noting the difference in implementation. The `add_histogram()` function sends all of the observed values to the browser and lets plotly.js perform the binning. It takes more human effort to perform the binning in R, but doing so has the benefit of sending less data, and requiring less computation work of the web browser. In this case, we have only about 50,000 records, so there is not much of a difference in page load times or page size. However, with 1 Million records, page load time more than doubles and page size nearly doubles.^[These tests were run on Google Chrome and loaded a page with a single bar chart. [Here](https://www.webpagetest.org/result/160924_DP_JBX/) are the results for `add_histogram()` and [here](https://www.webpagetest.org/result/160924_QG_JA1/) are the results for `add_bars()` ]
+Figure \@ref(fig:bars-discrete) demonstrates two ways of creating a basic bar chart. Although the visual results are the same, its worth noting the difference in implementation. The `add_histogram()` function sends all of the observed values to the browser and lets plotly.js perform the binning. It takes more human effort to perform the binning in R, but doing so has the benefit of sending less data, and requiring less computation work of the web browser. In this case, we have only about 50,000 records, so there is not much of a difference in page load times or page size. However, with 1 Million records, page load time more than doubles and page size nearly doubles.^[These tests were run on Google Chrome and loaded a page with a single bar chart. See <https://www.webpagetest.org/result/160924_DP_JBX> for `add_histogram()` and <https://www.webpagetest.org/result/160924_QG_JA1> for `add_bars()`.]

```r
library(dplyr)
@@ -111,7 +111,7 @@ knitr::include_graphics("images/ggmosaic.svg")

# Boxplots

-Boxplots encode the five number summary of a numeric variable, and are more efficient than [trellis displays of histograms](multiple-numeric-distributions) for comparing many numeric distributions. The `add_boxplot()` function requires one numeric variable, and guarantees boxplots are [oriented](https://plot.ly/r/reference/#box-orientation) correctly, regardless of whether the numeric variable is placed on the x or y scale. As Figure \@ref(fig:cut-boxes) shows, on the axis orthogonal to the numeric axis, you can provide a discrete variable (for conditioning) or supply a single value (to name the axis category).
+Boxplots encode the five number summary of a numeric variable, and provide a decent way to compare many numeric distributions. The visual task of comparing multiple boxplots is relatively easy (i.e., compare position along a common scale) compared to some common alternatives (e.g., a trellis display of histograms, like \@ref(fig:bars-numeric)), but the boxplot is sometimes inadequate for capturing complex (e.g., multi-modal) distributions (in this case, a frequency polygon, like Figure \@ref(fig:freqpoly) provides a nice alternative). The `add_boxplot()` function requires one numeric variable, and guarantees boxplots are [oriented](https://plot.ly/r/reference/#box-orientation) correctly, regardless of whether the numeric variable is placed on the x or y scale. As Figure \@ref(fig:cut-boxes) shows, on the axis orthogonal to the numeric axis, you can provide a discrete variable (for conditioning) or supply a single value (to name the axis category).

```r
p <- plot_ly(diamonds, y = ~price, color = I("black"),
