Update creating-scatter.Rmd

sibusiso16 · web-flow · commit 4d6d9c31b1d7 · 2019-05-03T10:26:30.000+02:00
diff --git a/creating-scatter.Rmd b/creating-scatter.Rmd
@@ -87,15 +87,15 @@ schema()
 knitr::include_graphics("images/schema.png")
 ```
 
-The sections that follow in this chapter demonstrate various type of data views using scatter-based layers. In attempt to avoid duplication of documentation, a particular emphasis is put on features only currently availble from the R package (e.g. the aesthetic mapping arguments).
+The sections that follow in this chapter demonstrate various types of data views using scatter-based layers. In an attempt to avoid duplication of documentation, particular emphasis is put on features currently only available from the R package (e.g., the aesthetic mapping arguments).
 
 ## Markers
 
-This section details scatter traces with a `mode` of `"markers"` (i.e., `add_markers()`). For simplicity, many of the examples here use `add_markers()` with a numeric x and y axis, which results in scatterplot -- a common way to visualize the association between two quantitative variables.  The content that follows is still relevant markers displayed non-numeric x and y (aka dot pots) as shown in Section \@ref(dot-plots)
+This section details scatter traces with a `mode` of `"markers"` (i.e., `add_markers()`). For simplicity, many of the examples here use `add_markers()` with a numeric x and y axis, which results in a scatterplot -- a common way to visualize the association between two quantitative variables. The content that follows is still relevant to markers displayed on non-numeric x and y axes (aka dot plots), as shown in Section \@ref(dot-plots).
 
 ### Alpha blending {#marker-alpha}
 
-As @unwin-graphical-analysis notes, scatterplots can be useful for exposing other important features including: casual relationships, outliers, clusters, gaps, barriers, and conditional relationships. A common problem with scatterplots however is overplotting, meaning that there are multiple observations occupying the same (or similar) x/y locations. Figure \@ref(fig:scatterplots) demonstrates one way to combat overplotting via alpha blending. When dealing with tens of thousands of points (or more), consider using `toWebGL()` to render plots using Canvas rather than SVG (more in Chapter \@ref(performance), or leveraging 2D density estimation (Section \@ref(rectangular-binning-in-r)).
+As @unwin-graphical-analysis notes, scatterplots can be useful for exposing other important features including: causal relationships, outliers, clusters, gaps, barriers, and conditional relationships. A common problem with scatterplots, however, is overplotting, meaning that there are multiple observations occupying the same (or similar) x/y locations. Figure \@ref(fig:scatterplots) demonstrates one way to combat overplotting via alpha blending. When dealing with tens of thousands of points (or more), consider using `toWebGL()` to render plots using Canvas rather than SVG (more in Chapter \@ref(performance)), or leveraging 2D density estimation (Section \@ref(rectangular-binning-in-r)).
 
 ```r
 subplot(
@@ -128,7 +128,7 @@ subplot(
 knitr::include_graphics("images/color-types.svg")
 ```
 
-There are numerous ways to alter the default color scale via the `colors` argument. This argument excepts one of the following: (1) a color brewer palette name (see the row names of `RColorBrewer::brewer.pal.info` for valid names), (2) a vector of colors to interpolate, or (3) a color interpolation function like `colorRamp()` or `scales::colour_ramp()`. Although this grants a lot of flexibility, one should be conscious of using a sequential colorscale for numeric variables (& ordered factors) as shown in \@ref(fig:color-numeric), and a qualitative colorscale for discrete variables as shown in \@ref(fig:color-discrete).
+There are numerous ways to alter the default color scale via the `colors` argument. This argument accepts one of the following: (1) a color brewer palette name (see the row names of `RColorBrewer::brewer.pal.info` for valid names), (2) a vector of colors to interpolate, or (3) a color interpolation function like `colorRamp()` or `scales::colour_ramp()`. Although this grants a lot of flexibility, one should be conscious of using a sequential colorscale for numeric variables (and ordered factors) as shown in Figure \@ref(fig:color-numeric), and a qualitative colorscale for discrete variables as shown in Figure \@ref(fig:color-discrete).
 
 ```r
 col1 <- c("#132B43", "#56B1F7")
@@ -347,11 +347,11 @@ knitr::include_graphics("images/linetypes-manual.svg")
 
 The `add_segments()` function essentially provides a way to connect two points [(`x`, `y`) to (`xend`, `yend`)] with a line. Segments form the building blocks for numerous useful chart types, including slopegraphs, dumbbell charts, candlestick charts, and more. Slopegraphs and dumbbell charts are useful for comparing numeric values across numerous categories. Candlestick charts are typically used for visualizing change in a financial asset over time.
 
-Segments can also provide a useful alternative to `add_bars()` (covered in Section \@ref(bars-histograms)), especially for animations. In particular, Figure \@ref(fig:profile-pyramid) of Section \@ref(animation-support) shows how implement an animated population pyramid using segments instead of bars.
+Segments can also provide a useful alternative to `add_bars()` (covered in Chapter \@ref(bars-histograms)), especially for animations. In particular, Figure \@ref(fig:profile-pyramid) of Section \@ref(animation-support) shows how to implement an animated population pyramid using segments instead of bars.
 
 #### Slopegraph
 
-The slope graph, made popular by @tufte2001, is a great way to compare the change in a measurement across numerous groups. This change could be along either a discrete or a continuous axis. For a continuous axis, the slopegraph could be thought of as a decomposition of a line graph into multiple segments. The **slopegraph** R package provides a succinct interface for creating slopegraphs with base or **ggplot2** graphics and also some convenient data sets which we'll make use of here [@slopegraph]. Figure \@ref(fig:slopegraph) recreates an example from @tufte2001, using the `gdp` data set from **slopegraph**, and demonstrates a common issue with labelling in slopegraphs -- it's easy to have overlapping labels when anchoring labels on data values. For that reason, this implementation leverages **plotly** ability to interactively edit annotation positions. See Section \@ref(editing-views) for similar examples of 'editing views'.
+The slopegraph, made popular by @tufte2001, is a great way to compare the change in a measurement across numerous groups. This change could be along either a discrete or a continuous axis. For a continuous axis, the slopegraph could be thought of as a decomposition of a line graph into multiple segments. The **slopegraph** R package provides a succinct interface for creating slopegraphs with base or **ggplot2** graphics and also some convenient data sets which we'll make use of here [@slopegraph]. Figure \@ref(fig:slopegraph) recreates an example from @tufte2001, using the `gdp` data set from **slopegraph**, and demonstrates a common issue with labelling in slopegraphs -- it's easy to have overlapping labels when anchoring labels on data values. For that reason, this implementation leverages **plotly**'s ability to interactively edit annotation positions. See Chapter \@ref(editing-views) for similar examples of 'editing views'.
 
 ```{r, eval = FALSE, summary = "Click to show code"}
 data(gdp, package = "slopegraph")
@@ -456,7 +456,7 @@ knitr::include_graphics("images/candlestick.svg")
 
 ### Density plots
 
-In Section \@ref(bars-histograms), we leverage a number of algorithms in R for computing the "optimal" number of bins for a histogram, via `hist()`, and routing those results to `add_bars()`. We can leverage the `density()` function for computing kernel density estimates in a similar way, and route the results to `add_lines()`, as is done in Figure \@ref(fig:densities).
+In Chapter \@ref(bars-histograms), we leverage a number of algorithms in R for computing the "optimal" number of bins for a histogram, via `hist()`, and route those results to `add_bars()`. We can leverage the `density()` function for computing kernel density estimates in a similar way, and route the results to `add_lines()`, as is done in Figure \@ref(fig:densities).
 
 ```r
 kerns <- c("gaussian", "epanechnikov", "rectangular", 
@@ -487,7 +487,7 @@ p
 
 ### Parallel Coordinates
 
-One very useful, but often overlooked, visualization technique is the parallel coordinates plot. Parallel coordinates provide a way to compare values along a common (or non-aligned) positional scale(s) -- the most basic of all perceptual tasks -- in more than 3 dimensions [@graphical-perception]. Usually each line represents every measurement for a given row (or observation) in a data set. It's true that plotly.js provides a trace type, parcoords, specifically for parallel coordinates that offers desirable interactive capabilities (e.g. highlighting and reordering of axes).^[See <https://plot.ly/r/parallel-coordinates-plot/> for some interactive examples]. However, it can also be useful learn how to use `add_lines()` to implement parallel coordinates as it can offer more flexibility and control over the axis scales.
+One very useful, but often overlooked, visualization technique is the parallel coordinates plot. Parallel coordinates provide a way to compare values along a common (or non-aligned) positional scale(s) -- the most basic of all perceptual tasks -- in more than 3 dimensions [@graphical-perception]. Usually each line represents every measurement for a given row (or observation) in a data set. It's true that plotly.js provides a trace type, parcoords, specifically for parallel coordinates that offers desirable interactive capabilities (e.g., highlighting and reordering of axes).^[See <https://plot.ly/r/parallel-coordinates-plot/> for some interactive examples.] However, it can also be useful to learn how to use `add_lines()` to implement parallel coordinates as it can offer more flexibility and control over the axis scales.
 
 When measurements are on very different scales, some care must be taken, and variables must be transformed to be put on a common scale. As Figure \@ref(fig:pcp-common) shows, even when variables are measured on a similar scale, it can still be informative to transform variables in different ways.
 
@@ -517,7 +517,7 @@ It is also worth noting that the **GGally** offers a `ggparcoord()` function whi
 
 ## Polygons
 
-The `add_polygons()` function is essentially equivalent to `add_paths()` with the [fill](https://plot.ly/r/reference/#scatter-fill) attribute set to "toself". Polygons form the basis for other, higher-level scatter-based layers (e.g., `add_ribbons()` and `add_sf()`) that don't have a dedicated plotly.js trace type. Polygons can be use to draw many things, but perhaps the most familiar application where you *might* want to use `add_polygons()` is to draw geo-spatial objects. If and when you use `add_polygons()` to draw a map, make sure you fix the aspect ratio (e.g. [`xaxis.scaleanchor`](https://plot.ly/r/reference/#layout-xaxis-scaleanchor)) and also consider using `plotly_empty()` over `plot_ly()` to hide axis labels, ticks, and the background grid. On the other hand, Section \@ref(maps-custom) shows you how to make a custom maps using the **sf** package and `add_sf()`, which is a bit of work to get started, but is absolutely worth the investment.
+The `add_polygons()` function is essentially equivalent to `add_paths()` with the [fill](https://plot.ly/r/reference/#scatter-fill) attribute set to "toself". Polygons form the basis for other, higher-level scatter-based layers (e.g., `add_ribbons()` and `add_sf()`) that don't have a dedicated plotly.js trace type. Polygons can be used to draw many things, but perhaps the most familiar application where you *might* want to use `add_polygons()` is to draw geo-spatial objects. If and when you use `add_polygons()` to draw a map, make sure you fix the aspect ratio (e.g., [`xaxis.scaleanchor`](https://plot.ly/r/reference/#layout-xaxis-scaleanchor)) and also consider using `plotly_empty()` over `plot_ly()` to hide axis labels, ticks, and the background grid. On the other hand, Section \@ref(maps-custom) shows you how to make custom maps using the **sf** package and `add_sf()`, which is a bit of work to get started, but is absolutely worth the investment.
 
 ```r
 base <- map_data("world", "canada") %>%
@@ -535,7 +535,7 @@ base %>%
 knitr::include_graphics("images/map-canada.png")
 ```
 
-As discussion surrounding Figure \@ref(fig:split-color) points out, scatter-based polygon layers (i.e., `add_polygons()`, `add_ribbons()`, etc) render all the polygons using one plotly.js trace by default. This approach is computationally efficient, but it's not always desirable (e.g. can't have multiple fills per trace, interactivity is relatively limited). To work around the limitations, consider using `split` (or `color` with a discrete variable) to split the polygon data into multiple traces. Figure \@ref(fig:map-canada-split) demonstrates using `split` which will impose plotly.js' colorway to each trace (i.e., subregion) and leverage `hoveron` to generate one tooltip per sub-region.
+As the discussion surrounding Figure \@ref(fig:split-color) points out, scatter-based polygon layers (i.e., `add_polygons()`, `add_ribbons()`, etc.) render all the polygons using one plotly.js trace by default. This approach is computationally efficient, but it's not always desirable (e.g., can't have multiple fills per trace, interactivity is relatively limited). To work around the limitations, consider using `split` (or `color` with a discrete variable) to split the polygon data into multiple traces. Figure \@ref(fig:map-canada-split) demonstrates using `split`, which will impose plotly.js' colorway on each trace (i.e., subregion) and leverage `hoveron` to generate one tooltip per subregion.
 
 ```r
 add_polygons(base, split = ~subregion, hoveron = "fills")
@@ -561,4 +561,4 @@ broom::augment(m) %>%
 
 ```{r broom-lm, echo = FALSE, fig.cap = "(ref:broom-lm)"}
 knitr::include_graphics("images/broom-lm.svg")
-```
+```