diff --git a/.travis.yml b/.travis.yml
index 620c0d0..415ef1c 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,3 +1,10 @@
+before_install:
+  - sudo add-apt-repository ppa:ubuntugis/ppa -y
+  - sudo apt-get update -q
+  - sudo apt-get install -y libudunits2-dev proj-bin gdal-bin libgdal-dev libproj-dev
+    libv8-dev libjq-dev libprotobuf-dev protobuf-compiler
+
+
 language: R
 cache:
   packages: true
diff --git a/DESCRIPTION b/DESCRIPTION
index 09d4cd1..23af236 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -28,6 +28,8 @@ Depends:
     trelliscopejs,
     ggstat,
     ggforce,
+    concaveman,
+    devtools,
     ggmosaic,
     ggalt,
     GGally,
diff --git a/docs/introduction.html b/docs/introduction.html
index c989aa2..9639c34 100644
--- a/docs/introduction.html
+++ b/docs/introduction.html
@@ -453,7 +453,7 @@

1.2 What you will learn

  • Creating views: introduces the process of transforming data into graphics via plotly’s programmatic interface. It focuses mostly on plot_ly(), which can interface directly with the underlying plotly.js graphing library, but emphasis is put on features unique to the R package that make it easier to transform data into graphics. Another way to create graphs with plotly is to use the ggplotly() function to transform ggplot2 graphs into plotly graphs. Section 2.3 discusses when and why ggplotly() might be desirable to plot_ly(). It’s also worth mentioning that this part (nor the book as a whole) does not intend to cover every possible chart type and option available in plotly – it’s more of a presentation of the most generally useful techniques with the greater R ecosystem in mind. For a more exhaustive gallery of examples of what plotly itself is capable of, see https://plot.ly/r/.

  • Publishing views: discusses various techniques for exporting (as well as embedding) plotly graphs to various file formats (e.g., HTML, svg, pdf, png, etc). Also, Chapter 12 demonstrates how one could leverage editable layout components HTML to touch-up a graph, then export to a static file format of interest before publication. Indeed, this book was created using the techniques from this section.

  • Combining multiple views: demonstrates how to combine multiple data views into a single web page (arranging) or graphic (animation). Most of these techniques are shown using plotly graphs, but techniques from Section 13.2 extend to any HTML content generated via htmltools (which includes htmlwidgets).

-  • Linking multiple views: provides an overview of the two models for linking plotly graph(s) to other data views. The first model, covered in Section 16.1, outlines plotly’s support for linking views purely client-side, meaning the resulting graphs render in any web browser on any machine without requiring external software. The second model, covered in Chapter 17, demonstrates how to link plotly with other views via shiny, a reactive web application framework for R. Relatively speaking, the second model grants the R user way more power and flexbility, but comes at the cost of requiring more computational infrastructure. That being said, RStudio provides accessible resources for deploying shiny apps https://shiny.rstudio.com/articles/#deployment.

+  • Linking multiple views: provides an overview of the two models for linking plotly graph(s) to other data views. The first model, covered in Section 16.1, outlines plotly’s support for linking views purely client-side, meaning the resulting graphs render in any web browser on any machine without requiring external software. The second model, covered in Chapter 17, demonstrates how to link plotly with other views via shiny, a reactive web application framework for R. Relatively speaking, the second model grants the R user way more power and flexibility, but comes at the cost of requiring more computational infrastructure. That being said, RStudio provides accessible resources for deploying shiny apps https://shiny.rstudio.com/articles/#deployment.

  • Custom behavior with JavaScript: demonstrates various ways to customize plotly graphs by writing custom JavaScript to handle certain user events. This part of the book is designed to be approachable for R users that want to learn just enough JavaScript to plotly to do something it doesn’t “natively” support.

  • Various special topics: offers a grab-bag of topics that address common questions, mostly related to the customization of plotly graphs in R.

diff --git a/docs/search_index.json b/docs/search_index.json
index db4ab3e..a9ddf27 100644
--- a/docs/search_index.json
+++ b/docs/search_index.json
@@ -1,6 +1,6 @@
 [
 ["index.html", "Interactive web-based data visualization with R, plotly, and shiny Welcome", " Interactive web-based data visualization with R, plotly, and shiny Carson Sievert 2019-05-14 Welcome This is the website for “Interactive web-based data visualization with R, plotly, and shiny”. In this book, you’ll gain insight and practical skills for creating interactive and dynamic web graphics for data analysis from R. It makes heavy use of plotly for rendering graphics, but you’ll also learn about other R packages that augment a data science workflow, such as the tidyverse and shiny. Along the way, you’ll gain insight into best practices for visualization of high-dimensional data, statistical graphics, and graphical perception. By mastering these concepts and tools, you’ll impress your colleagues with your ability to generate more informative, engaging, and repeatable interactive graphics using free software that you can share over email, export to pdf/png, and more. An online version of this book, available at https://plotly-r.com, is free to use and is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License. Both the print and online versions of the book are written in rmarkdown with bookdown and those source files are available at https://github.com/cpsievert/plotly_book. The online version will continue to evolve in between reprints of the physical book. "],
-["introduction.html", "1 Introduction 1.1 Why interactive web graphics from R? 1.2 What you will learn 1.3 What you won’t learn (much of) 1.4 Prerequisites 1.5 Run code examples 1.6 Getting help and learning more 1.7 Acknowledgements 1.8 Colophon", " 1 Introduction 1.1 Why interactive web graphics from R? 
As Wickham and Grolemund (2018) argue, the exploratory phase of a data science workflow (Figure 1.1) requires lots of iteration between data manipulation, visualization, and modeling. Achieving these tasks through a programming language like R offers the opportunity to scale and automate tasks, document and track them, and reliably reproduce their output. That power, however, typically comes at the cost of increasing the amount of cognitive load involved relative to a GUI-based system.1 R packages like the tidyverse have been incredibly successful due to their ability to limit cognitive load without removing the benefits of performing analysis via code. Moreover, the tidyverse’s unifying principles of designing for humans, consistency, and composabilty makes iteration within and between these stages seamless – an important but often overlooked challenge in exploratory data analysis (EDA) (Tidyverse team 2018). FIGURE 1.1: The stages of a data science workflow from Wickham and Grolemund (2018). In fact, packages within the tidyverse such as dplyr (transformation) and ggplot2 (visualization) are such productive tools that many analysts use static ggplot2 graphics for EDA. Then, when it comes to communicating results, some analysts switch to another tool or language altogether (e.g., JavaScript) to generate interactive web graphics presenting their most important findings (Yau 2016; Quealy 2013). Unfortunately, this requires a heavy context switch that requires a totally different skillset and impedes productivity. Moreover, for the average analyst, the opportunity costs involved with becoming competent with the complex world of web technologies is simply not worth the required investment. Even before the web, interactive graphics were shown to have great promise in aiding the exploration of high-dimensional data (D. Cook, Buja, and Swayne 2007). 
The ASA maintains an incredible video library, http://stat-graphics.org/movies/, documenting the use of interactive statistical graphics for tasks that otherwise wouldn’t have been easy or possible using numerical summaries and/or static graphics alone. Roughly speaking, these tasks tend to fall under three categories: Identifying structure that would otherwise go missing (J. W. Tukey and Fisherkeller 1973). Diagnosing models and understanding algorithms (Wickham, Cook, and Hofmann 2015). Aiding the sense-making process by searching for information quickly without fully specified questions (Unwin and Hofmann 1999). Today, you can find and run some of these and similar Graphical User Interface (GUI) systems for creating interactive graphics: DataDesk https://datadescription.com/, GGobi http://www.ggobi.org/, Mondrian http://www.theusrus.de/Mondrian/, JMP https://www.jmp.com, Tableau https://www.tableau.com/. Although these GUI-based systems have nice properties, they don’t gel with a code-based workflow: any tasks you complete through a GUI likely can’t be replicated without human intervention. That means, if at any point, the data changes, and analysis outputs must be regenerated, you need to remember precisely how to reproduce the outcome, which isn’t necessarily easy, trustworthy, or economical. Moreover, GUI-based systems are typically ‘closed’ systems that don’t allow themselves to be easily customized, extended, or integrated with another system. Programming interactive graphics allows you to leverage all the benefits of a code-based workflow while also helping with tasks that are difficult to accomplish with code alone. For an example, if you were to visualize engine displacement (displ) versus miles per gallon (hwy) using the mpg dataset, you might wonder: “what are these cars with an unusually high value of hwy given their displ?”. 
Rather than trying to write code to query those observations, it would be more easier and intuitive to draw an outline around the points to query the data behind them. library(ggplot2) ggplot(mpg, aes(displ, hwy)) + geom_point() FIGURE 1.2: A scatterplot of engine displacement versus miles per gallon made with the ggplot2 package. Figure 1.3 demonstrates how we can transform Figure 1.2 into an interactive version that can be used to query and inspect points of interest. The framework that enables this kind of linked brushing is discussed in depth within Section 16.1, but the point here is that the added effort required to enable such functionality is relatively small. This is important, because although interactivity can augment exploration by allowing us to pursue follow-up questions, it’s typically only practical when we can create and alter them quickly. That’s because, in a true exploratory setting, you have to make lots of visualizations, and investigate lots of follow-up questions, before stumbling across something truly valuable. library(plotly) m <- highlight_key(mpg) p <- ggplot(m, aes(displ, hwy)) + geom_point() gg <- highlight(ggplotly(p), "plotly_selected") crosstalk::bscols(gg, DT::datatable(m)) FIGURE 1.3: Linked brushing in a scatterplot to query more information about points of interest. By lasso selecting a region of unusual points, we learn that corvette’s have an unusually high miles per gallon considering the engine size. For the interactive, see https://plotly-r.com/interactives/mpg-lasso.html When a valuable insight surfaces, since the code behind Figure 1.3 generates HTML, the web-based graphic can be easily shared with collaborators through email and/or incorporated inside a larger automated report or website. Moreover, since these interactive graphics are based on the htmlwidgets framework, they work seamlessly inside of larger rmarkdown documents, inside shiny apps, RStudio, Jupyter notebooks, the R prompt, and more. 
Being able to share interactive graphics with collaborators through these different mediums enhances the conversation – your colleagues can point out things you may not yet have considered and, in some cases, they can get immediate responses from the graphics themselves. In the final stages of an analysis, when it comes time to publish your work to a general audience, rather than relying on the audience to interact with the graphics and discover insight for themselves, it’s always a good idea to clearly highlight your findings. For example, from Figure 1.3, we’ve learned that most of these unusual points can be explained by a single feature of the data (model == 'corvette'). As shown in Figure 1.4, the geom_mark_hull() function from the ggforce package provides a helpful way to annotate those points with a hull. Moreover, as Chapter 12 demonstrates, it can also be helpful to add and/or edit annotations interactively when preparing a graphic for publication. library(ggforce) ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_mark_hull(aes(filter = model == "corvette", label = model)) + labs( title = "Fuel economy data from 1999 and 2008 for 38 popular models of car", caption = "Source: https://fueleconomy.gov/", x = "Engine Displacement", y = "Miles Per Gallon" ) FIGURE 1.4: Using the ggforce package to annotate the corvette’s in this dataset. This simple example quickly shows how interactive web graphics can assist EDA (for another, slightly more in-depth example, see Section 2.3). Being able to program these graphics from R allows one to combine their functionality within a world-class computing environment for data analysis and statistics. Programming interactive graphics may not be as intuitive as using a GUI-based system, but making the investment pays dividends in terms of workflow improvements: automation, scaling, provenance, and flexibility. 
1.2 What you will learn This book provides a foundation for learning how to make interactive web-based graphics for data analysis from R via plotly, without assuming any prior experience with web technologies. The goal is to provide the context you need to go beyond copying existing plotly examples to having a useful mental model of the underlying framework, its capabilities, and how it fits into the larger R ecosystem. By learning this mental model, you’ll have a better understanding of how to create more sophisticated visualizatons, fix common issues, improve performance, understand the limitations, and even contribute back to the project itself. You may already be familiar with existing plotly documentation (e.g., https://plot.ly/r/), which is essentially a language-agnostic how-to guide, but this book is meant to be more holistic tutorial written by and for the R user. This book also focuses primarily on features that are unique to the plotly R package (i.e., things that don’t work the same for Python or JavaScript). This ranges from creation of a single graph using the plot_ly() special named arguments that make it easier to map data to visuals: plot_ly(diamonds, x = ~cut, color = ~clarity, colors = "Accent") FIGURE 1.5: An example of what you’ll learn: Figure 2.7. For the interactive, see https://plotly-r.com/interactives/intro-show-hide-preview.html To its ability to link multiple data views purely client-side (see Section 16.1): FIGURE 1.6: An example of what you’ll learn: Figure 16.21. For the interactive, see https://plotly-r.com/interactives/storms-preview.html To advanced server-side linking with shiny to implement responsive and scalable crossfilters (see Section 17.4.2): FIGURE 1.7: An example of what you’ll learn: Figure 17.28. For the interactive, see https://plotly-r.com/interactives/shiny-crossfilter-preview.html By going through the code behind these examples, you’ll see that many of them leverage other R packages in their implementation. 
To highlight a few of the R packages that you’ll see: dplyr and tidyr For transforming data into a form suitable for the visualization method. ggplot2 and friends (e.g., GGally, ggmosaic, etc) For creating plotly visualizations that would be tedious to implement without ggplotly(). sf, rnaturalearth, cartogram For obtaining and working with geo-spatial data structures in R. stats, MASS, broom, and forecast For working with statistical models and summaries. shiny For running R code in response to user input. htmltools, htmlwidgets For combining multiple views and saving the result. This book contains six parts and each part contains numerous chapters. A summary of each part is provided below. Creating views: introduces the process of transforming data into graphics via plotly’s programmatic interface. It focuses mostly on plot_ly(), which can interface directly with the underlying plotly.js graphing library, but emphasis is put on features unique to the R package that make it easier to transform data into graphics. Another way to create graphs with plotly is to use the ggplotly() function to transform ggplot2 graphs into plotly graphs. Section 2.3 discusses when and why ggplotly() might be desirable to plot_ly(). It’s also worth mentioning that this part (nor the book as a whole) does not intend to cover every possible chart type and option available in plotly – it’s more of a presentation of the most generally useful techniques with the greater R ecosystem in mind. For a more exhaustive gallery of examples of what plotly itself is capable of, see https://plot.ly/r/. Publishing views: discusses various techniques for exporting (as well as embedding) plotly graphs to various file formats (e.g., HTML, svg, pdf, png, etc). Also, Chapter 12 demonstrates how one could leverage editable layout components HTML to touch-up a graph, then export to a static file format of interest before publication. Indeed, this book was created using the techniques from this section. 
Combining multiple views: demonstrates how to combine multiple data views into a single web page (arranging) or graphic (animation). Most of these techniques are shown using plotly graphs, but techniques from Section 13.2 extend to any HTML content generated via htmltools (which includes htmlwidgets). Linking multiple views: provides an overview of the two models for linking plotly graph(s) to other data views. The first model, covered in Section 16.1, outlines plotly’s support for linking views purely client-side, meaning the resulting graphs render in any web browser on any machine without requiring external software. The second model, covered in Chapter 17, demonstrates how to link plotly with other views via shiny, a reactive web application framework for R. Relatively speaking, the second model grants the R user way more power and flexbility, but comes at the cost of requiring more computational infrastructure. That being said, RStudio provides accessible resources for deploying shiny apps https://shiny.rstudio.com/articles/#deployment. Custom behavior with JavaScript: demonstrates various ways to customize plotly graphs by writing custom JavaScript to handle certain user events. This part of the book is designed to be approachable for R users that want to learn just enough JavaScript to plotly to do something it doesn’t “natively” support. Various special topics: offers a grab-bag of topics that address common questions, mostly related to the customization of plotly graphs in R. You might already notice that this book often uses the term ‘view’ or ‘data view’, so here we take a moment to frame its use in a wider context. As Wills (2008) puts it: “a ‘data view’ is anything that gives the user a way of examining data so as to gain insight and understanding. 
A data view is usually thought of as a barchart, scatterplot, or other traditional statistical graphic, but we use the term more generally, including ‘views’ such as the results of a regression analysis, a neural net prediction, or a set of descriptive statistics”. In this book, more often than not, the term ‘view’ typically refers to a plotly graph or other htmlwidgets (e.g., DT, leaflet, etc). In particular, Section 16.1 is all about linking multiple htmlwidgets together through a graphical database querying framework. However, the term ‘view’ takes on a more general interpretation in Chapter 17 since the reactive programming framework that shiny provides allows us to have a more general conversation surrounding linked data views. 1.3 What you won’t learn (much of) 1.3.1 Web technologies Although this book is fundamentally about creating web graphics, it does not aim to teach you web technologies (e.g., HTML, SVG, CSS, JavaScript, etc). It’s true that mastering these technologies grants you the ability to build really impressive websites, but even expert web developers would say their skillset is much better suited for expository rather than exploratory visualization. That’s because, most web programming tools are not well-suited for the exploratory phase of a data science workflow where iteration between data visualization, transformation, and modeling is a necessary task that often impedes hypothesis generation and sense-making. As a result, for most data analysts whose primary function is to derive insight from data, the opportunity costs involved with mastering web technologies is usually not worth the investment. That being said, learning a little about web technologies can have a relatively large payoff with directed learning and instruction. In Chapter 18, you’ll learn how to customize plotly graphs with JavaScript – even if you haven’t seen JavaScript before, this chapter should be approachable, insightful, and provide you with some useful examples. 
1.3.2 d3js The JavaScript library D3 is a great tool for data visualization assuming you’re familiar with web technologies and are primarily interested in expository (not exploratory) visualization. There are already lots of great resources for learning D3, including the numerous books by Murray (2013) and Murray (2017). It’s worth noting, however, if you do know D3, you can easily leverage it from a web page that are already a plotly graph, as demonstrated in Figure 22.1. 1.3.3 ggplot2 The book does contain some ggplot2 code examples (which are then converted to plotly via ggplotly()), but it’s not designed to teach you ggplot2. For those looking to learn ggplot2, I recommend using the learning materials listed at https://ggplot2.tidyverse.org. 1.3.4 Graphical data analysis How to perform data analysis via graphics (carefully, correctly, and creatively) is a large topic unto itself. Although this book does have examples of graphical data analysis, it does not aim to provide a comprehensive foundation. For nice comprehensive resources on the topic, see Unwin (2015) and D. Cook and Swayne (2007). 1.3.5 Data visualization best practices Encoding information in a graphic (concisely and effectively) is a large topic unto itself. Although this book does have some ramblings related to best practices in data visualization, it does not aim to provide a comprehensive foundation. For some approachable and fun resources on the topic, see Tufte (2001a), Yau (2011), Healey (2018), and Wilke (2018). 1.4 Prerequisites For those new to R and/or data visualization, R for Data Science provides an excellent foundation for understanding the vast majority of concepts covered in this book (Wickham and Grolemund 2018). In particular, if you have a solid grasp on Part I: Explore, Part II: Wrangle, and Part III: Program, you should be able to understand almost everything here. 
Although not explicitly covered, the book does make references to (and was creating using) rmarkdown, so if you’re new to rmarkdown, I also recommend reading the R Markdown chapter. 1.5 Run code examples This book contains many code examples in an effort to teach the art and science behind creating interactive web-based graphics using plotly. To see the actual interactive result of the code (rather than a video or static version), you may want to run the code examples in a suitable computational environment. Visit http://bit.ly/plotly-book-cloud for a cloud-based instance of RStudio with all the required software to run the code examples in this book. Most, if not all of these code examples assume you have the plotly package loaded: library(plotly) Within some chapters, there may be examples that assume packages we loaded during a previous example. If you’d like to avoid this situation, please load the library(plotlyBook) If you’d like to run examples on your local machine (instead of RStudio Cloud), you can install all the necessary R packages with: if (!require(remotes)) install.packages("remotes") remotes::install_github("cpsievert/plotly_book") 1.6 Getting help and learning more As Wickham and Grolemund (2018) states, “This book is not an island; there is no single resource that will allow you to master R [or plotly]. As you start to apply the techniques described in this book to your own data you will soon find questions that I do not answer. This section describes a few tips on how to get help, and to help you keep learning.” These tips on how to get help (e.g., Google, StackOverflow, Twitter, etc) also apply to getting help with plotly. RStudio’s community is another great place to ask broader questions about all things R and plotly. It’s worth mentioning that the R community is incredibly welcoming, compassionate, and generous; especially if you can demonstrate that you’ve done your research and/or provide minimally reproducible example of your problem. 
1.7 Acknowledgements This book wouldn’t be possible without the generous assistance and mentorship of many people: Heike Hofmann and Di Cook for their mentorship and many helpful conversations about interactive graphics. Toby Dylan Hocking for many helpful conversations, his mentorship in the R packages animint and plotly, and laying the original foundation behind ggplotly(). Joe Cheng for many helpful conversations and inspiring Section 16.1. Étienne Tétreault-Pinard, Alex Johnson, and the other plotly.js core developers for responding to my feature requests and bug reports. Yihui Xie for his work on knitr, rmarkdown, bookdown, bookdown-crc, and responding to my feature requests. Anthony Unwin for helpful feedback, suggestions, and for inspiring Figure 16.13. Hadley Wickham and the ggplot2 team for maintaining ggplot2. Hadley Wickham and Garret Grolemund for writing R for Data Science and allowing me to model this introduction after their introduction. Kent Russell for contributions to plotly and writing reactR. Adam Loy for inspiring Figure 14.5. Many other R community members who contributed to the plotly package and provided feedback and corrections for this book. 1.8 Colophon An online version of this book is available at https://plotly-r.com. It will continue to evolve in between reprints of the physical book. The source of the book is available at https://github.com/cpsievert/plotly_book. The book is powered by https://bookdown.org which makes it easy to turn R markdown files into HTML, PDF, and EPUB. 
This book was built with the following computing environment: devtools::session_info("plotly") #> ─ Session info ────────────────────────────────────── #> setting value #> version R version 3.6.0 (2019-04-26) #> os macOS Mojave 10.14.3 #> system x86_64, darwin15.6.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz America/Chicago #> date 2019-05-14 #> #> ─ Packages ────────────────────────────────────────── #> package * version date lib #> askpass 1.1 2019-01-13 [1] #> assertthat 0.2.1 2019-03-21 [1] #> backports 1.1.4 2019-04-10 [1] #> base64enc 0.1-3 2015-07-28 [1] #> BH 1.69.0-1 2019-01-07 [1] #> cli 1.1.0 2019-03-19 [1] #> colorspace 1.4-1 2019-03-18 [1] #> crayon 1.3.4 2017-09-16 [1] #> crosstalk 1.0.1 2019-05-02 [1] #> curl 3.3 2019-01-10 [1] #> data.table 1.12.2 2019-04-07 [1] #> digest 0.6.18 2018-10-10 [1] #> dplyr * 0.8.0.1 2019-02-15 [1] #> fansi 0.4.0 2018-10-05 [1] #> ggplot2 * 3.1.1 2019-04-07 [1] #> glue 1.3.1 2019-03-12 [1] #> gtable 0.3.0 2019-03-25 [1] #> hexbin 1.27.2 2018-01-15 [1] #> htmltools 0.3.6 2017-04-28 [1] #> htmlwidgets 1.3 2018-09-30 [1] #> httpuv 1.5.1 2019-04-05 [1] #> httr 1.4.0 2018-12-11 [1] #> jsonlite 1.6 2018-12-07 [1] #> labeling 0.3 2014-08-23 [1] #> later 0.8.0 2019-02-11 [1] #> lattice 0.20-38 2018-11-04 [1] #> lazyeval 0.2.2 2019-03-15 [1] #> magrittr 1.5 2014-11-22 [1] #> MASS 7.3-51.4 2019-03-31 [1] #> Matrix 1.2-17 2019-03-22 [1] #> mgcv 1.8-28 2019-03-21 [1] #> mime 0.6 2018-10-05 [1] #> munsell 0.5.0 2018-06-12 [1] #> nlme 3.1-140 2019-05-12 [1] #> openssl 1.3 2019-03-22 [1] #> pillar 1.4.0 2019-05-11 [1] #> pkgconfig 2.0.2 2018-08-16 [1] #> plogr 0.2.0 2018-03-25 [1] #> plotly * 4.9.0 2019-04-10 [1] #> plyr 1.8.4 2016-06-08 [1] #> promises 1.0.1 2018-04-13 [1] #> purrr 0.3.2 2019-03-15 [1] #> R6 2.4.0 2019-02-14 [1] #> RColorBrewer 1.1-2 2014-12-07 [1] #> Rcpp 1.0.1 2019-03-17 [1] #> reshape2 1.4.3 2017-12-11 [1] #> rlang 0.3.4 2019-04-07 [1] #> scales 1.0.0 2018-08-09 [1] #> shiny 
1.3.2.9000 2019-05-10 [1] #> sourcetools 0.1.7 2018-04-25 [1] #> stringi 1.4.3 2019-03-12 [1] #> stringr 1.4.0 2019-02-10 [1] #> sys 3.2 2019-04-23 [1] #> tibble 2.1.1 2019-03-16 [1] #> tidyr 0.8.3 2019-03-01 [1] #> tidyselect 0.2.5 2018-10-11 [1] #> utf8 1.1.4 2018-05-24 [1] #> vctrs 0.1.0 2018-11-29 [1] #> viridisLite 0.3.0 2018-02-01 [1] #> withr 2.1.2 2018-03-15 [1] #> xtable 1.8-4 2019-04-21 [1] #> yaml 2.2.0 2018-07-25 [1] #> zeallot 0.1.0 2018-01-28 [1] #> source #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> Github (rstudio/crosstalk@feaf86b) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> local #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> #> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library References "], +["introduction.html", "1 Introduction 1.1 Why interactive web graphics from R? 
1.2 What you will learn 1.3 What you won’t learn (much of) 1.4 Prerequisites 1.5 Run code examples 1.6 Getting help and learning more 1.7 Acknowledgements 1.8 Colophon", " 1 Introduction 1.1 Why interactive web graphics from R? As Wickham and Grolemund (2018) argue, the exploratory phase of a data science workflow (Figure 1.1) requires lots of iteration between data manipulation, visualization, and modeling. Achieving these tasks through a programming language like R offers the opportunity to scale and automate tasks, document and track them, and reliably reproduce their output. That power, however, typically comes at the cost of increasing the amount of cognitive load involved relative to a GUI-based system.1 R packages like the tidyverse have been incredibly successful due to their ability to limit cognitive load without removing the benefits of performing analysis via code. Moreover, the tidyverse’s unifying principles of designing for humans, consistency, and composabilty makes iteration within and between these stages seamless – an important but often overlooked challenge in exploratory data analysis (EDA) (Tidyverse team 2018). FIGURE 1.1: The stages of a data science workflow from Wickham and Grolemund (2018). In fact, packages within the tidyverse such as dplyr (transformation) and ggplot2 (visualization) are such productive tools that many analysts use static ggplot2 graphics for EDA. Then, when it comes to communicating results, some analysts switch to another tool or language altogether (e.g., JavaScript) to generate interactive web graphics presenting their most important findings (Yau 2016; Quealy 2013). Unfortunately, this requires a heavy context switch that requires a totally different skillset and impedes productivity. Moreover, for the average analyst, the opportunity costs involved with becoming competent with the complex world of web technologies is simply not worth the required investment. 
Even before the web, interactive graphics were shown to have great promise in aiding the exploration of high-dimensional data (D. Cook, Buja, and Swayne 2007). The ASA maintains an incredible video library, http://stat-graphics.org/movies/, documenting the use of interactive statistical graphics for tasks that otherwise wouldn’t have been easy or possible using numerical summaries and/or static graphics alone. Roughly speaking, these tasks tend to fall under three categories: Identifying structure that would otherwise go missing (J. W. Tukey and Fisherkeller 1973). Diagnosing models and understanding algorithms (Wickham, Cook, and Hofmann 2015). Aiding the sense-making process by searching for information quickly without fully specified questions (Unwin and Hofmann 1999). Today, you can find and run some of these and similar Graphical User Interface (GUI) systems for creating interactive graphics: DataDesk https://datadescription.com/, GGobi http://www.ggobi.org/, Mondrian http://www.theusrus.de/Mondrian/, JMP https://www.jmp.com, Tableau https://www.tableau.com/. Although these GUI-based systems have nice properties, they don’t gel with a code-based workflow: any tasks you complete through a GUI likely can’t be replicated without human intervention. That means, if at any point, the data changes, and analysis outputs must be regenerated, you need to remember precisely how to reproduce the outcome, which isn’t necessarily easy, trustworthy, or economical. Moreover, GUI-based systems are typically ‘closed’ systems that don’t allow themselves to be easily customized, extended, or integrated with another system. Programming interactive graphics allows you to leverage all the benefits of a code-based workflow while also helping with tasks that are difficult to accomplish with code alone. 
For example, if you were to visualize engine displacement (displ) versus miles per gallon (hwy) using the mpg dataset, you might wonder: “what are these cars with an unusually high value of hwy given their displ?”. Rather than trying to write code to query those observations, it would be easier and more intuitive to draw an outline around the points to query the data behind them. library(ggplot2) ggplot(mpg, aes(displ, hwy)) + geom_point() FIGURE 1.2: A scatterplot of engine displacement versus miles per gallon made with the ggplot2 package. Figure 1.3 demonstrates how we can transform Figure 1.2 into an interactive version that can be used to query and inspect points of interest. The framework that enables this kind of linked brushing is discussed in depth within Section 16.1, but the point here is that the added effort required to enable such functionality is relatively small. This is important, because although interactivity can augment exploration by allowing us to pursue follow-up questions, it’s typically only practical when we can create and alter interactive graphics quickly. That’s because, in a true exploratory setting, you have to make lots of visualizations, and investigate lots of follow-up questions, before stumbling across something truly valuable. library(plotly) m <- highlight_key(mpg) p <- ggplot(m, aes(displ, hwy)) + geom_point() gg <- highlight(ggplotly(p), "plotly_selected") crosstalk::bscols(gg, DT::datatable(m)) FIGURE 1.3: Linked brushing in a scatterplot to query more information about points of interest. By lasso selecting a region of unusual points, we learn that corvettes have an unusually high miles per gallon considering the engine size. For the interactive, see https://plotly-r.com/interactives/mpg-lasso.html When a valuable insight surfaces, since the code behind Figure 1.3 generates HTML, the web-based graphic can be easily shared with collaborators through email and/or incorporated inside a larger automated report or website. 
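As a minimal sketch of that sharing step, a plotly graph can be written to an HTML file with htmlwidgets::saveWidget(). The file name below is hypothetical, and selfcontained = FALSE is an assumption made here so the example does not depend on pandoc being installed; it writes the JavaScript dependencies to a folder next to the HTML file.

```r
library(plotly)

# A static ggplot2 scatterplot converted to an interactive plotly graph
p <- ggplot(mpg, aes(displ, hwy)) + geom_point()
gg <- ggplotly(p)

# Hypothetical output location; any writable path would do
out <- file.path(tempdir(), "mpg-scatter.html")

# Write the widget to disk as an HTML page
htmlwidgets::saveWidget(gg, out, selfcontained = FALSE)
```

The resulting file, together with its dependency folder, can then be opened in a web browser or embedded in a larger site.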
Moreover, since these interactive graphics are based on the htmlwidgets framework, they work seamlessly inside of larger rmarkdown documents, inside shiny apps, RStudio, Jupyter notebooks, the R prompt, and more. Being able to share interactive graphics with collaborators through these different mediums enhances the conversation – your colleagues can point out things you may not yet have considered and, in some cases, they can get immediate responses from the graphics themselves. In the final stages of an analysis, when it comes time to publish your work to a general audience, rather than relying on the audience to interact with the graphics and discover insight for themselves, it’s always a good idea to clearly highlight your findings. For example, from Figure 1.3, we’ve learned that most of these unusual points can be explained by a single feature of the data (model == 'corvette'). As shown in Figure 1.4, the geom_mark_hull() function from the ggforce package provides a helpful way to annotate those points with a hull. Moreover, as Chapter 12 demonstrates, it can also be helpful to add and/or edit annotations interactively when preparing a graphic for publication. library(ggforce) ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_mark_hull(aes(filter = model == "corvette", label = model)) + labs( title = "Fuel economy data from 1999 and 2008 for 38 popular models of car", caption = "Source: https://fueleconomy.gov/", x = "Engine Displacement", y = "Miles Per Gallon" ) FIGURE 1.4: Using the ggforce package to annotate the corvettes in this dataset. This simple example quickly shows how interactive web graphics can assist EDA (for another, slightly more in-depth example, see Section 2.3). Being able to program these graphics from R allows one to combine their functionality within a world-class computing environment for data analysis and statistics. 
Programming interactive graphics may not be as intuitive as using a GUI-based system, but making the investment pays dividends in terms of workflow improvements: automation, scaling, provenance, and flexibility. 1.2 What you will learn This book provides a foundation for learning how to make interactive web-based graphics for data analysis from R via plotly, without assuming any prior experience with web technologies. The goal is to provide the context you need to go beyond copying existing plotly examples to having a useful mental model of the underlying framework, its capabilities, and how it fits into the larger R ecosystem. By learning this mental model, you’ll have a better understanding of how to create more sophisticated visualizations, fix common issues, improve performance, understand the limitations, and even contribute back to the project itself. You may already be familiar with existing plotly documentation (e.g., https://plot.ly/r/), which is essentially a language-agnostic how-to guide, but this book is meant to be a more holistic tutorial written by and for the R user. This book also focuses primarily on features that are unique to the plotly R package (i.e., things that don’t work the same for Python or JavaScript). This ranges from the creation of a single graph using the special named arguments of plot_ly() that make it easier to map data to visuals: plot_ly(diamonds, x = ~cut, color = ~clarity, colors = "Accent") FIGURE 1.5: An example of what you’ll learn: Figure 2.7. For the interactive, see https://plotly-r.com/interactives/intro-show-hide-preview.html To its ability to link multiple data views purely client-side (see Section 16.1): FIGURE 1.6: An example of what you’ll learn: Figure 16.21. For the interactive, see https://plotly-r.com/interactives/storms-preview.html To advanced server-side linking with shiny to implement responsive and scalable crossfilters (see Section 17.4.2): FIGURE 1.7: An example of what you’ll learn: Figure 17.28. 
For the interactive, see https://plotly-r.com/interactives/shiny-crossfilter-preview.html By going through the code behind these examples, you’ll see that many of them leverage other R packages in their implementation. To highlight a few of the R packages that you’ll see: dplyr and tidyr For transforming data into a form suitable for the visualization method. ggplot2 and friends (e.g., GGally, ggmosaic, etc) For creating plotly visualizations that would be tedious to implement without ggplotly(). sf, rnaturalearth, cartogram For obtaining and working with geo-spatial data structures in R. stats, MASS, broom, and forecast For working with statistical models and summaries. shiny For running R code in response to user input. htmltools, htmlwidgets For combining multiple views and saving the result. This book contains six parts and each part contains numerous chapters. A summary of each part is provided below. Creating views: introduces the process of transforming data into graphics via plotly’s programmatic interface. It focuses mostly on plot_ly(), which can interface directly with the underlying plotly.js graphing library, but emphasis is put on features unique to the R package that make it easier to transform data into graphics. Another way to create graphs with plotly is to use the ggplotly() function to transform ggplot2 graphs into plotly graphs. Section 2.3 discusses when and why ggplotly() might be preferable to plot_ly(). It’s also worth mentioning that neither this part nor the book as a whole intends to cover every possible chart type and option available in plotly – it’s more of a presentation of the most generally useful techniques with the greater R ecosystem in mind. For a more exhaustive gallery of examples of what plotly itself is capable of, see https://plot.ly/r/. Publishing views: discusses various techniques for exporting (as well as embedding) plotly graphs to various file formats (e.g., HTML, svg, pdf, png, etc). 
Also, Chapter 12 demonstrates how one could leverage editable layout components in HTML to touch up a graph, then export to a static file format of interest before publication. Indeed, this book was created using the techniques from this section. Combining multiple views: demonstrates how to combine multiple data views into a single web page (arranging) or graphic (animation). Most of these techniques are shown using plotly graphs, but techniques from Section 13.2 extend to any HTML content generated via htmltools (which includes htmlwidgets). Linking multiple views: provides an overview of the two models for linking plotly graph(s) to other data views. The first model, covered in Section 16.1, outlines plotly’s support for linking views purely client-side, meaning the resulting graphs render in any web browser on any machine without requiring external software. The second model, covered in Chapter 17, demonstrates how to link plotly with other views via shiny, a reactive web application framework for R. Relatively speaking, the second model grants the R user way more power and flexibility, but comes at the cost of requiring more computational infrastructure. That being said, RStudio provides accessible resources for deploying shiny apps https://shiny.rstudio.com/articles/#deployment. Custom behavior with JavaScript: demonstrates various ways to customize plotly graphs by writing custom JavaScript to handle certain user events. This part of the book is designed to be approachable for R users that want to learn just enough JavaScript to get plotly to do something it doesn’t “natively” support. Various special topics: offers a grab-bag of topics that address common questions, mostly related to the customization of plotly graphs in R. You might already notice that this book often uses the term ‘view’ or ‘data view’, so here we take a moment to frame its use in a wider context. 
As Wills (2008) puts it: “a ‘data view’ is anything that gives the user a way of examining data so as to gain insight and understanding. A data view is usually thought of as a barchart, scatterplot, or other traditional statistical graphic, but we use the term more generally, including ‘views’ such as the results of a regression analysis, a neural net prediction, or a set of descriptive statistics”. In this book, more often than not, the term ‘view’ refers to a plotly graph or other htmlwidgets (e.g., DT, leaflet, etc). In particular, Section 16.1 is all about linking multiple htmlwidgets together through a graphical database querying framework. However, the term ‘view’ takes on a more general interpretation in Chapter 17 since the reactive programming framework that shiny provides allows us to have a more general conversation surrounding linked data views. 1.3 What you won’t learn (much of) 1.3.1 Web technologies Although this book is fundamentally about creating web graphics, it does not aim to teach you web technologies (e.g., HTML, SVG, CSS, JavaScript, etc). It’s true that mastering these technologies grants you the ability to build really impressive websites, but even expert web developers would say their skillset is much better suited for expository rather than exploratory visualization. That’s because most web programming tools are not well-suited for the exploratory phase of a data science workflow where iteration between data visualization, transformation, and modeling is a necessary task that often impedes hypothesis generation and sense-making. As a result, for most data analysts whose primary function is to derive insight from data, the opportunity costs involved with mastering web technologies are usually not worth the investment. That being said, learning a little about web technologies can have a relatively large payoff with directed learning and instruction. 
In Chapter 18, you’ll learn how to customize plotly graphs with JavaScript – even if you haven’t seen JavaScript before, this chapter should be approachable, insightful, and provide you with some useful examples. 1.3.2 d3js The JavaScript library D3 is a great tool for data visualization assuming you’re familiar with web technologies and are primarily interested in expository (not exploratory) visualization. There are already lots of great resources for learning D3, including the books by Murray (2013) and Murray (2017). It’s worth noting, however, that if you do know D3, you can easily leverage it from a web page that already contains a plotly graph, as demonstrated in Figure 22.1. 1.3.3 ggplot2 The book does contain some ggplot2 code examples (which are then converted to plotly via ggplotly()), but it’s not designed to teach you ggplot2. For those looking to learn ggplot2, I recommend using the learning materials listed at https://ggplot2.tidyverse.org. 1.3.4 Graphical data analysis How to perform data analysis via graphics (carefully, correctly, and creatively) is a large topic unto itself. Although this book does have examples of graphical data analysis, it does not aim to provide a comprehensive foundation. For nice comprehensive resources on the topic, see Unwin (2015) and D. Cook and Swayne (2007). 1.3.5 Data visualization best practices Encoding information in a graphic (concisely and effectively) is a large topic unto itself. Although this book does have some ramblings related to best practices in data visualization, it does not aim to provide a comprehensive foundation. For some approachable and fun resources on the topic, see Tufte (2001a), Yau (2011), Healey (2018), and Wilke (2018). 1.4 Prerequisites For those new to R and/or data visualization, R for Data Science provides an excellent foundation for understanding the vast majority of concepts covered in this book (Wickham and Grolemund 2018). 
In particular, if you have a solid grasp on Part I: Explore, Part II: Wrangle, and Part III: Program, you should be able to understand almost everything here. Although not explicitly covered, the book does make references to (and was created using) rmarkdown, so if you’re new to rmarkdown, I also recommend reading the R Markdown chapter. 1.5 Run code examples This book contains many code examples in an effort to teach the art and science behind creating interactive web-based graphics using plotly. To see the actual interactive result of the code (rather than a video or static version), you may want to run the code examples in a suitable computational environment. Visit http://bit.ly/plotly-book-cloud for a cloud-based instance of RStudio with all the required software to run the code examples in this book. Most, if not all, of these code examples assume you have the plotly package loaded: library(plotly) Within some chapters, there may be examples that assume packages were loaded during a previous example. If you’d like to avoid this situation, please load the plotlyBook package: library(plotlyBook) If you’d like to run examples on your local machine (instead of RStudio Cloud), you can install all the necessary R packages with: if (!require(remotes)) install.packages("remotes") remotes::install_github("cpsievert/plotly_book") 1.6 Getting help and learning more As Wickham and Grolemund (2018) state, “This book is not an island; there is no single resource that will allow you to master R [or plotly]. As you start to apply the techniques described in this book to your own data you will soon find questions that I do not answer. This section describes a few tips on how to get help, and to help you keep learning.” These tips on how to get help (e.g., Google, StackOverflow, Twitter, etc) also apply to getting help with plotly. RStudio’s community is another great place to ask broader questions about all things R and plotly. 
It’s worth mentioning that the R community is incredibly welcoming, compassionate, and generous, especially if you can demonstrate that you’ve done your research and/or provide a minimal reproducible example of your problem. 1.7 Acknowledgements This book wouldn’t be possible without the generous assistance and mentorship of many people: Heike Hofmann and Di Cook for their mentorship and many helpful conversations about interactive graphics. Toby Dylan Hocking for many helpful conversations, his mentorship in the R packages animint and plotly, and laying the original foundation behind ggplotly(). Joe Cheng for many helpful conversations and inspiring Section 16.1. Étienne Tétreault-Pinard, Alex Johnson, and the other plotly.js core developers for responding to my feature requests and bug reports. Yihui Xie for his work on knitr, rmarkdown, bookdown, bookdown-crc, and responding to my feature requests. Anthony Unwin for helpful feedback, suggestions, and for inspiring Figure 16.13. Hadley Wickham and the ggplot2 team for maintaining ggplot2. Hadley Wickham and Garrett Grolemund for writing R for Data Science and allowing me to model this introduction after their introduction. Kent Russell for contributions to plotly and writing reactR. Adam Loy for inspiring Figure 14.5. Many other R community members who contributed to the plotly package and provided feedback and corrections for this book. 1.8 Colophon An online version of this book is available at https://plotly-r.com. It will continue to evolve in between reprints of the physical book. The source of the book is available at https://github.com/cpsievert/plotly_book. The book is powered by https://bookdown.org which makes it easy to turn R markdown files into HTML, PDF, and EPUB. 
This book was built with the following computing environment: devtools::session_info("plotly") #> ─ Session info ────────────────────────────────────── #> setting value #> version R version 3.6.0 (2019-04-26) #> os macOS Mojave 10.14.3 #> system x86_64, darwin15.6.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz America/Chicago #> date 2019-05-14 #> #> ─ Packages ────────────────────────────────────────── #> package * version date lib #> askpass 1.1 2019-01-13 [1] #> assertthat 0.2.1 2019-03-21 [1] #> backports 1.1.4 2019-04-10 [1] #> base64enc 0.1-3 2015-07-28 [1] #> BH 1.69.0-1 2019-01-07 [1] #> cli 1.1.0 2019-03-19 [1] #> colorspace 1.4-1 2019-03-18 [1] #> crayon 1.3.4 2017-09-16 [1] #> crosstalk 1.0.1 2019-05-02 [1] #> curl 3.3 2019-01-10 [1] #> data.table 1.12.2 2019-04-07 [1] #> digest 0.6.18 2018-10-10 [1] #> dplyr * 0.8.0.1 2019-02-15 [1] #> fansi 0.4.0 2018-10-05 [1] #> ggplot2 * 3.1.1 2019-04-07 [1] #> glue 1.3.1 2019-03-12 [1] #> gtable 0.3.0 2019-03-25 [1] #> hexbin 1.27.2 2018-01-15 [1] #> htmltools 0.3.6 2017-04-28 [1] #> htmlwidgets 1.3 2018-09-30 [1] #> httpuv 1.5.1 2019-04-05 [1] #> httr 1.4.0 2018-12-11 [1] #> jsonlite 1.6 2018-12-07 [1] #> labeling 0.3 2014-08-23 [1] #> later 0.8.0 2019-02-11 [1] #> lattice 0.20-38 2018-11-04 [1] #> lazyeval 0.2.2 2019-03-15 [1] #> magrittr 1.5 2014-11-22 [1] #> MASS 7.3-51.4 2019-03-31 [1] #> Matrix 1.2-17 2019-03-22 [1] #> mgcv 1.8-28 2019-03-21 [1] #> mime 0.6 2018-10-05 [1] #> munsell 0.5.0 2018-06-12 [1] #> nlme 3.1-140 2019-05-12 [1] #> openssl 1.3 2019-03-22 [1] #> pillar 1.4.0 2019-05-11 [1] #> pkgconfig 2.0.2 2018-08-16 [1] #> plogr 0.2.0 2018-03-25 [1] #> plotly * 4.9.0 2019-04-10 [1] #> plyr 1.8.4 2016-06-08 [1] #> promises 1.0.1 2018-04-13 [1] #> purrr 0.3.2 2019-03-15 [1] #> R6 2.4.0 2019-02-14 [1] #> RColorBrewer 1.1-2 2014-12-07 [1] #> Rcpp 1.0.1 2019-03-17 [1] #> reshape2 1.4.3 2017-12-11 [1] #> rlang 0.3.4 2019-04-07 [1] #> scales 1.0.0 2018-08-09 [1] #> shiny 
1.3.2.9000 2019-05-10 [1] #> sourcetools 0.1.7 2018-04-25 [1] #> stringi 1.4.3 2019-03-12 [1] #> stringr 1.4.0 2019-02-10 [1] #> sys 3.2 2019-04-23 [1] #> tibble 2.1.1 2019-03-16 [1] #> tidyr 0.8.3 2019-03-01 [1] #> tidyselect 0.2.5 2018-10-11 [1] #> utf8 1.1.4 2018-05-24 [1] #> vctrs 0.1.0 2018-11-29 [1] #> viridisLite 0.3.0 2018-02-01 [1] #> withr 2.1.2 2018-03-15 [1] #> xtable 1.8-4 2019-04-21 [1] #> yaml 2.2.0 2018-07-25 [1] #> zeallot 0.1.0 2018-01-28 [1] #> source #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> Github (rstudio/crosstalk@feaf86b) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> local #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> CRAN (R 3.6.0) #> #> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library References "], ["overview.html", "2 Overview 2.1 Intro to plot_ly() 2.2 Intro to plotly.js 2.3 Intro to ggplotly()", " 2 Overview This part of the book teaches you how to leverage the plotly R package to create a variety of interactive graphics. 
There are two main ways to create a plotly object: either by transforming a ggplot2 object (via ggplotly()) into a plotly object or by directly initializing a plotly object with plot_ly()/plot_geo()/plot_mapbox(). Both approaches have somewhat complementary strengths and weaknesses, so it can pay off to learn both approaches. Moreover, both approaches are an implementation of the Grammar of Graphics and both are powered by the JavaScript graphing library plotly.js, so many of the same concepts and tools that you learn for one interface can be reused in the other. The subsequent chapters within this ‘Creating views’ part dive into specific examples and use cases, but this introductory chapter outlines some over-arching concepts related to plotly in general. It also provides definitions for terminology used throughout the book and introduces some concepts useful for understanding the infrastructure behind any plotly object. Most of these details aren’t necessarily required to get started with plotly, but they will inevitably help you get ‘un-stuck’, write better code, and do more advanced things with plotly. 2.1 Intro to plot_ly() Any graph made with the plotly R package is powered by the JavaScript library plotly.js. The plot_ly() function provides a ‘direct’ interface to plotly.js with some additional abstractions to help reduce typing. These abstractions, inspired by the Grammar of Graphics and ggplot2, make it much faster to iterate from one graphic to another, making it easier to discover interesting features in the data (Wilkinson 2005; Wickham 2009). To demonstrate, we’ll use plot_ly() to explore the diamonds dataset from ggplot2 and learn a bit about how plotly and plotly.js work along the way. 
# load the plotly R package library(plotly) # load the diamonds dataset from the ggplot2 package data(diamonds, package = "ggplot2") diamonds #> # A tibble: 53,940 x 10 #> carat cut color clarity depth table price x #> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> #> 1 0.23 Ideal E SI2 61.5 55 326 3.95 #> 2 0.21 Prem… E SI1 59.8 61 326 3.89 #> 3 0.23 Good E VS1 56.9 65 327 4.05 #> 4 0.290 Prem… I VS2 62.4 58 334 4.2 #> 5 0.31 Good J SI2 63.3 58 335 4.34 #> 6 0.24 Very… J VVS2 62.8 57 336 3.94 #> # … with 5.393e+04 more rows, and 2 more variables: #> # y <dbl>, z <dbl> If we assign variable names (e.g., cut, clarity, etc) to visual properties (e.g., x, y, color, etc) within plot_ly(), as done in Figure 2.1, it tries to find a sensible geometric representation of that information for us. Shortly we’ll cover how to specify these geometric representations (as well as other visual encodings) to create different kinds of charts. # create three visualizations of the diamonds dataset plot_ly(diamonds, x = ~cut) plot_ly(diamonds, x = ~cut, y = ~clarity) plot_ly(diamonds, x = ~cut, color = ~clarity, colors = "Accent") FIGURE 2.1: Three examples of visualizing categorical data with plot_ly(): (top) mapping cut to x yields a bar chart, (middle) mapping cut & clarity to x & y yields a heatmap, and (bottom) mapping cut & clarity to x & color yields a dodged bar chart. The plot_ly() function has numerous arguments that are unique to the R package (e.g., color, stroke, span, symbol, linetype, etc) and make it easier to encode data variables (e.g., diamond clarity) as visual properties (e.g., color). 
By default, these arguments map values of a data variable to a visual range defined by the plural form of the argument. For example, in the bottom panel of Figure 2.1, color is used to map each level of diamond clarity to a different color, then colors is used to specify the range of colors (which, in this case, is the \"Accent\" color palette from the RColorBrewer package, but one can also supply custom color codes or a color palette function like colorRamp()). Figure 2.2 provides a visual diagram of how this particular mapping works, but the same sort of idea can be applied to other visual properties like size, shape, linetype, etc. FIGURE 2.2: Mapping data values to a visual color range. Since these arguments map data values to a visual range by default, you will obtain unexpected results if you try to specify the visual range directly, as in the top portion of Figure 2.3. If you want to specify the visual range directly, use the I() function to declare this value to be taken ‘AsIs’, as in the bottom portion of Figure 2.3. Throughout this book, you’ll see lots of examples that leverage these arguments, especially in Chapter 3. Another good resource to learn more about these arguments (especially their defaults) is the R documentation page available by entering help(plot_ly) in your R console. # doesn't produce black bars plot_ly(diamonds, x = ~cut, color = "black") # produces red bars with black outline plot_ly(diamonds, x = ~cut, color = I("red"), stroke = I("black"), span = I(2)) FIGURE 2.3: Using I() to supply visual properties directly instead of mapping values to a visual range. 
In the top portion of this figure, the value 'black' is being mapped to a visual range spanned by colors (which, for discrete data, defaults to 'Set2'). The plotly package takes a purely functional approach to a layered grammar of graphics (Wickham 2010).2 The purely functional part means that (almost) every function anticipates a plotly object as input to its first argument and returns a modified version of that plotly object. Furthermore, that modification is completely determined by the input values to the function (i.e., it doesn’t rely on any side-effects, unlike, for example, base R graphics). For a quick example, the layout() function anticipates a plotly object in its first argument and its other arguments add and/or modify various layout components of that object (e.g., the title): layout( plot_ly(diamonds, x = ~cut), title = "My beautiful histogram" ) For more complex plots that modify a plotly graph many times over, code written in this way can become cumbersome to read. In particular, we have to search for the inner-most part of the R expression, then work outwards towards the end result. The %>% operator from the magrittr package allows us to re-arrange this code so that we can read the sequence of modifications from left-to-right rather than inside-out (Bache and Wickham 2014). The %>% operator enables this by placing the object on the left-hand side of the %>% into the first argument of the function on the right-hand side. diamonds %>% plot_ly(x = ~cut) %>% layout(title = "My beautiful histogram") In addition to layout() for adding/modifying part(s) of the graph’s layout, there is also a family of add_*() functions (e.g., add_histogram(), add_lines(), etc) that define how to render data into geometric objects. Borrowing terminology from the layered grammar of graphics, these functions add a graphical layer to a plot. 
A layer can be thought of as a group of graphical elements that can be sufficiently described using only 5 components: data, aesthetic mappings (e.g., assigning clarity to color), a geometric representation (e.g. rectangles, circles, etc), statistical transformations (e.g., sum, mean, etc), and positional adjustments (e.g., dodge, stack, etc). If you’re paying attention, you’ll notice that in the examples thus far, we have not specified a layer! The layer has been added for us automatically by plot_ly(). To be explicit about what plot_ly(diamonds, x = ~cut) generates, we should add an add_histogram() layer: diamonds %>% plot_ly() %>% add_histogram(x = ~cut) As you’ll learn more about in Chapter 5, plotly has both add_histogram() and add_bars(). The difference is that add_histogram() performs statistics (i.e., a binning algorithm) dynamically in the web browser, whereas add_bars() requires the bar heights to be pre-specified. That means, to replicate the last example with add_bars(), the number of observations must be computed ahead-of-time. diamonds %>% dplyr::count(cut) %>% plot_ly() %>% add_bars(x = ~cut, y = ~n) There are numerous other add_*() functions that calculate statistics in the browser (e.g., add_histogram2d(), add_contour(), add_boxplot(), etc), but most other functions aren’t considered statistical. Making the distinction might not seem useful now, but they have their own respective trade-offs when it comes to speed and interactivity. Generally speaking, non-statistical layers will be faster and more responsive at run-time (since they require less computational work), whereas the statistical layers allow for more flexibility when it comes to client-side interactivity, as covered in Chapter 16. Practically speaking, the difference in performance is often negligible – the more common bottleneck occurs when attempting to render lots of graphical elements at a time (e.g., a scatterplot with a million points). 
In those scenarios, you likely want to render your plot in Canvas rather than SVG (the default) via toWebGL() – for more information on improving performance, see Chapter 24. In many scenarios, it can be useful to combine multiple graphical layers into a single plot. In this case, it becomes useful to know a few things about plot_ly(): Arguments specified in plot_ly() are global, meaning that any downstream add_*() functions inherit these arguments (unless inherit = FALSE). Data manipulation verbs from the dplyr package may be used to transform the data underlying a plotly object.3 Figure 2.4 demonstrates how we could leverage these two properties of plot_ly() to do the following: (1) Globally assign cut to x. (2) Add a histogram layer (inherits the x from plot_ly()). (3) Use dplyr verbs to modify the data underlying the plotly object. Here we just count the number of diamonds in each cut category. (4) Add a layer of text using the summarized counts. Note that the global x mapping, as well as the other mappings local to this text layer (text and y), reflect data values from step 3. library(dplyr) diamonds %>% plot_ly(x = ~cut) %>% add_histogram() %>% group_by(cut) %>% summarise(n = n()) %>% add_text( text = ~scales::comma(n), y = ~n, textposition = "top middle", cliponaxis = FALSE ) FIGURE 2.4: Using add_histogram(), add_text(), and dplyr verbs to compose a plot that leverages a raw form of the data (e.g., histogram) as well as a summarized version (e.g., text labels). Before using multiple add_*() functions in a single plot, make sure that you actually want to show those layers of information on the same set of axes. 
If it does not make sense to display the information on the same axes, consider making multiple plotly objects and combining them into a grid-like layout using subplot(), as described in Chapter 13. Also, when using dplyr verbs to modify the data underlying the plotly object, you can use the plotly_data() function to obtain the data at any point in time, which is primarily useful for debugging purposes (i.e., inspecting the data of a particular graphical layer). diamonds %>% plot_ly(x = ~cut) %>% add_histogram() %>% group_by(cut) %>% summarise(n = n()) %>% plotly_data() #> # A tibble: 5 x 2 #> cut n #> <ord> <int> #> 1 Fair 1610 #> 2 Good 4906 #> 3 Very Good 12082 #> 4 Premium 13791 #> 5 Ideal 21551 This introduction to plot_ly() has mainly focused on concepts unique to the R package plotly that are generally useful for creating most kinds of data views. The next section outlines how plotly generates plotly.js figures and how to inspect the underlying data structure that plotly.js uses to render the graph. Not only is this information useful for debugging, but it’s also a nice way to learn how to work with plotly.js directly, which you may need to improve performance in shiny apps (Chapter 17.3.1) and/or for adding custom behavior with JavaScript (Chapter 18). 2.2 Intro to plotly.js To recreate the plots in Figure 2.1 using plotly.js directly, it would take significantly more code and knowledge of plotly.js. That being said, learning how plotly generates the underlying plotly.js figure is a useful introduction to plotly.js itself, and knowledge of plotly.js becomes useful when you need more flexible control over plotly. As Figure 2.5 illustrates, when you print any plotly object, the plotly_build() function is applied to that object, and that generates an R list which adheres to a syntax that plotly.js understands. This syntax is a JavaScript Object Notation (JSON) specification that plotly.js uses to represent, serialize, and render web graphics. 
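To make the print-time workflow described above a bit more concrete, here is a minimal sketch that builds a small bar graph without printing it and then looks at the result from both the R side and the JSON side. The data values and the object names (p, b, trace, json) are made up for illustration, and the sketch assumes the built figure sits under the x element of the returned object and that plotly_json() returns a JSON string when jsonedit = FALSE.

```r
library(plotly)

# A simple bar graph with values supplied directly (illustrative data)
p <- plot_ly(x = c("a", "b", "c"), y = c(1, 3, 2), type = "bar")

# plotly_build() is the step applied when a plotly object is printed;
# calling it ourselves lets us inspect the figure as an R list
b <- plotly_build(p)

# Assumption: the plotly.js figure definition is stored under `x`
trace <- b$x$data[[1]]
trace$type
str(b$x$layout, max.level = 1)

# The same figure as JSON text; jsonedit = FALSE skips the
# interactive viewer so the result can be inspected as a string
json <- plotly_json(p, jsonedit = FALSE)
```

Working with the R list is usually more convenient for computing on the figure, whereas the JSON view is closer to what the plotly.js documentation describes.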
A lot of documentation you’ll find online about plotly (e.g., the online figure reference) implicitly refers to this JSON specification, so it can be helpful to know how to “work backwards” from that documentation (i.e., translate JSON into R code). If you’d like to learn details about mapping between R and JSON, Chapter 19 provides an introduction aimed at R programmers, and Ooms (2014) provides a cohesive overview of the jsonlite package, which is what plotly uses to map between R and JSON. FIGURE 2.5: A diagram of what happens when you print a plotly graph. For illustration purposes, Figure 2.5 shows how this workflow applies to a simple bar graph (with values directly supplied instead of a data column name reference like Figure 2.1), but the same concept applies for any graph created via plotly. As the diagram suggests, both the plotly_build() and plotly_json() functions can be used to inspect the underlying data structure on both the R and JSON side of things. For example, Figure 2.6 shows the data portion of the JSON created for the last graph in Figure 2.1. p <- plot_ly(diamonds, x = ~cut, color = ~clarity, colors = "Accent") plotly_json(p) FIGURE 2.6: A portion of the JSON data behind the bottom plot of Figure 2.1. This dodged bar chart has 8 layers of data (i.e., 8 traces) – one for each level of clarity. In plotly.js terminology, a figure has two key components: data (aka, traces) and a layout. A trace defines a mapping from data to visuals.4 Every trace has a type (e.g., histogram, pie, scatter, etc) and the trace type determines what other attributes (i.e., visual and/or interactive properties, like x, hoverinfo, name) are available to control the trace mapping. 
That is, not every trace attribute is available to every trace type, but many attributes (e.g., the name of the trace) are available in every trace type and serve a similar purpose. From Figure 2.6, we can see that it takes multiple traces to generate the dodged bar chart, but instead of clicking through the JSON viewer, sometimes it’s easier to use plotly_build() and compute on the plotly.js figure definition to verify certain things exist. Since plotly uses the htmlwidgets standard5, the actual plotly.js figure definition appears under a list element named x (Vaidyanathan et al. 2016). # use plotly_build() to get at the plotly.js definition # behind *any* plotly object b <- plotly_build(p) # Confirm there are 8 traces length(b$x$data) #> [1] 8 # Extract the `name` of each trace. plotly.js uses `name` to # populate legend entries and tooltips purrr::map_chr(b$x$data, "name") #> [1] "IF" "VVS1" "VVS2" "VS1" "VS2" "SI1" "SI2" "I1" # Every trace has a type of histogram unique(purrr::map_chr(b$x$data, "type")) #> [1] "histogram" Here we’ve learned that plotly creates 8 histogram traces to generate the dodged bar chart: one trace for each level of clarity.6 Why one trace per category? As illustrated in Figure 2.7, there are two main reasons: to populate a tooltip and a legend entry for each level of clarity. FIGURE 2.7: Leveraging two interactive features that require one trace per level of clarity: (1) Using ‘Compare data on hover’ mode to get counts for every level of clarity for a given level of cut and (2) Using the ability to hide/show clarity levels via their legend entries. For the interactive, see https://plotly-r.com/interactives/intro-show-hide.html If we investigated further, we’d notice that color and colors are not officially part of the plotly.js figure definition – the plotly_build() function has effectively transformed that information into a sensible plotly.js figure definition (e.g., marker.color contains the actual bar color codes). 
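As a rough sketch of that further investigation, the snippet below rebuilds the same figure and pulls the resolved color out of each trace. It assumes each trace stores a single color string under marker$color, which is what the text above suggests; the name bar_cols is just illustrative.

```r
library(plotly)

p <- plot_ly(diamonds, x = ~cut, color = ~clarity, colors = "Accent")
b <- plotly_build(p)

# Assumption: after building, each trace carries one resolved
# color code in marker$color (one trace per level of clarity)
bar_cols <- vapply(
  b$x$data,
  function(tr) tr$marker$color,
  character(1)
)
bar_cols
```

If the assumption holds, bar_cols has one entry per trace, in the same order as the trace names extracted earlier.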
In fact, the color argument in plot_ly() is just one example of an abstraction the R package has built on top of plotly.js to make it easier to map data values to visual attributes, and many of these are covered in Chapter 3. 2.3 Intro to ggplotly() The ggplotly() function from the plotly package has the ability to translate ggplot2 to plotly. This functionality can be really helpful for quickly adding interactivity to your existing ggplot2 workflow.7 Moreover, even if you know plot_ly() and plotly.js well, ggplotly() can still be desirable for creating visualizations that aren’t necessarily straightforward to achieve without it. To demonstrate, let’s explore the relationship between price and other variables from the well-known diamonds dataset. Hexagonal binning (i.e., geom_hex()) is a useful way to visualize a 2D density8, like the relationship between price and carat as shown in Figure 2.8. From Figure 2.8, we can see there is a strong positive linear relationship between the log of carat and the log of price. It also shows that for many diamonds, the carat is only rounded to a particular number (indicated by the light blue bands) and no diamonds are priced around $1500. Making this plot interactive makes it easier to decode the hexagonal colors into the counts that they represent. p <- ggplot(diamonds, aes(x = log(carat), y = log(price))) + geom_hex(bins = 100) ggplotly(p) FIGURE 2.8: A hexbin plot of diamond carat versus price. I often use ggplotly() over plot_ly() to leverage ggplot2’s consistent and expressive interface for exploring statistical summaries across groups. For example, by including a discrete color variable (e.g., cut) with geom_freqpoly(), you get a frequency polygon for each level of that variable. 
This ability to quickly generate visual encodings of statistical summaries across an arbitrary number of groups works for basically any geom (e.g., geom_boxplot(), geom_histogram(), geom_density(), etc) and is a key feature of ggplot2. p <- ggplot(diamonds, aes(x = log(price), color = clarity)) + geom_freqpoly() ggplotly(p) FIGURE 2.9: Frequency polygons of diamond price by diamond clarity. This visualization indicates there may be significant main effects. Now, to see how price varies with both cut and clarity, we could repeat this same visualization for each level of cut. This is where ggplot2’s facet_wrap() comes in handy. Moreover, to facilitate comparisons, we can have geom_freqpoly() display relative rather than absolute frequencies. By making this plot interactive, we can more easily compare particular levels of clarity (as shown in Figure 2.10) by leveraging the legend filtering capabilities. p <- ggplot(diamonds, aes(x = log(price), color = clarity)) + geom_freqpoly(stat = "density") + facet_wrap(~cut) ggplotly(p) FIGURE 2.10: Diamond price by clarity and cut. For the interactive, see https://plotly-r.com/interactives/freqpoly-facet.html In addition to supporting most of the ‘core’ ggplot2 API, ggplotly() can automatically convert any ggplot2 extension packages that return a ‘standard’ ggplot2 object. By standard, I mean that the object is comprised of ‘core’ ggplot2 data structures and not the result of custom geoms.9 Some great examples of R packages that extend ggplot2 using core data structures are ggforce, naniar, and GGally (Pedersen 2019; Tierney et al. 2018; Schloerke et al. 2016). Figure 2.11 demonstrates another way of visualizing the same information found in Figure 2.10 using geom_sina() from the ggforce package (instead of geom_freqpoly()). This visualization jitters the raw data within the density for each group – allowing us not only to see where the majority of observations fall within a group, but also across all groups. 
By making this layer interactive, we can query individual points for more information and zoom into interesting regions. The second layer of Figure 2.11 uses ggplot2’s stat_summary() to overlay a 95% confidence interval estimated via a bootstrap algorithm via the Hmisc package (Harrell Jr, Charles Dupont, and others. 2019). p <- ggplot(diamonds, aes(x = clarity, y = log(price), color = clarity)) + ggforce::geom_sina(alpha = 0.1) + stat_summary(fun.data = "mean_cl_boot", color = "black") + facet_wrap(~cut) # WebGL is a lot more efficient at rendering lots of points toWebGL(ggplotly(p)) FIGURE 2.11: A sina plot of diamond price by clarity and cut. As noted by Wickham and Grolemund (2018), it’s surprising that the diamond price would decline with an increase of diamond clarity. As it turns out, if we account for the carat of the diamond, then we see that better diamond clarity does indeed lead to a higher diamond price, as shown in Figure 2.12. Seeing such a strong pattern in the residuals of a simple linear model of carat vs price indicates that our model could be greatly improved by adding clarity as a predictor of price. m <- lm(log(price) ~ log(carat), data = diamonds) diamonds <- modelr::add_residuals(diamonds, m) p <- ggplot(diamonds, aes(x = clarity, y = resid, color = clarity)) + ggforce::geom_sina(alpha = 0.1) + stat_summary(fun.data = "mean_cl_boot", color = "black") + facet_wrap(~cut) toWebGL(ggplotly(p)) FIGURE 2.12: A sina plot of diamond price by clarity and cut, after accounting for carat. As discussed in Chapter 16.4.7, the GGally package provides a convenient interface for making similar types of model diagnostic visualizations via the ggnostic() function. It also provides a convenience function for visualizing the coefficient estimates and their standard errors via the ggcoef() function. Figure 2.13 shows how injecting interactivity into this plot allows us to query exact values and zoom in on the most interesting regions. 
library(GGally) m <- lm(log(price) ~ log(carat) + cut, data = diamonds) gg <- ggcoef(m) # dynamicTicks means generate new axis ticks on zoom ggplotly(gg, dynamicTicks = TRUE) FIGURE 2.13: Zooming in on a coefficient plot generated from the ggcoef() function from the GGally package. For the interactive, see https://plotly-r.com/interactives/ggally.html Although the diamonds dataset does not contain any missing values, missingness is a very common problem in real data analysis. The naniar package provides a suite of computational and visual resources for working with and revealing structure in missing values. All the ggplot2 based visualizations return an object that can be converted by ggplotly(). Moreover, naniar provides a custom geom, geom_miss_point(), that can be useful for visualizing missingness structure. Figure 2.14 demonstrates this by introducing fake missing values to the diamond price. library(naniar) # fake some missing data diamonds$price_miss <- ifelse(diamonds$depth > 60, diamonds$price, NA) p <- ggplot(diamonds, aes(x = clarity, y = log(price_miss))) + geom_miss_point(alpha = 0.1) + stat_summary(fun.data = "mean_cl_boot", colour = "black") + facet_wrap(~cut) toWebGL(ggplotly(p)) FIGURE 2.14: Using the geom_miss_point() function from the naniar package to visualize missing values in relation to non-missing values. Missing values are shown in red. In short, the ggplot2 ecosystem provides a world-class exploratory visualization toolkit, and having the ability to quickly insert interactivity such as hover, zoom, and filter via ggplotly() makes it even more powerful for exploratory analysis. In this introduction to ggplotly(), we’ve only seen relatively simple techniques that come for free out-of-the-box, but the true power of interactive graphics lies in linking multiple views. In that part of the book, you can find lots of examples of linking multiple (ggplotly() & plot_ly()) graphs purely client-side as well as with shiny. 
It’s also worth mentioning that ggplotly() conversions are not always perfect and ggplot2 doesn’t provide an API for interactive features, so sometimes it’s desirable to modify the return values of ggplotly(). Chapter 33 talks generally about modifying the data structure underlying ggplotly() (which, by the way, uses the same plotly.js figure definition as discussed in Section 2.2). Moreover, Chapter 25.2 outlines various ways to customize the tooltip that ggplotly() produces. References "], ["scatter-traces.html", "3 Scattered foundations 3.1 Markers 3.2 Lines 3.3 Polygons", " 3 Scattered foundations As we learned in Section 2.2, a plotly.js figure contains one (or more) trace(s), and every trace has a type. The trace type scatter is great for drawing low-level geometries (e.g., points, lines, text, and polygons) and provides the foundation for many add_*() functions (e.g., add_markers(), add_lines(), add_paths(), add_segments(), add_ribbons(), add_area(), and add_polygons()) as well as many ggplotly() charts. These scatter-based layers provide a more convenient interface to special cases of the scatter trace by doing a bit of data wrangling and transformation under-the-hood before mapping to scatter trace(s). For a simple example, add_lines() ensures lines are drawn according to the ordering of x, which is desirable for time series plotting. This behavior is subtly different than add_paths() which uses row ordering instead. library(plotly) data(economics, package = "ggplot2") # sort economics by psavert, just to # show difference between paths and lines p <- economics %>% arrange(psavert) %>% plot_ly(x = ~date, y = ~psavert) add_paths(p) add_lines(p) FIGURE 3.1: The difference between add_paths() and add_lines(): the top panel connects observations according to the ordering of psavert (personal savings rate), whereas the bottom panel connects observations according to the ordering of x (the date). 
Section 2.1 introduced ‘aesthetic mapping’ arguments (unique to the R package) which make it easier to map data to visual properties (e.g., color, linetype, etc). In addition to these arguments, dplyr groupings can be used to ensure there is at least one geometry per group. The top panel of Figure 3.2 demonstrates how group_by() could be used to effectively wrap the time series from Figure 3.1 by year, which can be useful for visualizing annual seasonality. Another approach to generating at least one geometry per ‘group’ is to provide a categorical variable to a relevant aesthetic (e.g., color), as shown in the bottom panel of Figure 3.2. library(lubridate) econ <- economics %>% mutate(yr = year(date), mnth = month(date)) # one trace (more performant, but less interactive) econ %>% group_by(yr) %>% plot_ly(x = ~mnth, y = ~uempmed) %>% add_lines(text = ~yr) # multiple traces (less performant, but more interactive) plot_ly(econ, x = ~mnth, y = ~uempmed) %>% add_lines(color = ~ordered(yr)) # the split argument guarantees one trace per group level (regardless of the variable type) # this is useful if you want consistent visual properties over multiple traces # plot_ly(econ, x = ~mnth, y = ~uempmed) %>% # add_lines(split = ~yr, color = I("black")) FIGURE 3.2: Drawing multiple lines using dplyr groups (top panel) versus a categorical color mapping (bottom panel). Comparatively speaking, the bottom panel has more interactive capabilities (e.g., legend-based filtering and multiple tooltips), but it does not scale as well with many lines. For the interactive, see https://plotly-r.com/interactives/scatter-lines.html Not only do these plots differ in visual appearance, they also differ in interactive capabilities, computational performance, and underlying implementation. 
That’s because the grouping approach (top panel of Figure 3.2) uses just one plotly.js trace (more performant, less interactive), whereas the color approach (bottom panel of Figure 3.2) generates one trace per line/year. In this case, the benefit of having multiple traces is that we can perform interactive filtering via the legend and compare multiple y-values at a given x. The cost of having those capabilities is that plots start to become sluggish after a few hundred traces, whereas thousands of lines can be rendered fairly easily in one trace. See Chapter 24 for more details on scaling and performance. These features make it easier to get started using plotly.js, but it still pays off to learn how to use plotly.js directly. You won’t find plotly.js attributes listed as explicit arguments in any plotly function (except for the special type attribute), but they are passed along verbatim to the plotly.js figure definition through the ... argument. The scatter-based layers in this chapter fix the type plotly.js attribute to \"scatter\" as well as the mode (e.g., add_markers() uses mode='markers' etc), but you could also use the lower-level add_trace() to work more directly with plotly.js. For example, Figure 3.3 shows how to render markers, lines, and text in the same scatter trace. It also demonstrates how to leverage nested plotly.js attributes, like textfont and xaxis – these attributes contain other attributes, so you need to supply a suitable named list to these arguments. set.seed(99) plot_ly() %>% add_trace( type = "scatter", mode = "markers+lines+text", x = 4:6, y = 4:6, text = replicate(3, praise::praise("You are ${adjective}! 🙌")), textposition = "right", hoverinfo = "text", textfont = list(family = "Roboto Condensed", size = 16) ) %>% layout(xaxis = list(range = c(3, 8))) FIGURE 3.3: Using the generic add_trace() function to render markers, lines, and text in a single scatter trace. 
This add_trace() function, as well as any add_*() function, allows you to directly specify plotly.js attributes. If you are new to plotly.js, I recommend taking a bit of time to look through the plotly.js attributes that are available to the scatter trace type and think how you might be able to use them. Most of these attributes work for other trace types as well, so learning an attribute once for a specific plot can pay off in other contexts as well. The online plotly.js figure reference, https://plot.ly/r/reference/#scatter, is a decent place to search and learn about the attributes, but I recommend using the schema() function instead for a few reasons: schema() provides a bit more information than the online docs (e.g., value types, default values, acceptable ranges, etc). The interface makes it a bit easier to traverse and discover new attributes. You can be absolutely sure it matches the version used in the R package (the online docs might use a different – probably older – version). schema() FIGURE 3.4: Using schema() function to traverse through the attributes available to a given trace type (e.g., scatter) The sections that follow in this chapter demonstrate various types of data views using scatter-based layers. In an attempt to avoid duplication of documentation, a particular emphasis is put on features only currently available from the R package (e.g., the aesthetic mapping arguments). 3.1 Markers This section details scatter traces with a mode of \"markers\" (i.e., add_markers()). For simplicity, many of the examples here use add_markers() with a numeric x and y axis, which results in a scatterplot – a common way to visualize the association between two quantitative variables. The content that follows is still relevant to markers displayed with non-numeric x and y (aka dot plots) as shown in Section 3.1.6. 
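Since add_markers() is described as a scatter trace with a mode of "markers", a quick way to convince yourself is to build the same plot both ways and compare the resulting trace attributes. This is only a sketch: the object names (via_layer, via_trace) are illustrative, and it relies on plotly_build() as introduced in Section 2.2.

```r
library(plotly)

p <- plot_ly(mpg, x = ~cty, y = ~hwy)

# The convenience layer versus the generic, lower-level interface
via_layer <- plotly_build(add_markers(p))
via_trace <- plotly_build(add_trace(p, type = "scatter", mode = "markers"))

# Both should report the same trace type and mode
layer_attrs <- c(via_layer$x$data[[1]]$type, via_layer$x$data[[1]]$mode)
trace_attrs <- c(via_trace$x$data[[1]]$type, via_trace$x$data[[1]]$mode)
layer_attrs
trace_attrs
```

In practice the add_markers() form is preferable for readability; the add_trace() form is mainly useful when you need a mode that no convenience layer provides.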
3.1.1 Alpha blending As Unwin (2015) notes, scatterplots can be useful for exposing other important features including: causal relationships, outliers, clusters, gaps, barriers, and conditional relationships. A common problem with scatterplots, however, is overplotting, meaning that there are multiple observations occupying the same (or similar) x/y locations. Figure 3.5 demonstrates one way to combat overplotting via alpha blending. When dealing with tens of thousands of points (or more), consider using toWebGL() to render plots using Canvas rather than SVG (more in Chapter 24), or leveraging 2D density estimation (Section 7.2). subplot( plot_ly(mpg, x = ~cty, y = ~hwy, name = "default"), plot_ly(mpg, x = ~cty, y = ~hwy) %>% add_markers(alpha = 0.2, name = "alpha") ) FIGURE 3.5: Combating overplotting in a scatterplot with alpha blending. 3.1.2 Colors As discussed in Section 2.2, mapping a discrete variable to color produces one trace per category, which is desirable for its legend and hover properties. On the other hand, mapping a numeric variable to color produces one trace, as well as a colorbar guide for visually decoding colors back to data values. The colorbar() function can be used to customize the appearance of this automatically generated guide. The default colorscale is viridis, a perceptually-uniform colorscale (even when converted to black-and-white), and perceivable even to those with common forms of color blindness (Berkeley Institute for Data Science 2016). Viridis is also the default colorscale for ordered factors. p <- plot_ly(mpg, x = ~cty, y = ~hwy, alpha = 0.5) subplot( add_markers(p, color = ~cyl, showlegend = FALSE) %>% colorbar(title = "Viridis"), add_markers(p, color = ~factor(cyl)) ) FIGURE 3.6: Variations on a numeric color mapping. There are numerous ways to alter the default color scale via the colors argument. 
This argument accepts one of the following: (1) a color brewer palette name (see the row names of RColorBrewer::brewer.pal.info for valid names), (2) a vector of colors to interpolate, or (3) a color interpolation function like colorRamp() or scales::colour_ramp(). Although this grants a lot of flexibility, one should be conscious of using a sequential colorscale for numeric variables (& ordered factors) as shown in Figure 3.7, and a qualitative colorscale for discrete variables as shown in Figure 3.8. col1 <- c("#132B43", "#56B1F7") col2 <- viridisLite::inferno(10) col3 <- colorRamp(c("red", "white", "blue")) subplot( add_markers(p, color = ~cyl, colors = col1) %>% colorbar(title = "ggplot2 default"), add_markers(p, color = ~cyl, colors = col2) %>% colorbar(title = "Inferno"), add_markers(p, color = ~cyl, colors = col3) %>% colorbar(title = "colorRamp") ) %>% hide_legend() FIGURE 3.7: Three variations on a numeric color mapping. col1 <- "Accent" col2 <- colorRamp(c("red", "blue")) col3 <- c(`4` = "red", `5` = "black", `6` = "blue", `8` = "green") subplot( add_markers(p, color = ~factor(cyl), colors = col1), add_markers(p, color = ~factor(cyl), colors = col2), add_markers(p, color = ~factor(cyl), colors = col3) ) %>% hide_legend() FIGURE 3.8: Three variations on a discrete color mapping. As introduced in Figure 2.3, color codes can be specified manually (i.e., avoid mapping data values to a visual range) by using the I() function. Figure 3.9 provides a simple example using add_markers(). Any color understood by the col2rgb() function from the grDevices package can be used in this way. Chapter 27 provides even more details about working with different color specifications when specifying colors manually. add_markers(p, color = I("black")) FIGURE 3.9: Setting a fixed color directly using I(). The color argument is meant to control the ‘fill-color’ of a geometric object, whereas stroke (Section 3.1.4) is meant to control the ‘outline-color’ of a geometric object. 
In the case of add_markers(), that means color maps to the plotly.js attribute marker.color and stroke maps to marker.line.color. Not all, but many, marker symbols have a notion of stroke. 3.1.3 Symbols The symbol argument can be used to map data values to the marker.symbol plotly.js attribute. It uses the same semantics that we’ve already seen for color: A numeric mapping generates one trace. A discrete mapping generates multiple traces (one trace per category). The plural, symbols, can be used to specify the visual range for the mapping. Mappings are avoided entirely through I(). For example, the left panel of Figure 3.10 uses a numeric mapping and the right panel uses a discrete mapping. As a result, the left panel is linked to the first legend entry, whereas the right panel is linked to the bottom three legend entries. When plotting multiple traces and no color is specified, the plotly.js colorway is applied (i.e., each trace will be rendered a different color). To set a fixed color, you can set the color of every trace generated from this layer with color = I(\"black\"), or similar. p <- plot_ly(mpg, x = ~cty, y = ~hwy, alpha = 0.3) subplot( add_markers(p, symbol = ~cyl, name = "A single trace"), add_markers(p, symbol = ~factor(cyl), color = I("black")) ) FIGURE 3.10: Mapping symbol to a numeric variable (left panel) and a factor (right panel). There are two ways to specify the visual range of symbols: (1) numeric codes (interpreted as pch codes) or (2) a character string specifying a valid marker.symbol value. Figure 3.11 uses pch codes (left panel) as well as their corresponding marker.symbol name (right panel) to specify the visual range. subplot( add_markers(p, symbol = ~cyl, symbols = c(17, 18, 19)), add_markers( p, color = I("black"), symbol = ~factor(cyl), symbols = c("triangle-up", "diamond", "circle") ) ) FIGURE 3.11: Specifying the visual range of symbols. These symbols (i.e., the visual range) can also be supplied directly to symbol through I(). 
For example, Figure 3.12 fixes the marker symbol to a diamond shape. plot_ly(mpg, x = ~cty, y = ~hwy) %>% add_markers(symbol = I(18), alpha = 0.5) FIGURE 3.12: Setting a fixed symbol directly using I(). If you’d like to see all the symbols available to plotly, as well as a method for supplying your own custom glyphs, see Chapter 28. 3.1.4 Stroke and span The stroke argument follows the same semantics as color and symbol when it comes to variable mappings and specifying visual ranges. Typically you don’t want to map data values to stroke; you just want to specify a fixed outline color. For example, Figure 3.13 modifies Figure 3.12 to simply add a black outline. By default, the span, or width of the stroke, is zero, so you’ll likely want to set the width to be around one pixel. plot_ly(mpg, x = ~cty, y = ~hwy) %>% add_markers(symbol = I(18), alpha = 0.5, stroke = I("black"), span = I(1)) FIGURE 3.13: Using stroke and span to control the outline color as well as the width of that outline. 3.1.5 Size For scatterplots, the size argument controls the area of markers (unless otherwise specified via sizemode), and must be a numeric variable. The sizes argument controls the minimum and maximum size of circles, in pixels: p <- plot_ly(mpg, x = ~cty, y = ~hwy, alpha = 0.3) subplot( add_markers(p, size = ~cyl, name = "default"), add_markers(p, size = ~cyl, sizes = c(1, 500), name = "custom") ) FIGURE 3.14: Controlling the size range via sizes (measured in pixels). Similar to other arguments, I() can be used to specify the size directly. In the case of markers, size controls the marker.size plotly.js attribute. Remember, you always have the option to set this attribute directly by doing something similar to Figure 3.15. plot_ly(mpg, x = ~cty, y = ~hwy, alpha = 0.3, size = I(30)) FIGURE 3.15: Setting a fixed marker size directly using marker.size. 3.1.6 Dotplots & error bars A dotplot is similar to a scatterplot, except instead of two numeric axes, one is categorical. 
The usual goal of a dotplot is to compare value(s) on a numerical scale over numerous categories. In this context, dotplots are preferable to pie charts since comparing position along a common scale is much easier than comparing angle or area (Cleveland and McGill 1984; Bostock 2010). Furthermore, dotplots can be preferable to bar charts, especially when comparing values within a narrow range far away from 0 (Few 2006). Also, when presenting point estimates, and uncertainty associated with those estimates, bar charts tend to exaggerate the difference in point estimates, and lose focus on uncertainty (Messing 2012). A popular application for dotplots (with error bars) is the so-called “coefficient plot” for visualizing the point estimates of coefficients and their standard error. The coefplot() function in the coefplot package (Lander 2016) and the ggcoef() function in the GGally package both produce coefficient plots for many types of model objects in R using ggplot2, which we can translate to plotly via ggplotly(). Since these packages use points and segments to draw the coefficient plots, the hover information is not the best, and it’d be better to use error objects. Figure 3.16 uses the tidy() function from the broom package (Robinson 2016) to obtain a data frame with one row per model coefficient, and produce a coefficient plot with error bars along the x-axis. # Fit a full-factorial linear model m <- lm(Sepal.Length ~ Sepal.Width * Petal.Length * Petal.Width, data = iris) # (1) get a tidy() data structure of covariate-level info (e.g., point estimate, standard error, etc) # (2) make sure term column is a factor ordered by the estimate # (3) plot estimate by term with an error bar for the standard error broom::tidy(m) %>% mutate(term = forcats::fct_reorder(term, estimate)) %>% plot_ly(x = ~estimate, y = ~term) %>% add_markers( error_x = ~list(value = std.error), color = I("black"), hoverinfo = "x" ) FIGURE 3.16: A coefficient plot. 
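If you would rather show an approximate 95% interval than a single standard error, one possible variation is sketched below. It is not part of the original example: it assumes a normal approximation (the 1.96 multiplier) and relies on the plotly.js error bar attributes type and array to supply one bar length per coefficient; the names coefs and ci_plot are illustrative.

```r
library(plotly)
library(dplyr)

m <- lm(
  Sepal.Length ~ Sepal.Width * Petal.Length * Petal.Width,
  data = iris
)

# One row per coefficient, ordered by the size of the estimate
coefs <- broom::tidy(m) %>%
  mutate(term = forcats::fct_reorder(term, estimate))

# Assumption: a normal approximation, so the half-width of the
# interval is 1.96 standard errors for each coefficient
ci_plot <- coefs %>%
  plot_ly(x = ~estimate, y = ~term) %>%
  add_markers(
    error_x = ~list(type = "data", array = 1.96 * std.error),
    color = I("black"),
    hoverinfo = "x"
  )
ci_plot
```

For small samples, a multiplier based on the t distribution with the model's residual degrees of freedom would be the more careful choice.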
3.2 Lines Many of the same principles we learned about aesthetic mappings with respect to markers (Section 3.1) also apply to lines. Moreover, at the start of this chapter (namely Figure 3.2) we also learned how to use dplyr’s group_by() to ensure there is at least one geometry (in this case, line) per group. We also learned the difference between add_paths() and add_lines() – the former draws lines according to row ordering whereas the latter draws them according to x. In this section, we’ll learn about linetype/linetypes, an aesthetic that applies to lines and polygons. We’ll also discuss some other important chart types that can be implemented with add_paths(), add_lines(), and add_segments(). 3.2.1 Linetypes Generally speaking, it’s hard to perceive more than 8 different colors/linetypes/symbols in a given plot, so sometimes we have to filter data to use these effectively. Here we use the dplyr package to find the top 5 cities in terms of average monthly sales (top5), then effectively filter the original data to contain just these cities via semi_join(). As Figure 3.17 demonstrates, once we have the data filtered, mapping city to color or linetype is trivial. The color palette can be altered via the colors argument, and follows the same rules as scatterplots. The linetype palette can be altered via the linetypes argument, and accepts R’s lty values or plotly.js dash values. library(dplyr) top5 <- txhousing %>% group_by(city) %>% summarise(m = mean(sales, na.rm = TRUE)) %>% arrange(desc(m)) %>% top_n(5) tx5 <- semi_join(txhousing, top5, by = "city") plot_ly(tx5, x = ~date, y = ~median) %>% add_lines(linetype = ~city) FIGURE 3.17: Using color and/or linetype to differentiate groups of lines. If you’d like to control exactly which linetype is used to encode a particular data value, you can provide a named character vector, like in Figure 3.18. Note that this is similar to how we provided a discrete colorscale manually for markers in Figure 3.8. 
ltys <- c( Austin = "dashdot", `Collin County` = "longdash", Dallas = "dash", Houston = "solid", `San Antonio` = "dot" ) plot_ly(tx5, x = ~date, y = ~median) %>% add_lines(linetype = ~city, linetypes = ltys) FIGURE 3.18: Providing a named character vector to linetypes in order to control exactly what linetype gets mapped to which city. 3.2.2 Segments The add_segments() function essentially provides a way to connect two points [(x, y) to (xend, yend)] with a line. Segments form the building blocks for numerous useful chart types, including slopegraphs, dumbbell charts, candlestick charts, and more. Slopegraphs and dumbbell charts are useful for comparing numeric values across numerous categories. Candlestick charts are typically used for visualizing change in a financial asset over time. Segments can also provide a useful alternative to add_bars() (covered in Chapter 5), especially for animations. In particular, Figure 14.5 of Section 14.2 shows how to implement an animated population pyramid using segments instead of bars. 3.2.2.1 Slopegraph The slopegraph, made popular by Tufte (2001b), is a great way to compare the change in a measurement across numerous groups. This change could be along either a discrete or a continuous axis. For a continuous axis, the slopegraph could be thought of as a decomposition of a line graph into multiple segments. The slopegraph R package provides a succinct interface for creating slopegraphs with base or ggplot2 graphics and also some convenient data sets which we’ll make use of here (Leeper 2017). Figure 3.19 recreates an example from Tufte (2001b), using the gdp data set from slopegraph, and demonstrates a common issue with labelling in slopegraphs – it’s easy to have overlapping labels when anchoring labels on data values. For that reason, this implementation leverages plotly’s ability to interactively edit annotation positions. See Chapter 12 for similar examples of ‘editing views’. 
data(gdp, package = "slopegraph") gdp$Country <- row.names(gdp) plot_ly(gdp) %>% add_segments( x = 1, xend = 2, y = ~Year1970, yend = ~Year1979, color = I("gray90") ) %>% add_annotations( x = 1, y = ~Year1970, text = ~paste(Country, " ", Year1970), xanchor = "right", showarrow = FALSE ) %>% add_annotations( x = 2, y = ~Year1979, text = ~paste(Year1979, " ", Country), xanchor = "left", showarrow = FALSE ) %>% layout( title = "Current Receipts of Government as a Percentage of Gross Domestic Product", showlegend = FALSE, xaxis = list( range = c(0, 3), ticktext = c("1970", "1979"), tickvals = c(1, 2), zeroline = FALSE ), yaxis = list( title = "", showgrid = FALSE, showticks = FALSE, showticklabels = FALSE ) ) %>% config(edits = list(annotationPosition = TRUE)) FIGURE 3.19: Interactively editing the label positioning in a slopegraph. For the interactive, see https://plotly-r.com/interactives/slopegraph.html 3.2.2.2 Dumbbell So-called dumbbell charts are similar in concept to slopegraphs, but not quite as general. They are typically used to compare two different classes of numeric values across numerous groups. Figure 3.20 uses the dumbbell approach to show average miles per gallon city and highway for different car models. With a dumbbell chart, it’s always a good idea to order the categories by a sensible metric – for Figure 3.20, the categories are ordered by the city miles per gallon. mpg %>% group_by(model) %>% summarise(c = mean(cty), h = mean(hwy)) %>% mutate(model = forcats::fct_reorder(model, c)) %>% plot_ly() %>% add_segments( x = ~c, y = ~model, xend = ~h, yend = ~model, color = I("gray"), showlegend = FALSE ) %>% add_markers( x = ~c, y = ~model, color = I("blue"), name = "mpg city" ) %>% add_markers( x = ~h, y = ~model, color = I("red"), name = "mpg highway" ) %>% layout(xaxis = list(title = "Miles per gallon")) FIGURE 3.20: A dumbbell chart of miles per gallon city vs highway by model of car. 
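The data preparation behind a dumbbell chart (summarise two metrics per group, then order the groups by one of them) can also be done in base R. The sketch below is only an illustration under stated assumptions: it swaps in the built-in iris data and compares mean sepal and petal length per species, and the gap column is a made-up helper. The plotly layers mirror the segment-plus-two-markers pattern and only run if plotly is installed.

```r
# One row per group with the two metrics to compare
d <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris, FUN = mean)

# Order the categories by one of the metrics (here: mean sepal length)
ord <- as.character(d$Species)[order(d$Sepal.Length)]
d$Species <- factor(as.character(d$Species), levels = ord)

# Length of each dumbbell, useful as a sanity check
d$gap <- d$Sepal.Length - d$Petal.Length

# One segment per group, then one marker layer per metric; print p to render it
if (requireNamespace('plotly', quietly = TRUE)) {
  p <- plotly::plot_ly(d)
  p <- plotly::add_segments(
    p, x = ~Petal.Length, xend = ~Sepal.Length,
    y = ~Species, yend = ~Species,
    color = I('gray'), showlegend = FALSE
  )
  p <- plotly::add_markers(p, x = ~Petal.Length, y = ~Species,
                           color = I('blue'), name = 'petal length')
  p <- plotly::add_markers(p, x = ~Sepal.Length, y = ~Species,
                           color = I('red'), name = 'sepal length')
}
```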
3.2.2.3 Candlestick Figure 3.21 uses the quantmod package (Ryan 2016) to obtain stock price data for Microsoft and plots two segments for each day: one to encode the opening/closing values, and one to encode the daily high/low. library(quantmod) msft <- getSymbols("MSFT", auto.assign = F) dat <- as.data.frame(msft) dat$date <- index(msft) dat <- subset(dat, date >= "2016-01-01") names(dat) <- sub("^MSFT\\\\.", "", names(dat)) plot_ly(dat, x = ~date, xend = ~date, color = ~Close > Open, colors = c("red", "forestgreen"), hoverinfo = "none") %>% add_segments(y = ~Low, yend = ~High, size = I(1)) %>% add_segments(y = ~Open, yend = ~Close, size = I(3)) %>% layout(showlegend = FALSE, yaxis = list(title = "Price")) %>% rangeslider() FIGURE 3.21: A candlestick chart built out of segments 3.2.3 Density plots In Chapter 5, we leverage a number of algorithms in R for computing the “optimal” number of bins for a histogram, via hist(), and routing those results to add_bars(). We can leverage the density() function for computing kernel density estimates in a similar way, and route the results to add_lines(), as is done in Figure 3.22. kerns <- c("gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "cosine", "optcosine") p <- plot_ly() for (k in kerns) { d <- density(economics$pce, kernel = k, na.rm = TRUE) p <- add_lines(p, x = d$x, y = d$y, name = k) } p FIGURE 3.22: Various kernel density estimates. 3.2.4 Parallel Coordinates One very useful, but often overlooked, visualization technique is the parallel coordinates plot. Parallel coordinates provide a way to compare values along a common (or non-aligned) positional scale(s) – the most basic of all perceptual tasks – in more than 3 dimensions (Cleveland and McGill 1984). Usually each line represents every measurement for a given row (or observation) in a data set. 
It’s true that plotly.js provides a trace type, parcoords, specifically for parallel coordinates that offers desirable interactive capabilities (e.g., highlighting and reordering of axes). However, it can also be useful to learn how to use add_lines() to implement parallel coordinates as it can offer more flexibility and control over the axis scales. When measurements are on very different scales, some care must be taken, and variables must be transformed to be put on a common scale. As Figure 3.23 shows, even when variables are measured on a similar scale, it can still be informative to transform variables in different ways. iris$obs <- seq_len(nrow(iris)) iris_pcp <- function(transform = identity) { iris[] <- purrr::map_if(iris, is.numeric, transform) tidyr::gather(iris, variable, value, -Species, -obs) %>% group_by(obs) %>% plot_ly(x = ~variable, y = ~value, color = ~Species) %>% add_lines(alpha = 0.3) } subplot( iris_pcp(), iris_pcp(scale), iris_pcp(scales::rescale), nrows = 3, shareX = TRUE ) %>% hide_legend() FIGURE 3.23: Parallel coordinates plots of the Iris dataset. The top panel shows all variables on a common scale. The middle panel scales each variable to have mean of 0 and standard deviation of 1. In the bottom panel, each variable is scaled to have a minimum of 0 and a maximum of 1. It is also worth noting that the GGally package offers a ggparcoord() function which creates parallel coordinate plots via ggplot2, which we can convert to plotly via ggplotly(). Thanks to the linked highlighting framework, parallel coordinates created in this way could be linked to lower dimensional (but sometimes higher resolution) graphics of related data to guide multi-variate data exploration. The pedestrians package provides some examples of linking parallel coordinates to other views such as a grand tour for exposing unusual features in a high-dimensional space (Sievert 2019a). 
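To make the two transformations concrete, here is a small base R sketch of what the middle and bottom panels do to each variable. It is an illustration only: it works on a fresh copy of the numeric iris columns, the names num, z, and unit are made up, and the min-max step is written out by hand rather than calling scales::rescale().

```r
# Numeric measurement columns only
num <- iris[, c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width')]

# Standardize: each column gets mean 0 and standard deviation 1
z <- as.data.frame(scale(num))

# Min-max rescale: each column is mapped onto the interval [0, 1]
unit <- as.data.frame(lapply(num, function(v) (v - min(v)) / (max(v) - min(v))))

# Quick checks of the properties described in the figure caption
col_means <- colMeans(z)
col_sds <- sapply(z, sd)
col_ranges <- sapply(unit, range)
```

Standardizing keeps outliers visible as large absolute values, whereas min-max rescaling forces every axis to span the same vertical extent, which is why the two panels can tell different stories about the same data.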
3.3 Polygons The add_polygons() function is essentially equivalent to add_paths() with the fill attribute set to “toself”. Polygons form the basis for other, higher-level scatter-based layers (e.g., add_ribbons() and add_sf()) that don’t have a dedicated plotly.js trace type. Polygons can be used to draw many things, but perhaps the most familiar application where you might want to use add_polygons() is to draw geo-spatial objects. If and when you use add_polygons() to draw a map, make sure you fix the aspect ratio (e.g., xaxis.scaleanchor) and also consider using plotly_empty() over plot_ly() to hide axis labels, ticks, and the background grid. On the other hand, Section 4.2 shows you how to make custom maps using the sf package and add_sf(), which is a bit of work to get started, but is absolutely worth the investment. base <- map_data("world", "canada") %>% group_by(group) %>% plotly_empty(x = ~long, y = ~lat, alpha = 0.2) %>% layout(showlegend = FALSE, xaxis = list(scaleanchor = "y")) base %>% add_polygons(hoverinfo = "none", color = I("black")) %>% add_markers(text = ~paste(name, "<br />", pop), hoverinfo = "text", color = I("red"), data = maps::canada.cities) FIGURE 3.24: Using add_polygons() to make a map of Canada and major Canadian cities via data provided by the maps package. As the discussion surrounding Figure 4.9 points out, scatter-based polygon layers (i.e., add_polygons(), add_ribbons(), etc) render all the polygons using one plotly.js trace by default. This approach is computationally efficient, but it’s not always desirable (e.g., can’t have multiple fills per trace, interactivity is relatively limited). To work around the limitations, consider using split (or color with a discrete variable) to split the polygon data into multiple traces. Figure 3.25 demonstrates using split, which will apply plotly.js’ colorway to each trace (i.e., subregion) and leverage hoveron to generate one tooltip per sub-region. 
add_polygons(base, split = ~subregion, hoveron = "fills") FIGURE 3.25: Using split to render polygons with different fills and interactive properties. 3.3.1 Ribbons Ribbons are useful for showing uncertainty bounds as a function of x. The add_ribbons() function creates ribbons and requires the arguments: x, ymin, and ymax. The augment() function from the broom package appends observational-level model components (e.g., fitted values stored as a new column .fitted) which is useful for extracting those components in a convenient form for visualization. Figure 3.26 shows the fitted values and uncertainty bounds from a linear model object. m <- lm(mpg ~ wt, data = mtcars) broom::augment(m) %>% plot_ly(x = ~wt, showlegend = FALSE) %>% add_markers(y = ~mpg, color = I("black")) %>% add_ribbons(ymin = ~.fitted - 1.96 * .se.fit, ymax = ~.fitted + 1.96 * .se.fit, color = I("gray80")) %>% add_lines(y = ~.fitted, color = I("steelblue")) FIGURE 3.26: Plotting fitted values and uncertainty bounds of a linear model via the broom package. References "], ["maps.html", "4 Maps 4.1 Integrated maps 4.2 Custom maps", " 4 Maps There are numerous ways to make a map with plotly – each with its own strengths and weaknesses. Generally speaking, the approaches fall under two categories: integrated or custom. Integrated maps leverage plotly.js’ built-in support for rendering a basemap layer. Currently there are two supported ways of making integrated maps: either via Mapbox or via an integrated d3.js powered basemap. The integrated approach is convenient if you need a quick map and don’t necessarily need sophisticated representations of geo-spatial objects. On the other hand, the custom mapping approach offers complete control since you’re providing all the information necessary to render the geo-spatial object(s). 
Section 4.2 covers making sophisticated maps (e.g., cartograms) using the sf R package, but it’s also possible to make custom plotly maps via other tools for geo-computing (e.g., sp, ggmap, etc). 4.1 Integrated maps 4.1.1 Overview If you have fairly simple latitude/longitude data and want to make a quick map, you may want to try one of plotly’s integrated mapping options (i.e., plot_mapbox() and plot_geo()). Generally speaking, you can treat these constructor functions as a drop-in replacement for plot_ly() and get a dynamic basemap rendered behind your data. Furthermore, all the scatter-based layers we learned about in Section 3 work as you’d expect them to with plot_ly(). For example, Figure 4.1 uses plot_mapbox() and add_markers() to create a bubble chart: plot_mapbox(maps::canada.cities) %>% add_markers( x = ~long, y = ~lat, size = ~pop, color = ~country.etc, colors = "Accent", text = ~paste(name, pop), hoverinfo = "text" ) FIGURE 4.1: A mapbox powered bubble chart showing the population of various cities in Canada. For the interactive, see https://plotly-r.com/interactives/mapbox-bubble.html The Mapbox basemap styling is controlled through the layout.mapbox.style attribute. The plotly package comes with support for 7 different styles, but you can also supply a URL to a custom mapbox style. To obtain all the pre-packaged basemap style names, you can grab them from the official plotly.js schema(): styles <- schema()$layout$layoutAttributes$mapbox$style$values styles #> [1] "basic" "streets" #> [3] "outdoors" "light" #> [5] "dark" "satellite" #> [7] "satellite-streets" Any one of these values can be used for a mapbox style. Figure 4.2 demonstrates the satellite earth imagery basemap. layout( plot_mapbox(), mapbox = list(style = "satellite") ) FIGURE 4.2: Zooming in on earth satellite imagery using plot_mapbox(). 
For the interactive, see https://plotly-r.com/interactives/satellite.html Figure 4.3 demonstrates how to create an integrated plotly.js dropdown menu to control the basemap style via the layout.updatemenus attribute. The idea behind an integrated plotly.js dropdown is to supply a list of buttons (i.e., menu items) where each button invokes a plotly.js method with some arguments. In this case, each button uses the relayout method to modify the layout.mapbox.style attribute. style_buttons <- lapply(styles, function(s) { list(label = s, method = "relayout", args = list("mapbox.style", s)) }) layout( plot_mapbox(), mapbox = list(style = "dark"), updatemenus = list( list(y = 0.8, buttons = style_buttons) ) ) FIGURE 4.3: Providing a dropdown menu to control the styling of the mapbox baselayer. For the interactive, see https://plotly-r.com/interactives/mapbox-style-dropdown.html The other integrated mapping solution in plotly is plot_geo(). Compared to plot_mapbox(), this approach has support for different mapping projections, but styling the basemap is limited and can be more cumbersome. Figure 4.4 demonstrates using plot_geo() in conjunction with add_markers() and add_segments() to visualize flight paths within the United States. Whereas plot_mapbox() is fixed to a mercator projection, the plot_geo() constructor has a handful of different projections available to it, including the orthographic projection which gives the illusion of a 3D globe. 
library(plotly) library(dplyr) # airport locations air <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_february_us_airport_traffic.csv') # flights between airports flights <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_february_aa_flight_paths.csv') flights$id <- seq_len(nrow(flights)) # map projection geo <- list( projection = list( type = 'orthographic', rotation = list(lon = -100, lat = 40, roll = 0) ), showland = TRUE, landcolor = toRGB("gray95"), countrycolor = toRGB("gray80") ) plot_geo(color = I("red")) %>% add_markers( data = air, x = ~long, y = ~lat, text = ~airport, size = ~cnt, hoverinfo = "text", alpha = 0.5 ) %>% add_segments( data = group_by(flights, id), x = ~start_lon, xend = ~end_lon, y = ~start_lat, yend = ~end_lat, alpha = 0.3, size = I(1), hoverinfo = "none" ) %>% layout(geo = geo, showlegend = FALSE) FIGURE 4.4: Using the integrated orthographic projection to visualize flight patterns on a ‘3D’ globe. For the interactive, see https://plotly-r.com/interactives/geo-flights.html One nice thing about plot_geo() is that it automatically projects geometries into the proper coordinate system defined by the map projection. For example, in Figure 4.5 the simple line segment is straight when using plot_mapbox() yet curved when using plot_geo(). It’s possible to achieve the same effect using plot_ly() or plot_mapbox(), but the relevant marker/line/polygon data has to be put into an sf data structure before rendering (see Section 4.2.1 for more details). 
map1 <- plot_mapbox() %>% add_segments(x = -100, xend = -50, y = 50, yend = 75) %>% layout( mapbox = list( zoom = 0, center = list(lat = 65, lon = -75) ) ) map2 <- plot_geo() %>% add_segments(x = -100, xend = -50, y = 50, yend = 75) %>% layout(geo = list(projection = list(type = "mercator"))) library(htmltools) browsable(tagList(map1, map2)) FIGURE 4.5: A comparison of plotly’s integrated mapping solutions: plot_mapbox() (top) and plot_geo() (bottom). The plot_geo() approach will transform line segments to correctly reflect their projection into a non-cartesian coordinate system. 4.1.2 Choropleths In addition to scatter-based layers, the plot_geo() constructor also supports a choropleth layer. Figure 4.6 shows the population density of the U.S. via a choropleth, and also layers on markers for the state center locations, using the U.S. state data from the datasets package (R Core Team 2016). By simply providing a z attribute, plot_geo() objects will try to create a choropleth, but you’ll also need to provide locations and a locationmode. It’s worth noting that the locationmode is currently limited to countries and US states, so if you need a different geo-unit (e.g., counties, municipalities, etc), you can use the custom mapping approach discussed in Section 4.2. density <- state.x77[, "Population"] / state.x77[, "Area"] g <- list( scope = 'usa', projection = list(type = 'albers usa'), lakecolor = toRGB('white') ) plot_geo() %>% add_trace( z = ~density, text = state.name, span = I(0), locations = state.abb, locationmode = 'USA-states' ) %>% layout(geo = g) FIGURE 4.6: A map of U.S. population density using the state.x77 data from the datasets package. Figure 4.6 helps illuminate a problem with choropleths from a graphical perception point of view. 
We typically use the color in choropleths to encode a numeric variable (e.g., GDP, net exports, average SAT score, etc) and the eye naturally perceives the area that a particular color covers as proportional to its overall effect. This ends up being misleading since the area the color covers typically has no sensible relationship with the data encoded by the color. A classic example of this misleading effect in action is in US election maps – the proportion of red to blue coloring is not representative of the overall popular vote (Newman 2016). Cartograms are an approach to reducing this misleading effect and grant another dimension to encode data through the size of geo-spatial features. Section 4.2.2 covers how to render cartograms in plotly using sf and cartogram. 4.2 Custom maps 4.2.1 Simple features (sf) The sf R package is a modern approach to working with geo-spatial data structures based on tidy data principles (Pebesma 2018; Wickham 2014b). The key idea behind sf is that it stores geo-spatial geometries in a list-column of a data frame. This allows each row to represent the real unit of observation/interest – whether it’s a polygon, multi-polygon, point, line, or even a collection of these features – and as a result, works seamlessly inside larger tidy workflows. The sf package itself does not really provide geo-spatial data – it provides the framework and utilities for storing and computing on geo-spatial data structures in an opinionated way. There are numerous packages for accessing geo-spatial data as simple features data structures. A couple of notable examples include rnaturalearth and USAboundaries. The rnaturalearth package is better for obtaining any map data in the world via an API provided by https://www.naturalearthdata.com/ (South 2017). The USAboundaries package is great for obtaining map data for the United States at any point in history (Mullen and Bratt 2018). 
It doesn’t really matter what tool you use to obtain or create an sf object – once you have one, plot_ly() knows how to render it: library(rnaturalearth) world <- ne_countries(returnclass = "sf") class(world) #> [1] "sf" "data.frame" plot_ly(world, color = I("gray90"), stroke = I("black"), span = I(1)) FIGURE 4.7: Rendering all the world’s countries using plot_ly() and the ne_countries() function from the rnaturalearth package. How does plot_ly() know how to render the countries? It’s because the geo-spatial features are encoded in a special (geometry) list-column. Also, meta-data about the geo-spatial structure are retained as special attributes of the data. Figure 4.8 augments the print method for sf data frames to demonstrate that all the information needed to render the countries (i.e., polygons) in Figure 4.7 is contained within the world data frame. Note also that sf provides special dplyr methods for this special class of data frame so that you can treat data manipulations as if it were a ‘tidy’ data structure. One thing about this method is that the special ‘geometry’ column is always retained – if we try to just select the name column, then we get both the name and the geometry. library(sf) world %>% select(name) %>% print(n = 4) FIGURE 4.8: A diagram of a simple features data frame. The geometry column tracks the spatial features attached to each row in the data frame. There are actually 4 different ways to render sf objects with plotly: plot_ly(), plot_mapbox(), plot_geo(), and via ggplot2’s geom_sf(). These functions render multiple polygons using a single trace by default, which is fast, but you may want to leverage the added flexibility of multiple traces. For example, a given trace can only have one fillcolor, so it’s impossible to render multiple polygons with different colors using a single trace. For this reason, if you want to vary the color of multiple polygons, make sure to split by a unique identifier (e.g., name), as done in Figure 4.9. 
Note that, as discussed for line charts in Figure 3.2, using multiple traces automatically adds the ability to filter name via legend entries. canada <- ne_states(country = "Canada", returnclass = "sf") plot_ly(canada, split = ~name, color = ~provnum_ne) FIGURE 4.9: Using split and color to create a choropleth map of provinces in Canada. Another important feature for maps that may require you to split multiple polygons into multiple traces is the ability to display a different hover-on-fill for each polygon. By providing text that is unique within each polygon and specifying hoveron='fills', the tooltip behavior is tied to the trace’s fill (instead of displayed at each point along the polygon). plot_ly( canada, split = ~name, color = I("gray90"), text = ~paste(name, "is \\n province number", provnum_ne), hoveron = "fills", hoverinfo = "text", showlegend = FALSE ) FIGURE 4.10: Using split, text, and hoveron='fills' to display a tooltip specific to each Canadian province. Although the integrated mapping approaches (plot_mapbox() and plot_geo()) can render sf objects, the custom mapping approaches (plot_ly() and geom_sf()) are more flexible because they allow for any well-defined mapping projection. Working with and understanding map projections can be intimidating for a casual map maker. Thankfully, there are nice resources for searching map projections in a human-friendly interface, like http://spatialreference.org/. Through this website, one can search for desirable projections for a given portion of the globe and extract commands for projecting their geo-spatial objects into that projection. One way to perform the projection is to supply the relevant PROJ4 command to the st_transform() function in sf (PROJ contributors 2018). 
# filter the world sf object down to canada canada <- filter(world, name == "Canada") # coerce cities lat/long data to an official sf object cities <- st_as_sf( maps::canada.cities, coords = c("long", "lat"), crs = 4326 ) # A PROJ4 projection designed for Canada # http://spatialreference.org/ref/sr-org/7/ # http://spatialreference.org/ref/sr-org/7/proj4/ moll_proj <- "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs" # perform the projections canada <- st_transform(canada, moll_proj) cities <- st_transform(cities, moll_proj) # plot with geom_sf() p <- ggplot() + geom_sf(data = canada) + geom_sf(data = cities, aes(size = pop), color = "red", alpha = 0.3) ggplotly(p) FIGURE 4.11: The population of various Canadian cities rendered on a custom basemap using a Mollweide projection. Some geo-spatial objects have an unnecessarily high resolution for a given visualization. In these cases, you may want to consider simplifying the geo-spatial object to improve the speed of the R code and responsiveness of the visualization. For example, we could recreate Figure 4.7 with a much higher resolution by specifying scale = \"large\" in ne_countries(); this gives us an sf object with over 50 times more spatial coordinates than the default scale. The higher resolution allows us to zoom in better on more complex geo-spatial regions, but it also leads to slower R code, larger HTML files, and slower responsiveness. Sievert (2018b) explores this issue in more depth and demonstrates how to use the st_simplify() function from sf to simplify features before plotting them. sum(rapply(world$geometry, nrow)) #> [1] 10586 world_large <- ne_countries(scale = "large", returnclass = "sf") sum(rapply(world_large$geometry, nrow)) #> [1] 548121 Analogous to the discussion surrounding Figure 3.2, it pays to be aware of the tradeoffs involved with rendering plotly graphics using one or many traces, and knowledgeable about how to leverage either approach. 
Specifically, by default, plotly attempts to render all simple features in a single trace, which is performant, but doesn’t have a lot of interactivity. plot_mapbox(world_large, color = NA, stroke = I("black"), span = I(0.5)) For those interested in learning more about geocomputation in R with sf and other great R packages like sp and raster, Robin Lovelace (2019) provides lots of nice and freely available learning resources (Pebesma and Bivand 2005; Hijmans 2019). 4.2.2 Cartograms Cartograms distort the size of geo-spatial polygons to encode a numeric variable other than the land size. There are numerous types of cartograms and they are typically categorized by their ability to preserve shape and maintain contiguous regions. Cartograms have been shown to be an effective approach to both encode and teach about geo-spatial data, though the effects certainly vary by cartogram type (Nusrat S, Alam MJ, Kobourov S. 2018). The R package cartogram provides an interface to several popular cartogram algorithms (Jeworutzki 2018). A number of other R packages provide cartogram algorithms, but the great thing about cartogram is that all the functions can take an sf (or sp) object as input and return an sf object. This makes it incredibly easy to go from raw spatial objects, to transformed objects, to visual. Figure 4.12 demonstrates a continuous area cartogram of US population in 2014 using a rubber sheet distortion algorithm from James A. Dougenik, Nicholas R. Chrisman, Duane R. Niemeyer (1985). library(cartogram) library(albersusa) us_cont <- cartogram_cont(usa_sf("laea"), "pop_2014") plot_ly(us_cont) %>% add_sf( color = ~pop_2014, split = ~name, span = I(1), text = ~paste(name, scales::number_si(pop_2014)), hoverinfo = "text", hoveron = "fills" ) %>% layout(showlegend = FALSE) %>% colorbar(title = "Population \\n 2014") FIGURE 4.12: A cartogram of US population in 2014. A cartogram sizes the area of geo-spatial objects proportional to some metric (e.g., population). 
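The core idea of sizing area in proportion to a metric can be illustrated with a toy example in base R. This is a sketch only, and it is not the rubber sheet algorithm used by cartogram_cont(): it simply rescales a polygon about its center so that its area matches a target value. The helpers poly_area() and rescale_poly() and the pop values are made up for illustration.

```r
# Shoelace formula: area of a simple polygon given as an n x 2 vertex matrix
poly_area <- function(xy) {
  x <- xy[, 1]
  y <- xy[, 2]
  j <- c(2:nrow(xy), 1)  # index of the next vertex, wrapping around
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# Rescale a polygon about the mean of its vertices so its area equals target.
# Lengths scale by k, so areas scale by k^2.
rescale_poly <- function(xy, target) {
  ctr <- colMeans(xy)
  k <- sqrt(target / poly_area(xy))
  sweep(sweep(xy, 2, ctr) * k, 2, ctr, '+')
}

unit_sq <- cbind(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1))
pop <- c(a = 1, b = 4, c = 9)  # made-up metric values
shapes <- lapply(pop, function(p) rescale_poly(unit_sq, target = p))
areas <- vapply(shapes, poly_area, numeric(1))
```

Because area grows with the square of the linear scale factor, the square root is what keeps area (rather than side length) proportional to the metric. Real cartogram algorithms additionally have to deal with neighbouring regions, which this sketch ignores.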
Figure 4.13 demonstrates a non-continuous Dorling cartogram of US population in 2014 from Dorling, D (1996). This cartogram does not try to preserve the shape of polygons (i.e., states), but instead uses circles to represent each geo-spatial object, then encodes the variable of interest (i.e., population) using the area of the circle. us <- usa_sf("laea") us_dor <- cartogram_dorling(us, "pop_2014") plot_ly(stroke = I("black"), span = I(1)) %>% add_sf( data = us, color = I("gray95"), hoverinfo = "none" ) %>% add_sf( data = us_dor, color = ~pop_2014, split = ~name, text = ~paste(name, scales::number_si(pop_2014)), hoverinfo = "text", hoveron = "fills" ) %>% layout(showlegend = FALSE) FIGURE 4.13: A Dorling cartogram of US population in 2014. A Dorling cartogram sizes the circles proportional to some metric (e.g., population). Figure 4.14 demonstrates a non-continuous cartogram of US population in 2014 from Olson, J. M. (1976). In contrast to the Dorling cartogram, this approach does preserve the shape of polygons. The implementation behind Figure 4.14 is to simply take the implementation of Figure 4.13 and change cartogram_dorling() to cartogram_ncont(). FIGURE 4.14: A non-contiguous cartogram of US population in 2014 that preserves shape. A popular class of contiguous cartograms that do not preserve shape are sometimes referred to as tile cartograms (aka tilegrams). At the time of writing, there doesn’t seem to be a great R package for computing tilegrams, but Pitch Interactive provides a nice web service where you can generate tilegrams from existing or custom data https://pitchinteractiveinc.github.io/tilegrams/. Moreover, the service allows you to download a TopoJSON file of the generated tilegram, which we can read in R and convert into an sf object via geojsonio (Chamberlain and Teucher 2018). Figure 4.15 demonstrates a tilegram of U.S. population in 2016 exported directly from Pitch’s free web service. 
library(geojsonio) tiles <- geojson_read("~/Downloads/tiles.topo.json", what = "sp") tiles_sf <- st_as_sf(tiles) plot_ly(tiles_sf, split = ~name) FIGURE 4.15: A tile cartogram of U.S. population in 2016. References "], diff --git a/images/empty b/images/empty new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/images/empty @@ -0,0 +1 @@ + diff --git a/introduction.Rmd b/introduction.Rmd index b2300f4..83f4a90 100644 --- a/introduction.Rmd +++ b/introduction.Rmd @@ -109,7 +109,7 @@ This book contains six parts and each part contains numerous chapters. A summary 3. _Combining multiple views:_ demonstrates how to combine multiple data views into a single web page (arranging) or graphic (animation). Most of these techniques are shown using **plotly** graphs, but techniques from Section \@ref(arranging-htmlwidgets) extend to any HTML content generated via **htmltools** (which includes **htmlwidgets**). -4. _Linking multiple views:_ provides an overview of the two models for linking **plotly** graph(s) to other data views. The first model, covered in Section \@ref(graphical-queries), outlines **plotly**'s support for linking views purely client-side, meaning the resulting graphs render in any web browser on any machine without requiring external software. The second model, covered in Chapter \@ref(linking-views-with-shiny), demonstrates how to link **plotly** with other views via **shiny**, a reactive web application framework for `R`. Relatively speaking, the second model grants the `R` user way more power and flexbility, but comes at the cost of requiring more computational infrastructure. That being said, RStudio provides accessible resources for deploying **shiny** apps . +4. _Linking multiple views:_ provides an overview of the two models for linking **plotly** graph(s) to other data views. 
The first model, covered in Section \@ref(graphical-queries), outlines **plotly**'s support for linking views purely client-side, meaning the resulting graphs render in any web browser on any machine without requiring external software. The second model, covered in Chapter \@ref(linking-views-with-shiny), demonstrates how to link **plotly** with other views via **shiny**, a reactive web application framework for `R`. Relatively speaking, the second model grants the `R` user way more power and flexibility, but comes at the cost of requiring more computational infrastructure. That being said, RStudio provides accessible resources for deploying **shiny** apps . 5. _Custom behavior with JavaScript:_ demonstrates various ways to customize **plotly** graphs by writing custom JavaScript to handle certain user events. This part of the book is designed to be approachable for `R` users that want to learn just enough JavaScript to **plotly** to do something it doesn't "natively" support.