Commit b4593d8

committed
minor changes
1 parent b8677f7 commit b4593d8

1 file changed

+47
-46
lines changed

viz.Rmd

Lines changed: 47 additions & 46 deletions
@@ -21,17 +21,17 @@ plots, line plots, and histograms) for data using R.
2121

2222
## Chapter learning objectives
2323

24-
By the end of the chapter, readers will be able to:
24+
By the end of the chapter, readers will be able to do the following:
2525

2626
- Describe when to use the following kinds of visualizations to answer specific questions using a data set:
2727
- scatter plots
2828
- line plots
2929
- bar plots
3030
- histogram plots
31-
- Given a data set and a question, select from the above plot types and use R to create a visualization that best answers the question
32-
- Given a visualization and a question, evaluate the effectiveness of the visualization and suggest improvements to better answer the question
33-
- Referring to the visualization, communicate the conclusions in non-technical terms
34-
- Identify rules of thumb for creating effective visualizations
31+
- Given a data set and a question, select from the above plot types and use R to create a visualization that best answers the question.
32+
- Given a visualization and a question, evaluate the effectiveness of the visualization and suggest improvements to better answer the question.
33+
- Referring to the visualization, communicate the conclusions in non-technical terms.
34+
- Identify rules of thumb for creating effective visualizations.
3535
- Define the three key aspects of ggplot objects:
3636
- aesthetic mappings
3737
- geometric objects
@@ -40,11 +40,11 @@ By the end of the chapter, readers will be able to:
4040
- geometric objects: `geom_point`, `geom_line`, `geom_histogram`, `geom_bar`, `geom_vline`, `geom_hline`
4141
- scales: `xlim`, `ylim`
4242
- aesthetic mappings: `x`, `y`, `fill`, `color`, `shape`
43-
- labelling: `xlab`, `ylab`, `labs`
43+
- labeling: `xlab`, `ylab`, `labs`
4444
- font control and legend positioning: `theme`
4545
- subplots: `facet_grid`
46-
- Describe the difference in raster and vector output formats
47-
- Use `ggsave` to save visualizations in `.png` and `.svg` format
46+
- Describe the difference in raster and vector output formats.
47+
- Use `ggsave` to save visualizations in `.png` and `.svg` format.
4848

4949
## Choosing the visualization
5050
#### *Ask a question, and answer it* {-}
@@ -65,7 +65,7 @@ from Chapter \@ref(intro).
6565
With the visualizations we will cover in this chapter,
6666
we will be able to answer *only descriptive and exploratory* questions.
6767
Be careful to not answer any *predictive, inferential, causal*
68-
*or mechanistic* questions with visualizations presented here,
68+
*or mechanistic* questions with the visualizations presented here,
6969
as we have not learned the tools necessary to do that properly just yet.
7070

7171
As with most coding tasks, it is totally fine (and quite common) to make
@@ -200,11 +200,11 @@ options(warn = -1)
200200

201201
The [Mauna Loa CO$_{\text{2}}$ data set](https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html),
202202
curated by [Dr. Pieter Tans, NOAA/GML](https://www.esrl.noaa.gov/gmd/staff/Pieter.Tans/)
203-
and [Dr. Ralph Keeling, Scripps Institution of Oceanography](https://scrippsco2.ucsd.edu/)
203+
and [Dr. Ralph Keeling, Scripps Institution of Oceanography,](https://scrippsco2.ucsd.edu/)
204204
records the atmospheric concentration of carbon dioxide
205205
(CO$_{\text{2}}$, in parts per million)
206206
at the Mauna Loa research station in \index{Mauna Loa CO2} Hawaii
207-
from 1959 onwards [@maunadata].
207+
from 1959 onward [@maunadata].
208208
For this book, we are going to focus on the last 40 years of the data set,
209209
1980-2020.
210210

@@ -247,7 +247,7 @@ that was measured on each date, and is type `double`.
247247
> For example, `date` type vectors allow functions like `ggplot`
248248
> to treat them as numeric dates and not as character vectors,
249249
> even though they contain non-numeric characters
250-
> (e.g., `-` in the `date_measured` column in the `co2_df` data frame).
250+
> (e.g., in the `date_measured` column in the `co2_df` data frame).
251251
> This means R will not accidentally plot the dates in the wrong order
252252
> (i.e., not alphanumerically as would happen if it was a character vector).
253253
> An in-depth study of dates and times is beyond the scope of the book,
@@ -268,15 +268,15 @@ There are a few basic aspects of a plot that we need to specify:
268268
\index{ggplot!aesthetic mapping}
269269
\index{ggplot!geometric object}
270270

271-
- the name of the data frame object to visualize
272-
- here, we specify the `co2_df` data frame
273-
- the **aesthetic mapping**, which tells \index{aesthetic mapping} `ggplot` how the columns in the data frame map to properties of the visualization
274-
- to create an aesthetic mapping, we use the `aes` function
275-
- here, we set the plot `x` axis to the `date_measured` variable, and the plot `y` axis to the `ppm` variable
276-
- the `+` operator, which tells `ggplot` that we would like to add another layer to the plot.\index{aaaplussymb@$+$|see{ggplot!add layer}}\index{ggplot!add layer}
277-
- the **geometric object**, which specifies \index{aesthetic mapping} how the mapped data should be displayed
278-
- to create a geometric object, we use a `geom_*` function (see the [ggplot reference](https://ggplot2.tidyverse.org/reference/) for a list of geometric objects)
279-
- here, we use the `geom_point` function to visualize our data as a scatter plot
271+
- The name of the data frame object to visualize.
272+
- Here, we specify the `co2_df` data frame.
273+
- The **aesthetic mapping**, which tells \index{aesthetic mapping} `ggplot` how the columns in the data frame map to properties of the visualization.
274+
- To create an aesthetic mapping, we use the `aes` function.
275+
- Here, we set the plot `x` axis to the `date_measured` variable, and the plot `y` axis to the `ppm` variable.
276+
- The `+` operator, which tells `ggplot` that we would like to add another layer to the plot.\index{aaaplussymb@$+$|see{ggplot!add layer}}\index{ggplot!add layer}
277+
- The **geometric object**, which specifies \index{aesthetic mapping} how the mapped data should be displayed.
278+
- To create a geometric object, we use a `geom_*` function (see the [ggplot reference](https://ggplot2.tidyverse.org/reference/) for a list of geometric objects).
279+
- Here, we use the `geom_point` function to visualize our data as a scatter plot.
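
The aspects listed above can be sketched in a few lines of R. This is only an illustration: `toy_co2` is a hypothetical stand-in for `co2_df` with made-up values, so that the example runs on its own.

```r
library(ggplot2)

# Hypothetical stand-in for `co2_df`: a few made-up monthly values,
# included only so that this sketch is self-contained.
toy_co2 <- data.frame(
  date_measured = as.Date(c("1980-01-01", "1980-02-01",
                            "1980-03-01", "1980-04-01")),
  ppm = c(338.0, 338.5, 340.0, 341.0)
)

# data frame + aesthetic mapping, then `+` to add a layer,
# then the geometric object (points, i.e., a scatter plot)
toy_scatter <- ggplot(toy_co2, aes(x = date_measured, y = ppm)) +
  geom_point()

toy_scatter
```

Swapping `geom_point` for another `geom_*` function changes how the same mapped data are displayed without touching the data frame or the aesthetic mapping.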
280280

281281
Figure \@ref(fig:03-ggplot-function-scatter)
282282
shows how each of these aspects map to code
@@ -352,12 +352,6 @@ change the font size, we use the `theme` function with the `text` argument:
352352
\index{ggplot!xlab,ylab}
353353
\index{ggplot!theme}
354354

355-
> **Note:** The `theme` function is quite complex and has many arguments
356-
> that can be specified to control many non-data aspects of a visualization.
357-
> An in-depth discussion of the `theme` function is beyond the scope of this book.
358-
> Interested readers may consult the `theme` function documentation;
359-
> see the additional resources section at the end of this chapter.
360-
361355
```{r 03-data-co2-line-2, warning=FALSE, message=FALSE, fig.height = 3.5, fig.width = 4.5, fig.align = "center", fig.cap = "Line plot of atmospheric concentration of CO$_{2}$ over time with clearer axes and labels."}
362356
co2_line <- ggplot(co2_df, aes(x = date_measured, y = ppm)) +
363357
geom_line() +
@@ -368,6 +362,12 @@ co2_line <- ggplot(co2_df, aes(x = date_measured, y = ppm)) +
368362
co2_line
369363
```
370364

365+
> **Note:** The `theme` function is quite complex and has many arguments
366+
> that can be specified to control many non-data aspects of a visualization.
367+
> An in-depth discussion of the `theme` function is beyond the scope of this book.
368+
> Interested readers may consult the `theme` function documentation;
369+
> see the additional resources section at the end of this chapter.
370+
371371
Finally, let's see if we can better understand the oscillation by changing the
372372
visualization slightly. Note that it is totally fine to use a small number of
373373
visualizations to answer different aspects of the question you are trying to
@@ -467,7 +467,7 @@ faithful_scatter <- ggplot(faithful, aes(x = waiting, y = eruptions)) +
467467
faithful_scatter
468468
```
469469

470-
We can see in Figure \@ref(fig:03-data-faithful-scatter) the data tend to fall
470+
We can see in Figure \@ref(fig:03-data-faithful-scatter) that the data tend to fall
471471
into two groups: one with short waiting and eruption times, and one with long
472472
waiting and eruption times. Note that in this case, there is no overplotting:
473473
the points are generally nicely visually separated, and the pattern they form
@@ -1045,7 +1045,7 @@ minus 299,000; this ensures it is coded the same way as the
10451045
measurements in the `morley` data frame.
10461046
We would also like to fine tune this vertical line,
10471047
styling it so that it is dashed and 1 point in thickness.
1048-
A point is a measurement unit commonly used with font,
1048+
A point is a measurement unit commonly used with fonts,
10491049
and 1 point is about 0.353 mm.
10501050
We do this by setting `linetype = "dashed"` and `size = 1`, respectively.
10511051
There is a similar function, `geom_hline`,
@@ -1099,6 +1099,7 @@ with the data types in the `morley` data frame. In particular, the `Expt` column
10991099
is currently an *integer* (you can see the label `<int>` underneath the `Expt` column in \index{integer} the printed
11001100
data frame at the start of this section). But we want to treat it as a
11011101
*category*, i.e., there should be one category per type of experiment.
1102+
11021103
To fix this issue we can convert the `Expt` variable into a *factor* by \index{factor}
11031104
passing it to `as_factor` in the `fill` aesthetic mapping.
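
As a minimal sketch of this conversion (assuming the `forcats` package is installed, which provides `as_factor`), the built-in `morley` data frame can be used directly. The object name `morley_fill_sketch` and the `bins = 30` setting are chosen here for illustration only.

```r
library(ggplot2)
library(forcats)

# `Expt` is stored as an integer in the built-in `morley` data frame.
is.integer(morley$Expt)

# Wrapping `Expt` in `as_factor` inside `aes` makes `ggplot` treat it
# as a category, so each experiment gets its own fill color.
morley_fill_sketch <- ggplot(morley, aes(x = Speed, fill = as_factor(Expt))) +
  geom_histogram(bins = 30)

morley_fill_sketch
```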
11041105
Recall that factor is a data type in R that is often used to represent
@@ -1124,7 +1125,7 @@ morley_hist
11241125
11251126
Unfortunately, the attempt to separate out the experiment number visually has
11261127
created a bit of a mess. All of the colors in Figure
1127-
\@ref(fig:03-data-morley-hist-3) are blending together, and although it is
1128+
\@ref(fig:03-data-morley-hist-with-factor) are blending together, and although it is
11281129
possible to derive *some* insight from this (e.g., experiments 1 and 3 had some
11291130
of the most incorrect measurements), it isn't the clearest way to convey our
11301131
message and answer the question. Let's try a different strategy of creating
@@ -1139,8 +1140,7 @@ If the plot is to be split horizontally, into rows,
11391140
then the `rows` argument is used.
11401141
If the plot is to be split vertically, into columns,
11411142
then the `columns` argument is used.
1142-
Both the `rows` and `columns` argument take the column names to split the data
1143-
on when creating the subplots.
1143+
Both the `rows` and `columns` arguments take the column names on which to split the data when creating the subplots.
11441144
One key thing is that the column names must be surrounded by the `vars` function.
11451145
This function allows the column names to be correctly evaluated
11461146
in the context of the data frame.
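
A minimal sketch of this kind of split, again using the built-in `morley` data frame, might look as follows. Note that in the `ggplot2` function signature the column-wise argument is spelled `cols`; the object name `morley_facet_sketch` and the `bins = 30` setting are illustrative choices.

```r
library(ggplot2)

# One histogram panel per experiment, stacked as rows.
# The column name is wrapped in `vars` so that it is evaluated
# in the context of the data frame.
morley_facet_sketch <- ggplot(morley, aes(x = Speed)) +
  geom_histogram(bins = 30) +
  facet_grid(rows = vars(Expt))

morley_facet_sketch
```

To lay the panels out side by side instead, the same column would be passed as `facet_grid(cols = vars(Expt))`.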
@@ -1161,7 +1161,7 @@ with respect to one another.
11611161
The most variable measurements came from Experiment 1.
11621162
There the measurements ranged from about 650 - 1050 km / sec.
11631163
The least variable measurements came from Experiment 2.
1164-
There the measurements ranged from about 750 - 950 km / sec.
1164+
There, the measurements ranged from about 750 - 950 km / sec.
11651165
The most different experiments still obtained quite similar results!
11661166

11671167
There are two finishing touches to make this visualization even clearer. First and foremost, we need to add informative axis labels
@@ -1320,14 +1320,14 @@ suggest directions for future work.
13201320
Regardless of where it appears, a good way to discuss your visualization \index{visualization!explanation} is as
13211321
a story:
13221322

1323-
1) Establish the setting and scope, and motivate why you did what you did.
1323+
1) Establish the setting and scope, and describe why you did what you did.
13241324
2) Pose the question that your visualization answers. Justify why the question is important to answer.
13251325
3) Answer the question using your visualization. Make sure you describe *all* aspects of the visualization (including describing the axes). But you
13261326
can emphasize different aspects based on what is important to answer your question:
13271327
- **trends (lines):** Does a line describe the trend well? If so, the trend is *linear*, and if not, the trend is *nonlinear*. Is the trend increasing, decreasing, or neither?
13281328
Is there a periodic oscillation (wiggle) in the trend? Is the trend noisy (does the line "jump around" a lot) or smooth?
1329-
- **distributions (scatters, histograms):** How spread out are the data? Where are they centered, roughly? Are there any obvious "clusters" or "subgroups", which would be visible as multiple bumps in the histogram?
1330-
- **distributions of two variables (scatters):** is there a clear / strong relationship between the variables (points fall in a distinct pattern), a weak one (points fall in a pattern but there is some noise), or no discernible
1329+
- **distributions (scatters, histograms):** How spread out are the data? Where are they centered, roughly? Are there any obvious "clusters" or "subgroups", which would be visible as multiple bumps in the histogram?
1330+
- **distributions of two variables (scatters):** Is there a clear / strong relationship between the variables (points fall in a distinct pattern), a weak one (points fall in a pattern but there is some noise), or no discernible
13311331
relationship (the data are too noisy to make any conclusion)?
13321332
- **amounts (bars):** How large are the bars relative to one another? Are there patterns in different groups of bars?
13331333
4) Summarize your findings, and use them to motivate whatever you will discuss next.
@@ -1342,7 +1342,7 @@ greenhouse gases, typically primarily carbon dioxide (CO$_{\text{2}}$), as a
13421342
byproduct. Too much of these gases in the Earth's atmosphere will cause it to
13431343
trap more heat from the sun, leading to global warming. (2) In order to assess
13441344
how quickly the atmospheric concentration of CO$_{\text{2}}$ is increasing over
1345-
time, we (3) used a data set from the Mauna Loa observatory from Hawaii,
1345+
time, we (3) used a data set from the Mauna Loa observatory in Hawaii,
13461346
consisting of CO$_{\text{2}}$ measurements from 1980 to 2020. We plotted the
13471347
measured concentration of CO$_{\text{2}}$ (on the vertical axis) over time (on
13481348
the horizontal axis). From this plot, you can see a clear, increasing, and
@@ -1355,10 +1355,10 @@ perhaps worth investigating more into the causes.
13551355
**Michelson Light Speed Experiments:** (1) \index{Michelson speed of light} Our
13561356
modern understanding of the physics of light has advanced significantly from
13571357
the late 1800s when Michelson and Morley's experiments first demonstrated that
1358-
it had a finite speed. We now know based on modern experiments that it moves at
1359-
roughly 299,792.458 kilometres per second. (2) But how accurately were we first
1358+
it had a finite speed. We now know, based on modern experiments, that it moves at
1359+
roughly 299,792.458 kilometers per second. (2) But how accurately were we first
13601360
able to measure this fundamental physical constant, and did certain experiments
1361-
produce more accurate results than others? (3) To better understand this we
1361+
produce more accurate results than others? (3) To better understand this, we
13621362
plotted data from 5 experiments by Michelson in 1879, each with 20 trials, as
13631363
histograms stacked on top of one another. The horizontal axis shows the
13641364
accuracy of the measurements relative to the true speed of light as we know it
@@ -1384,7 +1384,7 @@ and *vector* \index{vector graphics} formats.
13841384
**Raster** images are represented as a 2-D grid of square pixels, each
13851385
with its own color. Raster images are often *compressed* before storing so they
13861386
take up less space. A compressed format is *lossy* if the image cannot be
1387-
perfectly recreated when loading and displaying, with the hope that the change
1387+
perfectly re-created when loading and displaying, with the hope that the change
13881388
is not noticeable. *Lossless* formats, on the other hand, allow a perfect
13891389
display of the original image.
13901390
\index{raster graphics!file types}
@@ -1415,7 +1415,7 @@ computer has to draw all the elements each time it is displayed. For example,
14151415
if you have a scatter plot with 1 million points stored as an SVG file, it may
14161416
take your computer some time to open the image. On the other hand, you can zoom
14171417
into / scale up vector graphics as much as you like without the image looking
1418-
bad, while raster images eventually start to look "pixellated."
1418+
bad, while raster images eventually start to look "pixelated."
14191419

14201420
> **Note:** The portable document format [PDF](https://en.wikipedia.org/wiki/PDF) (`.pdf`) is commonly used to
14211421
> store *both* raster and vector formats. If you try to open a PDF and it's taking a long time
@@ -1447,7 +1447,7 @@ This can include the path to the directory where you would like to save the file
14471447
and the name of the plot object to save as its second argument.
14481448
The kind of image to save is specified by the file extension.
14491449
For example,
1450-
to create a PNG image file we specify that the file extension is `.png`.
1450+
to create a PNG image file, we specify that the file extension is `.png`.
14511451
Below we demonstrate how to save PNG, JPG, BMP, TIFF and SVG file types
14521452
for the `faithful_plot`:
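
As a minimal sketch of the PNG case only (the plot object `faithful_sketch`, the file name, and the 4 x 3 inch size are assumptions made for illustration, and a temporary directory is used so that no existing file is overwritten):

```r
library(ggplot2)

# A small plot to save, built from the built-in `faithful` data frame.
faithful_sketch <- ggplot(faithful, aes(x = waiting, y = eruptions)) +
  geom_point()

# Hypothetical output path; the `.png` extension tells `ggsave`
# which kind of image to write.
out_file <- file.path(tempdir(), "faithful_sketch.png")

# First argument: where to save; second argument: the plot object.
ggsave(out_file, faithful_sketch, width = 4, height = 3)

file.exists(out_file)
```

Saving to `.svg` follows the same pattern, but typically requires an additional package (such as `svglite`) to be installed.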
14531453

@@ -1495,8 +1495,9 @@ based on mathematical formulas, vector graphics can be scaled up to arbitrary
14951495
sizes. This makes them great for presentation media of all sizes, from papers
14961496
to posters to billboards.
14971497

1498+
(ref:03-raster-image) Zoomed in `faithful`, raster (PNG, left) and vector (SVG, right) formats.
14981499

1499-
```{r 03-raster-image, echo=FALSE, fig.cap = "Zoomed in `faithful`, raster (PNG, left) and vector (SVG, right) formats.", fig.show="hold", fig.align= "center", message =F, out.width="100%"}
1500+
```{r 03-raster-image, echo=FALSE, fig.cap = "(ref:03-raster-image)", fig.show="hold", fig.align= "center", message =F, out.width="100%"}
15001501
knitr::include_graphics("img/png-vs-svg.png")
15011502
```
15021503

@@ -1513,11 +1514,11 @@ found in Chapter \@ref(move-to-your-own-machine).
15131514
## Additional resources
15141515
- The [`ggplot2` page on the tidyverse website](https://ggplot2.tidyverse.org) is where you should look if you want to learn more about the functions in this chapter, the full set of arguments you can use, and other related functions. The site also provides a very nice cheat sheet that summarizes many of the visualization functions from this chapter.
15151516
- The [Fundamentals of Data Visualization](https://serialmentor.com/dataviz/) has a wealth of information on designing effective visualizations. It is not specific to any particular programming language or library. If you want to improve your visualization skills, this is the next place to look.
1516-
- [R for Data Science](https://r4ds.had.co.nz/) has a chapter on [creating visualizations using `ggplot2`](https://r4ds.had.co.nz/data-visualisation.html). This reference is specific to R and `ggplot2`, but provides a much more detailed introduction to the full set of tools that `ggplot2` provides. This chapter is where you should look if you want to learn how to make more intricate visualizations in `ggplot2` than what is included in this chapter.
1517+
- [*R for Data Science*](https://r4ds.had.co.nz/) has a chapter on [creating visualizations using `ggplot2`](https://r4ds.had.co.nz/data-visualisation.html). This reference is specific to R and `ggplot2`, but provides a much more detailed introduction to the full set of tools that `ggplot2` provides. This chapter is where you should look if you want to learn how to make more intricate visualizations in `ggplot2` than what is included in this chapter.
15171518
- The [`theme` function documentation](https://ggplot2.tidyverse.org/reference/theme.html)
15181519
is an excellent reference to see how you can fine tune the non-data aspects
15191520
of your visualization.
1520-
- [R for Data Science](https://r4ds.had.co.nz/) has a chapter on
1521+
- [*R for Data Science*](https://r4ds.had.co.nz/) has a chapter on
15211522
[dates and times](https://r4ds.had.co.nz/dates-and-times.html).
15221523
This chapter is where you should look if you want to learn about `date` vectors,
15231524
including how to create them,
