Commit c9b277a

Merge pull request #558 from UBC-DSCI/bar-mean
Caveats of bar plots & plot titles
2 parents c6714d1 + a134490 commit c9b277a

1 file changed

+32
-25
lines changed

source/viz.Rmd

Lines changed: 32 additions & 25 deletions
@@ -895,10 +895,13 @@ islands_df
 Here, we have a data frame of Earth's landmasses,
 and are trying to compare their sizes.
 The right type of visualization to answer this question is a bar plot.
-In a bar plot, the height of the bar represents the value of a summary statistic
-(usually a size, count, proportion or percentage).
-They are particularly useful for comparing summary statistics between different
-groups of a categorical variable.
+In a bar plot, the height of each bar represents the value of an *amount*
+(a size, count, proportion, percentage, etc).
+They are particularly useful for comparing counts or proportions across different
+groups of a categorical variable. Note, however, that bar plots should generally not be
+used to display mean or median values, as they hide important information about
+the variation of the data. Instead it's better to show the distribution of
+all the individual data points, e.g., using a histogram, which we will discuss further in Section \@ref(histogramsviz).
 
 We specify that we would like to use a bar plot
 via the `geom_bar` function in `ggplot2`.
@@ -908,7 +911,7 @@ the landmass sizes. So we have to pass the `stat = "identity"` argument to `geom
 shown in Figure \@ref(fig:03-data-islands-bar).
 \index{ggplot!geom\_bar}
 
-```{r 03-data-islands-bar, warning=FALSE, message=FALSE, fig.width=5, fig.height=2.75, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "Bar plot of all Earth's landmasses' size with squished labels."}
+```{r 03-data-islands-bar, warning=FALSE, message=FALSE, fig.width=5, fig.height=2.75, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "Bar plot of Earth's landmass sizes with squished labels."}
 islands_bar <- ggplot(islands_df, aes(x = landmass, y = size)) +
 geom_bar(stat = "identity")
 
@@ -946,14 +949,12 @@ islands_bar
 The plot in Figure \@ref(fig:03-data-islands-bar-2) is definitely clearer now,
 and allows us to answer our question
 ("Are the top 7 largest landmasses continents?") in the affirmative.
-However, we could still improve this visualization by organizing
-the bars by landmass size rather than by alphabetical order,
-and by coloring the bars based on whether they correspond to a continent.
-The data for this is stored in the `landmass_type` column.
-To use this to color the bars,
-we add the `fill` argument to the aesthetic mapping
-and set it to `landmass_type`.
-
+However, we could still improve this visualization by
+coloring the bars based on whether they correspond to a continent,
+and by organizing the bars by landmass size rather than by alphabetical order.
+The data for coloring the bars is stored in the `landmass_type` column,
+so we add the `fill` argument to the aesthetic mapping
+and set it to `landmass_type`.
 To organize the landmasses by their `size` variable,
 we will use the `tidyverse` `fct_reorder` function
 in the aesthetic mapping to organize the landmasses by their `size` variable.
@@ -967,25 +968,31 @@ by setting `.desc = TRUE`.
 We do this here so that the largest bar will be closest to the axis line,
 which is more visually appealing.
 
-To label the x and y axes, we will use the `labs` function
-instead of the `xlab` and `ylab` functions from earlier in this chapter.
-The `labs` function is more general; we are using it in this case because
-we would also like to change the legend label.
-The default label is the name of the column being mapped to `fill`. Here that
-would be `landmass_type`;
-however `landmass_type` is not proper English (and so is less readable).
-Thus we use the `fill` argument inside `labs` to change that to "Type."
+To finalize this plot we will customize the axis and legend labels,
+and add a title to the chart. Plot titles are not always required, especially when
+it would be redundant with an already-existing
+caption or surrounding context (e.g., in a slide presentation with annotations).
+But if you decide to include one, a good plot title should provide the take home message
+that you want readers to focus on, e.g., "Earth's seven largest landmasses are continents,"
+or a more general summary of the information displayed, e.g., "Earth's twelve largest landmasses."
+
+To make these final adjustments we will use the `labs` function rather than the `xlab` and `ylab` functions
+we have seen earlier in this chapter, as `labs` lets us modify the legend label and title in addition to axis labels.
+We provide a label for each aesthetic mapping in the plot&mdash;in this case, `x`, `y`, and `fill`&mdash;as well as one for the `title` argument.
 Finally, we again \index{ggplot!reorder} use the `theme` function
 to change the font size.
 
-```{r 03-data-islands-bar-4, warning = FALSE, message = FALSE, fig.width=5, fig.height=2.75, fig.align="center", fig.pos = "H", out.extra="", fig.cap = "Bar plot of size for Earth's largest 12 landmasses colored by whether its a continent with clearer axes and labels."}
+```{r 03-data-islands-bar-4, warning = FALSE, message = FALSE, fig.width=5, fig.height=2.75, fig.align="center", fig.pos = "H", out.extra="", fig.cap = "Bar plot of size for Earth's largest 12 landmasses, colored by landmass type, with clearer axes and labels."}
 islands_bar <- ggplot(islands_top12,
 aes(x = size,
 y = fct_reorder(landmass, size, .desc = TRUE),
 fill = landmass_type)) +
 geom_bar(stat = "identity") +
-labs(x = "Size (1000 square mi)", y = "Landmass", fill = "Type") +
-theme(text = element_text(size = 12))
+labs(x = "Size (1000 square mi)",
+y = "Landmass",
+fill = "Type",
+title = "Earth's twelve largest landmasses") +
+theme(text = element_text(size = 10))
 
 islands_bar
 ```
@@ -995,7 +1002,7 @@ visualization for answering our original questions. Landmasses are organized by
 their size, and continents are colored differently than other landmasses,
 making it quite clear that continents are the largest seven landmasses.
 
-### Histograms: the Michelson speed of light data set
+### Histograms: the Michelson speed of light data set {#histogramsviz}
 The `morley` data set \index{Michelson speed of light}
 contains measurements of the speed of light
 collected in experiments performed in 1879.
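
The caveat this commit adds to the bar plot section (bars of means hide the variation in the data) can be illustrated with a minimal R sketch. It is not part of the commit. It assumes `ggplot2` is installed and uses the `morley` data set that ships with R; the object names `morley_means`, `mean_bar`, and `speed_hist` are hypothetical and chosen only for this illustration.

```r
library(ggplot2)

# Collapse each experiment to a single mean speed.
# `morley` is built into R's datasets package (columns Expt, Run, Speed).
morley_means <- aggregate(Speed ~ Expt, data = morley, FUN = mean)

# Bar plot of the means: one bar per experiment, with no sign of the spread.
mean_bar <- ggplot(morley_means, aes(x = factor(Expt), y = Speed)) +
  geom_bar(stat = "identity") +
  labs(x = "Experiment",
       y = "Mean speed (km/s minus 299,000)",
       title = "Mean measured speed per experiment")

# Histogram of every individual measurement, one panel per experiment,
# so the variation within each experiment stays visible.
speed_hist <- ggplot(morley, aes(x = Speed)) +
  geom_histogram(bins = 20) +
  facet_grid(rows = vars(Expt)) +
  labs(x = "Speed (km/s minus 299,000)",
       y = "Count",
       title = "Measured speeds by experiment")

mean_bar
speed_hist
```

Comparing the two plots side by side shows what the summary discards: the bars give five numbers, while the histograms show how widely the individual runs are scattered around them.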