@@ -439,7 +439,7 @@ when we need to make the data frame longer and narrower.
To learn how to use `melt`, we will work through an example with the
`region_lang_top5_cities_wide.csv` data set. This data set contains the
counts of how many Canadians cited each language as their mother tongue for five
- major Canadian cities (Toronto, Montréal, Vancouver, Calgary and Edmonton) from
+ major Canadian cities (Toronto, Montréal, Vancouver, Calgary, and Edmonton) from
the 2016 Canadian census.
To get started,
we will use `pd.read_csv` to load the (untidy) data.
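As a rough sketch of what `melt` does to a wide table like this one, the toy example below builds a tiny stand-in data frame and moves the city names from column headers into a single column. The rows and counts are invented for illustration and are not taken from `region_lang_top5_cities_wide.csv`; the column names `category`, `language`, `region`, and `mother_tongue` are likewise assumptions.

```python
import pandas as pd

# Toy stand-in for the wide data; counts are invented for illustration only.
lang_wide_toy = pd.DataFrame({
    "category": ["Official languages", "Official languages"],
    "language": ["English", "French"],
    "Toronto": [100, 10],
    "Montréal": [20, 200],
})

# Keep the identifier columns fixed and stack the city columns into
# (region, mother_tongue) pairs: one row per language per city.
lang_long_toy = lang_wide_toy.melt(
    id_vars=["category", "language"],
    var_name="region",
    value_name="mother_tongue",
)
print(lang_long_toy)
```

The result has one row per language and city, so the data frame becomes longer and narrower.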
@@ -454,7 +454,7 @@ What is wrong with the untidy format above?
The table on the left in {numref}`fig:img-pivot-longer-with-table`
represents the data in the "wide" (messy) format.
From a data analysis perspective, this format is not ideal because the values of
- the variable *region* (Toronto, Montréal, Vancouver, Calgary and Edmonton)
+ the variable *region* (Toronto, Montréal, Vancouver, Calgary, and Edmonton)
are stored as column names. Thus they
are not easily accessible to the data analysis functions we will apply
to our data set. Additionally, the *mother tongue* variable values are
@@ -586,7 +586,7 @@ we will work through an example
with the `region_lang_top5_cities_long.csv` data set.
This data set contains the number of Canadians reporting
the primary language at home and work for five
- major cities (Toronto, Montréal, Vancouver, Calgary and Edmonton).
+ major cities (Toronto, Montréal, Vancouver, Calgary, and Edmonton).

``` {code-cell} ipython3
:tags: ["output_scroll"]
@@ -631,7 +631,8 @@ Syntax for the `pivot` function.
+++

- We will apply the function as detailed in {numref}`fig:img-pivot-wider`.
+ We will apply the function as detailed in {numref}`fig:img-pivot-wider`, and then
+ rename the columns.

``` {code-cell} ipython3
:tags: ["output_scroll"]
@@ -710,7 +711,7 @@ more columns, and we would see the data set "widen."
Data are also not considered tidy when multiple values are stored in the same
cell. The data set we show below is even messier than the ones we dealt with
- above: the `Toronto`, `Montréal`, `Vancouver`, `Calgary` and `Edmonton` columns
+ above: the `Toronto`, `Montréal`, `Vancouver`, `Calgary`, and `Edmonton` columns
contain the number of Canadians reporting their primary language at home and
work in one column separated by the separator (`/`). The column names are the
values of a variable, *and* each value does not have its own cell! To turn this
@@ -756,8 +757,8 @@ one containing only the counts of Canadians
that speak each language most at home,
and the other containing only the counts of Canadians
that speak each language most at work for each region.
- We then drop the no-longer-needed `value` column from the `lang_messy_longer`
- data frame, and assign the two columns from `str.split` to two new columns.
+ We drop the no-longer-needed `value` column from the `lang_messy_longer`
+ data frame, and then assign the two columns from `str.split` to two new columns.
{numref}`fig:img-separate`
outlines what we need to specify to use `str.split`.
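The drop-then-assign step described above might look roughly like the following sketch. The toy data frame stands in for `lang_messy_longer`; its values are invented, and the new column names `most_at_home` and `most_at_work` are assumed from the column labels used elsewhere in the chapter.

```python
import pandas as pd

# Toy stand-in for lang_messy_longer; values are invented for illustration.
messy_toy = pd.DataFrame({
    "language": ["English", "French"],
    "region": ["Toronto", "Montréal"],
    "value": ["100/80", "200/150"],
})

# Split each "home/work" string on "/" and expand the pieces into columns.
split_counts = messy_toy["value"].str.split("/", expand=True)

# Drop the combined column, then attach the two pieces as new columns.
tidy_toy = messy_toy.drop(columns=["value"])
tidy_toy["most_at_home"] = split_counts[0]
tidy_toy["most_at_work"] = split_counts[1]
print(tidy_toy)
```

Note that the pieces produced by `str.split` are still strings, so turning them into numbers would be a separate step.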
@@ -1191,7 +1192,7 @@ which provides the ability to index with the position rather than the label of t
For example, the column labels of the `tidy_lang` data frame are
`["category", "language", "region", "most_at_home", "most_at_work"]`.
Using `iloc[]`, you can ask for the `language` column by requesting the
- column at index `1` (remember that Python starts counting at `0`, so the second item `"language"`
+ column at index `1` (remember that Python starts counting at `0`, so the second column `"language"`
has index `1`!).
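To make the position-based lookup concrete, here is a minimal sketch on a one-row toy data frame that reuses the column labels listed above. The name `tidy_lang_toy` and the values in it are hypothetical.

```python
import pandas as pd

# Toy data frame using the column labels listed above; values are invented.
tidy_lang_toy = pd.DataFrame({
    "category": ["Official languages"],
    "language": ["English"],
    "region": ["Toronto"],
    "most_at_home": [100],
    "most_at_work": [80],
})

# All rows (:), column at integer position 1 (the second column).
second_column = tidy_lang_toy.iloc[:, 1]
print(second_column.name)
```

Because counting starts at `0`, position `1` picks out `language` rather than `category`.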
``` {code-cell} ipython3
@@ -1423,7 +1424,7 @@ for each of the regions in the data set.
A summary statistic function paired with `groupby` is useful for calculating that statistic
on one or more column(s) for each group. It
creates a new data frame with one row for each group
- and one column for each summary statistic.The darker, top row of each table
+ and one column for each summary statistic. The darker, top row of each table
represents the column headers. The gray, blue, and green colored rows
correspond to the rows that belong to each of the three groups being
represented in this cartoon example.