Commit 681d304

bug hunt ch3 py issue
1 parent 164080d commit 681d304

1 file changed: +10 -9 lines changed

source/wrangling.md

Lines changed: 10 additions & 9 deletions
@@ -439,7 +439,7 @@ when we need to make the data frame longer and narrower.
 To learn how to use `melt`, we will work through an example with the
 `region_lang_top5_cities_wide.csv` data set. This data set contains the
 counts of how many Canadians cited each language as their mother tongue for five
-major Canadian cities (Toronto, Montréal, Vancouver, Calgary and Edmonton) from
+major Canadian cities (Toronto, Montréal, Vancouver, Calgary, and Edmonton) from
 the 2016 Canadian census.
 To get started,
 we will use `pd.read_csv` to load the (untidy) data.
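The context lines above introduce `melt` for turning the wide table into a longer, narrower one. The following is a minimal pandas sketch of that reshaping. It uses a tiny made-up data frame in place of `region_lang_top5_cities_wide.csv`. The column names `category`, `language`, `region`, and `mother_tongue` and all of the counts are illustrative assumptions; none of them were read from the file.

```python
import pandas as pd

# Hypothetical miniature of the wide table: one column per region.
# The counts are made up for illustration, not census values.
lang_wide = pd.DataFrame({
    "category": ["Official languages", "Official languages"],
    "language": ["English", "French"],
    "Toronto": [100, 10],
    "Montréal": [50, 200],
})

# melt keeps the id_vars columns as they are and stacks the region columns
# into two new columns: one holding the former column name (the region),
# one holding the count that was stored under it.
lang_long = lang_wide.melt(
    id_vars=["category", "language"],
    var_name="region",
    value_name="mother_tongue",
)
print(lang_long)
```

After the reshape, the region names are ordinary values in a `region` column and no longer sit in the column labels. Downstream functions can then filter or group on them directly.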
@@ -454,7 +454,7 @@ What is wrong with the untidy format above?
 The table on the left in {numref}`fig:img-pivot-longer-with-table`
 represents the data in the "wide" (messy) format.
 From a data analysis perspective, this format is not ideal because the values of
-the variable *region* (Toronto, Montréal, Vancouver, Calgary and Edmonton)
+the variable *region* (Toronto, Montréal, Vancouver, Calgary, and Edmonton)
 are stored as column names. Thus they
 are not easily accessible to the data analysis functions we will apply
 to our data set. Additionally, the *mother tongue* variable values are
@@ -586,7 +586,7 @@ we will work through an example
 with the `region_lang_top5_cities_long.csv` data set.
 This data set contains the number of Canadians reporting
 the primary language at home and work for five
-major cities (Toronto, Montréal, Vancouver, Calgary and Edmonton).
+major cities (Toronto, Montréal, Vancouver, Calgary, and Edmonton).
 
 ```{code-cell} ipython3
 :tags: ["output_scroll"]
@@ -631,7 +631,8 @@ Syntax for the `pivot` function.
 
 +++
 
-We will apply the function as detailed in {numref}`fig:img-pivot-wider`.
+We will apply the function as detailed in {numref}`fig:img-pivot-wider`, and then
+rename the columns.
 
 ```{code-cell} ipython3
 :tags: ["output_scroll"]
@@ -710,7 +711,7 @@ more columns, and we would see the data set "widen."
 
 Data are also not considered tidy when multiple values are stored in the same
 cell. The data set we show below is even messier than the ones we dealt with
-above: the `Toronto`, `Montréal`, `Vancouver`, `Calgary` and `Edmonton` columns
+above: the `Toronto`, `Montréal`, `Vancouver`, `Calgary`, and `Edmonton` columns
 contain the number of Canadians reporting their primary language at home and
 work in one column separated by the separator (`/`). The column names are the
 values of a variable, *and* each value does not have its own cell! To turn this
@@ -756,8 +757,8 @@ one containing only the counts of Canadians
 that speak each language most at home,
 and the other containing only the counts of Canadians
 that speak each language most at work for each region.
-We then drop the no-longer-needed `value` column from the `lang_messy_longer`
-data frame, and assign the two columns from `str.split` to two new columns.
+We drop the no-longer-needed `value` column from the `lang_messy_longer`
+data frame, and then assign the two columns from `str.split` to two new columns.
 {numref}`fig:img-separate`
 outlines what we need to specify to use `str.split`.
 
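The context lines in this hunk describe three steps: splitting the packed `value` column on `/`, dropping `value`, and assigning the two pieces to new columns. The following is a minimal sketch of those steps. It uses a hypothetical two-row `lang_messy_longer` with invented counts. The `most_at_home` and `most_at_work` names follow the `tidy_lang` column labels quoted later in this diff. The exact code in the book may differ.

```python
import pandas as pd

# Hypothetical, partly tidied table: each "value" cell still packs two
# counts (most at home / most at work) separated by "/". Counts are invented.
lang_messy_longer = pd.DataFrame({
    "language": ["English", "French"],
    "region": ["Toronto", "Toronto"],
    "value": ["1000/900", "20/5"],
})

# str.split with expand=True returns a data frame with one column per piece
# (columns labelled 0 and 1), aligned on the original row index.
split_counts = lang_messy_longer["value"].str.split("/", expand=True)

# Drop the packed column, then attach the two pieces as new columns.
tidy_lang = lang_messy_longer.drop(columns="value").assign(
    most_at_home=split_counts[0],
    most_at_work=split_counts[1],
)
print(tidy_lang)
```

The pieces produced by `str.split` are still strings. The new columns would therefore need a numeric conversion, for example with `astype(int)`, before any arithmetic on them.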
@@ -1191,7 +1192,7 @@ which provides the ability to index with the position rather than the label of t
 For example, the column labels of the `tidy_lang` data frame are
 `["category", "language", "region", "most_at_home", "most_at_work"]`.
 Using `iloc[]`, you can ask for the `language` column by requesting the
-column at index `1` (remember that Python starts counting at `0`, so the second item `"language"`
+column at index `1` (remember that Python starts counting at `0`, so the second column `"language"`
 has index `1`!).
 
 ```{code-cell} ipython3
@@ -1423,7 +1424,7 @@ for each of the regions in the data set.
 A summary statistic function paired with `groupby` is useful for calculating that statistic
 on one or more column(s) for each group. It
 creates a new data frame with one row for each group
-and one column for each summary statistic.The darker, top row of each table
+and one column for each summary statistic. The darker, top row of each table
 represents the column headers. The gray, blue, and green colored rows
 correspond to the rows that belong to each of the three groups being
 represented in this cartoon example.
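The context lines above state that pairing a summary statistic with `groupby` produces one row per group and one column per statistic. The following is a minimal sketch of that shape, using three hypothetical groups to mirror the three colored groups in the cartoon. The data, the choice of `most_at_home`, and the use of `agg` to request two statistics at once are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical tidy data with three regions (made-up counts).
tidy_lang = pd.DataFrame({
    "region": ["Toronto", "Toronto", "Calgary", "Calgary", "Edmonton"],
    "most_at_home": [50, 10, 30, 6, 12],
})

# Group the rows by region, then compute two statistics per group:
# the result has one row per region and one column per statistic.
summary = tidy_lang.groupby("region")["most_at_home"].agg(["min", "max"])
print(summary)
```

By default `groupby` sorts the group keys, so the rows come out in alphabetical order of region.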