Commit 22fe5aa

cleaning up additional resources reading wrangling
1 parent 2bc5de2 commit 22fe5aa

3 files changed: 95 additions and 48 deletions

reading.Rmd

Lines changed: 32 additions & 12 deletions
@@ -346,7 +346,8 @@ above, R assigns each column a name of `X1, X2, X3, X4, X5, X6`.
 It is best to rename your columns to help differentiate between them
 (e.g., `X1, X2`, etc., are not very descriptive names and will make it more confusing as
 you code). To rename your columns, you can use the `rename` function
-\index{rename} from the `dplyr` \index{dplyr} package (one of the packages
+\index{rename} from [the `dplyr` R package](https://dplyr.tidyverse.org/) [@dplyr]
+\index{dplyr} (one of the packages
 loaded with `tidyverse`, so we don't need to load it separately). The first
 argument is the data set, and in the subsequent arguments you
 write `new_name = old_name` for the selected variables to
@@ -1225,14 +1226,33 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
-- The [`readr` page on the Tidyverse website](https://readr.tidyverse.org/) is where you should look if you want to learn more about the functions in this chapter, the full set of arguments you can use, and other related functions. The site also provides a very nice cheat sheet that summarizes many of the data wrangling functions from this chapter.
-- Sometimes you might run into data in such poor shape that none of the reading functions we cover in this chapter works. In that case, you can consult the [data import chapter](https://r4ds.had.co.nz/data-import.html) from [R for Data Science](https://r4ds.had.co.nz/), which goes into a lot more detail about how R parses text from files into data frames.
-- The documentation for many of the reading functions we cover in this chapter can be found [on the Tidyverse website](https://readr.tidyverse.org/reference/read_delim.html). This site shows you the full set of arguments available for each function.
-- The [`here` package](https://cran.r-project.org/web/packages/here/index.html) provides a way for you to construct or find your files' paths.
-- The [`readxl` documentation](https://readxl.tidyverse.org/) provides more details on reading data from Excel, such as reading in data with multiple sheets, or specifying the cells to read in.
-- The [`rio` package](https://github.com/leeper/rio) provides an alternative set of tools for reading and writing data in R. It aims to be a "Swiss army knife" for data reading/writing/converting, and supports a wide variety of data types (including data formats generated by other statistical software like SPSS and SAS).
-- This [video](https://www.youtube.com/embed/ephId3mYu9o) from the [Udacity course "Linux Command Line Basics"](https://www.udacity.com/course/linux-command-line-basics--ud595) provides a good explanation of absolute versus relative paths.
-- If you read the subsection on obtaining data from the web via scraping and APIs, we provide two companion tutorial video links:
-    - [A brief video tutorial](https://www.youtube.com/embed/YdIWI6K64zo) on using the SelectorGadget tool to obtain desired CSS selectors for extracting the price and size data for apartment listings on Craigslist
-    - [Another brief video tutorial](https://www.youtube.com/embed/O9HKbdhqYzk) on using the SelectorGadget tool to obtain desired CSS selectors for extracting Canadian city names and 2016 census populations from Wikipedia
-- The [`polite` package](https://cran.r-project.org/web/packages/polite/index.html) provides a set of tools for responsibly scraping data from websites.
+- The [`readr` documentation](https://readr.tidyverse.org/)
+provides the documentation for many of the reading functions we cover in this chapter.
+It is where you should look if you want to learn more about the functions in this
+chapter, the full set of arguments you can use, and other related functions.
+The site also provides a very nice cheat sheet that summarizes many of the data
+wrangling functions from this chapter.
+- Sometimes you might run into data in such poor shape that none of the reading
+functions we cover in this chapter work. In that case, you can consult the
+[data import chapter](https://r4ds.had.co.nz/data-import.html) from *R for Data
+Science* [@wickham2016r], which goes into a lot more detail about how R parses
+text from files into data frames.
+- The [`here` R package](https://here.r-lib.org/) [@here]
+provides a way for you to construct or find your files' paths.
+- The [`readxl` documentation](https://readxl.tidyverse.org/) provides more
+details on reading data from Excel, such as reading in data with multiple
+sheets, or specifying the cells to read in.
+- The [`rio` R package](https://github.com/leeper/rio) [@rio] provides an alternative
+set of tools for reading and writing data in R. It aims to be a "Swiss army
+knife" for data reading/writing/converting, and supports a wide variety of data
+types (including data formats generated by other statistical software like SPSS
+and SAS).
+- A [video](https://www.youtube.com/embed/ephId3mYu9o) from the Udacity
+course *Linux Command Line Basics* provides a good explanation of absolute versus relative paths.
+- If you read the subsection on obtaining data from the web via scraping and
+APIs, we provide two companion tutorial video links for how to use the
+SelectorGadget tool to obtain desired CSS selectors for:
+    - [extracting the price and size data for apartment listings on Craigslist](https://www.youtube.com/embed/YdIWI6K64zo)
+    - [extracting Canadian city names and 2016 census populations from Wikipedia](https://www.youtube.com/embed/O9HKbdhqYzk)
+- The [`polite` R package](https://dmi3kno.github.io/polite/) [@polite] provides
+a set of tools for responsibly scraping data from websites.

references.bib

Lines changed: 30 additions & 0 deletions
@@ -364,3 +364,33 @@ @book{wickham2019advanced
 publisher={CRC Press},
 url = {https://adv-r.hadley.nz/}
 }
+
+@Manual{here,
+title = {{here R package}},
+author = {Kirill M\"uller},
+year = {2020},
+url = {https://here.r-lib.org/}}
+
+@Manual{rio,
+title = {{rio R package}},
+author = {Thomas Leeper},
+year = {2021},
+url = {https://cloud.r-project.org/web/packages/rio/index.html}}
+
+@Manual{polite,
+title = {{polite R package}},
+author = {Dmytro Perepolkin},
+year = {2021},
+url = {https://dmi3kno.github.io/polite/}}
+
+@Manual{dplyr,
+title = {{dplyr R package}},
+author = {Hadley Wickham and Romain Fran\c{c}ois and Lionel Henry and Kirill M\"uller},
+year = {2021},
+url = {https://dplyr.tidyverse.org/}}
+
+@Manual{tidyselect,
+title = {{tidyselect R package}},
+author = {Lionel Henry and Hadley Wickham},
+year = {2021},
+url = {https://tidyselect.r-lib.org/}}

wrangling.Rmd

Lines changed: 33 additions & 36 deletions
@@ -1625,39 +1625,36 @@ found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
 
-- As we mentioned earlier, `tidyverse` is actually an *R
-meta package*: it installs and loads a collection of R packages that all
-follow the tidy data philosophy we discussed above. One of the `tidyverse`
-packages is `dplyr`—a data wrangling workhorse. You have already met many
-of `dplyr`'s functions
-(`select`, `filter`, `mutate`, `arrange`, `summarize`, and `group_by`).
-To learn more about these functions and meet a few more useful
-functions, we recommend you check out [this
-chapter](https://stat545.com/block010_dplyr-end-single-table.html#where-were-we)
-of the data wrangling, exploration, and analysis with R book.
-- The [`dplyr` page on the Tidyverse website](https://dplyr.tidyverse.org/) is
-another resource to learn more about the functions in this
-chapter, the full set of arguments you can use, and other related functions.
-The site also provides a very nice cheat sheet that summarizes many of the
-data wrangling functions from this chapter.
-- Check out the [`tidyselect` page](https://tidyselect.r-lib.org/reference/select_helpers.html) for a
-comprehensive list of `select` helpers.
-- [*R for Data Science*](https://r4ds.had.co.nz/) has a few chapters related to
-data wrangling that go into more depth than this book. For example, the
-[tidy data](https://r4ds.had.co.nz/tidy-data.html) chapter covers tidy data,
-`pivot_longer`/`pivot_wider` and `separate`, but also covers missing values
-and additional wrangling functions (like `unite`). The [data
-transformation](https://r4ds.had.co.nz/transform.html) chapter covers
-`select`, `filter`, `arrange`, `mutate`, and `summarize`. And the [`map`
-functions](https://r4ds.had.co.nz/iteration.html#the-map-functions) chapter
-provides more about the `map` functions.
-- You will occasionally encounter a case where you need to iterate over items
-in a data frame, but none of the above functions are flexible enough to do
-what you want. In that case, you may consider using [a for
-loop](https://r4ds.had.co.nz/iteration.html#iteration).
-- There are many `select` helpers that can be used to efficiently subset
-columns in a data frame when paired with the `select` function,
-or other functions that also use the `tidyselect` syntax for column selection
-(e.g., `pivot-longer`).
-The [documentation for `select` helpers](https://tidyselect.r-lib.org/reference/select_helpers.html)
-is a useful reference to find the helper you need for your particular problem.
+- As we mentioned earlier, `tidyverse` is actually an *R
+meta package*: it installs and loads a collection of R packages that all
+follow the tidy data philosophy we discussed above. One of the `tidyverse`
+packages is `dplyr`—a data wrangling workhorse. You have already met many
+of `dplyr`'s functions
+(`select`, `filter`, `mutate`, `arrange`, `summarize`, and `group_by`).
+To learn more about these functions and meet a few more useful
+functions, we recommend you check out Chapters 5-9 of the [STAT545 online notes](https://stat545.com/),
+a book on data wrangling, exploration, and analysis with R.
+- The [`dplyr` R package documentation](https://dplyr.tidyverse.org/) [@dplyr] is
+another resource to learn more about the functions in this
+chapter, the full set of arguments you can use, and other related functions.
+The site also provides a very nice cheat sheet that summarizes many of the
+data wrangling functions from this chapter.
+- Check out the [`tidyselect` R package page](https://tidyselect.r-lib.org/reference/select_helpers.html)
+[@tidyselect] for a comprehensive list of `select` helpers.
+These helpers can be used to choose columns in a data frame when paired with the `select` function
+(and other functions that use the `tidyselect` syntax, such as `pivot_longer`).
+The [documentation for `select` helpers](https://tidyselect.r-lib.org/reference/select_helpers.html)
+is a useful reference to find the helper you need for your particular problem.
+- *R for Data Science* [@wickham2016r] has a few chapters related to
+data wrangling that go into more depth than this book. For example, the
+[tidy data chapter](https://r4ds.had.co.nz/tidy-data.html) covers tidy data,
+`pivot_longer`/`pivot_wider` and `separate`, but also covers missing values
+and additional wrangling functions (like `unite`). The [data
+transformation chapter](https://r4ds.had.co.nz/transform.html) covers
+`select`, `filter`, `arrange`, `mutate`, and `summarize`. And the [`map`
+functions chapter](https://r4ds.had.co.nz/iteration.html#the-map-functions)
+provides more about the `map` functions.
+- You will occasionally encounter a case where you need to iterate over items
+in a data frame, but none of the above functions are flexible enough to do
+what you want. In that case, you may consider using [a for
+loop](https://r4ds.had.co.nz/iteration.html#iteration).
