Commit 35b6a81

remaining additional resource sections
1 parent 37a1abd commit 35b6a81

7 files changed (+113, -20 lines)

classification2.Rmd

Lines changed: 21 additions & 2 deletions
@@ -1394,5 +1394,24 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
-- The [`tidymodels` website](https://tidymodels.org/packages) is an excellent reference for more details on, and advanced usage of, the functions and packages in the past two chapters. Aside from that, it also has a [nice beginner's tutorial](https://www.tidymodels.org/start/) and [an extensive list of more advanced examples](https://www.tidymodels.org/learn/) that you can use to continue learning beyond the scope of this book. It's worth noting that the `tidymodels` package does a lot more than just classification, and so the examples on the website similarly go beyond classification as well. In the next two chapters, you'll learn about another kind of predictive modeling setting, so it might be worth visiting the website only after reading through those chapters.
-- [*An Introduction to Statistical Learning*](https://www.statlearning.com/) [@james2013introduction] provides a great next stop in the process of learning about classification. Chapter 4 discusses additional basic techniques for classification that we do not cover, such as logistic regression, linear discriminant analysis, and naive Bayes. Chapter 5 goes into much more detail about cross-validation. Chapters 8 and 9 cover decision trees and support vector machines, two very popular but more advanced classification methods. Finally, Chapter 6 covers a number of methods for selecting predictor variables. Note that while this book is still a very accessible introductory text, it requires a bit more mathematical background than we require.
+- The [`tidymodels` website](https://tidymodels.org/packages) is an excellent
+  reference for more details on, and advanced usage of, the functions and
+  packages in the past two chapters. Aside from that, it also has a [nice
+  beginner's tutorial](https://www.tidymodels.org/start/) and [an extensive list
+  of more advanced examples](https://www.tidymodels.org/learn/) that you can use
+  to continue learning beyond the scope of this book. It's worth noting that the
+  `tidymodels` package does a lot more than just classification, so the examples
+  on the website go beyond classification as well. In the next two chapters,
+  you'll learn about another kind of predictive modeling setting, so it might be
+  worth visiting the website only after reading through those chapters.
+- *An Introduction to Statistical Learning* [@james2013introduction] provides a
+  great next stop in the process of learning about classification. Chapter 4
+  discusses additional basic techniques for classification that we do not cover,
+  such as logistic regression, linear discriminant analysis, and naive Bayes.
+  Chapter 5 goes into much more detail about cross-validation. Chapters 8 and 9
+  cover decision trees and support vector machines, two very popular but more
+  advanced classification methods. Finally, Chapter 6 covers a number of methods
+  for selecting predictor variables. Note that while this book is still a very
+  accessible introductory text, it assumes a bit more mathematical background
+  than we do.
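The K-nearest neighbors classifier behind the `tidymodels` workflows these bullets refer to can be illustrated in a few lines of base R. The sketch below is a hypothetical illustration, not code from this commit; `knn_predict` and the toy data are invented here.

```r
# Hypothetical base-R sketch of K-nearest neighbors classification.
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  # Euclidean distance from the new point to every training point
  d <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  # take the labels of the k closest points and predict the majority class
  nearest <- train_y[order(d)[seq_len(k)]]
  names(which.max(table(nearest)))
}

# Two well-separated toy classes
train_x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
train_y <- c("a", "a", "a", "b", "b", "b")
knn_predict(train_x, train_y, c(5, 5), k = 3)   # "b"
```

A real analysis would instead use a `tidymodels` model specification as the chapters describe; this sketch only shows the distance-and-vote idea underneath.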

clustering.Rmd

Lines changed: 9 additions & 1 deletion
@@ -1106,4 +1106,12 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
-- Chapter 10 of [*An Introduction to Statistical Learning*](https://www.statlearning.com/) [@james2013introduction] provides a great next stop in the process of learning about clustering and unsupervised learning in general. In the realm of clustering specifically, it provides a great companion introduction to K-means, but also covers *hierarchical* clustering for when you expect there to be subgroups, and then subgroups within subgroups, etc., in your data. In the realm of more general unsupervised learning, it covers *principal components analysis (PCA)*, which is a very popular technique for reducing the number of predictors in a dataset.
+- Chapter 10 of *An Introduction to Statistical Learning* [@james2013introduction]
+  provides a great next stop in the process of learning about clustering and
+  unsupervised learning in general. In the realm of clustering specifically, it
+  provides a great companion introduction to K-means, but also covers
+  *hierarchical* clustering for when you expect there to be subgroups, and then
+  subgroups within subgroups, etc., in your data. In the realm of more general
+  unsupervised learning, it covers *principal components analysis (PCA)*, which
+  is a very popular technique for reducing the number of predictors in a dataset.
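All three techniques this bullet names have entry points in base R's `stats` package. A hedged sketch, with toy data invented here (none of this is code from the commit):

```r
# Toy data: two well-separated groups, 20 points each (invented for illustration)
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))

km <- kmeans(x, centers = 2, nstart = 10)  # K-means, as covered in this chapter
hc <- cutree(hclust(dist(x)), k = 2)       # hierarchical clustering (ISL Ch. 10)
pc <- prcomp(x, scale. = TRUE)             # principal components analysis

length(unique(km$cluster))   # 2 clusters found
ncol(pc$x)                   # 2 principal components for 2 input columns
```

`hclust` builds the full dendrogram and `cutree` slices it at a chosen number of groups, which is what makes it suitable for subgroups-within-subgroups structure.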

inference.Rmd

Lines changed: 17 additions & 2 deletions
@@ -1183,5 +1183,20 @@ found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
 
-- Chapters 7 to 10 of [*Modern Dive*](https://moderndive.com/) provide a great next step in learning about inference. In particular, Chapters 7 and 8 cover sampling and bootstrapping using `tidyverse` and `infer` in a slightly more in-depth manner than the present chapter. Chapters 9 and 10 take the next step beyond the scope of this chapter and begin to provide some of the initial mathematical underpinnings of inference and more advanced applications of the concept of inference in testing hypotheses and performing regression. This material offers a great starting point for getting more into the technical side of statistics.
-- Chapters 4 to 7 of [*OpenIntro Statistics*](https://www.openintro.org/) provide a good next step after *Modern Dive*. Although it is still certainly an introductory text, things get a bit more mathematical here. Depending on your background, you may actually want to start going through Chapters 1 to 3 first, where you will learn some fundamental concepts in probability theory. Although it may seem like a diversion, probability theory is *the language of statistics*; if you have a solid grasp of probability, more advanced statistics will come naturally to you!
+- Chapters 7 to 10 of *Modern Dive* [@moderndive] provide a great next step in
+  learning about inference. In particular, Chapters 7 and 8 cover sampling and
+  bootstrapping using `tidyverse` and `infer` in a slightly more in-depth manner
+  than the present chapter. Chapters 9 and 10 take the next step beyond the
+  scope of this chapter and begin to provide some of the initial mathematical
+  underpinnings of inference and more advanced applications of inference in
+  testing hypotheses and performing regression. This material offers a great
+  starting point for getting more into the technical side of statistics.
+- Chapters 4 to 7 of *OpenIntro Statistics* [@openintro] provide a good next
+  step after *Modern Dive*. Although it is still certainly an introductory
+  text, things get a bit more mathematical here. Depending on your background,
+  you may actually want to start going through Chapters 1 to 3 first, where you
+  will learn some fundamental concepts in probability theory. Although it may
+  seem like a diversion, probability theory is *the language of statistics*; if
+  you have a solid grasp of probability, more advanced statistics will come
+  naturally to you!
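The bootstrapping that Chapters 7 and 8 of *Modern Dive* cover with `infer` boils down to resampling with replacement. A base-R sketch of the idea (the sample data here are invented, not from the commit):

```r
set.seed(123)
observed <- rnorm(40, mean = 10, sd = 2)   # stand-in for a real observed sample

# 1000 bootstrap resamples, each the same size as the original sample and
# drawn with replacement; record the mean of each resample
boot_means <- replicate(1000, mean(sample(observed, replace = TRUE)))

# percentile-method 95% confidence interval for the population mean
quantile(boot_means, c(0.025, 0.975))
```

The `infer` pipeline (`specify()`, `generate(type = "bootstrap")`, `calculate()`) wraps these same steps in a tidier grammar.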

jupyter.Rmd

Lines changed: 8 additions & 7 deletions
@@ -419,10 +419,11 @@ files using lower case characters and separating words by a dash (`-`) or an
 underscore (`_`).
 
 ## Additional resources
-- The [JupyterLab Documentation](https://jupyterlab.readthedocs.io/en/latest/) is a good
-next place to look for more information about working in Jupyter notebooks. This documentation
-goes into significantly more detail about all of the topics we covered in this chapter, and
-covers more advanced topics as well.
-- If you are keen to learn about the Markdown language for rich text formatting, two good places to
-start are [this Markdown cheatsheet](https://commonmark.org/help/)
-and [Markdown tutorial](https://commonmark.org/help/tutorial/), both provided by CommonMark.
+- The [JupyterLab Documentation](https://jupyterlab.readthedocs.io/en/latest/)
+  is a good next place to look for more information about working in Jupyter
+  notebooks. This documentation goes into significantly more detail about all of
+  the topics we covered in this chapter, and covers more advanced topics as well.
+- If you are keen to learn about the Markdown language for rich text
+  formatting, two good places to start are CommonMark's [Markdown
+  cheatsheet](https://commonmark.org/help/) and [Markdown
+  tutorial](https://commonmark.org/help/tutorial/).
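For readers who have not used Markdown before, the rich text formatting the bullet refers to looks like the following generic sample (not drawn from the commit):

```markdown
# A level-one heading

Plain text with *emphasis*, **strong emphasis**, and `inline code`.

- a bulleted list item
- [a link](https://commonmark.org)

1. a numbered list item
```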

references.bib

Lines changed: 14 additions & 0 deletions
@@ -349,6 +349,13 @@ @misc{stanfordhealthcare
   url = {https://stanfordhealthcare.org/medical-conditions/cancer/cancer.html}
 }
 
+@book{moderndive,
+  title = {Statistical Inference via Data Science: A {M}odern{D}ive into {R} and the {T}idyverse},
+  author = {Chester Ismay and Albert Kim},
+  year = {2020},
+  publisher = {Chapman and Hall/CRC Press},
+  url = {https://moderndive.com/}}
+
 @book{wickham2016r,
   title = {R for Data Science: Import, Tidy, Transform, Visualize, and Model Data},
   author = {Wickham, Hadley and Grolemund, Garrett},
@@ -415,3 +422,10 @@ @article{lubridatepaper
   volume = {40},
   number = {3},
   pages = {1--25}}
+
+@book{openintro,
+  title = {OpenIntro Statistics},
+  author = {David Diez and Mine \c{C}etinkaya-Rundel and Christopher Barr},
+  year = {2019},
+  publisher = {OpenIntro, Inc.},
+  url = {https://openintro.org/book/os/}}

regression2.Rmd

Lines changed: 25 additions & 3 deletions
@@ -902,6 +902,28 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
-- The [`tidymodels` website](https://tidymodels.org/packages) is an excellent reference for more details on, and advanced usage of, the functions and packages in the past two chapters. Aside from that, it also has a [nice beginner's tutorial](https://www.tidymodels.org/start/) and [an extensive list of more advanced examples](https://www.tidymodels.org/learn/) that you can use to continue learning beyond the scope of this book.
-- [*Modern Dive*](https://moderndive.com/) is another textbook that uses the `tidyverse` / `tidymodels` framework. Chapter 6 complements the material in the current chapter well; it covers some slightly more advanced concepts than we do without getting mathematical. Give this chapter a read before moving on to the next reference. It is also worth noting that this book takes a more "explanatory" / "inferential" approach to regression in general (in Chapters 5, 6, and 10), which provides a nice complement to the predictive tack we take in the present book.
-- [*An Introduction to Statistical Learning*](https://www.statlearning.com/) [@james2013introduction] provides a great next stop in the process of learning about regression. Chapter 3 covers linear regression at a slightly more mathematical level than we do here, but it is not too large a leap and so should provide a good stepping stone. Chapter 6 discusses how to pick a subset of "informative" predictors when you have a data set with many predictors, and you expect only a few of them to be relevant. Chapter 7 covers regression models that are more flexible than linear regression models but still enjoy the computational efficiency of linear regression. In contrast, the KNN methods we covered earlier are indeed more flexible but become very slow when given lots of data.
+- The [`tidymodels` website](https://tidymodels.org/packages) is an excellent
+  reference for more details on, and advanced usage of, the functions and
+  packages in the past two chapters. Aside from that, it also has a [nice
+  beginner's tutorial](https://www.tidymodels.org/start/) and [an extensive list
+  of more advanced examples](https://www.tidymodels.org/learn/) that you can use
+  to continue learning beyond the scope of this book.
+- *Modern Dive* [@moderndive] is another textbook that uses the
+  `tidyverse` / `tidymodels` framework. Chapter 6 complements the material in
+  the current chapter well; it covers some slightly more advanced concepts than
+  we do without getting mathematical. Give this chapter a read before moving on
+  to the next reference. It is also worth noting that this book takes a more
+  "explanatory" / "inferential" approach to regression in general (in Chapters 5,
+  6, and 10), which provides a nice complement to the predictive tack we take in
+  the present book.
+- *An Introduction to Statistical Learning* [@james2013introduction] provides a
+  great next stop in the process of learning about regression. Chapter 3 covers
+  linear regression at a slightly more mathematical level than we do here, but
+  it is not too large a leap and so should provide a good stepping stone.
+  Chapter 6 discusses how to pick a subset of "informative" predictors when you
+  have a data set with many predictors and you expect only a few of them to be
+  relevant. Chapter 7 covers regression models that are more flexible than
+  linear regression but still enjoy its computational efficiency. In contrast,
+  the KNN methods we covered earlier are indeed more flexible but become very
+  slow when given lots of data.
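The linear regression that ISL's Chapter 3 covers (and that this chapter fits via `tidymodels`) is available directly in base R as `lm()`. A small sketch with exactly linear data invented here, so the fitted coefficients are known in advance:

```r
# Invented data lying exactly on the line y = 2x + 1
d <- data.frame(x = 1:10, y = 2 * (1:10) + 1)

fit <- lm(y ~ x, data = d)                    # ordinary least squares
coef(fit)                                     # intercept 1, slope 2
predict(fit, newdata = data.frame(x = 11))    # 23
```

The `tidymodels` equivalent wraps the same engine in a `linear_reg()` model specification, which is what makes swapping in the more flexible models ISL's Chapter 7 describes straightforward.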

version-control.Rmd

Lines changed: 19 additions & 5 deletions
@@ -940,8 +940,22 @@ found in Chapter \@ref(move-to-your-own-machine).
 Now that you've picked up the basics of version control with Git and GitHub,
 you can expand your knowledge through the resources listed below:
 
-- GitHub's [guides website](https://guides.github.com/) and [YouTube channel](https://www.youtube.com/githubguides), and [*Happy Git with R*](https://happygitwithr.com/) are great resources to take the next steps in learning about Git and GitHub.
-- [Good enough practices in scientific computing](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510#sec014) [-@wilson2014best] provides more advice on useful workflows and "good enough" practices in data analysis projects.
-- In addition to [GitHub](https://github.com), there are other popular Git repository hosting services such as [GitLab](https://gitlab.com) and [BitBucket](https://bitbucket.org). Comparing all of these options is beyond the scope of this book, and until you become a more advanced user, you are perfectly fine to just stick with GitHub. Just be aware that you have options!
-- [GitHub's documentation on creating a personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) and the *Happy Git with R* [cache credentials for HTTPS](https://happygitwithr.com/credential-caching.html) chapter are both excellent additional resources to consult if you need additional help generating and using personal access tokens.
+- GitHub's [guides website](https://guides.github.com/) and [YouTube
+  channel](https://www.youtube.com/githubguides), and [*Happy Git and GitHub
+  for the useR*](https://happygitwithr.com/) are great resources to take the
+  next steps in learning about Git and GitHub.
+- [Good enough practices in scientific
+  computing](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510#sec014)
+  [@wilson2014best] provides more advice on useful workflows and "good enough"
+  practices in data analysis projects.
+- In addition to [GitHub](https://github.com), there are other popular Git
+  repository hosting services such as [GitLab](https://gitlab.com) and
+  [BitBucket](https://bitbucket.org). Comparing all of these options is beyond
+  the scope of this book, and until you become a more advanced user, you are
+  perfectly fine to just stick with GitHub. Just be aware that you have options!
+- GitHub's [documentation on creating a personal access
+  token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token)
+  and the *Happy Git and GitHub for the useR* [personal access tokens
+  chapter](https://happygitwithr.com/https-pat.html) are both excellent
+  resources to consult if you need more help generating and using personal
+  access tokens.
