-Then, when we create our data analysis workflow, we use the `fit_resamples` function \index{crossvalidation!fit\_resamples}\index{tidymodels!fit\_resamples}
+Then, when we create our data analysis workflow, we use the `fit_resamples` function \index{cross-validation!fit\_resamples}\index{tidymodels!fit\_resamples}
 instead of the `fit` function for training. This runs cross-validation on each
 train/validation split.
@@ -679,7 +679,7 @@ knn_fit <- workflow() |>
 knn_fit
 ```

-The `collect_metrics` \index{tidymodels!collect\_metrics}\index{crossvalidation!collect\_metrics} function is used to aggregate the *mean* and *standard error*
+The `collect_metrics` \index{tidymodels!collect\_metrics}\index{cross-validation!collect\_metrics} function is used to aggregate the *mean* and *standard error*
 of the classifier's validation accuracy across the folds. You will find results
 related to the accuracy in the row with `accuracy` listed under the `.metric` column.
 You should consider the mean (`mean`) to be the estimated accuracy, while the standard
clustering.Rmd: 5 additions & 5 deletions
@@ -107,7 +107,7 @@ collected by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail
 the [Palmer Station, Antarctica Long Term Ecological Research Site](https://pal.lternet.edu/), and includes
 measurements for adult penguins found near there [@palmerpenguins]. We have
 modified the data set for use in this chapter. Here we will focus on using two
-variables---penguin bill and flipper length, both in millimeters---to determine whether
+variables—penguin bill and flipper length, both in millimeters—to determine whether
 there are distinct types of penguins in our data.
 Understanding this might help us with species discovery and classification in a data-driven
 way.
@@ -332,7 +332,7 @@ base <- base +
 base
 ```

-The larger the value of $S^2$, the more spread-out the cluster is, since large $S^2$ means that points are far from the cluster center.
+The larger the value of $S^2$, the more spread out the cluster is, since large $S^2$ means that points are far from the cluster center.
 Note, however, that "large" is relative to *both* the scale of the variables for clustering *and* the number of points in the cluster. A cluster where points are very close to the center might still have a large $S^2$ if there are many data points in the cluster.

 After we have calculated the WSSD for all the clusters,
@@ -591,7 +591,7 @@ These, however, are beyond the scope of this book.

 ### Random restarts

-Unlike the classification and regression models we studied in previous chapters, K-means \index{K-means!restart,nstart} can get "stuck" in a bad solution.
+Unlike the classification and regression models we studied in previous chapters, K-means \index{K-means!restart,nstart} can get "stuck" in a bad solution.
 For example, Figure \@ref(fig:10-toy-kmeans-bad-init) illustrates an unlucky random initialization by K-means.
@@ -789,7 +789,7 @@ mean of the sample is \$`r round(estimates$sample_mean, 2)`.
 Remember, in practice, we usually only have this one sample from the population. So
 this sample and estimate are the only data we can work with.

-We now perform steps (1) - (5) listed above to generate a single bootstrap
+We now perform steps 1–5 listed above to generate a single bootstrap
 sample in R and calculate a point estimate from that bootstrap sample. We will
 use the `rep_sample_n` function as we did when we were
 creating our sampling distribution. But critically, note that we now
@@ -1173,4 +1173,4 @@ found in Chapter \@ref(move-to-your-own-machine).
 ## Additional resources

 - Chapters 7 to 10 of [*Modern Dive*](https://moderndive.com/) provide a great next step in learning about inference. In particular, Chapters 7 and 8 cover sampling and bootstrapping using `tidyverse` and `infer` in a slightly more in-depth manner than the present chapter. Chapters 9 and 10 take the next step beyond the scope of this chapter and begin to provide some of the initial mathematical underpinnings of inference and more advanced applications of the concept of inference in testing hypotheses and performing regression. This material offers a great starting point for getting more into the technical side of statistics.
-- Chapters 4 to 7 of [*OpenIntro Statistics - Fourth Edition*](https://www.openintro.org/) provide a good next step after *Modern Dive*. Although it is still certainly an introductory text, things get a bit more mathematical here. Depending on your background, you may actually want to start going through Chapters 1 to 3 first, where you will learn some fundamental concepts in probability theory. Although it may seem like a diversion, probability theory is *the language of statistics*; if you have a solid grasp of probability, more advanced statistics will come naturally to you!
+- Chapters 4 to 7 of [*OpenIntro Statistics*](https://www.openintro.org/) provide a good next step after *Modern Dive*. Although it is still certainly an introductory text, things get a bit more mathematical here. Depending on your background, you may actually want to start going through Chapters 1 to 3 first, where you will learn some fundamental concepts in probability theory. Although it may seem like a diversion, probability theory is *the language of statistics*; if you have a solid grasp of probability, more advanced statistics will come naturally to you!
jupyter.Rmd: 5 additions & 5 deletions
@@ -144,7 +144,7 @@ that indicates the status of your kernel. If the circle is empty (`r fa("circle"
 the kernel is idle and ready to execute code. If the circle is filled in (`r fa("circle", fill = "black", stroke = "black", stroke_width = "10px", height = "12px")`),
 the kernel is busy running some code.

-You may run into problems where your kernel \index{kernel!interrupt,restart} is stuck for an excessive amount
+You may run into problems where your kernel \index{kernel!interrupt,restart} is stuck for an excessive amount
 of time, your notebook is very slow and unresponsive, or your kernel loses its
 connection. If this happens, try the following steps:
@@ -245,8 +245,8 @@ referenced in another distinct code cell (Figure \@ref(fig:out-of-order-1)).
 Together, this means that you could then write a code cell further above in the
 notebook that references `y` and execute it without error in the current session
 (Figure \@ref(fig:out-of-order-2)). This could also be done successfully in
-future sessions if, and only if, you run the cells in the same non-conventional
-order. However, it is difficult to remember this non-conventional order, and it
+future sessions if, and only if, you run the cells in the same unconventional
+order. However, it is difficult to remember this unconventional order, and it
 is not the order that others would expect your code to be executed in. Thus, in
 the future, this would lead
 to errors when the notebook is run in the conventional
@@ -287,7 +287,7 @@ is an issue. Knowing this sooner rather than later will allow you to
 fix the issue and ensure your notebook can be run linearly from start to finish.

 We recommend as a best practice to run the entire notebook in a fresh R session
-at least 2-3 times within any period of work. Note that,
+at least 2–3 times within any period of work. Note that,
 critically, you *must do this in a fresh R session* by restarting your kernel.
 We recommend using either the **Kernel** >>
 **Restart Kernel and Run All Cells...** command from the menu or the `r fa("fast-forward", height = "11px")`
@@ -328,7 +328,7 @@ their computer to run the analysis successfully.
 1. Write code so that it can be executed in a linear order.

 2. As you write code in a Jupyter notebook, run the notebook in a linear order
-and in its entirety often (2-3 times every work session) via the **Kernel** >>
+and in its entirety often (2–3 times every work session) via the **Kernel** >>
 **Restart Kernel and Run All Cells...** command from the Jupyter menu or the `r fa("fast-forward", height = "11px")`