````diff
@@ -173,16 +173,17 @@ column of a data frame into a vector.
 
 ```{r 05-levels}
 cancer |>
-  select(Class) |>
-  pull() |> # turns a data frame into a vector
+  pull(Class) |> # turns a data frame into a vector
   levels()
 ```
 
 ### Exploring the cancer data
 
 Before we start doing any modelling, let's explore our data set. Below we use
-the `group_by` and `summarize` functions we used before to see that we have
-357 (63\%) benign and 212 (37\%) malignant tumour observations.
+the `group_by`, `summarize` and `n` functions to find the number and percentage
+of benign and malignant tumour observations in our data set. The `n` function within
+the `summarize` function counts the number of observations in each `Class` group.
+We have 357 (63\%) benign and 212 (37\%) malignant tumour observations.
 
 ```{r 05-tally}
 num_obs <- nrow(cancer)
````
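The counting step that the new text describes can be sketched on a small data frame. The name `toy_cancer` and its five rows are hypothetical and are not the book's `cancer` data. The sketch assumes the dplyr package and R 4.1 or later for the native `|>` pipe. It also shows that `pull(Class)` on its own returns the factor whose levels the `05-levels` chunk lists.

```r
library(dplyr)

# Hypothetical toy data standing in for the `cancer` data frame:
# three benign (B) and two malignant (M) observations.
toy_cancer <- tibble(Class = factor(c("B", "B", "M", "B", "M")))

# pull(Class) extracts the Class column as a factor vector,
# so no separate select() step is needed before levels().
toy_levels <- toy_cancer |>
  pull(Class) |>
  levels()

# Inside summarize(), n() counts the rows in the current Class group.
num_obs <- nrow(toy_cancer)
class_summary <- toy_cancer |>
  group_by(Class) |>
  summarize(
    count = n(),
    percentage = 100 * n() / num_obs
  )

print(toy_levels)
print(class_summary)
```

Here `class_summary` has one row per class: `B` with a count of 3 (60%) and `M` with a count of 2 (40%).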
````diff
@@ -277,7 +278,7 @@ Figure \@ref(fig:05-knn-1).
 
 ```{r 05-knn-1, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter with new observation labelled in red"}
 perim_concav +
-  geom_point(aes(x = new_point[1], y = new_point[2]), color = "red", size = 2.5)
+  geom_point(aes(x = new_point[1], y = new_point[2]), color = "red", size = 2.5, pch = 17)
 ```
 </center>
 
````
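The `pch = 17` change draws the new observation as a filled triangle, so it stands apart from the class points by shape as well as by colour. ggplot2 accepts `pch` as an alias for `shape`. One alternative for adding a single fixed point is ggplot2's `annotate()`, which draws it exactly once. Mapping constants through `aes()` in a layer that inherits the plot's data may instead draw one identical point per data row. The data frame `toy`, its values, and `new_point` below are hypothetical stand-ins for the chapter's `perim_concav` plot.

```r
library(ggplot2)

# Hypothetical toy data standing in for the cancer observations.
toy <- data.frame(
  Perimeter = c(-1.0, -0.5, 0.8, 1.5),
  Concavity = c(-0.8, 0.2, 1.0, 2.1),
  Class     = c("Benign", "Benign", "Malignant", "Malignant")
)
new_point <- c(1.2, 1.6)  # hypothetical new observation

p <- ggplot(toy, aes(x = Perimeter, y = Concavity, color = Class)) +
  geom_point(alpha = 0.6) +
  # annotate() adds exactly one point; shape 17 (same as pch = 17)
  # is a filled triangle, so the new observation differs by shape
  # as well as by its red colour.
  annotate("point", x = new_point[1], y = new_point[2],
           color = "red", size = 2.5, shape = 17)

print(p)
```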
````diff
@@ -291,7 +292,8 @@ they would have the same diagnosis.
 ```{r 05-knn-2, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter, with malignant nearest neighbour to a new observation highlighted"}
 perim_concav + geom_point(aes(x = new_point[1], y = new_point[2]),
   color = "red",
-  size = 2.5
+  size = 2.5,
+  pch = 17
 ) +
   geom_segment(aes(
     x = new_point[1],
````
````diff
@@ -320,7 +322,8 @@ not, if you consider the other nearby points...
 ```{r 05-knn-4, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter, with benign nearest neighbour to a new observation highlighted"}
 perim_concav + geom_point(aes(x = new_point[1], y = new_point[2]),
   color = "red",
-  size = 2.5
+  size = 2.5,
+  pch = 17
 ) +
   geom_segment(aes(
     x = new_point[1],
````
````diff
@@ -344,7 +347,8 @@ observation as malignant.
 ```{r 05-knn-5, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter with three nearest neighbours"}
 perim_concav + geom_point(aes(x = new_point[1], y = new_point[2]),
````
````diff
-  geom_point(aes(x = new_point[1], y = new_point[2]), color = "red", size = 2.5)
+  geom_point(aes(x = new_point[1], y = new_point[2]), color = "red", size = 2.5, pch = 17)
 perim_concav
 ```
 </center>
````
````diff
@@ -445,7 +449,7 @@ math_table <- math_table %>%
 ```
 
 ```{r 05-multiknn-mathtable, echo = FALSE}
-kable(math_table, booktabs = TRUE, caption = "Evaluating the distances from the new observation to each of its 5 nearest neighbours", escape = FALSE)
+knitr::kable(math_table, booktabs = TRUE, caption = "Evaluating the distances from the new observation to each of its 5 nearest neighbours", escape = FALSE)
 ```
 
 The result of this computation shows that 3 of the 5 nearest neighbours to our new observation are
````
````diff
@@ -489,7 +493,7 @@ the data look like when we visualize them as a 3-dimensional scatter.
 In this case, the formula above is just the straight line distance in this 3-dimensional space.
 
 
-```{r 05-more, echo = FALSE, fig.cap = "3D scatter plot of the symmetry, concavity, and perimeter variables."}
+```{r 05-more, echo = FALSE, message = FALSE, fig.cap = "3D scatter plot of the symmetry, concavity, and perimeter variables."}
 library(plotly)
 cancer |>
   plot_ly(
````
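The straight-line distance mentioned in this hunk is the square root of the sum of squared coordinate differences, and it extends directly to three predictors. The sketch below computes it in base R for a hypothetical new observation and five made-up training points. It then takes a majority vote among the K = 3 nearest points, as in K-nearest neighbours classification. The names `euclidean_distance`, `train` and `new_obs` and all numbers are illustrative assumptions, not the book's code.

```r
# Straight-line (Euclidean) distance between two points with the same
# number of coordinates, e.g. (perimeter, concavity, symmetry).
euclidean_distance <- function(a, b) {
  sqrt(sum((a - b)^2))
}

# Hypothetical new observation and training points (made-up values).
new_obs <- c(0, 0, 0)
train <- data.frame(
  perimeter = c(1, 2, 0, 1, 2),
  concavity = c(2, 3, 3, 4, 4),
  symmetry  = c(2, 6, 4, 8, 4),
  Class     = c("M", "B", "M", "B", "B")
)

# Distance from the new observation to every training row.
features <- as.matrix(train[, c("perimeter", "concavity", "symmetry")])
train$distance <- apply(features, 1, euclidean_distance, b = new_obs)

# Keep the K nearest rows and predict the most common class among them.
k <- 3
nearest <- train[order(train$distance)[1:k], ]
votes <- table(nearest$Class)
prediction <- names(votes)[which.max(votes)]

print(nearest)
print(prediction)
```

With these values the distances are 3, 7, 5, 9 and 6. The three nearest neighbours are therefore two `M` points and one `B` point, and `prediction` is `"M"`.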
````diff
@@ -539,7 +543,7 @@ in this collection will help keep our code simple, readable and accurate; the
 less we have to code ourselves, the fewer mistakes we are likely to make. We
````