classification1.Rmd: 16 additions & 17 deletions
@@ -210,7 +210,7 @@ for light orange and `"steelblue2"` for light blue—and
 We also make the category labels ("B" and "M") more readable by
 changing them to "Benign" and "Malignant" using the `labels` argument.

-```{r 05-scatter, fig.height = 4, fig.width = 5, fig.cap= "Scatter plot of concavity versus perimeter colored by diagnosis label."}
+```{r 05-scatter, fig.height = 3.5, fig.width = 4.5, fig.cap= "Scatter plot of concavity versus perimeter colored by diagnosis label."}
 perim_concav <- cancer %>%
 ggplot(aes(x = Perimeter, y = Concavity, color = Class)) +
 geom_point(alpha = 0.6) +
@@ -286,7 +286,7 @@ new observation, with standardized perimeter of `r new_point[1]` and standardize
 diagnosis "Class" is unknown. This new observation is depicted by the red, diamond point in
 Figure \@ref(fig:05-knn-1).

-```{r 05-knn-1, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter with new observation represented as a red diamond."}
+```{r 05-knn-1, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter with new observation represented as a red diamond."}
 perim_concav_with_new_point <- bind_rows(cancer,
 tibble(Perimeter = new_point[1],
 Concavity = new_point[2],
@@ -318,7 +318,7 @@ then the perimeter and concavity values are similar, and so we may expect that
 they would have the same diagnosis.


-```{r 05-knn-2, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter. The new observation is represented as a red diamond with a line to the one nearest neighbor, which has a malignant label."}
+```{r 05-knn-2, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter. The new observation is represented as a red diamond with a line to the one nearest neighbor, which has a malignant label."}
 perim_concav_with_new_point +
 geom_segment(aes(
 x = new_point[1],
@@ -343,7 +343,7 @@ Does this seem like the right prediction to make for this observation? Probably
 not, if you consider the other nearby points...


-```{r 05-knn-4, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter. The new observation is represented as a red diamond with a line to the one nearest neighbor, which has a benign label."}
+```{r 05-knn-4, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter. The new observation is represented as a red diamond with a line to the one nearest neighbor, which has a benign label."}

 perim_concav_with_new_point2 <- bind_rows(cancer,
 tibble(Perimeter = new_point[1],
@@ -383,7 +383,7 @@ see that the diagnoses of 2 of the 3 nearest neighbors to our new observation
 are malignant. Therefore we take majority vote and classify our new red, diamond
 observation as malignant.

-```{r 05-knn-5, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter with three nearest neighbors."}
+```{r 05-knn-5, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter with three nearest neighbors."}
 perim_concav_with_new_point2 +
 geom_segment(aes(
 x = new_point[1], y = new_point[2],
@@ -433,7 +433,7 @@ You will see in the `mutate` \index{mutate} step below, we compute the straight-
 distance using the formula above: we square the differences between the two observations' perimeter
 and concavity coordinates, add the squared differences, and then take the square root.
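For reference, the straight-line distance described in this context line can be written as below. This is only a sketch: the symbols $x$ (standardized perimeter) and $y$ (standardized concavity), and the subscripts for the new observation and the $i$th training observation, are notation assumed here, since the formula itself falls outside this hunk.

$$\mathrm{Distance}_i = \sqrt{(x_{\text{new}} - x_i)^2 + (y_{\text{new}} - y_i)^2}$$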

-```{r 05-multiknn-1, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter with new observation represented as a red diamond."}
+```{r 05-multiknn-1, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter with new observation represented as a red diamond."}
 perim_concav <- bind_rows(cancer,
 tibble(Perimeter = new_point[1],
 Concavity = new_point[2],
@@ -515,7 +515,7 @@ The result of this computation shows that 3 of the 5 nearest neighbors to our ne
 malignant (`M`); since this is the majority, we classify our new observation as malignant.
 These 5 neighbors are circled in Figure \@ref(fig:05-multiknn-3).

-```{r 05-multiknn-3, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter with 5 nearest neighbors circled."}
+```{r 05-multiknn-3, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter with 5 nearest neighbors circled."}
 cancer |> filter(Class == "M") |> slice_head(n = 3)
@@ -1095,7 +1094,7 @@ benign, and the benign vote will always win. For example, Figure \@ref(fig:05-up
 shows what happens for a new tumor observation that is quite close to three observations
 in the training data that were tagged as malignant.

-```{r 05-upsample, echo=FALSE, fig.height = 4, fig.width = 5, fig.cap = "Imbalanced data with 7 nearest neighbors to a new observation highlighted."}
+```{r 05-upsample, echo=FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Imbalanced data with 7 nearest neighbors to a new observation highlighted."}
@@ -1147,7 +1146,7 @@ each area of the plot to the predictions the $K$-nearest neighbor
 classifier would make. We can see that the decision is
 always "benign," corresponding to the blue color.

-```{r 05-upsample-2, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Imbalanced data with background color indicating the decision of the classifier and the points represent the labeled data."}
+```{r 05-upsample-2, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Imbalanced data with background color indicating the decision of the classifier and the points represent the labeled data."}
@@ -1225,7 +1224,7 @@ classifier would make. We can see that the decision is more reasonable; when the
 to those labeled malignant, the classifier predicts a malignant tumor, and vice versa when they are
 closer to the benign tumor observations.

-```{r 05-upsample-plot, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Upsampled data with background color indicating the decision of the classifier."}
+```{r 05-upsample-plot, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Upsampled data with background color indicating the decision of the classifier."}
@@ -1335,7 +1334,7 @@ predict the label of each, and visualize the predictions with a colored scatter
 > textbook. It is included for those readers who would like to use similar
 > visualizations in their own data analyses.

-```{r 05-workflow-plot-show, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of smoothness versus area where background color indicates the decision of the classifier."}
+```{r 05-workflow-plot-show, fig.height = 3.5, fig.width = 4.6, fig.cap = "Scatter plot of smoothness versus area where background color indicates the decision of the classifier."}
 # create the grid of area/smoothness vals, and arrange in a data frame