Commit 5dc63b6

fixing figure sizes in classification1
1 parent c192cab commit 5dc63b6

1 file changed: +16 -17 lines changed

classification1.Rmd

Lines changed: 16 additions & 17 deletions
@@ -210,7 +210,7 @@ for light orange and `"steelblue2"` for light blue—and
 We also make the category labels ("B" and "M") more readable by
 changing them to "Benign" and "Malignant" using the `labels` argument.
 
-```{r 05-scatter, fig.height = 4, fig.width = 5, fig.cap= "Scatter plot of concavity versus perimeter colored by diagnosis label."}
+```{r 05-scatter, fig.height = 3.5, fig.width = 4.5, fig.cap= "Scatter plot of concavity versus perimeter colored by diagnosis label."}
 perim_concav <- cancer %>%
 ggplot(aes(x = Perimeter, y = Concavity, color = Class)) +
 geom_point(alpha = 0.6) +
@@ -286,7 +286,7 @@ new observation, with standardized perimeter of `r new_point[1]` and standardize
 diagnosis "Class" is unknown. This new observation is depicted by the red, diamond point in
 Figure \@ref(fig:05-knn-1).
 
-```{r 05-knn-1, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter with new observation represented as a red diamond."}
+```{r 05-knn-1, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter with new observation represented as a red diamond."}
 perim_concav_with_new_point <- bind_rows(cancer,
 tibble(Perimeter = new_point[1],
 Concavity = new_point[2],
@@ -318,7 +318,7 @@ then the perimeter and concavity values are similar, and so we may expect that
 they would have the same diagnosis.
 
 
-```{r 05-knn-2, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter. The new observation is represented as a red diamond with a line to the one nearest neighbor, which has a malignant label."}
+```{r 05-knn-2, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter. The new observation is represented as a red diamond with a line to the one nearest neighbor, which has a malignant label."}
 perim_concav_with_new_point +
 geom_segment(aes(
 x = new_point[1],
@@ -343,7 +343,7 @@ Does this seem like the right prediction to make for this observation? Probably
 not, if you consider the other nearby points...
 
 
-```{r 05-knn-4, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter. The new observation is represented as a red diamond with a line to the one nearest neighbor, which has a benign label."}
+```{r 05-knn-4, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter. The new observation is represented as a red diamond with a line to the one nearest neighbor, which has a benign label."}
 
 perim_concav_with_new_point2 <- bind_rows(cancer,
 tibble(Perimeter = new_point[1],
@@ -383,7 +383,7 @@ see that the diagnoses of 2 of the 3 nearest neighbors to our new observation
 are malignant. Therefore we take majority vote and classify our new red, diamond
 observation as malignant.
 
-```{r 05-knn-5, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter with three nearest neighbors."}
+```{r 05-knn-5, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter with three nearest neighbors."}
 perim_concav_with_new_point2 +
 geom_segment(aes(
 x = new_point[1], y = new_point[2],
@@ -433,7 +433,7 @@ You will see in the `mutate` \index{mutate} step below, we compute the straight-
 distance using the formula above: we square the differences between the two observations' perimeter
 and concavity coordinates, add the squared differences, and then take the square root.
 
-```{r 05-multiknn-1, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter with new observation represented as a red diamond."}
+```{r 05-multiknn-1, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter with new observation represented as a red diamond."}
 perim_concav <- bind_rows(cancer,
 tibble(Perimeter = new_point[1],
 Concavity = new_point[2],
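
Note on the hunk above: its context lines describe the straight-line distance computed in a `mutate` step, but the step itself falls outside the diff. A minimal sketch of that computation follows, assuming a toy data frame (`toy_cancer`) and a made-up `new_point`; both are hypothetical stand-ins rather than values from `classification1.Rmd`.

```r
library(dplyr)

# Hypothetical toy data standing in for the standardized `cancer` data frame
toy_cancer <- data.frame(
  Perimeter = c(0, 3, 1),
  Concavity = c(0, 4, 1),
  Class = c("B", "M", "B")
)

# Hypothetical new observation: c(Perimeter, Concavity)
new_point <- c(0, 0)

# Square the coordinate differences, add them, and take the square root,
# then sort so the nearest observations come first
toy_distances <- toy_cancer |>
  mutate(dist_from_new = sqrt((Perimeter - new_point[1])^2 +
                              (Concavity - new_point[2])^2)) |>
  arrange(dist_from_new)

print(toy_distances)
```

The column name `dist_from_new` is illustrative; any name works as long as the later sorting step refers to the same column.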
@@ -515,7 +515,7 @@ The result of this computation shows that 3 of the 5 nearest neighbors to our ne
 malignant (`M`); since this is the majority, we classify our new observation as malignant.
 These 5 neighbors are circled in Figure \@ref(fig:05-multiknn-3).
 
-```{r 05-multiknn-3, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of concavity versus perimeter with 5 nearest neighbors circled."}
+```{r 05-multiknn-3, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of concavity versus perimeter with 5 nearest neighbors circled."}
 perim_concav + annotate("path",
 x = new_point[1] + 1.4 * cos(seq(0, 2 * pi,
 length.out = 100
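
Note on the hunk above: the context lines report that 3 of the 5 nearest neighbors are malignant, so the majority vote is malignant. The sketch below shows one way such a vote could be tallied with dplyr. The data frame `toy_neighbors`, its distances, and the column name `votes` are hypothetical and chosen only so that the outcome mirrors the 3-of-5 case in the text.

```r
library(dplyr)

# Hypothetical labeled points with precomputed distances to a new observation
toy_neighbors <- data.frame(
  Class = c("M", "B", "M", "B", "M", "B"),
  dist_from_new = c(0.2, 0.3, 0.5, 0.9, 1.1, 2.4)
)

k <- 5

vote <- toy_neighbors |>
  slice_min(dist_from_new, n = k) |>  # keep the k nearest points
  count(Class, name = "votes") |>     # tally the labels among them
  slice_max(votes, n = 1)             # keep the most common label

print(vote)
```

With two classes, an odd `k` rules out a tied vote; `slice_max()` would otherwise return every tied label.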
@@ -999,7 +999,7 @@ ggarrange(unscaled, scaled, ncol = 2, common.legend = TRUE, legend = "bottom")
 
 ```
 
-```{r 05-scaling-plt-zoomed, fig.height = 4, fig.width = 10, echo = FALSE, fig.cap = "Close up of three nearest neighbors for unstandardized data."}
+```{r 05-scaling-plt-zoomed, fig.height = 5, fig.width = 10, echo = FALSE, fig.cap = "Close up of three nearest neighbors for unstandardized data."}
 library(ggforce)
 ggplot(unscaled_cancer, aes(x = Area,
 y = Smoothness,
@@ -1031,11 +1031,10 @@ ggplot(unscaled_cancer, aes(x = Area,
 x = unlist(new_obs[1]), y = unlist(new_obs[2]),
 xend = unlist(neighbors[3, attrs[1]]),
 yend = unlist(neighbors[3, attrs[2]])
-), color = "black") + theme_light() +
-# facet_zoom( xlim = c(399.7, 401.6), ylim = c(0.08, 0.14), zoom.size = 2) +
+), color = "black") +
 facet_zoom(x = ( Area > 380 & Area < 420) ,
 y = (Smoothness > 0.08 & Smoothness < 0.14), zoom.size = 2) +
-theme_bw()
+theme_bw() + theme(legend.position="bottom", text = element_text(size = 16))
 ```
 
 ### Balancing
@@ -1060,14 +1059,14 @@ function, which takes two arguments: a data frame-like object,
 and the number of rows to select from the top (`n`).
 The new imbalanced data is shown in Figure \@ref(fig:05-unbalanced).
 
-```{r 05-unbalanced-seed, echo = FALSE, fig.height = 4, fig.width = 5, warning = FALSE, message = FALSE}
+```{r 05-unbalanced-seed, echo = FALSE, fig.height = 3.5, fig.width = 4.5, warning = FALSE, message = FALSE}
 # hidden seed here for reproducibility
 # randomness shouldn't affect much in this use of step_upsample,
 # but just in case...
 set.seed(3)
 ```
 
-```{r 05-unbalanced, fig.height = 4, fig.width = 5, fig.cap = "Imbalanced data."}
+```{r 05-unbalanced, fig.height = 3.5, fig.width = 4.5, fig.cap = "Imbalanced data."}
 rare_cancer <- bind_rows(
 filter(cancer, Class == "B"),
 cancer |> filter(Class == "M") |> slice_head(n = 3)
@@ -1095,7 +1094,7 @@ benign, and the benign vote will always win. For example, Figure \@ref(fig:05-up
 shows what happens for a new tumor observation that is quite close to three observations
 in the training data that were tagged as malignant.
 
-```{r 05-upsample, echo=FALSE, fig.height = 4, fig.width = 5, fig.cap = "Imbalanced data with 7 nearest neighbors to a new observation highlighted."}
+```{r 05-upsample, echo=FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Imbalanced data with 7 nearest neighbors to a new observation highlighted."}
 new_point <- c(2, 2)
 attrs <- c("Perimeter", "Concavity")
 my_distances <- table_with_distances(rare_cancer[, attrs], new_point)
@@ -1147,7 +1146,7 @@ each area of the plot to the predictions the $K$-nearest neighbor
 classifier would make. We can see that the decision is
 always "benign," corresponding to the blue color.
 
-```{r 05-upsample-2, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Imbalanced data with background color indicating the decision of the classifier and the points represent the labeled data."}
+```{r 05-upsample-2, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Imbalanced data with background color indicating the decision of the classifier and the points represent the labeled data."}
 
 knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 7) |>
 set_engine("kknn") |>
@@ -1225,7 +1224,7 @@ classifier would make. We can see that the decision is more reasonable; when the
 to those labeled malignant, the classifier predicts a malignant tumor, and vice versa when they are
 closer to the benign tumor observations.
 
-```{r 05-upsample-plot, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Upsampled data with background color indicating the decision of the classifier."}
+```{r 05-upsample-plot, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Upsampled data with background color indicating the decision of the classifier."}
 knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 7) |>
 set_engine("kknn") |>
 set_mode("classification")
@@ -1335,7 +1334,7 @@ predict the label of each, and visualize the predictions with a colored scatter
 > textbook. It is included for those readers who would like to use similar
 > visualizations in their own data analyses.
 
-```{r 05-workflow-plot-show, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of smoothness versus area where background color indicates the decision of the classifier."}
+```{r 05-workflow-plot-show, fig.height = 3.5, fig.width = 4.6, fig.cap = "Scatter plot of smoothness versus area where background color indicates the decision of the classifier."}
 # create the grid of area/smoothness vals, and arrange in a data frame
 are_grid <- seq(min(unscaled_cancer$Area),
 max(unscaled_cancer$Area),
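
Note on the diff as a whole: most hunks apply the same `fig.height` and `fig.width` change, one chunk header at a time. One possible alternative, sketched here as an assumption and not something this commit does, is to set chunk-wide defaults once with knitr's `opts_chunk$set()` in a setup chunk.

```r
library(knitr)

# Assumption: this would sit in a setup chunk near the top of the .Rmd file.
# Chunks that omit fig.height / fig.width would then pick up these defaults.
opts_chunk$set(fig.height = 3.5, fig.width = 4.5)

# Check the defaults that are now in effect
print(opts_chunk$get(c("fig.height", "fig.width")))
```

Chunks that need a different size, such as the `fig.width = 10` zoomed plot in this diff, could still set their own values in the chunk header, which takes precedence over the defaults.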
