classification2.Rmd
9 additions & 8 deletions
@@ -188,7 +188,7 @@ tumor cell concavity versus smoothness colored by diagnosis in Figure \@ref(fig:
 You will also notice that we set the random seed here at the beginning of the analysis
 using the `set.seed` function, as described in Section \@ref(randomseeds).
 
-```{r 06-precode, fig.height = 4, fig.width = 5, fig.cap="Scatter plot of tumor cell concavity versus smoothness colored by diagnosis label.", message = F, warning = F}
+```{r 06-precode, fig.height = 3.5, fig.width = 4.5, fig.cap="Scatter plot of tumor cell concavity versus smoothness colored by diagnosis label.", message = F, warning = F}
 # load packages
 library(tidyverse)
 library(tidymodels)
@@ -754,7 +754,7 @@ We can select the best value of the number of neighbors (i.e., the one that resu
 in the highest classifier accuracy estimate) by plotting the accuracy versus $K$
 in Figure \@ref(fig:06-find-k).
 
-```{r 06-find-k, fig.height = 4, fig.width = 5, fig.cap= "Plot of estimated accuracy versus the number of neighbors."}
+```{r 06-find-k, fig.height = 3.5, fig.width = 4, fig.cap= "Plot of estimated accuracy versus the number of neighbors."}
 accuracy_vs_k <- ggplot(accuracies, aes(x = neighbors, y = mean)) +
 geom_point() +
 geom_line() +
@@ -800,7 +800,7 @@ we vary $K$ from 1 to almost the number of observations in the data set.
 set.seed(1)
 ```
 
-```{r 06-lots-of-ks, message = FALSE, fig.height = 4, fig.width = 5, fig.cap="Plot of accuracy estimate versus number of neighbors for many K values."}
+```{r 06-lots-of-ks, message = FALSE, fig.height = 3.5, fig.width = 4, fig.cap="Plot of accuracy estimate versus number of neighbors for many K values."}
 k_lots <- tibble(neighbors = seq(from = 1, to = 385, by = 10))
-```{r 06-fixed-irrelevant-features, echo = FALSE, warning = FALSE, fig.retina = 2, out.width = "100%", fig.cap = "Accuracy versus number of irrelevant predictors for tuned and untuned number of neighbors."}
+```{r 06-fixed-irrelevant-features, echo = FALSE, warning = FALSE, fig.retina = 2, out.width = "75%", fig.cap = "Accuracy versus number of irrelevant predictors for tuned and untuned number of neighbors."}
 res_tmp <- res %>% pivot_longer(cols=c("accs", "fixedaccs"),
 names_to="Type",
 values_to="accuracy")
@@ -1338,7 +1339,7 @@ where the elbow occurs, and whether adding a variable provides a meaningful incr
 > part of tuning your classifier, you *cannot use your test data* for this
 > process!
 
-```{r 06-fwdsel-3, echo = FALSE, warning = FALSE, fig.retina = 2, out.width = "100%", fig.cap = "Estimated accuracy versus the number of predictors for the sequence of models built using forward selection."}
+```{r 06-fwdsel-3, echo = FALSE, warning = FALSE, fig.retina = 2, out.width = "75%", fig.cap = "Estimated accuracy versus the number of predictors for the sequence of models built using forward selection."}
regression1.Rmd
12 additions & 10 deletions
@@ -125,7 +125,7 @@ want to predict (sale price) on the y-axis.
 > (from the `scales` package)
 > to the `labels` argument of the `scale_y_continuous` function.
 
-```{r 07-edaRegr, message = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of price (USD) versus house size (square feet)."}
+```{r 07-edaRegr, message = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Scatter plot of price (USD) versus house size (square feet)."}
 eda <- ggplot(sacramento, aes(x = sqft, y = price)) +
 geom_point(alpha = 0.4) +
 xlab("House size (square feet)") +
@@ -179,7 +179,7 @@ you can see that we have no
 observations of a house of size *exactly* 2,000 square feet. How can we predict
 the sale price?
 
-```{r 07-small-eda-regr, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of price (USD) versus house size (square feet) with vertical line indicating 2,000 square feet on x-axis."}
+```{r 07-small-eda-regr, fig.height = 3.5, fig.width = 4.5, fig.cap = "Scatter plot of price (USD) versus house size (square feet) with vertical line indicating 2,000 square feet on x-axis."}
 small_plot <- ggplot(small_sacramento, aes(x = sqft, y = price)) +
-```{r 07-predictedViz-knn, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of price (USD) versus house size (square feet) with predicted price for a 2,000 square-foot house based on 5 nearest neighbors represented as a red dot."}
+```{r 07-predictedViz-knn, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Scatter plot of price (USD) versus house size (square feet) with predicted price for a 2,000 square-foot house based on 5 nearest neighbors represented as a red dot."}
 nn_plot +
 geom_point(aes(x = 2000, y = prediction[[1]]), color = "red", size = 2.5)
 ```
@@ -305,7 +305,7 @@ different from the true values, then RMSPE will be quite large. When we
 use cross validation, we will choose the $K$ that gives
 us the smallest RMSPE.
 
-```{r 07-verticalerrors, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Scatter plot of price (USD) versus house size (square feet) with example predictions (blue line) and the error in those predictions compared with true response values for three selected observations (vertical red lines).", fig.height = 4, fig.width = 5}
+```{r 07-verticalerrors, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Scatter plot of price (USD) versus house size (square feet) with example predictions (blue line) and the error in those predictions compared with true response values for three selected observations (vertical red lines).", fig.height = 3.5, fig.width = 4.5}
 # save the seed
 seedval <- .Random.seed
 
@@ -434,7 +434,7 @@ sacr_results <- sacr_wkflw |>
 sacr_results
 ```
 
-```{r 07-choose-k-knn-plot, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Effect of the number of neighbors on the RMSPE."}
+```{r 07-choose-k-knn-plot, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Effect of the number of neighbors on the RMSPE."}
 sacr_tunek_plot <- ggplot(sacr_results, aes(x = neighbors, y = mean)) +
 geom_point() +
 geom_line() +
@@ -499,7 +499,8 @@ for (i in 1:6) {
 ylab("Price (USD)") +
 scale_y_continuous(labels = dollar_format()) +
 geom_line(data = sacr_preds, aes(x = sqft, y = .pred), color = "blue") +
-ggtitle(paste0("K = ", gridvals[[i]]))
+ggtitle(paste0("K = ", gridvals[[i]])) +
+theme(text = element_text(size = 16))
 } else {
 plots[[i]] <- ggplot(sacr_preds, aes(x = sqft, y = price)) +
 geom_point(alpha = 0.4) +
@@ -510,7 +511,8 @@ for (i in 1:6) {
 mapping = aes(x = sqft),
 yintercept = mean(sacr_preds$price),
 color = "blue") +
-ggtitle(paste0("K = ", gridvals[[i]]))
+ggtitle(paste0("K = ", gridvals[[i]])) +
+theme(text = element_text(size = 16))
 }
 }
 
@@ -618,7 +620,7 @@ the range of house sizes we might encounter in the Sacramento area—from 50
 You have already seen a few plots like this in this chapter, but here we also provide the code that generated it
 as a learning challenge.
 
-```{r 07-predict-all, warning = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Predicted values of house price (blue line) for the final KNN regression model."}
+```{r 07-predict-all, warning = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Predicted values of house price (blue line) for the final KNN regression model."}
 sacr_preds <- tibble(sqft = seq(from = 500, to = 5000, by = 10))
 
 sacr_preds <- sacr_fit |>
@@ -665,7 +667,7 @@ visualizing the data, before we start modeling the data. Figure \@ref(fig:07-bed
 shows that the number of bedrooms might provide useful information
 to help predict the sale price of a house.
 
-```{r 07-bedscatter, fig.height = 5, fig.width = 6, fig.cap = "Scatter plot of the sale price of houses versus the number of bedrooms."}
+```{r 07-bedscatter, fig.height = 3.5, fig.width = 4.5, fig.cap = "Scatter plot of the sale price of houses versus the number of bedrooms."}