
Commit 20301d8

seed hacking to get reg1 and reg2 story to align with py
1 parent 0ee258a commit 20301d8

File tree: 2 files changed (+8, −9 lines)

  source/regression1.Rmd
  source/regression2.Rmd

source/regression1.Rmd

Lines changed: 3 additions & 3 deletions
@@ -307,7 +307,7 @@ that we used earlier in the chapter (Figure \@ref(fig:07-small-eda-regr)).
 
 ```{r 07-sacramento-seed-before-train-test-split, echo = FALSE, message = FALSE, warning = FALSE}
 # hidden seed -- make sure this is the same as what appears in reg2 right before train/test split
-set.seed(10)
+set.seed(7)
 ```
 
 ```{r 07-test-train-split}
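Why this change works: both chapters call `set.seed()` with the same value immediately before the train/test split, so rsample draws the same rows in each file and the two chapters tell one consistent story (here, aligned with the Python edition). A minimal sketch of the mechanism, not part of the commit, using a toy data frame:

```r
# With identical seeds, initial_split() selects identical rows, so
# regression1 and regression2 end up with the same train/test partition.
library(rsample)

df <- data.frame(x = 1:100)

set.seed(7)
split_a <- initial_split(df, prop = 0.75)

set.seed(7)
split_b <- initial_split(df, prop = 0.75)

identical(training(split_a), training(split_b))  # TRUE
```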
@@ -512,13 +512,13 @@ Figure \@ref(fig:07-choose-k-knn-plot). What is happening here?
 
 Figure \@ref(fig:07-howK) visualizes the effect of different settings of $K$ on the
 regression model. Each plot shows the predicted values for house sale price from
-our KNN regression model on the training data for 6 different values for $K$: 1, 3, `r kmin`, 41, 250, and 680 (almost the entire training set).
+our KNN regression model on the training data for 6 different values for $K$: 1, 3, 25, `r kmin`, 250, and 680 (almost the entire training set).
 For each model, we predict prices for the range of possible home sizes we
 observed in the data set (here 500 to 5,000 square feet) and we plot the
 predicted prices as a blue line.
 
 ```{r 07-howK, echo = FALSE, warning = FALSE, fig.height = 13, fig.width = 10,fig.cap = "Predicted values for house price (represented as a blue line) from KNN regression models for six different values for $K$."}
-gridvals <- c(1, 3, kmin, 41, 250, 680)
+gridvals <- c(1, 3, 25, kmin, 250, 680)
 
 plots <- list()
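For context, the hidden `07-howK` chunk fits one KNN model per entry of `gridvals` and plots its predictions over a grid of home sizes. A rough sketch of what happens for a single value of $K$ (the `sacramento_train` data frame and the `price`/`sqft` column names are assumed from the chapter; the real chunk also assembles the plots):

```r
library(tidymodels)

k <- 25  # one entry of gridvals

# rectangular-kernel KNN regression spec, as in the chapter
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = k) |>
  set_engine("kknn") |>
  set_mode("regression")

knn_fit <- workflow() |>
  add_formula(price ~ sqft) |>
  add_model(knn_spec) |>
  fit(data = sacramento_train)

# predict over the observed range of home sizes (500 to 5,000 square feet);
# the .pred column holds the values drawn as the blue line
size_grid <- tibble(sqft = seq(from = 500, to = 5000, by = 10))
predictions <- predict(knn_fit, size_grid) |>
  bind_cols(size_grid)
```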

source/regression2.Rmd

Lines changed: 5 additions & 6 deletions
@@ -221,7 +221,7 @@ can come back to after we choose our final model. Let's take care of that now.
 library(tidyverse)
 library(tidymodels)
 
-set.seed(10)
+set.seed(7)
 
 sacramento <- read_csv("data/sacramento.csv")
@@ -350,7 +350,7 @@ obtained from the same problem, shown in Figure \@ref(fig:08-compareRegression).
 ```{r 08-compareRegression, echo = FALSE, warning = FALSE, message = FALSE, fig.height = 4.75, fig.width = 10, fig.cap = "Comparison of simple linear regression and KNN regression."}
 set.seed(1234)
 # neighbors = 28 from regression1 chapter
-sacr_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 28) |>
+sacr_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 52) |>
   set_engine("kknn") |>
   set_mode("regression")
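The new value `neighbors = 52` is presumably the $K$ that regression1's tuning selects under the reseeded split, replacing the old 28. For reference, a sketch of how `sacr_spec` would typically be used downstream (the recipe and training-split names are assumptions, not taken from this diff):

```r
# standardize the predictor, then fit the KNN spec in a workflow
sacr_recipe <- recipe(price ~ sqft, data = sacramento_train) |>
  step_scale(all_predictors()) |>
  step_center(all_predictors())

sacr_fit <- workflow() |>
  add_recipe(sacr_recipe) |>
  add_model(sacr_spec) |>
  fit(data = sacramento_train)
```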
@@ -621,10 +621,9 @@ indicating that we should likely choose linear regression for predictions of
 house sale price on this data set. Revisiting the simple linear regression model
 with only a single predictor from earlier in this chapter, we see that the RMSPE for that model was
 \$`r format(lm_test_results |> filter(.metric == 'rmse') |> pull(.estimate), big.mark=",", nsmall=0, scientific = FALSE)`,
-which is slightly higher than that of our more complex model. Our model with two predictors
-provided a slightly better fit on test data than our model with just one.
-As mentioned earlier, this is not always the case: sometimes including more
-predictors can negatively impact the prediction performance on unseen
+which is almost the same as that of our more complex model.
+As mentioned earlier, this is not always the case: often including more
+predictors will either positively or negatively impact the prediction performance on unseen
 test data.
 
 ## Multicollinearity and outliers
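The RMSPE quoted in the prose comes from `lm_test_results`, which the inline R expression filters for the `rmse` metric. A sketch of how such a metrics table is usually built with yardstick (the `lm_fit` and `sacramento_test` names are assumptions based on the chapter's conventions):

```r
# assemble test-set predictions and compute regression metrics
lm_test_results <- lm_fit |>
  predict(sacramento_test) |>
  bind_cols(sacramento_test) |>
  metrics(truth = price, estimate = .pred)

# the inline expression in the text then pulls out the RMSPE
lm_test_results |>
  filter(.metric == "rmse") |>
  pull(.estimate)
```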
