
Commit 20301d8

seed hacking to get reg1 and reg2 story to align with py
1 parent 0ee258a commit 20301d8

File tree: 2 files changed (+8, −9 lines)

  source/regression1.Rmd
  source/regression2.Rmd

source/regression1.Rmd

Lines changed: 3 additions & 3 deletions
@@ -307,7 +307,7 @@ that we used earlier in the chapter (Figure \@ref(fig:07-small-eda-regr)).
 
 ```{r 07-sacramento-seed-before-train-test-split, echo = FALSE, message = FALSE, warning = FALSE}
 # hidden seed -- make sure this is the same as what appears in reg2 right before train/test split
-set.seed(10)
+set.seed(7)
 ```
 
 ```{r 07-test-train-split}
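Why this change works: both chapters call `set.seed()` with the same value immediately before the train/test split, so rsample draws the same rows in each file and the two chapters tell one consistent story (here, aligned with the Python edition). A minimal sketch of the mechanism, not part of the commit, using a toy data frame:

```r
# With identical seeds, initial_split() selects identical rows, so
# regression1 and regression2 end up with the same train/test partition.
library(rsample)

df <- data.frame(x = 1:100)

set.seed(7)
split_a <- initial_split(df, prop = 0.75)

set.seed(7)
split_b <- initial_split(df, prop = 0.75)

identical(training(split_a), training(split_b))  # TRUE
```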
@@ -512,13 +512,13 @@ Figure \@ref(fig:07-choose-k-knn-plot). What is happening here?
 
 Figure \@ref(fig:07-howK) visualizes the effect of different settings of $K$ on the
 regression model. Each plot shows the predicted values for house sale price from
-our KNN regression model on the training data for 6 different values for $K$: 1, 3, `r kmin`, 41, 250, and 680 (almost the entire training set).
+our KNN regression model on the training data for 6 different values for $K$: 1, 3, 25, `r kmin`, 250, and 680 (almost the entire training set).
 For each model, we predict prices for the range of possible home sizes we
 observed in the data set (here 500 to 5,000 square feet) and we plot the
 predicted prices as a blue line.
 
 ```{r 07-howK, echo = FALSE, warning = FALSE, fig.height = 13, fig.width = 10,fig.cap = "Predicted values for house price (represented as a blue line) from KNN regression models for six different values for $K$."}
-gridvals <- c(1, 3, kmin, 41, 250, 680)
+gridvals <- c(1, 3, 25, kmin, 250, 680)
 
 plots <- list()
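For context, the hidden `07-howK` chunk fits one KNN model per entry of `gridvals` and plots its predictions over a grid of home sizes. A rough sketch of what happens for a single value of $K$ (the `sacramento_train` data frame and the `price`/`sqft` column names are assumed from the chapter; the real chunk also assembles the plots):

```r
library(tidymodels)

k <- 25  # one entry of gridvals

# rectangular-kernel KNN regression spec, as in the chapter
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = k) |>
  set_engine("kknn") |>
  set_mode("regression")

knn_fit <- workflow() |>
  add_formula(price ~ sqft) |>
  add_model(knn_spec) |>
  fit(data = sacramento_train)

# predict over the observed range of home sizes (500 to 5,000 square feet);
# the .pred column holds the values drawn as the blue line
size_grid <- tibble(sqft = seq(from = 500, to = 5000, by = 10))
predictions <- predict(knn_fit, size_grid) |>
  bind_cols(size_grid)
```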

source/regression2.Rmd

Lines changed: 5 additions & 6 deletions
@@ -221,7 +221,7 @@ can come back to after we choose our final model. Let's take care of that now.
 library(tidyverse)
 library(tidymodels)
 
-set.seed(10)
+set.seed(7)
 
 sacramento <- read_csv("data/sacramento.csv")
@@ -350,7 +350,7 @@ obtained from the same problem, shown in Figure \@ref(fig:08-compareRegression).
 ```{r 08-compareRegression, echo = FALSE, warning = FALSE, message = FALSE, fig.height = 4.75, fig.width = 10, fig.cap = "Comparison of simple linear regression and KNN regression."}
 set.seed(1234)
 # neighbors = 28 from regression1 chapter
-sacr_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 28) |>
+sacr_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 52) |>
   set_engine("kknn") |>
   set_mode("regression")
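The new value `neighbors = 52` is presumably the $K$ that regression1's tuning selects under the reseeded split, replacing the old 28. For reference, a sketch of how `sacr_spec` would typically be used downstream (the recipe and training-split names are assumptions, not taken from this diff):

```r
# standardize the predictor, then fit the KNN spec in a workflow
sacr_recipe <- recipe(price ~ sqft, data = sacramento_train) |>
  step_scale(all_predictors()) |>
  step_center(all_predictors())

sacr_fit <- workflow() |>
  add_recipe(sacr_recipe) |>
  add_model(sacr_spec) |>
  fit(data = sacramento_train)
```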
@@ -621,10 +621,9 @@ indicating that we should likely choose linear regression for predictions of
 house sale price on this data set. Revisiting the simple linear regression model
 with only a single predictor from earlier in this chapter, we see that the RMSPE for that model was
 \$`r format(lm_test_results |> filter(.metric == 'rmse') |> pull(.estimate), big.mark=",", nsmall=0, scientific = FALSE)`,
-which is slightly higher than that of our more complex model. Our model with two predictors
-provided a slightly better fit on test data than our model with just one.
-As mentioned earlier, this is not always the case: sometimes including more
-predictors can negatively impact the prediction performance on unseen
+which is almost the same as that of our more complex model.
+As mentioned earlier, this is not always the case: often including more
+predictors will either positively or negatively impact the prediction performance on unseen
 test data.
 
 ## Multicollinearity and outliers
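The RMSPE quoted in the prose comes from `lm_test_results`, which the inline R expression filters for the `rmse` metric. A sketch of how such a metrics table is usually built with yardstick (the `lm_fit` and `sacramento_test` names are assumptions based on the chapter's conventions):

```r
# assemble test-set predictions and compute regression metrics
lm_test_results <- lm_fit |>
  predict(sacramento_test) |>
  bind_cols(sacramento_test) |>
  metrics(truth = price, estimate = .pred)

# the inline expression in the text then pulls out the RMSPE
lm_test_results |>
  filter(.metric == "rmse") |>
  pull(.estimate)
```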
