source/regression2.Rmd
Lines changed: 23 additions & 10 deletions
@@ -54,7 +54,6 @@ By the end of the chapter, readers will be able to do the following:
 * Use R and `tidymodels` to fit a linear regression model on training data.
 * Evaluate the linear regression model on test data.
 * Compare and contrast predictions obtained from K-nearest neighbor regression to those obtained using linear regression from the same data set.
-* In R, overlay predictions from linear regression on a scatter plot of data using `geom_smooth`.
 
 ## Simple linear regression
 
@@ -292,21 +291,35 @@ sale price based off of the predictor of home size? Again, answering this is
 tricky and requires knowledge of how you intend to use the prediction.
 
 To visualize the simple linear regression model, we can plot the predicted house
-sale price across all possible house sizes we might encounter superimposed on a scatter
-plot of the original housing price data. There is a plotting function in
-the `tidyverse`, `geom_smooth`, that
-allows us to add a layer on our plot with the simple
-linear regression predicted line of best fit. By default `geom_smooth` adds some other information
-to the plot that we are not interested in at this point; we provide the argument `se = FALSE` to
-tell `geom_smooth` not to show that information. Figure \@ref(fig:08-lm-predict-all) displays the result.
+sale price across all possible house sizes we might encounter.
+Since our model is linear,
+we only need to compute the predicted value of the min and max points,
+and then connect them with a straight line.
+We superimpose this prediction line on a scatter
+plot of the original housing price data,
+so that we can qualitatively assess if the model seems to fit the data well.
+Figure \@ref(fig:08-lm-predict-all) displays the result.
 
 ```{r 08-lm-predict-all, fig.height = 3.5, fig.width = 4.5, warning = FALSE, fig.pos = "H", out.extra="", message = FALSE, fig.cap = "Scatter plot of sale price versus size with line of best fit for the full Sacramento housing data."}
-lm_plot_final <- ggplot(sacramento_train, aes(x = sqft, y = price)) +
+sqft_prediction_grid <- tibble(
+  sqft = c(
+    sacramento |> select(sqft) |> min(),
+    sacramento |> select(sqft) |> max()
+  )
+)
+
+sacr_preds <- lm_fit |>
+  predict(sqft_prediction_grid) |>
+  bind_cols(sqft_prediction_grid)
+
+lm_plot_final <- ggplot(sacramento, aes(x = sqft, y = price)) +