@@ -157,7 +157,7 @@ lm_test_results <- lm_fit %>%
lm_test_results
```
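Aside (not part of the diff): a minimal sketch of how a `lm_test_results` object like the one printed above might be produced with tidymodels. `lm_fit` and `lm_test_results` appear in the hunk header; the test-set name `sacramento_test` and the response column `price` are assumptions for illustration.

```r
# Sketch under assumed names: score the fitted workflow `lm_fit` on held-out test data.
library(tidymodels)

lm_test_results <- lm_fit %>%
  predict(sacramento_test) %>%             # adds a .pred column of predicted prices
  bind_cols(sacramento_test) %>%           # attach the true values for comparison
  metrics(truth = price, estimate = .pred) # rmse, rsq, mae on the test set

# the RMSPE is reported by yardstick under the metric name "rmse"
lm_test_results %>%
  filter(.metric == "rmse") %>%
  pull(.estimate)
```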

-Our final model's test error as assessed by $RMSPE$
+Our final model's test error as assessed by RMSPE
is `r format(round(lm_test_results %>% filter(.metric == 'rmse') %>% pull(.estimate)), scientific=FALSE)`.
Remember that this is in units of the target/response variable, and here that
is US Dollars (USD). Does this mean our model is "good" at predicting house
@@ -413,39 +413,38 @@ get our test error.
lm_mult_test_results
```

-We get that the $RMSPE$ for the multivariate linear regression model
+We get that the RMSPE for the multivariate linear regression model
of `r format(lm_mult_test_results %>% filter(.metric == 'rmse') %>% pull(.estimate), scientific = FALSE)`. This prediction error
is less than the prediction error for the multivariate K-NN regression model,
indicating that we should likely choose linear regression for predictions of
house price on this data set. But we should also ask if this more complex
model is doing a better job of predicting compared to our simple linear
regression model with only a single predictor (house size). Revisiting last
-section, we see that our $RMSPE$ for our simple linear regression model with
+section, we see that our RMSPE for our simple linear regression model with
only a single predictor was
`r format(lm_test_results %>% filter(.metric == 'rmse') %>% pull(.estimate), scientific = FALSE)`,
which is slightly more than that of our more complex model. Our model with two predictors
-provided a slightly lower $RMSPE$ on test data than our model with one.
+provided a slightly better fit on test data than our model with just one.

-Should we always end up choosing a model with more predictors than fewer? Or perhaps
-always choose a model with fewer than more predictors? The answer is no;
-you never know what model will be the best until you go through the
+But should we always end up choosing a model with more predictors than fewer?
+The answer is no; you never know what model will be the best until you go through the
process of comparing their performance on held-out test data. Exploratory
data analysis can give you some hints, but until you look
at the prediction errors to compare the models you don't really know.
Additionally, here we compare test errors purely for the purposes of teaching.
-In practice, if you wanted to choose compare several regression models with
-differing numbers of variables to see which performed the best you would use
-cross-validation to choose this (similar to how we use cross validation to
-choose k in K-NN regression). There are several well known and more advanced
+In practice, when you want to compare several regression models with
+differing numbers of predictor variables, you should use
+cross-validation on the training set only; in this case choosing the model is part
+of tuning, so you cannot use the test data. There are several well known and more advanced
methods to do this that are beyond the scope of this course, and they include
backward or forward selection, and L1 or L2 regularization (also known as Lasso
-and Ridge regression, respectively).
+and ridge regression, respectively).
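
Aside (not part of the diff): a sketch of the cross-validation comparison described in the added lines above, carried out with tidymodels on the training set only; the training-set name `sacramento_train` and the predictor columns `sqft` and `beds` are assumptions for illustration.

```r
# Sketch under assumed names: compare one- and two-predictor linear regressions
# using cross-validation on the training data; the test set is never touched.
library(tidymodels)

lm_spec <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

sacr_vfold <- vfold_cv(sacramento_train, v = 5, strata = price)

one_predictor <- workflow() %>%
  add_model(lm_spec) %>%
  add_formula(price ~ sqft)

two_predictors <- workflow() %>%
  add_model(lm_spec) %>%
  add_formula(price ~ sqft + beds)

# cross-validated RMSPE for each candidate model
one_predictor %>%
  fit_resamples(resamples = sacr_vfold) %>%
  collect_metrics() %>%
  filter(.metric == "rmse")

two_predictors %>%
  fit_resamples(resamples = sacr_vfold) %>%
  collect_metrics() %>%
  filter(.metric == "rmse")
```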

## The other side of regression

So far in this textbook we have used regression only in the context of
prediction. However, regression is also a powerful method to understand and/or
-describe the relationship between a quantitative outcome/response variable and
+describe the relationship between a quantitative response variable and
one or more explanatory variables. Extending the case we have been working with
in this chapter (where we are interested in house price as the outcome/response
variable), we might also be interested in describing the