
Commit d67b080: rebuild
1 parent: 40623ab

13 files changed (+1622, -1029 lines)

07-regression1.Rmd

Lines changed: 1 addition & 1 deletion

@@ -175,7 +175,7 @@ accuracy to see how well our predictions matched the true labels. Here in the
 context of K-NN regression we will use root mean square prediction error
 (RMSPE) instead. The mathematical formula for calculating RMSPE is:
 
-$$RMSPE = \sqrt{\frac{1}{n}\sum\limits_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
+$$\text{RMSPE} = \sqrt{\frac{1}{n}\sum\limits_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
 
 Where:
08-regression2.Rmd

Lines changed: 12 additions & 13 deletions

@@ -157,7 +157,7 @@ lm_test_results <- lm_fit %>%
 lm_test_results
 ```
 
-Our final model's test error as assessed by $RMSPE$
+Our final model's test error as assessed by RMSPE
 is `r format(round(lm_test_results %>% filter(.metric == 'rmse') %>% pull(.estimate)), scientific=FALSE)`.
 Remember that this is in units of the target/response variable, and here that
 is US Dollars (USD). Does this mean our model is "good" at predicting house

@@ -413,39 +413,38 @@ get our test error.
 lm_mult_test_results
 ```
 
-We get that the $RMSPE$ for the multivariate linear regression model
+We get that the RMSPE for the multivariate linear regression model
 of `r format(lm_mult_test_results %>% filter(.metric == 'rmse') %>% pull(.estimate), scientific = FALSE)`. This prediction error
 is less than the prediction error for the multivariate K-NN regression model,
 indicating that we should likely choose linear regression for predictions of
 house price on this data set. But we should also ask if this more complex
 model is doing a better job of predicting compared to our simple linear
 regression model with only a single predictor (house size). Revisiting last
-section, we see that our $RMSPE$ for our simple linear regression model with
+section, we see that our RMSPE for our simple linear regression model with
 only a single predictor was
 `r format(lm_test_results %>% filter(.metric == 'rmse') %>% pull(.estimate), scientific = FALSE)`,
 which is slightly more than that of our more complex model. Our model with two predictors
-provided a slightly lower $RMSPE$ on test data than our model with one.
+provided a slightly better fit on test data than our model with just one.
 
-Should we always end up choosing a model with more predictors than fewer? Or perhaps
-always choose a model with fewer than more predictors? The answer is no;
-you never know what model will be the best until you go through the
+But should we always end up choosing a model with more predictors than fewer?
+The answer is no; you never know what model will be the best until you go through the
 process of comparing their performance on held-out test data. Exploratory
 data analysis can give you some hints, but until you look
 at the prediction errors to compare the models you don't really know.
 Additionally, here we compare test errors purely for the purposes of teaching.
-In practice, if you wanted to choose compare several regression models with
-differing numbers of variables to see which performed the best you would use
-cross-validation to choose this (similar to how we use cross validation to
-choose k in K-NN regression). There are several well known and more advanced
+In practice, when you want to compare several regression models with
+differing numbers of predictor variables, you should use
+cross-validation on the training set only; in this case choosing the model is part
+of tuning, so you cannot use the test data. There are several well known and more advanced
 methods to do this that are beyond the scope of this course, and they include
 backward or forward selection, and L1 or L2 regularization (also known as Lasso
-and Ridge regression, respectively).
+and ridge regression, respectively).
 
 ## The other side of regression
 
 So far in this textbook we have used regression only in the context of
 prediction. However, regression is also a powerful method to understand and/or
-describe the relationship between a quantitative outcome/response variable and
+describe the relationship between a quantitative response variable and
 one or more explanatory variables. Extending the case we have been working with
 in this chapter (where we are interested in house price as the outcome/response
 variable), we might also be interested in describing the
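The workflow this commit's rewritten paragraph describes (choose between candidate models via cross-validation on the training data, reserving the test set for the final error estimate) can be sketched compactly. This is an illustrative sketch in plain Python rather than the book's R/tidymodels code; the synthetic house-price data, model names, and fold count are all invented for the example:

```python
import math
import random

def rmspe(y_true, y_pred):
    """Root mean square prediction error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_simple_linear(xs, ys):
    """Closed-form least-squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def cv_rmspe(fit, xs, ys, k=5):
    """Average validation RMSPE over k folds of the *training* data;
    the held-out test set is never touched during model selection."""
    idx = list(range(len(xs)))
    random.Random(0).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in fold]
        scores.append(rmspe([ys[i] for i in fold], preds))
    return sum(scores) / k

# Synthetic "house price" training data with a linear trend plus noise.
rng = random.Random(1)
sizes = [rng.uniform(800, 3000) for _ in range(100)]
prices = [50_000 + 150 * s + rng.gauss(0, 20_000) for s in sizes]

# Pick the candidate with the lower cross-validation error;
# the linear model should show a clearly lower error than mean-only here.
for name, fit in [("mean-only", fit_mean), ("linear", fit_simple_linear)]:
    print(name, round(cv_rmspe(fit, sizes, prices)))
```

Only after this choice is made would the winning model be refit on the full training set and scored once on the test set, which is exactly the distinction the commit's new wording draws.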

docs/GitHub.html

Lines changed: 42 additions & 48 deletions (-71.4 KB)
Large diffs are not rendered by default.

docs/classification.html

Lines changed: 753 additions & 164 deletions
Large diffs are not rendered by default.

docs/clustering.html

Lines changed: 105 additions & 111 deletions
Large diffs are not rendered by default.

docs/index.html

Lines changed: 43 additions & 49 deletions
Large diffs are not rendered by default.

docs/reading.html

Lines changed: 93 additions & 76 deletions
Large diffs are not rendered by default.

docs/regression1.html

Lines changed: 188 additions & 194 deletions
Large diffs are not rendered by default.

docs/regression2.html

Lines changed: 128 additions & 112 deletions
Large diffs are not rendered by default.

0 commit comments