
Commit d67b080: rebuild
1 parent: 40623ab

13 files changed (+1622, -1029 lines)

07-regression1.Rmd

Lines changed: 1 addition & 1 deletion

@@ -175,7 +175,7 @@ accuracy to see how well our predictions matched the true labels. Here in the
 context of K-NN regression we will use root mean square prediction error
 (RMSPE) instead. The mathematical formula for calculating RMSPE is:
 
-$$RMSPE = \sqrt{\frac{1}{n}\sum\limits_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
+$$\text{RMSPE} = \sqrt{\frac{1}{n}\sum\limits_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
 
 Where:
08-regression2.Rmd

Lines changed: 12 additions & 13 deletions

@@ -157,7 +157,7 @@ lm_test_results <- lm_fit %>%
 lm_test_results
 ```
 
-Our final model's test error as assessed by $RMSPE$
+Our final model's test error as assessed by RMSPE
 is `r format(round(lm_test_results %>% filter(.metric == 'rmse') %>% pull(.estimate)), scientific=FALSE)`.
 Remember that this is in units of the target/response variable, and here that
 is US Dollars (USD). Does this mean our model is "good" at predicting house

@@ -413,39 +413,38 @@ get our test error.
 lm_mult_test_results
 ```
 
-We get that the $RMSPE$ for the multivariate linear regression model
+We get that the RMSPE for the multivariate linear regression model
 of `r format(lm_mult_test_results %>% filter(.metric == 'rmse') %>% pull(.estimate), scientific = FALSE)`. This prediction error
 is less than the prediction error for the multivariate K-NN regression model,
 indicating that we should likely choose linear regression for predictions of
 house price on this data set. But we should also ask if this more complex
 model is doing a better job of predicting compared to our simple linear
 regression model with only a single predictor (house size). Revisiting last
-section, we see that our $RMSPE$ for our simple linear regression model with
+section, we see that our RMSPE for our simple linear regression model with
 only a single predictor was
 `r format(lm_test_results %>% filter(.metric == 'rmse') %>% pull(.estimate), scientific = FALSE)`,
 which is slightly more than that of our more complex model. Our model with two predictors
-provided a slightly lower $RMSPE$ on test data than our model with one.
+provided a slightly better fit on test data than our model with just one.
 
-Should we always end up choosing a model with more predictors than fewer? Or perhaps
-always choose a model with fewer than more predictors? The answer is no;
-you never know what model will be the best until you go through the
+But should we always end up choosing a model with more predictors than fewer?
+The answer is no; you never know what model will be the best until you go through the
 process of comparing their performance on held-out test data. Exploratory
 data analysis can give you some hints, but until you look
 at the prediction errors to compare the models you don't really know.
 Additionally, here we compare test errors purely for the purposes of teaching.
-In practice, if you wanted to choose compare several regression models with
-differing numbers of variables to see which performed the best you would use
-cross-validation to choose this (similar to how we use cross validation to
-choose k in K-NN regression). There are several well known and more advanced
+In practice, when you want to compare several regression models with
+differing numbers of predictor variables, you should use
+cross-validation on the training set only; in this case choosing the model is part
+of tuning, so you cannot use the test data. There are several well known and more advanced
 methods to do this that are beyond the scope of this course, and they include
 backward or forward selection, and L1 or L2 regularization (also known as Lasso
-and Ridge regression, respectively).
+and ridge regression, respectively).
 
 ## The other side of regression
 
 So far in this textbook we have used regression only in the context of
 prediction. However, regression is also a powerful method to understand and/or
-describe the relationship between a quantitative outcome/response variable and
+describe the relationship between a quantitative response variable and
 one or more explanatory variables. Extending the case we have been working with
 in this chapter (where we are interested in house price as the outcome/response
 variable), we might also be interested in describing the
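The workflow this commit's rewritten paragraph describes (choose between candidate models via cross-validation on the training data, reserving the test set for the final error estimate) can be sketched compactly. This is an illustrative sketch in plain Python rather than the book's R/tidymodels code; the synthetic house-price data, model names, and fold count are all invented for the example:

```python
import math
import random

def rmspe(y_true, y_pred):
    """Root mean square prediction error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_simple_linear(xs, ys):
    """Closed-form least-squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def cv_rmspe(fit, xs, ys, k=5):
    """Average validation RMSPE over k folds of the *training* data;
    the held-out test set is never touched during model selection."""
    idx = list(range(len(xs)))
    random.Random(0).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in fold]
        scores.append(rmspe([ys[i] for i in fold], preds))
    return sum(scores) / k

# Synthetic "house price" training data with a linear trend plus noise.
rng = random.Random(1)
sizes = [rng.uniform(800, 3000) for _ in range(100)]
prices = [50_000 + 150 * s + rng.gauss(0, 20_000) for s in sizes]

# Pick the candidate with the lower cross-validation error;
# the linear model should show a clearly lower error than mean-only here.
for name, fit in [("mean-only", fit_mean), ("linear", fit_simple_linear)]:
    print(name, round(cv_rmspe(fit, sizes, prices)))
```

Only after this choice is made would the winning model be refit on the full training set and scored once on the test set, which is exactly the distinction the commit's new wording draws.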

docs/GitHub.html

Lines changed: 42 additions & 48 deletions (-71.4 KB)
Large diffs are not rendered by default.

docs/classification.html

Lines changed: 753 additions & 164 deletions
Large diffs are not rendered by default.

docs/clustering.html

Lines changed: 105 additions & 111 deletions
Large diffs are not rendered by default.

docs/index.html

Lines changed: 43 additions & 49 deletions
Large diffs are not rendered by default.

docs/reading.html

Lines changed: 93 additions & 76 deletions
Large diffs are not rendered by default.

docs/regression1.html

Lines changed: 188 additions & 194 deletions
Large diffs are not rendered by default.

docs/regression2.html

Lines changed: 128 additions & 112 deletions
Large diffs are not rendered by default.

0 commit comments