@@ -129,7 +129,7 @@ $\beta_1$ as the increase in price for each square foot of space.
Let's push this thought even further: what would happen in the equation for the line if you
tried to evaluate the price of a house with size 6 *million* square feet?
Or what about *negative* 2,000 square feet? As it turns out, nothing in the formula breaks; linear
- regression will happily make predictions for crazy predictor values if you ask it to. But even though
+ regression will happily make predictions for different predictor values if you ask it to. But even though
you *can* make these wild predictions, you shouldn't. You should only make predictions roughly within
the range of your original data, and perhaps a bit beyond it only if it makes sense. For example,
the data in Figure \@ref(fig:08-lin-reg1) only reaches around 800 square feet on the low end, but
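To see the point about extrapolation concretely, here is a minimal sketch using `lm()` and `predict()` on a made-up toy data set (the sizes and prices below are hypothetical, not the chapter's data): the formula returns a number for any input, sensible or not.

```r
# Hypothetical toy data: house sizes (sq ft) and sale prices.
houses <- data.frame(
  size  = c(800, 1100, 1400, 1700, 2000, 2300),
  price = c(166000, 235000, 280000, 331000, 394000, 442000)
)

# Fit a simple linear regression of price on size.
fit <- lm(price ~ size, data = houses)

# predict() evaluates beta_0 + beta_1 * size for *any* size supplied,
# even sizes far outside the observed 800--2300 sq ft range.
predict(fit, data.frame(size = c(2000, 6e6, -2000)))
```

The first prediction sits near the data and is reasonable; the last two are exactly the wild extrapolations the text warns against (the third will even come out negative).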
@@ -163,7 +163,7 @@ small_plot +
By using simple linear regression on this small data set to predict the sale price
for a 2,000 square-foot house, we get a predicted value of
- \$`r format(round(prediction[[1]]), big.mark=",", nsmall=0, scientific = FALSE)`. But wait a minute...how
+ \$`r format(round(prediction[[1]]), big.mark=",", nsmall=0, scientific = FALSE)`. But wait a minute... how
exactly does simple linear regression choose the line of best fit? Many
different lines could be drawn through the data points.
Some plausible examples are shown in Figure \@ref(fig:08-several-lines).
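As a preview of the answer: ordinary least squares picks the line that makes the sum of squared vertical distances between the line and the points as small as possible. Here is a small sketch of that criterion on the same hypothetical toy data as above (the candidate coefficients are made up):

```r
# Hypothetical toy data (same as in the earlier sketch).
houses <- data.frame(
  size  = c(800, 1100, 1400, 1700, 2000, 2300),
  price = c(166000, 235000, 280000, 331000, 394000, 442000)
)

# Sum of squared vertical distances from the points to the line
# price = intercept + slope * size.
rss <- function(intercept, slope, data) {
  sum((data$price - (intercept + slope * data$size))^2)
}

# Two plausible-looking candidate lines, with made-up coefficients...
rss(20000, 150, houses)
rss(0, 200, houses)

# ...versus the line lm() chooses, which minimizes this criterion.
fit <- lm(price ~ size, data = houses)
rss(coef(fit)[1], coef(fit)[2], houses)
```

Any other intercept and slope you try will produce a larger value than the last call.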
@@ -783,11 +783,11 @@ has regression coefficients that are very sensitive to the exact values in the d
if we change the data ever so slightly&mdash;e.g., by running cross-validation, which splits
up the data randomly into different chunks&mdash;the coefficients vary by large amounts:
- Best Fit 1: $\text{house sale price} = `r icept1` + `r sqft1` \cdot (\text{house size 1 (ft$^2$)}) + `r sqft11` \cdot (\text{house size 2 (ft$^2$)}).$
+ Best Fit 1: $\text{house sale price} = `r icept1` + (`r sqft1`) \cdot (\text{house size 1 (ft$^2$)}) + (`r sqft11`) \cdot (\text{house size 2 (ft$^2$)}).$

- Best Fit 2: $\text{house sale price} = `r icept2` + `r sqft2` \cdot (\text{house size 1 (ft$^2$)}) + `r sqft22` \cdot (\text{house size 2 (ft$^2$)}).$
+ Best Fit 2: $\text{house sale price} = `r icept2` + (`r sqft2`) \cdot (\text{house size 1 (ft$^2$)}) + (`r sqft22`) \cdot (\text{house size 2 (ft$^2$)}).$

- Best Fit 3: $\text{house sale price} = `r icept3` + `r sqft3` \cdot (\text{house size 1 (ft$^2$)}) + `r sqft33` \cdot (\text{house size 2 (ft$^2$)}).$
+ Best Fit 3: $\text{house sale price} = `r icept3` + (`r sqft3`) \cdot (\text{house size 1 (ft$^2$)}) + (`r sqft33`) \cdot (\text{house size 2 (ft$^2$)}).$
Therefore, when performing multivariable linear regression, it is important to avoid including very
linearly related predictors. However, techniques for doing so are beyond the scope of this
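To see this coefficient instability in miniature, here is a sketch on entirely synthetic data (not the chapter's) that builds two almost perfectly correlated size predictors and refits the regression on random halves of the data, mimicking cross-validation splits:

```r
set.seed(1)

# Synthetic data with two nearly identical predictors:
# size2 is size1 plus a small amount of measurement noise.
n <- 100
size1 <- runif(n, 500, 3000)
size2 <- size1 + rnorm(n, sd = 10)
price <- 50000 + 150 * size1 + rnorm(n, sd = 20000)
houses2 <- data.frame(price, size1, size2)

# Refit on three random halves of the data; the size1 and size2
# coefficients swing wildly from fit to fit.
for (i in 1:3) {
  rows <- sample(n, n / 2)
  print(coef(lm(price ~ size1 + size2, data = houses2[rows, ])))
}
```

Across the three fits, the two slope coefficients vary by large amounts but roughly trade off against each other (their sum stays near 150), since the model cannot tell the two predictors apart.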