```{r 08-lin-reg1, message = FALSE, warning = FALSE, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of price (USD) versus house size (square footage) with line of best fit for subset of the Sacramento housing data set"}
```{r 08-lin-reg2, message = FALSE, warning = FALSE, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of price (USD) versus house size (square footage) with line of best fit and predicted price for a 2000 square foot home represented as a red dot"}
small_model <- lm(price ~ sqft, data = small_sacramento)
```{r 08-several-lines, echo = FALSE, message = FALSE, warning = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of price (USD) versus house size (square footage) with many possible lines that could be drawn through the data points"}
-small_plot +
+small_plot +
geom_abline(intercept = -64542.23, slope = 190, color = "green") +
geom_abline(intercept = -6900, slope = 175, color = "purple") +
-geom_abline(intercept = -64542.23, slope = 160, color = "red")
+geom_abline(intercept = -64542.23, slope = 160, color = "red")
```

Simple linear regression chooses the straight line of best fit by choosing
@@ -105,13 +105,11 @@ line. What exactly do we mean by the vertical distance between the predicted
values (which fall along the line of best fit) and the observed data points?
We illustrate these distances in the plot below with a red line:
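
To make the vertical distances concrete, here is a minimal sketch in R. It assumes a small made-up data frame, `toy`, standing in for `small_sacramento`; the values and the names `toy`, `toy_model`, `predicted`, and `vertical_dist` are hypothetical and not part of the chapter. Each distance is the observed price minus the price predicted by the fitted line, which is the residual that `lm()` reports.

```r
# Hypothetical toy data standing in for small_sacramento (values are made up)
toy <- data.frame(
  sqft  = c(1000, 1500, 2000, 2500, 3000),
  price = c(150000, 210000, 260000, 340000, 390000)
)

# Fit a simple linear regression of price on house size
toy_model <- lm(price ~ sqft, data = toy)

# Predicted values fall along the fitted line
toy$predicted <- unname(predict(toy_model, newdata = toy))

# Vertical distance: observed price minus predicted price (the residual)
toy$vertical_dist <- toy$price - toy$predicted

# Sum of squared vertical distances, the quantity least squares minimizes
sum_sq_dist <- sum(toy$vertical_dist^2)

toy
sum_sq_dist
```

The same distances are available directly from `residuals(toy_model)`; computing them by hand only shows where they come from.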
```{r 08-verticalDistToMin, echo = FALSE, message = FALSE, warning = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of price (USD) versus house size (square footage) with the vertical distances between the predicted values and the observed data points"}
```{r 08-lm-predict-all, fig.height = 4, fig.width = 5, warning = FALSE, message = FALSE, fig.cap = "Scatter plot of price (USD) versus house size (square footage) with line of best fit for complete Sacramento housing data set"}
lm_plot_final <- ggplot(sacramento_train, aes(x = sqft, y = price)) +
-geom_point(alpha = 0.4) +
-xlab("House size (square footage)") +
-ylab("Price (USD)") +
-scale_y_continuous(labels = dollar_format()) +
-geom_smooth(method = "lm", se = FALSE)
+geom_point(alpha = 0.4) +
+xlab("House size (square footage)") +
+ylab("Price (USD)") +
+scale_y_continuous(labels = dollar_format()) +
+geom_smooth(method = "lm", se = FALSE)
lm_plot_final
```

@@ -226,50 +224,51 @@ simple linear regression model predictions for the Sacramento real estate data
(predicting price from house size) and the "best" K-NN regression model
```{r 08-3DlinReg, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Simple linear regression model’s predictions represented as a plane overlaid on top of the data using three predictors (price, house size, and the number of bedrooms)"}
@@ -470,7 +475,7 @@ quantifying how big each of these effects are, and assessing how accurately we
can estimate each of these effects. This side of regression is the topic of
many follow-on statistics courses and beyond the scope of this course.

-## Additional readings/resources
+## Additional resources

- Pages 59-71 of [Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf) with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
- Pages 104 - 109 of [An Introduction to Statistical Learning with Applications in R](https://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf) by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani