
Commit 6edfb12

fixing figure caption
1 parent 534cc78 commit 6edfb12

1 file changed (+15 -13 lines)

regression2.Rmd

Lines changed: 15 additions & 13 deletions
@@ -72,7 +72,7 @@ to draw the straight line of best fit through our existing data points.
 The small subset of data as well as the line of best fit are shown
 in Figure \@ref(fig:08-lin-reg1).
 
-```{r 08-lin-reg1, message = FALSE, warning = FALSE, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of sale price versus size with line of best fit for subset of the Sacramento housing data."}
+```{r 08-lin-reg1, message = FALSE, warning = FALSE, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Scatter plot of sale price versus size with line of best fit for subset of the Sacramento housing data."}
 library(tidyverse)
 library(tidymodels)
 library(scales)
@@ -122,7 +122,7 @@ above to evaluate the predicted sale price given the value we have for the
 predictor variable&mdash;here 2,000 square feet. Figure
 \@ref(fig:08-lin-reg2) demonstrates this process.
 
-```{r 08-lin-reg2, message = FALSE, warning = FALSE, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of sale price versus size with line of best fit and a red dot at the predicted sale price for a 2000 square foot home."}
+```{r 08-lin-reg2, message = FALSE, warning = FALSE, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Scatter plot of sale price versus size with line of best fit and a red dot at the predicted sale price for a 2000 square foot home."}
 small_model <- lm(price ~ sqft, data = small_sacramento)
 prediction <- predict(small_model, data.frame(sqft = 2000))
@@ -150,7 +150,7 @@ exactly does simple linear regression choose the line of best fit? Many
 different lines could be drawn through the data points.
 Some plausible examples are shown in Figure \@ref(fig:08-several-lines).
 
-```{r 08-several-lines, echo = FALSE, message = FALSE, warning = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of sale price versus size with many possible lines that could be drawn through the data points."}
+```{r 08-several-lines, echo = FALSE, message = FALSE, warning = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Scatter plot of sale price versus size with many possible lines that could be drawn through the data points."}
 small_plot +
   geom_abline(intercept = -64542.23, slope = 190, color = "green") +
   geom_abline(intercept = -6900, slope = 175, color = "purple") +
@@ -165,7 +165,7 @@ accuracy of a simple linear regression model,
 we use RMSPE&mdash;the same measure of predictive performance we used with KNN regression.
 \index{RMSPE}
 
-```{r 08-verticalDistToMin, echo = FALSE, message = FALSE, warning = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of sale price versus size with red lines denoting the vertical distances between the predicted values and the observed data points."}
+```{r 08-verticalDistToMin, echo = FALSE, message = FALSE, warning = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Scatter plot of sale price versus size with red lines denoting the vertical distances between the predicted values and the observed data points."}
 small_sacramento <- small_sacramento |>
   mutate(predicted = predict(small_model))
@@ -206,7 +206,7 @@ sacramento_test <- testing(sacramento_split)
 
 Now that we have our training data, we will create the model specification
 and recipe, and fit our simple linear regression model:
-```{r 08-fitLM, fig.height = 4, fig.width = 5}
+```{r 08-fitLM, fig.height = 3.5, fig.width = 4.5}
 lm_spec <- linear_reg() |>
   set_engine("lm") |>
   set_mode("regression")
@@ -268,7 +268,7 @@ linear regression predicted line of best fit. By default `geom_smooth` adds some
 to the plot that we are not interested in at this point; we provide the argument `se = FALSE` to
 tell `geom_smooth` not to show that information. Figure \@ref(fig:08-lm-predict-all) displays the result.
 
-```{r 08-lm-predict-all, fig.height = 4, fig.width = 5, warning = FALSE, message = FALSE, fig.cap = "Scatter plot of sale price versus size with line of best fit for the full Sacramento housing data."}
+```{r 08-lm-predict-all, fig.height = 3.5, fig.width = 4.5, warning = FALSE, message = FALSE, fig.cap = "Scatter plot of sale price versus size with line of best fit for the full Sacramento housing data."}
 lm_plot_final <- ggplot(sacramento_train, aes(x = sqft, y = price)) +
   geom_point(alpha = 0.4) +
   xlab("House size (square feet)") +
@@ -344,7 +344,8 @@ knn_plot_final <- ggplot(sacr_preds, aes(x = sqft, y = price)) +
   scale_y_continuous(labels = dollar_format()) +
   geom_line(data = sacr_preds, aes(x = sqft, y = .pred), color = "blue") +
   ggtitle("KNN regression") +
-  annotate("text", x = 3500, y = 100000, label = paste("RMSPE =", sacr_rmspe))
+  annotate("text", x = 3500, y = 100000, label = paste("RMSPE =", sacr_rmspe)) +
+  theme(text = element_text(size = 14))
 
 lm_rmspe <- lm_test_results |>
   filter(.metric == "rmse") |>
@@ -353,7 +354,8 @@ lm_rmspe <- lm_test_results |>
 
 lm_plot_final <- lm_plot_final +
   annotate("text", x = 3500, y = 100000, label = paste("RMSPE =", lm_rmspe)) +
-  ggtitle("linear regression")
+  ggtitle("linear regression") +
+  theme(text = element_text(size = 14))
 
 grid.arrange(lm_plot_final, knn_plot_final, ncol = 2)
 ```
@@ -597,7 +599,7 @@ the data point is an *outlier*. In blue we plot the original line of best fit, a
 we plot the new line of best fit including the outlier. You can see how different the red line
 is from the blue line, which is entirely caused by that one extra outlier data point.
 
-```{r 08-lm-outlier, fig.height = 4, fig.width = 5, message = FALSE, warning = FALSE, echo = FALSE, fig.cap = "Scatter plot of a subset of the data, with outlier highlighted in red."}
+```{r 08-lm-outlier, fig.height = 3.5, fig.width = 4.5, message = FALSE, warning = FALSE, echo = FALSE, fig.cap = "Scatter plot of a subset of the data, with outlier highlighted in red."}
 sacramento_train_small <- sacramento_train |> sample_n(100)
 sacramento_outlier <- tibble(sqft = 5000, price = 50000)
@@ -626,7 +628,7 @@ changes much less when adding the outlier.
 Nevertheless, it is still important when working with linear regression to critically
 think about how much any individual data point is influencing the model.
 
-```{r 08-lm-outlier-2, fig.height = 4, fig.width = 5, warning = FALSE, message = FALSE, echo = FALSE, fig.cap = "Scatter plot of the full data, with outlier highlighted in red."}
+```{r 08-lm-outlier-2, fig.height = 3.5, fig.width = 4.5, warning = FALSE, message = FALSE, echo = FALSE, fig.cap = "Scatter plot of the full data, with outlier highlighted in red."}
 sacramento_outlier <- tibble(sqft = 5000, price = 50000)
 
 lm_plot_outlier_large <- ggplot(sacramento_train, aes(x = sqft, y = price)) +
@@ -660,7 +662,7 @@ Since the two people are each slightly inaccurate, the two measurements might
 not agree exactly, but they are very strongly linearly related to each other,
 as shown in Figure \@ref(fig:08-lm-multicol).
 
-```{r 08-lm-multicol, fig.height = 4, fig.width = 5, warning = FALSE, echo = FALSE, fig.cap = "Scatter plot of the with possible outlier highlighted in red."}
+```{r 08-lm-multicol, fig.height = 3.5, fig.width = 4.5, warning = FALSE, echo = FALSE, fig.cap = "Scatter plot of house size (in square inches) versus house size (in square feet)."}
 sacramento_train <- sacramento_train |>
   mutate(sqft1 = sqft + 100 * sample(1000000,
     size=nrow(sacramento_train),
@@ -793,7 +795,7 @@ df <- df |>
 df
 ```
 
-```{r 08-predictor-design, message = FALSE, warning = FALSE, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Example of a data set with a nonlinear relationship between the predictor and the response."}
+```{r 08-predictor-design, message = FALSE, warning = FALSE, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Example of a data set with a nonlinear relationship between the predictor and the response."}
 curve_plt <- ggplot(df, aes(x = x, y = y)) +
   geom_point() +
   xlab("x") +
@@ -820,7 +822,7 @@ Note that none of the `y` response values have changed between Figures \@ref(fig
 and \@ref(fig:08-predictor-design-2); the only change is that the `x` values
 have been replaced by `z` values.
 
-```{r 08-predictor-design-2, message = FALSE, warning = FALSE, echo = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Relationship between the transformed predictor and the response."}
+```{r 08-predictor-design-2, message = FALSE, warning = FALSE, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Relationship between the transformed predictor and the response."}
 curve_plt2 <- ggplot(df, aes(x = z, y = y)) +
   geom_point() +
   xlab(paste0("z = ", expression(x^3))) +
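
Every hunk in this commit edits the same knitr chunk options: `fig.height` and `fig.width` (figure dimensions in inches) shrink from 4 and 5 to 3.5 and 4.5, one `fig.cap` is corrected, and two plots gain a `theme()` call. A minimal sketch of this chunk-header pattern follows; the chunk name, caption, and plotted data here are hypothetical, not taken from this commit:

```{r example-fig, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.cap = "Hypothetical caption for the rendered figure."}
# fig.height and fig.width set the rendered figure size in inches;
# fig.cap supplies the caption that bookdown's \@ref(fig:example-fig)
# cross-reference syntax resolves against the chunk label.
plot(cars)
```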
