
Commit 178d846

Merge pull request #554 from UBC-DSCI/dollar-sign-fix
Fixing dollar sign typesetting in inference, reg1, reg2
2 parents fc3bf2a + 53a7fc8 commit 178d846

File tree

3 files changed: +18 additions, -17 deletions


source/inference.Rmd

Lines changed: 8 additions & 7 deletions
@@ -174,7 +174,7 @@ population_proportion <- airbnb |>
 ```
 
 We can see that the proportion of `Entire home/apt` listings in
-the data set is `r round(population_proportion,3)`. This
+the data set is `r round(population_proportion,3)`. This
 value, `r round(population_proportion,3)`, is the population parameter. Remember, this
 parameter value is usually unknown in real data analysis problems, as it is
 typically not possible to make measurements for an entire population.
@@ -398,7 +398,7 @@ estimates
 ```
 
 The average value of the sample of size 40
-is \$`r round(estimates$mean_price, 2)`. This
+is \$`r format(round(estimates$mean_price, 2), nsmall=2)`. This
 number is a point estimate for the mean of the full population.
 Recall that the population mean was
 \$`r round(population_parameters$mean_price,2)`. So our estimate was fairly close to
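The motivation for this hunk's change can be seen in a small R sketch (the price value here is invented, not from the book's Airbnb data): `round()` returns a number, so a trailing zero disappears when the value is knitted inline after a dollar sign, while `format(..., nsmall = 2)` returns a string padded to two decimal places.

```r
# A hypothetical nightly price; round() keeps it numeric, so the
# trailing zero is dropped when printed inline: "$149.9"
price <- 149.90
as.character(round(price, 2))            # "149.9"

# format(..., nsmall = 2) pads to at least two decimal places,
# so the inline result typesets as "$149.90"
format(round(price, 2), nsmall = 2)      # "149.90"
```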
@@ -771,7 +771,7 @@ and use a bootstrap distribution using just a single sample from the population.
 Once again, suppose we are
 interested in estimating the population mean price per night of all Airbnb
 listings in Vancouver, Canada, using a single sample size of 40.
-Recall our point estimate was \$`r round(estimates$mean_price, 2)`. The
+Recall our point estimate was \$`r format(round(estimates$mean_price, 2), nsmall=2)`. The
 histogram of prices in the sample is displayed in Figure \@ref(fig:11-bootstrapping1).
 
 ```{r, echo = F, message = F, warning = F}
@@ -791,7 +791,7 @@ one_sample_dist
 ```
 
 The histogram for the sample is skewed, with a few observations out to the right. The
-mean of the sample is \$`r round(estimates$mean_price, 2)`.
+mean of the sample is \$`r format(round(estimates$mean_price, 2), nsmall=2)`.
 Remember, in practice, we usually only have this one sample from the population. So
 this sample and estimate are the only data we can work with.
 
@@ -1114,7 +1114,8 @@ To calculate a 95\% percentile bootstrap confidence interval, we will do the fol
 
 \newpage
 
-To do this in R, we can use the `quantile()` function:
+To do this in R, we can use the `quantile()` function. Quantiles are expressed in proportions rather than
+percentages, so the 2.5th and 97.5th percentiles would be the 0.025 and 0.975 quantiles, respectively.
 \index{quantile}
 \index{pull}
 \index{select}
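The sentence added in this hunk is easy to demonstrate. A minimal sketch with made-up bootstrap sample means (not the book's bootstrap distribution) shows the `quantile()` call taking proportions, not percentages:

```r
# Hypothetical bootstrap sample means; in the book these would come
# from resampling the single sample of 40 listings many times.
set.seed(1)
boot_means <- rnorm(1000, mean = 150, sd = 10)

# The 2.5th and 97.5th percentiles are the 0.025 and 0.975 quantiles;
# together they bound a 95% percentile bootstrap confidence interval.
bounds <- quantile(boot_means, c(0.025, 0.975))
bounds
```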
@@ -1149,9 +1150,9 @@ boot_est_dist +
 To finish our estimation of the population parameter, we would report the point
 estimate and our confidence interval's lower and upper bounds. Here the sample
 mean price per night of 40 Airbnb listings was
-\$`r round(mean(one_sample$price),2)`, and we are 95\% "confident" that the true
+\$`r format(round(mean(one_sample$price),2), nsmall=2)`, and we are 95\% "confident" that the true
 population mean price per night for all Airbnb listings in Vancouver is between
-\$(`r round(bounds[1],2)`, `r round(bounds[2],2)`).
+\$`r round(bounds[1],2)` and \$`r round(bounds[2],2)`.
 Notice that our interval does indeed contain the true
 population mean value, \$`r round(mean(airbnb$price),2)`\! However, in
 practice, we would not know whether our interval captured the population

source/regression1.Rmd

Lines changed: 6 additions & 6 deletions
@@ -456,8 +456,8 @@ the model and returns the RMSPE for each number of neighbors. In the output of t
 results data frame, we see that the `neighbors` variable contains the value of $K$,
 the mean (`mean`) contains the value of the RMSPE estimated via cross-validation,
 and the standard error (`std_err`) contains a value corresponding to a measure of how uncertain we are in the mean value. A detailed treatment of this
-is beyond the scope of this chapter; but roughly, if your estimated mean is 100,000 and standard
-error is 1,000, you can expect the *true* RMSPE to be somewhere roughly between 99,000 and 101,000 (although it may
+is beyond the scope of this chapter; but roughly, if your estimated mean RMSPE is \$100,000 and standard
+error is \$1,000, you can expect the *true* RMSPE to be somewhere roughly between \$99,000 and \$101,000 (although it may
 fall outside this range). You may ignore the other columns in the metrics data frame,
 as they do not provide any additional insight.
 \index{cross-validation!collect\_metrics}
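The range quoted in this hunk (\$99,000 to \$101,000) is just the mean plus or minus one standard error. A sketch with invented per-fold cross-validation RMSPE values (not the book's Sacramento results) makes the arithmetic concrete:

```r
# Invented RMSPE values from a hypothetical 5-fold cross-validation
fold_rmspe <- c(98500, 101200, 99800, 100700, 99800)

mean_rmspe <- mean(fold_rmspe)                         # estimated mean RMSPE
std_err    <- sd(fold_rmspe) / sqrt(length(fold_rmspe)) # standard error of the mean

# Rough range in the spirit of the passage: mean +/- one standard error
c(lower = mean_rmspe - std_err, upper = mean_rmspe + std_err)
```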
@@ -762,10 +762,10 @@ If we want to compare this multivariable KNN regression model to the model with
 predictor *as part of the model tuning process* (e.g., if we are running forward selection as described
 in the chapter on evaluating and tuning classification models),
 then we must compare the RMSPE estimated using only the training data via cross-validation.
-Looking back, the estimated cross-validation RMSPE for the single-predictor
-model was `r format(round(sacr_min$mean), big.mark=",", nsmall=0, scientific = FALSE)`.
+Looking back, the estimated cross-validation RMSPE for the single-predictor
+model was \$`r format(round(sacr_min$mean), big.mark=",", nsmall=0, scientific = FALSE)`.
 The estimated cross-validation RMSPE for the multivariable model is
-`r format(round(sacr_multi$mean), big.mark=",", nsmall=0, scientific = FALSE)`.
+\$`r format(round(sacr_multi$mean), big.mark=",", nsmall=0, scientific = FALSE)`.
 Thus in this case, we did not improve the model
 by a large amount by adding this additional predictor.
 
@@ -797,7 +797,7 @@ knn_mult_mets
 
 This time, when we performed KNN regression on the same data set, but also
 included number of bedrooms as a predictor, we obtained a RMSPE test error
-of `r format(round(knn_mult_mets |> pull(.estimate)), big.mark=",", nsmall=0, scientific=FALSE)`.
+of \$`r format(round(knn_mult_mets |> pull(.estimate)), big.mark=",", nsmall=0, scientific=FALSE)`.
 Figure \@ref(fig:07-knn-mult-viz) visualizes the model's predictions overlaid on top of the data. This
 time the predictions are a surface in 3D space, instead of a line in 2D space, as we have 2
 predictors instead of 1.
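The `format()` call repeated in these hunks does two jobs for dollar amounts in the tens of thousands: `big.mark = ","` inserts thousands separators, and `scientific = FALSE` prevents the value from knitting in scientific notation. A sketch with a hypothetical RMSPE (not the book's computed value):

```r
rmspe <- 90953.83   # hypothetical RMSPE in USD

# round() drops the cents; big.mark adds comma separators;
# scientific = FALSE forces fixed rather than scientific notation.
format(round(rmspe), big.mark = ",", nsmall = 0, scientific = FALSE)
# "90,954"
```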

source/regression2.Rmd

Lines changed: 4 additions & 4 deletions
@@ -284,7 +284,7 @@ lm_test_results
 ```
 
 Our final model's test error as assessed by RMSPE \index{RMSPE}
-is `r format(round(lm_test_results |> filter(.metric == 'rmse') |> pull(.estimate)), big.mark=",", nsmall=0, scientific=FALSE)`.
+is \$`r format(round(lm_test_results |> filter(.metric == 'rmse') |> pull(.estimate)), big.mark=",", nsmall=0, scientific=FALSE)`.
 Remember that this is in units of the response variable, and here that
 is US Dollars (USD). Does this mean our model is "good" at predicting house
 sale price based off of the predictor of home size? Again, answering this is
@@ -504,7 +504,7 @@ lm_mult_test_results
 ```
 
 Our model's test error as assessed by RMSPE
-is `r format(round(lm_mult_test_results |> filter(.metric == 'rmse') |> pull(.estimate)), big.mark=",", nsmall=0, scientific=FALSE)`.
+is \$`r format(round(lm_mult_test_results |> filter(.metric == 'rmse') |> pull(.estimate)), big.mark=",", nsmall=0, scientific=FALSE)`.
 In the case of two predictors, we can plot the predictions made by our linear regression creates a *plane* of best fit, as
 shown in Figure \@ref(fig:08-3DlinReg).
 
@@ -614,12 +614,12 @@ lm_mult_test_results
 ```
 
 We obtain an RMSPE \index{RMSPE} for the multivariable linear regression model
-of `r format(lm_mult_test_results |> filter(.metric == 'rmse') |> pull(.estimate), big.mark=",", nsmall=0, scientific = FALSE)`. This prediction error
+of \$`r format(lm_mult_test_results |> filter(.metric == 'rmse') |> pull(.estimate), big.mark=",", nsmall=0, scientific = FALSE)`. This prediction error
 is less than the prediction error for the multivariable KNN regression model,
 indicating that we should likely choose linear regression for predictions of
 house sale price on this data set. Revisiting the simple linear regression model
 with only a single predictor from earlier in this chapter, we see that the RMSPE for that model was
-`r format(lm_test_results |> filter(.metric == 'rmse') |> pull(.estimate), big.mark=",", nsmall=0, scientific = FALSE)`,
+\$`r format(lm_test_results |> filter(.metric == 'rmse') |> pull(.estimate), big.mark=",", nsmall=0, scientific = FALSE)`,
 which is slightly higher than that of our more complex model. Our model with two predictors
 provided a slightly better fit on test data than our model with just one.
 As mentioned earlier, this is not always the case: sometimes including more
