@@ -308,7 +308,7 @@ calculate the mean of the sample proportions. \index{sampling distribution!shap
 
 ```{r 11-example-proportions8, echo = TRUE, message = FALSE, warning = FALSE}
 sample_estimates |>
-  summarize(mean = mean(sample_proportion))
+  summarize(mean_proportion = mean(sample_proportion))
 ```
 
 We notice that the sample proportions are centered around the population
@@ -356,13 +356,13 @@ the average price per night for all the Airbnb listings.
 
 ```{r 11-example-means-popmean, echo = TRUE, message = FALSE, warning = FALSE}
 population_parameters <- airbnb |>
-  summarize(pop_mean = mean(price))
+  summarize(mean_price = mean(price))
 
 population_parameters
 ```
 
 The price per night of all Airbnb rentals in Vancouver, BC
-is \$`r round(population_parameters$pop_mean,2)`, on average. This value is our
+is \$`r round(population_parameters$mean_price,2)`, on average. This value is our
 population parameter since we are calculating it using the population data. \index{population!parameter}
 
 Now suppose we did not have access to the population data (which is usually the
@@ -401,9 +401,9 @@ The average value of the sample of size 40
 is \$`r round(estimates$mean_price, 2)`. This
 number is a point estimate for the mean of the full population.
 Recall that the population mean was
-\$`r round(population_parameters$pop_mean,2)`. So our estimate was fairly close to
+\$`r round(population_parameters$mean_price,2)`. So our estimate was fairly close to
 the population parameter: the mean was about
-`r round(100*abs(estimates$mean_price - population_parameters$pop_mean)/population_parameters$pop_mean, 1)`%
+`r round(100*abs(estimates$mean_price - population_parameters$mean_price)/population_parameters$mean_price, 1)`%
 off. Note that we usually cannot compute the estimate's accuracy in practice
 since we do not have access to the population parameter; if we did, we wouldn't
 need to estimate it!
@@ -447,7 +447,7 @@ about \$`r round(quantile(sample_estimates$mean_price)[2], -1)` and
 a good fraction of cases outside this range (i.e., where the point estimate was
 not close to the population parameter). So it does indeed look like we were
 quite lucky when we estimated the population mean with only
-`r round(100*abs(estimates$mean_price - population_parameters$pop_mean)/population_parameters$pop_mean, 1)`% error.
+`r round(100*abs(estimates$mean_price - population_parameters$mean_price)/population_parameters$mean_price, 1)`% error.
 
 Let's visualize the population distribution, distribution of the sample, and
 the sampling distribution on one plot to compare them in Figure
@@ -815,7 +815,7 @@ boot1_dist <- ggplot(boot1, aes(price)) +
 
 boot1_dist
 
-summarize(boot1, mean = mean(price))
+summarize(boot1, mean_price = mean(price))
 ```
 
 Notice in Figure \@ref(fig:11-bootstrapping3) that the histogram of our bootstrap sample
@@ -861,7 +861,7 @@ these six replicates.
 ```{r 11-bootstrapping-six-bootstrap-samples-means, echo = TRUE, message = FALSE, warning = FALSE}
 six_bootstrap_samples |>
   group_by(replicate) |>
-  summarize(mean = mean(price))
+  summarize(mean_price = mean(price))
 ```
 
 We can see that the bootstrap sample distributions and the sample means are
@@ -874,12 +874,12 @@ our point estimate to behave if we took another sample.
 ```{r 11-bootstrapping5, echo = TRUE, message = FALSE, warning = FALSE, fig.pos = "H", out.extra="", fig.cap = "Distribution of the bootstrap sample means.", fig.height = 3.5, fig.width = 4.5}
 boot20000_means <- boot20000 |>
   group_by(replicate) |>
-  summarize(mean = mean(price))
+  summarize(mean_price = mean(price))
 
 boot20000_means
 tail(boot20000_means)
 
-boot_est_dist <- ggplot(boot20000_means, aes(x = mean)) +
+boot_est_dist <- ggplot(boot20000_means, aes(x = mean_price)) +
   geom_histogram(fill = "dodgerblue3", color = "lightgrey") +
   labs(x = "Sample mean price per night (dollars)", y = "Count") +
   theme(text = element_text(size = 12))
@@ -915,12 +915,12 @@ boot_est_dist_limits <- boot_est_dist +
   xlim(min_x(sampling_dist), max_x(sampling_dist))
 
 annotated_boot_est_dist <- boot_est_dist_limits +
-  geom_vline(xintercept = mean(boot20000_means$mean), col = "red") +
+  geom_vline(xintercept = mean(boot20000_means$mean_price), col = "red") +
   annotate("text",
     x = max_x(sampling_dist), y = max_count(boot_est_dist_limits),
     vjust = 1,
     hjust = 1,
-    label = paste("mean = ", round(mean(boot20000_means$mean), 1)))
+    label = paste("mean = ", round(mean(boot20000_means$mean_price), 1)))
 grid.arrange(annotated_sampling_dist + ggtitle("Sampling distribution"),
   annotated_boot_est_dist + ggtitle("Bootstrap distribution"),
   ncol = 2
@@ -936,7 +936,7 @@ second important point is that the means of these two distributions are
 different. The sampling distribution is centered at
 \$`r round(mean(airbnb$price),2)`, the population mean value. However, the bootstrap
 distribution is centered at the original sample's mean price per night,
-\$`r round(mean(boot20000_means$mean), 2)`. Because we are resampling from the
+\$`r round(mean(boot20000_means$mean_price), 2)`. Because we are resampling from the
 original sample repeatedly, we see that the bootstrap distribution is centered
 at the original sample's mean value (unlike the sampling distribution of the
 sample mean, which is centered at the population parameter value).
@@ -1121,7 +1121,7 @@ To do this in R, we can use the `quantile()` function:
 
 ```{r 11-bootstrapping8, echo = T, message = FALSE, warning = FALSE}
 bounds <- boot20000_means |>
-  select(mean) |>
+  select(mean_price) |>
   pull() |>
   quantile(c(0.025, 0.975))
 
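The last hunk forms a 95% percentile bootstrap interval from the 2.5th and 97.5th quantiles of the renamed `mean_price` column. As a rough, self-contained sketch of that idea (not the chapter's own code), the base R snippet below resamples a small simulated sample 20,000 times. The `prices` vector, its size of 40, and its log-normal shape are assumptions standing in for the Airbnb sample, and base R's `sample()` and `replicate()` take the place of the dplyr pipeline shown in the diff.

```r
# Sketch only: `prices` is simulated, hypothetical data, not the Airbnb listings.
set.seed(2024)
prices <- round(rlnorm(40, meanlog = log(120), sdlog = 0.5), 2)

# Draw 20000 bootstrap samples (same size as the original, with replacement)
# and store each bootstrap sample's mean in a column named mean_price.
n_reps <- 20000
boot_means <- data.frame(
  mean_price = replicate(n_reps, mean(sample(prices, replace = TRUE)))
)

# The bootstrap distribution is centered near the original sample mean,
# not at the (unknown) population mean.
mean(prices)
mean(boot_means$mean_price)

# The middle 95% of the bootstrap means gives a percentile interval.
bounds <- quantile(boot_means$mean_price, c(0.025, 0.975))
bounds
```

Because resampling is random, the bounds shift slightly with the seed; using 20,000 replicates keeps that Monte Carlo variation small.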