Commit edac90c

Fix a few additional occurrences of mean

1 parent 9950752

1 file changed: 14 additions, 14 deletions

source/inference.Rmd
````diff
@@ -308,7 +308,7 @@ calculate the mean of the sample proportions. \index{sampling distribution!shap
 
 ```{r 11-example-proportions8, echo = TRUE, message = FALSE, warning = FALSE}
 sample_estimates |>
-summarize(mean = mean(sample_proportion))
+summarize(mean_proportion = mean(sample_proportion))
 ```
 
 We notice that the sample proportions are centered around the population
````
````diff
@@ -356,13 +356,13 @@ the average price per night for all the Airbnb listings.
 
 ```{r 11-example-means-popmean, echo = TRUE, message = FALSE, warning = FALSE}
 population_parameters <- airbnb |>
-summarize(pop_mean = mean(price))
+summarize(mean_price = mean(price))
 
 population_parameters
 ```
 
 The price per night of all Airbnb rentals in Vancouver, BC
-is \$`r round(population_parameters$pop_mean,2)`, on average. This value is our
+is \$`r round(population_parameters$mean_price,2)`, on average. This value is our
 population parameter since we are calculating it using the population data. \index{population!parameter}
 
 Now suppose we did not have access to the population data (which is usually the
````
````diff
@@ -401,9 +401,9 @@ The average value of the sample of size 40
 is \$`r round(estimates$mean_price, 2)`. This
 number is a point estimate for the mean of the full population.
 Recall that the population mean was
-\$`r round(population_parameters$pop_mean,2)`. So our estimate was fairly close to
+\$`r round(population_parameters$mean_price,2)`. So our estimate was fairly close to
 the population parameter: the mean was about
-`r round(100*abs(estimates$mean_price - population_parameters$pop_mean)/population_parameters$pop_mean, 1)`%
+`r round(100*abs(estimates$mean_price - population_parameters$mean_price)/population_parameters$mean_price, 1)`%
 off. Note that we usually cannot compute the estimate's accuracy in practice
 since we do not have access to the population parameter; if we did, we wouldn't
 need to estimate it!
````
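The inline expression in this hunk measures the point estimate's error as a percentage of the population parameter: 100 * |estimate - parameter| / parameter, rounded to one decimal place. The base-R sketch below reproduces that arithmetic. The helper name `percent_error()` and the numbers passed to it are hypothetical and do not come from the Airbnb data.

```r
# Hypothetical helper: relative error of a point estimate as a percentage,
# rounded to one decimal place (same formula as the inline expression above).
percent_error <- function(estimate, parameter) {
  round(100 * abs(estimate - parameter) / parameter, 1)
}

# Illustrative numbers only: an estimate of 190 against a parameter of 200
percent_error(190, 200)
# [1] 5
```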
````diff
@@ -447,7 +447,7 @@ about \$`r round(quantile(sample_estimates$mean_price)[2], -1)` and
 a good fraction of cases outside this range (i.e., where the point estimate was
 not close to the population parameter). So it does indeed look like we were
 quite lucky when we estimated the population mean with only
-`r round(100*abs(estimates$mean_price - population_parameters$pop_mean)/population_parameters$pop_mean, 1)`% error.
+`r round(100*abs(estimates$mean_price - population_parameters$mean_price)/population_parameters$mean_price, 1)`% error.
 
 Let's visualize the population distribution, distribution of the sample, and
 the sampling distribution on one plot to compare them in Figure
````
````diff
@@ -815,7 +815,7 @@ boot1_dist <- ggplot(boot1, aes(price)) +
 
 boot1_dist
 
-summarize(boot1, mean = mean(price))
+summarize(boot1, mean_price = mean(price))
 ```
 
 Notice in Figure \@ref(fig:11-bootstrapping3) that the histogram of our bootstrap sample
````
````diff
@@ -861,7 +861,7 @@ these six replicates.
 ```{r 11-bootstrapping-six-bootstrap-samples-means, echo = TRUE, message = FALSE, warning = FALSE}
 six_bootstrap_samples |>
 group_by(replicate) |>
-summarize(mean = mean(price))
+summarize(mean_price = mean(price))
 ```
 
 We can see that the bootstrap sample distributions and the sample means are
````
````diff
@@ -874,12 +874,12 @@ our point estimate to behave if we took another sample.
 ```{r 11-bootstrapping5, echo = TRUE, message = FALSE, warning = FALSE, fig.pos = "H", out.extra="", fig.cap = "Distribution of the bootstrap sample means.", fig.height = 3.5, fig.width = 4.5}
 boot20000_means <- boot20000 |>
 group_by(replicate) |>
-summarize(mean = mean(price))
+summarize(mean_price = mean(price))
 
 boot20000_means
 tail(boot20000_means)
 
-boot_est_dist <- ggplot(boot20000_means, aes(x = mean)) +
+boot_est_dist <- ggplot(boot20000_means, aes(x = mean_price)) +
 geom_histogram(fill = "dodgerblue3", color = "lightgrey") +
 labs(x = "Sample mean price per night (dollars)", y = "Count") +
 theme(text = element_text(size = 12))
````
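This hunk computes one sample mean per bootstrap replicate with `group_by(replicate)` and `summarize(mean_price = mean(price))`. The sketch below performs the same step in base R, so it runs without the chapter's objects. It is not the book's code. The simulated `one_sample` prices, the seed, and the one-row-per-replicate matrix layout are all assumptions made for illustration.

```r
set.seed(1234)
# Hypothetical stand-in for the chapter's sample: 40 simulated nightly prices
one_sample <- round(runif(40, min = 50, max = 300))

n_reps <- 20000
n <- length(one_sample)

# Draw every resample at once, with replacement; each row is one replicate
resamples <- matrix(sample(one_sample, size = n_reps * n, replace = TRUE),
                    nrow = n_reps, ncol = n)

# One mean per replicate, stored under a descriptive column name
boot_means <- data.frame(replicate = seq_len(n_reps),
                         mean_price = rowMeans(resamples))
tail(boot_means)
```

Every draw is independent and made with replacement, so filling the matrix with a single `sample()` call is equivalent to drawing 20,000 separate resamples of size 40. A descriptive column name such as `mean_price`, which this commit adopts, also keeps later expressions like `mean(boot_means$mean_price)` from reading as `mean(...$mean)`.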
````diff
@@ -915,12 +915,12 @@ boot_est_dist_limits <- boot_est_dist +
 xlim(min_x(sampling_dist), max_x(sampling_dist))
 
 annotated_boot_est_dist <- boot_est_dist_limits +
-geom_vline(xintercept = mean(boot20000_means$mean), col = "red") +
+geom_vline(xintercept = mean(boot20000_means$mean_price), col = "red") +
 annotate("text",
 x = max_x(sampling_dist), y = max_count(boot_est_dist_limits),
 vjust = 1,
 hjust = 1,
-label = paste("mean = ", round(mean(boot20000_means$mean), 1)))
+label = paste("mean = ", round(mean(boot20000_means$mean_price), 1)))
 grid.arrange(annotated_sampling_dist + ggtitle("Sampling distribution"),
 annotated_boot_est_dist + ggtitle("Bootstrap distribution"),
 ncol = 2
````
````diff
@@ -936,7 +936,7 @@ second important point is that the means of these two distributions are
 different. The sampling distribution is centered at
 \$`r round(mean(airbnb$price),2)`, the population mean value. However, the bootstrap
 distribution is centered at the original sample's mean price per night,
-\$`r round(mean(boot20000_means$mean), 2)`. Because we are resampling from the
+\$`r round(mean(boot20000_means$mean_price), 2)`. Because we are resampling from the
 original sample repeatedly, we see that the bootstrap distribution is centered
 at the original sample's mean value (unlike the sampling distribution of the
 sample mean, which is centered at the population parameter value).
````
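The context lines in this hunk contrast where the two distributions are centered. The sampling distribution is centered at the population mean, while the bootstrap distribution is centered at the original sample's mean. A small simulation can illustrate the difference. It is a sketch under stated assumptions: the skewed `population` is simulated with `rexp()` rather than taken from the Airbnb listings, and the seed and sizes are arbitrary choices.

```r
set.seed(2024)
# Hypothetical right-skewed "population" of 5000 nightly prices (simulated)
population <- 50 + rexp(5000, rate = 1 / 100)
# One sample of size 40, drawn without replacement from the population
one_sample <- sample(population, size = 40)

n_reps <- 20000
# Sampling distribution: means of fresh samples drawn from the population
sampling_means <- replicate(n_reps, mean(sample(population, size = 40)))
# Bootstrap distribution: means of resamples (with replacement) of the one sample
boot_means <- replicate(n_reps, mean(sample(one_sample, replace = TRUE)))

round(c(population_mean = mean(population),
        sampling_center = mean(sampling_means),
        sample_mean = mean(one_sample),
        bootstrap_center = mean(boot_means)), 2)
```

The first two printed values should nearly agree, and so should the last two. The sample mean, however, will generally differ from the population mean.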
````diff
@@ -1121,7 +1121,7 @@ To do this in R, we can use the `quantile()` function:
 
 ```{r 11-bootstrapping8, echo = T, message = FALSE, warning = FALSE}
 bounds <- boot20000_means |>
-select(mean) |>
+select(mean_price) |>
 pull() |>
 quantile(c(0.025, 0.975))
 
````
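The chunk in this last hunk forms a 95% percentile interval from the 2.5th and 97.5th percentiles of the bootstrap means. The sketch below does the same computation in base R. The objects `one_sample` and `boot_means`, the seed, and the data frame layout are hypothetical stand-ins for the chapter's sample and `boot20000_means`.

```r
set.seed(4321)
# Hypothetical stand-in for the chapter's sample of 40 prices
one_sample <- round(runif(40, min = 50, max = 300))

# 20,000 bootstrap sample means, one per replicate
mean_price <- replicate(20000, mean(sample(one_sample, replace = TRUE)))
boot_means <- data.frame(replicate = seq_along(mean_price), mean_price = mean_price)

# 95% percentile interval: 2.5th and 97.5th percentiles of the bootstrap means
bounds <- quantile(boot_means$mean_price, c(0.025, 0.975))
bounds
```

In the dplyr pipeline above, `select(mean_price) |> pull()` can also be written as `pull(mean_price)`, because `pull()` accepts the column to extract as its argument.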