Commit 9950752

Use more intuitive column name
1 parent bfd3a99 commit 9950752

1 file changed: +36 -36 lines changed
source/inference.Rmd

Lines changed: 36 additions & 36 deletions
@@ -392,18 +392,18 @@ sample_distribution <- ggplot(one_sample, aes(price)) +
 sample_distribution

 estimates <- one_sample |>
-  summarize(sample_mean = mean(price))
+  summarize(mean_price = mean(price))

 estimates
 ```

 The average value of the sample of size 40
-is \$`r round(estimates$sample_mean, 2)`. This
+is \$`r round(estimates$mean_price, 2)`. This
 number is a point estimate for the mean of the full population.
 Recall that the population mean was
 \$`r round(population_parameters$pop_mean,2)`. So our estimate was fairly close to
 the population parameter: the mean was about
-`r round(100*abs(estimates$sample_mean - population_parameters$pop_mean)/population_parameters$pop_mean, 1)`%
+`r round(100*abs(estimates$mean_price - population_parameters$pop_mean)/population_parameters$pop_mean, 1)`%
 off. Note that we usually cannot compute the estimate's accuracy in practice
 since we do not have access to the population parameter; if we did, we wouldn't
 need to estimate it!
@@ -428,11 +428,11 @@ distribution of sample means for samples of size 40.
 ```{r 11-example-means4, echo = TRUE, message = FALSE, fig.pos = "H", out.extra="", warning = FALSE, fig.cap= "Sampling distribution of the sample means for sample size of 40.", fig.height = 3.5, fig.width = 4.5}
 sample_estimates <- samples |>
   group_by(replicate) |>
-  summarize(sample_mean = mean(price))
+  summarize(mean_price = mean(price))

 sample_estimates

-sampling_distribution_40 <- ggplot(sample_estimates, aes(x = sample_mean)) +
+sampling_distribution_40 <- ggplot(sample_estimates, aes(x = mean_price)) +
   geom_histogram(fill = "dodgerblue3", color = "lightgrey") +
   labs(x = "Sample mean price per night (dollars)", y = "Count") +
   theme(text = element_text(size = 12))
@@ -442,12 +442,12 @@ sampling_distribution_40

 In Figure \@ref(fig:11-example-means4), the sampling distribution of the mean
 has one peak and is \index{sampling distribution!shape} bell-shaped. Most of the estimates are between
-about \$`r round(quantile(sample_estimates$sample_mean)[2], -1)` and
-\$`r round(quantile(sample_estimates$sample_mean)[4], -1)`; but there are
+about \$`r round(quantile(sample_estimates$mean_price)[2], -1)` and
+\$`r round(quantile(sample_estimates$mean_price)[4], -1)`; but there are
 a good fraction of cases outside this range (i.e., where the point estimate was
 not close to the population parameter). So it does indeed look like we were
 quite lucky when we estimated the population mean with only
-`r round(100*abs(estimates$sample_mean - population_parameters$pop_mean)/population_parameters$pop_mean, 1)`% error.
+`r round(100*abs(estimates$mean_price - population_parameters$pop_mean)/population_parameters$pop_mean, 1)`% error.

 Let's visualize the population distribution, distribution of the sample, and
 the sampling distribution on one plot to compare them in Figure
@@ -465,9 +465,9 @@ sample, which will keep the average from being too extreme.
 <!---
 ```{r 11-example-means4.5}
 sample_estimates |>
-  summarize(mean_of_sample_means = mean(sample_mean))
+  summarize(mean_of_sample_means = mean(mean_price))
 ```
-Notice that the mean of the sample means is \$`r round(mean(sample_estimates$sample_mean),2)`. Recall that the population mean
+Notice that the mean of the sample means is \$`r round(mean(sample_estimates$mean_price),2)`. Recall that the population mean
 was \$`r round(mean(airbnb$price),2)`.
 -->

@@ -497,44 +497,44 @@ distribution with a red vertical line.
 ## Sampling n = 20, 50, 100, 500
 sample_estimates_20 <- rep_sample_n(airbnb, size = 20, reps = 20000) |>
   group_by(replicate) |>
-  summarize(sample_mean = mean(price))
+  summarize(mean_price = mean(price))

 sample_estimates_50 <- rep_sample_n(airbnb, size = 50, reps = 20000) |>
   group_by(replicate) |>
-  summarize(sample_mean = mean(price))
+  summarize(mean_price = mean(price))

 sample_estimates_100 <- rep_sample_n(airbnb, size = 100, reps = 20000) |>
   group_by(replicate) |>
-  summarize(sample_mean = mean(price))
+  summarize(mean_price = mean(price))

 sample_estimates_500 <- rep_sample_n(airbnb, size = 500, reps = 20000) |>
   group_by(replicate) |>
-  summarize(sample_mean = mean(price))
+  summarize(mean_price = mean(price))

 ## Sampling distribution n = 20
-sampling_distribution_20 <- ggplot(sample_estimates_20, aes(x = sample_mean)) +
+sampling_distribution_20 <- ggplot(sample_estimates_20, aes(x = mean_price)) +
   geom_histogram(fill = "dodgerblue3", color = "lightgrey") +
   labs(x = "Sample mean price per night (dollars)", y = "Count") +
   ggtitle("n = 20")

 ## Sampling distribution n = 50
-sampling_distribution_50 <- ggplot(sample_estimates_50, aes(x = sample_mean)) +
+sampling_distribution_50 <- ggplot(sample_estimates_50, aes(x = mean_price)) +
   geom_histogram(fill = "dodgerblue3", color = "lightgrey") +
   ylab("Count") +
   xlab("Sample mean price per night (dollars)") +
   ggtitle("n = 50") +
   xlim(min_x(sampling_distribution_20), max_x(sampling_distribution_20))

 ## Sampling distribution n = 100
-sampling_distribution_100 <- ggplot(sample_estimates_100, aes(x = sample_mean)) +
+sampling_distribution_100 <- ggplot(sample_estimates_100, aes(x = mean_price)) +
   geom_histogram(fill = "dodgerblue3", color = "lightgrey") +
   ylab("Count") +
   xlab("Sample mean price per night (dollars)") +
   ggtitle("n = 100") +
   xlim(min_x(sampling_distribution_20), max_x(sampling_distribution_20))

 ## Sampling distribution n = 500
-sampling_distribution_500 <- ggplot(sample_estimates_500, aes(x = sample_mean)) +
+sampling_distribution_500 <- ggplot(sample_estimates_500, aes(x = mean_price)) +
   geom_histogram(fill = "dodgerblue3", color = "lightgrey") +
   ylab("Count") +
   xlab("Sample mean price per night (dollars)") +
@@ -544,57 +544,57 @@ sampling_distribution_500 <- ggplot(sample_estimates_500, aes(x = sample_mean))

 ```{r 11-example-means7, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Comparison of sampling distributions, with mean highlighted as a vertical red line."}
 annotated_sampling_dist_20 <- sampling_distribution_20 +
-  geom_vline(xintercept = mean(sample_estimates$sample_mean), col = "red") +
+  geom_vline(xintercept = mean(sample_estimates$mean_price), col = "red") +
   xlim(min_x(sampling_distribution_20), max_x(sampling_distribution_20)) +
   ggtitle("n = 20") +
   annotate("text",
     x = max_x(sampling_distribution_20),
     y = max_count(sampling_distribution_20),
     hjust = 1,
     vjust = 1,
-    label = paste("mean = ", round(mean(sample_estimates$sample_mean), 1))
+    label = paste("mean = ", round(mean(sample_estimates$mean_price), 1))
   )+ theme(text = element_text(size = 12), axis.title=element_text(size=12))
 #+
 # annotate("text", x = max_x(sampling_distribution_20), y = max_count(sampling_distribution_20), hjust = 1, vjust = 3,
-# label = paste("sd = ", round(sd(sample_estimates$sample_mean), 1)))
+# label = paste("sd = ", round(sd(sample_estimates$mean_price), 1)))

 annotated_sampling_dist_50 <- sampling_distribution_50 +
-  geom_vline(xintercept = mean(sample_estimates_50$sample_mean), col = "red") +
+  geom_vline(xintercept = mean(sample_estimates_50$mean_price), col = "red") +
   ## x limits set the same as n = 20 graph, y is this graph
   annotate("text",
     x = max_x(sampling_distribution_20),
     y = max_count(sampling_distribution_50),
     hjust = 1,
     vjust = 1,
-    label = paste("mean = ", round(mean(sample_estimates_50$sample_mean), 1))
+    label = paste("mean = ", round(mean(sample_estimates_50$mean_price), 1))
   )+ theme(text = element_text(size = 12), axis.title=element_text(size=12)) #+
 # annotate("text", x = max_x(sampling_distribution_20), y = max_count(sampling_distribution_50), hjust = 1, vjust = 3,
-# label = paste("sd = ", round(sd(sample_estimates_50$sample_mean), 1)))
+# label = paste("sd = ", round(sd(sample_estimates_50$mean_price), 1)))

 annotated_sampling_dist_100 <- sampling_distribution_100 +
-  geom_vline(xintercept = mean(sample_estimates_100$sample_mean), col = "red") +
+  geom_vline(xintercept = mean(sample_estimates_100$mean_price), col = "red") +
   annotate("text",
     x = max_x(sampling_distribution_20),
     y = max_count(sampling_distribution_100),
     hjust = 1,
     vjust = 1,
-    label = paste("mean = ", round(mean(sample_estimates_100$sample_mean), 1))
+    label = paste("mean = ", round(mean(sample_estimates_100$mean_price), 1))
   ) + theme(text = element_text(size = 12), axis.title=element_text(size=12)) #+
 # annotate("text", x = max_x(sampling_distribution_20), y = max_count(sampling_distribution_100), hjust = 1, vjust = 3,
-# label = paste("sd = ", round(sd(sample_estimates_100$sample_mean), 1)))
+# label = paste("sd = ", round(sd(sample_estimates_100$mean_price), 1)))

 annotated_sampling_dist_500 <- sampling_distribution_500 +
-  geom_vline(xintercept = mean(sample_estimates_500$sample_mean), col = "red") +
+  geom_vline(xintercept = mean(sample_estimates_500$mean_price), col = "red") +
   annotate("text",
     x = max_x(sampling_distribution_20),
     y = max_count(sampling_distribution_500),
     hjust = 1,
     vjust = 1,
-    label = paste("mean = ", round(mean(sample_estimates_500$sample_mean), 1))
+    label = paste("mean = ", round(mean(sample_estimates_500$mean_price), 1))
   ) + theme(text = element_text(size = 12), axis.title=element_text(size=12))
 #+
 # annotate("text", x = max_x(sampling_distribution_20), y = max_count(sampling_distribution_500), hjust = 1, vjust = 3,
-# label = paste("sd = ", round(sd(sample_estimates_500$sample_mean), 1)))
+# label = paste("sd = ", round(sd(sample_estimates_500$mean_price), 1)))

 grid.arrange(annotated_sampling_dist_20,
   annotated_sampling_dist_50,
@@ -771,7 +771,7 @@ and use a bootstrap distribution using just a single sample from the population.
 Once again, suppose we are
 interested in estimating the population mean price per night of all Airbnb
 listings in Vancouver, Canada, using a single sample size of 40.
-Recall our point estimate was \$`r round(estimates$sample_mean, 2)`. The
+Recall our point estimate was \$`r round(estimates$mean_price, 2)`. The
 histogram of prices in the sample is displayed in Figure \@ref(fig:11-bootstrapping1).

 ```{r, echo = F, message = F, warning = F}
@@ -791,7 +791,7 @@ one_sample_dist
 ```

 The histogram for the sample is skewed, with a few observations out to the right. The
-mean of the sample is \$`r round(estimates$sample_mean, 2)`.
+mean of the sample is \$`r round(estimates$mean_price, 2)`.
 Remember, in practice, we usually only have this one sample from the population. So
 this sample and estimate are the only data we can work with.

@@ -895,21 +895,21 @@ samples <- rep_sample_n(airbnb, size = 40, reps = 20000)

 sample_estimates <- samples |>
   group_by(replicate) |>
-  summarize(sample_mean = mean(price))
+  summarize(mean_price = mean(price))

-sampling_dist <- ggplot(sample_estimates, aes(x = sample_mean)) +
+sampling_dist <- ggplot(sample_estimates, aes(x = mean_price)) +
   geom_histogram(fill = "dodgerblue3", color = "lightgrey") +
   ylab("Count") +
   xlab("Sample mean price per night (dollars)")

 annotated_sampling_dist <- sampling_dist +
   xlim(min_x(sampling_dist), max_x(sampling_dist)) +
-  geom_vline(xintercept = mean(sample_estimates$sample_mean), col = "red") +
+  geom_vline(xintercept = mean(sample_estimates$mean_price), col = "red") +
   annotate("text",
     x = max_x(sampling_dist), y = max_count(sampling_dist),
     hjust = 1,
     vjust = 1,
-    label = paste("mean = ", round(mean(sample_estimates$sample_mean), 1)))
+    label = paste("mean = ", round(mean(sample_estimates$mean_price), 1)))

 boot_est_dist_limits <- boot_est_dist +
   xlim(min_x(sampling_dist), max_x(sampling_dist))

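The rename in this commit touches both the `summarize()` calls and every later `$` or `aes()` reference to the column. A minimal sketch of that dependency, assuming the dplyr package is installed and using a small made-up data frame (`toy_sample` and its prices are hypothetical stand-ins, not data from the chapter):

```r
library(dplyr)

# Hypothetical stand-in for the chapter's `one_sample` data frame.
toy_sample <- data.frame(price = c(100, 150, 250))

# The name on the left of `=` inside summarize() becomes the column name
# of the one-row result.
estimates <- toy_sample |>
  summarize(mean_price = mean(price))

# Downstream code must use the new name.
estimates$mean_price

# The old name is no longer a column of the result.
"sample_mean" %in% names(estimates)
```

Because nothing checks these names when the file is edited, a reference left under the old name would only show up as a problem when the document is rendered, which may be why the commit updates all 36 occurrences in one pass.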