Commit cf87404

inference copyedits
1 parent c22ffd2 commit cf87404

1 file changed (+20 −20 lines)

inference.Rmd

Lines changed: 20 additions & 20 deletions
@@ -33,7 +33,7 @@ populations and then introduce two common techniques in statistical inference:
 *point estimation* and *interval estimation*.
 
 ## Chapter learning objectives
-By the end of the chapter, readers will be able to:
+By the end of the chapter, readers will be able to do the following:
 
 * Describe real-world examples of questions that can be answered with statistical inference.
 * Define common population parameters (e.g., mean, proportion, standard deviation) that are often estimated using sampled data, and estimate these from a sample.
@@ -74,7 +74,7 @@ subset of individuals collected from the population. We can then compute a
 estimates the population parameter. For example, suppose we randomly selected
 ten undergraduate students across North America (the sample) and computed the
 proportion of those students who own an iPhone (the sample estimate). In that
-case, we might suspect that that proportion is a reasonable estimate of the
+case, we might suspect that proportion is a reasonable estimate of the
 proportion of students who own an iPhone in the entire population. Figure
 \@ref(fig:11-population-vs-sample) illustrates this process.
 In general, the process of using a sample to make a conclusion about the
@@ -96,10 +96,10 @@ formulate the following question:
 
 In this case, the population consists of all studio apartment rentals in Vancouver, and the
 population parameter is the *average price-per-month*. Here we used the average
-as a measure of center to describe the "typical value" of studio apartment
+as a measure of the center to describe the "typical value" of studio apartment
 rental prices. But even within this one example, we could also be interested in
 many other population parameters. For instance, we know that not every studio
-apartment rental in Vancouver will have the same price-per-month. The student
+apartment rental in Vancouver will have the same price per month. The student
 might be interested in how much monthly prices vary and want to find a measure
 of the rentals' spread (or variability), such as the standard deviation. Or perhaps the
 student might be interested in the fraction of studio apartment rentals that
@@ -252,8 +252,8 @@ samples
 ```
 
 Notice that the column `replicate` indicates the replicate, or sample, to which
-each listing belongs. Above since by default R only prints the first few rows,
-it looks like all of the listing have `replicate` set to 1. But you can
+each listing belongs. Above, since by default R only prints the first few rows,
+it looks like all of the listings have `replicate` set to 1. But you can
 check the last few entries using the `tail()` function to verify that
 we indeed created 20,000 samples (or replicates).
 
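The replicate-tagging scheme this hunk describes can be sketched in base R. This is a hypothetical toy example with a simulated population of rental prices, not the chapter's Airbnb data or its actual sampling code:

```r
set.seed(123)

# Toy population: 40,000 simulated monthly rental prices (an assumption)
population <- rnorm(40000, mean = 1500, sd = 300)

# Draw 20,000 replicates, each a random sample of size 40,
# tagging every drawn value with its replicate number
n_reps <- 20000
sample_size <- 40
samples <- data.frame(
  replicate = rep(seq_len(n_reps), each = sample_size),
  price = unlist(lapply(seq_len(n_reps), function(i) {
    sample(population, size = sample_size)
  }))
)

# The first rows all show replicate 1, just as in the text;
# tail() reveals the final replicate number
head(samples$replicate)
tail(samples$replicate)
```

As in the chapter, printing the head alone is misleading (everything looks like replicate 1); checking `tail()` confirms all 20,000 replicates are present.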
@@ -623,7 +623,7 @@ mean is roughly bell-shaped. \index{sampling distribution!effect of sample size}
 --->
 
 ### Summary
-1. A point estimate is a single value computed using a sample from a population (e.g., a mean or proportion)
+1. A point estimate is a single value computed using a sample from a population (e.g., a mean or proportion).
 2. The sampling distribution of an estimate is the distribution of the estimate for all possible samples of a fixed size from the same population.
 3. The shape of the sampling distribution is usually bell-shaped with one peak and centered at the population mean or proportion.
 4. The spread of the sampling distribution is related to the sample size. As the sample size increases, the spread of the sampling distribution decreases.
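Summary point 4 (the spread of the sampling distribution shrinks as the sample size grows) can be checked with a short simulation. This sketch uses a made-up normal population purely for illustration; it is not code from the chapter:

```r
set.seed(2021)

# A simulated population, used only to demonstrate the effect of sample size
population <- rnorm(10000, mean = 50, sd = 10)

# Approximate the spread of the sampling distribution of the mean
# by drawing many samples of size n and taking the sd of their means
sampling_spread <- function(n, reps = 2000) {
  sd(replicate(reps, mean(sample(population, size = n))))
}

spreads <- sapply(c(10, 50, 200), sampling_spread)
spreads  # decreases as the sample size increases
```

Each successive spread is smaller, matching the summary's claim.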
@@ -745,15 +745,15 @@ approximation that we call **the bootstrap distribution**. \index{bootstrap!dist
 
 This section will explore how to create a bootstrap distribution from a single
 sample using R. The process is visualized in Figure \@ref(fig:11-intro-bootstrap-image).
-For a sample of size $n$, you:
+For a sample of size $n$, you would do the following:
 
-1. Randomly select an observation from the original sample, which was drawn from the population
-2. Record the observation's value
-3. Replace that observation
-4. Repeat steps 1 - 3 (sampling *with* replacement) until you have $n$ observations, which form a bootstrap sample
-5. Calculate the bootstrap point estimate (e.g., mean, median, proportion, slope, etc.) of the $n$ observations in your bootstrap sample
-6. Repeat steps (1) - (5) many times to create a distribution of point estimates (the bootstrap distribution)
-7. Calculate the plausible range of values around our observed point estimate
+1. Randomly select an observation from the original sample, which was drawn from the population.
+2. Record the observation's value.
+3. Replace that observation.
+4. Repeat steps 1 - 3 (sampling *with* replacement) until you have $n$ observations, which form a bootstrap sample.
+5. Calculate the bootstrap point estimate (e.g., mean, median, proportion, slope, etc.) of the $n$ observations in your bootstrap sample.
+6. Repeat steps (1) - (5) many times to create a distribution of point estimates (the bootstrap distribution).
+7. Calculate the plausible range of values around our observed point estimate.
 
 ```{r 11-intro-bootstrap-image, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Overview of the bootstrap process.", fig.retina = 2, out.width="100%"}
 knitr::include_graphics("img/intro-bootstrap.jpeg")
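Steps 1 through 6 of the procedure amount to resampling with replacement in a loop. A minimal base R sketch, assuming a simulated sample and the mean as the point estimate (both assumptions for illustration, not the chapter's data or code):

```r
set.seed(5)

# A single observed sample of size n (simulated stand-in for real data)
observed_sample <- rnorm(40, mean = 1500, sd = 300)
n <- length(observed_sample)

# Steps 1-4: draw n observations *with* replacement from the sample;
# step 5: compute the point estimate (here, the mean);
# step 6: repeat many times to form the bootstrap distribution
bootstrap_means <- replicate(10000, {
  mean(sample(observed_sample, size = n, replace = TRUE))
})

# The bootstrap distribution is centered near the observed sample mean
c(observed = mean(observed_sample), bootstrap = mean(bootstrap_means))
```

Step 7, extracting a plausible range from `bootstrap_means`, is what the percentile confidence interval procedure later in the chapter does.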
@@ -1102,9 +1102,9 @@ confidence level.
 
 To calculate a 95\% percentile bootstrap confidence interval, we will do the following:
 
-1. Arrange the observations in the bootstrap distribution in ascending order
-2. Find the value such that 2.5\% of observations fall below it (the 2.5\% percentile). Use that value as the lower bound of the interval
-3. Find the value such that 97.5\% of observations fall below it (the 97.5\% percentile). Use that value as the upper bound of the interval
+1. Arrange the observations in the bootstrap distribution in ascending order.
+2. Find the value such that 2.5\% of observations fall below it (the 2.5\% percentile). Use that value as the lower bound of the interval.
+3. Find the value such that 97.5\% of observations fall below it (the 97.5\% percentile). Use that value as the upper bound of the interval.
 
 To do this in R, we can use the `quantile()` function:
 \index{quantile}
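The three steps translate directly to a single `quantile()` call, which handles the sorting internally. A sketch using a simulated sample and bootstrap distribution (assumptions for illustration; the chapter applies this to its own bootstrap distribution):

```r
set.seed(9)

# Bootstrap distribution of the mean for a simulated sample of size 40
observed_sample <- rnorm(40, mean = 1500, sd = 300)
bootstrap_means <- replicate(10000,
  mean(sample(observed_sample, replace = TRUE))
)

# Steps 1-3: the 2.5% and 97.5% percentiles of the (sorted)
# bootstrap distribution give the interval's lower and upper bounds
ci <- quantile(bootstrap_means, probs = c(0.025, 0.975))
ci
```

The observed sample mean sits inside this interval, since the bootstrap distribution is centered on it.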
@@ -1172,5 +1172,5 @@ found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
 
-- Chapters 7 to 10 of [Modern Dive](https://moderndive.com/) provide a great next step in learning about inference. In particular, Chapters 7 and 8 cover sampling and bootstrapping using `tidyverse` and `infer` in a slightly more in-depth manner than the present chapter. Chapters 9 and 10 take the next step beyond the scope of this chapter and begin to provide some of the initial mathematical underpinnings of inference and more advanced applications of the concept of inference in testing hypotheses and performing regression. This material offers a great starting point for getting more into the technical side of statistics.
-- Chapters 4 to 7 of [OpenIntro Statistics - Fourth Edition](https://www.openintro.org/) provide a good next step after Modern Dive. Although it is still certainly an introductory text, things get a bit more mathematical here. Depending on your background, you may actually want to start going through Chapters 1 to 3 first, where you will learn some fundamental concepts in probability theory. Although it may seem like a diversion, probability theory is *the language of statistics*; if you have a solid grasp of probability, more advanced statistics will come naturally to you!
+- Chapters 7 to 10 of [*Modern Dive*](https://moderndive.com/) provide a great next step in learning about inference. In particular, Chapters 7 and 8 cover sampling and bootstrapping using `tidyverse` and `infer` in a slightly more in-depth manner than the present chapter. Chapters 9 and 10 take the next step beyond the scope of this chapter and begin to provide some of the initial mathematical underpinnings of inference and more advanced applications of the concept of inference in testing hypotheses and performing regression. This material offers a great starting point for getting more into the technical side of statistics.
+- Chapters 4 to 7 of [*OpenIntro Statistics - Fourth Edition*](https://www.openintro.org/) provide a good next step after *Modern Dive*. Although it is still certainly an introductory text, things get a bit more mathematical here. Depending on your background, you may actually want to start going through Chapters 1 to 3 first, where you will learn some fundamental concepts in probability theory. Although it may seem like a diversion, probability theory is *the language of statistics*; if you have a solid grasp of probability, more advanced statistics will come naturally to you!
