inference.Rmd (20 additions, 20 deletions)
@@ -33,7 +33,7 @@ populations and then introduce two common techniques in statistical inference:
*point estimation* and *interval estimation*.

## Chapter learning objectives
By the end of the chapter, readers will be able to do the following:

* Describe real-world examples of questions that can be answered with statistical inference.
* Define common population parameters (e.g., mean, proportion, standard deviation) that are often estimated using sampled data, and estimate these from a sample.
@@ -74,7 +74,7 @@ subset of individuals collected from the population. We can then compute a
estimates the population parameter. For example, suppose we randomly selected
ten undergraduate students across North America (the sample) and computed the
proportion of those students who own an iPhone (the sample estimate). In that
case, we might suspect that proportion is a reasonable estimate of the
proportion of students who own an iPhone in the entire population. Figure
\@ref(fig:11-population-vs-sample) illustrates this process.
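As a toy illustration of this vocabulary (a sketch with simulated, made-up data, not the survey described above), we can build a hypothetical population in R and compute a sample proportion from it:

```r
# Sketch with simulated data: a hypothetical population of 10,000
# students, 60% of whom own an iPhone, and a random sample of 10.
set.seed(1)
population <- sample(c("iPhone", "other"), size = 10000,
                     replace = TRUE, prob = c(0.6, 0.4))
my_sample <- sample(population, size = 10)

# The sample proportion is a point estimate of the population proportion.
sample_estimate <- mean(my_sample == "iPhone")
sample_estimate
```

Rerunning this with a different seed gives a different sample and hence a different estimate, which is exactly the sampling variability the rest of the chapter studies.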
In general, the process of using a sample to make a conclusion about the
@@ -96,10 +96,10 @@ formulate the following question:

In this case, the population consists of all studio apartment rentals in Vancouver, and the
population parameter is the *average price-per-month*. Here we used the average
as a measure of the center to describe the "typical value" of studio apartment
rental prices. But even within this one example, we could also be interested in
many other population parameters. For instance, we know that not every studio
apartment rental in Vancouver will have the same price per month. The student
might be interested in how much monthly prices vary and want to find a measure
of the rentals' spread (or variability), such as the standard deviation. Or perhaps the
student might be interested in the fraction of studio apartment rentals that
@@ -252,8 +252,8 @@ samples
```

Notice that the column `replicate` indicates the replicate, or sample, to which
each listing belongs. Above, since by default R only prints the first few rows,
it looks like all of the listings have `replicate` set to 1. But you can
check the last few entries using the `tail()` function to verify that
we indeed created 20,000 samples (or replicates).
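One way to try this yourself is sketched below on a small simulated data set rather than the chapter's rental data (the tibble `toy_rentals` and its `price` column are made up for illustration):

```r
# Sketch on simulated data: draw 20,000 replicate samples with the
# infer package's rep_sample_n(), then inspect the last few rows.
library(tibble)
library(infer)

set.seed(5)
toy_rentals <- tibble(price = runif(200, min = 500, max = 2000))
samples <- rep_sample_n(toy_rentals, size = 40, reps = 20000)

tail(samples)  # the last rows belong to the final replicate
```

Because `rep_sample_n()` numbers replicates from 1 up to `reps`, the `replicate` column in the tail should end at 20,000, confirming that 20,000 samples were drawn.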
@@ -623,7 +623,7 @@ mean is roughly bell-shaped. \index{sampling distribution!effect of sample size}
--->

### Summary
1. A point estimate is a single value computed using a sample from a population (e.g., a mean or proportion).
2. The sampling distribution of an estimate is the distribution of the estimate for all possible samples of a fixed size from the same population.
3. The shape of the sampling distribution is usually bell-shaped with one peak and centered at the population mean or proportion.
4. The spread of the sampling distribution is related to the sample size. As the sample size increases, the spread of the sampling distribution decreases.
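These summary points can be illustrated with a quick simulation (a sketch using a made-up population, not the chapter's data):

```r
# Sketch: compare the spread of the sampling distribution of the mean
# for two sample sizes drawn from the same simulated population.
set.seed(4)
population <- rexp(10000, rate = 1 / 1000)  # a skewed, made-up population

means_n20  <- replicate(2000, mean(sample(population, size = 20)))
means_n100 <- replicate(2000, mean(sample(population, size = 100)))

sd(means_n20)   # spread of the sampling distribution when n = 20
sd(means_n100)  # smaller spread when n = 100
```

Both collections of sample means center near the population mean and look roughly bell-shaped even though the population itself is skewed (point 3), and the spread visibly shrinks as the sample size grows (point 4).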
@@ -745,15 +745,15 @@ approximation that we call **the bootstrap distribution**. \index{bootstrap!dist

This section will explore how to create a bootstrap distribution from a single
sample using R. The process is visualized in Figure \@ref(fig:11-intro-bootstrap-image).
For a sample of size $n$, you would do the following:

1. Randomly select an observation from the original sample, which was drawn from the population.
2. Record the observation's value.
3. Replace that observation.
4. Repeat steps 1 - 3 (sampling *with* replacement) until you have $n$ observations, which form a bootstrap sample.
5. Calculate the bootstrap point estimate (e.g., mean, median, proportion, slope, etc.) of the $n$ observations in your bootstrap sample.
6. Repeat steps (1) - (5) many times to create a distribution of point estimates (the bootstrap distribution).
7. Calculate the plausible range of values around our observed point estimate.
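The steps above can be sketched in base R as follows (a minimal sketch with a simulated original sample; the values and sample size are made up for illustration):

```r
# Minimal base-R sketch of steps 1-6, using a simulated original sample.
set.seed(2)
one_sample <- rnorm(40, mean = 1500, sd = 300)  # hypothetical rent prices
n <- length(one_sample)

boot_means <- replicate(1000, {
  # Steps 1-4: draw n observations from the sample *with* replacement.
  boot_sample <- sample(one_sample, size = n, replace = TRUE)
  mean(boot_sample)  # step 5: the bootstrap point estimate
})

# boot_means now holds 1,000 bootstrap estimates: the bootstrap
# distribution (step 6). Step 7 turns it into a plausible range.
```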
To calculate a 95\% percentile bootstrap confidence interval, we will do the following:

1. Arrange the observations in the bootstrap distribution in ascending order.
2. Find the value such that 2.5\% of observations fall below it (the 2.5\% percentile). Use that value as the lower bound of the interval.
3. Find the value such that 97.5\% of observations fall below it (the 97.5\% percentile). Use that value as the upper bound of the interval.

To do this in R, we can use the `quantile()` function:
\index{quantile}
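For example (a sketch using a simulated stand-in for the bootstrap distribution, since the chapter's actual bootstrap object is not reproduced here):

```r
# Sketch: a 95% percentile interval via quantile(), applied to a
# simulated stand-in for a bootstrap distribution of sample means.
set.seed(3)
boot_means <- rnorm(1000, mean = 1500, sd = 50)  # placeholder values

ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci  # lower (2.5th percentile) and upper (97.5th percentile) bounds
```

Note that `quantile()` handles the "arrange in ascending order" step internally, so steps 1-3 above collapse into a single call.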
@@ -1172,5 +1172,5 @@ found in Chapter \@ref(move-to-your-own-machine).

## Additional resources

- Chapters 7 to 10 of [*Modern Dive*](https://moderndive.com/) provide a great next step in learning about inference. In particular, Chapters 7 and 8 cover sampling and bootstrapping using `tidyverse` and `infer` in a slightly more in-depth manner than the present chapter. Chapters 9 and 10 take the next step beyond the scope of this chapter and begin to provide some of the initial mathematical underpinnings of inference and more advanced applications of the concept of inference in testing hypotheses and performing regression. This material offers a great starting point for getting more into the technical side of statistics.
- Chapters 4 to 7 of [*OpenIntro Statistics - Fourth Edition*](https://www.openintro.org/) provide a good next step after *Modern Dive*. Although it is still certainly an introductory text, things get a bit more mathematical here. Depending on your background, you may actually want to start going through Chapters 1 to 3 first, where you will learn some fundamental concepts in probability theory. Although it may seem like a diversion, probability theory is *the language of statistics*; if you have a solid grasp of probability, more advanced statistics will come naturally to you!