inference.Rmd (20 additions, 20 deletions)
@@ -33,7 +33,7 @@ populations and then introduce two common techniques in statistical inference:
*point estimation* and *interval estimation*.

## Chapter learning objectives
By the end of the chapter, readers will be able to do the following:

* Describe real-world examples of questions that can be answered with statistical inference.
* Define common population parameters (e.g., mean, proportion, standard deviation) that are often estimated using sampled data, and estimate these from a sample.
@@ -74,7 +74,7 @@ subset of individuals collected from the population. We can then compute a
estimates the population parameter. For example, suppose we randomly selected
ten undergraduate students across North America (the sample) and computed the
proportion of those students who own an iPhone (the sample estimate). In that
case, we might suspect that proportion is a reasonable estimate of the
proportion of students who own an iPhone in the entire population. Figure
\@ref(fig:11-population-vs-sample) illustrates this process.
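As a toy illustration of this vocabulary (a sketch with simulated, made-up data, not the survey described above), we can build a hypothetical population in R and compute a sample proportion from it:

```r
# Sketch with simulated data: a hypothetical population of 10,000
# students, 60% of whom own an iPhone, and a random sample of 10.
set.seed(1)
population <- sample(c("iPhone", "other"), size = 10000,
                     replace = TRUE, prob = c(0.6, 0.4))
my_sample <- sample(population, size = 10)

# The sample proportion is a point estimate of the population proportion.
sample_estimate <- mean(my_sample == "iPhone")
sample_estimate
```

Rerunning this with a different seed gives a different sample and hence a different estimate, which is exactly the sampling variability the rest of the chapter studies.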
In general, the process of using a sample to make a conclusion about the
@@ -96,10 +96,10 @@ formulate the following question:

In this case, the population consists of all studio apartment rentals in Vancouver, and the
population parameter is the *average price-per-month*. Here we used the average
as a measure of the center to describe the "typical value" of studio apartment
rental prices. But even within this one example, we could also be interested in
many other population parameters. For instance, we know that not every studio
apartment rental in Vancouver will have the same price per month. The student
might be interested in how much monthly prices vary and want to find a measure
of the rentals' spread (or variability), such as the standard deviation. Or perhaps the
student might be interested in the fraction of studio apartment rentals that
@@ -252,8 +252,8 @@ samples
```

Notice that the column `replicate` indicates the replicate, or sample, to which
each listing belongs. Above, since by default R only prints the first few rows,
it looks like all of the listings have `replicate` set to 1. But you can
check the last few entries using the `tail()` function to verify that
we indeed created 20,000 samples (or replicates).
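One way to try this yourself is sketched below on a small simulated data set rather than the chapter's rental data (the tibble `toy_rentals` and its `price` column are made up for illustration):

```r
# Sketch on simulated data: draw 20,000 replicate samples with the
# infer package's rep_sample_n(), then inspect the last few rows.
library(tibble)
library(infer)

set.seed(5)
toy_rentals <- tibble(price = runif(200, min = 500, max = 2000))
samples <- rep_sample_n(toy_rentals, size = 40, reps = 20000)

tail(samples)  # the last rows belong to the final replicate
```

Because `rep_sample_n()` numbers replicates from 1 up to `reps`, the `replicate` column in the tail should end at 20,000, confirming that 20,000 samples were drawn.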
@@ -623,7 +623,7 @@ mean is roughly bell-shaped. \index{sampling distribution!effect of sample size}
--->

### Summary
1. A point estimate is a single value computed using a sample from a population (e.g., a mean or proportion).
2. The sampling distribution of an estimate is the distribution of the estimate for all possible samples of a fixed size from the same population.
3. The shape of the sampling distribution is usually bell-shaped with one peak and centered at the population mean or proportion.
4. The spread of the sampling distribution is related to the sample size. As the sample size increases, the spread of the sampling distribution decreases.
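These summary points can be illustrated with a quick simulation (a sketch using a made-up population, not the chapter's data):

```r
# Sketch: compare the spread of the sampling distribution of the mean
# for two sample sizes drawn from the same simulated population.
set.seed(4)
population <- rexp(10000, rate = 1 / 1000)  # a skewed, made-up population

means_n20  <- replicate(2000, mean(sample(population, size = 20)))
means_n100 <- replicate(2000, mean(sample(population, size = 100)))

sd(means_n20)   # spread of the sampling distribution when n = 20
sd(means_n100)  # smaller spread when n = 100
```

Both collections of sample means center near the population mean and look roughly bell-shaped even though the population itself is skewed (point 3), and the spread visibly shrinks as the sample size grows (point 4).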
@@ -745,15 +745,15 @@ approximation that we call **the bootstrap distribution**. \index{bootstrap!dist

This section will explore how to create a bootstrap distribution from a single
sample using R. The process is visualized in Figure \@ref(fig:11-intro-bootstrap-image).
For a sample of size $n$, you would do the following:

1. Randomly select an observation from the original sample, which was drawn from the population.
2. Record the observation's value.
3. Replace that observation.
4. Repeat steps 1 - 3 (sampling *with* replacement) until you have $n$ observations, which form a bootstrap sample.
5. Calculate the bootstrap point estimate (e.g., mean, median, proportion, slope, etc.) of the $n$ observations in your bootstrap sample.
6. Repeat steps (1) - (5) many times to create a distribution of point estimates (the bootstrap distribution).
7. Calculate the plausible range of values around our observed point estimate.
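The steps above can be sketched in base R as follows (a minimal sketch with a simulated original sample; the values and sample size are made up for illustration):

```r
# Minimal base-R sketch of steps 1-6, using a simulated original sample.
set.seed(2)
one_sample <- rnorm(40, mean = 1500, sd = 300)  # hypothetical rent prices
n <- length(one_sample)

boot_means <- replicate(1000, {
  # Steps 1-4: draw n observations from the sample *with* replacement.
  boot_sample <- sample(one_sample, size = n, replace = TRUE)
  mean(boot_sample)  # step 5: the bootstrap point estimate
})

# boot_means now holds 1,000 bootstrap estimates: the bootstrap
# distribution (step 6). Step 7 turns it into a plausible range.
```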
To calculate a 95\% percentile bootstrap confidence interval, we will do the following:

1. Arrange the observations in the bootstrap distribution in ascending order.
2. Find the value such that 2.5\% of observations fall below it (the 2.5\% percentile). Use that value as the lower bound of the interval.
3. Find the value such that 97.5\% of observations fall below it (the 97.5\% percentile). Use that value as the upper bound of the interval.

To do this in R, we can use the `quantile()` function:
\index{quantile}
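For example (a sketch using a simulated stand-in for the bootstrap distribution, since the chapter's actual bootstrap object is not reproduced here):

```r
# Sketch: a 95% percentile interval via quantile(), applied to a
# simulated stand-in for a bootstrap distribution of sample means.
set.seed(3)
boot_means <- rnorm(1000, mean = 1500, sd = 50)  # placeholder values

ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci  # lower (2.5th percentile) and upper (97.5th percentile) bounds
```

Note that `quantile()` handles the "arrange in ascending order" step internally, so steps 1-3 above collapse into a single call.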
@@ -1172,5 +1172,5 @@ found in Chapter \@ref(move-to-your-own-machine).

## Additional resources

- Chapters 7 to 10 of [*Modern Dive*](https://moderndive.com/) provide a great next step in learning about inference. In particular, Chapters 7 and 8 cover sampling and bootstrapping using `tidyverse` and `infer` in a slightly more in-depth manner than the present chapter. Chapters 9 and 10 take the next step beyond the scope of this chapter and begin to provide some of the initial mathematical underpinnings of inference and more advanced applications of the concept of inference in testing hypotheses and performing regression. This material offers a great starting point for getting more into the technical side of statistics.
- Chapters 4 to 7 of [*OpenIntro Statistics - Fourth Edition*](https://www.openintro.org/) provide a good next step after *Modern Dive*. Although it is still certainly an introductory text, things get a bit more mathematical here. Depending on your background, you may actually want to start going through Chapters 1 to 3 first, where you will learn some fundamental concepts in probability theory. Although it may seem like a diversion, probability theory is *the language of statistics*; if you have a solid grasp of probability, more advanced statistics will come naturally to you!