1) Clarify mention of paired test
2) Remove reference to missing values warning—this is no longer raised as the missing values from gss have been removed.
3) Remove reference to lack of support for theoretical p-values
vignettes/t_test.Rmd
2 additions & 4 deletions
@@ -27,7 +27,7 @@ library(infer)

 ### Introduction

-In this vignette, we'll walk through conducting $t$-tests and their randomization-based analogue using infer. We'll start out with a 1-sample $t$-test, which compares a sample mean to a hypothesized true mean value. Then, we'll discuss paired $t$-tests, which are a special use case of 1-sample $t$-tests, and evaluate whether differences in paired values (e.g. some measure taken of a person before and after an experiment) differ from 0. Finally, we'll wrap up with 2-sample $t$-tests, testing the difference in means of two populations using a sample of data drawn from them.
+In this vignette, we'll walk through conducting $t$-tests and their randomization-based analogue using infer. We'll start out with a 1-sample $t$-test, which compares a sample mean to a hypothesized true mean value. Then, we'll discuss 2-sample $t$-tests, testing the difference in means of two populations using a sample of data drawn from them. If you're interested in evaluating whether differences in paired values (e.g. some measure taken of a person before and after an experiment) differ from 0, see `vignette("paired", package = "infer")`.

 Throughout this vignette, we'll make use of the `gss` dataset supplied by infer, which contains a sample of data from the General Social Survey. See `?gss` for more information on the variables included and their source. Note that this data (and our examples on it) are for demonstration purposes only, and will not necessarily provide accurate estimates unless weighted properly. For these examples, let's suppose that this dataset is a representative sample of a population we want to learn about: American adults. The data looks like this:

@@ -167,8 +167,6 @@ gss |>

 It looks like both of these distributions are centered near 40 hours a week, but the distribution for those with a degree is slightly right skewed.

-Again, note the warning about missing values---many respondents' values are missing. If we were actually carrying out this hypothesis test, we might look further into how this data was collected; it's possible that whether or not a value in either of these columns is missing is related to what that value would be.
-
 infer's randomization-based analogue to the 2-sample $t$-test is a difference in means test. We'll start off showcasing that test before demonstrating how to carry out a theory-based $t$-test with the package.

 As with the one-sample test, to calculate the observed difference in means, we can use `specify()` and `calculate()`.
@@ -220,7 +218,7 @@ null_dist_2_sample |>
 direction = "two-sided")
 ```

-It looks like our observed statistic of `r observed_statistic` would be unlikely if there was truly no relationship between degree status and number of hours worked. More exactly, we can calculate the p-value; theoretical p-values are not yet supported, so we'll use the randomization-based null distribution to do calculate the p-value.
+It looks like our observed statistic of `r observed_statistic` would be unlikely if there was truly no relationship between degree status and number of hours worked. More exactly, we'll use the randomization-based null distribution to calculate the p-value.