|
182 | 182 | "metadata": {},
|
183 | 183 | "source": [
|
184 | 184 | "In this dataset we have both integer and character variables. Gender is provided as a binary variable \n",
|
185 |
| - "(also known as an indicator or dummay variable called ```male``` coded as 0 \n", |
186 |
| - "(for female) and 1 (for male). This is not nessecary, if it was coded as a factor with the levels \n", |
| 185 | + "(also known as an indicator or dummy variable) called ```male``` coded as 0 \n", |
| 186 | + "(for female) and 1 (for male). This is not strictly necessary; if it were coded as a factor with the levels \n", |
187 | 187 | "\"Female\" and \"Male\", R would be happy to use it in the \n",
|
188 | 188 | "statistical functions. Coding it in this way, however, is more closely aligned with how it is used in statistics, \n",
|
189 |
| - "and may make interpetation easier. \n", |
| 189 | + "and may make interpretation easier. \n", |
190 | 190 | "\n",
|
191 |
| - "The variable `intervention` is a character variable, although arguably should be coded as a factor. However,\n", |
| 191 | + "The variable `intervention` is a character variable, although arguably this should be coded as a factor. However,\n", |
192 | 192 | "in most cases R will convert it to a factor within the function calls if it is needed.\n",
|
193 | 193 | "\n",
|
194 | 194 | "\n",
|
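The recoding described above can be done with ```factor()```. The sketch below is a minimal illustration only. The short vectors are invented and are not taken from ```bp_dataset```, and the intervention labels "A" and "B" are hypothetical.

```r
# Hypothetical example values (not from bp_dataset)
male <- c(0, 1, 1, 0)
intervention <- c("A", "B", "A", "B")

# Recode the 0/1 indicator as a factor with readable labels:
# 0 becomes "Female", 1 becomes "Male"
sex <- factor(male, levels = c(0, 1), labels = c("Female", "Male"))
print(sex)

# Convert the character variable to a factor explicitly;
# levels are sorted alphabetically by default
intervention_f <- factor(intervention)
print(levels(intervention_f))
```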
|
562 | 562 | "command requires 3 arguments:\n",
|
563 | 563 | "\n",
|
564 | 564 | " + a numeric variable which we want to summarise (in the example below this is ```age```) \n",
|
565 |
| - " + a categorical variable indicating the subgroups,which we want to group by (in the example below this is ```male```)\n", |
| 565 | + " + a categorical variable indicating the subgroups which we want to group by (in the example below this is ```male```)\n", |
566 | 566 | " + the function we wish to call on each subgroup (in the example below this is ```mean```)"
|
567 | 567 | ]
|
568 | 568 | },
|
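The three arguments listed above map directly onto a ```tapply()``` call. In the minimal sketch below, ```bp_toy``` is a hypothetical stand-in for ```bp_dataset```, and its values are invented.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  age  = c(34, 51, 47, 62, 58, 45, 39, 66),
  male = c(0, 1, 0, 1, 1, 0, 1, 0)
)

# tapply(variable to summarise, grouping variable, function per group)
mean_age_by_sex <- tapply(bp_toy$age, bp_toy$male, mean)
print(mean_age_by_sex)  # group "0" = 48, group "1" = 52.5 for this toy data
```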
|
597 | 597 | "id": "d458ec38-644c-4097-88d9-aa7b7a0b5eb8",
|
598 | 598 | "metadata": {},
|
599 | 599 | "source": [
|
600 |
| - "If we want to calculate a summary statistics for a sub group, by subsetting it and providing the subset to the ```mean()``` function. For example \n", |
601 |
| - "to calucate the mean of just the means (i.e where male==1). \n", |
602 |
| - "Note the use of ```==``` for specifiying an equality condition.\n" |
| 600 | + "If we want to calculate a summary statistic for a subgroup, we can subset the data and provide the subset to the ```mean()``` function. For example, \n", |
| 601 | + "to calculate the mean of just the males (i.e. where male==1): \n", |
| 602 | + "Note the use of ```==``` for specifying an equality condition.\n" |
603 | 603 | ]
|
604 | 604 | },
|
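The subsetting approach can be sketched as follows. A logical condition built with ```==``` selects the rows, and ```mean()``` is applied to that subset. The data frame ```bp_toy``` is hypothetical, and its values are invented.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  age  = c(34, 51, 47, 62, 58, 45, 39, 66),
  male = c(0, 1, 0, 1, 1, 0, 1, 0)
)

# Logical vector: TRUE where male == 1
is_male <- bp_toy$male == 1

# Keep only the ages for those rows, then take the mean
mean_age_males <- mean(bp_toy$age[is_male])
print(mean_age_males)  # 52.5 for this toy data
```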
605 | 605 | {
|
|
666 | 666 | "id": "b01b12d0-8de4-407f-b3f2-6d7366f5fa41",
|
667 | 667 | "metadata": {},
|
668 | 668 | "source": [
|
669 |
| - "If we want to calculate summary statistics for combinations of groups for example by sex and intervention group, we can use the ```aggregate()```\n", |
670 |
| - " command. We use the formula method to specify which variables we want summarise (on the left hand side of the ~) and which we want to group by\n", |
| 669 | + "If we want to calculate summary statistics for combinations of groups (for example, by sex and intervention group), we can use the ```aggregate()```\n", |
| 670 | + " command. We use the formula method to specify which variables we want to summarise (on the left hand side of the ~) and which we want to group by\n", |
671 | 671 | " (on the right hand side of the ~). If all the variables are included in\n",
|
672 | 672 | " a single data.frame, we can construct the formula using just the column names, and include the argument ```data``` to specify which object these are found\n",
|
673 |
| - " in. The ```FUN``` argument specfies which function you want to apply to these data, which is the mean in this example. \n", |
| 673 | + " in. The ```FUN``` argument specifies which function you want to apply to these data, which is the mean in this example. \n", |
674 | 674 | " Note that we need to include the dataset as an argument, even though we have attached ```bp_dataset```.\n",
|
675 | 675 | "\n"
|
676 | 676 | ]
|
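Here is a minimal sketch of the formula method for ```aggregate()```, using a hypothetical data frame ```bp_toy``` with invented values. The intervention labels "A" and "B" are assumptions, not the labels used in ```bp_dataset```.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  age          = c(34, 51, 47, 62, 58, 45, 39, 66),
  male         = c(0, 1, 0, 1, 1, 0, 1, 0),
  intervention = c("A", "B", "A", "B", "A", "B", "A", "B")
)

# Left of ~: variable to summarise; right of ~: grouping variables.
# data tells aggregate() where to find the columns; FUN is applied per group.
age_by_group <- aggregate(age ~ male + intervention, data = bp_toy, FUN = mean)
print(age_by_group)  # one row per sex/intervention combination
```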
|
754 | 754 | "id": "a9e365e2-0e41-46ce-ab51-0f346d047625",
|
755 | 755 | "metadata": {},
|
756 | 756 | "source": [
|
757 |
| - "We can also produce cross-tabulations for two categorical variables. The first variable will form the rows, and The\n", |
| 757 | + "We can also produce cross-tabulations for two categorical variables. The first variable will form the rows, and the\n", |
758 | 758 | "second variable the columns. "
|
759 | 759 | ]
|
760 | 760 | },
|
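A two-way cross-tabulation can be sketched with ```table()```. In the sketch below, ```bp_toy``` and its values are hypothetical. The first argument gives the rows and the second gives the columns.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  male         = c(0, 1, 0, 1, 1, 0, 1, 0),
  intervention = c("A", "B", "A", "B", "A", "B", "A", "B")
)

# Rows = male (0/1), columns = intervention group
tab <- table(bp_toy$male, bp_toy$intervention)
print(tab)
```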
|
791 | 791 | "id": "e2ee8182-1593-4419-aee5-dd9719608253",
|
792 | 792 | "metadata": {},
|
793 | 793 | "source": [
|
794 |
| - "In fact we can produce tables for more than 2 categorical variables. R will print these as a series of 2 dimensional\n", |
| 794 | + "In fact, we can produce tables for more than 2 categorical variables. R will print these as a series of 2-dimensional\n", |
795 | 795 | "tables for fixed values of the other variables.\n",
|
796 | 796 | "\n",
|
797 | 797 | "If we also wish to calculate proportions or percentages, we can use the ```prop.table()``` command. We first need to create the \n",
|
|
867 | 867 | "id": "f96f09d3-a6ca-475a-a2c8-c02d171ed663",
|
868 | 868 | "metadata": {},
|
869 | 869 | "source": [
|
870 |
| - "We can also create a table for a subgroup of the data by providing just a subset of the data to the ```table()``` function. For example to count the number of each sex, only for people aged over 50:\n" |
| 870 | + "We can also create a table for a subgroup of the data by providing just a subset of the data to the ```table()``` function. For example, to count the number of each sex, only for people aged over 50:\n" |
871 | 871 | ]
|
872 | 872 | },
|
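A subgroup table can be sketched by passing only the subset to ```table()```. The result can then be converted to proportions with ```prop.table()```. The data frame ```bp_toy``` below is hypothetical, with invented values.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  age  = c(34, 51, 47, 62, 58, 45, 39, 66),
  male = c(0, 1, 0, 1, 1, 0, 1, 0)
)

# Count each sex, only for people aged over 50
sex_over_50 <- table(bp_toy$male[bp_toy$age > 50])
print(sex_over_50)              # "0": 1, "1": 3 for this toy data

# Convert the counts to proportions of the subgroup
print(prop.table(sex_over_50))  # 0.25 and 0.75
```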
873 | 873 | {
|
|
903 | 903 | "metadata": {},
|
904 | 904 | "source": [
|
905 | 905 | "## Activity \n",
|
906 |
| - "1. Create table of frequencies of the number of indivduals in each intervention group. Convert this into \n", |
| 906 | + "1. Create a table of frequencies of the number of individuals in each intervention group. Convert this into \n", |
907 | 907 | "a table of percentages.\n",
|
908 | 908 | "\n",
|
909 |
| - "2. Create table of frequencies of each intervention group stratified by whether individual's baseline blood pressure is \n", |
| 909 | + "2. Create a table of frequencies of each intervention group stratified by whether the individual's baseline blood pressure is \n", |
910 | 910 | "greater than or equal to 180.\n",
|
911 | 911 | "\n",
|
912 | 912 | "# Common statistical tests: One-sample t-test\n",
|
913 | 913 | "\n",
|
914 | 914 | "There are several types of t-test. We will start with the simplest: a one-sample two-sided t-test to test the null hypothesis that the true mean \n",
|
915 |
| - "value of a continuous variable is equal to a pre-specified value. The default behaviour, and the most common application is to compare to a value of 0.\n", |
| 915 | + "value of a continuous variable is equal to a pre-specified value. The default behaviour, and the most common application, is to compare to a value of 0.\n", |
916 | 916 | "\n"
|
917 | 917 | ]
|
918 | 918 | },
|
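The sketch below runs a one-sample, two-sided t-test with ```t.test()```. By default the test compares against 0, and the ```mu``` argument changes the pre-specified value. The baseline values in ```bp_toy``` are invented, and 170 is an arbitrary illustrative null value.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  bp_baseline = c(150, 182, 165, 190, 176, 160, 171, 185)
)

# Default: H0 is that the true mean equals 0
res_zero <- t.test(bp_toy$bp_baseline)

# mu sets a different pre-specified value (170 is arbitrary here)
res_170 <- t.test(bp_toy$bp_baseline, mu = 170)
print(res_170)
```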
|
957 | 957 | "id": "ae02ccda-a868-44ba-812e-324c60b55408",
|
958 | 958 | "metadata": {},
|
959 | 959 | "source": [
|
960 |
| - "We can see in the console, we get a more verbose output than we have seen before. Mainly because the result of statistical test often includes multiple\n", |
961 |
| - "statistics, and the orginial writers of the ```t.test()``` function have made an effort to present these back to the user in an easy to intepret way. \n", |
| 960 | + "In the console, we can see more verbose output than we have seen before. This is mainly because the result of a statistical test often includes multiple\n", |
| 961 | + "statistics, and the original writers of the ```t.test()``` function have made an effort to present these back to the user in an easy-to-interpret way. \n", |
962 | 962 | "\n",
|
963 |
| - "We can see there is a statement at the top of the output reminding or confirming which statistical test we have performed, and underneath this a confirmation\n", |
| 963 | + "We can see there is a statement at the top of the output reminding or confirming which statistical test we have performed, and underneath this is a confirmation\n", |
964 | 964 | "of which variable/data this was performed on. \n",
|
965 | 965 | "\n",
|
966 | 966 | "We then have a line of multiple test statistics, including the p-value. Here we can see our test result is highly significant with p < 2.2e-16. Given \n",
|
|
1109 | 1109 | "id": "99e71401-67b3-4c68-a087-fd6d37d900f6",
|
1110 | 1110 | "metadata": {},
|
1111 | 1111 | "source": [
|
1112 |
| - "If you are only performing one (or a few) statistical tests, and you are working interatively then you might be ok to manually copy the result from the \n", |
| 1112 | + "If you are only performing one (or a few) statistical tests and you are working interactively, then it may be fine to copy the result manually from the \n", |
1113 | 1113 | "console. However, there are likely to be times when you want to extract the result from the test for further processing, for example, to enter it into a table to \n",
|
1114 | 1114 | "save to your computer. We can save the output of a t.test (or indeed any other statistical test or function) to a variable, meaning we can manipulate it \n",
|
1115 | 1115 | "further."
|
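Saving a test result to a variable can be sketched as follows. Assigning the output of ```t.test()``` keeps the whole result object, which can be inspected later instead of being printed once to the console. The values in ```bp_toy``` are invented.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  bp_baseline = c(150, 182, 165, 190, 176, 160, 171, 185)
)

# Store the result instead of only printing it
res <- t.test(bp_toy$bp_baseline)

# The stored object has its own class
print(class(res))
```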
|
1191 | 1191 | "id": "df53c9a7-cfeb-457f-b37d-55878c4c5e31",
|
1192 | 1192 | "metadata": {},
|
1193 | 1193 | "source": [
|
1194 |
| - "```htest``` - that's a new one. This is a type of object that has been specfying defined to hold the result of a t-test. It consists of different objects\n", |
1195 |
| - "or slots where didferent parts of the result are stored. We can get a list of this elements with the function ```names()```\n" |
| 1194 | + "```htest``` - that's a new one. This is a type of object that has been specifically defined to hold the result of a hypothesis test such as a t-test. It consists of different objects\n", |
| 1195 | + "or slots where different parts of the result are stored. We can get a list of these elements with the function ```names()```.\n" |
1196 | 1196 | ]
|
1197 | 1197 | },
|
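Listing the elements of an ```htest``` object can be sketched with ```names()```. The exact set of names may vary slightly between R versions, so the sketch only relies on common ones such as ```p.value``` and ```conf.int```. The data are invented.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  bp_baseline = c(150, 182, 165, 190, 176, 160, 171, 185)
)

res <- t.test(bp_toy$bp_baseline)

# Names of the elements stored inside the htest object
print(names(res))
```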
1198 | 1198 | {
|
|
1226 | 1226 | "id": "8a640d03-bf42-437b-bbda-fe24a03ee2ae",
|
1227 | 1227 | "metadata": {},
|
1228 | 1228 | "source": [
|
1229 |
| - "We can see 10 items listed. All of these are named elements stored within our ```htest``` object which we can extract by name using the ```$```. For example \n", |
| 1229 | + "We can see 10 items listed. All of these are named elements stored within our ```htest``` object which we can extract by name using the ```$```. For example, \n", |
1230 | 1230 | "we can get just the p-value as follows:"
|
1231 | 1231 | ]
|
1232 | 1232 | },
|
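The sketch below extracts individual elements with ```$``` and collects them into a one-row data.frame. That kind of row could then be saved, as mentioned earlier. The data are invented, and the column names and the commented-out file name are hypothetical.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  bp_baseline = c(150, 182, 165, 190, 176, 160, 171, 185)
)

res <- t.test(bp_toy$bp_baseline)

# Extract single elements by name
p  <- res$p.value
ci <- res$conf.int  # lower and upper confidence limits

# Collect them into a one-row table (column names are our own choice)
summary_row <- data.frame(
  estimate = unname(res$estimate),
  ci_lower = ci[1],
  ci_upper = ci[2],
  p_value  = p
)
print(summary_row)

# write.csv(summary_row, "t_test_result.csv", row.names = FALSE)  # hypothetical file name
```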
|
1342 | 1342 | "In this example, we want to compare the mean age between males and females, or in other words we want to test whether the true mean age is equal \n",
|
1343 | 1343 | "for males and females.\n",
|
1344 | 1344 | "\n",
|
1345 |
| - "As we have all the data for our response variable (also called outcome variable or dependent variable), age in one object and we have a second object \n", |
| 1345 | + "As we have all the data for our response variable (also called the outcome variable or dependent variable), age, in one object, and we have a second object \n", |
1346 | 1346 | "which indicates which entries are female and which are male, we will use the formula method (signalled by the ```~```)\n",
|
1347 |
| - "for specifying the comparision we want to make. " |
| 1347 | + "for specifying the comparison we want to make. " |
1348 | 1348 | ]
|
1349 | 1349 | },
|
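The formula method for a two-sample t-test can be sketched as follows. The response goes on the left of ```~``` and the two-level grouping variable on the right. The data frame ```bp_toy``` is hypothetical, with invented values.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  age  = c(34, 51, 47, 62, 58, 45, 39, 66),
  male = c(0, 1, 0, 1, 1, 0, 1, 0)
)

# Compare mean age between the two levels of male (0 and 1)
res_sex <- t.test(age ~ male, data = bp_toy)
print(res_sex)  # two sample estimates, one per group
```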
1350 | 1350 | {
|
|
1392 | 1392 | "\n",
|
1393 | 1393 | "1. The name of the test has changed to \"Welch Two Sample t-test\".\n",
|
1394 | 1394 | "2. The alternative hypothesis is different.\n",
|
1395 |
| - "3. There are two samples estimates, one for each group. \n", |
| 1395 | + "3. There are two sample estimates, one for each group. \n", |
1396 | 1396 | "\n",
|
1397 | 1397 | "Instead of using the formula method, we can code a two-sample test where the data for each group is stored in a separate object. To demonstrate this,\n",
|
1398 | 1398 | "we will create two numeric vectors, one with the ages of the female participants and one with the ages of the male participants. We then provide the \n",
|
|
1499 | 1499 | "tags": []
|
1500 | 1500 | },
|
1501 | 1501 | "source": [
|
1502 |
| - "The default behaviour the ```t.test()``` function is to assumes unequal variance. If we wish to repeat the t-test using the assumption of equal variance\n", |
| 1502 | + "The default behaviour of the ```t.test()``` function is to assume unequal variance. If we wish to repeat the t-test using the assumption of equal variance\n", |
1503 | 1503 | "we can include the argument ```var.equal``` and set it to TRUE. "
|
1504 | 1504 | ]
|
1505 | 1505 | },
|
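The sketch below contrasts the default Welch test with the equal-variance version. With ```var.equal = TRUE```, the degrees of freedom become n1 + n2 - 2, which is 6 for the four-plus-four invented observations used here.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  age  = c(34, 51, 47, 62, 58, 45, 39, 66),
  male = c(0, 1, 0, 1, 1, 0, 1, 0)
)

welch   <- t.test(age ~ male, data = bp_toy)                    # default: unequal variances
student <- t.test(age ~ male, data = bp_toy, var.equal = TRUE)  # pooled variance

print(welch$method)
print(student$method)
print(student$parameter)  # degrees of freedom: 4 + 4 - 2 = 6
```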
|
1739 | 1739 | "\n",
|
1740 | 1740 | "# Common statistical tests: Mann-Whitney test\n",
|
1741 | 1741 | "\n",
|
1742 |
| - "An non-parametric alternative to a t-test is a a Mann-Whitney U test, which is performed using by function ```wilcox.test()```. \n", |
| 1742 | + "A non-parametric alternative to a t-test is the Mann-Whitney U test, which is performed using the function ```wilcox.test()```. \n", |
1743 | 1743 | "It works in a very similar way to the two-sample t-test, and many of the arguments are shared.\n",
|
1744 | 1744 | "\n"
|
1745 | 1745 | ]
|
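The formula method carries over to ```wilcox.test()```, as the following sketch shows. The invented ages contain no ties, so R can compute an exact p-value without a warning. With real data containing ties, R falls back to a normal approximation and warns.

```r
# Hypothetical toy data standing in for bp_dataset (values invented, no ties)
bp_toy <- data.frame(
  age  = c(34, 51, 47, 62, 58, 45, 39, 66),
  male = c(0, 1, 0, 1, 1, 0, 1, 0)
)

# Same formula interface as t.test(): response ~ two-level group
res_mw <- wilcox.test(age ~ male, data = bp_toy)
print(res_mw)  # W = 6 for this toy data
```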
|
1780 | 1780 | "id": "09bc9639-41ac-4bca-96eb-4df60a2268f2",
|
1781 | 1781 | "metadata": {},
|
1782 | 1782 | "source": [
|
1783 |
| - "The output is shorter (as the test has less components) but it follows a similar format the that of the t-test.\n", |
| 1783 | + "The output is shorter (as the test has fewer components) but it follows a similar format to that of the t-test.\n", |
1784 | 1784 | "\n",
|
1785 | 1785 | "We can save this output as a variable and extract the p-value in the same way as the t-test. The elements within \n",
|
1786 | 1786 | "the variable are different to the t-test output but they are accessed in the same way.\n",
|
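The sketch below saves the Mann-Whitney result and extracts its p-value with ```$```, as for the t-test. Listing the names shows that the elements differ. For example, by default there is no ```estimate``` element. The data are invented.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  age  = c(34, 51, 47, 62, 58, 45, 39, 66),
  male = c(0, 1, 0, 1, 1, 0, 1, 0)
)

res_mw <- wilcox.test(age ~ male, data = bp_toy)

# Different elements from the t-test output...
print(names(res_mw))
# ...but accessed in the same way
print(res_mw$p.value)
```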
|
2012 | 2012 | "source": [
|
2013 | 2013 | "# Common statistical tests: Correlation\n",
|
2014 | 2014 | "\n",
|
2015 |
| - "When you have two continuous variables, it is likely a correlation statistics that you want to calculate. We can do this with the function ```cor()```.\n", |
2016 |
| - "For example to calculate the correlation between age and BP at baseline we can run:\n" |
| 2015 | + "When you have two continuous variables, it is likely a correlation statistic that you want to calculate. We can do this with the function ```cor()```.\n", |
| 2016 | + "For example, to calculate the correlation between age and BP at baseline we can run:\n" |
2017 | 2017 | ]
|
2018 | 2018 | },
|
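The following sketch calls ```cor()``` on two continuous variables. The ```method``` argument switches from Pearson (the default) to a rank-based coefficient such as Spearman. The values in ```bp_toy``` are invented.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  age         = c(34, 51, 47, 62, 58, 45, 39, 66),
  bp_baseline = c(150, 182, 165, 190, 176, 160, 171, 185)
)

# Pearson correlation (default): a single number, no hypothesis test
r <- cor(bp_toy$age, bp_toy$bp_baseline)
print(r)

# Rank-based alternative
r_s <- cor(bp_toy$age, bp_toy$bp_baseline, method = "spearman")
print(r_s)
```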
2019 | 2019 | {
|
|
2047 | 2047 | "metadata": {},
|
2048 | 2048 | "source": [
|
2049 | 2049 | "The output of this function is very simple compared to the tests we looked at before; it is just a single value. If you compare the function name to \n",
|
2050 |
| - "the names o fthe other functions to perform statistical tests, this doesn't have the suffix ```.test```. This function simply caluclated the value of\n", |
| 2050 | + "the names of the other functions that perform statistical tests, this doesn't have the suffix ```.test```. This function simply calculates the value of\n", |
2051 | 2051 | "the correlation statistic and does not perform any hypothesis testing with it. To do that we need the ```cor.test()``` function. \n",
|
2052 | 2052 | "\n",
|
2053 | 2053 | "We can use ```cor.test``` to find the correlation between `age` and `bp_baseline`\n"
|
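The following sketch uses ```cor.test()```, which returns an ```htest``` object like the other tests. Its ```estimate``` matches the value from ```cor()```, and it adds a p-value and a confidence interval. The data in ```bp_toy``` are invented.

```r
# Hypothetical toy data standing in for bp_dataset (values invented)
bp_toy <- data.frame(
  age         = c(34, 51, 47, 62, 58, 45, 39, 66),
  bp_baseline = c(150, 182, 165, 190, 176, 160, 171, 185)
)

ct <- cor.test(bp_toy$age, bp_toy$bp_baseline)
print(ct)

# Same extraction pattern as for t.test()
print(ct$estimate)
print(ct$p.value)
```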
|