Commit 3b914d0

Proof reading: intro to r stats
1 parent 95af908 commit 3b914d0

1 file changed

+33
-33
lines changed

individual_modules/introduction_to_r/stats.ipynb

Lines changed: 33 additions & 33 deletions
Original file line number | Diff line number | Diff line change
@@ -182,13 +182,13 @@
182182
"metadata": {},
183183
"source": [
184184
"In this dataset we have both integer and character variables. Gender is provided as a binary variable \n",
185-
"(also known as an indicator or dummay variable called ```male``` coded as 0 \n",
186-
"(for female) and 1 (for male). This is not nessecary, if it was coded as a factor with the levels \n",
185+
"(also known as an indicator or dummy variable) called ```male``` coded as 0 \n",
186+
"(for female) and 1 (for male). This is not necessary, if it was coded as a factor with the levels \n",
187187
"\"Female\" and \"Male\" R would be happy to use in the \n",
188188
"statistical functions. Coding it in this way however, is more aligned with how it is used in statistics, \n",
189-
"and may make interpetation easier. \n",
189+
"and may make interpretation easier. \n",
190190
"\n",
191-
"The variable `intervention` is a character variable, although arguably should be coded as a factor. However,\n",
191+
"The variable `intervention` is a character variable, although arguably this should be coded as a factor. However,\n",
192192
"in most cases R will convert it to a factor within the function calls if it is needed.\n",
193193
"\n",
194194
"\n",
@@ -562,7 +562,7 @@
562562
"command requires 3 arguments :\n",
563563
"\n",
564564
" + a numeric variable which we want to summarise (in the example below this is ```age```) \n",
565-
" + a categorical variable indicating the subgroups,which we want to group by (in the example below this is ```male```)\n",
565+
" + a categorical variable indicating the subgroups which we want to group by (in the example below this is ```male```)\n",
566566
" + the function we wish to call on each subgroup (in the example below this is ```mean```)"
567567
]
568568
},
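A minimal sketch of the three-argument `tapply()` call described in this hunk. The data frame `toy` is a hypothetical stand-in, since the notebook's `bp_dataset` is not included in this diff; only the column names `age` and `male` are taken from the text.

```r
# Hypothetical stand-in data; not the notebook's bp_dataset.
toy <- data.frame(
  age  = c(40, 50, 60, 70),
  male = c(0, 0, 1, 1)
)

# Arguments: the numeric variable, the grouping variable, the function to apply.
group_means <- tapply(toy$age, toy$male, mean)
print(group_means)

# The same figure for a single subgroup, obtained by subsetting with ==.
male_mean <- mean(toy$age[toy$male == 1])
print(male_mean)
```

The result of `tapply()` is named by the levels of the grouping variable, so an individual group can be picked out with, for example, `group_means["1"]`.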
@@ -597,9 +597,9 @@
597597
"id": "d458ec38-644c-4097-88d9-aa7b7a0b5eb8",
598598
"metadata": {},
599599
"source": [
600-
"If we want to calculate a summary statistics for a sub group, by subsetting it and providing the subset to the ```mean()``` function. For example \n",
601-
"to calucate the mean of just the means (i.e where male==1). \n",
602-
"Note the use of ```==``` for specifiying an equality condition.\n"
600+
"If we want to calculate a summary statistics for a sub group, we do this by subsetting it and providing the subset to the ```mean()``` function. For example, \n",
601+
"to calculate the mean of just the means (i.e where male==1). \n",
602+
"Note the use of ```==``` for specifying an equality condition.\n"
603603
]
604604
},
605605
{
@@ -666,11 +666,11 @@
666666
"id": "b01b12d0-8de4-407f-b3f2-6d7366f5fa41",
667667
"metadata": {},
668668
"source": [
669-
"If we want to calculate summary statistics for combinations of groups for example by sex and intervention group, we can use the ```aggregate()```\n",
670-
" command. We use the formula method to specify which variables we want summarise (on the left hand side of the ~) and which we want to group by\n",
669+
"If we want to calculate summary statistics for combinations of groups - for example by sex and intervention group, - we can use the ```aggregate()```\n",
670+
" command. We use the formula method to specify which variables we want to summarise (on the left hand side of the ~) and which we want to group by\n",
671671
" (on the right hand side of the ~). If all the variables are included in\n",
672672
" single data.frame, we can construct the formula using just the column names, and include the argument ```data``` to specify which object these are found\n",
673-
" in. The ```FUN``` argument specfies which function you want to apply to these data, which is the mean in this example. \n",
673+
" in. The ```FUN``` argument specifies which function you want to apply to these data, which is the mean in this example. \n",
674674
" Note that we need to include the dataset as an argument, even though we have attached ```bp_dataset```.\n",
675675
"\n"
676676
]
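A minimal sketch of the formula method for `aggregate()` described in this hunk, with the summarised variable on the left of the `~` and the grouping variables on the right. The data frame `toy` and its values are hypothetical; only the column names `age`, `male` and `intervention` come from the text.

```r
# Hypothetical stand-in data; not the notebook's bp_dataset.
toy <- data.frame(
  age          = c(40, 50, 60, 70, 45, 55, 65, 75),
  male         = c(0, 0, 1, 1, 0, 0, 1, 1),
  intervention = c("Drug", "Placebo", "Drug", "Placebo",
                   "Drug", "Placebo", "Drug", "Placebo")
)

# Mean age for every combination of male and intervention.
# The data argument tells aggregate() where to find the columns.
means_by_group <- aggregate(age ~ male + intervention, data = toy, FUN = mean)
print(means_by_group)
```

The result is itself a data frame with one row per combination of groups, which makes it straightforward to filter or save.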
@@ -754,7 +754,7 @@
754754
"id": "a9e365e2-0e41-46ce-ab51-0f346d047625",
755755
"metadata": {},
756756
"source": [
757-
"We can also produce cross-tabulations for two categorical variables. The first variable will form the rows, and The\n",
757+
"We can also produce cross-tabulations for two categorical variables. The first variable will form the rows, and the\n",
758758
"second variable the columns. "
759759
]
760760
},
@@ -791,7 +791,7 @@
791791
"id": "e2ee8182-1593-4419-aee5-dd9719608253",
792792
"metadata": {},
793793
"source": [
794-
"In fact we can produce tables for more than 2 categorical variables. R will print these as a series of 2 dimensional\n",
794+
"In fact, we can produce tables for more than 2 categorical variables. R will print these as a series of 2-dimensional\n",
795795
"tables for fixed values of the other variables.\n",
796796
"\n",
797797
"If we also wish to calculate proportions or percentages, we can use the ```prop.table()``` command. We first need to create the \n",
@@ -867,7 +867,7 @@
867867
"id": "f96f09d3-a6ca-475a-a2c8-c02d171ed663",
868868
"metadata": {},
869869
"source": [
870-
"We can also create a table for a subgroup of the data by providing just a subset of the data to the ```table()``` function. For example to count the number of each sex, only for people aged over 50:\n"
870+
"We can also create a table for a subgroup of the data by providing just a subset of the data to the ```table()``` function. For example, to count the number of each sex, only for people aged over 50:\n"
871871
]
872872
},
873873
{
@@ -903,16 +903,16 @@
903903
"metadata": {},
904904
"source": [
905905
"## Activity \n",
906-
"1. Create table of frequencies of the number of indivduals in each intervention group. Convert this into \n",
906+
"1. Create table of frequencies of the number of individuals in each intervention group. Convert this into \n",
907907
"and a table of percentages.\n",
908908
"\n",
909-
"2. Create table of frequencies of each intervention group stratified by whether individual's baseline blood pressure is \n",
909+
"2. Create a table of frequencies of each intervention group stratified by whether the individual's baseline blood pressure is \n",
910910
"greater than or equal to 180.\n",
911911
"\n",
912912
"# Common statistical tests: One-sample t-test\n",
913913
"\n",
914914
"There are several types of t-test. We will start with the simplest: a one-sample two-sided t-test to test the null hypothesis that the true mean \n",
915-
"value of a continuous variable is equal to a pre-specified value. The default behaviour, and the most common application is to compare to a value of 0.\n",
915+
"value of a continuous variable is equal to a pre-specified value. The default behaviour, and the most common application, is to compare to a value of 0.\n",
916916
"\n"
917917
]
918918
},
@@ -957,10 +957,10 @@
957957
"id": "ae02ccda-a868-44ba-812e-324c60b55408",
958958
"metadata": {},
959959
"source": [
960-
"We can see in the console, we get a more verbose output than we have seen before. Mainly because the result of statistical test often includes multiple\n",
961-
"statistics, and the orginial writers of the ```t.test()``` function have made an effort to present these back to the user in an easy to intepret way. \n",
960+
"We can see in the console, we get a more verbose output than we have seen before. Mainly because the result of a statistical test often includes multiple\n",
961+
"statistics, and the original writers of the ```t.test()``` function have made an effort to present these back to the user in an easy to interpret way. \n",
962962
"\n",
963-
"We can see there is a statement at the top of the output reminding or confirming which statistical test we have performed, and underneath this a confirmation\n",
963+
"We can see there is a statement at the top of the output reminding or confirming which statistical test we have performed, and underneath this is a confirmation\n",
964964
"of which variable/data this was performed on. \n",
965965
"\n",
966966
"We then have a line of multiple test statistics, including the p-value. Here we can see our test result is highly significant with p < 2.2e-16. Given \n",
@@ -1109,7 +1109,7 @@
11091109
"id": "99e71401-67b3-4c68-a087-fd6d37d900f6",
11101110
"metadata": {},
11111111
"source": [
1112-
"If you are only performing one (or a few) statistical tests, and you are working interatively then you might be ok to manually copy the result from the \n",
1112+
"If you are only performing one (or a few) statistical tests, and you are working iteratively then you might be able to manually copy the result from the \n",
11131113
"console. However, there are likely times when you want to extract the result from the test for further processing, for example enter it into a table to \n",
11141114
"save to your computer. We can save the output of a t.test (or indeed any other statistical test or function) to a variable, meaning we can manipulate it \n",
11151115
"further."
@@ -1191,8 +1191,8 @@
11911191
"id": "df53c9a7-cfeb-457f-b37d-55878c4c5e31",
11921192
"metadata": {},
11931193
"source": [
1194-
"```htest``` - that's a new one. This is a type of object that has been specfying defined to hold the result of a t-test. It consists of different objects\n",
1195-
"or slots where didferent parts of the result are stored. We can get a list of this elements with the function ```names()```\n"
1194+
"```htest``` - that's a new one. This is a type of object that has been specifically defined to hold the result of a t-test. It consists of different objects\n",
1195+
"or slots where different parts of the result are stored. We can get a list of this elements with the function ```names()```.\n"
11961196
]
11971197
},
11981198
{
@@ -1226,7 +1226,7 @@
12261226
"id": "8a640d03-bf42-437b-bbda-fe24a03ee2ae",
12271227
"metadata": {},
12281228
"source": [
1229-
"We can see 10 items listed. All of these are named elements stored within our ```htest``` object which we can extract by name using the ```$```. For example \n",
1229+
"We can see 10 items listed. All of these are named elements stored within our ```htest``` object which we can extract by name using the ```$```. For example, \n",
12301230
"we can get just the p-value as follows:"
12311231
]
12321232
},
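A minimal sketch of saving a t-test result and extracting elements by name, as described in the hunks above. The vector `x` is simulated purely for illustration and is an assumption, not data from the notebook.

```r
# Simulated data, for illustration only.
set.seed(1)
x <- rnorm(30, mean = 5)

# Save the test result to a variable instead of only printing it.
res <- t.test(x)

class(res)   # the object class discussed in the text
names(res)   # the named elements stored in the object

# Extract individual elements with $.
p_value  <- res$p.value
estimate <- res$estimate
print(p_value)
print(estimate)
```

Because the elements are ordinary R objects, they can be collected into a data frame or written to a file for later use.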
@@ -1342,9 +1342,9 @@
13421342
"In this example, we want to compare the mean age between males and females, or in other words we want to test whether the true mean age is equal \n",
13431343
"for males and females.\n",
13441344
"\n",
1345-
"As we have all the data for our response variable (also called outcome variable or dependent variable), age in one object and we have a second object \n",
1345+
"As we have all the data for our response variable (also called outcome variable or dependent variable), age in one object, and we have a second object \n",
13461346
"which indicates which entries are female and which are male, we will use the formula method (signalled by the ```~```)\n",
1347-
"for specifying the comparision we want to make. "
1347+
"for specifying the comparison we want to make. "
13481348
]
13491349
},
13501350
{
@@ -1392,7 +1392,7 @@
13921392
"\n",
13931393
"1. The name of the test has changed to \"Welch Two Sample t-test\".\n",
13941394
"2. The alternative hypothesis is different.\n",
1395-
"3. There are two samples estimates, one for each group. \n",
1395+
"3. There are two sample estimates, one for each group. \n",
13961396
"\n",
13971397
"Instead of using the formula method, we can code a two sample test where the data for each group is stored in a separate object. To demonstrate this\n",
13981398
"we will create two numeric vectors, one with the ages of the female participants, and one with the ages of the male partipicants. We then provide the \n",
@@ -1499,7 +1499,7 @@
14991499
"tags": []
15001500
},
15011501
"source": [
1502-
"The default behaviour the ```t.test()``` function is to assumes unequal variance. If we wish to repeat the t-test using the assumption of equal variance\n",
1502+
"The default behaviour of the ```t.test()``` function is to assume unequal variance. If we wish to repeat the t-test using the assumption of equal variance\n",
15031503
"we can include the argument ```var.equal``` and set it to TRUE. "
15041504
]
15051505
},
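A minimal sketch comparing the default (unequal variance) two-sample t-test with the equal-variance version requested through `var.equal = TRUE`. The vectors `age` and `male` are simulated here as an assumption; they only reuse the variable names from the text.

```r
# Simulated data, for illustration only: two groups of 20.
set.seed(42)
age  <- c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 55, sd = 10))
male <- rep(c(0, 1), each = 20)

# Default behaviour: Welch test, which does not assume equal variances.
welch <- t.test(age ~ male)

# Same comparison under the assumption of equal variances.
pooled <- t.test(age ~ male, var.equal = TRUE)

print(welch$method)
print(pooled$method)
print(pooled$parameter)  # degrees of freedom, n1 + n2 - 2 for the pooled test
```

The two versions report different degrees of freedom and therefore slightly different p-values, so it is worth recording which one was used.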
@@ -1739,7 +1739,7 @@
17391739
"\n",
17401740
"# Common statistical tests: Mann-Whitney test\n",
17411741
"\n",
1742-
"An non-parametric alternative to a t-test is a a Mann-Whitney U test, which is performed using by function ```wilcox.test()```. \n",
1742+
"A non-parametric alternative to a t-test is a Mann-Whitney U test, which is performed using function ```wilcox.test()```. \n",
17431743
"It works in a very similar way to the two sample t- test, and many of the arguments are shared.\n",
17441744
"\n"
17451745
]
@@ -1780,7 +1780,7 @@
17801780
"id": "09bc9639-41ac-4bca-96eb-4df60a2268f2",
17811781
"metadata": {},
17821782
"source": [
1783-
"The output is shorter (as the test has less components) but it follows a similar format the that of the t-test.\n",
1783+
"The output is shorter (as the test has less components) but it follows a similar format to that of the t-test.\n",
17841784
"\n",
17851785
"We can save this output as a variable and extract the p-value in the same way as the t-test. The elements within \n",
17861786
"the variable are different to the t-test output but they are accessed in the same way.\n",
@@ -2012,8 +2012,8 @@
20122012
"source": [
20132013
"# Common statistical tests: Correlation\n",
20142014
"\n",
2015-
"When you have two continuous variables, it is likely a correlation statistics that you want to calculate. We can do this with the function ```cor()```.\n",
2016-
"For example to calculate the correlation between age and BP at baseline we can run:\n"
2015+
"When you have two continuous variables, it is likely a correlation statistic that you want to calculate. We can do this with the function ```cor()```.\n",
2016+
"For example, to calculate the correlation between age and BP at baseline we can run:\n"
20172017
]
20182018
},
20192019
{
@@ -2047,7 +2047,7 @@
20472047
"metadata": {},
20482048
"source": [
20492049
"The output of this function is very simple compared to the tests we looked at before, it is just a simple value. If you compare the function name to \n",
2050-
"the names o fthe other functions to perform statistical tests, this doesn't have the suffix ```.test```. This function simply caluclated the value of\n",
2050+
"the names of the other functions to perform statistical tests, this doesn't have the suffix ```.test```. This function simply calculated the value of\n",
20512051
"the correlation statistic and does not perform any hypothesis testing with it. To do that we need the ```cor.test()``` function. \n",
20522052
"\n",
20532053
"We can use ```cor.test``` to find the correlation between `age` and `bp_baseline`\n"
