# Cheatsheet

Many of the functions in this cheatsheet come from the tidyverse, so load it with `library(tidyverse)` before running them.

## Lists

The one-size-fits-all data structure...

+------------------------------------------+---------------------------------------------------------------------------+------------------+
| English                                  | R Language                                                                | Output type      |
+==========================================+===========================================================================+==================+
| Creating a List                          | `my_list = list("hamburger", 1:100, c(TRUE, TRUE))`                       | List             |
+------------------------------------------+---------------------------------------------------------------------------+------------------+
| Creating a List with names               | `my_list_named = list(l1 = "hamburger", l2 = 1:100, l3 = c(TRUE, TRUE))`  | List             |
+------------------------------------------+---------------------------------------------------------------------------+------------------+
| Names of a List                          | `names(my_list_named)`                                                    | String vector    |
+------------------------------------------+---------------------------------------------------------------------------+------------------+
| Accessing elements of a List             | `my_list[[1]]`                                                            | `"hamburger"`    |
|                                          |                                                                           |                  |
|                                          | `my_list[[2]][3]`                                                         | `3`              |
+------------------------------------------+---------------------------------------------------------------------------+------------------+
| Accessing elements of a List using names | `my_list_named$l1` or `my_list_named[["l1"]]`                             | `"hamburger"`    |
|                                          |                                                                           |                  |
|                                          | `my_list_named$l2[3]` or `my_list_named[["l2"]][3]`                       | `3`              |
+------------------------------------------+---------------------------------------------------------------------------+------------------+
| Treating a Dataframe `my_df` as a List   | `my_df$col1`, `my_df[["col1"]]`                                           | Vector           |
+------------------------------------------+---------------------------------------------------------------------------+------------------+
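
The rows above can be tried out with a short sketch like the one below. The dataframe `my_df` and its columns are made up for illustration.

```r
# A list can hold elements of different types and lengths.
my_list_named = list(l1 = "hamburger", l2 = 1:100, l3 = c(TRUE, TRUE))

names(my_list_named)          # "l1" "l2" "l3"
my_list_named[[1]]            # access by position: "hamburger"
my_list_named$l2[3]           # access by name, then index the vector: 3
my_list_named[["l2"]][3]      # same as the line above

# Single brackets keep the list wrapper; double brackets extract the element.
class(my_list_named["l1"])    # "list"
class(my_list_named[["l1"]])  # "character"

# A dataframe is a list of equal-length columns (my_df is a made-up example).
my_df = data.frame(col1 = c(1.5, 2.5), col2 = c("a", "b"))
my_df$col1                    # 1.5 2.5
is.list(my_df)                # TRUE
```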

## Exploring new data structures

If you encounter an unknown data structure, such as the result of a `t.test()`, how do you explore it?

+---------------------------------------+------------------------------+
| English                               | R Language                   |
+=======================================+==============================+
| What data structure is this?          | `class(x)`                   |
+---------------------------------------+------------------------------+
| What are its attributes?              | `attributes(x)`              |
+---------------------------------------+------------------------------+
| What are its names, if any?           | `names(x)`                   |
+---------------------------------------+------------------------------+
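
As a sketch of that workflow, here is one way to poke at the result of a `t.test()`. The data are simulated for illustration, and `str()` is an extra base R function that is not in the table above.

```r
# Simulated measurements, made up for illustration.
set.seed(1)
result = t.test(rnorm(20, mean = 1))

class(result)        # "htest"
names(result)        # element names, such as "statistic" and "p.value"
attributes(result)   # all attributes: here, the names and the class
str(result)          # compact overview of every element

# Once you know the names, extract elements as you would from any list.
result$p.value
result[["conf.int"]]
```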

## Data type checking and coercing

You have loaded the data. Now what?

+--------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+------------------------+
| English                                                                                    | R Language                                                                                     | Output type            |
+============================================================================================+================================================================================================+========================+
| Is this vector a \_\_\_ type of vector?                                                    | `is.numeric(vec)`, `is.double(vec)`, `is.integer(vec)`, `is.character(vec)`, `is.logical(vec)` | Logical value          |
+--------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+------------------------+
| Convert \_\_\_ type of vector to \_\_\_\_ type of vector.                                  | `as.numeric(vec)`, `as.double(vec)`, `as.integer(vec)`, `as.character(vec)`                    | Vector of desired form |
|                                                                                            |                                                                                                |                        |
| Order of coercing that is allowed: Logical vector -\> Numeric vector -\> Character vector  |                                                                                                |                        |
+--------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------+------------------------+
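
A small sketch of checking and coercing, using a made-up vector. It also shows what happens when you coerce against the allowed order: values that cannot be converted become `NA`.

```r
vec = c("1", "2", "three")

is.character(vec)   # TRUE
is.numeric(vec)     # FALSE

# Character -> numeric works only for strings that look like numbers;
# anything else becomes NA, with a warning (suppressed here).
nums = suppressWarnings(as.numeric(vec))
nums                # 1 2 NA

# Coercing in the allowed direction always produces a value.
as.numeric(c(TRUE, FALSE))   # 1 0
as.character(c(1.5, 2))      # "1.5" "2"

# Combining types in c() coerces to the most general type automatically.
class(c(TRUE, 2, "three"))   # "character"
```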

## Subsetting and removing missing values

+----------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| English                                                                                            | R Language                           | Output type                                          |
+====================================================================================================+======================================+======================================================+
| Subset `vec` to be greater than 0                                                                  | `vec[vec > 0]`                       | Vector                                               |
+----------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| Subset `vec` to have "chris" or "bob"                                                              | `vec[vec == "chris" | vec == "bob"]` | Vector                                               |
+----------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| Where are the missing values in this vector?                                                       | `is.na(vec)`                         | Logical vector indicating where the missing value is |
+----------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| Given vector `vec`, subset to non-missing values                                                   | `vec[!is.na(vec)]`                   | Vector                                               |
+----------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
| Given a dataframe `df`, subset the rows so that the column `col1` does not have any missing values | `filter(df, !is.na(col1))`           | Dataframe                                            |
+----------------------------------------------------------------------------------------------------+--------------------------------------+------------------------------------------------------+
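
One thing the table does not show is how a comparison behaves when the vector contains missing values. The sketch below uses made-up vectors. `%in%` is base R and is offered here only as a compact alternative to the `|` form above.

```r
vec = c(3, -1, NA, 7)

is.na(vec)                  # FALSE FALSE TRUE FALSE
vec[!is.na(vec)]            # 3 -1 7

# A comparison with NA is NA, and indexing with NA keeps an NA in the result.
vec[vec > 0]                # 3 NA 7
vec[!is.na(vec) & vec > 0]  # 3 7

# %in% is a compact alternative to chaining == with |.
people = c("chris", "ana", "bob")
people[people %in% c("chris", "bob")]   # "chris" "bob"
```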

## Data recoding

+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+
| English                                                                                                                                                 | R Language                                                 |
+=========================================================================================================================================================+============================================================+
| If vector `vec` has the value "x", recode it as "a"                                                                                                     | `vec[vec == "x"] = "a"`                                    |
|                                                                                                                                                         |                                                            |
|                                                                                                                                                         | or                                                         |
|                                                                                                                                                         |                                                            |
|                                                                                                                                                         | `if_else(vec == "x", "a", vec)`                            |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+
| If vector `vec` has the value "x", recode it as "a", anything else recode as "b"                                                                        | `if_else(vec == "x", "a", "b")`                            |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+
| If vector `vec` has the value "x", recode it as "a", else if `vec` has value "y", recode it as "b", anything else recode as "z".                        | ```                                                        |
|                                                                                                                                                         | case_when(vec == "x" ~ "a",                                |
|                                                                                                                                                         | vec == "y" ~ "b",                                          |
|                                                                                                                                                         | .default = "z")                                            |
|                                                                                                                                                         | ```                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+
| If vector `vec` has the value "x", recode it as "a", else if `vec` has value "y", recode it as "b", anything else leave it as is.                       | ```                                                        |
|                                                                                                                                                         | case_when(vec == "x" ~ "a",                                |
|                                                                                                                                                         | vec == "y" ~ "b",                                          |
|                                                                                                                                                         | .default = vec)                                            |
|                                                                                                                                                         | ```                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+
| If dataframe `df` column `col` has the value "x", recode it as "a"                                                                                      | `df$col[df$col == "x"] = "a"`                              |
|                                                                                                                                                         |                                                            |
|                                                                                                                                                         | or                                                         |
|                                                                                                                                                         |                                                            |
|                                                                                                                                                         | `df$col = if_else(df$col == "x", "a", df$col)`             |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+
| If dataframe `df` column `col` has the value "x", recode it as "a", anything else recode as "b"                                                         | `df$col = if_else(df$col == "x", "a", "b")`                |
|                                                                                                                                                         |                                                            |
|                                                                                                                                                         | or                                                         |
|                                                                                                                                                         |                                                            |
|                                                                                                                                                         | `df = mutate(df, col = if_else(col == "x", "a", "b"))`     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+
| If dataframe `df` column `col` has the value "x", recode it as "a", else if column `col` has value "y", recode it as "b", anything else recode as "z".  | ```                                                        |
|                                                                                                                                                         | df$col = case_when(df$col == "x" ~ "a",                    |
|                                                                                                                                                         | df$col == "y" ~ "b",                                       |
|                                                                                                                                                         | .default = "z")                                            |
|                                                                                                                                                         | ```                                                        |
|                                                                                                                                                         |                                                            |
|                                                                                                                                                         | or                                                         |
|                                                                                                                                                         |                                                            |
|                                                                                                                                                         | ```                                                        |
|                                                                                                                                                         | df = mutate(df, col = case_when(col == "x" ~ "a",          |
|                                                                                                                                                         | col == "y" ~ "b",                                          |
|                                                                                                                                                         | .default = "z"))                                           |
|                                                                                                                                                         | ```                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+
| If dataframe `df` column `col` has the value "x", recode it as "a", else if column `col` has value "y", recode it as "b", anything else leave it as is. | ```                                                        |
|                                                                                                                                                         | df$col = case_when(df$col == "x" ~ "a",                    |
|                                                                                                                                                         | df$col == "y" ~ "b",                                       |
|                                                                                                                                                         | .default = df$col)                                         |
|                                                                                                                                                         | ```                                                        |
|                                                                                                                                                         |                                                            |
|                                                                                                                                                         | or                                                         |
|                                                                                                                                                         |                                                            |
|                                                                                                                                                         | ```                                                        |
|                                                                                                                                                         | df = mutate(df, col = case_when(col == "x" ~ "a",          |
|                                                                                                                                                         | col == "y" ~ "b",                                          |
|                                                                                                                                                         | .default = col))                                           |
|                                                                                                                                                         | ```                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+
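
A runnable sketch of two of the recodes above, on a made-up dataframe. It assumes dplyr 1.1.0 or later, since the `.default` argument of `case_when()` was added in that release.

```r
library(dplyr)

# Hypothetical dataframe, made up for illustration.
df = data.frame(col = c("x", "y", "w"), stringsAsFactors = FALSE)

# Two-way recode: "x" becomes "a", everything else becomes "b".
if_else(df$col == "x", "a", "b")     # "a" "b" "b"

# Multi-way recode that leaves unmatched values as they are.
# mutate() returns the whole dataframe, so assign the result back to df.
df = mutate(df, col = case_when(col == "x" ~ "a",
                                col == "y" ~ "b",
                                .default = col))
df$col                               # "a" "b" "w"
```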

## Conditional statements

+-----------------------------------+----------------------------------+
| English                           | R Language                       |
+===================================+==================================+
| If statement                      | ```                              |
|                                   | if(condition) {                  |
|                                   |                                  |
|                                   | }                                |
|                                   | ```                              |
+-----------------------------------+----------------------------------+
| If-else if statement              | ```                              |
|                                   | if(condition1) {                 |
|                                   |                                  |
|                                   | }else if(condition2) {           |
|                                   |                                  |
|                                   | }                                |
|                                   | ```                              |
+-----------------------------------+----------------------------------+
| If-else statement                 | ```                              |
|                                   | if(condition1) {                 |
|                                   |                                  |
|                                   | }else {                          |
|                                   |                                  |
|                                   | }                                |
|                                   | ```                              |
+-----------------------------------+----------------------------------+
| If-else if-else statement         | ```                              |
|                                   | if(condition1) {                 |
|                                   |                                  |
|                                   | }else if(condition2) {           |
|                                   |                                  |
|                                   | }else {                          |
|                                   |                                  |
|                                   | }                                |
|                                   | ```                              |
+-----------------------------------+----------------------------------+
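
The templates above can be filled in as in this sketch. `describe_sign` is a hypothetical helper written for illustration. Note that `if()` expects a single logical value (a condition longer than one is an error in recent versions of R), so use `if_else()` or `case_when()` to recode a whole vector.

```r
# Hypothetical helper that labels a single number by its sign.
describe_sign = function(x) {
  if (x > 0) {
    result = "positive"
  } else if (x < 0) {
    result = "negative"
  } else {
    result = "zero"
  }
  return(result)
}

describe_sign(5)    # "positive"
describe_sign(-2)   # "negative"
describe_sign(0)    # "zero"
```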

## Dataframe Transformations

+---------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+-------------+
| English                                                                                                                                                       | R Language                                                                              | Output type |
+===============================================================================================================================================================+=========================================================================================+=============+
| Pivot Dataframe `df` longer: move column names `q1`, `q2`, `q3` into a new column "quarter" and their values into a new column "sales"                        | `pivot_longer(df, c("q1", "q2", "q3"), names_to = "quarter", values_to = "sales")`      | Dataframe   |
|                                                                                                                                                               |                                                                                         |             |
| More notes on [specifying patterns for columns here](https://tidyselect.r-lib.org/reference/starts_with.html).                                                | or                                                                                      |             |
|                                                                                                                                                               |                                                                                         |             |
|                                                                                                                                                               | `pivot_longer(df, starts_with("q"), names_to = "quarter", values_to = "sales")`         |             |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+-------------+
| Pivot Dataframe `df` wider: use the values of column `measurement_type` as new column names, filled with the values of column `values`.                       | `pivot_wider(df, names_from = "measurement_type", values_from = "values")`              | Dataframe   |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+-------------+
| Separate Dataframe `df`'s column `patient_id_type` into two columns `patient_id` and `patient_type` by the separator `-`                                      | `separate(df, "patient_id_type", into = c("patient_id", "patient_type"), sep = "-")`    | Dataframe   |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+-------------+
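
To see what these transformations do to the shape of the data, here is a sketch on two small made-up dataframes. The column `store` and all the values are assumptions for illustration.

```r
library(tidyr)

# Hypothetical sales data, made up for illustration.
df = data.frame(store = c("A", "B"),
                q1 = c(10, 20),
                q2 = c(11, 21),
                q3 = c(12, 22))

# Wide to long: one row per store and quarter.
df_long = pivot_longer(df, starts_with("q"),
                       names_to = "quarter", values_to = "sales")
dim(df_long)    # 6 3

# Long to wide: back to one column per quarter.
df_wide = pivot_wider(df_long, names_from = "quarter", values_from = "sales")
dim(df_wide)    # 2 4

# Splitting one column into two at the "-" separator.
ids = data.frame(patient_id_type = c("001-inpatient", "002-outpatient"))
ids_split = separate(ids, col = "patient_id_type",
                     into = c("patient_id", "patient_type"), sep = "-")
names(ids_split)    # "patient_id" "patient_type"
```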

## Writing functions

Some examples.

+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+
| English                                                                                                                          | R Language                                                                              |
+==================================================================================================================================+=========================================================================================+
| Write a function that takes in a vector and returns a vector of the same length, such as a z-score transformation                | ```                                                                                     |
|                                                                                                                                  | z_score = function(vec) {                                                               |
|                                                                                                                                  |   result = (vec - mean(vec)) / sd(vec)                                                  |
|                                                                                                                                  |   return(result)                                                                        |
|                                                                                                                                  | }                                                                                       |
|                                                                                                                                  | ```                                                                                     |
|                                                                                                                                  |                                                                                         |
|                                                                                                                                  | then, to use it:                                                                        |
|                                                                                                                                  |                                                                                         |
|                                                                                                                                  | ```                                                                                     |
|                                                                                                                                  | df$biomarker_standardized = z_score(df$biomarker)                                       |
|                                                                                                                                  | ```                                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+
| Write a function that takes in a vector and returns a summary statistic, such as the difference in highest and lowest value      | ```                                                                                     |
|                                                                                                                                  | max_diff = function(vec) {                                                              |
|                                                                                                                                  |   result = max(vec) - min(vec)                                                          |
|                                                                                                                                  |   return(result)                                                                        |
|                                                                                                                                  | }                                                                                       |
|                                                                                                                                  | ```                                                                                     |
|                                                                                                                                  |                                                                                         |
|                                                                                                                                  | then, to use it:                                                                        |
|                                                                                                                                  |                                                                                         |
|                                                                                                                                  | ```                                                                                     |
|                                                                                                                                  | max_diff(df$biomarker)                                                                  |
|                                                                                                                                  | ```                                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+
| Write a function that takes in a Dataframe and returns some summary information about it, such as its dimension                  | ```                                                                                     |
|                                                                                                                                  | my_dim = function(df) {                                                                 |
|                                                                                                                                  |   result = c(nrow(df), ncol(df))                                                        |
|                                                                                                                                  |   return(result)                                                                        |
|                                                                                                                                  | }                                                                                       |
|                                                                                                                                  | ```                                                                                     |
|                                                                                                                                  |                                                                                         |
|                                                                                                                                  | then, to use it:                                                                        |
|                                                                                                                                  |                                                                                         |
|                                                                                                                                  | ```                                                                                     |
|                                                                                                                                  | my_dim(penguins)                                                                        |
|                                                                                                                                  | ```                                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+
| Write a function that takes in a character data type, and returns a Dataframe, such as loading and preprocessing the Dataframe.  | ```                                                                                     |
|                                                                                                                                  | load_and_process = function(filepath) {                                                 |
|                                                                                                                                  |   df = read_csv(filepath)                                                               |
|                                                                                                                                  |   df = pivot_longer(df, c("q1", "q2", "q3"), names_to = "quarter", values_to = "sales") |
|                                                                                                                                  |   return(df)                                                                            |
|                                                                                                                                  | }                                                                                       |
|                                                                                                                                  | ```                                                                                     |
|                                                                                                                                  |                                                                                         |
|                                                                                                                                  | then, to use it:                                                                        |
|                                                                                                                                  |                                                                                         |
|                                                                                                                                  | ```                                                                                     |
|                                                                                                                                  | sales_df = load_and_process("sales_data.csv")                                           |
|                                                                                                                                  | ```                                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+
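
The `z_score()` and `max_diff()` functions above return `NA` whenever the input vector contains a missing value, because `mean()`, `sd()`, `max()`, and `min()` propagate `NA` by default. Here is a minimal sketch of one way to handle that. The name `z_score_safe`, its `na.rm` argument, and the `biomarker` values are assumptions for illustration, not part of the functions in the table.

```r
# Hypothetical variant of z_score() with an added na.rm argument.
# Missing values are ignored when computing the mean and standard
# deviation, but stay NA in the returned vector.
z_score_safe = function(vec, na.rm = TRUE) {
  result = (vec - mean(vec, na.rm = na.rm)) / sd(vec, na.rm = na.rm)
  return(result)
}

# Made-up values for illustration only.
biomarker = c(2, 4, NA, 6)

z_score_safe(biomarker)
# -1  0 NA  1
```

The output has the same length as the input, so it can still be assigned back to a Dataframe column as in the `z_score()` example above.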

## Iteration

Some examples of iterating with `map()` and `map_dbl()`:

+--------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+--------------------+
| English                                                                                                                        | R Language                                                                                            | Output type        |
+================================================================================================================================+=======================================================================================================+====================+
| Iterate on a vector of characters representing filepaths, where the function loads in Dataframes.                              | ```                                                                                                   | List of Dataframes |
|                                                                                                                                | files = c("f1.csv", "f2.csv", "f3.csv")                                                               |                    |
|                                                                                                                                |                                                                                                       |                    |
|                                                                                                                                | map(files, read_csv)                                                                                  |                    |
|                                                                                                                                | ```                                                                                                   |                    |
+--------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+--------------------+
| Iterate on a vector of characters representing filepaths, where a custom function loads in Dataframes and processes them.      | ```                                                                                                   | List of Dataframes |
|                                                                                                                                | process_data = function(file) {                                                                       |                    |
|                                                                                                                                |   df = read_csv(file)                                                                                 |                    |
|                                                                                                                                |   df = drop_na(df)                                                                                    |                    |
|                                                                                                                                |   return(df)                                                                                          |                    |
|                                                                                                                                | }                                                                                                     |                    |
|                                                                                                                                |                                                                                                       |                    |
|                                                                                                                                | files = c("f1.csv", "f2.csv", "f3.csv")                                                               |                    |
|                                                                                                                                |                                                                                                       |                    |
|                                                                                                                                | map(files, process_data)                                                                              |                    |
|                                                                                                                                | ```                                                                                                   |                    |
+--------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+--------------------+
| Iterate on the columns of a Dataframe to compute summary statistics. (Treat the Dataframe as a List to be iterated through)    | ```                                                                                                   | Numerical vector   |
|                                                                                                                                | penguins_numeric = penguins %>% select(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) |                    |
|                                                                                                                                |                                                                                                       |                    |
|                                                                                                                                | map_dbl(penguins_numeric, mean, na.rm = TRUE)                                                         |                    |
|                                                                                                                                | ```                                                                                                   |                    |
+--------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+--------------------+
| Iterate over different conditions to analyze a Dataframe multiple times.                                                       | ```                                                                                                   | Numerical vector   |
|                                                                                                                                | penguins_analysis = function(current_species) {                                                       |                    |
|                                                                                                                                |   penguins_subset = filter(penguins, species == current_species)                                      |                    |
|                                                                                                                                |   result = mean(penguins_subset$bill_length_mm, na.rm=TRUE)                                           |                    |
|                                                                                                                                |   return(result)                                                                                      |                    |
|                                                                                                                                | }                                                                                                     |                    |
|                                                                                                                                |                                                                                                       |                    |
|                                                                                                                                | map_dbl(c("Adelie", "Chinstrap", "Gentoo"), penguins_analysis)                                        |                    |
|                                                                                                                                | ```                                                                                                   |                    |
+--------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+--------------------+
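
The last two patterns in the table can be tried without any external files. This is a minimal sketch that assumes the purrr package is installed and uses the built-in `mtcars` dataset as a stand-in for `penguins`. The names `cars_numeric` and `mean_mpg_by_cyl` are hypothetical.

```r
library(purrr)

# mtcars ships with R; it stands in here for the penguins Dataframe.
cars_numeric = mtcars[, c("mpg", "hp", "wt")]

# A Dataframe is a List of columns, so map_dbl() visits one column
# at a time and returns one number per column.
col_means = map_dbl(cars_numeric, mean)

# Iterate over conditions: one summary value per cylinder count.
mean_mpg_by_cyl = function(current_cyl) {
  cars_subset = mtcars[mtcars$cyl == current_cyl, ]
  result = mean(cars_subset$mpg)
  return(result)
}

mpg_by_cyl = map_dbl(c(4, 6, 8), mean_mpg_by_cyl)
```

`map_dbl()` is used because each call returns a single number. When each call returns a Dataframe, as in the file-loading rows of the table, `map()` is the fit, since it returns a List.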