Commit e96a8d8

Added files with solution to the activities
1 parent f8b8d06 commit e96a8d8

5 files changed

+420
-0
lines changed

solutions/day_1_solutions.Rmd

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
## Activity

- Read in “westernQuant.txt” using either read.delim() or the GUI
    - The file can be found in the materials folder. Make sure you tell R where to find it.
    - This is fake data of quantified western blot band intensities
- Use summary() to look at the data
- Use t.test() to test if intensity varies between groups
- Make two plot()s of the two groups
    - Make x-axis group_name for a first plot
    - Make x-axis group_number for a second plot
- Extra task if you finish early
    - Look up wilcox.test() and use it to compare between groups

If you have trouble, look at how I used these functions above as reference, or check out the documentation.

### Read in data
Use read.delim()
```{r readData}
western_quant <- read.delim("materials/westernQuant.txt", header = TRUE, sep = "\t")
```

### Summarize the data.frame
Use summary()
```{r summaryWestern}
summary(western_quant)
```

### Test for group differences
Use the t.test() function
```{r testDiff}
# Formula interface: compare intensity between the levels of group_name
t_test_out <- t.test(intensity ~ group_name, data = western_quant)

t_test_out
```
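
The activity also asks for two plots and, as an extra, `wilcox.test()`, which the chunks above do not cover. Below is a minimal sketch of how these might look. It uses a small made-up stand-in (`toy_western`, hypothetical values) and assumes the columns `intensity`, `group_name` and `group_number` named in the task; swap in `western_quant` once the file is read.

```r
# Hypothetical stand-in data; the real values come from westernQuant.txt
toy_western <- data.frame(
  group_name   = rep(c("control", "treated"), each = 4),
  group_number = rep(c(1, 2), each = 4),
  intensity    = c(10.2, 11.5, 9.8, 10.9, 14.1, 15.3, 13.7, 14.8)
)

# A factor on the x-axis makes plot() draw one box per group
plot(as.factor(toy_western$group_name), toy_western$intensity,
     xlab = "group_name", ylab = "intensity")

# A numeric x-axis gives a scatter plot instead
plot(toy_western$group_number, toy_western$intensity,
     xlab = "group_number", ylab = "intensity")

# Non-parametric alternative to the t-test
wilcox_out <- wilcox.test(intensity ~ group_name, data = toy_western)
wilcox_out
```

The same formula, `intensity ~ group_name`, works for both `t.test()` and `wilcox.test()`, so switching tests only means changing the function name.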

solutions/day_2_solutions.Rmd

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
## Activity
- Use titanic.csv
    - This file is located in the `materials` folder.
    - This is real data on Titanic passengers
    - Survived: 1 = yes, 0 = no
- Explore the data
- Make variables that contain average, median, max and min values for age and fare
    - Do the same, but split the data by male/female
- Plot a histogram of age for all the data
    - Plot only the survivors as a second plot
- If you're super fast:
    - Help someone around you who may be struggling
    - Use table() to compare how many men/women survived
    - Use aov() to see which factors influenced survival most

### Read in and check out the Titanic data
```{r}
library(tidyverse) # provides filter() and %>% used in later chunks
titanic_data <- read.csv("materials/titanic.csv")
```

### Make variables that contain the mean/median/min/max of age/fare
```{r}
# na.rm = TRUE drops missing values; without it a single NA makes the result NA
mean_age <- mean(titanic_data$Age, na.rm = TRUE)

median_age <- median(titanic_data$Age, na.rm = TRUE)

min_age <- min(titanic_data$Age, na.rm = TRUE)

max_age <- max(titanic_data$Age, na.rm = TRUE)


mean_fare <- mean(titanic_data$Fare, na.rm = TRUE)

median_fare <- median(titanic_data$Fare, na.rm = TRUE)

min_fare <- min(titanic_data$Fare, na.rm = TRUE)

max_fare <- max(titanic_data$Fare, na.rm = TRUE)
```

### Use filter() to make two new data frames, one with only male and one with only female passengers, then calculate the mean/median values for age and fare
Basically, calculate the mean/median age and fare for men vs women
```{r}
# filter() here is dplyr's, so the tidyverse must be loaded first
male_data <- filter(titanic_data, Sex == "male")

mean_male_age <- mean(male_data$Age, na.rm = TRUE)
mean_male_fare <- mean(male_data$Fare, na.rm = TRUE)

female_data <- filter(titanic_data, Sex == "female")

mean_female_age <- mean(female_data$Age, na.rm = TRUE)
mean_female_fare <- mean(female_data$Fare, na.rm = TRUE)
```

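The task also asks for the median, minimum and maximum per sex. One possible shortcut, assuming dplyr is installed, is `group_by()` plus `summarise()`. The sketch below uses a tiny made-up data frame (`toy_titanic`, hypothetical values) with the same `Sex`, `Age` and `Fare` column names so that it runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for titanic_data
toy_titanic <- data.frame(
  Sex  = c("male", "female", "male", "female", "male"),
  Age  = c(22, 38, NA, 35, 54),
  Fare = c(7.25, 71.28, 8.46, 53.10, 51.86)
)

# One row of summary statistics per value of Sex
by_sex <- toy_titanic %>%
  group_by(Sex) %>%
  summarise(
    mean_age    = mean(Age, na.rm = TRUE),
    median_age  = median(Age, na.rm = TRUE),
    min_age     = min(Age, na.rm = TRUE),
    max_age     = max(Age, na.rm = TRUE),
    mean_fare   = mean(Fare),
    median_fare = median(Fare)
  )

by_sex
```

This gives the same numbers as filtering into separate data frames, but in a single table that is easier to compare across groups.
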
### Make a histogram of the age of all passengers
```{r}
# breaks = 100 asks for roughly 100 bins
hist(titanic_data$Age, breaks = 100)
```

### Make another histogram for only passengers who survived

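No solution is given for this step. One way it might be done is to subset `Age` to the rows where `Survived == 1` before calling `hist()`. The sketch below uses a small made-up data frame (`toy_survival`, hypothetical values) in place of `titanic_data` so it runs without the file.

```r
# Hypothetical stand-in for titanic_data
toy_survival <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 1),
  Age      = c(22, 38, 26, 35, NA, 4)
)

# Keep only the ages of passengers who survived (Survived: 1 = yes)
survivor_age <- toy_survival$Age[toy_survival$Survived == 1]

# hist() drops missing ages before binning
hist(survivor_age, main = "Age of survivors", xlab = "Age")
```

With the real data the equivalent call would be `hist(titanic_data$Age[titanic_data$Survived == 1])`.
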
## If you're super fast

### Use `aov()` to test which factors influenced who survived the Titanic disaster
```{r}
# %>% needs the tidyverse (or magrittr) to be loaded
aov(Survived ~ Sex + Age + Fare, data = titanic_data) %>%
  summary()
```

### Use table() to compare how many women vs men survived
```{r}
# Counts of survivors (1) and non-survivors (0) by sex
table(titanic_data$Sex, titanic_data$Survived)

# margin = 1 converts each row to proportions within that sex
table(titanic_data$Sex, titanic_data$Survived) %>%
  prop.table(margin = 1)
```

solutions/day_3_solutions.Rmd

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
## Activity
- Use patientGroups.txt and exercise.txt from the materials folder
    - patientGroups is patient # and who received treatment
    - Exercise is how many minutes each patient exercised across five days
- Combine the datasets into a single data.frame
- Pivot the data to long form
    - Columns: patient, day, exercise_min, glucose, trt_group
- Save the pivoted data.frame to a text file with write.table()
- Make a new column by multiplying glucose by 1000
- Calculate the average daily minutes of exercise per patient
- If you're super fast:
    - Help someone who is struggling
    - Plot glucose levels for each group (treated/control)
    - Test if they're statistically different
    - Plot daily exercise minutes per group
    - Test if exercise minutes were statistically different between groups

### Read in patientGroups.txt and exercise.txt
patientGroups.txt is:

- patient #
- treatment groups
- final blood glucose measurements

exercise.txt is:

- patient #
- how many minutes each patient exercised across five days

```{r}
library(tidyverse) # provides read_tsv(), left_join(), pivot_longer() and %>%

patient_groups <- read.table("materials/patientGroups.txt", header = TRUE, sep = "\t")

exercise_data <- read_tsv("materials/exercise.txt")
```

### Combine the datasets into a single data.frame
```{r}
combined_data <- left_join(patient_groups, exercise_data, by = "patient")
```

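`left_join()` keeps every row of the first table and adds the matching columns from the second, here matched on `patient`. A small sketch with made-up tables (`toy_groups` and `toy_exercise`, hypothetical values) suggests what to expect, including the `NA` that appears when a patient has no exercise record.

```r
library(dplyr)

# Hypothetical stand-ins for patient_groups and exercise_data
toy_groups <- data.frame(
  patient   = c(1, 2, 3),
  trt_group = c("control", "treated", "treated"),
  glucose   = c(0.11, 0.09, 0.10)
)
toy_exercise <- data.frame(
  patient = c(1, 2),
  day_1   = c(30, 45),
  day_2   = c(20, 50)
)

# Patient 3 has no row in toy_exercise, so its day_* columns become NA
toy_combined <- left_join(toy_groups, toy_exercise, by = "patient")
toy_combined
```

If rows go missing or unexpected `NA`s show up after the real join, a mismatch in the `patient` column between the two files is a likely place to look.
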
### Pivot the data to long form
Columns should be patient, day, exercise_min, glucose, trt_group
```{r}
long_combined_data <-
  combined_data %>%
  pivot_longer(
    cols = starts_with("day"),
    names_to = "day",
    values_to = "exercise_min"
  )

# could also do
combined_data %>%
  pivot_longer(
    cols = c(day_1, day_2, day_3, day_4, day_5),
    names_to = "day",
    values_to = "exercise_min"
  )

# Or
combined_data %>%
  pivot_longer(
    cols = c(-patient, -trt_group, -glucose),
    names_to = "day",
    values_to = "exercise_min"
  )
```

### Save the pivoted data.frame to a text file with write.table() or write_tsv()
```{r}
write_tsv(long_combined_data, "long_combined_data.txt")
```

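The heading also mentions `write.table()`. A base R call that should give a similar tab-separated file needs a few extra arguments, because `write.table()` writes row names and quotes strings by default. Below is a minimal sketch, using a made-up data frame (`toy_long`, hypothetical values) and a temporary file rather than the real output path.

```r
# Hypothetical stand-in for long_combined_data
toy_long <- data.frame(
  patient      = c(1, 1, 2, 2),
  day          = c("day_1", "day_2", "day_1", "day_2"),
  exercise_min = c(30, 20, 45, 50)
)

out_file <- tempfile(fileext = ".txt")

write.table(
  toy_long,
  file      = out_file,
  sep       = "\t",   # tab-separated, like write_tsv()
  quote     = FALSE,  # no quotes around text values
  row.names = FALSE   # no extra column of row numbers
)

# Read the file back to check that it round-trips
read_back <- read.delim(out_file)
read_back
```
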
### Make a new column where glucose is multiplied by 1000
```{r}
# glucose_x1000 is a name chosen here; the original glucose column is kept
long_combined_data <-
  long_combined_data %>%
  mutate(glucose_x1000 = glucose * 1000)
```

### Calculate the average daily minutes of exercise per patient
```{r}
ave_exercise <-
  long_combined_data %>%
  group_by(patient) %>%
  summarise(ave_exercise = mean(exercise_min, na.rm = TRUE))
```

## If you're super fast

### Plot glucose levels for each group (treated/control)
```{r}
plot(
  as.factor(long_combined_data$trt_group),
  long_combined_data$glucose,
  main = "Glucose Levels by Treatment Group"
)
```

### Test if glucose levels are statistically different between groups
```{r}
# glucose is recorded once per patient, so test on the one-row-per-patient table;
# the long table repeats each value once per day and would inflate the sample size
t.test(
  glucose ~ trt_group,
  data = patient_groups
)
```

### Plot daily exercise minutes per group
```{r}
plot(
  as.factor(long_combined_data$trt_group),
  long_combined_data$exercise_min,
  main = "Daily Exercise Minutes by Treatment Group"
)
```

### Test if exercise minutes are statistically different between groups
```{r}
t.test(
  exercise_min ~ trt_group,
  data = long_combined_data
)
```

solutions/day_4_solutions.Rmd

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
## Activity

- These activities will be a bit of synthesis of stuff from previous days
- Use `starwars <- starwars` to get pre-loaded practice data
- Get rid of the "films", "vehicles" and "starships" columns using `select()` and save it as a new variable
- Write a for loop to loop over the column names in your new variable
    - Inside the loop, if the column contains numeric data, make a histogram of it
    - Otherwise, use `table()` to get counts of categories
    - Look at the help documentation for `table()` to see how to use it
- If you're super fast:
    - Help someone who is struggling (loops can be confusing)
    - Write a while loop to count the sum of all positive integers whose squared value is less than 1,234,567
    - You should get 617,716

### Get the starwars data and look at it
Load the tidyverse with `library(tidyverse)`, then run `starwars <- starwars` to copy the built-in data into your environment.

```{r}
library(tidyverse)

# Copy the built-in dataset into the environment
starwars <- starwars
# Drop the three list columns, which hist() and table() cannot handle
starwars <- starwars %>%
  select(-films, -vehicles, -starships)
```

### Write a for loop over the `starwars` column names
Inside the loop, if the column contains numeric data, make a histogram of it.
Otherwise, use table() to get counts.

- Useful ideas/functions here:
    - `is.numeric()`
    - `this_column_name <- "height"`
    - `starwars[[this_column_name]]`, which selects a column just like `starwars$height`

```{r}
for (col in colnames(starwars)) {
  if (is.numeric(starwars[[col]])) {
    # Numeric column: show its distribution
    hist(starwars[[col]], main = col, xlab = col)
  } else {
    # Anything else: count how often each category occurs
    print(table(starwars[[col]]))
  }
  Sys.sleep(1) # pause for a second so each plot can be seen
}
```

## If you're super fast:
Help someone who is struggling (loops can be confusing)

### Write a while loop to count the sum of all positive integers whose squared value is less than 1,234,567

```{r}
# Named total rather than sum to avoid masking the built-in sum() function
total <- 0
i <- 1
while (i^2 < 1234567) {
  total <- total + i
  i <- i + 1
}
print(total)
```

How the loop proceeds:

- 1^2 < 1,234,567, so add 1 to the total
- 2^2 < 1,234,567, so add 2 to the total
- 3^2 < 1,234,567, so add 3 to the total
- ...
- X^2 >= 1,234,567, so stop here
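
The expected answer can be cross-checked without a loop. The largest integer whose square is below 1,234,567 is 1111 (1111^2 = 1,234,321, while 1112^2 = 1,236,544), and the sum of 1 through n is n(n + 1)/2. A short sketch of that check follows; `limit`, `n` and `closed_form` are names chosen here for illustration.

```r
limit <- 1234567

# Largest integer n with n^2 strictly below limit
n <- floor(sqrt(limit - 1))

# Sum of the integers 1..n via the closed-form formula
closed_form <- n * (n + 1) / 2

closed_form
sum(seq_len(n)) # same value, computed by adding the integers directly
```

Both lines should agree with the 617,716 quoted in the activity.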
