Added files with solutions to the activities

MVesuviusC · MVesuviusC · commit e96a8d88f782 · 2025-04-09T16:06:47.000-04:00
diff --git a/solutions/day_1_solutions.Rmd b/solutions/day_1_solutions.Rmd
@@ -0,0 +1,34 @@
+## Activity
+
+- Read in “westernQuant.txt” using either read.delim() or the GUI
+    - The file can be found in the materials folder. Make sure you tell R where to find it.
+    - This is fake data of quantified western blot band intensities
+- Use summary() to look at the data
+- Use t.test() to test if intensity varies between groups
+- Make two plot()s of the two groups
+    - Make x-axis group_name for a first plot
+    - Make x-axis group_number for a second plot
+- Extra task if you finish early
+    - Look up wilcox.test() and use it to compare between groups
+
+If you have trouble, look at how I used these functions above as reference, or check out the documentation.
+
+### Read in data
+Use read.delim()
+```{r readData}
+western_quant <- read.delim("materials/westernQuant.txt", header = TRUE, sep = "\t")
+```
+
+### Summarize the data.frame
+Use summary()
+```{r summaryWestern}
+summary(western_quant)
+```
+
+### Test for group differences
+Use t.test()
+```{r testDiff}
+t_test_out <- t.test(intensity ~ group_name, data = western_quant)
+
+t_test_out
+```
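+
+### Plot the two groups and try a rank-based test
+The activity also asks for two plots and, as an extra, `wilcox.test()`. The sketch below is one way to do it, not necessarily the instructor's solution. It runs on a small made-up data frame, `toy_western`, so it works on its own. The `intensity` and `group_name` columns match the `t.test()` call above, while storing `group_number` as a number is an assumption.
+```{r plotAndWilcoxSketch}
+# Hypothetical stand-in for western_quant; these values are invented
+toy_western <- data.frame(
+    group_name = rep(c("control", "treated"), each = 4),
+    group_number = rep(c(1, 2), each = 4),
+    intensity = c(10.1, 11.4, 9.8, 10.9, 15.2, 14.7, 16.3, 15.8)
+)
+
+# A factor on the x-axis makes plot() draw one box per group
+plot(as.factor(toy_western$group_name), toy_western$intensity)
+
+# A numeric x-axis makes plot() draw a scatter plot instead
+plot(toy_western$group_number, toy_western$intensity)
+
+# Rank-based alternative to the t-test
+wilcox_out <- wilcox.test(intensity ~ group_name, data = toy_western)
+
+wilcox_out
+```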
diff --git a/solutions/day_2_solutions.Rmd b/solutions/day_2_solutions.Rmd
@@ -0,0 +1,76 @@
+## Activity
+- Use titanic.csv
+    - This file is located in the `materials` folder.
+    - This is real data on Titanic passengers
+    - Survived: 1 = yes, 0 = no
+- Explore the data
+- Make variables that contain average, median, max and min values for age and fare
+    - Do the same, but split the data by male/female
+- Plot a histogram of age for all the data
+    - Plot only the survivors as a second plot
+- If you're super fast:
+    - Help someone around you who may be struggling
+    - Use table() to compare how many men/women survived
+    - Use aov() to see which factors influenced survival most
+
+### Read in and check out the Titanic data
+```{r}
+titanic_data <- read.csv("materials/titanic.csv")
+```
+
+### Make variables that contain the mean/median/min/max of age and fare
+```{r}
+mean_age <- mean(titanic_data$Age, na.rm = TRUE)
+
+median_age <- median(titanic_data$Age, na.rm = TRUE)
+
+min_age <- min(titanic_data$Age, na.rm = TRUE)
+
+max_age <- max(titanic_data$Age, na.rm = TRUE)
+
+
+mean_fare <- mean(titanic_data$Fare, na.rm = TRUE)
+
+median_fare <- median(titanic_data$Fare, na.rm = TRUE)
+
+min_fare <- min(titanic_data$Fare, na.rm = TRUE)
+
+max_fare <- max(titanic_data$Fare, na.rm = TRUE)
+```
+
+### Use filter() to make two new data frames, one with only males and one with only females, then calculate the mean/median values for age and fare
+Basically, calculate the mean/median age and fare for men vs women
+```{r}
+male_data <- filter(titanic_data, Sex == "male")
+
+mean_male_age <- mean(male_data$Age, na.rm = TRUE)
+mean_male_fare <- mean(male_data$Fare, na.rm = TRUE)
+
+female_data <- filter(titanic_data, Sex == "female")
+
+mean_female_age <- mean(female_data$Age, na.rm = TRUE)
+mean_female_fare <- mean(female_data$Fare, na.rm = TRUE)
+```
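+
+### Alternative: summarize both sexes in one step
+A sketch of another route to the same split, using `group_by()` and `summarise()` from dplyr, which also covers the medians. The `toy_titanic` data frame and its values are made up so the chunk runs on its own; only the column names `Sex`, `Age`, and `Fare` come from the code above.
+```{r}
+library(dplyr)
+
+# Hypothetical stand-in for titanic_data; these values are invented
+toy_titanic <- data.frame(
+    Sex = c("male", "female", "male", "female", "male", "female"),
+    Age = c(22, 38, NA, 35, 54, 2),
+    Fare = c(7.25, 71.28, 8.46, 53.10, 51.86, 21.08)
+)
+
+# One row per sex, with na.rm = TRUE so missing ages are skipped
+sex_summary <-
+    toy_titanic %>%
+    group_by(Sex) %>%
+    summarise(
+        mean_age = mean(Age, na.rm = TRUE),
+        median_age = median(Age, na.rm = TRUE),
+        mean_fare = mean(Fare),
+        median_fare = median(Fare)
+    )
+
+sex_summary
+```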
+
+### Make a histogram of the age of all passengers
+```{r}
+hist(titanic_data$Age, breaks = 100)
+```
+
+### Make another histogram for only passengers who survived
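+A minimal sketch for this step, assuming the `Survived` coding from the activity (1 = yes, 0 = no). The `toy_titanic_surv` data frame is made up so the chunk runs on its own; with the real data, the same filter-then-`hist()` pattern would apply to `titanic_data`.
+```{r}
+library(dplyr)
+
+# Hypothetical stand-in for titanic_data; these values are invented
+toy_titanic_surv <- data.frame(
+    Survived = c(0, 1, 1, 1, 0, 1, 0, 1),
+    Age = c(22, 38, 26, 35, 54, 2, 40, 14)
+)
+
+# Keep only the rows for passengers who survived
+survivor_data <- filter(toy_titanic_surv, Survived == 1)
+
+hist(survivor_data$Age, main = "Age of Survivors", xlab = "Age")
+```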
+
+## If you're super fast
+
+### Use `aov()` to test which factors influenced who survived the Titanic disaster
+```{r}
+aov(Survived ~ Sex + Age + Fare, data = titanic_data) %>%
+    summary()
+```
+
+### Use table() to compare how many women vs men survived
+```{r}
+table(titanic_data$Sex, titanic_data$Survived)
+
+table(titanic_data$Sex, titanic_data$Survived) %>%
+    prop.table(margin = 1)
+```
diff --git a/solutions/day_3_solutions.Rmd b/solutions/day_3_solutions.Rmd
@@ -0,0 +1,123 @@
+## Activity
+- Use patientGroups.txt and exercise.txt from the materials folder
+    - patientGroups lists each patient number and whether that patient received treatment
+    - exercise lists how many minutes each patient exercised on each of five days
+- Combine the datasets into a single data.frame
+- Pivot the data to long form
+    - Columns: patient, day, exercise_min, glucose, trt_group
+- Save the pivoted data.frame to a text file with write.table()
+- Make a new column by multiplying glucose by 1000
+- Calculate the average daily minutes of exercise per patient
+- If you're super fast:
+    - Help someone who is struggling
+    - Plot glucose levels for each group (treated/control)
+    - Test if they're statistically different
+    - Plot daily exercise minutes per group
+    - Test if exercise minutes were statistically different between groups
+
+### Read in patientGroups.txt and exercise.txt
+patientGroups.txt is:
+
+-   patient #
+-   treatment groups
+-   final blood glucose measurements
+
+exercise.txt is:
+
+-   patient #
+-   how many minutes each patient exercised across five days
+
+```{r}
+patient_groups <- read.table("materials/patientGroups.txt", header = TRUE, sep = "\t")
+
+exercise_data <- read_tsv("materials/exercise.txt")
+```
+
+### Combine the datasets into a single data.frame
+```{r}
+combined_data <- left_join(patient_groups, exercise_data, by = "patient")
+```
+
+### Pivot the data to long form
+Columns should be patient, day, exercise_min, glucose, trt_group
+```{r}
+long_combined_data <-
+    combined_data %>%
+    pivot_longer(
+        cols = starts_with("day"),
+        names_to = "day",
+        values_to = "exercise_min"
+    )
+
+# could also do
+combined_data %>%
+    pivot_longer(
+        cols = c(day_1, day_2, day_3, day_4, day_5),
+        names_to = "day",
+        values_to = "exercise_min"
+    )
+
+# Or
+combined_data %>%
+    pivot_longer(
+        cols = c(-patient, -trt_group, -glucose),
+        names_to = "day",
+        values_to = "exercise_min"
+    )
+```
+
+### Save the pivoted data.frame to a text file with write.table() or write_tsv()
+```{r}
+write_tsv(long_combined_data, "long_combined_data.txt")
+```
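+
+### Alternative: save with write.table()
+The activity names `write.table()`, so here is a sketch of a base R call that should give a similar tab-separated file. The `toy_long_data` data frame and the temporary file path are made up so the chunk runs on its own.
+```{r}
+# Hypothetical stand-in for long_combined_data; these values are invented
+toy_long_data <- data.frame(
+    patient = c(1, 1, 2, 2),
+    day = c("day_1", "day_2", "day_1", "day_2"),
+    exercise_min = c(30, 45, 0, 20)
+)
+
+# Write to a temporary file so nothing in the project is overwritten
+out_file <- tempfile(fileext = ".txt")
+
+# sep = "\t" gives tab-separated columns; quote and row.names are
+# switched off so the file holds only the header and the data
+write.table(
+    toy_long_data,
+    file = out_file,
+    sep = "\t",
+    quote = FALSE,
+    row.names = FALSE
+)
+
+# Read the file back to check that it round-trips
+read.delim(out_file)
+```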
+
+### Make a new column where glucose is multiplied by 1000
+```{r}
+long_combined_data <-
+    long_combined_data %>%
+    mutate(glucose_x1000 = glucose * 1000)
+```
+
+### Calculate the average daily minutes of exercise per patient
+```{r}
+ave_exercise <-
+    long_combined_data %>%
+    group_by(patient) %>%
+    summarise(ave_exercise = mean(exercise_min, na.rm = TRUE))
+```
+
+## If you're super fast
+
+### Plot glucose levels for each group (treated/control)
+```{r}
+plot(
+    as.factor(long_combined_data$trt_group),
+    long_combined_data$glucose,
+    main = "Glucose Levels by Treatment Group"
+)
+```
+
+### Test if glucose levels are statistically different between groups
+```{r}
+t.test(
+    glucose ~ trt_group,
+    data = patient_groups # one row per patient, so glucose is not counted five times
+)
+```
+
+### Plot daily exercise minutes per group
+```{r}
+plot(
+    as.factor(long_combined_data$trt_group),
+    long_combined_data$exercise_min,
+    main = "Daily Exercise Minutes by Treatment Group"
+)
+```
+
+### Test if exercise minutes are statistically different between groups
+```{r}
+t.test(
+    exercise_min ~ trt_group,
+    data = long_combined_data
+)
+```
diff --git a/solutions/day_4_solutions.Rmd b/solutions/day_4_solutions.Rmd
@@ -0,0 +1,67 @@
+## Activity
+
+- These activities synthesize material from previous days
+- Use `starwars <- starwars` (after loading the tidyverse) to get pre-loaded practice data
+- Get rid of the "films", "vehicles" and "starships" columns using `select()` and save it as a new variable
+- Write a for loop to loop over the column names in your new variable
+    - Inside the loop, if the column contains numeric data, make a histogram of it
+    - Otherwise, use `table()` to get counts of categories
+        - Look at the help documentation for `table()` to see how to use it
+- If you're super fast:
+    - Help someone who is struggling (loops can be confusing)
+    - Write a while loop to sum all positive integers whose squared value is less than 1,234,567
+    - You should get 617,716
+
+### Get the starwars data and look at it
+- `library(tidyverse)` loads the tidyverse library
+- `starwars <- starwars` copies the built-in data into your environment
+
+```{r}
+library(tidyverse)
+
+starwars <- starwars
+starwars <- starwars %>%
+    select(-films, -vehicles, -starships)
+```
+### Write a for loop over the `starwars` column names
+Inside the loop, if the column contains numeric data, make a histogram of it.
+Otherwise, use table() to get counts.
+
+- Useful ideas/functions here:
+    - is.numeric()
+    - this_column_name <- "height"
+    - starwars[[this_column_name]] # same as starwars$height to select a column
+
+```{r}
+for (col in colnames(starwars)) {
+    if (is.numeric(starwars[[col]])) {
+        hist(starwars[[col]], main = col, xlab = col)
+    } else {
+        print(table(starwars[[col]]))
+    }
+    Sys.sleep(1)
+}
+```
+
+
+## If you're super fast:
+Help someone who is struggling (loops can be confusing)
+
+### Write a while loop to sum all positive integers whose squared value is less than 1,234,567
+
+```{r}
+total <- 0 # named total so the base sum() function is not masked
+i <- 1
+while (i^2 < 1234567) {
+    total <- total + i
+    i <- i + 1
+}
+print(total)
+```
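+
+- 1^2 < 1,234,567, so add 1 to the running total
+- 2^2 < 1,234,567, so add 2 to the running total
+- 3^2 < 1,234,567, so add 3 to the running total
+- ...
+- 1111^2 = 1,234,321 < 1,234,567, so add 1111 to the running total
+- 1112^2 = 1,236,544 > 1,234,567, so stop here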
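+
+### Check the answer without a loop
+As a cross-check on the expected answer of 617,716, the same total can be computed directly, since the sum of the integers 1 through n is n(n + 1)/2. This is a sketch for verification only; the variable names are made up.
+```{r}
+limit <- 1234567
+
+# Largest integer whose square is below the limit; this shortcut
+# works here because the limit is not a perfect square
+largest_i <- floor(sqrt(limit))
+
+closed_form_total <- largest_i * (largest_i + 1) / 2
+
+print(closed_form_total)
+```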
diff --git a/solutions/day_5_solutions.Rmd b/solutions/day_5_solutions.Rmd