Skip to content

Commit 242ec02

Browse files
committed
Add catch up scripts and note about starting up HPC on demand if necessary to lessons wk7 01 and 02
1 parent 16f6c74 commit 242ec02

File tree

2 files changed

+40
-24
lines changed

2 files changed

+40
-24
lines changed

lessons/wk7_lesson01_hypothesis_testing.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,8 @@ dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~ sampl
3838
dds <- DESeq(dds)
3939
```
4040

41+
Remember to get your HPC On Demand session going, if applicable, and open your `DEAnalysis` R project!
42+
4143
# DESeq2: Model fitting and Hypothesis testing
4244

4345
The final step in the DESeq2 workflow is taking the counts for each gene and fitting it to the model and testing for differential expression.

lessons/wk7_lesson02_wald_test.md

Lines changed: 38 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
---
22
title: "Exploring DESeq2 results: Wald test"
33
author: "Harvard HPC Staff, adapted by Sally Chang at NICHD"
4-
edited: "Last Modified March 2025"
4+
edited: "Last Modified May 2025"
55
---
66

77
Approximate time: 60 minutes
@@ -12,6 +12,36 @@ Approximate time: 60 minutes
1212
- Summarize the different levels of gene filtering
1313
- Explain log fold change shrinkage
1414

15+
## Catch-Up Script
16+
17+
If you need to be completely caught up, you can copy and paste the following into an R Script and run it. If you don't already have the files in your `/data` directory, please see [Wk 5 Lesson 01](../wk5_lesson01_introR_Rstudio.md) for instructions on where to obtain the input files.
18+
19+
``` r
20+
# Setup
21+
# Bioconductor and CRAN libraries used - already installed on Biowulf
22+
library(tidyverse)
23+
library(RColorBrewer)
24+
library(DESeq2)
25+
library(pheatmap)
26+
library(BiocManager)
27+
28+
# Load in data
29+
data <- read.table("data/mov10_AllSamples_featurecounts.Rmatrix.txt", header=T, row.names=1)
30+
31+
meta <- read.table("data/mov10_AllSamples_metadata.txt", header=T, row.names=1)
32+
33+
# Create DESeq2Dataset object
34+
dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~ sampletype)
35+
36+
# Run DESeq2 on DESeq2Dataset object
37+
dds <- DESeq(dds)
38+
39+
# Likelihood ratio test
40+
dds_lrt <- DESeq(dds, test="LRT", reduced = ~ 1)
41+
```
42+
43+
Remember to get your HPC On Demand session going, if applicable, and open your `DEAnalysis` R project!
44+
1545
# Exploring Results (Wald test)
1646

1747
By default DESeq2 uses the Wald test to identify genes that are differentially expressed between two sample classes. Given the factor(s) used in the design formula, and how many factor levels are present, we can extract results for a number of different comparisons. Here, we will walk through how to obtain results from the `dds` object and provide some explanations on how to interpret them.
@@ -56,28 +86,14 @@ Alternatively, if you **only had two factor levels you could do nothing** and no
5686
To start, we want to evaluate **expression changes between the MOV10 overexpression samples and the control samples**. As such we will use the first method for specifying contrasts and create a character vector:
5787

5888
``` r
59-
## Define contrasts for MOV10 overexpression
89+
## Define contrasts for MOV10 overexpression - please run this!
6090
contrast_oe <- c("sampletype", "MOV10_overexpression", "control")
6191
```
6292

6393
> ### Does it matter what I choose to be my base level?
6494
>
6595
> Yes, it does matter. **Deciding what level is the base level will determine how to interpret the fold change that is reported.** So for example, if we observe a log2 fold change of -2 this would mean the gene expression is lower in factor level of interest relative to the base level. Thus, if leaving it up to DESeq2 to decide on the contrasts be sure to check that the alphabetical order coincides with the fold change direction you are anticipating.
6696
67-
## Getting Ready to work in R
68-
69-
1.Get your HPC On Demand session going:
70-
71-
- Opening up RStudio using [HPC on Demand](https://hpcondemand.nih.gov/pun/sys/dashboard/), using default values except for Starting Directory: `/data/Bspc-training/YOUR_USERNAME/rnaseq`
72-
73-
- To check whether or not you are in the correct working directory, use `getwd()`. Something like `/vf/users/Bspc-training/changes/rnaseq` should come up.
74-
75-
- Using the Project menu in the top right corner, or the Files Pane window (clicking rnaseq -\> DEanalysis), to navigate to and open `DEanalysis.Rproj`
76-
77-
2. We are assuming that you have the `dds` object in your environment and your packages are loaded - run your `de_setup.R` script if needed!
78-
79-
3. Run the actual DESeq2 analysis if needed `dds <- DESeq(dds)`.
80-
8197
## The results table
8298

8399
Now that we have our contrast created, we can use it as input to the `results()` function. Let's take a quick look at the help manual for the function:
@@ -329,11 +345,11 @@ Now that we have results for the overexpression results, do the same for the **C
329345
3. Shrink the LFC estimates using `lfcShrink()` and assign it back to `res_tableKD`.
330346
4. Store the code you used to do the above exercises in an RScript named `control_vs_kd.R` .
331347

332-
## R Script updates:
348+
## R Script updates:
333349

334350
Here is what your `de_setup.R` script should look like now to regenerate all the R objects we will need to the next lesson:
335351

336-
```r
352+
``` r
337353
# Gene-level differential expression analysis using DESeq2
338354

339355
# Setup
@@ -369,9 +385,7 @@ res_tableOE <- lfcShrink(dds, coef="sampletype_MOV10_overexpression_vs_control",
369385

370386
Now, on to [Week 7, lesson 03!](../lessons/wk7_lesson03_summarizing_dge_results.md)
371387

372-
------------------------------------------------------------------------
373-
*This lesson has been developed by members of the teaching team at the [Harvard Chan Bioinformatics Core (HBC)](http://bioinformatics.sph.harvard.edu/). These are open access materials distributed under the terms of the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.*
374-
375-
*Some materials and hands-on activities were adapted from [RNA-seq workflow](http://www.bioconductor.org/help/workflows/rnaseqGene/#de) on the Bioconductor website*
376-
377-
------------------------------------------------------------------------
388+
| |
389+
|------------------------------------------------------------------------|
390+
| *This lesson has been developed by members of the teaching team at the [Harvard Chan Bioinformatics Core (HBC)](http://bioinformatics.sph.harvard.edu/). These are open access materials distributed under the terms of the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.* |
391+
| *Some materials and hands-on activities were adapted from [RNA-seq workflow](http://www.bioconductor.org/help/workflows/rnaseqGene/#de) on the Bioconductor website* |

0 commit comments

Comments
 (0)