Skip to content

Commit 3328972

Browse files
authored
Merge pull request #114 from NICHD-BSPC/sally-edits-may2025
Week 6 lesson 01-04 scripts
2 parents ceb6ec5 + 3e21ce7 commit 3328972

File tree

4 files changed

+79
-9
lines changed

4 files changed

+79
-9
lines changed

lessons/wk6_lesson02_count_normalization.md

Lines changed: 19 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,21 +1,18 @@
11
---
22
title: "Count normalization with DESeq2"
33
author: "Harvard HPC Staff, Adapted by Sally Chang @ NICHD"
4-
date: "Last Modified April 2025"
4+
date: "Last Modified May 2025"
55
---
66

77
Approximate time: 60 minutes
88

9-
### NOTE:
10-
To make names more generalized for the next course, `/data/Bspc-training/shared/rnaseq_jan2025` is now `/data/Bspc-training/shared/rnaseq_mov10` . Make sure to edit any scripts that refer to the shared data!
11-
129
## Learning Objectives
1310

1411
- Explore different types of normalization methods
1512
- Become familiar with the `DESeqDataSet` object
1613
- Understand how to normalize counts using DESeq2
1714

18-
### Opening the project using RStudio (HPC on Demand)
15+
### Preparing for this lesson:
1916

2017
Before we get into the details of the analysis, let's start by:
2118

@@ -25,6 +22,23 @@ Before we get into the details of the analysis, let's start by:
2522

2623
- Using the Project menu in the top right corner, or the Files Pane window (clicking rnaseq -\> DEanalysis), to navigate to and open `DEanalysis.Rproj`, which you set up as an Assignment last week.
2724

25+
If you missed the last lesson, or need to make sure you have the right packages and data loaded, please run the following commands:
26+
27+
``` r
28+
# Setup
29+
# Bioconductor and CRAN libraries used - already installed on Biowulf
30+
library(tidyverse)
31+
library(RColorBrewer)
32+
library(DESeq2)
33+
library(pheatmap)
34+
library(BiocManager)
35+
36+
# Load in data
37+
data <- read.table("data/mov10_AllSamples_featurecounts.Rmatrix.txt", header=T, row.names=1)
38+
39+
meta <- read.table("data/mov10_AllSamples_metadata.txt", header=T, row.names=1)
40+
```
41+
2842
## Normalization
2943

3044
The first step in the DE analysis workflow is count normalization, which is necessary to make accurate comparisons of gene expression between samples.

lessons/wk6_lesson03_dge_qc_analysis.md

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -157,9 +157,9 @@ Now that we have a good understanding of the QC steps normally employed for RNA-
157157

158158
1.Get your HPC On Demand session going:
159159

160-
- Opening up RStudio using [HPC on Demand](https://hpcondemand.nih.gov/pun/sys/dashboard/), using default values except for Starting Directory: `/data/Bspc-training/YOUR_USERNAME/rnaseq`
160+
- Opening up RStudio using [HPC on Demand](https://hpcondemand.nih.gov/pun/sys/dashboard/), using default values except for Starting Directory: `/data/YOUR_USERNAME/rnaseq`
161161

162-
- To check whether or not you are in the correct working directory, use `getwd()`. Something like `/vf/users/Bspc-training/changes/rnaseq` should come up.
162+
- To check whether or not you are in the correct working directory, use `getwd()`. Something like `/vf/users/changes/rnaseq` should come up.
163163

164164
- Using the Project menu in the top right corner, or the Files Pane window (clicking rnaseq -\> DEanalysis), to navigate to and open `DEanalysis.Rproj`
165165

@@ -223,7 +223,9 @@ plotPCA(rld, intgroup="sampletype")
223223
```
224224

225225
<p align="center">
226-
<img src="../img/mov10_pca.png" width="600" alt="mov 10 experiement PCA"/>
226+
227+
<img src="../img/mov10_pca.png" alt="mov 10 experiement PCA" width="600"/>
228+
227229
</p>
228230

229231
------------------------------------------------------------------------
@@ -296,7 +298,9 @@ pheatmap(rld_cor, annotation = meta)
296298
When you plot using `pheatmap()` the hierarchical clustering information is used to place similar samples together and this information is represented by the tree structure along the axes. The `annotation` argument accepts a dataframe as input, in our case it is the `meta` data frame.
297299

298300
<p align="center">
299-
<img src="../img/mov10_default_heatmap.png" width="600" alt="mov10 heatmap with default settings"/>
301+
302+
<img src="../img/mov10_default_heatmap.png" alt="mov10 heatmap with default settings" width="600"/>
303+
300304
</p>
301305

302306
Overall, we observe pretty high correlations across the board ( \> 0.999) suggesting no outlying sample(s). Also, similar to the PCA plot you see the samples clustering together by sample group. Together, these plots suggest to us that the data are of good quality and we have the green light to proceed to differential expression analysis.

lessons/wk6_lesson04_design_formulas.md

Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,28 @@ Approximate time: 30 minutes
1111
- Demonstrate the use of the design formula with simple and complex designs
1212
- Construct R code to execute the differential expression analysis workflow with DESeq2
1313

14+
## Catch-up Script:
15+
16+
If you need to be completely caught up, you can copy and paste the following into an R Script and run it. If you don't already have the files in your `/data` directory, please see [Wk 5 Lesson 01](../wk5_lesson01_introR_Rstudio.md) for instructions on where to obtain the input files.
17+
18+
``` r
19+
# Setup
20+
# Bioconductor and CRAN libraries used - already installed on Biowulf
21+
library(tidyverse)
22+
library(RColorBrewer)
23+
library(DESeq2)
24+
library(pheatmap)
25+
library(BiocManager)
26+
27+
# Load in data
28+
data <- read.table("data/mov10_AllSamples_featurecounts.Rmatrix.txt", header=T, row.names=1)
29+
30+
meta <- read.table("data/mov10_AllSamples_metadata.txt", header=T, row.names=1)
31+
32+
# Create DESeq2Dataset object
33+
dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~ sampletype)
34+
```
35+
1436
# Differential expression analysis with DESeq2
1537

1638
The final step in the differential expression analysis workflow is **fitting the raw counts to the NB model and performing the statistical test** for differentially expressed genes. In this step we essentially want to determine whether the mean expression levels of different sample groups are significantly different.
@@ -168,3 +190,28 @@ Given the the metadata table you have sent me for your own experiment, do the fo
168190

169191
1. Write a design formula for your experiment, in the format of `design = ~ sex + age + treatment` . Make sure to include any interaction terms or terms that you want to "regress" out. There are additional recommendations for complex designs in the [DESeq2 vignette](https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions).
170192
2. Briefly explain (in 1-2 sentences) the reasoning for this design formula.
193+
194+
## Your DE script
195+
196+
In this lesson, we took the additional step of running the actual DESeq2 analysis. Your `de_script.R` should now contain the following commands to re-create necessary data objects (click to show):
197+
198+
``` r
199+
# Setup
200+
# Bioconductor and CRAN libraries used - already installed on Biowulf
201+
library(tidyverse)
202+
library(RColorBrewer)
203+
library(DESeq2)
204+
library(pheatmap)
205+
library(BiocManager)
206+
207+
# Load in data
208+
data <- read.table("data/mov10_AllSamples_featurecounts.Rmatrix.txt", header=T, row.names=1)
209+
210+
meta <- read.table("data/mov10_AllSamples_metadata.txt", header=T, row.names=1)
211+
212+
# Create DESeq2Dataset object
213+
dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~ sampletype)
214+
215+
# Run DESeq2 on DESeq2Dataset object
216+
dds <- DESeq(dds)
217+
```

scripts/de_analysis_wk6_lesson01.R

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,3 +11,8 @@ data <- read.table("data/mov10_AllSamples_featurecounts.Rmatrix.txt", header=T,
1111

1212
meta <- read.table("data/mov10_AllSamples_metadata.txt", header=T, row.names=1)
1313

14+
# Create DESeq2Dataset object
15+
dds <- DESeqDataSetFromMatrix(countData = data, colData = meta, design = ~ sampletype)
16+
17+
# Run DESeq2 on DESeq2Dataset object
18+
dds <- DESeq(dds)

0 commit comments

Comments
 (0)