Commit 56dec23

Merge pull request #5 from jonathangoeke/master
updated README with simplified code
2 parents bbdca57 + c7c70ff commit 56dec23

2 files changed

+88
-160
lines changed

README.Rmd

Lines changed: 44 additions & 82 deletions
@@ -17,122 +17,85 @@ knitr::opts_chunk$set(
1717
<!-- badges: start -->
1818
<!-- badges: end -->
1919

20-
proActiv is an R package that estimates promoter activity from RNA-Seq data. proActiv uses aligned reads and genome annotations as input, and provides absolute and relative promoter activity as output. The package can be used to identify active promoters and alternative promoters, the details of the method are described at https://doi.org/10.1101/176487.
20+
proActiv is an R package that estimates promoter activity from RNA-Seq data. proActiv uses aligned reads and genome annotations as input, and provides absolute and relative promoter activity as output. The package can be used to identify active promoters and alternative promoters. The details of the method are described in [Demircioglu et al. (2019)](https://www.cell.com/cell/fulltext/S0092-8674(19)30906-7).
2121

22-
Additional data on differential promoters in tissues and cancers can be downloaded here: https://jglab.org/data-and-software/
22+
Additional data on differential promoters in tissues and cancers from TCGA, ICGC, GTEx, and PCAWG can be downloaded here: https://jglab.org/data-and-software/
2323

2424
### Installation
2525

26-
proActiv can be installed from [GitHub](https://github.com/) with:
26+
proActiv can be installed from GitHub with:
2727

2828
``` r
2929
library("devtools")
3030
devtools::install_github("GoekeLab/proActiv")
3131
```
32-
### Annotation and Example Data
3332

34-
Pre-calculated promoter annotation data for Gencode v19 (GRCh37) is available as part of the proActiv package. The PromoterAnnotation object has 4 slots:
33+
### Estimate Promoter Activity (after TopHat2 or STAR alignment)
3534

36-
- reducedExonRanges : The reduced first exon ranges for each promoter with promoter metadata for Gencode v19
37-
- promoterIdMapping : The id mapping between transcript ids, names, TSS ids, promoter ids and gene ids for Gencode v19
38-
- annotatedIntronRanges : The intron ranges annotated with the promoter information for Gencode v19
39-
- promoterCoordinates : Promoter coordinates (TSS) with gene id and internal promoter state for Gencode v19
40-
41-
Example junction files as produced by TopHat2 and STAR are available as external data. The reference genome used for alignment is Gencode v19 (GRCh37).
42-
The TopHat2 and STAR example files (5 files each) can be found at 'extdata/tophat2' and 'extdata/star' folders respectively.
43-
44-
Example TopHat2 files:
45-
46-
- extdata/tophat2/sample1.bed
47-
- extdata/tophat2/sample2.bed
48-
- extdata/tophat2/sample3.bed
49-
- extdata/tophat2/sample4.bed
50-
- extdata/tophat2/sample5.bed
51-
52-
Example STAR files:
53-
54-
- extdata/tophat2/sample1.junctions
55-
- extdata/tophat2/sample2.junctions
56-
- extdata/tophat2/sample3.junctions
57-
- extdata/tophat2/sample4.junctions
58-
- extdata/tophat2/sample5.junctions
59-
60-
### Estimate Promoter Activity (TopHat2 alignment)
61-
62-
This is a basic example to estimate promoter activity from a set of RNA-Seq data which was aligned with TopHat2. proActiv will use the junction file from the TopHat2 alignment (see below for an example with STAR-aligned reads), and a set of annotation objects that describe the associations of promoters, transcripts, and genes, to calculate promoter activity.
35+
This is a basic example of estimating promoter activity from RNA-Seq data aligned with TopHat2 or STAR. proActiv uses the junction files produced by the aligner, together with a set of annotation objects that describe the associations between promoters, transcripts, and genes, to calculate promoter activity.
6336

6437

6538
```{r, eval = FALSE}
6639
library(proActiv)
6740
68-
# Preprocessed data is available as part of the package for the human genome (hg19):
69-
# Available data: proActiv::promoterAnnotationData.gencode.v19
70-
71-
### TopHat2 Junction Files Example
41+
# Preprocessed annotations are available as part of the R package for the human genome (hg19):
42+
# proActiv::promoterAnnotationData.gencode.v19
7243
7344
# The paths and labels for samples
45+
junctionFiles <- list.files(system.file('extdata/tophat2', package = 'proActiv'), full.names = TRUE)
46+
47+
# for STAR alignment
48+
# junctionFiles <- list.files(system.file('extdata/star', package = 'proActiv'), full.names = TRUE)
7449
75-
tophatJunctionFiles <- list.files(system.file('extdata/tophat2', package = 'proActiv'), full.names = TRUE)
76-
tophatJunctionFileLabels <- paste0('s', 1:length(tophatJunctionFiles), '-tophat')
50+
junctionFileLabels <- paste0('s', seq_along(junctionFiles))
7751
7852
# Count the total number of junction reads for each promoter
79-
promoterCounts.tophat <- calculatePromoterReadCounts(proActiv::promoterAnnotationData.gencode.v19,
80-
junctionFilePaths = tophatJunctionFiles,
81-
junctionFileLabels = tophatJunctionFileLabels,
82-
junctionType = 'tophat')
53+
promoterCounts <- calculatePromoterReadCounts(proActiv::promoterAnnotationData.gencode.v19,
54+
junctionFilePaths = junctionFiles,
55+
junctionFileLabels = junctionFileLabels,
56+
junctionType = 'tophat') # use junctionType = 'star' for STAR aligned reads
8357
8458
# Normalize promoter read counts by DESeq2 (optional)
85-
normalizedPromoterCounts.tophat <- normalizePromoterReadCounts(promoterCounts.tophat)
59+
normalizedPromoterCounts <- normalizePromoterReadCounts(promoterCounts)
8660
8761
# Calculate absolute promoter activity
88-
absolutePromoterActivity.tophat <- getAbsolutePromoterActivity(normalizedPromoterCounts.tophat,
62+
absolutePromoterActivity <- getAbsolutePromoterActivity(normalizedPromoterCounts,
8963
proActiv::promoterAnnotationData.gencode.v19)
9064
# Calculate gene expression
91-
geneExpression.tophat <- getGeneExpression(absolutePromoterActivity.tophat)
65+
geneExpression <- getGeneExpression(absolutePromoterActivity)
9266
9367
# Calculate relative promoter activity
94-
relativePromoterActivity.tophat <- getRelativePromoterActivity(absolutePromoterActivity.tophat,
95-
geneExpression.tophat)
68+
relativePromoterActivity <- getRelativePromoterActivity(absolutePromoterActivity,
69+
geneExpression)
9670
9771
```
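
To make the relationship between the three outputs concrete, here is a minimal base R sketch on toy numbers. It assumes, following the description in the cited paper, that gene expression is the sum of the absolute activities of all promoters of a gene and that relative promoter activity is the absolute activity divided by that sum. The matrix, the promoter and gene ids, and the variable names are hypothetical; the proActiv functions above may treat details such as log transformation or genes with zero expression differently.

```r
# Toy absolute promoter activity: rows are promoters, columns are samples.
# All values and ids are made up for illustration.
toyAbsoluteActivity <- matrix(c(3, 1, 0, 2,
                                4, 0, 0, 6),
                              nrow = 4,
                              dimnames = list(c('p1', 'p2', 'p3', 'p4'),
                                              c('s1', 's2')))

# Hypothetical mapping from each promoter (row) to its gene
toyGeneId <- c('geneA', 'geneA', 'geneB', 'geneB')

# Gene expression: sum of absolute activity over the promoters of each gene
toyGeneExpression <- rowsum(toyAbsoluteActivity, group = toyGeneId)

# Relative activity: each promoter's share of its gene's expression
toyRelativeActivity <- toyAbsoluteActivity / toyGeneExpression[toyGeneId, ]
toyRelativeActivity
```

Within each gene and sample, the relative activities sum to 1 whenever the gene is expressed, which is what makes them comparable across genes with very different expression levels.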
72+
### Annotation and Example Data
9873

74+
Pre-calculated promoter annotation data for Gencode v19 (GRCh37) is available as part of the proActiv package. The PromoterAnnotation object has 4 slots:
9975

100-
### Estimate Promoter Activity (STAR alignment)
101-
102-
```{r, eval = FALSE}
103-
library(proActiv)
104-
105-
# Preprocessed data is available as part of the package for the human genome (hg19):
106-
# Available data: proActiv::promoterAnnotationData.gencode.v19
107-
108-
### STAR Junction Files Example
109-
110-
# The paths and labels for samples
111-
starJunctionFiles <- list.files(system.file('extdata/star', package = 'proActiv'), full.names = TRUE)
112-
starJunctionFileLabels <- paste0('s', 1:length(starJunctionFiles), '-star')
113-
114-
# Count the total number of junction reads for each promoter
115-
promoterCounts.star <- calculatePromoterReadCounts(proActiv::promoterAnnotationData.gencode.v19,
116-
junctionFilePaths = starJunctionFiles,
117-
junctionFileLabels = starJunctionFileLabels,
118-
junctionType = 'star')
119-
120-
# Normalize promoter read counts by DESeq2 (optional)
121-
normalizedPromoterCounts.star <- normalizePromoterReadCounts(promoterCounts.star)
76+
- reducedExonRanges : The reduced first exon ranges for each promoter with promoter metadata for Gencode v19
77+
- promoterIdMapping : The id mapping between transcript ids, names, TSS ids, promoter ids and gene ids for Gencode v19
78+
- annotatedIntronRanges : The intron ranges annotated with the promoter information for Gencode v19
79+
- promoterCoordinates : Promoter coordinates (TSS) with gene id and internal promoter state for Gencode v19
12280

123-
# Calculate absolute promoter activity
124-
absolutePromoterActivity.star <- getAbsolutePromoterActivity(normalizedPromoterCounts.star,
125-
proActiv::promoterAnnotationData.gencode.v19)
81+
Example junction files as produced by TopHat2 and STAR are available as external data. The reference genome used for alignment is Gencode v19 (GRCh37).
82+
The TopHat2 and STAR example files (5 files each) can be found in the 'extdata/tophat2' and 'extdata/star' folders, respectively.
12683

127-
# Calculate gene expression
128-
geneExpression.star <- getGeneExpression(absolutePromoterActivity.star)
84+
Example TopHat2 files:
12985

130-
# Calculate relative promoter activity
131-
relativePromoterActivity.star <- getRelativePromoterActivity(absolutePromoterActivity.star,
132-
geneExpression.star)
86+
- extdata/tophat2/sample1.bed
87+
- extdata/tophat2/sample2.bed
88+
- extdata/tophat2/sample3.bed
89+
- extdata/tophat2/sample4.bed
90+
- extdata/tophat2/sample5.bed
13391

134-
```
92+
Example STAR files:
13593

94+
- extdata/star/sample1.junctions
95+
- extdata/star/sample2.junctions
96+
- extdata/star/sample3.junctions
97+
- extdata/star/sample4.junctions
98+
- extdata/star/sample5.junctions
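
The bundled example files can be located from within R rather than by typing the paths by hand. The sketch below reuses the `system.file()` and `list.files()` calls from the example above; the variable names `exampleDirs` and `exampleFiles` are only illustrative, and the empty-path check is an added safeguard for the case where the package or folder is not found.

```r
# Folders holding the example junction files, named by aligner
exampleDirs <- c(tophat2 = 'extdata/tophat2', star = 'extdata/star')

# system.file() returns "" when the package or folder is not found,
# so fall back to an empty character vector in that case.
exampleFiles <- lapply(exampleDirs, function(d) {
  path <- system.file(d, package = 'proActiv')
  if (nzchar(path)) list.files(path, full.names = TRUE) else character(0)
})

# Number of example files found per aligner
lengths(exampleFiles)
```

Either element of `exampleFiles` can then be passed as `junctionFilePaths`, with `junctionType` set to match the aligner.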
13699

137100
### Creating your own promoter annotations
138101
proActiv provides functions to create promoter annotation objects for any genome. Here we describe how the annotation can be created using a TxDb object (please see the TxDb documentation for how to create annotations from a GTF file).
@@ -156,9 +119,7 @@ species <- 'Homo_sapiens'
156119
numberOfCores <- 1
157120
158121
### Annotation data preparation
159-
### Needs to be executed once per annotation. Results can be saved and loaded later for reuse
160-
161-
promoterAnnotationData <- preparePromoterAnnotationData(txdb, species = 'Homo_sapiens', numberOfCores = 1)
122+
promoterAnnotationData <- preparePromoterAnnotationData(txdb, species = species, numberOfCores = numberOfCores)
162123
163124
# Retrieve the id mapping between transcripts, TSSs, promoters and genes
164125
head(promoterIdMapping(promoterAnnotationData))
@@ -176,7 +137,8 @@ proActiv will not provide promoter activity estimates for promoters which are no
176137
## Citing proActiv
177138

178139
If you use proActiv, please cite:
179-
Demircioğlu, Deniz, et al. "A Pan-Cancer Transcriptome Analysis Reveals Pervasive Regulation through Tumor-Associated Alternative Promoters." bioRxiv (2018): 176487.
140+
141+
Demircioğlu, Deniz, et al. "A Pan-cancer Transcriptome Analysis Reveals Pervasive Regulation through Alternative Promoters." *Cell* 178.6 (2019): 1465-1477.
180142

181143
## Contributors
182144
