Commit 0c4c4b7

Author: Lin Yang
Commit message: current RIMA Kraken Version
1 parent 8f8d002 commit 0c4c4b7

16 files changed: +1929 -0 lines changed

README.md

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
# How to Run RIMA on Kraken

## RIMA Workflow
![](https://github.com/Lindky/RIMA_Kraken/blob/master/files/multiqc/Pipeline_Workflow_mqc.png)

## Available Tools Checklist
| **Methods** | **Description** | **Available models** |
| :---: | :---: | :---: |
| STAR | Spliced Transcripts Alignment to a Reference | Human, Mouse |
| Salmon | Gene Quantification | Human, Mouse |
| RSeQC | High Throughput Sequence Data Evaluation | Human, Mouse |
| batch_removal | Remove Batch Effects Using Limma | Human, Mouse |
| | **---DIFFERENTIAL EXPRESSION---** | |
| DESeq2 | Gene Differential Expression Analysis | Human, Mouse |
| GSEA | Gene Set Enrichment Analysis | Human, Mouse |
| ssGSEA | Single-sample GSEA | Human, Mouse |
| | **---IMMUNE REPERTOIRE---** | |
| TRUST4 | TCR and BCR Sequences Analysis | Human, Mouse |
| | **---IMMUNE INFILTRATION---** | |
| ImmuneDeconv | Cell Components Estimation | Human |
| mMCP | Immune Cell Estimation for Mouse | Mouse |
| | **---IMMUNE RESPONSE---** | |
| MSIsensor2 | Microsatellite Instability (MSI) Detection | Human |
| | **---MICROBIOME---** | |
| Centrifuge | Bacterial Abundance Detection | Human, Mouse |
| PathSeq | Microbial Sequence Detection | Human |
| | **---NEO-ANTIGEN---** | |
| arcasHLA | HLA Class I and Class II Genes Typing | Human |
| | **---RIMA REPORT---** | |
| report | RIMA HTML Report Using MultiQC | Human, Mouse |

**Notice**: When you run mouse data, do not enable tools that are not available for the **Mouse** model. For example, if you set ImmuneDeconv: true in the execution file, the ImmuneDeconv results will be unreliable for mouse data.

## 1. Install

You cannot run any jobs on the login node; even `conda install` is not allowed.

Download the RIMA_pipeline folder to your own working directory. Currently, RIMA_kraken supports hg38 and mm10 data.
```
git clone https://github.com/Lindky/RIMA_Kraken.git
```

## 2. Activate the RIMA environment

```
export CONDA_ROOT=/liulab/linyang/rnaseq_env/miniconda3
export PATH=/liulab/linyang/rnaseq_env/miniconda3/bin:$PATH

source activate /liulab/linyang/rnaseq_env/miniconda3/envs/rnaseq
```

## 3. Prepare the 2 required execution files (metasheet.txt, config.yaml)

### 3.1 Example of metasheet

Ensure your metasheet contains the **two required columns** (SampleName, PatName) and is in comma-delimited format.
You can also add more phenotype information that you may want to compare, e.g. columns for Responder, Age, Sex, etc.

```
SampleName,PatName,Responder,Age,Gender
SRR8281231,P3,R,41,Male
SRR8281224,P13,NR,55,Female
SRR8281221,P20,NR,63,Male
SRR8281223,P21,NR,59,Male
......
```
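
The column requirements above can be checked before a run with a short script. The following is a minimal sketch, not part of RIMA: the function name `check_metasheet` is hypothetical, and treating an empty SampleName or PatName value as an error is an assumption beyond what this README states.

```python
import csv
import io

# Columns this README lists as required in the metasheet.
REQUIRED_COLUMNS = ("SampleName", "PatName")


def check_metasheet(text):
    """Return a list of problems found in comma-delimited metasheet text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [
        f"missing required column: {column}"
        for column in REQUIRED_COLUMNS
        if column not in header
    ]
    if problems:
        return problems
    for row in reader:
        for column in REQUIRED_COLUMNS:
            # Assumption: an empty value in a required column is an error.
            if not (row[column] or "").strip():
                problems.append(f"line {reader.line_num}: empty {column}")
    return problems


example = """SampleName,PatName,Responder,Age,Gender
SRR8281231,P3,R,41,Male
SRR8281224,P13,NR,55,Female
"""
print(check_metasheet(example))
```

To check a real file, read it first, for example `check_metasheet(open("metasheet.txt").read())`; an empty list means no problems were found.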

### 3.2 Example of config.yaml

In the rnaseq_pipeline folder, we have prepared a config.yaml for you.

First, make sure the values in the **Data information** section match your data:

```
---
############################################################
# Data information #
############################################################

ref: ref.yaml
assembly: hg38 #hg38 or mm10

cancer_type: GBM #short name of cancer type
metasheet: metasheet_latest.txt
designs: [Responder] #the column from metasheet which is used to do comparison
############################################################
# level1 #
############################################################

### star
#Possible values are [ff-firststrand, ff-secondstrand, ff-unstranded, fr-firststrand, fr-secondstrand, fr-unstranded (default), transfrags]
library_type: 'fr-firststrand'
stranded: true

```
For details on each parameter, see [Parameters Interpretation](https://github.com/Lindky/RIMA_Kraken/blob/master/Parameters_description.md).

Parameters within square brackets should be updated to match your analysis goals. In this tutorial, **[Responder]** is the phenotype of interest for comparison, as specified in **metasheet.txt**. All comparison figures are stored under the **/images** folder.

```
############################################################
# list samples #
############################################################

samples:
  SRR8281228:
    - /mnt/zhao_trial/Zhao2019_PD1_Glioblastoma_RNASeq/SRR8281228_1.fastq.gz
    - /mnt/zhao_trial/Zhao2019_PD1_Glioblastoma_RNASeq/SRR8281228_2.fastq.gz
  SRR8281231:
    - /mnt/zhao_trial/Zhao2019_PD1_Glioblastoma_RNASeq/SRR8281231_1.fastq.gz
    - /mnt/zhao_trial/Zhao2019_PD1_Glioblastoma_RNASeq/SRR8281231_2.fastq.gz
############################################################
# run settings #
############################################################
runs:
  run1:
    - SRR8281228
  run2:
    - SRR8281231
```

Finally, set the paths to your data in the **list samples** section and set the runs for each sample in the **run settings** section (sample names must be consistent with your metasheet.txt).

Currently, only **fastq files** are accepted as input (including fastq.gz).
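
For many samples, the **list samples** and **run settings** sections can be generated rather than typed by hand. The sketch below is illustrative only and is not shipped with RIMA: the function name `build_samples_yaml` is hypothetical, it assumes paired files named `<sample>_1.fastq(.gz)` and `<sample>_2.fastq(.gz)` as in the example, and it assumes one run per sample.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <sample>_1.fastq(.gz) and <sample>_2.fastq(.gz).
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq(\.gz)?$")


def build_samples_yaml(fastq_paths):
    """Return 'samples' and 'runs' YAML text for the given fastq paths."""
    samples = defaultdict(list)
    # Sorting groups each sample's files together and puts _1 before _2.
    for path in sorted(fastq_paths):
        filename = path.rsplit("/", 1)[-1]
        match = FASTQ_PATTERN.match(filename)
        if match is None:
            raise ValueError(f"unrecognized fastq file name: {path}")
        samples[match.group("sample")].append(path)

    lines = ["samples:"]
    for sample, paths in samples.items():
        lines.append(f"  {sample}:")
        lines.extend(f"    - {p}" for p in paths)

    # Assumption: one run per sample, mirroring the example config.
    lines.append("runs:")
    for index, sample in enumerate(samples, start=1):
        lines.append(f"  run{index}:")
        lines.append(f"    - {sample}")
    return "\n".join(lines)


print(build_samples_yaml([
    "/mnt/zhao_trial/Zhao2019_PD1_Glioblastoma_RNASeq/SRR8281228_1.fastq.gz",
    "/mnt/zhao_trial/Zhao2019_PD1_Glioblastoma_RNASeq/SRR8281228_2.fastq.gz",
]))
```

The printed text can be pasted into config.yaml; the sample names it emits should still be compared against the SampleName column of the metasheet.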

## 3. Choose the tools you want to run

Use **execution.yaml** to control which tools to run in RIMA. Most downstream analyses require outputs from the **DATA PROCESSING** module, so please run the **DATA PROCESSING** module first, then select the tools you want to use.

Example of execution.yaml:
```
##DATA PROCESSING
star: false
salmon: false
rseqc: false

##DIFFERENTIAL EXPRESSION
batch_removal: false
deseq2: true
gsea: false
ssgsea: false

....
```
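
Because some tools in the checklist are available for only one species, it can help to compare the enabled tools against the `assembly` value before submitting a job. The following is a rough sketch under stated assumptions, not RIMA code: the key names `immunedeconv`, `msisensor2`, `pathseq`, `arcashla` and `mmcp` are hypothetical (those entries are elided in the excerpt above), and the parser only handles flat `key: true/false` lines.

```python
# Tools listed as Human-only or Mouse-only in the checklist table.
# The lower-case key names are hypothetical; check your execution.yaml.
HUMAN_ONLY = {"immunedeconv", "msisensor2", "pathseq", "arcashla"}
MOUSE_ONLY = {"mmcp"}


def parse_flags(text):
    """Parse flat 'key: true/false' lines, skipping comments and other lines."""
    flags = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        flags[key.strip().lower()] = value.strip().lower() == "true"
    return flags


def unsupported_tools(flags, assembly):
    """Return enabled tools that the checklist does not list for this assembly."""
    if assembly == "mm10":
        blocked = HUMAN_ONLY
    elif assembly == "hg38":
        blocked = MOUSE_ONLY
    else:
        raise ValueError(f"unknown assembly: {assembly}")
    return sorted(t for t, enabled in flags.items() if enabled and t in blocked)


flags = parse_flags("""##DATA PROCESSING
star: false
deseq2: true
immunedeconv: true
""")
print(unsupported_tools(flags, "mm10"))
```

Any tool names printed for `mm10` should be set to `false` before running mouse data.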

## 4. Execution

### Step 1: Check the pipeline with a dry run to ensure correct script and data usage.

```
snakemake -s rnaseq.snakefile -np
```
### Step 2: Submit the job.

After the dry run succeeds, use `sbatch` to submit the job, or run it on a working node:

### !!! Please do not run RIMA on the interactive node!

```
#!/bin/bash
#SBATCH --job-name=RIMA
#SBATCH --mem=32G # total memory needed
#SBATCH --time=96:00:00
#SBATCH -c 16 #number of cores


snakemake -s rnaseq.snakefile
```
**Note**: The snakemake argument `-j` sets the number of cores used for parallel execution (e.g. `-j 4` allows up to 4 jobs to run at the same time).

## 5. Output files

The raw output files from each tool are stored under `RIMA_kraken/analysis`

The processed output files (used for visualization) are stored under `RIMA_kraken/files`

The output figures are stored under `RIMA_kraken/images`

The **RIMA HTML report** is stored under `RIMA_kraken/files`