|
| 1 | +# How to Run RIMA on Kraken |
| 2 | + |
| 3 | +## RIMA Workflow |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | +## Available Tools Checklist |
| 8 | +| **Methods** | **Description** | **Available models**| |
| 9 | +| :---: | :---: | :---: | |
| 10 | +| STAR | Spliced Transcripts Alignment to a Reference | Human, Mouse | |
| 11 | +| Salmon | Gene Quantification | Human, Mouse | |
| 12 | +| RSeQC | High Throughput Sequence Data Evaluation | Human, Mouse | |
| 13 | +| batch_removal| Remove Batch Effects Using Limma | Human, Mouse | |
| 14 | +| | **---DIFFERENTIAL EXPRESSION---** | | |
| 15 | +| DESeq2 | Gene Differential Expression Analysis | Human, Mouse | |
| 16 | +| GSEA | Gene Set Enrichment Analysis | Human, Mouse | |
| 17 | +| ssGSEA | Single-sample GSEA | Human, Mouse | |
| 18 | +| | **---IMMUNE REPERTOIRE---** | | |
| 19 | +| TRUST4 | TCR and BCR Sequences Analysis | Human, Mouse | |
| 20 | +| | **---IMMUNE INFILTRATION---** | | |
| 21 | +| ImmuneDeconv | Cell Components Estimation | Human | |
| 22 | +| mMCP | Immune Cell Estimation for Mouse | Mouse | |
| 23 | +| | **---IMMUNE RESPONSE---** | | |
| 24 | +| MSIsensor2 | Microsatellite Instability (MSI) Detection | Human | |
| 25 | +| | **---MICROBIOME---** | | |
| 26 | +| Centrifuge | Bacterial Abundance Detection | Human, Mouse | |
| 27 | +| PathSeq | Microbial Sequence Detection | Human | |
| 28 | +| | **---NEO-ANTIGEN---** | | |
| 29 | +| arcasHLA | HLA Class I and Class II Genes Typing | Human | |
| 30 | +| | **---RIMA REPORT---** | | |
| 31 | +| report | RIMA HTML Report Using Multiqc | Human, Mouse | |
| 32 | + |
| 33 | +**Notice**: When running Mouse data, do not enable tools that are not available for the **Mouse Model** (e.g. if you set ImmuneDeconv: true in the execution file while using Mouse data, the ImmuneDeconv results will be unreliable). |
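
The notice above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of RIMA: it assumes the execution file has already been loaded into a plain dictionary of tool names and booleans, and the tool keys in `HUMAN_ONLY` and `MOUSE_ONLY` are hypothetical spellings derived from the checklist table, so adjust them to the keys in your own execution.yaml.

```python
# Tools the checklist lists as Human-only or Mouse-only.
# The lowercase keys are hypothetical; match them to your execution.yaml.
HUMAN_ONLY = {"immunedeconv", "msisensor2", "pathseq", "arcashla"}
MOUSE_ONLY = {"mmcp"}


def unsupported_tools(assembly, execution):
    """Return enabled tools that the checklist does not list for this assembly."""
    if assembly == "hg38":
        blocked = MOUSE_ONLY
    elif assembly == "mm10":
        blocked = HUMAN_ONLY
    else:
        raise ValueError(f"unknown assembly: {assembly!r}")
    return sorted(
        name for name, enabled in execution.items()
        if enabled and name.lower() in blocked
    )


# Example: a mouse run with a Human-only tool switched on.
execution = {"star": True, "deseq2": True, "ImmuneDeconv": True, "mmcp": True}
print(unsupported_tools("mm10", execution))
```

Any tool name this returns should be set to false before the pipeline is run on that assembly.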
| 34 | + |
| 35 | + |
| 36 | +## 1.Install |
| 37 | + |
| 38 | +You cannot run any jobs on the login node; even conda install is not allowed. |
| 39 | + |
| 40 | + |
| 41 | +Download the RIMA_pipeline folder to your own working directory. Currently, RIMA_kraken supports hg38 and mm10 data. |
| 42 | +``` |
| 43 | +git clone https://github.com/Lindky/RIMA_Kraken.git |
| 44 | +``` |
| 45 | + |
| 46 | +## 2.Activate the RIMA environment |
| 47 | + |
| 48 | +``` |
| 49 | +export CONDA_ROOT=/liulab/linyang/rnaseq_env/miniconda3 |
| 50 | +export PATH=/liulab/linyang/rnaseq_env/miniconda3/bin:$PATH |
| 51 | +
|
| 52 | +source activate /liulab/linyang/rnaseq_env/miniconda3/envs/rnaseq |
| 53 | +``` |
| 54 | + |
| 55 | +## 3. Prepare the 2 required execution files (metasheet.txt, config.yaml) |
| 56 | + |
| 57 | +### 3.1 Example of metasheet |
| 58 | + |
| 59 | +Ensure your metasheet contains the **Two Required Columns** (SampleName, PatName) and is in comma-delimited format. |
| 60 | +You can also add phenotype columns that you want to compare, e.g. Responder, Age, Gender. |
| 61 | + |
| 62 | +``` |
| 63 | +SampleName,PatName,Responder,Age,Gender |
| 64 | +SRR8281231,P3,R,41,Male |
| 65 | +SRR8281224,P13,NR,55,Female |
| 66 | +SRR8281221,P20,NR,63,Male |
| 67 | +SRR8281223,P21,NR,59,Male |
| 68 | +...... |
| 69 | +``` |
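
Before running the pipeline it can help to confirm that the metasheet meets these requirements. Below is a minimal sketch using only the Python standard library. The function name `check_metasheet` is hypothetical, and the duplicate-name check is an extra assumption on my part rather than a documented RIMA rule.

```python
import csv
import io

REQUIRED_COLUMNS = ("SampleName", "PatName")


def check_metasheet(handle):
    """Return a list of problems found in a comma-delimited metasheet."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing required column: {col}"
                for col in REQUIRED_COLUMNS if col not in header]
    if problems:
        return problems
    seen = set()
    for row_no, row in enumerate(reader, start=1):
        name = (row["SampleName"] or "").strip()
        if not name:
            problems.append(f"data row {row_no}: empty SampleName")
        elif name in seen:
            # Assumption: sample names are expected to be unique.
            problems.append(f"data row {row_no}: duplicate SampleName {name}")
        seen.add(name)
    return problems


# Example using the first rows of the metasheet shown above.
example = io.StringIO(
    "SampleName,PatName,Responder,Age,Gender\n"
    "SRR8281231,P3,R,41,Male\n"
    "SRR8281224,P13,NR,55,Female\n"
)
print(check_metasheet(example))
```

To check a real file, pass an open handle instead, for example `open("metasheet.txt", newline="")`.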
| 70 | + |
| 71 | +### 3.2 Example of config.yaml |
| 72 | + |
| 73 | +In the rnaseq_pipeline folder, we have prepared a config.yaml for you. |
| 74 | + |
| 75 | +First, ensure the settings in the **Data information** section match your data: |
| 76 | + |
| 77 | +``` |
| 78 | +--- |
| 79 | +############################################################ |
| 80 | +# Data information # |
| 81 | +############################################################ |
| 82 | +
|
| 83 | +ref: ref.yaml |
| 84 | +assembly: hg38 #hg38 or mm10 |
| 85 | +
|
| 86 | +cancer_type: GBM #short name of cancer type |
| 87 | +metasheet: metasheet_latest.txt |
| 88 | +designs: [Responder] #the column from metasheet which is used for comparison |
| 89 | +############################################################ |
| 90 | +# level1 # |
| 91 | +############################################################ |
| 92 | +
|
| 93 | +### star |
| 94 | +#Possible values are [ff-firststrand, ff-secondstrand, ff-unstranded, fr-firststrand, fr-secondstrand, fr-unstranded (default), transfrags] |
| 95 | +library_type: 'fr-firststrand' |
| 96 | +stranded: true |
| 97 | +
|
| 98 | +
|
| 99 | +``` |
| 100 | +For details on each parameter, see [Parameters Interpretation](https://github.com/Lindky/RIMA_Kraken/blob/master/Parameters_description.md). |
| 101 | + |
| 102 | +Parameters within square brackets should be updated to match your analysis goals. In this tutorial, **[Responder]** is the phenotype of interest for comparison, as specified in the **metasheet.txt**. All comparison figures are stored under the **/images** folder. |
| 103 | + |
| 104 | +``` |
| 105 | +############################################################ |
| 106 | +# list samples # |
| 107 | +############################################################ |
| 108 | +
|
| 109 | +samples: |
| 110 | + SRR8281228: |
| 111 | + - /mnt/zhao_trial/Zhao2019_PD1_Glioblastoma_RNASeq/SRR8281228_1.fastq.gz |
| 112 | + - /mnt/zhao_trial/Zhao2019_PD1_Glioblastoma_RNASeq/SRR8281228_2.fastq.gz |
| 113 | + SRR8281231: |
| 114 | + - /mnt/zhao_trial/Zhao2019_PD1_Glioblastoma_RNASeq/SRR8281231_1.fastq.gz |
| 115 | + - /mnt/zhao_trial/Zhao2019_PD1_Glioblastoma_RNASeq/SRR8281231_2.fastq.gz |
| 116 | +############################################################ |
| 117 | +# run settings # |
| 118 | +############################################################ |
| 119 | +runs: |
| 120 | + run1: |
| 121 | + - SRR8281228 |
| 122 | + run2: |
| 123 | + - SRR8281231 |
| 124 | +
|
| 125 | +``` |
| 126 | + |
| 127 | +Finally, set the paths to your data in the **list samples** section and set the runs for each sample in the **run settings** section (sample names must be consistent with your metasheet.txt). |
| 128 | + |
| 129 | +Currently, only **fastq files** are accepted as input (including fastq.gz). |
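
The constraints on the **list samples** and **run settings** sections can be checked mechanically. The following is a rough sketch, assuming config.yaml has already been parsed into a Python dictionary (for example with a YAML library, which is not shown here). The function name `check_samples` and the accepted suffix list are assumptions for illustration.

```python
# Assumption: "fastq files (including fastq.gz)" means these two suffixes.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def check_samples(config, metasheet_names):
    """Return problems in the 'samples' and 'runs' sections of a config dict."""
    problems = []
    samples = config.get("samples", {})
    for name, paths in samples.items():
        # Sample names must be consistent with the metasheet.
        if name not in metasheet_names:
            problems.append(f"{name}: not in metasheet")
        for path in paths:
            if not path.endswith(FASTQ_SUFFIXES):
                problems.append(f"{name}: not a fastq file: {path}")
    # Every run should only reference samples that are defined above it.
    for run, members in config.get("runs", {}).items():
        for name in members:
            if name not in samples:
                problems.append(f"{run}: unknown sample {name}")
    return problems


# Example mirroring the structure of the config shown above (paths shortened).
config = {
    "samples": {
        "SRR8281228": ["SRR8281228_1.fastq.gz", "SRR8281228_2.fastq.gz"],
        "SRR8281231": ["SRR8281231_1.fastq.gz", "SRR8281231_2.bam"],
    },
    "runs": {"run1": ["SRR8281228"], "run2": ["SRR8281232"]},
}
for problem in check_samples(config, {"SRR8281228", "SRR8281231"}):
    print(problem)
```

An empty result means the sample names, file types, and run assignments are at least internally consistent; it does not confirm that the files exist on disk.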
| 130 | + |
| 131 | +## 3. Choose the tools you want to run |
| 132 | + |
| 133 | +Use **execution.yaml** to control which tools to run in RIMA. Most downstream analyses require outputs from the **DATA PROCESSING** module, so please run the **DATA PROCESSING** module first, then select the tools you want to use. |
| 134 | + |
| 135 | +Example of execution.yaml: |
| 136 | +``` |
| 137 | +##DATA PROCESSING |
| 138 | +star: false |
| 139 | +salmon: false |
| 140 | +rseqc: false |
| 141 | +
|
| 142 | +##DIFFERENTIAL EXPRESSION |
| 143 | +batch_removal: false |
| 144 | +deseq2: true |
| 145 | +gsea: false |
| 146 | +ssgsea: false |
| 147 | +
|
| 148 | +.... |
| 149 | +``` |
| 150 | + |
| 151 | +## 4.Execution |
| 152 | + |
| 153 | +### Step1: Check the pipeline with a dry run to ensure correct script and data usage. |
| 154 | + |
| 155 | +``` |
| 156 | +snakemake -s rnaseq.snakefile -np |
| 157 | +``` |
| 158 | +### Step2: Submit the job. |
| 159 | + |
| 160 | +After the dry run succeeds, please use sbatch to submit the job or run it on a working node: |
| 161 | + |
| 162 | +### !!!Please do not run RIMA on the interactive node! |
| 163 | + |
| 164 | +``` |
| 165 | +#!/bin/bash |
| 166 | +#SBATCH --job-name=RIMA |
| 167 | +#SBATCH --mem=32G # total memory needed |
| 168 | +#SBATCH --time=96:00:00 |
| 169 | +#SBATCH -c 16 #number of cores |
| 170 | +
|
| 171 | +
|
| 172 | +snakemake -s rnaseq.snakefile |
| 173 | +``` |
| 174 | +**Note**: The -j argument sets the number of cores Snakemake may use in parallel (e.g. '-j 4' runs up to 4 jobs at the same time). |
| 175 | + |
| 176 | +## 5.Output files |
| 177 | + |
| 178 | +The raw output files from each tool are stored under `RIMA_kraken/analysis` |
| 179 | + |
| 180 | +The processed output files (used for visualization) are stored under `RIMA_kraken/files` |
| 181 | + |
| 182 | +The output figures are stored under `RIMA_kraken/images` |
| 183 | + |
| 184 | +**RIMA html report** will be stored under `RIMA_kraken/files` |
| 185 | + |