Commit 8f8d002: current RIMA Kraken version

Author: Lin Yang
1 parent f20c093

100 files changed: +168661 -3353 lines changed

RIMA.snakefile

Lines changed: 0 additions & 167 deletions
This file was deleted.

RIMA_environment.sh

Lines changed: 0 additions & 49 deletions
This file was deleted.

Reference_Markdown.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
### We have prepared the RIMA reference folder with the versions we currently use. You can download it directly from the Iris server:

```
# The reference archive is nearly 69 GB
wget http://cistrome.org/~lyang/ref.tar.gz
```
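With an archive this large, a truncated download is a real possibility. The following is a minimal sketch of unpacking it defensively. The helper name `unpack_refs` is hypothetical and not part of RIMA, and it assumes the archive is a gzip-compressed tarball, as the `.tar.gz` extension suggests.

```bash
# Hypothetical helper (not part of RIMA): verify, then unpack, an archive.
unpack_refs() {
    # $1 = path to the .tar.gz archive, $2 = destination directory
    local archive="$1" dest="$2"
    # Listing the archive first catches a truncated or corrupt download
    # before anything is written to disk.
    if ! tar -tzf "$archive" > /dev/null; then
        echo "corrupt or incomplete archive: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example (not run automatically):
# unpack_refs ref.tar.gz .
```

Listing the archive reads it once in full, so this roughly doubles the read time; that trade-off may or may not be worth it on slow storage.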

# Steps to make your own custom reference files for the RIMA pipeline

### Reference fasta

The human GDC hg38 fasta file is downloaded from the [GDC website](https://gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files).

### Gene annotation file (gtf)

The human gtf annotation file is downloaded from the [GENCODE website](https://www.gencodegenes.org/human/). We currently use release v27.
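
Because the fasta and the gtf come from different sources, it can be worth confirming that the contig names in the gtf (for example `chr1`) also appear in the fasta before building any index. A minimal sketch of such a check, assuming both files are uncompressed; the function name is hypothetical and not part of RIMA:

```bash
# Hypothetical helper (not part of RIMA): print each contig name that is
# used in the GTF but has no matching header in the FASTA.
gtf_contigs_missing_from_fasta() {
    # $1 = uncompressed FASTA, $2 = uncompressed GTF
    local fasta="$1" gtf="$2"
    awk -F'\t' '
        NR == FNR {
            # First file (FASTA): record the header name up to the first blank.
            if (/^>/) { name = substr($0, 2); sub(/[ \t].*$/, "", name); seen[name] = 1 }
            next
        }
        /^#/ || NF == 0 { next }                       # skip GTF comments and blank lines
        !($1 in seen) && !reported[$1]++ { print $1 }  # report each missing contig once
    ' "$fasta" "$gtf"
}

# Example (not run automatically):
# gtf_contigs_missing_from_fasta GRCh38.d1.vd1.CIDC.fa gencode.v27.annotation.gtf
```

Empty output means every contig in the gtf was found in the fasta.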

### Build STAR index

```bash
conda activate rna

## STAR Version: STAR_2.6.1d
# STAR 2.6.x expects the genome directory to exist before the run
mkdir -p ./ref_files/v27_index
STAR --runThreadN 16 --runMode genomeGenerate --genomeDir ./ref_files/v27_index --genomeFastaFiles GRCh38.d1.vd1.CIDC.fa --sjdbGTFfile gencode.v27.annotation.gtf
...
00:04:54 ..... started STAR run
00:04:54 ... starting to generate Genome files
00:05:57 ... starting to sort Suffix Array. This may take a long time...
00:06:11 ... sorting Suffix Array chunks and saving them to disk...
00:17:43 ... loading chunks from disk, packing SA...
00:19:31 ... finished generating suffix array
00:19:31 ... generating Suffix Array index
00:23:20 ... completed Suffix Array index
00:23:20 ..... processing annotations GTF
00:23:35 ..... inserting junctions into the genome indices
00:26:49 ... writing Genome to disk ...
00:27:06 ... writing Suffix Array to disk ...
00:28:53 ... writing SAindex to disk
00:29:05 ..... finished successfully
```
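The log above ends by writing the `Genome`, Suffix Array (`SA`), and `SAindex` files. A minimal sketch for confirming that a finished index directory actually contains them; the function name `check_star_index` is hypothetical, and the file list is a small subset of what STAR writes, not an exhaustive manifest:

```bash
# Hypothetical helper (not part of RIMA or STAR): confirm that the core
# index files exist and are non-empty.
check_star_index() {
    # $1 = STAR genome directory
    local dir="$1" f missing=0
    for f in Genome SA SAindex genomeParameters.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (not run automatically):
# check_star_index ./ref_files/v27_index && echo "STAR index looks complete"
```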

### RSeQC reference files

We download the human annotation bed files, including the whole-genome bed file and the housekeeping-gene bed file, from the RSeQC page on [SourceForge](https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/).

```bash
./ref_files/refseqGenes.bed
./ref_files/housekeeping_refseqGenes.bed
```
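
RSeQC expects its gene models in 12-column BED format, so a custom bed file is worth checking before it is swapped in. A minimal sketch, assuming a tab-delimited file; the function name `is_bed12` is hypothetical:

```bash
# Hypothetical helper (not part of RSeQC): succeed only if every data line
# has exactly 12 tab-separated fields.
is_bed12() {
    # $1 = BED file; blank lines and comment/track/browser lines are ignored
    awk -F'\t' '
        NF && !/^(#|track|browser)/ && NF != 12 { bad = 1; exit }
        END { exit bad + 0 }
    ' "$1"
}

# Example (not run automatically):
# is_bed12 ./ref_files/refseqGenes.bed || echo "not a 12-column BED file" >&2
```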

### Build salmon index

```bash
conda activate rna

## salmon Version: salmon 1.1.0
salmon index -t GRCh38.d1.vd1.CIDC.fa -i salmon_index

...
index ["salmon_index"] did not previously exist . . . creating it
[jLog] [info] building index
[jointLog] [info] [Step 1 of 4] : counting k-mers
[jointLog] [info] Replaced 164,553,847 non-ATCG nucleotides
[jointLog] [info] Clipped poly-A tails from 0 transcripts
[jointLog] [info] Building rank-select dictionary and saving to disk
[jointLog] [info] done
Elapsed time: 0.191866s
[jointLog] [info] Writing sequence data to file . . .
[jointLog] [info] done
Elapsed time: 1.91244s
[jointLog] [info] Building 64-bit suffix array (length of generalized text is 3,088,286,426)
[jointLog] [info] Building suffix array . . .
success
saving to disk . . . done
Elapsed time: 18.3072s
done
Elapsed time: 703.843s
```

### GMT file for gene set analysis

The GMT file is downloaded from the [BROAD release page](https://data.broadinstitute.org/gsea-msigdb/msigdb/release/6.1/). We currently use "c2.cp.kegg.v6.1.symbols.gmt".
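
A GMT file is tab-delimited, with one gene set per line: the set name, a description, then the member genes. A minimal sketch for summarizing a file before using it (the function name `gmt_summary` is hypothetical):

```bash
# Hypothetical helper: print "<gene set name><TAB><number of genes>" for
# each line of a GMT file.
gmt_summary() {
    # $1 = GMT file; fields 1 and 2 are name and description, the rest are genes
    awk -F'\t' 'NF >= 3 { print $1 "\t" (NF - 2) }' "$1"
}

# Example (not run automatically):
# gmt_summary c2.cp.kegg.v6.1.symbols.gmt | sort -t "$(printf '\t')" -k2,2nr | head
```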

### STAR-Fusion genome resource lib

The genome resource lib is downloaded from the [BROAD release page](https://www.gencodegenes.org/human/). We currently use GRCh38_v22_CTAT_lib.

You can also build the library yourself for use with STAR-Fusion.
For details, see:

* https://github.com/STAR-Fusion/STAR-Fusion/wiki/installing-star-fusion

### Centrifuge index

The human Centrifuge index is downloaded from the [Centrifuge website](http://www.ccb.jhu.edu/software/centrifuge/). We currently use the p_compressed+h+v index, which includes the human genome, prokaryotic genomes, and viral genomes.

You can also build your own custom Centrifuge index.
For details, see:

* https://github.com/DaehwanKimLab/centrifuge

### TRUST4 reference files

TRUST4 needs two reference files: (1) a fasta file of TCR and BCR genomic sequences, and (2) a reference database fasta containing annotation information.

```
hg38_bcrtcr.fa
human_IMGT+C.fa
```
These reference files can be downloaded directly from the [TRUST4 GitHub repository](https://github.com/liulab-dfci/TRUST4).
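
A minimal sketch for sanity-checking the two files after download. The helper name `check_trust4_refs` is hypothetical, and the commented `run-trust4` call uses `sample.bam` as a placeholder input:

```bash
# Hypothetical helper (not part of TRUST4): confirm both reference files
# exist and report how many fasta records each contains.
check_trust4_refs() {
    # $1 = TCR/BCR genomic fasta, $2 = IMGT reference fasta
    local f n
    for f in "$1" "$2"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
        # grep -c exits non-zero when the count is 0, so guard it
        n=$(grep -c '^>' "$f" || true)
        echo "$f: $n sequences"
    done
}

# Example (not run automatically; sample.bam is a placeholder):
# check_trust4_refs hg38_bcrtcr.fa human_IMGT+C.fa &&
#     run-trust4 -b sample.bam -f hg38_bcrtcr.fa --ref human_IMGT+C.fa
```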