| 1 | +### We have prepared a RIMA reference folder containing the versions we currently use. You can download it directly from the Iris server: |
| 2 | + |
| 3 | +``` |
| 4 | +# The reference archive is nearly 69 GB |
| 5 | + |
| 6 | +wget http://cistrome.org/~lyang/ref.tar.gz |
| 7 | + |
| 8 | +``` |
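
Because the archive is nearly 69 GB, it can help to confirm there is enough free disk space before downloading and unpacking. The sketch below is only an illustration: the helper name `has_free_space_gb` and the 140 GB threshold are assumptions, not part of RIMA.

```shell
# Hypothetical helper (not part of RIMA): succeed only if the filesystem
# holding DIR has at least NEED_GB gigabytes available.
has_free_space_gb() {
    local need_gb="$1" dir="${2:-.}"
    local avail_kb
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the 4th column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example usage (140 GB is an assumed margin for archive plus unpacked files):
#   has_free_space_gb 140 . && wget http://cistrome.org/~lyang/ref.tar.gz \
#       && tar -xzf ref.tar.gz
```

The unpacked size is not stated here, so adjust the threshold to your own storage.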
| 9 | + |
| 10 | + |
| 11 | +# Steps to make your own custom reference files for the RIMA pipeline |
| 12 | + |
| 13 | +### Reference fasta |
| 14 | + |
| 15 | +The human GDC hg38 FASTA file is downloaded from the [GDC website](https://gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files). |
| 16 | + |
| 17 | +### Gene annotation file (gtf) |
| 18 | + |
| 19 | +The human GTF annotation file is downloaded from the [GENCODE website](https://www.gencodegenes.org/human/). The annotation release we currently use is v27. |
| 20 | + |
| 21 | + |
| 22 | +### Build STAR index |
| 23 | + |
| 24 | +```bash |
| 25 | +conda activate rna |
| 26 | + |
| 27 | +## STAR Version: STAR_2.6.1d |
| 28 | +STAR --runThreadN 16 --runMode genomeGenerate --genomeDir ./ref_files/v27_index --genomeFastaFiles GRCh38.d1.vd1.CIDC.fa --sjdbGTFfile gencode.v27.annotation.gtf |
| 29 | +... |
| 30 | +00:04:54 ..... started STAR run |
| 31 | +00:04:54 ... starting to generate Genome files |
| 32 | +00:05:57 ... starting to sort Suffix Array. This may take a long time... |
| 33 | +00:06:11 ... sorting Suffix Array chunks and saving them to disk... |
| 34 | +00:17:43 ... loading chunks from disk, packing SA... |
| 35 | +00:19:31 ... finished generating suffix array |
| 36 | +00:19:31 ... generating Suffix Array index |
| 37 | +00:23:20 ... completed Suffix Array index |
| 38 | +00:23:20 ..... processing annotations GTF |
| 39 | +00:23:35 ..... inserting junctions into the genome indices |
| 40 | +00:26:49 ... writing Genome to disk ... |
| 41 | +00:27:06 ... writing Suffix Array to disk ... |
| 42 | +00:28:53 ... writing SAindex to disk |
| 43 | +00:29:05 ..... finished successfully |
| 44 | +``` |
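
The log above shows STAR writing the `Genome`, suffix array (`SA`), and `SAindex` files at the end of the run. A quick sanity check that those files exist and are non-empty could look like the sketch below. The helper name `check_star_index` is hypothetical, and this check does not prove the index is valid.

```shell
# Hypothetical helper (not part of STAR or RIMA): report whether a directory
# contains the core files that STAR writes at the end of genomeGenerate.
check_star_index() {
    local dir="$1" f missing=0
    for f in Genome SA SAindex; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage:
#   check_star_index ./ref_files/v27_index && echo "STAR index looks complete"
```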
| 45 | +### RSeQC reference files |
| 46 | + |
| 47 | +We download the human annotation BED files, namely the whole-genome BED file and the housekeeping-gene BED file, from the RSeQC page on the [SourceForge website](https://sourceforge.net/projects/rseqc/files/BED/Human_Homo_sapiens/). |
| 48 | + |
| 49 | +```bash |
| 50 | +./ref_files/refseqGenes.bed |
| 51 | +./ref_files/housekeeping_refseqGenes.bed |
| 52 | +``` |
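
RSeQC expects gene models in 12-column BED format, so a custom BED file is worth checking before use. The following is a minimal sketch with a hypothetical helper name, `is_bed12`. It only counts tab-separated fields and does not validate coordinates or block structure.

```shell
# Hypothetical helper (not part of RSeQC): succeed only if every non-empty
# line of the given file has exactly 12 tab-separated fields.
is_bed12() {
    awk -F'\t' 'NF > 0 && NF != 12 { bad = 1; exit } END { exit bad }' "$1"
}

# Example usage:
#   is_bed12 ./ref_files/refseqGenes.bed || echo "not a 12-column BED file" >&2
```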
| 53 | + |
| 54 | +### Build Salmon index |
| 55 | + |
| 56 | +```bash |
| 57 | +conda activate rna |
| 58 | + |
| 59 | +## salmon Version: salmon 1.1.0 |
| 60 | +salmon index -t GRCh38.d1.vd1.CIDC.fa -i salmon_index |
| 61 | + |
| 62 | +... |
| 63 | +index ["salmon_index"] did not previously exist . . . creating it |
| 64 | +[jLog] [info] building index |
| 65 | +[jointLog] [info] [Step 1 of 4] : counting k-mers |
| 66 | +[jointLog] [info] Replaced 164,553,847 non-ATCG nucleotides |
| 67 | +[jointLog] [info] Clipped poly-A tails from 0 transcripts |
| 68 | +[jointLog] [info] Building rank-select dictionary and saving to disk |
| 69 | +[jointLog] [info] done |
| 70 | +Elapsed time: 0.191866s |
| 71 | +[jointLog] [info] Writing sequence data to file . . . |
| 72 | +[jointLog] [info] done |
| 73 | +Elapsed time: 1.91244s |
| 74 | +[jointLog] [info] Building 64-bit suffix array (length of generalized text is 3,088,286,426) |
| 75 | +[jointLog] [info] Building suffix array . . . |
| 76 | +success |
| 77 | +saving to disk . . . done |
| 78 | +Elapsed time: 18.3072s |
| 79 | +done |
| 80 | +Elapsed time: 703.843s |
| 81 | +``` |
| 82 | +### GMT file for gene set analysis |
| 83 | + |
| 84 | +The GMT file is downloaded from the [BROAD release page](https://data.broadinstitute.org/gsea-msigdb/msigdb/release/6.1/). The GMT file we currently use is `c2.cp.kegg.v6.1.symbols.gmt`. |
| 85 | + |
| 86 | + |
| 87 | +### STAR-Fusion genome resource lib |
| 88 | + |
| 89 | +The genome resource library is downloaded from the [BROAD release page](https://www.gencodegenes.org/human/). The library we currently use is GRCh38_v22_CTAT_lib. |
| 90 | + |
| 91 | +You can also prepare the library yourself for use with STAR-Fusion. |
| 92 | +For more details, see: |
| 93 | + |
| 94 | +* https://github.com/STAR-Fusion/STAR-Fusion/wiki/installing-star-fusion |
| 95 | + |
| 96 | + |
| 97 | +### Centrifuge index |
| 98 | + |
| 99 | +The human Centrifuge index is downloaded from the [Centrifuge website](http://www.ccb.jhu.edu/software/centrifuge/). The index we currently use is p_compressed+h+v, which includes the human genome, prokaryotic genomes, and viral genomes. |
| 100 | + |
| 101 | +You can also build your own custom Centrifuge index. |
| 102 | +For more details, see: |
| 103 | + |
| 104 | +* https://github.com/DaehwanKimLab/centrifuge |
| 105 | + |
| 106 | +### TRUST4 reference files |
| 107 | + |
| 108 | +The TRUST4 reference files include (1) a FASTA file of TCR and BCR genomic sequences and (2) a reference database sequence file containing annotation information. |
| 109 | + |
| 110 | +``` |
| 111 | +hg38_bcrtcr.fa |
| 112 | +human_IMGT+C.fa |
| 113 | +``` |
| 114 | +These reference files can be downloaded directly from the [TRUST4 GitHub repository](https://github.com/liulab-dfci/TRUST4). |
| 115 | + |
| 116 | + |