docs/workflow.md (31 additions & 9 deletions)
@@ -2,23 +2,20 @@
 
 This is an example workflow for running RegTools on a cohort of samples. This analysis requires that there be a VCF and RNA bam file for each sample. The workflow described below was used to run our own analysis on TCGA data.
 
-By the end of the analysis, the directory structure should look like the example below. The `*` in the example below refers to the tag/parameter used to run `regtools cis-splice-effects identify` with.
+By the end of the analysis, the directory structure should look like the example below. The `*` in the example below refers to the tag/parameter used to run `regtools cis-splice-effects identify` with. At the bottom of this page, we provide a description of each of these files.
 
 ```bash
 - Project/
     - all_splicing_variants*.bed
-    - paths.tsv
-    - make_vcfs.sh
     - dir_names.tsv
     - variants_all_sorted.vcf.gz
     - variants_all_sorted.vcf.gz.tbi
     - samples/
         - Sample_1/
             - tumor_rna_alignments.bam
             - tumor_rna_alignments.bam.bai
-            - variants.per_gene.vep.vcf.gz
-            - variants.per_gene.vep.vcf.gz.tbi
-            - variants.ensembl
+            - variants.vcf.gz
+            - variants.vcf.gz.tbi
             - logs/
             - output/
                 - cse_identify_filtered_*
@@ -27,9 +24,8 @@ By the end of the analysis, the directory structure should look like the example
* **`all_splicing_variants*.bed`** - a file containing all of the variants that RegTools identified as being associated with a junction for the given parameters used to run `cis-splice-effects identify`.
+* **`dir_names.tsv`** - a file listing the sample directories, one per line. It can be generated with `ls samples/ > dir_names.tsv`. For this example, it would look like:
+
+```bash
+Sample_1
+Sample_2
+```
+
+* **`variants_all_sorted.vcf.gz`** - a compressed VCF file containing all variants from all samples.
+* **`variants_all_sorted.vcf.gz.tbi`** - an index file for the VCF file mentioned above.
+* **`samples/`** - a directory containing all of the samples to be analyzed together.
+* **`Sample_1/`** - a sample directory. This will contain input data files as well as output files from RegTools.
+* **`tumor_rna_alignments.bam`** - file containing aligned RNA-seq reads for the given sample.
+* **`tumor_rna_alignments.bam.bai`** - index file for the above RNA-seq alignment file.
+* **`variants.vcf.gz`** - a compressed VCF file containing all variants from a given sample.
+* **`variants.vcf.gz.tbi`** - an index file for the VCF file mentioned above.
+* **`logs/`** - directory containing log or error files for a given sample.
+* **`output/`** - directory containing RegTools output files for a given sample.
+* **`cse_identify_filtered_*`** - RegTools output files from the initial RegTools run for a given sample. This will contain results for this sample's variants only.
+* **`cse_identify_filtered_compare_*`** - RegTools output files from the second RegTools run for a given sample. This will contain results for all samples' variants.
+* **`variants*.bed`** - a BED file containing the variants considered to be splicing-relevant for a given RegTools parameter. This is used later to make `all_splicing_variants*.bed`.
+* **`compare_junctions/hist/`** - directory containing output from the statistics script that analyzes all variants across all samples.
+* **`junction_pvalues_*.tsv`** - a file containing the output from the statistical analysis script.
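
The `dir_names.tsv` step and the per-sample input files described above can be checked before any RegTools run. The following is a minimal Python sketch, not part of RegTools: the function names `write_dir_names` and `missing_inputs` are hypothetical, and it assumes the layout shown on this page (`Project/samples/<sample>/` holding the BAM, VCF, and their indexes). Unlike `ls samples/`, it lists only directories, and it sorts names by code point rather than by locale.

```python
from pathlib import Path

# Input files the workflow expects in every sample directory
# (file names taken from the directory layout described on this page).
REQUIRED_INPUTS = (
    "tumor_rna_alignments.bam",
    "tumor_rna_alignments.bam.bai",
    "variants.vcf.gz",
    "variants.vcf.gz.tbi",
)


def write_dir_names(project: Path) -> list[str]:
    """Write dir_names.tsv with one sample directory name per line."""
    samples_dir = project / "samples"
    names = sorted(p.name for p in samples_dir.iterdir() if p.is_dir())
    (project / "dir_names.tsv").write_text("".join(f"{n}\n" for n in names))
    return names


def missing_inputs(project: Path) -> dict[str, list[str]]:
    """Map each sample in dir_names.tsv to the required inputs it lacks.

    Samples that have every required file are left out of the result.
    """
    missing: dict[str, list[str]] = {}
    lines = (project / "dir_names.tsv").read_text().splitlines()
    for name in (line.strip() for line in lines):
        if not name:
            continue  # skip blank lines
        sample_dir = project / "samples" / name
        absent = [f for f in REQUIRED_INPUTS if not (sample_dir / f).is_file()]
        if absent:
            missing[name] = absent
    return missing
```

A typical use would be to call `write_dir_names(Path("Project"))` once and then inspect `missing_inputs(Path("Project"))`; an empty dictionary means every listed sample has its BAM, VCF, and both index files in place.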