Commit 431d02f

Merge pull request #136 from griffithlab/docs_update
Docs update
2 parents 9a01f5e + 67d277e commit 431d02f

1 file changed

+31
-9
lines changed

docs/workflow.md

Lines changed: 31 additions & 9 deletions
@@ -2,23 +2,20 @@
22

33
This is an example workflow for running RegTools on a cohort of samples. This analysis requires that there be a VCF and RNA bam file for each sample. The workflow described below was used to run our own analysis on TCGA data.
44

5-
By the end of the analysis, the directory structure should look like the example below. The `*` in the example below refers to the tag/parameter used to run `regtools cis-splice-effects identify` with.
5+
By the end of the analysis, the directory structure should look like the example below. The `*` in the example below refers to the tag/parameter used to run `regtools cis-splice-effects identify` with. At the bottom of this page, we provide a description of each of these files.
66

77
```bash
88
- Project/
99
- all_splicing_variants*.bed
10-
- paths.tsv
11-
- make_vcfs.sh
1210
- dir_names.tsv
1311
- variants_all_sorted.vcf.gz
1412
- variants_all_sorted.vcf.gz.tbi
1513
- samples/
1614
- Sample_1/
1715
- tumor_rna_alignments.bam
1816
- tumor_rna_alignments.bam.bai
19-
- variants.per_gene.vep.vcf.gz
20-
- variants.per_gene.vep.vcf.gz.tbi
21-
- variants.ensembl
17+
- variants.vcf.gz
18+
- variants.vcf.gz.tbi
2219
- logs/
2320
- output/
2421
- cse_identify_filtered_*
@@ -27,9 +24,8 @@ By the end of the analysis, the directory structure should look like the example
2724
- Sample_2/
2825
- tumor_rna_alignments.bam
2926
- tumor_rna_alignments.bam.bai
30-
- variants.per_gene.vep.vcf.gz
31-
- variants.per_gene.vep.vcf.gz.tbi
32-
- variants.ensembl
27+
- variants.vcf.gz
28+
- variants.vcf.gz.tbi
3329
- logs/
3430
- output/
3531
- cse_identify_filtered_*
@@ -104,3 +100,29 @@ python3 stats_wrapper.py <tag>
104100
```bash
105101
Rscript --vanilla filter_and_BH.R <tag>
106102
```
103+
104+
## File description
105+
106+
* **`all_splicing_variants*.bed`** - a file containing all of the variants that regtools identified as being associated with a junction for the given parameters used to run `cis-splice-effects identify`.
107+
* **`dir_names.tsv`** - a file containing a list of each of the sample directories with each directory on a new line. This can be obtained by using `ls samples/ > dir_names.tsv`. For this example, it would look like:
108+
109+
```bash
110+
Sample_1
111+
Sample_2
112+
```
113+
114+
* **`variants_all_sorted.vcf.gz`** - a compressed vcf file containing all variants from all samples.
115+
* **`variants_all_sorted.vcf.gz.tbi`** - an index file for the vcf file mentioned above.
116+
* **`samples/`** - a directory containing each of the samples to be analyzed alongside each other.
117+
* **`Sample_1/`** - a sample directory. This will contain input data files as well as output files from RegTools.
118+
* **`tumor_rna_alignments.bam`** - file containing aligned RNA-seq reads for the given sample.
119+
* **`tumor_rna_alignments.bam.bai`** - index file for the above RNA-seq alignment file.
120+
* **`variants.vcf.gz`** - a compressed vcf file containing all variants from a given sample.
121+
* **`variants.vcf.gz.tbi`** - an index file for the vcf file mentioned above.
122+
* **`logs/`** - directory containing log or error files for a given sample.
123+
* **`output/`** - directory containing RegTools output files for a given sample.
124+
* **`cse_identify_filtered_*`** - RegTools output files from the initial RegTools run for a given sample. This will contain results for this sample's variants only.
125+
* **`cse_identify_filtered_compare_*`** - RegTools output files from the second RegTools run for a given sample. This will contain results for all samples' variants.
126+
* **`variants*.bed`** - a bedfile containing the variants considered to be splicing relevant for a given RegTools parameter. This is used later to make `all_splicing_variants*.bed`.
127+
* **`compare_junctions/hist/`** - directory containing output from the statistics script, which analyzes all variants across all samples.
128+
* **`junction_pvalues_*.tsv`** - a file containing the output from the statistical analysis script.
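
Before starting a run, it can help to confirm that each sample directory holds the input files described above. The following is a minimal sketch, not part of RegTools: the function names `write_dir_names` and `missing_inputs` and the constant `REQUIRED_INPUTS` are hypothetical, and the code assumes the `Project/` layout shown at the top of this page. `write_dir_names` approximates `ls samples/ > dir_names.tsv`, except that it lists directories only and sorts names by code point rather than by locale.

```python
from pathlib import Path

# Input files each sample directory is expected to hold, taken from the
# file description list above (hypothetical constant, not a RegTools name).
REQUIRED_INPUTS = (
    "tumor_rna_alignments.bam",
    "tumor_rna_alignments.bam.bai",
    "variants.vcf.gz",
    "variants.vcf.gz.tbi",
)


def write_dir_names(project: Path) -> list[str]:
    """Write one sample directory name per line to dir_names.tsv."""
    # Unlike `ls`, plain files under samples/ are skipped here.
    names = sorted(p.name for p in (project / "samples").iterdir() if p.is_dir())
    (project / "dir_names.tsv").write_text("".join(f"{n}\n" for n in names))
    return names


def missing_inputs(project: Path) -> dict[str, list[str]]:
    """Map each sample listed in dir_names.tsv to the required inputs it lacks."""
    missing = {}
    for line in (project / "dir_names.tsv").read_text().splitlines():
        name = line.strip()
        if not name:
            continue  # tolerate blank lines
        sample_dir = project / "samples" / name
        absent = [f for f in REQUIRED_INPUTS if not (sample_dir / f).is_file()]
        if absent:
            missing[name] = absent
    return missing
```

Calling `write_dir_names(Path("Project"))` followed by `missing_inputs(Path("Project"))` returns an empty dictionary when every listed sample has its BAM, VCF, and index files in place. The check only tests that the files exist; it does not validate their contents or confirm that the indexes match their data files.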
