### 3. Run pipeline with test data locally (all heavy jobs are still submitted with qsub):

```Bash
bash next.pbs -params-file test_config.json
```

### 4. Submit pipeline with test data:

```Bash
myqsub next.pbs -F -params-file test_config.json
```

## Running the pipeline with _real_ data

### 1. Organization

While the pipeline is still under development, it makes sense to create a new clone for each pipeline run, so that any changes made while running it can be tracked. I propose this folder structure:

```Bash
predisposed
├── git/DoBSeqWF                    # Temporary local workflow repository
├── resources                       # Reference genome and target files.
├── data/                           # Raw data for each batch
│   ├── <batch_id_I>/
│   │   └── *.fq.gz
│   ├── <batch_id_II>/
│   │   └── *.fq.gz
│   ├── <batch_id_III>/
│   │   └── *.fq.gz
│   └── ...
└── processed_data/                 # Processed data for each batch
    ├── <batch_id_I>/
    │   ├── DoBSeqWF/               # Clone repository here
    │   │   ├── config.json         # Configuration file
    │   │   ├── pooltable.tsv       # Pool table (create with helper script)
    │   │   └── decodetable.tsv     # Decode table (we need a convention for this)
    │   └── results
    │       ├── cram/               # CRAM files for each pool
    │       ├── logs/               # Log files for each process
    │       ├── variants/           # VCF files for each pool
    │       ├── variant_tables/     # TSV files converted from pool VCFs
    │       └── pinpoint_variants/
    │           ├── all_pins/       # All pinpointables for each sample in individual vcfs (*note)
    │           ├── unique_pins/    # All unique pinpointables for each sample in individual vcfs (*note)
    │           ├── *_merged.vcf.gz # All pinpointables for all samples in a single vcf without sample information
    │           ├── summary.tsv     # Variant counts for each sample
    │           └── lookup.tsv      # Variant to sample lookup table
    ├── <batch_id_II>/
    │   ├── DoBSeqWF/
    │   └── results/
    ├── <batch_id_III>/
    │   ├── DoBSeqWF/
    │   └── results/
    └── ...
```
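
The proposed layout could be scaffolded for a new batch with a few commands. This is a minimal sketch, not part of the workflow: `setup_batch` is a hypothetical helper, the batch name is a placeholder, and the repository URL is left as an optional argument because it is not given here. The clone step is skipped when no URL is passed.

```Bash
#!/usr/bin/env bash
# Sketch: create the proposed folders for one batch under a project root.
# setup_batch is a hypothetical helper, not shipped with DoBSeqWF.
setup_batch() {
    local root="$1"           # project root, e.g. predisposed
    local batch="$2"          # batch identifier, e.g. <batch_id_I>
    local repo_url="${3:-}"   # optional: URL of the workflow repository (assumption)

    # Raw data folder and results folder for this batch
    mkdir -p "$root/data/$batch" "$root/processed_data/$batch/results"

    # One fresh clone per run, so changes made during the run stay traceable
    if [ -n "$repo_url" ]; then
        git clone "$repo_url" "$root/processed_data/$batch/DoBSeqWF"
    fi
}
```

Calling `setup_batch predisposed batch_01` would create `predisposed/data/batch_01` and `predisposed/processed_data/batch_01/results`; passing a URL as third argument would additionally clone the repository next to `results`.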

(*note) Each pinpointable variant can be represented by either the horizontal or the vertical pool. To avoid losing any information, there are, _for now_, six VCF files for each sample: four with representations from either dimension, named {sample}\_{pool}\_{unique/all}\_pins.vcf.gz, and two with all pins merged, named {sample}\_{unique/all}.vcf.gz.
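
To illustrate the naming scheme in the note, the six expected file names for one sample can be listed as below. This is only a sketch: `expected_pin_files` is a hypothetical helper, and it assumes that `{pool}` stands for the ID of the sample's horizontal or vertical pool. The IDs in the usage line are placeholders.

```Bash
# Sketch: print the six expected VCF names for one sample.
# Assumes {pool} is the ID of the sample's horizontal or vertical pool.
expected_pin_files() {
    local sample="$1"   # sample ID
    local hpool="$2"    # horizontal pool ID
    local vpool="$3"    # vertical pool ID
    local pool kind

    # Four files: one per pool dimension and per unique/all
    for pool in "$hpool" "$vpool"; do
        for kind in unique all; do
            echo "${sample}_${pool}_${kind}_pins.vcf.gz"
        done
    done

    # Two files with pins from both dimensions merged
    for kind in unique all; do
        echo "${sample}_${kind}.vcf.gz"
    done
}
```

For example, `expected_pin_files S1 H1 V1` lists four `*_pins.vcf.gz` names followed by `S1_unique.vcf.gz` and `S1_all.vcf.gz`.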
Fill out config.json with the correct paths and parameters. The decode table is not needed for mapping-only runs. See nextflow.config for the parameters that can be set in config.json.

### 5. Run pipeline

```Bash
myqsub next.pbs -F -params-file config.json
```

### 6. Monitor progress

```Bash
tail nextflow.log
```

If the pipeline fails, it is most likely due to resource constraints. Adjust the resources as needed in the conf/profiles.config file under NGC, and rerun the PBS script. Be aware that any direct edit of the workflow scripts, i.e. modules and subworkflows, can lead to a complete re-run of the pipeline.
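
When looking for the cause of a failure, it can help to filter the log rather than read it from the end. The sketch below assumes only that `nextflow.log` is a plain-text file; `log_errors` is a hypothetical helper, and matching the word "error" is an assumption about how failures show up in the log.

```Bash
# Sketch: print the most recent log lines that mention "error" (case-insensitive).
# log_errors is a hypothetical helper; the search pattern is an assumption.
log_errors() {
    local logfile="${1:-nextflow.log}"   # log file to inspect
    local count="${2:-20}"               # maximum number of matching lines to show
    grep -i "error" "$logfile" | tail -n "$count"
}

# To follow the log live while the pipeline runs, use instead:
#   tail -f nextflow.log
```

Run `log_errors` in the directory containing `nextflow.log`, or pass another file and line count, e.g. `log_errors nextflow.log 50`.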

# Workflow repository contents:

```Bash
DoBSeqWF
├── LICENSE
├── VERSION
├── README.md
├── assets
│   ├── data
│   │   ├── reference_genomes
│   │   │   └── small
│   │   │       └── small_reference.*
│   │   └── test_data
│   │       ├── coordtable.tsv
│   │       ├── decodetable.tsv
│   │       ├── pools
│   │       │   └── *.fq.gz
@@ -76,6 +198,9 @@ DoBSeqWF
├── main.nf                         # Main workflow
├── modules/
│   └── <module>.nf                 # Module scripts
├── subworkflows/
│   └── <subworkflow>.nf            # Subworkflow scripts
├── next.pbs                        # Helper script for running on NGC-HPC
```

##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##contig=<ID=small_ref,length=1980>
##source=HaplotypeCaller
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT B0_H1
small_ref 280 . T C 2001.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.000;DP=102;ExcessHet=0.0000;FS=3.235;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=20.22;ReadPosRankSum=-1.398;SOR=1.806 GT:AD:DP:GQ:PL 0/1:47,52:99:99:2009,0,1806
small_ref 655 . C T 3301.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.000;DP=181;ExcessHet=0.0000;FS=1.851;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=18.65;ReadPosRankSum=1.852;SOR=0.643 GT:AD:DP:GQ:PL 0/1:91,86:177:99:3309,0,3533