Commit 85903ca

Merge pull request #12 from RasmussenLab/add_filter_module
Add all devs and filter module
2 parents e9a7262 + 77510aa commit 85903ca

60 files changed (+2948, -236 lines)

.gitattributes

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+# SCM syntax highlighting
+pixi.lock linguist-language=YAML linguist-generated=true

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -178,3 +178,10 @@ nxf-tmp.*
 cramtable.tsv
 pooltable.tsv
 vcftable.tsv
+
+# pixi environments
+.pixi
+*.egg-info
+pixi.toml
+pixi.lock
+.DS_Store

README.md

Lines changed: 71 additions & 44 deletions
@@ -1,6 +1,6 @@
-# DoBSeq Nextflow Pipeline
+# DoBSeq Workflow
 
-## General usage outside NGC-HPC:
+## General usage:
 
 ### 1. Install nextflow [manually](https://www.nextflow.io/docs/latest/getstarted.html) or using conda:
 
@@ -11,7 +11,7 @@ conda install nextflow
 ### 2. In a clean folder - clone this repository:
 
 ```Bash
-git clone https://github.com/madscort/DoBSeqWF.git .
+git clone https://github.com/RasmussenLab/DoBSeqWF.git .
 ```
 
 ### 3. Run pipeline without any data (dry-run):
@@ -26,8 +26,9 @@ nextflow run main.nf -profile (standard/esrum/ngc),test -stub
 nextflow run main.nf -profile (standard/esrum),test
 ```
 
-### 5. Run pipeline with input data (see test data for file contents):
-
+### 5. Run pipeline with input data:
+The pipeline has multiple optional configurations found in ```nextflow.config```.
+Configurations can be supplied as a ```config.json``` and run with ```nextflow run main.nf -profile (standard/esrum) -params-file config.json```, or directly from the commandline:
 ```Bash
 nextflow run main.nf \
 -profile (standard/esrum/ngc) \
@@ -37,6 +38,71 @@ nextflow run main.nf \
 --bedfile <path to bedfile with target regions> \
 --ploidy <integer>
 ```
+The ```pooltable.tsv``` should connect (user-assigned) pool IDs to input FASTQ files; one entry for each pool.
+```Bash
+pool_row_1 path/to/sample1_R1.fq.gz path/to/sample1_R2.fq.gz
+pool_column_1 path/to/sample2_R1.fq.gz path/to/sample2_R2.fq.gz
+```
+The ```decodetable.tsv``` should map (user-assigned) individual IDs in the matrix to the corresponding row and column IDs of each pool; one entry for each element in the matrix.
+```Bash
+individual1 pool_row_1 pool_column_1
+```
+### 6. Pipeline output
+The workflow will output a results folder containing multiple config-dependent output files:
+```Bash
+results
+├── pinpointables.vcf # Merged VCF file containing all assigned variants
+├── cram/ # CRAM files for each pool
+├── logs/ # Log files for each process
+├── variants/ # VCF files for each pool
+├── variant_tables/ # TSV files converted from pool VCFs
+└── pinpoint_variants/
+    ├── all_pins/ # All pinpointables for each sample in individual vcfs (*note)
+    ├── unique_pins/ # All unique pinpointables for each sample in individual vcfs (*note)
+    ├── *_merged.vcf.gz # All pinpointables for all samples in a single vcf without sample information
+    ├── summary.tsv # Variant counts for each sample
+    └── lookup.tsv # Variant to sample lookup table
+```
+A central file is the ```pinpointables.vcf```. This file contains all individually assigned variants. Since each variant contains information from two pools, these are presented as the sample columns: ROW and COLUMN.
+
+# Workflow repository contents:
+
+```Bash
+DoBSeqWF
+├── LICENSE
+├── VERSION
+├── README.md
+├── assets
+│   ├── data
+│   │   ├── reference_genomes
+│   │   │   └── small
+│   │   │       └── small_reference.*
+│   │   └── test_data
+│   │       ├── coordtable.tsv
+│   │       ├── decodetable.tsv
+│   │       ├── pools
+│   │       │   └── *.fq.gz
+│   │       ├── pooltable.tsv
+│   │       ├── snvlist.tsv
+│   │       └── target_calling.bed
+│   └── helper_scripts
+│       └── simulator.py # Script for simulating minimal pipeline data
+├── bin # Executable pipeline scripts
+│   └── <script>.*
+├── conf
+│   └── profiles.config # Configuration profiles for compute environments
+├── envs
+│   └── <name>/
+│       └── environment.yaml # Conda environment definitions
+├── main.nf # Main workflow
+├── modules/
+│   └── <module>.nf # Module scripts
+├── subworkflows/
+│   └── <subworkflow>.nf # Module scripts
+├── next.pbs # Helper script for running on NGC-HPC
+└── nextflow.config # Workflow parameters
+```
+
 
 ## Usage on NGC-HPC
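
Reviewer note on the hunk above: the new README text describes `pooltable.tsv` (one entry per pool: pool ID plus paired FASTQ paths) and `decodetable.tsv` (one entry per matrix element: individual ID, row pool, column pool). The following is a minimal Python sketch of how such tables might be generated for a small matrix. It assumes tab-separated columns (suggested by the `.tsv` extension, not confirmed by the diff), reuses the `pool_row_N` / `pool_column_N` naming from the README example, and invents the FASTQ file names and the `build_tables` / `to_tsv` helpers purely for illustration.

```python
import csv
import io
from itertools import product


def build_tables(n_rows, n_cols, fastq_dir="path/to"):
    """Return (pooltable, decodetable) rows for an n_rows x n_cols matrix.

    Pool IDs follow the README example; FASTQ file names are hypothetical.
    """
    pool_ids = [f"pool_row_{i}" for i in range(1, n_rows + 1)]
    pool_ids += [f"pool_column_{j}" for j in range(1, n_cols + 1)]

    # One pooltable entry per pool: pool ID, R1 FASTQ, R2 FASTQ.
    pooltable = [
        (pid, f"{fastq_dir}/{pid}_R1.fq.gz", f"{fastq_dir}/{pid}_R2.fq.gz")
        for pid in pool_ids
    ]

    # One decodetable entry per matrix element: individual, row pool, column pool.
    cells = product(range(1, n_rows + 1), range(1, n_cols + 1))
    decodetable = [
        (f"individual{k}", f"pool_row_{i}", f"pool_column_{j}")
        for k, (i, j) in enumerate(cells, start=1)
    ]
    return pooltable, decodetable


def to_tsv(rows):
    """Serialise rows as tab-separated text (the separator is an assumption)."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


pooltable, decodetable = build_tables(2, 2)
print(to_tsv(pooltable), end="")
print(to_tsv(decodetable), end="")
```

A 2 x 2 matrix gives four pools (two rows, two columns) and four individuals, so each individual is identified by exactly one row pool and one column pool, which matches the "one entry for each element in the matrix" wording.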

@@ -165,42 +231,3 @@ tail nextflow.log
 
 If the pipeline fails - it is likely due to resource constraints. Adjust as needed in the conf/profiles.config file under NGC, and rerun the PBS script. Be aware that any direct edits of the workflow scripts, i.e. modules and subworkflows, can lead to a complete re-run of the pipeline.
 
-
-# Workflow repository contents:
-
-```Bash
-DoBSeqWF
-├── LICENSE
-├── VERSION
-├── README.md
-├── assets
-│   ├── data
-│   │   ├── reference_genomes
-│   │   │   └── small
-│   │   │       └── small_reference.*
-│   │   └── test_data
-│   │       ├── coordtable.tsv
-│   │       ├── decodetable.tsv
-│   │       ├── pools
-│   │       │   └── *.fq.gz
-│   │       ├── pooltable.tsv
-│   │       ├── snvlist.tsv
-│   │       └── target_calling.bed
-│   └── helper_scripts
-│       └── simulator.py # Script for simulating minimal pipeline data
-├── bin # Executable pipeline scripts
-│   └── <script>.*
-├── conf
-│   └── profiles.config # Configuration profiles for compute environments
-├── envs
-│   └── <name>/
-│       └── environment.yaml # Conda environment definitions
-├── main.nf # Main workflow
-├── modules/
-│   └── <module>.nf # Module scripts
-├── subworkflows/
-│   └── <subworkflow>.nf # Module scripts
-├── next.pbs # Helper script for running on NGC-HPC
-└── nextflow.config # Workflow parameters
-```
-
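
Reviewer note on the README changes: step 5 now says parameters can be supplied as a `config.json` via `-params-file`. Nextflow maps each top-level key in such a file to the parameter of the same name, so a key `ploidy` corresponds to `--ploidy` on the command line. The Python sketch below builds a minimal file using only the two flags visible in this diff (`--bedfile` and `--ploidy`). The values are placeholders, and any further parameters the pipeline needs are hidden in the elided part of the command and are deliberately left out here.

```python
import json

# Keys mirror the command-line flags shown in the README excerpt.
# Both values are hypothetical placeholders, not pipeline defaults.
params = {
    "bedfile": "path/to/target_regions.bed",
    "ploidy": 2,
}

# Serialise with indentation so the file stays readable and diff-friendly.
config_text = json.dumps(params, indent=2)
print(config_text)
```

Saving the printed text as `config.json` and running `nextflow run main.nf -profile standard -params-file config.json` should then be equivalent to passing the same two values as flags, assuming the remaining required parameters are added to the file as well.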

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.1.0
+0.2.0
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+empty
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+empty
