
Commit 6e796e3

Merge pull request #247 from ENCODE-DCC/PIP-1642_hotfix_for_dx_on_dockstore
PIP-1642 hotfix for dx on dockstore
2 parents e0c687e + d8bdd2a commit 6e796e3

File tree: 6 files changed (+79 −37 lines)

- README.md
- chip.wdl
- dev/test/test_task/test_choose_ctl.wdl
- example_input_json/dx/ENCSR000DYI_subsampled_chr19_only_dx.json
- example_input_json/dx/ENCSR000DYI_subsampled_chr19_only_rep1_dx.json
- example_input_json/dx_azure/ENCSR000DYI_subsampled_chr19_only_dx_azure.json

README.md

Lines changed: 31 additions & 16 deletions
@@ -3,14 +3,14 @@
 [![CircleCI](https://circleci.com/gh/ENCODE-DCC/chip-seq-pipeline2/tree/master.svg?style=svg)](https://circleci.com/gh/ENCODE-DCC/chip-seq-pipeline2/tree/master)


-## Download new Caper>=2.0
+## Download new Caper>=2.1

 New Caper is out. You need to update your Caper to work with the latest ENCODE ChIP-seq pipeline.
 ```bash
 $ pip install caper --upgrade
 ```

-## Local/HPC users and new Caper>=2.0
+## Local/HPC users and new Caper>=2.1

 There are tons of changes for local/HPC backends: `local`, `slurm`, `sge`, `pbs` and `lsf`(added). Make a backup of your current Caper configuration file `~/.caper/default.conf` and run `caper init`. Local/HPC users need to reset/initialize Caper's configuration file according to your chosen backend. Edit the configuration file and follow instructions in there.
 ```bash
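Since `caper init` regenerates the configuration from scratch, the backup-then-init flow described above looks roughly like this. A minimal sketch, assuming a SLURM cluster; substitute `local`, `sge`, `pbs` or `lsf` for your backend:

```bash
# Sketch of the Caper upgrade flow (backend name is illustrative).
pip install caper --upgrade                          # Caper >= 2.1, as required above
cp ~/.caper/default.conf ~/.caper/default.conf.bak   # back up the old configuration
caper init slurm                                     # re-initialize for the chosen backend
# Then edit ~/.caper/default.conf and follow the instructions in it
# (e.g. set your SLURM partition/account).
```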
@@ -72,10 +72,19 @@ This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor an
 $ bash scripts/install_conda_env.sh
 ```

+## Input JSON file
+
+> **IMPORTANT**: DO NOT BLINDLY USE A TEMPLATE/EXAMPLE INPUT JSON. READ THROUGH THE FOLLOWING GUIDE TO MAKE A CORRECT INPUT JSON FILE.
+
+An input JSON file specifies all the input parameters and files that are necessary for successfully running this pipeline. This includes a specification of the path to the genome reference files and the raw data fastq file. Please make sure to specify absolute paths rather than relative paths in your input JSON files.
+
+1) [Input JSON file specification (short)](docs/input_short.md)
+2) [Input JSON file specification (long)](docs/input.md)

-## Test run

-You can use URIs(`s3://`, `gs://` and `http(s)://`) in Caper's command lines and input JSON file then Caper will automatically download/localize such files. Input JSON file URL: https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json
+## Running on local computer/HPCs
+
+You can use URIs (`s3://`, `gs://` and `http(s)://`) in Caper's command lines and in the input JSON file; Caper will then automatically download/localize such files. Input JSON file example: https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json

 According to your chosen platform of Caper, run Caper or submit Caper command line to the cluster. You can choose other environments like `--singularity` or `--docker` instead of `--conda`. But you must define one of the environments.
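To make the "Input JSON file" section added above concrete, here is a minimal sketch of such a file, using key names taken from the example JSONs later in this commit; all paths are placeholders, and real runs should use absolute paths or URIs:

```bash
# Minimal input JSON sketch (keys from the example JSONs in this commit;
# paths are placeholders). Written via a heredoc purely for illustration.
cat > input.json << 'EOF'
{
    "chip.pipeline_type" : "tf",
    "chip.genome_tsv" : "/abs/path/to/hg38.tsv",
    "chip.fastqs_rep1_R1" : ["/abs/path/to/rep1.R1.fastq.gz"],
    "chip.ctl_fastqs_rep1_R1" : ["/abs/path/to/ctl1.R1.fastq.gz"],
    "chip.paired_end" : false,
    "chip.ctl_paired_end" : false
}
EOF
```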

@@ -87,10 +96,16 @@ The followings are just examples. Please read [Caper's README](https://github.co
 # Or submit it as a leader job (with long/enough resources) to SLURM (Stanford Sherlock) with Singularity
 # It will fail if you directly run the leader job on login nodes
 $ sbatch -p [SLURM_PARTITON] -J [WORKFLOW_NAME] --export=ALL --mem 4G -t 4-0 --wrap "caper chip chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --singularity"
+
+# Check the status of your leader job
+$ squeue -u $USER | grep [WORKFLOW_NAME]
+
+# Cancel the leader job to terminate all of its child jobs
+$ scancel [JOB_ID]
 ```


-## Running a pipeline on Terra/Anvil (using Dockstore)
+## Running on Terra/Anvil (using Dockstore)

 Visit our pipeline repo on [Dockstore](https://dockstore.org/workflows/github.com/ENCODE-DCC/chip-seq-pipeline2). Click on `Terra` or `Anvil`. Follow Terra's instruction to create a workspace on Terra and add Terra's billing bot to your Google Cloud account.
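The leader-job pattern above is for clusters; on a plain local machine the equivalent is a direct `caper run`, as the surrounding README text describes. A sketch; swap `--conda` for `--singularity` or `--docker` as noted above:

```bash
# Local run sketch; one of --conda/--singularity/--docker must be given.
$ caper run chip.wdl \
    -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json \
    --conda
```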

@@ -99,31 +114,31 @@ Download this [test input JSON for Terra](https://storage.googleapis.com/encode-
 If you want to use your own input JSON file, then make sure that all files in the input JSON are on a Google Cloud Storage bucket (`gs://`). URLs will not work.


-## Running a pipeline on DNAnexus (using Dockstore)
+## Running on DNAnexus (using Dockstore)

 Sign up for a new account on [DNAnexus](https://platform.dnanexus.com/) and create a new project on either AWS or Azure. Visit our pipeline repo on [Dockstore](https://dockstore.org/workflows/github.com/ENCODE-DCC/chip-seq-pipeline2). Click on `DNAnexus`. Choose a destination directory on your DNAnexus project. Click on `Submit` and visit DNAnexus. This will submit a conversion job so that you can check status of it on `Monitor` on DNAnexus UI.

 Once conversion is done download one of the following input JSON files according to your chosen platform (AWS or Azure) for your DNAnexus project:
 - AWS: https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only_dx.json
 - Azure: https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only_dx_azure.json

-You cannot use these input JSON files directly. Go to the destination directory on DNAnexus and click on the converted workflow `chip`. You will see input file boxes in the left-hand side of the task graph. Expand it and define FASTQs (`fastq_repX_R1`) and `genome_tsv` as in the downloaded input JSON file. Click on the `common` task box and define other non-file pipeline parameters.
-
+You cannot use these input JSON files directly. Go to the destination directory on DNAnexus and click on the converted workflow `chip`. You will see input file boxes on the left-hand side of the task graph. Expand it and define FASTQs (`fastq_repX_R1` and `fastq_repX_R2`) and `genome_tsv` as in the downloaded input JSON file. Click on the `common` task box and define other non-file pipeline parameters, e.g. `pipeline_type`, `paired_end` and `ctl_paired_end`.

-## Running a pipeline on DNAnexus (using our pre-built workflows)
+We have a separate project on DNAnexus that provides example FASTQs and `genome_tsv` for `hg38` and `mm10` (plus chr19-only versions of those two; use the chr19-only versions for testing). We recommend making copies of these directories in your own project.

-See [this](docs/tutorial_dx_web.md) for details.
+`genome_tsv`
+- AWS: https://platform.dnanexus.com/projects/BKpvFg00VBPV975PgJ6Q03v6/data/pipeline-genome-data/genome_tsv/v3
+- Azure: https://platform.dnanexus.com/projects/F6K911Q9xyfgJ36JFzv03Z5J/data/pipeline-genome-data/genome_tsv/v3

+Example FASTQs
+- AWS: https://platform.dnanexus.com/projects/BKpvFg00VBPV975PgJ6Q03v6/data/pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled
+- Azure: https://platform.dnanexus.com/projects/F6K911Q9xyfgJ36JFzv03Z5J/data/pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled


-## Input JSON file
-
-> **IMPORTANT**: DO NOT BLINDLY USE A TEMPLATE/EXAMPLE INPUT JSON. READ THROUGH THE FOLLOWING GUIDE TO MAKE A CORRECT INPUT JSON FILE.
+## Running on DNAnexus (using our pre-built workflows)

-An input JSON file specifies all the input parameters and files that are necessary for successfully running this pipeline. This includes a specification of the path to the genome reference files and the raw data fastq file. Please make sure to specify absolute paths rather than relative paths in your input JSON files.
+See [this](docs/tutorial_dx_web.md) for details.

-1) [Input JSON file specification (short)](docs/input_short.md)
-2) [Input JSON file specification (long)](docs/input.md)

 ## Running and sharing on Truwl
 You can run this pipeline on [truwl.com](https://truwl.com/). This provides a web interface that allows you to define inputs and parameters, run the job on GCP, and monitor progress. To run it you will need to create an account on the platform then request early access by emailing [[email protected]](mailto:[email protected]) to get the right permissions. You can see the example cases from this repo at [https://truwl.com/workflows/instance/WF_dd6938.8f.340f/command](https://truwl.com/workflows/instance/WF_dd6938.8f.340f/command) and [https://truwl.com/workflows/instance/WF_dd6938.8f.8aa3/command](https://truwl.com/workflows/instance/WF_dd6938.8f.8aa3/command). The example jobs (or other jobs) can be forked to pre-populate the inputs for your own job.
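For the copy step recommended in the DNAnexus section above ("make copies of these directories in your own project"), a sketch with dx-toolkit; it assumes `dx login` has been run and a destination project is selected, and the target folder name is illustrative:

```bash
# Copy the public chr19-only genome TSVs into your own DNAnexus project (sketch).
$ dx cp "project-BKpvFg00VBPV975PgJ6Q03v6:/pipeline-genome-data/genome_tsv/v3" "/my-genome-data/"
```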

chip.wdl

Lines changed: 15 additions & 17 deletions
@@ -7,10 +7,10 @@ struct RuntimeEnvironment {
 }

 workflow chip {
-    String pipeline_ver = 'v2.1.0'
+    String pipeline_ver = 'v2.1.1'

     meta {
-        version: 'v2.1.0'
+        version: 'v2.1.1'

         author: 'Jin wook Lee'
@@ -19,8 +19,8 @@ workflow chip {

         specification_document: 'https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit?usp=sharing'

-        default_docker: 'encodedcc/chip-seq-pipeline:v2.1.0'
-        default_singularity: 'library://leepc12/default/chip-seq-pipeline:v2.1.0'
+        default_docker: 'encodedcc/chip-seq-pipeline:v2.1.1'
+        default_singularity: 'library://leepc12/default/chip-seq-pipeline:v2.1.1'
         croo_out_def: 'https://storage.googleapis.com/encode-pipeline-output-definition/chip.croo.v5.json'

         parameter_group: {
@@ -71,8 +71,8 @@ workflow chip {
     }
     input {
         # group: runtime_environment
-        String docker = 'encodedcc/chip-seq-pipeline:v2.1.0'
-        String singularity = 'library://leepc12/default/chip-seq-pipeline:v2.1.0'
+        String docker = 'encodedcc/chip-seq-pipeline:v2.1.1'
+        String singularity = 'library://leepc12/default/chip-seq-pipeline:v2.1.1'
         String conda = 'encode-chip-seq-pipeline'
         String conda_macs2 = 'encode-chip-seq-pipeline-macs2'
         String conda_spp = 'encode-chip-seq-pipeline-spp'
@@ -1257,11 +1257,11 @@ workflow chip {
         else select_first([paired_end])

     Boolean has_input_of_align = i<length(fastqs_R1) && length(fastqs_R1[i])>0
-    Boolean has_output_of_align = i<length(bams) && defined(bams[i])
+    Boolean has_output_of_align = i<length(bams)
     if ( has_input_of_align && !has_output_of_align ) {
         call align { input :
             fastqs_R1 = fastqs_R1[i],
-            fastqs_R2 = fastqs_R2[i],
+            fastqs_R2 = if paired_end_ then fastqs_R2[i] else [],
             crop_length = crop_length,
             crop_length_tol = crop_length_tol,
             trimmomatic_phred_score_format = trimmomatic_phred_score_format,
@@ -1289,7 +1289,7 @@ workflow chip {
     File? bam_ = if has_output_of_align then bams[i] else align.bam

     Boolean has_input_of_filter = has_output_of_align || defined(align.bam)
-    Boolean has_output_of_filter = i<length(nodup_bams) && defined(nodup_bams[i])
+    Boolean has_output_of_filter = i<length(nodup_bams)
     # skip if we already have output of this step
     if ( has_input_of_filter && !has_output_of_filter ) {
         call filter { input :
@@ -1315,7 +1315,7 @@ workflow chip {
     File? nodup_bam_ = if has_output_of_filter then nodup_bams[i] else filter.nodup_bam

     Boolean has_input_of_bam2ta = has_output_of_filter || defined(filter.nodup_bam)
-    Boolean has_output_of_bam2ta = i<length(tas) && defined(tas[i])
+    Boolean has_output_of_bam2ta = i<length(tas)
     if ( has_input_of_bam2ta && !has_output_of_bam2ta ) {
         call bam2ta { input :
             bam = nodup_bam_,
@@ -1490,7 +1490,7 @@ workflow chip {

     # before peak calling, get fragment length from xcor analysis or given input
     # if fraglen [] is defined in the input JSON, fraglen from xcor will be ignored
-    Int? fraglen_ = if i<length(fraglen) && defined(fraglen[i]) then fraglen[i]
+    Int? fraglen_ = if i<length(fraglen) then fraglen[i]
         else xcor.fraglen
 }

@@ -1502,11 +1502,11 @@ workflow chip {
         else select_first([ctl_paired_end, paired_end])

     Boolean has_input_of_align_ctl = i<length(ctl_fastqs_R1) && length(ctl_fastqs_R1[i])>0
-    Boolean has_output_of_align_ctl = i<length(ctl_bams) && defined(ctl_bams[i])
+    Boolean has_output_of_align_ctl = i<length(ctl_bams)
     if ( has_input_of_align_ctl && !has_output_of_align_ctl ) {
         call align as align_ctl { input :
             fastqs_R1 = ctl_fastqs_R1[i],
-            fastqs_R2 = ctl_fastqs_R2[i],
+            fastqs_R2 = if ctl_paired_end_ then ctl_fastqs_R2[i] else [],
             crop_length = crop_length,
             crop_length_tol = crop_length_tol,
             trimmomatic_phred_score_format = trimmomatic_phred_score_format,
@@ -1534,7 +1534,7 @@ workflow chip {
     File? ctl_bam_ = if has_output_of_align_ctl then ctl_bams[i] else align_ctl.bam

     Boolean has_input_of_filter_ctl = has_output_of_align_ctl || defined(align_ctl.bam)
-    Boolean has_output_of_filter_ctl = i<length(ctl_nodup_bams) && defined(ctl_nodup_bams[i])
+    Boolean has_output_of_filter_ctl = i<length(ctl_nodup_bams)
     # skip if we already have output of this step
     if ( has_input_of_filter_ctl && !has_output_of_filter_ctl ) {
         call filter as filter_ctl { input :
@@ -1560,7 +1560,7 @@ workflow chip {
     File? ctl_nodup_bam_ = if has_output_of_filter_ctl then ctl_nodup_bams[i] else filter_ctl.nodup_bam

     Boolean has_input_of_bam2ta_ctl = has_output_of_filter_ctl || defined(filter_ctl.nodup_bam)
-    Boolean has_output_of_bam2ta_ctl = i<length(ctl_tas) && defined(ctl_tas[i])
+    Boolean has_output_of_bam2ta_ctl = i<length(ctl_tas)
     if ( has_input_of_bam2ta_ctl && !has_output_of_bam2ta_ctl ) {
         call bam2ta as bam2ta_ctl { input :
             bam = ctl_nodup_bam_,
@@ -3268,13 +3268,11 @@ task rounded_mean {
 task raise_exception {
     input {
         String msg
-        Array[String]? vals

         RuntimeEnvironment runtime_environment
     }
     command {
         echo -e "\n* Error: ${msg}\n" >&2
-        echo -e "* Vals: ${sep=',' vals}\n" >&2
         exit 2
     }
     output {
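The chip.wdl changes above drop `defined(...)` checks on array elements and remove the unused `vals` input from `raise_exception` — presumably the DNAnexus-on-Dockstore hotfix named in the PR title. After hand-edits like these it is prudent to re-validate the WDL; a sketch using Cromwell's womtool (the jar path is illustrative):

```bash
# Re-validate the edited workflow after the hotfix changes (sketch).
$ java -jar womtool.jar validate chip.wdl
```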

dev/test/test_task/test_choose_ctl.wdl

Lines changed: 30 additions & 1 deletion
@@ -191,11 +191,40 @@ workflow test_choose_ctl {
         String k = test.left
         Pair[Int, Int] v = test.right
         if ( v.left != v.right ) {
-            call chip.raise_exception { input:
+            call raise_exception_and_print { input:
                 msg = k,
                 vals = [v.left, v.right],
                 runtime_environment = runtime_environment,
             }
         }
     }
 }
+
+
+task raise_exception_and_print {
+    input {
+        String msg
+        Array[String]? vals
+
+        RuntimeEnvironment runtime_environment
+    }
+    command {
+        echo -e "\n* Error: ${msg}\n" >&2
+        echo -e "* Vals: ${sep=',' vals}\n" >&2
+        exit 2
+    }
+    output {
+        String error_msg = '${msg}'
+    }
+    runtime {
+        maxRetries : 0
+        cpu : 1
+        memory : '2 GB'
+        time : 4
+        disks : 'local-disk 10 SSD'
+
+        docker : runtime_environment.docker
+        singularity : runtime_environment.singularity
+        conda : runtime_environment.conda
+    }
+}
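Since the test workflow now carries its own `raise_exception_and_print` task (keeping the `vals` debugging output that was removed from `chip.raise_exception`), it can be exercised on its own. A sketch with Caper, assuming the Conda environment installed earlier; the test may require additional inputs:

```bash
# Run the unit test workflow locally (sketch; assumes the pipeline's Conda env).
$ caper run dev/test/test_task/test_choose_ctl.wdl --conda
```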

example_input_json/dx/ENCSR000DYI_subsampled_chr19_only_dx.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
     "chip.pipeline_type" : "tf",
-    "chip.genome_tsv" : "dx://project-BKpvFg00VBPV975PgJ6Q03v6:/pipeline-genome-data/genome_tsv/v1/hg38_chr19_chrM_dx.tsv",
+    "chip.genome_tsv" : "dx://project-BKpvFg00VBPV975PgJ6Q03v6:/pipeline-genome-data/genome_tsv/v3/hg38_chr19_chrM.dx.tsv",
     "chip.fastqs_rep1_R1" : ["dx://project-BKpvFg00VBPV975PgJ6Q03v6:/pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep1.subsampled.25.fastq.gz"
     ],
     "chip.fastqs_rep2_R1" : ["dx://project-BKpvFg00VBPV975PgJ6Q03v6:/pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep2.subsampled.20.fastq.gz"

example_input_json/dx/ENCSR000DYI_subsampled_chr19_only_rep1_dx.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
     "chip.pipeline_type" : "tf",
-    "chip.genome_tsv" : "dx://project-BKpvFg00VBPV975PgJ6Q03v6:/pipeline-genome-data/genome_tsv/v1/hg38_chr19_chrM_dx.tsv",
+    "chip.genome_tsv" : "dx://project-BKpvFg00VBPV975PgJ6Q03v6:/pipeline-genome-data/genome_tsv/v3/hg38_chr19_chrM.dx.tsv",
     "chip.fastqs_rep1_R1" : ["dx://project-BKpvFg00VBPV975PgJ6Q03v6:/pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep1.subsampled.25.fastq.gz"
     ],
     "chip.ctl_fastqs_rep1_R1" : ["dx://project-BKpvFg00VBPV975PgJ6Q03v6:/pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/ctl1.subsampled.25.fastq.gz"

example_input_json/dx_azure/ENCSR000DYI_subsampled_chr19_only_dx_azure.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
     "chip.pipeline_type" : "tf",
-    "chip.genome_tsv" : "dx://project-F6K911Q9xyfgJ36JFzv03Z5J:/pipeline-genome-data/genome_tsv/v1/hg38_chr19_chrM_dx_azure.tsv",
+    "chip.genome_tsv" : "dx://project-F6K911Q9xyfgJ36JFzv03Z5J:/pipeline-genome-data/genome_tsv/v3/hg38_chr19_chrM.dx_azure.tsv",
     "chip.fastqs_rep1_R1" : ["dx://project-F6K911Q9xyfgJ36JFzv03Z5J:/pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep1.subsampled.25.fastq.gz"
     ],
     "chip.fastqs_rep2_R1" : ["dx://project-F6K911Q9xyfgJ36JFzv03Z5J:/pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep2.subsampled.15.fastq.gz"
