Commit cd1508c

Merge pull request #351 from ENCODE-DCC/dev

v2.0.2

2 parents: 4f8e9bd + 219f033

File tree

2 files changed: +44 −27 lines

README.md

Lines changed: 33 additions & 16 deletions
@@ -3,14 +3,14 @@
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.156534.svg)](https://doi.org/10.5281/zenodo.156534)[![CircleCI](https://circleci.com/gh/ENCODE-DCC/atac-seq-pipeline/tree/master.svg?style=svg)](https://circleci.com/gh/ENCODE-DCC/atac-seq-pipeline/tree/master)
 
 
-## Download new Caper>=2.0
+## Download new Caper>=2.1
 
 New Caper is out. You need to update your Caper to work with the latest ENCODE ATAC-seq pipeline.
 ```bash
 $ pip install caper --upgrade
 ```
 
-## Local/HPC users and new Caper>=2.0
+## Local/HPC users and new Caper>=2.1
 
 There are tons of changes for local/HPC backends: `local`, `slurm`, `sge`, `pbs` and `lsf`(added). Make a backup of your current Caper configuration file `~/.caper/default.conf` and run `caper init`. Local/HPC users need to reset/initialize Caper's configuration file according to your chosen backend. Edit the configuration file and follow instructions in there.
 ```bash
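The "back up your config, then re-initialize" step above can be sketched in a few lines; the `~/.caper/default.conf` path comes from the README, while the placeholder config contents are an assumption so the example is self-contained (the actual upgrade commands are left as comments since they need network access and an installed Caper):

```python
import shutil
from pathlib import Path

# Back up Caper's config before re-running `caper init`, as the diff above advises.
conf = Path.home() / ".caper" / "default.conf"
conf.parent.mkdir(parents=True, exist_ok=True)
if not conf.exists():
    conf.write_text("backend=local\n")  # placeholder config, only for this demo
backup = conf.with_suffix(".conf.bak")  # -> ~/.caper/default.conf.bak
shutil.copy2(conf, backup)
print(f"backed up {conf} -> {backup}")
# Then, per the README:  pip install caper --upgrade  &&  caper init <backend>
```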
@@ -75,9 +75,20 @@ The ATAC-seq pipeline protocol specification is [here](https://docs.google.com/d
 $ bash scripts/install_conda_env.sh
 ```
 
-## Test run
 
-You can use URIs(`s3://`, `gs://` and `http(s)://`) in Caper's command lines and input JSON file then Caper will automatically download/localize such files. Input JSON file URL: https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json
+## Input JSON file specification
+
+> **IMPORTANT**: DO NOT BLINDLY USE A TEMPLATE/EXAMPLE INPUT JSON. READ THROUGH THE FOLLOWING GUIDE TO MAKE A CORRECT INPUT JSON FILE. ESPECIALLY FOR AUTODETECTING/DEFINING ADAPTERS.
+
+An input JSON file specifies all the input parameters and files that are necessary for successfully running this pipeline. This includes a specification of the path to the genome reference files and the raw data fastq file. Please make sure to specify absolute paths rather than relative paths in your input JSON files.
+
+1) [Input JSON file specification (short)](docs/input_short.md)
+2) [Input JSON file specification (long)](docs/input.md)
+
+
+## Running on local computer/HPCs
+
+You can use URIs(`s3://`, `gs://` and `http(s)://`) in Caper's command lines and input JSON file then Caper will automatically download/localize such files. Input JSON file example: https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json
 
 According to your chosen platform of Caper, run Caper or submit Caper command line to the cluster. You can choose other environments like `--singularity` or `--docker` instead of `--conda`. But you must define one of the environments.
 
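The new "Input JSON file specification" section added above requires absolute paths for all input files. A small sketch of building such a file, with an early check for relative paths; the `atac.*` keys and file paths here are illustrative assumptions only — take the authoritative key names from docs/input_short.md:

```python
import json
import os

# Minimal input JSON sketch; keys and paths are examples, not the full spec.
input_json = {
    "atac.title": "ENCSR356KRQ (subsampled)",
    "atac.paired_end": True,
    "atac.genome_tsv": "/data/genome/hg38/hg38.tsv",
    "atac.fastqs_rep1_R1": ["/data/fastq/rep1_R1.subsampled.fastq.gz"],
    "atac.fastqs_rep1_R2": ["/data/fastq/rep1_R2.subsampled.fastq.gz"],
}

# The README requires absolute paths; fail early if a relative one slipped in.
for key, value in input_json.items():
    paths = value if isinstance(value, list) else [value]
    for p in paths:
        if isinstance(p, str) and p.endswith((".gz", ".tsv")):
            assert os.path.isabs(p), f"{key} must use an absolute path: {p}"

with open("my_input.json", "w") as f:
    json.dump(input_json, f, indent=2)
```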
@@ -89,6 +100,12 @@ The followings are just examples. Please read [Caper's README](https://github.co
 # Or submit it as a leader job (with long/enough resources) to SLURM (Stanford Sherlock) with Singularity
 # It will fail if you directly run the leader job on login nodes
 $ sbatch -p [SLURM_PARTITION] -J [WORKFLOW_NAME] --export=ALL --mem 4G -t 4-0 --wrap "caper run atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json --singularity"
+
+# Check status of your leader job
+$ squeue -u $USER | grep [WORKFLOW_NAME]
+
+# Cancel the leader job to stop all of its child jobs
+$ scancel [JOB_ID]
 ```
 
 
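The SLURM leader-job pattern added above can be parameterized with a small, hypothetical helper that assembles the same `sbatch` command line (partition and workflow names are placeholders; resources mirror the README's 4 GB / 4-day example):

```python
import shlex

# Hypothetical wrapper around the leader-job submission shown in the diff above.
def leader_job_cmd(partition, workflow_name, input_json, env_flag="--singularity"):
    # The leader job runs Caper, which in turn submits the pipeline's child jobs.
    caper_cmd = f"caper run atac.wdl -i {input_json} {env_flag}"
    return [
        "sbatch", "-p", partition, "-J", workflow_name,
        "--export=ALL", "--mem", "4G", "-t", "4-0",
        "--wrap", caper_cmd,
    ]

cmd = leader_job_cmd("my_partition", "my_atac_run", "my_input.json")
print(shlex.join(cmd))  # copy-paste-able command line
```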
@@ -99,7 +116,7 @@ You can run this pipeline on [truwl.com](https://truwl.com/). This provides a we
 If you do not run the pipeline on Truwl, you can still share your use-case/job on the platform by getting in touch at [info@truwl.com](mailto:info@truwl.com) and providing your inputs.json file.
 
 
-## Running a pipeline on Terra/Anvil (using Dockstore)
+## Running on Terra/Anvil (using Dockstore)
 
 Visit our pipeline repo on [Dockstore](https://dockstore.org/workflows/github.com/ENCODE-DCC/atac-seq-pipeline). Click on `Terra` or `Anvil`. Follow Terra's instruction to create a workspace on Terra and add Terra's billing bot to your Google Cloud account.
 
@@ -108,30 +125,30 @@ Download this [test input JSON for Terra](https://storage.googleapis.com/encode-
 If you want to use your own input JSON file, then make sure that all files in the input JSON are on a Google Cloud Storage bucket (`gs://`). URLs will not work.
 
 
-## Running a pipeline on DNAnexus (using Dockstore)
+## Running on DNAnexus (using Dockstore)
 
 Sign up for a new account on [DNAnexus](https://platform.dnanexus.com/) and create a new project on either AWS or Azure. Visit our pipeline repo on [Dockstore](https://dockstore.org/workflows/github.com/ENCODE-DCC/atac-seq-pipeline). Click on `DNAnexus`. Choose a destination directory on your DNAnexus project. Click on `Submit` and visit DNAnexus. This will submit a conversion job so that you can check status of it on `Monitor` on DNAnexus UI.
 
 Once conversion is done download one of the following input JSON files according to your chosen platform (AWS or Azure) for your DNAnexus project:
 - AWS: https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_dx.json
 - Azure: https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_dx_azure.json
 
-You cannot use these input JSON files directly. Go to the destination directory on DNAnexus and click on the converted workflow `atac`. You will see input file boxes in the left-hand side of the task graph. Expand it and define FASTQs (`fastq_repX_R1`) and `genome_tsv` as in the downloaded input JSON file. Click on the `common` task box and define other non-file pipeline parameters.
-
+You cannot use these input JSON files directly. Go to the destination directory on DNAnexus and click on the converted workflow `atac`. You will see input file boxes in the left-hand side of the task graph. Expand it and define FASTQs (`fastq_repX_R1`, and also `fastq_repX_R2` if it's paired-end) and `genome_tsv` as in the downloaded input JSON file. Click on the `common` task box and define other non-file pipeline parameters, e.g. `auto_detect_adapters` and `paired_end`.
 
-## Running a pipeline on DNAnexus (using our pre-built workflows)
-
-See [this](docs/tutorial_dx_web.md) for details.
+We have a separate project on DNAnexus to provide example FASTQs and `genome_tsv` for `hg38` and `mm10`. We recommend making copies of these directories in your own project.
 
+`genome_tsv`
+- AWS: https://platform.dnanexus.com/projects/BKpvFg00VBPV975PgJ6Q03v6/data/pipeline-genome-data/genome_tsv/v3
+- Azure: https://platform.dnanexus.com/projects/F6K911Q9xyfgJ36JFzv03Z5J/data/pipeline-genome-data/genome_tsv/v3
 
-## Input JSON file specification
+Example FASTQs
+- AWS: https://platform.dnanexus.com/projects/BKpvFg00VBPV975PgJ6Q03v6/data/pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ/fastq_subsampled
+- Azure: https://platform.dnanexus.com/projects/F6K911Q9xyfgJ36JFzv03Z5J/data/pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ/fastq_subsampled
 
-> **IMPORTANT**: DO NOT BLINDLY USE A TEMPLATE/EXAMPLE INPUT JSON. READ THROUGH THE FOLLOWING GUIDE TO MAKE A CORRECT INPUT JSON FILE. ESPECIALLY FOR ADAPTERS.
 
-An input JSON file specifies all the input parameters and files that are necessary for successfully running this pipeline. This includes a specification of the path to the genome reference files and the raw data fastq file. Please make sure to specify absolute paths rather than relative paths in your input JSON files.
+## Running on DNAnexus (using our pre-built workflows)
 
-1) [Input JSON file specification (short)](docs/input_short.md)
-2) [Input JSON file specification (long)](docs/input.md)
+See [this](docs/tutorial_dx_web.md) for details.
 
 
 ## How to organize outputs
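The AWS-vs-Azure choice in the DNAnexus section above is a simple lookup. A sketch with a hypothetical `input_json_for` helper; the two URLs are exactly the ones listed in the diff:

```python
# Platform-specific input JSON URLs from the README's DNAnexus section.
DX_INPUT_JSON = {
    "aws": "https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_dx.json",
    "azure": "https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled_dx_azure.json",
}

def input_json_for(platform: str) -> str:
    """Return the example input JSON URL for a DNAnexus project platform."""
    try:
        return DX_INPUT_JSON[platform.lower()]
    except KeyError:
        raise ValueError(f"unsupported DNAnexus platform: {platform!r}")

print(input_json_for("Azure"))
```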

atac.wdl

Lines changed: 11 additions & 11 deletions
@@ -7,10 +7,10 @@ struct RuntimeEnvironment {
 }
 
 workflow atac {
-    String pipeline_ver = 'v2.0.1'
+    String pipeline_ver = 'v2.0.2'
 
     meta {
-        version: 'v2.0.1'
+        version: 'v2.0.2'
 
         author: 'Jin wook Lee'
         email: 'leepc12@gmail.com'
@@ -19,8 +19,8 @@ workflow atac {
 
         specification_document: 'https://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit?usp=sharing'
 
-        default_docker: 'encodedcc/atac-seq-pipeline:v2.0.1'
-        default_singularity: 'library://leepc12/default/atac-seq-pipeline:v2.0.1'
+        default_docker: 'encodedcc/atac-seq-pipeline:v2.0.2'
+        default_singularity: 'library://leepc12/default/atac-seq-pipeline:v2.0.2'
         default_conda: 'encode-atac-seq-pipeline'
         croo_out_def: 'https://storage.googleapis.com/encode-pipeline-output-definition/atac.croo.v5.json'
 
@@ -72,8 +72,8 @@ workflow atac {
     }
     input {
         # group: runtime_environment
-        String docker = 'encodedcc/atac-seq-pipeline:v2.0.1'
-        String singularity = 'library://leepc12/default/atac-seq-pipeline:v2.0.1'
+        String docker = 'encodedcc/atac-seq-pipeline:v2.0.2'
+        String singularity = 'library://leepc12/default/atac-seq-pipeline:v2.0.2'
         String conda = 'encode-atac-seq-pipeline'
         String conda_macs2 = 'encode-atac-seq-pipeline-macs2'
         String conda_spp = 'encode-atac-seq-pipeline-spp'
@@ -1108,7 +1108,7 @@ workflow atac {
             else select_first([paired_end])
 
         Boolean has_input_of_align = i<length(fastqs_R1) && length(fastqs_R1[i])>0
-        Boolean has_output_of_align = i<length(bams) && defined(bams[i])
+        Boolean has_output_of_align = i<length(bams)
         if ( has_input_of_align && !has_output_of_align ) {
             call align { input :
                 fastqs_R1 = fastqs_R1[i],
@@ -1172,7 +1172,7 @@ workflow atac {
         }
 
         Boolean has_input_of_filter = has_output_of_align || defined(align.bam)
-        Boolean has_output_of_filter = i<length(nodup_bams) && defined(nodup_bams[i])
+        Boolean has_output_of_filter = i<length(nodup_bams)
         # skip if we already have output of this step
         if ( has_input_of_filter && !has_output_of_filter ) {
             call filter { input :
@@ -1197,7 +1197,7 @@ workflow atac {
         File? nodup_bam_ = if has_output_of_filter then nodup_bams[i] else filter.nodup_bam
 
         Boolean has_input_of_bam2ta = has_output_of_filter || defined(filter.nodup_bam)
-        Boolean has_output_of_bam2ta = i<length(tas) && defined(tas[i])
+        Boolean has_output_of_bam2ta = i<length(tas)
         if ( has_input_of_bam2ta && !has_output_of_bam2ta ) {
             call bam2ta { input :
                 bam = nodup_bam_,
@@ -1392,10 +1392,10 @@ workflow atac {
         }
         # tasks factored out from ATAqC
         Boolean has_input_of_tss_enrich = defined(nodup_bam_) && defined(tss_) && (
-            defined(align.read_len) || i<length(read_len) && defined(read_len[i]) )
+            defined(align.read_len) || i<length(read_len) )
         if ( enable_tss_enrich && has_input_of_tss_enrich ) {
             call tss_enrich { input :
-                read_len = if i<length(read_len) && defined(read_len[i]) then read_len[i]
+                read_len = if i<length(read_len) then read_len[i]
                     else align.read_len,
                 nodup_bam = nodup_bam_,
                 tss = tss_,
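The `atac.wdl` hunks above all make the same change to the pipeline's resume logic: a step now counts as already having output whenever the resumption array has an entry at index `i`, without additionally requiring `defined(...)` on that entry. A Python paraphrase of the before/after predicates (WDL optional-type semantics approximated with `None`, not actual WDL):

```python
# Before cd1508c: an entry had to exist AND be defined to skip the step.
def has_output_of_align_old(bams, i):
    return i < len(bams) and bams[i] is not None   # i<length(bams) && defined(bams[i])

# After cd1508c: a length check alone decides whether output is present.
def has_output_of_align_new(bams, i):
    return i < len(bams)                           # i<length(bams)

bams = ["rep1.bam", None]  # rep2's slot exists but holds no defined value
# Old predicate: rep2 has no usable output, so align would run again.
# New predicate: rep2's slot counts as output, so align is skipped.
print(has_output_of_align_old(bams, 1), has_output_of_align_new(bams, 1))
```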
