Commit 2d51d34

Merge pull request #214 from ENCODE-DCC/dev
v1.7.1
2 parents 72360b4 + c80c672 commit 2d51d34

File tree

6 files changed (+58, -29 lines)


README.md

Lines changed: 10 additions & 0 deletions

````diff
@@ -5,6 +5,11 @@
 
 ## Important notice for Conda users
 
+If it takes too long to resolve Conda package conflicts while installing the pipeline's Conda environment, then try with `mamba` instead. Add `mamba` to the install command line.
+```bash
+$ scripts/install_conda_env.sh mamba
+```
+
 For every new pipeline release, Conda users always need to update the pipeline's Conda environment (`encode-chip-seq-pipeline`), even if they don't use newly added features.
 ```bash
 $ cd chip-seq-pipeline2
@@ -83,6 +88,11 @@ An input JSON file specifies all the input parameters and files that are necessa
 1) [Input JSON file specification (short)](docs/input_short.md)
 2) [Input JSON file specification (long)](docs/input.md)
 
+## Running and sharing on Truwl
+You can run this pipeline on [truwl.com](https://truwl.com/). This provides a web interface that allows you to define inputs and parameters, run the job on GCP, and monitor progress. To run it you will need to create an account on the platform, then request early access by emailing [info@truwl.com](mailto:info@truwl.com) to get the right permissions. You can see the example cases from this repo at [https://truwl.com/workflows/instance/WF_dd6938.8f.340f/command](https://truwl.com/workflows/instance/WF_dd6938.8f.340f/command) and [https://truwl.com/workflows/instance/WF_dd6938.8f.8aa3/command](https://truwl.com/workflows/instance/WF_dd6938.8f.8aa3/command). The example jobs (or other jobs) can be forked to pre-populate the inputs for your own job.
+
+If you do not run the pipeline on Truwl, you can still share your use case/job on the platform by getting in touch at [info@truwl.com](mailto:info@truwl.com) and providing your inputs.json file.
+
 ## Running a pipeline on DNAnexus
 
 You can also run this pipeline on DNAnexus without using Caper or Cromwell. There are two ways to build a workflow on DNAnexus based on our WDL.
````
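The notice above makes `mamba` an opt-in, explicit argument to the install script. A minimal sketch of the alternative design, auto-detecting `mamba` and falling back to `conda` (hypothetical wrapper, not the pipeline's actual script, which requires the explicit `mamba` argument):

```shell
# Hypothetical wrapper: prefer mamba when it is on PATH, else fall back to conda.
# The real scripts/install_conda_env.sh instead takes "mamba" as an argument.
if command -v mamba >/dev/null 2>&1; then
    SOLVER=mamba
else
    SOLVER=conda
fi
echo "solver: ${SOLVER}"
```

Requiring an explicit argument, as the pipeline does, keeps behavior predictable on machines where a stale `mamba` happens to be on `PATH`.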

chip.wdl

Lines changed: 14 additions & 13 deletions

```diff
@@ -1,15 +1,16 @@
 version 1.0
 
 workflow chip {
-    String pipeline_ver = 'v1.7.0'
+    String pipeline_ver = 'v1.7.1'
 
     meta {
+        version: 'v1.7.1'
         author: 'Jin wook Lee (leepc12@gmail.com) at ENCODE-DCC'
         description: 'ENCODE TF/Histone ChIP-Seq pipeline'
         specification_document: 'https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit?usp=sharing'
 
-        caper_docker: 'encodedcc/chip-seq-pipeline:v1.7.0'
-        caper_singularity: 'docker://encodedcc/chip-seq-pipeline:v1.7.0'
+        caper_docker: 'encodedcc/chip-seq-pipeline:v1.7.1'
+        caper_singularity: 'docker://encodedcc/chip-seq-pipeline:v1.7.1'
         croo_out_def: 'https://storage.googleapis.com/encode-pipeline-output-definition/chip.croo.v5.json'
 
         parameter_group: {
@@ -183,15 +184,15 @@ workflow chip {
         Int filter_cpu = 4
         Float filter_mem_factor = 0.4
         Int filter_time_hr = 24
-        Float filter_disk_factor = 6.0
+        Float filter_disk_factor = 8.0
 
         Int bam2ta_cpu = 2
         Float bam2ta_mem_factor = 0.35
         Int bam2ta_time_hr = 6
         Float bam2ta_disk_factor = 4.0
 
-        Float spr_mem_factor = 4.5
-        Float spr_disk_factor = 6.0
+        Float spr_mem_factor = 13.5
+        Float spr_disk_factor = 18.0
 
         Int jsd_cpu = 4
         Float jsd_mem_factor = 0.1
@@ -203,19 +204,19 @@ workflow chip {
         Int xcor_time_hr = 24
         Float xcor_disk_factor = 4.5
 
-        Float subsample_ctl_mem_factor = 7.0
-        Float subsample_ctl_disk_factor = 7.5
+        Float subsample_ctl_mem_factor = 14.0
+        Float subsample_ctl_disk_factor = 15.0
 
-        Float macs2_signal_track_mem_factor = 6.0
+        Float macs2_signal_track_mem_factor = 12.0
         Int macs2_signal_track_time_hr = 24
-        Float macs2_signal_track_disk_factor = 40.0
+        Float macs2_signal_track_disk_factor = 80.0
 
         Int call_peak_cpu = 6
         Float call_peak_spp_mem_factor = 5.0
-        Float call_peak_macs2_mem_factor = 2.5
+        Float call_peak_macs2_mem_factor = 5.0
         Int call_peak_time_hr = 72
         Float call_peak_spp_disk_factor = 5.0
-        Float call_peak_macs2_disk_factor = 15.0
+        Float call_peak_macs2_disk_factor = 30.0
 
         String? align_trimmomatic_java_heap
         String? filter_picard_java_heap
@@ -2776,7 +2777,7 @@ task gc_bias {
         cpu : 1
         memory : '${mem_gb} GB'
         time : 6
-        disks : 'local-disk 100 SSD'
+        disks : 'local-disk 150 SSD'
     }
 }
 
```
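The chip.wdl changes above mostly double memory and disk factors. As a back-of-the-envelope sketch of what a disk factor implies (assumption: the pipeline multiplies the factor by the input file size and rounds up to whole GB; the 2.5 GB BAM size here is purely illustrative):

```shell
# Illustrative only: a 2.5 GB BAM with the new filter_disk_factor = 8.0.
bam_size_gb=2.5
factor=8.0
# Multiply and round up to an integer number of GB.
disk_gb=$(awk -v s="$bam_size_gb" -v f="$factor" \
    'BEGIN { d = s * f; if (d != int(d)) d = int(d) + 1; print int(d) }')
echo "disks: local-disk ${disk_gb} SSD"
```

So the same input that previously got a 15 GB request under a 6.0 factor now gets 20 GB under 8.0, which is the point of the bump: fewer out-of-disk task failures on large inputs.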

dev/docker_image/Dockerfile

Lines changed: 3 additions & 2 deletions

```diff
@@ -90,8 +90,9 @@ RUN pip3 install --no-cache-dir SAMstats==0.2.1
 RUN git clone --branch 2.0.4.2 --single-branch https://github.com/kundajelab/idr && \
     cd idr && python3 setup.py install && cd ../ && rm -rf idr*
 
-# Install system/math python packages (python2)
-RUN pip2 install --no-cache-dir numpy matplotlib==2.2.4
+# Install system/math python packages and biopython
+RUN pip2 install --no-cache-dir numpy scipy matplotlib==2.2.4 bx-python==0.8.2 biopython==1.76
+RUN pip3 install --no-cache-dir biopython==1.76
 
 # Install genomic python packages (python2)
 RUN pip2 install --no-cache-dir metaseq==0.5.6
```

docs/input.md

Lines changed: 9 additions & 9 deletions

```diff
@@ -256,7 +256,7 @@ Parameter|Default|Description
 `chip.filter_cpu` | 4 |
 `chip.filter_mem_factor` | 0.4 | Multiplied to size of BAM to determine required memory
 `chip.filter_time_hr` | 24 | Walltime (HPCs only)
-`chip.filter_disk_factor` | 6.0 | Multiplied to size of BAM to determine required disk
+`chip.filter_disk_factor` | 8.0 | Multiplied to size of BAM to determine required disk
 
 Parameter|Default|Description
 ---------|-------|-----------
@@ -267,8 +267,8 @@ Parameter|Default|Description
 
 Parameter|Default|Description
 ---------|-------|-----------
-`chip.spr_mem_factor` | 4.5 | Multiplied to size of filtered BAM to determine required memory
-`chip.spr_disk_factor` | 6.0 | Multiplied to size of filtered BAM to determine required disk
+`chip.spr_mem_factor` | 13.5 | Multiplied to size of filtered BAM to determine required memory
+`chip.spr_disk_factor` | 18.0 | Multiplied to size of filtered BAM to determine required disk
 
 Parameter|Default|Description
 ---------|-------|-----------
@@ -288,22 +288,22 @@ Parameter|Default|Description
 ---------|-------|-----------
 `chip.call_peak_cpu` | 6 | Used for both peak callers (`spp` and `macs2`). `spp` is well multithreaded but `macs2` is single-threaded. More than 2 is not required for `macs2`.
 `chip.call_peak_spp_mem_factor` | 5.0 | Multiplied to size of TAG-ALIGN BED to determine required memory
-`chip.call_peak_macs2_mem_factor` | 2.5 | Multiplied to size of TAG-ALIGN BED to determine required memory
+`chip.call_peak_macs2_mem_factor` | 5.0 | Multiplied to size of TAG-ALIGN BED to determine required memory
 `chip.call_peak_time_hr` | 24 | Walltime (HPCs only)
 `chip.call_peak_spp_disk_factor` | 5.0 | Multiplied to size of TAG-ALIGN BED to determine required disk
-`chip.call_peak_macs2_disk_factor` | 15.0 | Multiplied to size of TAG-ALIGN BED to determine required disk
+`chip.call_peak_macs2_disk_factor` | 30.0 | Multiplied to size of TAG-ALIGN BED to determine required disk
 
 Parameter|Default|Description
 ---------|-------|-----------
-`chip.macs2_signal_track_mem_factor` | 6.0 | Multiplied to size of TAG-ALIGN BED to determine required memory
+`chip.macs2_signal_track_mem_factor` | 12.0 | Multiplied to size of TAG-ALIGN BED to determine required memory
 `chip.macs2_signal_track_time_hr` | 24 | Walltime (HPCs only)
-`chip.macs2_signal_track_disk_factor` | 40.0 | Multiplied to size of TAG-ALIGN BED to determine required disk
+`chip.macs2_signal_track_disk_factor` | 80.0 | Multiplied to size of TAG-ALIGN BED to determine required disk
 
 Parameter|Default|Description
 ---------|-------|-----------
-`chip.subsample_ctl_mem_factor` | 7.0 | Multiplied to size of TAG-ALIGN BED to determine required memory
+`chip.subsample_ctl_mem_factor` | 14.0 | Multiplied to size of TAG-ALIGN BED to determine required memory
 `chip.macs2_signal_track_time_hr` | 24 | Walltime (HPCs only)
-`chip.subsample_ctl_disk_factor` | 7.5 | Multiplied to size of TAG-ALIGN BED to determine required disk
+`chip.subsample_ctl_disk_factor` | 15.0 | Multiplied to size of TAG-ALIGN BED to determine required disk
 
 If your system/cluster does not allow large memory allocation for Java applications, check the following resource parameters to manually define Java memory. It is **NOT RECOMMENDED** for most users to change these parameters, since the pipeline automatically takes 90% of a task's memory for Java apps.
 
```
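The `*_mem_factor` parameters above scale with input size rather than being fixed amounts. A rough sketch of the resulting request (assumption: required memory is approximately factor times file size; the 1.2 GB filtered-BAM size is illustrative):

```shell
# Illustrative: memory implied by the new chip.spr_mem_factor = 13.5
# for a 1.2 GB filtered BAM. The factor scales linearly with input size.
mem=$(awk 'BEGIN { printf "%.1f", 13.5 * 1.2 }')
echo "spr memory: ${mem} GB"
```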
docs/install_conda.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -32,11 +32,11 @@ If you do not have miniconda (or anaconda) installed, follow the instructions be
 
 4) **IMPORTANT**: Close your session and re-login.
 
-5) Install pipeline's Conda environment.
+5) Install the pipeline's Conda environment. Add `mamba` to the install command line to resolve conflicts much faster.
 
 ```bash
 $ bash scripts/uninstall_conda_env.sh  # uninstall it for a clean install
-$ bash scripts/install_conda_env.sh
+$ bash scripts/install_conda_env.sh mamba  # remove mamba if it does not work
 ```
 
 > **WARNING**: DO NOT PROCEED TO RUN PIPELINES UNTIL YOU SEE THE FOLLOWING SUCCESS MESSAGE OR PIPELINE WILL NOT WORK.
````

scripts/install_conda_env.sh

Lines changed: 20 additions & 3 deletions

```diff
@@ -9,11 +9,28 @@ REQ_TXT_PY3=${SH_SCRIPT_DIR}/requirements.txt
 REQ_TXT_PY2=${SH_SCRIPT_DIR}/requirements_py2.txt
 SRC_DIR=${SH_SCRIPT_DIR}/../src
 
-conda --version  # check if conda exists
+echo "=== Checking conda version ==="
+conda --version
 
 echo "=== Installing pipeline's Conda environments ==="
-conda create -n ${CONDA_ENV_PY3} --file ${REQ_TXT_PY3} -y -c defaults -c r -c bioconda -c conda-forge
-conda create -n ${CONDA_ENV_PY2} --file ${REQ_TXT_PY2} -y -c defaults -c r -c bioconda -c conda-forge
+
+if [[ "$1" == mamba ]]; then
+    conda install mamba -y -c conda-forge
+    mamba create -n ${CONDA_ENV_PY3} --file ${REQ_TXT_PY3} -y -c defaults -c r -c bioconda -c conda-forge
+    mamba create -n ${CONDA_ENV_PY2} --file ${REQ_TXT_PY2} -y -c defaults -c r -c bioconda -c conda-forge
+else
+    echo
+    echo "If it takes too long to resolve conflicts, then try with mamba."
+    echo
+    echo "Usage: ./install_conda_env.sh mamba"
+    echo
+    echo "mamba will resolve conflicts much faster than the original conda."
+    echo "If you get a conflict in the mamba installation step itself,"
+    echo "then you may need to clean-install miniconda3 and re-login."
+    echo
+    conda create -n ${CONDA_ENV_PY3} --file ${REQ_TXT_PY3} -y -c defaults -c r -c bioconda -c conda-forge
+    conda create -n ${CONDA_ENV_PY2} --file ${REQ_TXT_PY2} -y -c defaults -c r -c bioconda -c conda-forge
+fi
 
 echo "=== Configuring for pipeline's Conda environments ==="
 CONDA_PREFIX_PY3=$(conda env list | grep -E "\b${CONDA_ENV_PY3}[[:space:]]" | awk '{if (NF==3) print $3; else print $2}')
```
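The last context line above extracts an environment's prefix path from `conda env list`. Its behavior can be seen on canned input; the sample lines below are fabricated for illustration. The `NF==3` check exists because the currently active environment gets an extra `*` column, and the trailing `[[:space:]]` in the grep pattern keeps `encode-chip-seq-pipeline` from also matching `encode-chip-seq-pipeline2`:

```shell
# Sample `conda env list`-style output (made-up paths, for illustration only).
sample='encode-chip-seq-pipeline  *  /opt/miniconda3/envs/encode-chip-seq-pipeline
encode-chip-seq-pipeline2     /opt/miniconda3/envs/encode-chip-seq-pipeline2'
# Active envs have 3 fields (name, *, path); inactive ones have 2.
prefix=$(printf '%s\n' "$sample" \
    | grep -E "\bencode-chip-seq-pipeline[[:space:]]" \
    | awk '{if (NF==3) print $3; else print $2}')
echo "$prefix"
```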
