## Conda environment name change (since v2.2.0 or 6/13/2022)
The pipeline's Conda environment names have been shortened to work around the following error:
```
PaddingError: Placeholder of length '80' too short in package /XXXXXXXXXXX/miniconda3/envs/
```
You need to reinstall the pipeline's Conda environments. It's recommended to do this for every version update.
```bash
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh
```
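
After reinstalling, a quick sanity check is to list your Conda environments and confirm that the pipeline's shortened environment names are present. This is only an illustrative check; the exact environment names depend on the pipeline version:

```bash
# list all Conda environments; the pipeline's environments should appear with their shortened names
$ conda env list
```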
## Introduction
This ChIP-Seq pipeline is based on the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications (by Anshul Kundaje) in [this Google Doc](https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#).
## Installation
1) Install Caper (Python Wrapper/CLI for [Cromwell](https://github.com/broadinstitute/cromwell)).
```bash
$ pip install caper
```
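
Caper does not work with Python 2, so make sure that you have Python >= 3.6. The `caper hpc submit/list/abort` commands used below also require Caper >= 2.3.0, so check the installed version:

```bash
# check Caper's version (>= 2.3.0 is needed for the caper hpc subcommands)
$ caper -v
```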
2) **IMPORTANT**: Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instructions in the configuration file.
```bash
# backend: local or your HPC type (e.g. slurm, sge, pbs, lsf). read Caper's README carefully.
$ caper init [YOUR_BACKEND]

# IMPORTANT: edit the conf file and follow the commented instructions in it
$ vi ~/.caper/default.conf
```
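
For orientation, a filled-in configuration for a SLURM cluster might look like the sketch below. This is an illustrative example, not a drop-in config: the key names follow Caper's README, and the partition/account values are site-specific placeholders:

```bash
# illustrative ~/.caper/default.conf for a SLURM backend (values are placeholders)
$ cat ~/.caper/default.conf
backend=slurm
slurm-partition=YOUR_PARTITION
slurm-account=YOUR_ACCOUNT
local-loc-dir=/path/to/scratch/caper_temp
```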
3) Git clone this pipeline.

4) Define test input JSON.
```bash
INPUT_JSON="https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json"
```
5) If you have Docker and want to run pipelines locally on your laptop, use the Docker method. `--max-concurrent-tasks 1` limits the number of concurrent tasks so that the pipeline can be test-run on a laptop. Remove it if you run the pipeline on a workstation/HPC.
```bash
# check if Docker works on your machine
$ docker run ubuntu:latest echo hello

# --max-concurrent-tasks 1 is for computers with limited resources
$ caper run chip.wdl -i "${INPUT_JSON}" --docker --max-concurrent-tasks 1
```
6) Otherwise, install Singularity on your system. Please follow [this instruction](https://neuro.debian.net/install_pkg.html?p=singularity-container) to install Singularity on a Debian-based OS, or ask your system administrator to install Singularity on your HPC.
```bash
# on your local machine (--max-concurrent-tasks 1 is for computers with limited resources)
$ caper run chip.wdl -i "${INPUT_JSON}" --singularity --max-concurrent-tasks 1

# on HPC, make sure that Caper's conf ~/.caper/default.conf is correctly configured to work with your HPC
# the following command will submit Caper as a leader job to SLURM with Singularity
$ caper hpc submit chip.wdl -i "${INPUT_JSON}" --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME

# check job ID and status of your leader jobs
$ caper hpc list

# cancel the leader node to close all of its children jobs
# If you directly use cluster commands like scancel or qdel then
# child jobs will not be terminated
$ caper hpc abort [JOB_ID]
```
7) (Optional Conda method) **WE DO NOT HELP USERS FIX CONDA DEPENDENCY ISSUES. IF CONDA METHOD FAILS THEN PLEASE USE SINGULARITY METHOD INSTEAD**. **DO NOT USE A SHARED CONDA. INSTALL YOUR OWN [MINICONDA3](https://docs.conda.io/en/latest/miniconda.html) AND USE IT.**
```bash
# check that you are not using a shared conda; if you are, delete it or remove it from your PATH
$ which conda

# uninstall pipeline's old environments
$ bash scripts/uninstall_conda_env.sh

# install new envs, you need to run this for every pipeline version update.
# it may be killed if you run this command line on a login node on HPC.
# it's recommended to make an interactive node with enough resources and run it there.
$ bash scripts/install_conda_env.sh

# if installation fails, please use the Singularity method instead.

# on your local machine (--max-concurrent-tasks 1 is for computers with limited resources)
$ caper run chip.wdl -i "${INPUT_JSON}" --conda --max-concurrent-tasks 1

# on HPC, make sure that Caper's conf ~/.caper/default.conf is correctly configured to work with your HPC
# the following command will submit Caper as a leader job to SLURM with Conda
$ caper hpc submit chip.wdl -i "${INPUT_JSON}" --conda --leader-job-name ANY_GOOD_LEADER_JOB_NAME

# check job ID and status of your leader jobs
$ caper hpc list

# cancel the leader node to close all of its children jobs
# If you directly use cluster commands like scancel or qdel then
# child jobs will not be terminated
$ caper hpc abort [JOB_ID]
```
## Input JSON file
> **IMPORTANT**: DO NOT BLINDLY USE A TEMPLATE/EXAMPLE INPUT JSON. READ THROUGH THE FOLLOWING GUIDE TO MAKE A CORRECT INPUT JSON FILE.

An input JSON file specifies all the input parameters and files that are necessary for successfully running this pipeline. This includes a specification of the path to the genome reference files and the raw data FASTQ files. Please make sure to specify absolute paths rather than relative paths in your input JSON files.
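
As a concrete illustration, a minimal input JSON for a paired-end TF ChIP-seq experiment might look like the sketch below. The key names follow the pipeline's input JSON documentation, but every path and value here is a hypothetical placeholder; always start from the pipeline's own template and documentation:

```bash
# a hypothetical minimal input JSON (all paths/values are placeholders)
$ cat "${INPUT_JSON}"
{
    "chip.title": "MY_EXPERIMENT",
    "chip.pipeline_type": "tf",
    "chip.genome_tsv": "/absolute/path/to/genome/hg38.tsv",
    "chip.paired_end": true,
    "chip.fastqs_rep1_R1": ["/absolute/path/to/rep1_R1.fastq.gz"],
    "chip.fastqs_rep1_R2": ["/absolute/path/to/rep1_R2.fastq.gz"],
    "chip.ctl_fastqs_rep1_R1": ["/absolute/path/to/ctl_R1.fastq.gz"],
    "chip.ctl_fastqs_rep1_R2": ["/absolute/path/to/ctl_R2.fastq.gz"]
}
```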
## Running on Terra/Anvil (using Dockstore)

Visit our pipeline repo on [Dockstore](https://dockstore.org/workflows/github.com/ENCODE-DCC/chip-seq-pipeline2). Click on `Terra` or `Anvil`. Follow Terra's instructions to create a workspace on Terra and add Terra's billing bot to your Google Cloud account.