
Commit 56cd2cb

Merge pull request #286 from ENCODE-DCC/hotfix_conda_support
Hotfix conda support
2 parents 8ef5043 + 7b1a71a commit 56cd2cb

File tree — 7 files changed: +131 −94 lines changed

- .circleci/config.yml
- README.md
- chip.wdl
- scripts/install_conda_env.sh
- scripts/requirements.macs2.txt
- scripts/requirements.spp.txt
- scripts/requirements.txt

.circleci/config.yml

Lines changed: 2 additions & 2 deletions
@@ -2,12 +2,12 @@ version: 2.1
 
 defaults: &defaults
   docker:
-    - image: google/cloud-sdk:latest
+    - image: cimg/base@sha256:d75b94c6eae6e660b6db36761709626b93cabe8c8da5b955bfbf7832257e4201
   working_directory: ~/chip-seq-pipeline2
 
 machine_defaults: &machine_defaults
   machine:
-    image: ubuntu-2004:202010-01
+    image: ubuntu-2004:202201-02
   working_directory: ~/chip-seq-pipeline2
 
 make_tag: &make_tag
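The CI change above swaps a floating `latest` tag for an image pinned by its sha256 digest, so every CI run resolves to the identical base image. As a quick local check (a sketch, assuming Docker is installed), the pinned digest can be pulled directly:

```bash
# pull the exact base image the CI config now pins (digest copied from the diff above)
docker pull cimg/base@sha256:d75b94c6eae6e660b6db36761709626b93cabe8c8da5b955bfbf7832257e4201
```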

README.md

Lines changed: 58 additions & 53 deletions

@@ -3,20 +3,6 @@
 [![CircleCI](https://circleci.com/gh/ENCODE-DCC/chip-seq-pipeline2/tree/master.svg?style=svg)](https://circleci.com/gh/ENCODE-DCC/chip-seq-pipeline2/tree/master)
 
 
-## Conda environment name change (since v2.2.0 or 6/13/2022)
-
-Pipeline's Conda environment's names have been shortened to work around the following error:
-```
-PaddingError: Placeholder of length '80' too short in package /XXXXXXXXXXX/miniconda3/envs/
-```
-
-You need to reinstall pipeline's Conda environment. It's recommended to do this for every version update.
-```bash
-$ bash scripts/uninstall_conda_env.sh
-$ bash scripts/install_conda_env.sh
-```
-
 ## Introduction
 
 This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications (by Anshul Kundaje) in [this google doc](https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#).
@@ -29,20 +15,17 @@ This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor an
 
 ## Installation
 
-1) Make sure that you have Python>=3.6. Caper does not work with Python2. Install Caper and check its version >=2.0.
+1) Install Caper (Python wrapper/CLI for [Cromwell](https://github.com/broadinstitute/cromwell)).
 ```bash
 $ pip install caper
-
-# use caper version >= 2.3.0 for a new HPC feature (caper hpc submit/list/abort).
-$ caper -v
 ```
-2) Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instruction in the configuration file.
+
+2) **IMPORTANT**: Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instructions in the configuration file.
 ```bash
-# this will overwrite the existing conf file ~/.caper/default.conf
-# make a backup of it first if needed
+# backend: local or your HPC type (e.g. slurm, sge, pbs, lsf). Read Caper's README carefully.
 $ caper init [YOUR_BACKEND]
 
-# edit the conf file
+# IMPORTANT: edit the conf file and follow the commented instructions in there
 $ vi ~/.caper/default.conf
 ```
 
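After `caper init`, the generated `~/.caper/default.conf` must be edited by hand. As an illustration only, a SLURM-flavored conf might look like the sketch below; the key names follow Caper's README, but the values are hypothetical placeholders, so verify everything against the comments in your generated file:

```bash
# hypothetical ~/.caper/default.conf sketch for a SLURM cluster (values are placeholders)
backend=slurm
# site-specific SLURM partition/account used for child jobs
slurm-partition=YOUR_PARTITION
slurm-account=YOUR_ACCOUNT
# scratch directory with enough space for localized inputs and intermediates
local-loc-dir=/scratch/$USER/caper_cache
```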
@@ -52,61 +35,83 @@ This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor an
 $ git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
 ```
 
-4) (Optional for Conda) **DO NOT USE A SHARED CONDA. INSTALL YOUR OWN [MINICONDA3](https://docs.conda.io/en/latest/miniconda.html) AND USE IT.** Install pipeline's Conda environments if you don't have Singularity or Docker installed on your system. We recommend to use Singularity instead of Conda.
+4) Define the test input JSON.
 ```bash
-# check if you have Singularity on your system, if so then it's not recommended to use Conda
-$ singularity --version
-
-# check if you are not using a shared conda, if so then delete it or remove it from your PATH
-$ which conda
-
-# change directory to pipeline's git repo
-$ cd chip-seq-pipeline2
+INPUT_JSON="https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json"
+```
 
-# uninstall old environments
-$ bash scripts/uninstall_conda_env.sh
+5) If you have Docker and want to run pipelines locally on your laptop: `--max-concurrent-tasks 1` limits the number of concurrent tasks for test-running the pipeline on a laptop. Remove it if you run on a workstation/HPC.
+```bash
+# check if Docker works on your machine
+$ docker run ubuntu:latest echo hello
 
-# install new envs, you need to run this for every pipeline version update.
-# it may be killed if you run this command line on a login node.
-# it's recommended to make an interactive node and run it there.
-$ bash scripts/install_conda_env.sh
+# --max-concurrent-tasks 1 is for computers with limited resources
+$ caper run chip.wdl -i "${INPUT_JSON}" --docker --max-concurrent-tasks 1
 ```
 
-## Input JSON file
+6) Otherwise, install Singularity on your system. Follow [these instructions](https://neuro.debian.net/install_pkg.html?p=singularity-container) to install Singularity on a Debian-based OS, or ask your system administrator to install it on your HPC.
+```bash
+# check if Singularity works on your machine
+$ singularity exec docker://ubuntu:latest echo hello
 
-> **IMPORTANT**: DO NOT BLINDLY USE A TEMPLATE/EXAMPLE INPUT JSON. READ THROUGH THE FOLLOWING GUIDE TO MAKE A CORRECT INPUT JSON FILE.
+# on your local machine (--max-concurrent-tasks 1 is for computers with limited resources)
+$ caper run chip.wdl -i "${INPUT_JSON}" --singularity --max-concurrent-tasks 1
 
-An input JSON file specifies all the input parameters and files that are necessary for successfully running this pipeline. This includes a specification of the path to the genome reference files and the raw data fastq file. Please make sure to specify absolute paths rather than relative paths in your input JSON files.
+# on HPC, make sure that Caper's conf ~/.caper/default.conf is correctly configured to work with your HPC
+# the following command will submit Caper as a leader job to SLURM with Singularity
+$ caper hpc submit chip.wdl -i "${INPUT_JSON}" --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME
 
-1) [Input JSON file specification (short)](docs/input_short.md)
-2) [Input JSON file specification (long)](docs/input.md)
+# check job ID and status of your leader jobs
+$ caper hpc list
 
+# cancel the leader node to close all of its child jobs
+# if you directly use a cluster command like scancel or qdel,
+# child jobs will not be terminated
+$ caper hpc abort [JOB_ID]
+```
 
-## Running on local computer/HPCs
+7) (Optional Conda method) **WE DO NOT HELP USERS FIX CONDA DEPENDENCY ISSUES. IF THE CONDA METHOD FAILS, PLEASE USE THE SINGULARITY METHOD INSTEAD.** **DO NOT USE A SHARED CONDA. INSTALL YOUR OWN [MINICONDA3](https://docs.conda.io/en/latest/miniconda.html) AND USE IT.**
+```bash
+# check that you are not using a shared conda; if you are, delete it or remove it from your PATH
+$ which conda
 
-You can use URIs(`s3://`, `gs://` and `http(s)://`) in Caper's command lines and input JSON file then Caper will automatically download/localize such files. Input JSON file example: https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json
+# uninstall the pipeline's old environments
+$ bash scripts/uninstall_conda_env.sh
 
-According to your chosen platform of Caper, run Caper or submit Caper command line to the cluster. You can choose other environments like `--singularity` or `--docker` instead of `--conda`. But you must define one of the environments.
+# install new envs; you need to run this for every pipeline version update.
+# it may be killed if you run this command on an HPC login node.
+# it's recommended to allocate an interactive node with enough resources and run it there.
+$ bash scripts/install_conda_env.sh
 
-PLEASE READ [CAPER'S README](https://github.com/ENCODE-DCC/caper) VERY CAREFULLY BEFORE RUNNING ANY PIPELINES. YOU WILL NEED TO CORRECTLY CONFIGURE CAPER FIRST. These are just example command lines.
+# if installation fails, please use the Singularity method instead.
 
-```bash
-# Run it locally with Conda (DO NOT ACTIVATE PIPELINE'S CONDA ENVIRONMENT)
-$ caper run chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --conda
+# on your local machine (--max-concurrent-tasks 1 is for computers with limited resources)
+$ caper run chip.wdl -i "${INPUT_JSON}" --conda --max-concurrent-tasks 1
 
-# On HPC, submit it as a leader job to SLURM with Singularity
-$ caper hpc submit chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME
+# on HPC, make sure that Caper's conf ~/.caper/default.conf is correctly configured to work with your HPC
+# the following command will submit Caper as a leader job to SLURM with Conda
+$ caper hpc submit chip.wdl -i "${INPUT_JSON}" --conda --leader-job-name ANY_GOOD_LEADER_JOB_NAME
 
-# Check job ID and status of your leader jobs
+# check job ID and status of your leader jobs
 $ caper hpc list
 
-# Cancel the leader node to close all of its children jobs
+# cancel the leader node to close all of its child jobs
 # if you directly use a cluster command like scancel or qdel,
 # child jobs will not be terminated
 $ caper hpc abort [JOB_ID]
 ```
 
 
+## Input JSON file
+
+> **IMPORTANT**: DO NOT BLINDLY USE A TEMPLATE/EXAMPLE INPUT JSON. READ THROUGH THE FOLLOWING GUIDE TO MAKE A CORRECT INPUT JSON FILE.
+
+An input JSON file specifies all the input parameters and files that are necessary for successfully running this pipeline. This includes a specification of the path to the genome reference files and the raw data FASTQ files. Please make sure to specify absolute paths rather than relative paths in your input JSON files.
+
+1) [Input JSON file specification (short)](docs/input_short.md)
+2) [Input JSON file specification (long)](docs/input.md)
+
+
 ## Running on Terra/Anvil (using Dockstore)
 
 Visit our pipeline repo on [Dockstore](https://dockstore.org/workflows/github.com/ENCODE-DCC/chip-seq-pipeline2). Click on `Terra` or `Anvil`. Follow Terra's instructions to create a workspace on Terra and add Terra's billing bot to your Google Cloud account.
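Taken together, the reorganized steps reduce to a short quickstart. The recap below is only a sketch that strings together the commands already shown above, assuming Singularity is installed and `~/.caper/default.conf` is configured:

```bash
# quickstart recap of steps 1-6, assuming Caper is already configured
pip install caper
git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
cd chip-seq-pipeline2
INPUT_JSON="https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json"
caper run chip.wdl -i "${INPUT_JSON}" --singularity --max-concurrent-tasks 1
```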

chip.wdl

Lines changed: 6 additions & 6 deletions

@@ -7,10 +7,10 @@ struct RuntimeEnvironment {
 }
 
 workflow chip {
-    String pipeline_ver = 'v2.2.0'
+    String pipeline_ver = 'v2.2.1'
 
     meta {
-        version: 'v2.2.0'
+        version: 'v2.2.1'
 
         author: 'Jin wook Lee'
 
@@ -19,8 +19,8 @@ workflow chip {
 
         specification_document: 'https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit?usp=sharing'
 
-        default_docker: 'encodedcc/chip-seq-pipeline:v2.2.0'
-        default_singularity: 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.2.0.sif'
+        default_docker: 'encodedcc/chip-seq-pipeline:v2.2.1'
+        default_singularity: 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.2.1.sif'
         croo_out_def: 'https://storage.googleapis.com/encode-pipeline-output-definition/chip.croo.v5.json'
 
         parameter_group: {
@@ -71,8 +71,8 @@ workflow chip {
     }
     input {
         # group: runtime_environment
-        String docker = 'encodedcc/chip-seq-pipeline:v2.2.0'
-        String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.2.0.sif'
+        String docker = 'encodedcc/chip-seq-pipeline:v2.2.1'
+        String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.2.1.sif'
         String conda = 'encd-chip'
         String conda_macs2 = 'encd-chip-macs2'
         String conda_spp = 'encd-chip-spp'
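The bump above changes the default `docker` and `singularity` images that the WDL falls back to when none is given. If you need to pin a specific image explicitly, Caper can take an image URI after the environment flag; treat the exact CLI form below as an assumption to verify against Caper's README, though the image tag itself comes from this commit:

```bash
# explicitly pin the v2.2.1 image instead of relying on the WDL default
# (passing an image URI to --docker is assumed from Caper's CLI; verify with `caper run --help`)
caper run chip.wdl -i "${INPUT_JSON}" --docker encodedcc/chip-seq-pipeline:v2.2.1
```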

scripts/install_conda_env.sh

Lines changed: 62 additions & 3 deletions

@@ -1,6 +1,28 @@
 #!/bin/bash
 set -e # Stop on error
 
+install_ucsc_tools_369() {
+    # takes a conda env name and finds that env's bin directory
+    CONDA_BIN=$(conda run -n $1 bash -c "echo \$(dirname \$(which python))")
+    curl -o "$CONDA_BIN/fetchChromSizes" "https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/fetchChromSizes"
+    curl -o "$CONDA_BIN/wigToBigWig" "https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/wigToBigWig"
+    curl -o "$CONDA_BIN/bedGraphToBigWig" "https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/bedGraphToBigWig"
+    curl -o "$CONDA_BIN/bigWigInfo" "https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/bigWigInfo"
+    curl -o "$CONDA_BIN/bedClip" "https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/bedClip"
+    curl -o "$CONDA_BIN/bedToBigBed" "https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/bedToBigBed"
+    curl -o "$CONDA_BIN/twoBitToFa" "https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/twoBitToFa"
+    curl -o "$CONDA_BIN/bigWigAverageOverBed" "https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/bigWigAverageOverBed"
+
+    chmod +x "$CONDA_BIN/fetchChromSizes"
+    chmod +x "$CONDA_BIN/wigToBigWig"
+    chmod +x "$CONDA_BIN/bedGraphToBigWig"
+    chmod +x "$CONDA_BIN/bigWigInfo"
+    chmod +x "$CONDA_BIN/bedClip"
+    chmod +x "$CONDA_BIN/bedToBigBed"
+    chmod +x "$CONDA_BIN/twoBitToFa"
+    chmod +x "$CONDA_BIN/bigWigAverageOverBed"
+}
+
 SH_SCRIPT_DIR=$(cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd)
 
 echo "$(date): Installing pipeline's Conda environments..."
@@ -12,15 +34,52 @@ conda create -n encd-chip-macs2 --file ${SH_SCRIPT_DIR}/requirements.macs2.txt \
   --override-channels -c bioconda -c defaults -y
 
 conda create -n encd-chip-spp --file ${SH_SCRIPT_DIR}/requirements.spp.txt \
-  --override-channels -c r -c bioconda -c defaults -y
+  -c r -c bioconda -c defaults -y
 
 # adhoc fix for the following issues:
 #   - https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/259
 #   - https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/265
 # force-install readline 6.2, ncurses 5.9 from conda-forge (ignoring dependencies)
-conda install -n encd-chip-spp --no-deps --no-update-deps -y \
-  readline==6.2 ncurses==5.9 -c conda-forge
+# conda install -n encd-chip-spp --no-deps --no-update-deps -y \
+#   readline==6.2 ncurses==5.9 -c conda-forge
+
+CONDA_BIN=$(conda run -n encd-chip-spp bash -c "echo \$(dirname \$(which python))")
+
+echo "$(date): Installing phantompeakqualtools in Conda environments..."
+RUN_SPP="https://raw.githubusercontent.com/kundajelab/phantompeakqualtools/1.2.2/run_spp.R"
+conda run -n encd-chip-spp bash -c \
+  "curl -o $CONDA_BIN/run_spp.R $RUN_SPP && chmod +x $CONDA_BIN/run_spp.R"
+
+echo "$(date): Installing R packages in Conda environments..."
+CRAN="https://cran.r-project.org/"
+conda run -n encd-chip-spp bash -c \
+  "Rscript -e \"install.packages('snow', repos='$CRAN')\""
+conda run -n encd-chip-spp bash -c \
+  "Rscript -e \"install.packages('snowfall', repos='$CRAN')\""
+conda run -n encd-chip-spp bash -c \
+  "Rscript -e \"install.packages('bitops', repos='$CRAN')\""
+conda run -n encd-chip-spp bash -c \
+  "Rscript -e \"install.packages('caTools', repos='$CRAN')\""
+conda run -n encd-chip-spp bash -c \
+  "Rscript -e \"install.packages('BiocManager', repos='$CRAN')\""
+conda run -n encd-chip-spp bash -c \
+  "Rscript -e \"require('BiocManager'); BiocManager::install('Rsamtools'); BiocManager::install('Rcpp')\""
+
+echo "$(date): Installing R spp 1.15.5 in Conda environments..."
+SPP="https://cran.r-project.org/src/contrib/Archive/spp/spp_1.15.5.tar.gz"
+SPP_BASENAME=$(basename $SPP)
+curl -o "$CONDA_BIN/$SPP_BASENAME" "$SPP"
+conda run -n encd-chip-spp bash -c \
+  "Rscript -e \"install.packages('$CONDA_BIN/$SPP_BASENAME')\""
+
+echo "$(date): Installing UCSC tools (v369)..."
+install_ucsc_tools_369 encd-chip
+install_ucsc_tools_369 encd-chip-spp
+install_ucsc_tools_369 encd-chip-macs2
 
 echo "$(date): Done successfully."
+echo
+echo "If you see readline or ncurses library errors while running pipelines,"
+echo "then switch to the Singularity method. The Conda method will not work on your system."
 
 bash ${SH_SCRIPT_DIR}/update_conda_env.sh
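Since `run_spp.R`, the archived spp 1.15.5 package, and the UCSC v369 binaries are now dropped into the env's bin directory outside conda's solver, a post-install sanity check is a reasonable follow-up. The snippet below is a hypothetical check, not part of the script:

```bash
# hypothetical post-install check (not part of install_conda_env.sh)
# confirm the manually installed tools landed on the env's PATH
conda run -n encd-chip-spp bash -c "which run_spp.R bedClip bedToBigBed"
# confirm the archived spp package loads under the pinned R 3.6.1
conda run -n encd-chip-spp Rscript -e 'library(spp); cat("spp", as.character(packageVersion("spp")), "\n")'
```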

scripts/requirements.macs2.txt

Lines changed: 0 additions & 9 deletions

@@ -6,19 +6,10 @@ python >=3
 macs2 ==2.2.4
 bedtools ==2.29.0
 bedops ==2.4.39
-ucsc-fetchchromsizes # 377 in docker/singularity image
-ucsc-wigtobigwig
-ucsc-bedgraphtobigwig
-ucsc-bigwiginfo
-ucsc-bedclip
-ucsc-bedtobigbed
-ucsc-twobittofa
-ucsc-bigWigAverageOverBed
 pybedtools ==0.8.0
 pybigwig ==0.3.13
 tabix
 
 matplotlib
 ghostscript
 
-openssl ==1.0.2u # to fix missing libssl.so.1.0.0 for UCSC tools (bedClip, ...)

scripts/requirements.spp.txt

Lines changed: 3 additions & 11 deletions

@@ -1,25 +1,17 @@
 # Conda environment for tasks (spp, xcor) in atac/chip
+# some packages (phantompeakqualtools, r-spp) will be installed separately
+# couldn't resolve all conda conflicts
 
 python >=3
 bedtools ==2.29.0
 bedops ==2.4.39
-phantompeakqualtools ==1.2.2
 
-ucsc-bedclip
-ucsc-bedtobigbed
+r-base ==3.6.1
 
-r #==3.5.1 # 3.4.4 in docker/singularity image
-r-snow
-r-snowfall
-r-bitops
-r-catools
-bioconductor-rsamtools
-r-spp <1.16 #==1.15.5 # previously 1.15.5, and 1.14 in docker/singularity image, 1.16 has lwcc() error
 tabix
 
 matplotlib
 pandas
 numpy
 ghostscript
 
-openssl ==1.0.2u # to fix missing libssl.so.1.0.0 for UCSC tools (bedClip, ...)

scripts/requirements.txt

Lines changed: 0 additions & 10 deletions

@@ -13,15 +13,6 @@ pysam ==0.15.3
 pybedtools ==0.8.0
 pybigwig ==0.3.13
 
-ucsc-fetchchromsizes # 377 in docker/singularity image
-ucsc-wigtobigwig
-ucsc-bedgraphtobigwig
-ucsc-bigwiginfo
-ucsc-bedclip
-ucsc-bedtobigbed
-ucsc-twobittofa
-ucsc-bigWigAverageOverBed
-
 deeptools ==3.3.1
 cutadapt ==2.5
 preseq ==2.0.3
@@ -49,4 +40,3 @@ java-jdk
 picard ==2.20.7
 trimmomatic ==0.39
 
-openssl ==1.0.2u # to fix missing libssl.so.1.0.0 for UCSC tools (bedClip, ...)
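Across all three requirement files, the `ucsc-*` packages and the `openssl ==1.0.2u` pin are gone, leaving the solver fewer conflicting constraints (the UCSC tools now come from the installer function above). As a hypothetical check that a trimmed file still resolves, conda's `--dry-run` flag solves the environment without creating it; the channel list here mirrors the one used for the macs2 env in `install_conda_env.sh`:

```bash
# hypothetical solver check: resolve the trimmed requirements without installing anything
conda create --dry-run -n encd-chip-solver-test \
    --file scripts/requirements.macs2.txt \
    --override-channels -c bioconda -c defaults
```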
