Commit 8ef5043

Merge pull request #281 from ENCODE-DCC/PIPE-77_shorten-conda-env-name
Pipe 77 shorten conda env name
2 parents eb46b46 + d2d7753 commit 8ef5043

File tree

README.md
chip.wdl
docs/build_genome_database.md
docs/input.md
docs/install_conda.md (deleted)
scripts/install_conda_env.sh
scripts/uninstall_conda_env.sh
scripts/update_conda_env.sh

8 files changed: +64 −119 lines changed

README.md

Lines changed: 38 additions & 36 deletions
````diff
@@ -3,36 +3,22 @@
 [![CircleCI](https://circleci.com/gh/ENCODE-DCC/chip-seq-pipeline2/tree/master.svg?style=svg)](https://circleci.com/gh/ENCODE-DCC/chip-seq-pipeline2/tree/master)


-## Download new Caper>=2.1
+## Conda environment name change (since v2.2.0 or 6/13/2022)

-New Caper is out. You need to update your Caper to work with the latest ENCODE ChIP-seq pipeline.
-```bash
-$ pip install caper --upgrade
+Pipeline's Conda environment names have been shortened to work around the following error:
 ```
-
-## Local/HPC users and new Caper>=2.1
-
-There are tons of changes for local/HPC backends: `local`, `slurm`, `sge`, `pbs` and `lsf` (added). Make a backup of your current Caper configuration file `~/.caper/default.conf` and run `caper init`. Local/HPC users need to reset/initialize Caper's configuration file according to your chosen backend. Edit the configuration file and follow the instructions in there.
-```bash
-$ cd ~/.caper
-$ cp default.conf default.conf.bak
-$ caper init [YOUR_BACKEND]
+PaddingError: Placeholder of length '80' too short in package /XXXXXXXXXXX/miniconda3/envs/
 ```

-In order to run a pipeline, you need to add one of the following flags to specify the environment to run each task within, i.e. `--conda`, `--singularity` or `--docker`. These flags are not required for cloud backend users (`aws` and `gcp`).
+You need to reinstall the pipeline's Conda environments. It's recommended to do this for every version update.
 ```bash
-# for example
-$ caper run ... --singularity
+$ bash scripts/uninstall_conda_env.sh
+$ bash scripts/install_conda_env.sh
 ```

-For Conda users, **RE-INSTALL PIPELINE'S CONDA ENVIRONMENT AND DO NOT ACTIVATE CONDA ENVIRONMENT BEFORE RUNNING PIPELINES**. Caper will internally call `conda run -n ENV_NAME CROMWELL_JOB_SCRIPT`. Just make sure that pipeline's new Conda environments are correctly installed.
-```bash
-$ scripts/uninstall_conda_env.sh
-$ scripts/install_conda_env.sh
-```

+## Introduction

-## Introduction
 This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications (by Anshul Kundaje) in [this google doc](https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#).

 ### Features
@@ -45,30 +31,44 @@ This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor an

 1) Make sure that you have Python>=3.6. Caper does not work with Python2. Install Caper and check that its version is >=2.0.
 ```bash
-$ python --version
 $ pip install caper
+
+# use caper version >= 2.3.0 for a new HPC feature (caper hpc submit/list/abort).
+$ caper -v
 ```
-2) Make a backup of your Caper configuration file `~/.caper/default.conf` if you are upgrading from old Caper (<2.0.0). Reset/initialize Caper's configuration file. Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instructions in the configuration file.
+2) Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instructions in the configuration file.
 ```bash
-# make a backup of ~/.caper/default.conf if you already have it
+# this will overwrite the existing conf file ~/.caper/default.conf
+# make a backup of it first if needed
 $ caper init [YOUR_BACKEND]

-# then edit ~/.caper/default.conf
+# edit the conf file
 $ vi ~/.caper/default.conf
 ```

 3) Git clone this pipeline.
-> **IMPORTANT**: use `~/chip-seq-pipeline2/chip.wdl` as `[WDL]` in Caper's documentation.
 ```bash
 $ cd
 $ git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
 ```

-4) (Optional for Conda) Install pipeline's Conda environments if you don't have Singularity or Docker installed on your system. We recommend using Singularity instead of Conda. If you don't have Conda on your system, install [Miniconda3](https://docs.conda.io/en/latest/miniconda.html).
+4) (Optional for Conda) **DO NOT USE A SHARED CONDA. INSTALL YOUR OWN [MINICONDA3](https://docs.conda.io/en/latest/miniconda.html) AND USE IT.** Install pipeline's Conda environments if you don't have Singularity or Docker installed on your system. We recommend using Singularity instead of Conda.
 ```bash
+# check if you have Singularity on your system; if so then it's not recommended to use Conda
+$ singularity --version
+
+# check that you are not using a shared conda; if you are, delete it or remove it from your PATH
+$ which conda
+
+# change directory to pipeline's git repo
 $ cd chip-seq-pipeline2
-# uninstall old environments (<2.0.0)
+
+# uninstall old environments
 $ bash scripts/uninstall_conda_env.sh
+
+# install new envs; you need to run this for every pipeline version update.
+# it may be killed if you run this command line on a login node.
+# it's recommended to get an interactive node and run it there.
 $ bash scripts/install_conda_env.sh
 ```

@@ -88,20 +88,22 @@ You can use URIs(`s3://`, `gs://` and `http(s)://`) in Caper's command lines and

 According to your chosen platform of Caper, run Caper or submit a Caper command line to the cluster. You can choose other environments like `--singularity` or `--docker` instead of `--conda`, but you must define one of the environments.

-The followings are just examples. Please read [Caper's README](https://github.com/ENCODE-DCC/caper) very carefully to find an actual working command line for your chosen platform.
+PLEASE READ [CAPER'S README](https://github.com/ENCODE-DCC/caper) VERY CAREFULLY BEFORE RUNNING ANY PIPELINES. YOU WILL NEED TO CORRECTLY CONFIGURE CAPER FIRST. These are just example command lines.
+
 ```bash
-# Run it locally with Conda (You don't need to activate it, make sure to install Conda envs first)
+# Run it locally with Conda (DO NOT ACTIVATE PIPELINE'S CONDA ENVIRONMENT)
 $ caper run chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --conda

-# Or submit it as a leader job (with long/enough resources) to SLURM (Stanford Sherlock) with Singularity
-# It will fail if you directly run the leader job on login nodes
-$ sbatch -p [SLURM_PARTITON] -J [WORKFLOW_NAME] --export=ALL --mem 4G -t 4-0 --wrap "caper chip chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --singularity"
+# On HPC, submit it as a leader job to SLURM with Singularity
+$ caper hpc submit chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME

-# Check status of your leader job
-$ squeue -u $USER | grep [WORKFLOW_NAME]
+# Check job ID and status of your leader jobs
+$ caper hpc list

 # Cancel the leader node to close all of its children jobs
-$ scancel -j [JOB_ID]
+# If you directly use cluster commands like scancel or qdel then
+# child jobs will not be terminated
+$ caper hpc abort [JOB_ID]
 ```

````
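
For Conda users, the practical consequence of the rename is the reinstall itself: per the README above, Caper never activates these environments but runs each task through `conda run -n ENV_NAME ...`. A minimal verification sketch, assuming Miniconda is on your `PATH` and you are in the repo root (only standard `conda` subcommands are used):

```bash
# reinstall the environments under their new, shorter names
bash scripts/uninstall_conda_env.sh
bash scripts/install_conda_env.sh

# the three environments should now appear under the encd-chip prefix
conda env list | grep encd-chip

# smoke-test the default environment the same way Caper invokes it,
# i.e. via `conda run` and without activating anything
conda run -n encd-chip python --version
```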

chip.wdl

Lines changed: 13 additions & 13 deletions
````diff
@@ -7,10 +7,10 @@ struct RuntimeEnvironment {
 }

 workflow chip {
-    String pipeline_ver = 'v2.1.6'
+    String pipeline_ver = 'v2.2.0'

     meta {
-        version: 'v2.1.6'
+        version: 'v2.2.0'

         author: 'Jin wook Lee'

@@ -19,8 +19,8 @@ workflow chip {

         specification_document: 'https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit?usp=sharing'

-        default_docker: 'encodedcc/chip-seq-pipeline:v2.1.6'
-        default_singularity: 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.1.6.sif'
+        default_docker: 'encodedcc/chip-seq-pipeline:v2.2.0'
+        default_singularity: 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.2.0.sif'
         croo_out_def: 'https://storage.googleapis.com/encode-pipeline-output-definition/chip.croo.v5.json'

         parameter_group: {
@@ -71,11 +71,11 @@ workflow chip {
     }
     input {
         # group: runtime_environment
-        String docker = 'encodedcc/chip-seq-pipeline:v2.1.6'
-        String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.1.6.sif'
-        String conda = 'encode-chip-seq-pipeline'
-        String conda_macs2 = 'encode-chip-seq-pipeline-macs2'
-        String conda_spp = 'encode-chip-seq-pipeline-spp'
+        String docker = 'encodedcc/chip-seq-pipeline:v2.2.0'
+        String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.2.0.sif'
+        String conda = 'encd-chip'
+        String conda_macs2 = 'encd-chip-macs2'
+        String conda_spp = 'encd-chip-spp'

         # group: pipeline_metadata
         String title = 'Untitled'
@@ -228,7 +228,7 @@
         Int xcor_time_hr = 24
         Float xcor_disk_factor = 4.5

-        Float subsample_ctl_mem_factor = 14.0
+        Float subsample_ctl_mem_factor = 22.0
         Float subsample_ctl_disk_factor = 15.0

         Float macs2_signal_track_mem_factor = 12.0
@@ -261,17 +261,17 @@
         conda: {
             description: 'Default Conda environment name to run WDL tasks. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline'
+            example: 'encd-chip'
         }
         conda_macs2: {
             description: 'Conda environment name for task macs2. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline-macs2'
+            example: 'encd-chip-macs2'
         }
         conda_spp: {
             description: 'Conda environment name for tasks spp/xcor. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline-spp'
+            example: 'encd-chip-spp'
         }
         title: {
             description: 'Experiment title.',
````
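
Since `conda`, `conda_macs2` and `conda_spp` are ordinary workflow inputs, the new defaults can be overridden per run. A hedged sketch for users who installed the environments under their own names: the `chip.conda*` keys come from the inputs above, while `conda_env_overrides.json` and the `my-chip*` names are hypothetical placeholders.

```bash
# hypothetical fragment to merge into your full pipeline input JSON
cat > conda_env_overrides.json << 'EOF'
{
    "chip.conda": "my-chip",
    "chip.conda_macs2": "my-chip-macs2",
    "chip.conda_spp": "my-chip-spp"
}
EOF

# after merging these keys into input.json, run as usual:
# caper run chip.wdl -i input.json --conda
```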

docs/build_genome_database.md

Lines changed: 2 additions & 6 deletions
````diff
@@ -8,11 +8,7 @@

 # How to build genome database

-1. [Install Conda](https://conda.io/miniconda.html). Skip this if you already have equivalent Conda alternatives (Anaconda Python). Download and run the [installer](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh). Agree to the license term by typing `yes`. It will ask you about the installation location. On Stanford clusters (Sherlock and SCG4), we recommend to install it outside of your `$HOME` directory since its filesystem is slow and has very limited space. At the end of the installation, choose `yes` to add Miniconda's binary to `$PATH` in your BASH startup script.
-```bash
-$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
-$ bash Miniconda3-latest-Linux-x86_64.sh
-```
+1. [Install Conda](https://conda.io/miniconda.html).

 2. Install pipeline's Conda environment.
 ```bash
@@ -22,7 +18,7 @@

 2. Choose `GENOME` from `hg19`, `hg38`, `mm9` and `mm10` and specify a destination directory. This will take several hours. We recommend not to run this installer on a login node of your cluster. It will take >8GB memory and >2h time.
 ```bash
-$ conda activate encode-chip-seq-pipeline
+$ conda activate encd-chip
 $ bash scripts/build_genome_data.sh [GENOME] [DESTINATION_DIR]
 ```

````
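
The doc above warns against building on a login node (>8GB memory, >2h). A sketch for SLURM clusters; the partition placeholder and resource numbers are assumptions to adapt to your site:

```bash
# get an interactive node first (site-specific partition; resources sized
# to the documented >8GB / >2h requirement)
srun -p [YOUR_PARTITION] --mem=16G --time=8:00:00 --pty bash

# then build, e.g. hg38, into a destination directory of your choice
conda activate encd-chip
bash scripts/build_genome_data.sh hg38 /path/to/genome_data
```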

docs/input.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -302,7 +302,7 @@ Parameter|Default|Description

 Parameter|Default|Description
 ---------|-------|-----------
-`chip.subsample_ctl_mem_factor` | 14.0 | Multiplied to size of TAG-ALIGN BED to determine required memory
+`chip.subsample_ctl_mem_factor` | 22.0 | Multiplied to size of TAG-ALIGN BED to determine required memory
 `chip.macs2_signal_track_time_hr` | 24 | Walltime (HPCs only)
 `chip.subsample_ctl_disk_factor` | 15.0 | Multiplied to size of TAG-ALIGN BED to determine required disk
````
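
To make the factor concrete: memory for `subsample_ctl` scales with the control TAG-ALIGN BED size, so a back-of-the-envelope estimate (the 2 GB input is illustrative, and the pipeline may add base memory or rounding on top) looks like this:

```bash
# 22.0 (new default) x 2 GB input -> 44 GB, up from 28 GB with the old 14.0
python3 -c "print(22.0 * 2, 'GB vs', 14.0 * 2, 'GB')"
```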

docs/install_conda.md

Lines changed: 0 additions & 53 deletions
This file was deleted.

scripts/install_conda_env.sh

Lines changed: 4 additions & 4 deletions
````diff
@@ -5,20 +5,20 @@ SH_SCRIPT_DIR=$(cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd)

 echo "$(date): Installing pipeline's Conda environments..."

-conda create -n encode-chip-seq-pipeline --file ${SH_SCRIPT_DIR}/requirements.txt \
+conda create -n encd-chip --file ${SH_SCRIPT_DIR}/requirements.txt \
   --override-channels -c bioconda -c defaults -y

-conda create -n encode-chip-seq-pipeline-macs2 --file ${SH_SCRIPT_DIR}/requirements.macs2.txt \
+conda create -n encd-chip-macs2 --file ${SH_SCRIPT_DIR}/requirements.macs2.txt \
   --override-channels -c bioconda -c defaults -y

-conda create -n encode-chip-seq-pipeline-spp --file ${SH_SCRIPT_DIR}/requirements.spp.txt \
+conda create -n encd-chip-spp --file ${SH_SCRIPT_DIR}/requirements.spp.txt \
   --override-channels -c r -c bioconda -c defaults -y

 # adhoc fix for the following issues:
 # - https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/259
 # - https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/265
 # force-install readline 6.2, ncurses 5.9 from conda-forge (ignoring dependencies)
-conda install -n encode-chip-seq-pipeline-spp --no-deps --no-update-deps -y \
+conda install -n encd-chip-spp --no-deps --no-update-deps -y \
   readline==6.2 ncurses==5.9 -c conda-forge

 echo "$(date): Done successfully."
````
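
The pinned `readline`/`ncurses` pair exists only to keep R in the spp environment from crashing (issues #259/#265). A quick check that the pin landed, using standard conda subcommands; it assumes the spp environment provides `Rscript`, as its R-channel requirements suggest:

```bash
# confirm the pinned versions inside the spp environment
conda list -n encd-chip-spp | grep -E '^(readline|ncurses)'

# confirm R starts at all (the spp/xcor tasks run R)
conda run -n encd-chip-spp Rscript --version
```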

scripts/uninstall_conda_env.sh

Lines changed: 3 additions & 3 deletions
````diff
@@ -1,9 +1,9 @@
 #!/bin/bash

 PIPELINE_CONDA_ENVS=(
-  encode-chip-seq-pipeline
-  encode-chip-seq-pipeline-macs2
-  encode-chip-seq-pipeline-spp
+  encd-chip
+  encd-chip-macs2
+  encd-chip-spp
 )
 for PIPELINE_CONDA_ENV in "${PIPELINE_CONDA_ENVS[@]}"
 do
````
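
The loop body falls outside this hunk; given the script's purpose, each iteration presumably does something equivalent to the following hypothetical sketch (not the file's verbatim body):

```bash
# remove one pipeline environment per iteration
conda env remove -n "${PIPELINE_CONDA_ENV}" -y
```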

scripts/update_conda_env.sh

Lines changed: 3 additions & 3 deletions
````diff
@@ -5,9 +5,9 @@ SH_SCRIPT_DIR=$(cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd)
 SRC_DIR=${SH_SCRIPT_DIR}/../src

 PIPELINE_CONDA_ENVS=(
-  encode-chip-seq-pipeline
-  encode-chip-seq-pipeline-macs2
-  encode-chip-seq-pipeline-spp
+  encd-chip
+  encd-chip-macs2
+  encd-chip-spp
 )
 chmod u+rx ${SRC_DIR}/*.py

````
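
The rest of update_conda_env.sh is outside this hunk. Given `SRC_DIR` and the `chmod` on `src/*.py`, each iteration presumably refreshes the pipeline's Python scripts inside the environment so a version bump doesn't require a full reinstall; a hypothetical sketch of that step:

```bash
# resolve the environment's bin directory, then refresh the scripts in it
ENV_BIN=$(conda run -n "${PIPELINE_CONDA_ENV}" bash -c 'echo "${CONDA_PREFIX}/bin"')
cp -f ${SRC_DIR}/*.py "${ENV_BIN}/"
```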
