Commit 4d494b7

Merge pull request #386 from ENCODE-DCC/dev

v2.2.0

2 parents: 0f7e63e + b3c6564

File tree

8 files changed: +70 −78 lines

.circleci/config.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -18,7 +18,7 @@ make_tag: &make_tag
 commands:
   install_python3_caper_gcs:
     description: "Install py3, caper and gcs. Set py3 as default python."
-    steps:
+    steps:
     - run:
         command: |
           sudo apt-get update && sudo apt-get install software-properties-common git wget curl -y
```

README.md

Lines changed: 38 additions & 43 deletions

````diff
@@ -3,36 +3,17 @@
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.156534.svg)](https://doi.org/10.5281/zenodo.156534)[![CircleCI](https://circleci.com/gh/ENCODE-DCC/atac-seq-pipeline/tree/master.svg?style=svg)](https://circleci.com/gh/ENCODE-DCC/atac-seq-pipeline/tree/master)
 
 
-## Updated genome TSV files (v3 -> v4)
+## Conda environment name change (since v2.2.0 or 6/13/2022)
 
-
-
-## Download new Caper>=2.1
-
-New Caper is out. You need to update your Caper to work with the latest ENCODE ATAC-seq pipeline.
-```bash
-$ pip install caper --upgrade
-```
-
-## Local/HPC users and new Caper>=2.1
-
-There are many changes for local/HPC backends: `local`, `slurm`, `sge`, `pbs` and `lsf` (added). Make a backup of your current Caper configuration file `~/.caper/default.conf` and run `caper init`. Local/HPC users need to reset/initialize Caper's configuration file according to their chosen backend. Edit the configuration file and follow the instructions in it.
-```bash
-$ cd ~/.caper
-$ cp default.conf default.conf.bak
-$ caper init [YOUR_BACKEND]
+The pipeline's Conda environment names have been shortened to work around the following error:
 ```
-
-In order to run a pipeline, you need to add one of the following flags to specify the environment to run each task in, i.e. `--conda`, `--singularity` or `--docker`. These flags are not required for cloud backend users (`aws` and `gcp`).
-```bash
-# for example
-$ caper run ... --singularity
+PaddingError: Placeholder of length '80' too short in package /XXXXXXXXXXX/miniconda3/envs/
 ```
 
-For Conda users, **RE-INSTALL THE PIPELINE'S CONDA ENVIRONMENTS AND DO NOT ACTIVATE A CONDA ENVIRONMENT BEFORE RUNNING PIPELINES**. Caper will internally call `conda run -n ENV_NAME CROMWELL_JOB_SCRIPT`. Just make sure that the pipeline's new Conda environments are correctly installed.
+You need to reinstall the pipeline's Conda environments. It is recommended to do this for every version update.
 ```bash
-$ scripts/uninstall_conda_env.sh
-$ scripts/install_conda_env.sh
+$ bash scripts/uninstall_conda_env.sh
+$ bash scripts/install_conda_env.sh
 ```
 
 ## Introduction
@@ -51,31 +32,44 @@ The ATAC-seq pipeline protocol specification is [here](https://docs.google.com/d
 
 1) Make sure that you have Python>=3.6. Caper does not work with Python2. Install Caper and check that its version is >=2.0.
 ```bash
-$ python --version
 $ pip install caper
+
+# use caper version >= 2.3.0 for the new HPC features (caper hpc submit/list/abort).
+$ caper -v
 ```
-2) Make a backup of your Caper configuration file `~/.caper/default.conf` if you are upgrading from old Caper (<2.0.0). Reset/initialize Caper's configuration file. Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instructions in the configuration file.
+2) Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instructions in the configuration file.
 ```bash
-# make a backup of ~/.caper/default.conf if you already have it
+# this will overwrite the existing conf file ~/.caper/default.conf
+# make a backup of it first if needed
 $ caper init [YOUR_BACKEND]
 
-# then edit ~/.caper/default.conf
+# edit the conf file
 $ vi ~/.caper/default.conf
 ```
 
 3) Git clone this pipeline.
-> **IMPORTANT**: use `~/atac-seq-pipeline/atac.wdl` as `[WDL]` in Caper's documentation.
-
 ```bash
 $ cd
 $ git clone https://github.com/ENCODE-DCC/atac-seq-pipeline
 ```
 
-4) (Optional for Conda users) Install the pipeline's Conda environments if you don't have Singularity or Docker installed on your system. We recommend using Singularity instead of Conda. If you don't have Conda on your system, install [Miniconda3](https://docs.conda.io/en/latest/miniconda.html).
+4) (Optional for Conda) **DO NOT USE A SHARED CONDA. INSTALL YOUR OWN [MINICONDA3](https://docs.conda.io/en/latest/miniconda.html) AND USE IT.** Install the pipeline's Conda environments if you don't have Singularity or Docker installed on your system. We recommend using Singularity instead of Conda.
 ```bash
+# check if you have Singularity on your system; if so, it is not recommended to use Conda
+$ singularity --version
+
+# check that you are not using a shared conda; if you are, delete it or remove it from your PATH
+$ which conda
+
+# change directory to the pipeline's git repo
 $ cd atac-seq-pipeline
-# uninstall old environments (<2.0.0)
+
+# uninstall old environments
 $ bash scripts/uninstall_conda_env.sh
+
+# install new envs; you need to run this for every pipeline version update.
+# it may be killed if you run it on a login node.
+# it is recommended to get an interactive node and run it there.
 $ bash scripts/install_conda_env.sh
 ```
 
@@ -96,22 +90,23 @@ You can use URIs (`s3://`, `gs://` and `http(s)://`) in Caper's command lines an
 
 According to your chosen platform of Caper, run Caper or submit the Caper command line to the cluster. You can choose other environments like `--singularity` or `--docker` instead of `--conda`, but you must define one of the environments.
 
-The following are just examples. Please read [Caper's README](https://github.com/ENCODE-DCC/caper) very carefully to find an actual working command line for your chosen platform.
+PLEASE READ [CAPER'S README](https://github.com/ENCODE-DCC/caper) VERY CAREFULLY BEFORE RUNNING ANY PIPELINES. YOU WILL NEED TO CORRECTLY CONFIGURE CAPER FIRST. These are just example command lines.
+
 ```bash
-# Run it locally with Conda (you don't need to activate it; make sure to install the Conda envs first)
+# Run it locally with Conda (DO NOT ACTIVATE THE PIPELINE'S CONDA ENVIRONMENT)
 $ caper run atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json --conda
 
-# Or submit it as a leader job (with long/enough resources) to SLURM (Stanford Sherlock) with Singularity
-# It will fail if you directly run the leader job on login nodes
-$ sbatch -p [SLURM_PARTITION] -J [WORKFLOW_NAME] --export=ALL --mem 4G -t 4-0 --wrap "caper run atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json --singularity"
+# On HPC, submit it as a leader job to SLURM with Singularity
+$ caper hpc submit atac.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME
 
-# Check status of your leader job
-$ squeue -u $USER | grep [WORKFLOW_NAME]
+# Check the job ID and status of your leader jobs
+$ caper hpc list
 
 # Cancel the leader job to close all of its child jobs
-$ scancel -j [JOB_ID]
-```
-
+# If you directly use a cluster command like scancel or qdel,
+# child jobs will not be terminated
+$ caper hpc abort [JOB_ID]
+```
 
 ## Running and sharing on Truwl
````
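The README changes above pin Caper >= 2.3.0 for the new `caper hpc` subcommands. A minimal, hypothetical shell helper for that kind of version gate — `version_ge` is our own sketch, not part of Caper or the pipeline, and it relies on GNU `sort -V` for version ordering:

```shell
# version_ge A B: succeeds (exit 0) if dotted version A >= B.
# Our own helper, not part of Caper; needs GNU `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: does a given Caper version support `caper hpc submit/list/abort` (>= 2.3.0)?
if version_ge "2.3.0" "2.3.0"; then
    echo "caper hpc commands available"
else
    echo "please upgrade caper (pip install caper --upgrade)"
fi
```

In practice you would compare against the installed version, e.g. `version_ge "$(caper -v)" 2.3.0`, assuming `caper -v` prints a bare version string.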

atac.wdl

Lines changed: 15 additions & 15 deletions

```diff
@@ -7,10 +7,10 @@ struct RuntimeEnvironment {
 }
 
 workflow atac {
-    String pipeline_ver = 'v2.1.3'
+    String pipeline_ver = 'v2.2.0'
 
     meta {
-        version: 'v2.1.3'
+        version: 'v2.2.0'
 
         author: 'Jin wook Lee'
         email: 'leepc12@gmail.com'
@@ -19,9 +19,9 @@ workflow atac {
 
         specification_document: 'https://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit?usp=sharing'
 
-        default_docker: 'encodedcc/atac-seq-pipeline:v2.1.3'
-        default_singularity: 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.1.3.sif'
-        default_conda: 'encode-atac-seq-pipeline'
+        default_docker: 'encodedcc/atac-seq-pipeline:v2.2.0'
+        default_singularity: 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.2.0.sif'
+        default_conda: 'encd-atac'
         croo_out_def: 'https://storage.googleapis.com/encode-pipeline-output-definition/atac.croo.v5.json'
 
         parameter_group: {
@@ -72,12 +72,12 @@ workflow atac {
     }
     input {
         # group: runtime_environment
-        String docker = 'encodedcc/atac-seq-pipeline:v2.1.3'
-        String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.1.3.sif'
-        String conda = 'encode-atac-seq-pipeline'
-        String conda_macs2 = 'encode-atac-seq-pipeline-macs2'
-        String conda_spp = 'encode-atac-seq-pipeline-spp'
-        String conda_python2 = 'encode-atac-seq-pipeline-python2'
+        String docker = 'encodedcc/atac-seq-pipeline:v2.2.0'
+        String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/atac-seq-pipeline_v2.2.0.sif'
+        String conda = 'encd-atac'
+        String conda_macs2 = 'encd-atac-macs2'
+        String conda_spp = 'encd-atac-spp'
+        String conda_python2 = 'encd-atac-py2'
 
         # group: pipeline_metadata
         String title = 'Untitled'
@@ -255,22 +255,22 @@ workflow atac {
         conda: {
             description: 'Default Conda environment name to run WDL tasks. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline'
+            example: 'encd-atac'
         }
         conda_macs2: {
             description: 'Conda environment name for task macs2. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline-macs2'
+            example: 'encd-atac-macs2'
         }
         conda_spp: {
             description: 'Conda environment name for tasks spp/xcor. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline-spp'
+            example: 'encd-atac-spp'
        }
        conda_python2: {
            description: 'Conda environment name for tasks with python2 wrappers (tss_enrich). For Conda users only.',
            group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline-python2'
+            example: 'encd-atac-py2'
        }
        title: {
            description: 'Experiment title.',
```
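Since `conda`, `conda_macs2`, `conda_spp` and `conda_python2` are ordinary workflow inputs, any input JSON that pinned the old environment names must be updated to the new short names. A hypothetical fragment (the `atac.`-prefixed keys follow Cromwell's workflow-input naming; you only need these keys if you override the defaults shown above):

```json
{
    "atac.conda": "encd-atac",
    "atac.conda_macs2": "encd-atac-macs2",
    "atac.conda_spp": "encd-atac-spp",
    "atac.conda_python2": "encd-atac-py2"
}
```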

docs/build_genome_database.md

Lines changed: 2 additions & 6 deletions

````diff
@@ -8,11 +8,7 @@
 
 # How to build genome database
 
-1. [Install Conda](https://conda.io/miniconda.html). Skip this if you already have an equivalent Conda alternative (Anaconda Python). Download and run the [installer](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh). Agree to the license terms by typing `yes`. It will ask you about the installation location. On Stanford clusters (Sherlock and SCG4), we recommend installing it outside of your `$HOME` directory since its filesystem is slow and has very limited space. At the end of the installation, choose `yes` to add Miniconda's binary to `$PATH` in your BASH startup script.
-```bash
-$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
-$ bash Miniconda3-latest-Linux-x86_64.sh
-```
+1. [Install Conda](https://conda.io/miniconda.html).
 
 2. Install pipeline's Conda environment.
 ```bash
@@ -22,7 +18,7 @@
 
 3. Choose `GENOME` from `hg19`, `hg38`, `mm9` and `mm10` and specify a destination directory. This will take several hours. We recommend not running this installer on a login node of your cluster. It will take >8GB memory and >2h time.
 ```bash
-$ conda activate encode-atac-seq-pipeline
+$ conda activate encd-atac
 $ bash scripts/build_genome_data.sh [GENOME] [DESTINATION_DIR]
 ```
 
````
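The genome builder above needs >8 GB of memory and should not run on a login node. A small pre-flight check of our own (not part of the repo) that warns when the current machine has less than that; it reads `/proc/meminfo`, so it is Linux-only:

```shell
# Our own pre-flight sketch: the genome builder needs >8 GB of RAM,
# so warn when this machine (e.g. a login node) has less.
mem_kb=$(awk '/MemTotal/ {print $2; exit}' /proc/meminfo 2>/dev/null)
mem_kb=${mem_kb:-0}   # non-Linux systems: treated as unknown (0)
if [ "$mem_kb" -gt 0 ] && [ "$mem_kb" -lt $((8 * 1024 * 1024)) ]; then
    echo "warning: < 8 GB RAM here; run build_genome_data.sh on an interactive/compute node"
else
    echo "memory check passed (or not determinable on this OS)"
fi
```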

scripts/install_conda_env.sh

Lines changed: 5 additions & 5 deletions

```diff
@@ -5,23 +5,23 @@ SH_SCRIPT_DIR=$(cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd)
 
 echo "$(date): Installing pipeline's Conda environments..."
 
-conda create -n encode-atac-seq-pipeline --file ${SH_SCRIPT_DIR}/requirements.txt \
+conda create -n encd-atac --file ${SH_SCRIPT_DIR}/requirements.txt \
   --override-channels -c bioconda -c defaults -y
 
-conda create -n encode-atac-seq-pipeline-macs2 --file ${SH_SCRIPT_DIR}/requirements.macs2.txt \
+conda create -n encd-atac-macs2 --file ${SH_SCRIPT_DIR}/requirements.macs2.txt \
   --override-channels -c bioconda -c defaults -y
 
-conda create -n encode-atac-seq-pipeline-spp --file ${SH_SCRIPT_DIR}/requirements.spp.txt \
+conda create -n encd-atac-spp --file ${SH_SCRIPT_DIR}/requirements.spp.txt \
   --override-channels -c r -c bioconda -c defaults -y
 
 # adhoc fix for the following issues:
 # - https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/259
 # - https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/265
 # force-install readline 6.2, ncurses 5.9 from conda-forge (ignoring dependencies)
-conda install -n encode-atac-seq-pipeline-spp --no-deps --no-update-deps -y \
+conda install -n encd-atac-spp --no-deps --no-update-deps -y \
   readline==6.2 ncurses==5.9 -c conda-forge
 
-conda create -n encode-atac-seq-pipeline-python2 --file ${SH_SCRIPT_DIR}/requirements.python2.txt \
+conda create -n encd-atac-py2 --file ${SH_SCRIPT_DIR}/requirements.python2.txt \
   --override-channels -c conda-forge -c bioconda -c defaults -y
 
 echo "$(date): Done successfully."
```
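After `install_conda_env.sh` finishes, it can be useful to confirm that all four renamed environments exist. A sketch of our own (`check_envs` is not part of the repo) that parses `conda env list`-style output saved to a file; the environment names come from the install script above:

```shell
# check_envs FILE: FILE holds `conda env list` output (env name in column 1).
# Reports which of the four renamed pipeline environments are present.
# Our own helper, not part of the pipeline's scripts.
check_envs() {
    for env in encd-atac encd-atac-macs2 encd-atac-spp encd-atac-py2; do
        if awk '{print $1}' "$1" | grep -qx "$env"; then
            echo "found: $env"
        else
            echo "missing: $env"
        fi
    done
}
```

Usage would look like `conda env list > envs.txt && check_envs envs.txt`.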

scripts/requirements.python2.txt

Lines changed: 1 addition & 0 deletions

```diff
@@ -6,6 +6,7 @@ python ==2.7.16
 biopython ==1.76
 metaseq ==0.5.6
 samtools ==1.9
+gffutils ==0.10.1 # 0.11.0 is not py2 compatible
 
 python-dateutil ==2.8.0
 grep
```

scripts/uninstall_conda_env.sh

Lines changed: 4 additions & 4 deletions

```diff
@@ -1,10 +1,10 @@
 #!/bin/bash
 
 PIPELINE_CONDA_ENVS=(
-  encode-atac-seq-pipeline
-  encode-atac-seq-pipeline-macs2
-  encode-atac-seq-pipeline-spp
-  encode-atac-seq-pipeline-python2
+  encd-atac
+  encd-atac-macs2
+  encd-atac-spp
+  encd-atac-py2
 )
 for PIPELINE_CONDA_ENV in "${PIPELINE_CONDA_ENVS[@]}"
 do
```
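The uninstall script loops over the renamed environments, but the loop body falls outside this diff's context. A dry-run sketch of our own that only prints a plausible cleanup command per environment — `conda env remove -n NAME -y` is our assumption of a typical body, not the script's verified contents:

```shell
# Dry run: print (do not execute) a cleanup command for each renamed env.
# The `conda env remove` body is an assumption; the real script's loop
# body is not shown in this diff.
envs="encd-atac encd-atac-macs2 encd-atac-spp encd-atac-py2"
for env in $envs; do
    echo "conda env remove -n ${env} -y"
done
```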

scripts/update_conda_env.sh

Lines changed: 4 additions & 4 deletions

```diff
@@ -5,10 +5,10 @@ SH_SCRIPT_DIR=$(cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd)
 SRC_DIR=${SH_SCRIPT_DIR}/../src
 
 PIPELINE_CONDA_ENVS=(
-  encode-atac-seq-pipeline
-  encode-atac-seq-pipeline-macs2
-  encode-atac-seq-pipeline-spp
-  encode-atac-seq-pipeline-python2
+  encd-atac
+  encd-atac-macs2
+  encd-atac-spp
+  encd-atac-py2
 )
 chmod u+rx ${SRC_DIR}/*.py
```
