Releases: ENCODE-DCC/chip-seq-pipeline2
v2.0.0
Upgrade Caper to the latest version (>=2.0.0). Old versions of Caper won't work correctly on HPCs.

```bash
$ pip install caper --upgrade
$ caper -v  # check if >=2.0.0
```

Conda users must re-install the pipeline's Conda environments. YOU DO NOT NEED TO ACTIVATE A CONDA ENVIRONMENT BEFORE RUNNING A PIPELINE. The new Caper internally runs each task inside an installed Conda environment.

```bash
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh
```

HPC USERS MUST SPECIFY AN ENVIRONMENT TO RUN A PIPELINE ON. Choices are `--conda`, `--singularity` and `--docker`. This pipeline defaults to `--docker`, so it will not work on HPCs without `caper run ... --conda` or `caper run ... --singularity`. It is recommended to use Singularity if your cluster supports it.

Please read the new Caper (>=2.0.0) README carefully. There are very important updates on Caper's side for better HPC (Conda/Singularity/SLURM/...) support.
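As a sketch, an HPC launch under these notes' assumptions (a pipeline WDL `chip.wdl` and an input JSON `input.json`, both hypothetical paths) picks one environment flag and passes it to Caper:

```shell
#!/usr/bin/env bash
# Sketch only: choose the environment flag for your cluster.
# On HPCs this must be --conda or --singularity (NOT the default --docker).
WDL=chip.wdl           # hypothetical path to the pipeline WDL
INPUT=input.json       # hypothetical input JSON
BACKEND=--singularity  # recommended if your cluster supports Singularity

cmd=(caper run "$WDL" -i "$INPUT" "$BACKEND")
printf '%s\n' "${cmd[*]}"   # prints the command; run "${cmd[@]}" to launch
```

Keeping the flag in one variable makes it easy to switch between `--conda` and `--singularity` per cluster without editing the rest of a submission script.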
v1.9.0
Conda users must update the pipeline's environment:

```bash
$ bash scripts/update_conda_env.sh
```

Added a new parameter to fix the random seed for pseudoreplication.

- `chip.pseudoreplication_random_seed`: any positive integer is allowed.
  - This parameter controls the random seed for shuffling reads in a TAG-ALIGN file during pseudoreplication.
  - The seed is passed to GNU `shuf --random-source=some_hash_function(seed)`.
  - If `0` (default), the input TAG-ALIGN's file size (in bytes) is used as the random seed.
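The determinism this parameter provides can be illustrated with GNU `shuf`. The sketch below substitutes a simple repeating byte stream (`yes "$seed"`) for the pipeline's actual hash function, which is an assumption for illustration only:

```shell
# Illustrative only: the pipeline derives --random-source from a hash of the
# seed; here we fake that with a repeating byte stream from `yes`.
seed=42
printf 'readA\nreadB\nreadC\nreadD\nreadE\n' > toy.tagAlign

shuffle1=$(shuf --random-source=<(yes "$seed") toy.tagAlign)
shuffle2=$(shuf --random-source=<(yes "$seed") toy.tagAlign)

# The same seed yields the same byte stream, hence the same shuffle,
# so pseudoreplicates are reproducible across runs.
[ "$shuffle1" = "$shuffle2" ] && echo "reproducible"
```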
v1.8.1
v1.8.0
Conda users must update the pipeline's environment:

```bash
$ bash scripts/update_conda_env.sh
```

Added input parameters:

- `chip.bowtie2_use_local_mode`
  - If this flag is on, the pipeline adds `--local` to the `bowtie2` command line, which overrides `bowtie2`'s default `--end-to-end` mode.
  - See details in the bowtie2 manual.
- `chip.bwa_mem_read_len_limit`
  - This parameter is only valid if `chip.use_bwa_mem_for_pe` is on and FASTQs are paired-ended.
  - This parameter defaults to `70` (as mentioned in bwa's manual).
  - This parameter controls the read-length threshold for running `bwa mem` on paired-ended datasets. The pipeline automatically determines the sample's read length from a (merged) FASTQ R1 file. If that read length is shorter than this threshold, the pipeline automatically switches back to `bwa aln` instead of `bwa mem`. If your FASTQ's read length is < `70` and you still want to use `bwa mem`, try reducing this parameter.
  - See details in the bwa manual.
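The read-length switch can be sketched in shell (hypothetical file names; the pipeline's real logic lives in its WDL/Python code, this just illustrates the rule):

```shell
limit=70   # chip.bwa_mem_read_len_limit (default)

# Build a toy R1 FASTQ containing a single 50 bp read for illustration.
seq_line=$(printf 'A%.0s' $(seq 50))
qual_line=$(printf 'I%.0s' $(seq 50))
printf '@r1\n%s\n+\n%s\n' "$seq_line" "$qual_line" > R1.fastq

# Read length = length of the first sequence line (line 2) of R1.
read_len=$(sed -n '2p' R1.fastq | tr -d '\n' | wc -c)

# Shorter than the threshold -> fall back to bwa aln; otherwise use bwa mem.
if [ "$read_len" -lt "$limit" ]; then aligner="bwa aln"; else aligner="bwa mem"; fi
echo "$aligner"   # bwa aln, since 50 < 70
```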
Conda environment
- Added a pinned version of `tbb` to the environment, which will possibly fix the `bowtie2` and `mamba` conflicting-library issue.
v1.7.1
Conda users must re-install the Conda environment:

```bash
$ scripts/uninstall_conda_env.sh
$ scripts/install_conda_env.sh mamba
```

mamba support for Conda environment installation

- Add `mamba` to the installer command line to speed up resolving conflicts. If it doesn't work, try without `mamba`.
- `mamba` will be helpful for resolving conflicts between Conda packages much faster.
Increased resource factors
- Increased factors for some heavy tasks (`spr`, `filter`, `subsample_ctl` and `macs2_signal_track`).
- Increased fixed disk size for several tasks (`gc_bias`).
Others
- Added `version` to `meta`.
v1.7.0
Conda users must update their environment:

```bash
$ bash scripts/update_conda_env.sh
```

Added `chip.redact_nodup_bam`

- This will redact filtered/nodup BAMs by replacing indels with reference sequences to protect the donor's private information.

Added `chip.trimmomatic_phred_score_format`

- Choices: [`auto` (default), `phred33`, `phred64`] (no hyphen).
- Users can activate Trimmomatic's flag `-phred33` or `-phred64` by defining this parameter as `phred33` or `phred64`.
- Defaults to `auto` (using Trimmomatic's auto-detection).
- More details in the Trimmomatic manual: http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf
Removed `caper` and `croo` from the pipeline's Conda environment.

- There have been some conflicts between `conda-forge` and `bioconda` packages. These two apps will be added back to the environment later, after all conflicts are fixed.
v1.6.1
Conda users should re-install the pipeline's environment:

```bash
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh
```

Bug fixes
- Dependencies
  - py2 Conda environment
    - Pinned `biopython` at 1.76, which is the last version that supports py2.
  - py3 Conda environment
    - Added Caper's python dependency `scikit-learn`.
- Fixed malformed required memory for the `samtools sort` command line.
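A likely pitfall behind such a bug is that `samtools sort`'s `-m` option is per-thread memory, so a task's total budget has to be divided by the thread count before it goes on the command line. A sketch with hypothetical numbers (not the pipeline's actual code):

```shell
total_mem_mb=4096   # hypothetical task memory budget
threads=4

# -m is PER-THREAD: divide the total budget across threads so the real
# peak usage (roughly threads * -m) stays within the task's allocation.
mem_per_thread_mb=$(( total_mem_mb / threads ))

sort_cmd="samtools sort -@ ${threads} -m ${mem_per_thread_mb}M -o out.bam in.bam"
echo "$sort_cmd"
```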
starch support
- Generate `starch` output (`.starch`) for blacklist-filtered peaks.
- New Croo output definition JSON (v5) for `starch`es.
v1.6.0
Conda users should update the pipeline's environment. However, re-installing is always recommended since we added GNU utils to the installer.

```bash
# To update env
$ bash scripts/update_conda_env.sh

# To re-install env
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh
```

New factor-based resource parameters
- New parameters are factor-based: each factor is multiplied by the task's input file size to determine the resources (mem/disk) required to run the task (on a cloud instance or as an HPC job).
  - e.g. for each replicate, the total size of all R1/R2 FASTQs is used to determine resources for task `align`, and BAM size is used for task `filter`.
  - e.g. if you have `20 GB` (R1 + R2) of PE FASTQs in total, the default `chip.align_mem_factor` is `0.15`, and base memory is fixed at `4-6 GB` for most tasks (`5 GB` for task `align`), then the instance memory for task `align` will be `20 * 0.15 + 5 = 8 GB`.
- Also optimized memory/disk requirements for each task; all tasks should use less memory/disk than in previous versions.
- Use SSD for all tasks on Google Cloud. This will cost 4x more than HDD, but it's still negligible (cost for SSD 100 GB is $0.5 per hour).
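The `20 * 0.15 + 5 = 8 GB` example above can be checked with a one-liner (values taken from the example; `awk` merely stands in for the pipeline's internal arithmetic):

```shell
fastq_size_gb=20        # total R1 + R2 FASTQ size for one replicate
align_mem_factor=0.15   # default chip.align_mem_factor
base_mem_gb=5           # fixed base memory for task align

# mem = input_size_gb * mem_factor + base_mem_gb
mem_gb=$(awk -v s="$fastq_size_gb" -v f="$align_mem_factor" -v b="$base_mem_gb" \
  'BEGIN { print s * f + b }')
echo "${mem_gb} GB"   # 8 GB
```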
Change of default for resource parameters
- `chip.align_cpu`: 2 -> 6
- `chip.filter_cpu`: 2 -> 4
- `chip.call_peak_cpu`: 1 -> 2 (peak caller MACS2 is single-threaded; no more than 2 is required)
Added resource parameters
- `chip.spr_disk_factor`
- `chip.preseq_disk_factor`
- `chip.call_peak_cpu`
Change of resource parameters

- `chip.align_mem_mb` -> `chip.align_bowtie2_mem_factor` and `chip.align_bwa_mem_factor`
  - Chosen according to the aligner `chip.aligner` (`bowtie2` or `bwa`). A custom aligner will use `chip.align_bwa_mem_factor`.
- `chip.align_disks` -> `chip.align_bowtie2_disk_factor` and `chip.align_bwa_disk_factor`
  - Chosen according to the aligner `chip.aligner` (`bowtie2` or `bwa`). A custom aligner will use `chip.align_bwa_disk_factor`.
- `chip.filter_mem_mb` -> `chip.filter_mem_factor`
- `chip.filter_disks` -> `chip.filter_disk_factor`
- `chip.bam2ta_mem_mb` -> `chip.bam2ta_mem_factor`
- `chip.bam2ta_disks` -> `chip.bam2ta_disk_factor`
- `chip.xcor_mem_mb` -> `chip.xcor_mem_factor`
- `chip.xcor_disks` -> `chip.xcor_disk_factor`
- `chip.spr_mem_mb` -> `chip.spr_mem_factor`
- `chip.spr_disks` -> `chip.spr_disk_factor`
- `chip.jsd_mem_mb` -> `chip.jsd_mem_factor`
- `chip.jsd_disks` -> `chip.jsd_disk_factor`
- `chip.call_peak_mem_mb` -> `chip.call_peak_spp_mem_factor` and `chip.call_peak_macs2_mem_factor`
  - Chosen according to the peak caller `chip.peak_caller` (defaulting to `spp` for TF ChIP and `macs2` for histone ChIP).
- `chip.call_peak_disks` -> `chip.call_peak_spp_disk_factor` and `chip.call_peak_macs2_disk_factor`
  - Chosen according to the peak caller `chip.peak_caller` (defaulting to `spp` for TF ChIP and `macs2` for histone ChIP).
- `chip.macs2_signal_track_mem_mb` -> `chip.macs2_signal_track_mem_factor`
- `chip.macs2_signal_track_disks` -> `chip.macs2_signal_track_disk_factor`
Resources for task align
- The custom aligner python script must be updated to accept `--mem-gb`.
  - Task `align` will use BWA's resources (`chip.align_bwa_mem_factor` and `chip.align_bwa_disk_factor`).
  - `--mem-gb` should be added to your Python script `chip.custom_align_py`.
  - See the input documentation for details.
Resources for task call_peak
- Different factor-based parameters are used for different peak callers `chip.peak_caller` (defaulting to `spp` for TF ChIP and `macs2` for histone ChIP).
- If `chip.peak_caller` is not defined, then TF ChIP-seq (`"chip.pipeline_type": "tf"`) defaults to the `spp` peak caller, hence `chip.call_peak_spp_mem_factor` and `chip.call_peak_spp_disk_factor`.
- If `chip.peak_caller` is not defined, then histone ChIP-seq (`"chip.pipeline_type": "histone"`) defaults to the `macs2` peak caller, hence `chip.call_peak_macs2_mem_factor` and `chip.call_peak_macs2_disk_factor`.
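The default-selection rule above amounts to the following (illustrative shell only; the pipeline implements this inside its WDL):

```shell
pipeline_type="tf"   # "chip.pipeline_type": "tf" or "histone"
peak_caller=""       # empty = chip.peak_caller left undefined by the user

# Fall back to the per-pipeline-type default when no peak caller is given.
if [ -z "$peak_caller" ]; then
  case "$pipeline_type" in
    tf)      peak_caller="spp"   ;;  # -> chip.call_peak_spp_*_factor
    histone) peak_caller="macs2" ;;  # -> chip.call_peak_macs2_*_factor
  esac
fi
echo "$peak_caller"   # spp
```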
Misc.
- Better multi-threading for `samtools view/index/sort`.
- Added GNU utils to the Conda environment.
Zenodo integration for citation purposes
Integration with Zenodo to generate a DOI and citation that will update automatically with each subsequent release.
v1.5.1
New resource parameter for control subsampling.
- Control subsampling is separated from the two peak-calling-related tasks (`call_peak` and `macs2_signal_track`) to prevent allocating high resources for subsampling, which are not fully utilized for peak calling.
- There is a new task for control subsampling, whose max memory is controlled by `chip.subsample_ctl_mem_mb`.
  - It's `16000` by default.
  - Use a higher number for huge controls, e.g. `32000` or `64000`.
Bug fixes
- Fixed a typo in the documentation for parameter `chip.mapq_thresh`.
- Fixed a syntax error in WDL's `meta` section, which is not caught by Womtool but is caught by `miniwdl`.