Releases: ENCODE-DCC/chip-seq-pipeline2
v2.0.0
Upgrade Caper to the latest version (>=2.0.0). Old versions of Caper won't work correctly on HPCs.

```bash
$ pip install caper --upgrade
$ caper -v  # check if >=2.0.0
```

Conda users must re-install the pipeline's Conda environments. YOU DO NOT NEED TO ACTIVATE A CONDA ENVIRONMENT BEFORE RUNNING A PIPELINE. The new Caper internally runs each task inside an installed Conda environment.

```bash
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh
```

HPC USERS MUST SPECIFY AN ENVIRONMENT TO RUN A PIPELINE ON. Choices are `--conda`, `--singularity` and `--docker`. This pipeline defaults to `--docker`, so it will not work on HPCs without `caper run ... --conda` or `caper run ... --singularity`. It is recommended to use Singularity if your cluster supports it.

Please read the new Caper (>=2.0.0) README carefully. There are very important updates on Caper's side for better HPC (Conda/Singularity/SLURM/...) support.
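As a sketch, an HPC launch under these notes' assumptions (a pipeline WDL `chip.wdl` and an input JSON `input.json`, both hypothetical paths) picks one environment flag and passes it to Caper:

```shell
#!/usr/bin/env bash
# Sketch only: choose the environment flag for your cluster.
# On HPCs this must be --conda or --singularity (NOT the default --docker).
WDL=chip.wdl           # hypothetical path to the pipeline WDL
INPUT=input.json       # hypothetical input JSON
BACKEND=--singularity  # recommended if your cluster supports Singularity

cmd=(caper run "$WDL" -i "$INPUT" "$BACKEND")
printf '%s\n' "${cmd[*]}"   # prints the command; run "${cmd[@]}" to launch
```

Keeping the flag in one variable makes it easy to switch between `--conda` and `--singularity` per cluster without editing the rest of a submission script.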
v1.9.0
Conda users must update the pipeline's environment:

```bash
$ bash scripts/update_conda_env.sh
```

Added a new parameter to fix the random seed for pseudoreplication.

- `chip.pseudoreplication_random_seed`: any positive integer is allowed.
  - This parameter controls the random seed for shuffling reads in a TAG-ALIGN file during pseudoreplication.
  - The seed is passed to GNU `shuf --random-source=some_hash_function(seed)`.
  - If `0` (default), the input TAG-ALIGN's file size (in bytes) is used as the random seed.
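The determinism this parameter provides can be illustrated with GNU `shuf`. The sketch below substitutes a simple repeating byte stream (`yes "$seed"`) for the pipeline's actual hash function, which is an assumption for illustration only:

```shell
# Illustrative only: the pipeline derives --random-source from a hash of the
# seed; here we fake that with a repeating byte stream from `yes`.
seed=42
printf 'readA\nreadB\nreadC\nreadD\nreadE\n' > toy.tagAlign

shuffle1=$(shuf --random-source=<(yes "$seed") toy.tagAlign)
shuffle2=$(shuf --random-source=<(yes "$seed") toy.tagAlign)

# The same seed yields the same byte stream, hence the same shuffle,
# so pseudoreplicates are reproducible across runs.
[ "$shuffle1" = "$shuffle2" ] && echo "reproducible"
```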
v1.8.1
v1.8.0
Conda users must update the pipeline's environment:

```bash
$ bash scripts/update_conda_env.sh
```

Added input parameters:

- `chip.bowtie2_use_local_mode`
  - If this flag is on, the pipeline adds `--local` to the `bowtie2` command line, which overrides `bowtie2`'s default `--end-to-end` mode.
  - See details in the bowtie2 manual.
- `chip.bwa_mem_read_len_limit`
  - This parameter is only valid if `chip.use_bwa_mem_for_pe` is on and FASTQs are paired-ended.
  - This parameter defaults to `70` (as mentioned in bwa's manual).
  - This parameter controls the read-length threshold for running `bwa mem` on paired-ended datasets. The pipeline automatically determines the sample's read length from a (merged) FASTQ R1 file. If that read length is shorter than this threshold, the pipeline automatically switches back to `bwa aln` instead of `bwa mem`. If your FASTQ's read length is < `70` and you still want to use `bwa mem`, try reducing this parameter.
  - See details in the bwa manual.
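The read-length switch can be sketched in shell (hypothetical file names; the pipeline's real logic lives in its WDL/Python code, this just illustrates the rule):

```shell
limit=70   # chip.bwa_mem_read_len_limit (default)

# Build a toy R1 FASTQ containing a single 50 bp read for illustration.
seq_line=$(printf 'A%.0s' $(seq 50))
qual_line=$(printf 'I%.0s' $(seq 50))
printf '@r1\n%s\n+\n%s\n' "$seq_line" "$qual_line" > R1.fastq

# Read length = length of the first sequence line (line 2) of R1.
read_len=$(sed -n '2p' R1.fastq | tr -d '\n' | wc -c)

# Shorter than the threshold -> fall back to bwa aln; otherwise use bwa mem.
if [ "$read_len" -lt "$limit" ]; then aligner="bwa aln"; else aligner="bwa mem"; fi
echo "$aligner"   # bwa aln, since 50 < 70
```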
Conda environment
- Added a pinned version of `tbb` to the environment, which will possibly fix the `bowtie2` and `mamba` conflicting-library issue.
v1.7.1
Conda users must re-install the Conda environment:

```bash
$ scripts/uninstall_conda_env.sh
$ scripts/install_conda_env.sh mamba
```

mamba support for Conda environment installation

- Add `mamba` to the installer command line to speed up resolving conflicts. If it doesn't work, try without `mamba`.
- `mamba` will be helpful for resolving conflicts between Conda packages much faster.
Increased resource factors
- Increased factors for some heavy tasks (`spr`, `filter`, `subsample_ctl` and `macs2_signal_track`).
- Increased fixed disk size for several tasks (`gc_bias`).
Others
- Added `version` to `meta`.
v1.7.0
Conda users must update their environment:

```bash
$ bash scripts/update_conda_env.sh
```

Added `chip.redact_nodup_bam`

- This will redact filtered/nodup BAMs by replacing indels with reference sequences to protect the donor's private information.

Added `chip.trimmomatic_phred_score_format`

- Choices: [`auto` (default), `phred33`, `phred64`] (no hyphen).
- Users can activate Trimmomatic's flag `-phred33` or `-phred64` by defining this parameter as `phred33` or `phred64`.
- Defaults to `auto` (using Trimmomatic's auto-detection).
- More details in the Trimmomatic manual: http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf
Removed `caper` and `croo` from the pipeline's Conda environment.

- There have been some conflicts between `conda-forge` and `bioconda` packages. These two apps will be added back to the environment later, after all conflicts are fixed.
v1.6.1
Conda users should re-install the pipeline's environment:

```bash
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh
```

Bug fixes
- Dependencies
  - py2 Conda environment
    - Pinned `biopython` at 1.76, which is the last version that supports py2.
  - py3 Conda environment
    - Added Caper's python dependency `scikit-learn`.
- Fixed malformed required memory for the `samtools sort` command line.
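A likely pitfall behind such a bug is that `samtools sort`'s `-m` option is per-thread memory, so a task's total budget has to be divided by the thread count before it goes on the command line. A sketch with hypothetical numbers (not the pipeline's actual code):

```shell
total_mem_mb=4096   # hypothetical task memory budget
threads=4

# -m is PER-THREAD: divide the total budget across threads so the real
# peak usage (roughly threads * -m) stays within the task's allocation.
mem_per_thread_mb=$(( total_mem_mb / threads ))

sort_cmd="samtools sort -@ ${threads} -m ${mem_per_thread_mb}M -o out.bam in.bam"
echo "$sort_cmd"
```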
starch support
- Generate `starch` output (`.starch`) for blacklist-filtered peaks.
- New Croo output definition JSON (v5) for `starch`es.
v1.6.0
Conda users should update the pipeline's environment. However, re-installing is always recommended since we added GNU utils to the installer.

```bash
# To update env
$ bash scripts/update_conda_env.sh

# To re-install env
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh
```

New factor-based resource parameters
- New parameters are factor-based: each factor is multiplied by the task's input file size to determine the resources (mem/disk) required to run the task (on a cloud instance or as an HPC job).
  - e.g. for each replicate, the total size of all R1/R2 FASTQs is used to determine resources for task `align`, and BAM size is used for task `filter`.
  - e.g. if you have `20 GB` (R1 + R2) of PE FASTQs in total, the default `chip.align_mem_factor` is `0.15`, and base memory is fixed at `4-6 GB` for most tasks (`5 GB` for task `align`), then the instance memory for task `align` will be `20 * 0.15 + 5 = 8 GB`.
- Also optimized memory/disk requirements for each task; all tasks should use less memory/disk than in previous versions.
- Use SSD for all tasks on Google Cloud. This will cost 4x more than HDD, but it's still negligible (cost for SSD 100 GB is $0.5 per hour).
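The `20 * 0.15 + 5 = 8 GB` example above can be checked with a one-liner (values taken from the example; `awk` merely stands in for the pipeline's internal arithmetic):

```shell
fastq_size_gb=20        # total R1 + R2 FASTQ size for one replicate
align_mem_factor=0.15   # default chip.align_mem_factor
base_mem_gb=5           # fixed base memory for task align

# mem = input_size_gb * mem_factor + base_mem_gb
mem_gb=$(awk -v s="$fastq_size_gb" -v f="$align_mem_factor" -v b="$base_mem_gb" \
  'BEGIN { print s * f + b }')
echo "${mem_gb} GB"   # 8 GB
```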
Change of default for resource parameters
- `chip.align_cpu`: 2 -> 6
- `chip.filter_cpu`: 2 -> 4
- `chip.call_peak_cpu`: 1 -> 2 (peak caller MACS2 is single-threaded; no more than 2 is required)
Added resource parameters
- `chip.spr_disk_factor`
- `chip.preseq_disk_factor`
- `chip.call_peak_cpu`
Change of resource parameters

- `chip.align_mem_mb` -> `chip.align_bowtie2_mem_factor` and `chip.align_bwa_mem_factor`
  - Chosen according to the aligner `chip.aligner` (`bowtie2` or `bwa`). A custom aligner will use `chip.align_bwa_mem_factor`.
- `chip.align_disks` -> `chip.align_bowtie2_disk_factor` and `chip.align_bwa_disk_factor`
  - Chosen according to the aligner `chip.aligner` (`bowtie2` or `bwa`). A custom aligner will use `chip.align_bwa_disk_factor`.
- `chip.filter_mem_mb` -> `chip.filter_mem_factor`
- `chip.filter_disks` -> `chip.filter_disk_factor`
- `chip.bam2ta_mem_mb` -> `chip.bam2ta_mem_factor`
- `chip.bam2ta_disks` -> `chip.bam2ta_disk_factor`
- `chip.xcor_mem_mb` -> `chip.xcor_mem_factor`
- `chip.xcor_disks` -> `chip.xcor_disk_factor`
- `chip.spr_mem_mb` -> `chip.spr_mem_factor`
- `chip.spr_disks` -> `chip.spr_disk_factor`
- `chip.jsd_mem_mb` -> `chip.jsd_mem_factor`
- `chip.jsd_disks` -> `chip.jsd_disk_factor`
- `chip.call_peak_mem_mb` -> `chip.call_peak_spp_mem_factor` and `chip.call_peak_macs2_mem_factor`
  - Chosen according to the peak caller `chip.peak_caller` (defaulting to `spp` for TF ChIP and `macs2` for histone ChIP).
- `chip.call_peak_disks` -> `chip.call_peak_spp_disk_factor` and `chip.call_peak_macs2_disk_factor`
  - Chosen according to the peak caller `chip.peak_caller` (defaulting to `spp` for TF ChIP and `macs2` for histone ChIP).
- `chip.macs2_signal_track_mem_mb` -> `chip.macs2_signal_track_mem_factor`
- `chip.macs2_signal_track_disks` -> `chip.macs2_signal_track_disk_factor`
Resources for task align
- The custom aligner python script must be updated to accept `--mem-gb`.
  - Task `align` will use BWA's resources (`chip.align_bwa_mem_factor` and `chip.align_bwa_disk_factor`).
  - `--mem-gb` should be added to your Python script `chip.custom_align_py`.
  - See the input documentation for details.
Resources for task call_peak
- Different factor-based parameters are used for different peak callers `chip.peak_caller` (defaulting to `spp` for TF ChIP and `macs2` for histone ChIP).
- If `chip.peak_caller` is not defined, then TF ChIP-seq (`"chip.pipeline_type": "tf"`) defaults to the `spp` peak caller, hence `chip.call_peak_spp_mem_factor` and `chip.call_peak_spp_disk_factor`.
- If `chip.peak_caller` is not defined, then histone ChIP-seq (`"chip.pipeline_type": "histone"`) defaults to the `macs2` peak caller, hence `chip.call_peak_macs2_mem_factor` and `chip.call_peak_macs2_disk_factor`.
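The default-selection rule above amounts to the following (illustrative shell only; the pipeline implements this inside its WDL):

```shell
pipeline_type="tf"   # "chip.pipeline_type": "tf" or "histone"
peak_caller=""       # empty = chip.peak_caller left undefined by the user

# Fall back to the per-pipeline-type default when no peak caller is given.
if [ -z "$peak_caller" ]; then
  case "$pipeline_type" in
    tf)      peak_caller="spp"   ;;  # -> chip.call_peak_spp_*_factor
    histone) peak_caller="macs2" ;;  # -> chip.call_peak_macs2_*_factor
  esac
fi
echo "$peak_caller"   # spp
```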
Misc.
- Better multi-threading for `samtools view/index/sort`.
- Added GNU utils to the Conda environment.
Zenodo integration for citation purposes
Integration with Zenodo to generate a DOI and citation that will update automatically with each subsequent release.
v1.5.1
New resource parameter for control subsampling.
- Control subsampling is separated from the two peak-calling-related tasks (`call_peak` and `macs2_signal_track`) to prevent allocating high resources for subsampling, which are not fully utilized for peak calling.
- There is a new task for control subsampling, whose max memory is controlled by `chip.subsample_ctl_mem_mb`.
  - It's `16000` by default.
  - Use a higher number for huge controls, e.g. `32000` or `64000`.
Bug fixes
- Fixed a typo in the documentation for parameter `chip.mapq_thresh`.
- Fixed a syntax error in WDL's `meta` section, which is not caught by Womtool but is caught by `miniwdl`.