Commit 26e0e72

sreichl and Fangwen Zhao authored
Macs2 keep dup configurable (#48)
* Made macs2 keep-dups parameter configurable. Currently set to true default=1
* typo, forgot a comma
* Added documentation for keep_dup parameter
* another typo fixed
* Made removal of duplicates configurable in filtering of reads prior to peak calling. This is now independent of macs2 filtering

---------

Co-authored-by: Fangwen Zhao <[email protected]>
1 parent 663e2ca commit 26e0e72

2 files changed: +13 -2 lines changed
config/config.yaml

Lines changed: 10 additions & 0 deletions
@@ -27,6 +27,16 @@ annot_columns: ['pass_qc','read_type','organism'] # (optional) can be empty [""]
 tss_slop: 2000
 noise_lower: 100

+# specify if duplicates should be kept during filtering bam files to define samtools view filtering flags
+# filtered reads used as input for macs2 peak calling and counts
+# warning: inclusion of duplicates should be intentional, and may lead to a large number of consensus regions
+remove_dup: True # bool: True by default
+
+# specify how duplicate reads should be handled by macs2
+# warning: inclusion of duplicates should be intentional, and may lead to a large number of consensus regions
+# see documentation for parameter: https://manpages.ubuntu.com/manpages/mantic/man1/macs2_callpeak.1.html
+keep_dup: 1 # int: default = 1, int for kept reads at given genomic coordinates. or "auto"
+
 # determination of consensus regions using (py)bedtools
 slop_extension: 250
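
The rules changed in this commit read both new keys by direct indexing (`config['remove_dup']`, `config['keep_dup']`), so a config file written before this commit would presumably fail with a `KeyError` unless defaults are supplied elsewhere. A minimal sketch of a tolerant lookup follows. The helper `duplicate_settings` and the sample dictionary are hypothetical and not part of the workflow; the fallback values are the defaults documented in the config comments above.

```python
# Hypothetical stand-in for the dictionary Snakemake builds from config/config.yaml.
# It deliberately lacks the two keys introduced by this commit.
old_config = {"tss_slop": 2000, "noise_lower": 100}


def duplicate_settings(config):
    """Return (remove_dup, keep_dup), falling back to the documented defaults."""
    remove_dup = config.get("remove_dup", True)  # documented default: True
    keep_dup = config.get("keep_dup", 1)         # documented default: 1
    return remove_dup, keep_dup


print(duplicate_settings(old_config))
print(duplicate_settings({"remove_dup": False, "keep_dup": "auto"}))
```

Whether silent defaults or a hard failure is preferable here is a design choice; failing loudly makes the duplicate handling an explicit decision, which matches the warning in the config comments.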

workflow/rules/processing.smk

Lines changed: 3 additions & 2 deletions
@@ -20,7 +20,7 @@ rule align:
         # alignment parameters
         interleaved_in = lambda w: "--interleaved_in" if samples["{}".format(w.sample)]["read_type"] == "paired" else " ",
         interleaved = lambda w: "--interleaved" if samples["{}".format(w.sample)]["read_type"] == "paired" else " ",
-        filtering = lambda w: "-q 30 -F 2316 -f 2 -L {}".format(config["whitelisted_regions"]) if samples["{}".format(w.sample)]["read_type"] == "paired" else "-q 30 -F 2316 -L {}".format(config["whitelisted_regions"]),
+        filtering = lambda w: "-q 30 -F {flag} -f 2 -L {whitelist}".format(flag=3340 if config['remove_dup'] else 2316, whitelist=config["whitelisted_regions"]) if samples["{}".format(w.sample)]["read_type"] == "paired" else "-q 30 -F {flag} -L {whitelist}".format(flag=3340 if config['remove_dup'] else 2316, whitelist=config["whitelisted_regions"]),
         add_mate_tags = lambda w: "--addMateTags" if samples["{}".format(w.sample)]["read_type"] == "paired" else " ",
         adapter_sequence = "-a " + config["adapter_sequence"] if config["adapter_sequence"] !="" else " ",
         adapter_fasta = "--adapter_fasta " + config["adapter_fasta"] if config["adapter_fasta"] !="" else " ",
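
The two exclusion values in the changed `filtering` line differ by exactly one SAM flag bit: 3340 = 2316 + 1024, where 1024 marks a read as a PCR or optical duplicate. The sketch below decodes both values and rebuilds the filter string as a plain function, to make the conditional easier to read. The bit values come from the SAM specification; `decode_flag` and `build_filtering` are hypothetical helpers for illustration, not code from the workflow.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    1: "paired",
    2: "proper_pair",
    4: "unmapped",
    8: "mate_unmapped",
    16: "reverse",
    32: "mate_reverse",
    64: "read1",
    128: "read2",
    256: "secondary",
    512: "qc_fail",
    1024: "duplicate",
    2048: "supplementary",
}


def decode_flag(value):
    """Return the names of the flag bits set in value, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def build_filtering(read_type, remove_dup, whitelist):
    """Hypothetical equivalent of the `filtering` lambda in the diff above."""
    flag = 3340 if remove_dup else 2316
    if read_type == "paired":
        # -f 2 additionally requires reads to be mapped in a proper pair
        return "-q 30 -F {} -f 2 -L {}".format(flag, whitelist)
    return "-q 30 -F {} -L {}".format(flag, whitelist)


print(decode_flag(2316))
print(decode_flag(3340))
print(build_filtering("paired", True, "whitelist.bed"))
```

Note that excluding bit 1024 only removes reads that an upstream tool has already marked as duplicates; the filter itself does not detect them.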
@@ -116,6 +116,7 @@ rule peak_calling:
         genome_size = config["genome_size"],
         genome = config["genome"],
         regulatory_regions = config["regulatory_regions"],
+        keep_dup = config['keep_dup'],
     resources:
         mem_mb=config.get("mem", "16000"),
     threads: 4*config.get("threads", 2)
@@ -128,7 +129,7 @@ rule peak_calling:
         export PATH="{params.homer_bin}:$PATH";

         macs2 callpeak -t {input.bam} {params.formating} \
-        --nomodel --keep-dup auto --extsize 147 -g {params.genome_size} \
+        --nomodel --keep-dup {params.keep_dup} --extsize 147 -g {params.genome_size} \
         -n {wildcards.sample} \
         --outdir {params.peaks_dir} > "{output.macs2_log}" 2>&1;
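
Because `keep_dup` is now substituted verbatim into the shell command, a mistyped config value would only surface when `macs2` runs. The sketch below shows one way the value could be checked beforehand. It assumes the accepted forms are `auto`, `all`, or a positive integer, as described in the `macs2 callpeak` documentation linked in the config; `validate_keep_dup` is a hypothetical helper and not part of this commit.

```python
def validate_keep_dup(value):
    """Return value as the string to pass to --keep-dup, or raise ValueError.

    Assumption: valid settings are "auto", "all", or a positive integer.
    """
    # bool is a subclass of int in Python, so reject it explicitly;
    # a YAML `True` is almost certainly a mix-up with remove_dup.
    if isinstance(value, bool):
        raise ValueError("keep_dup must be an integer, 'auto' or 'all', not a bool")
    if isinstance(value, int):
        if value < 1:
            raise ValueError("keep_dup must be a positive integer")
        return str(value)
    if isinstance(value, str):
        text = value.strip()
        if text in ("auto", "all"):
            return text
        if text.isdigit() and int(text) >= 1:
            return text
    raise ValueError("unsupported keep_dup value: {!r}".format(value))


# Render the fragment of the command line that this commit made configurable.
for setting in (1, "auto", "all"):
    print("--nomodel --keep-dup {} --extsize 147".format(validate_keep_dup(setting)))
```

With the new default of `1`, the rendered command differs from the previous hard-coded `--keep-dup auto`, so results may change for existing projects unless `keep_dup: "auto"` is set explicitly.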
