Changes from 71 commits
82 commits
6873749
initial commit of dvpolish workflow
MartinPippel Feb 21, 2024
79cb1bd
add modules for error polishing workflow
MartinPippel Feb 21, 2024
226e688
update modules.json file and add missing patch files
MartinPippel Feb 21, 2024
635aefe
fix typo
MartinPippel Feb 21, 2024
a18184e
fix typo
MartinPippel Feb 21, 2024
083bd6f
add some code pieces to get started on nac
MartinPippel Feb 21, 2024
5acc002
add nac profile
MartinPippel Feb 21, 2024
28f7061
first trial to include dvpolish workflow into EBP-pilot
MartinPippel Apr 17, 2024
4bd979c
add some debug info to understand why consensus step is not executed
MartinPippel Apr 26, 2024
6d0f383
call DVpolish subworkflow in refactored main.nf
MartinPippel Apr 26, 2024
7e93ca0
bugfix, DVpolish use two input channels
MartinPippel Apr 26, 2024
fe8e8c4
bugfix input
MartinPippel Apr 26, 2024
3ec6392
bugfix typo
MartinPippel Apr 26, 2024
6981dc3
finished dvpolish pipeline - ready for testing
MartinPippel May 3, 2024
5b62bbb
code cleanup, bugfix dvpolish when multiple read files are present
MartinPippel May 6, 2024
0c2ea8c
bugfix dvpolish: use unique_assembly_ch
MartinPippel May 6, 2024
f9e2a2d
add default chunk_size of 50M
MartinPippel May 6, 2024
6e9f78a
bugfix: remove optional flag for pbmm2 align output files
MartinPippel May 13, 2024
08bc4a0
use better branch names muliples and singleton
MartinPippel May 13, 2024
3ccafa7
add chunk_size via ext.args, convert size units via NextFlow getBytes…
MartinPippel May 13, 2024
d96ae59
bugfix for parsing chunk_size values
MartinPippel May 13, 2024
945006b
refactor channel handling for samtools view process
MartinPippel May 13, 2024
d6abfbc
refactor channel handling for samtools view process v2
MartinPippel May 13, 2024
c379b77
refactor channel handling for samtools view process v3
MartinPippel May 13, 2024
9a5965d
refactor pbmm2 align module - combine bam and bai into one output tuple
MartinPippel May 15, 2024
98ca0c1
refactor dvpolish main.nf
MartinPippel May 15, 2024
3c39a85
code cleanup
MartinPippel May 15, 2024
ee64e8f
bugfix misplaced comma, rename channels
MartinPippel May 15, 2024
b8e142d
samtools view: provide qname to -L argument which is now the bed chun…
MartinPippel May 17, 2024
62e9d2e
refactor input channel for samtools view process
MartinPippel May 17, 2024
797532c
bugfix deepvariant_ch channel logic
MartinPippel May 20, 2024
9608ab4
cleanup debug messages
MartinPippel May 20, 2024
23241a2
add modified usage of samtools view process
MartinPippel May 20, 2024
1d99b3a
remove nf-core todo comments
MartinPippel May 20, 2024
e692c24
bugfix: sync input channels of deepvariant process
MartinPippel May 20, 2024
8457b41
bugfix typo
MartinPippel May 20, 2024
ff93e67
bugfix: sync input channels of bcftools merge process
MartinPippel May 21, 2024
d36c4aa
bugfix typo muliMap
MartinPippel May 21, 2024
60d4b7c
bugfix use combineByMetaKeys instead of joinByMetaKeys for dv_input c…
MartinPippel May 23, 2024
eb469a9
start to integrate meryl + merqury into dvpolish pipeline
MartinPippel May 23, 2024
180b5c7
run merqury on dv_input assembly
MartinPippel May 23, 2024
ca2f4d5
bugfix module path
MartinPippel May 23, 2024
a51e886
bugfix merqury integration into dvpolish
MartinPippel May 23, 2024
2716ac2
bugfix correct input channel order for merqury call
MartinPippel May 23, 2024
42ac63d
run merqury on polished assembly
MartinPippel May 27, 2024
ec88039
add new module that creates the best polished assembly
MartinPippel May 27, 2024
f5abfef
add new channels for testing createFinalAsm process
MartinPippel May 27, 2024
28c0c12
comment out some code that needs to be executed at runtime
MartinPippel May 27, 2024
3341d20
comment out more code that needs to be executed at runtime
MartinPippel May 27, 2024
6793ecc
make unpolished and polished assembly names unique
MartinPippel May 28, 2024
9205daf
add last step of dvpolish pipeline
MartinPippel May 28, 2024
9a166ca
add publishDir for dvpolish processes
MartinPippel May 28, 2024
663f79c
correct final output assembly of dvpolish subworkflow
MartinPippel May 28, 2024
4019f77
bugfix - use scaffold_qv instead of assembly_qv
MartinPippel May 28, 2024
6e0bf22
bugfix typos and comment out internal variable nl_pol_ASM
MartinPippel May 28, 2024
94359bd
bugfix - FastK seems to work on fasta.gz files but not on fa.gz files…
MartinPippel May 29, 2024
ce01b41
switch from getSimpleName to getBaseName, because polished was remove…
MartinPippel May 29, 2024
b95ffeb
bugfix file naming: _part -> .part
MartinPippel May 29, 2024
40663ec
bugfix, createFinalAsm.nf should append contigs to result file, and n…
MartinPippel May 29, 2024
e06c279
bugfix, update expected output files of dvpolish: fa.gz -> fasta.gz
MartinPippel May 29, 2024
bf2b8f5
Update configs/modules.config
MartinPippel Jun 19, 2024
2944627
add meta.assembly.build to dvpolish publish path
MartinPippel Jun 19, 2024
39436c5
add small description to cryptic bash lines of createFinalAsm process
MartinPippel Jun 19, 2024
8951f02
replace path_closure with transpose channel operator -> needs to be t…
MartinPippel Jun 19, 2024
3379abf
fix typo in MINIMAP2_ALIGN_READS configuration
MartinPippel Jun 19, 2024
04cbce0
manually reformat indentation
MartinPippel Jun 19, 2024
ee00abc
align polish subwf with refactored structure
CormacKinsella Nov 10, 2025
6f655d9
move polish section, remove redundant evaluate section
CormacKinsella Nov 10, 2025
555a08f
whitespace
CormacKinsella Nov 10, 2025
b03c661
mix polished with other assemblies for evaluate
CormacKinsella Nov 10, 2025
6e46ab1
compatible with function refactor
CormacKinsella Nov 10, 2025
dd2a9ee
Merge branch 'main' into merge_dev
CormacKinsella Dec 1, 2025
db56baa
update arg
CormacKinsella Dec 1, 2025
9bba989
relocate nf-core module to subtool dir
CormacKinsella Dec 1, 2025
fa49f25
remove redundant file
CormacKinsella Dec 1, 2025
f28607a
update nf-core manifest & ensure all patches found. Update dv patch
CormacKinsella Dec 1, 2025
ea012b8
syntax
CormacKinsella Dec 1, 2025
b086005
Merge branch 'main' into merge_dev
CormacKinsella Dec 1, 2025
3027773
Merge branch 'main' into merge_dev
CormacKinsella Dec 2, 2025
94a838e
Merge branch 'main' into merge_dev
CormacKinsella Feb 23, 2026
781f659
Merge_dev suggestions (#343)
CormacKinsella Mar 3, 2026
b37c873
rm nac from nextflow.config
CormacKinsella Mar 3, 2026
61 changes: 55 additions & 6 deletions configs/modules.config
@@ -259,7 +259,7 @@ process {
withName: 'MINIMAP2_ALIGN_READS' {
tag = { meta.assembly.build }
ext.args = "-x map-hifi"
ext.prefix = { "${meta.id}_" + (reads.getName().endsWith(".gz") ? reads.getBaseName(2) : reads.getBaseName()) }
ext.prefix = { "${meta.id}_${reads.getBaseName( reads.getName().endsWith(".gz") ? 2 : 1 )}" }
}
withName: 'MINIMAP2_ALIGN_ASSEMBLY_.*' {
tag = { meta.assembly.build }
@@ -307,13 +307,66 @@ process {
]
}

// POLISH
withName: 'DVPOLISH_PBMM2_ALIGN' {
ext.args = '-N 1 -l 4000'
cpus = 20
}
// The samtools view module is patched: its qname input is commented out
// in the process script, but the channel slot is repurposed here to pass
// a BED file to the -L region filter
withName: 'SAMTOOLS_VIEW' {
ext.prefix = {"${input.baseName}_${index.baseName}"}
cpus = 10
ext.args = {"-bh -F 2308 -M -L ${qname}"}
}
withName: 'SAMTOOLS_MERGE' {
ext.prefix = {"${meta.id}_${meta.mergeID}"}
cpus = 4
}
withName: 'DEEPVARIANT' {
cpus = 20
ext.args = {"--model_type=PACBIO"}
ext.prefix = {"${meta.id}_${meta.mergeID}"}
}
withName: 'BCFTOOLS_VIEW' {
cpus = 6
ext.args = {"-f 'PASS' -i 'GT=\"1/1\"' --no-version -Oz"}
ext.prefix = {"${meta.id}_${meta.mergeID}_filt"}
}
withName: 'BCFTOOLS_MERGE' {
cpus = 4
ext.args = {"--force-samples -Oz"}
}
withName: 'BCFTOOLS_CONSENSUS' {
ext.prefix = {"${meta.id}_consensus"}
}
withName: 'DVPOLISH_CREATE_FINALASM' {
ext.prefix = {"${meta.id}_dvpolish"}
}
withName: 'DVPOLISH_CHUNKFA' {
// split assembly file into smaller pieces of size: chunk_size
ext.args = 'chunk_size = 100.MB'
}
withName: 'DVPOLISH_.*' {
tag = { meta.assembly.build }
publishDir = [
path: { "$params.outdir/$stage.polish/dvpolish/$meta.assembly.build" },
mode: params.publish_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

// SCAFFOLD
withName: 'BWAMEM2_MEM_SCAFFOLD' {
ext.prefix = { "${meta.id}_${meta.assembly.build}_${reads.head().getBaseName(reads.head().name.endsWith(".gz") ? 2 : 1)}" }
ext.args = '-SP -T0'
ext.args2 = { sort_bam ? "--write-index" : "" }
}

withName: 'PAIRTOOLS' {
ext.prefix = { "${meta.id}_${meta.assembly.build}" }
ext.args = { "--min-mapq 20 --walks-policy 5unique --max-inter-align-gap 30" }
@@ -326,7 +379,6 @@ process {
pattern: "*.stat"
]
}

withName: 'YAHS' {
ext.prefix = { "${meta.id}_${meta.assembly.build}" }
ext.args = { params.hic_type == "arima_v2" ? "-e GATC,GANTC,CTNAG,TTAA" : params.hic_type == "arima_v1" ? "-e GATC,GANTC" : "" }
@@ -352,16 +404,13 @@ process {
ext.args = { "${params.hic_map_qv}" } // min hic map quality
ext.args2 = '--write-index -l1'
}

withName: 'SCAFFOLD_CURATION:SAMTOOLS_MERGE_(HIFI|HIC)' {
ext.prefix = { "${meta.assembly.build}_merged" }
ext.args = '--write-index'
}

withName: 'SCAFFOLD_CURATION:BIOBAMBAM_BAMMARKDUPLICATES2' {
ext.prefix = { "${bam.getBaseName(1)}_dupMarked" }
}

withName: 'SCAFFOLD_CURATION:BAM2BED_SORT' {
ext.prefix = { "${bam.head().getBaseName(1)}" }
ext.args = '-u -F0x400 -h' // arguments for samtools view
11 changes: 10 additions & 1 deletion main.nf
@@ -22,6 +22,8 @@ include { ASSEMBLE_ORGANELLES } from "./subworkflo
include { DECONTAMINATE } from "./subworkflows/local/04_decontaminate/main"
// Purge duplicates
include { PURGE_DUPLICATES } from "./subworkflows/local/05_purge_dups/main"
// Polish
include { DVPOLISH } from "./subworkflows/local/06_polish/main"
// Scaffold
include { SCAFFOLD } from "./subworkflows/local/07_scaffold/main"
// Curation
@@ -184,7 +186,14 @@ workflow {
).dump(tag: 'Assemblies: to polish', pretty: true)
if ( 'polish' in workflow_steps ) {
// Run polishers
ch_polished_assemblies = ch_to_polish
DVPOLISH (
ch_to_polish,
ch_hifi,
BUILD_MERYL_HIFI_DATABASE.out.uniondb
)
ch_evaluate_assemblies = ch_evaluate_assemblies.mix( DVPOLISH.out.assemblies )
ch_polished_assemblies = DVPOLISH.out.assemblies
//TODO: logs, versions
} else {
ch_polished_assemblies = ch_to_polish
}
68 changes: 68 additions & 0 deletions modules/local/dvpolish/chunkfa.nf
@@ -0,0 +1,68 @@
process DVPOLISH_CHUNKFA {
tag "$meta.id"
label 'process_single'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/ubuntu:20.04' :
'nf-core/ubuntu:20.04' }"


input:
tuple val(meta), path(fai)

output:
tuple val(meta), path ("*.bed", arity: '1..*') , emit: bed
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"

// Split the args string on whitespace and build a key/value map.
// Assumes all arguments are provided as: key1 = value1 key2 = value2 ...
def args_map = [:]
args.replaceAll(' = ', '=').split().each {
def (key, value) = it.split('=')
args_map[key.trim()] = value.trim()
}
def chunk_size = MemoryUnit.of(args_map['chunk_size'] ?: '90MB').toBytes()

"""
# write one BED file per chunk: contigs accumulate into the current chunk
# until the cumulative base count reaches chunk_size
awk -v chunk_size_inBases=${chunk_size} -v prefix=$prefix 'BEGIN {
block=1
cum_basecount=0
}{
output_file = sprintf("%s_chunk_%d.bed", prefix, block)
printf("%s\\t0\\t%s\\n", \$1, \$2) > output_file
cum_basecount+=\$2

if (cum_basecount >= chunk_size_inBases)
{
cum_basecount=0
block+=1
}
}' $fai

cat <<-END_VERSIONS > versions.yml
"${task.process}":
dvpolish: \$(awk --version |& sed '1!d')
END_VERSIONS
"""

stub:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
touch ${prefix}_chunk_1.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
dvpolish: \$(awk --version |& sed '1!d')
END_VERSIONS
"""
}
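The chunking rule the awk script implements can be illustrated with a small Python sketch (illustration only, not part of the module; the function name and record format are assumptions): contigs are taken in .fai order, each contig always joins the current chunk, and a new chunk starts once the cumulative base count reaches the chunk size.

```python
# Sketch of the DVPOLISH_CHUNKFA logic: partition contigs into chunks of
# roughly chunk_size bases, preserving .fai order.
def chunk_contigs(fai_records, chunk_size):
    """fai_records: iterable of (name, length) pairs, as in columns 1-2 of a .fai."""
    chunks = []            # each entry is a list of contig names (one BED file in the module)
    current, cum = [], 0
    for name, length in fai_records:
        current.append(name)   # a contig always joins the current chunk first
        cum += length
        if cum >= chunk_size:  # threshold reached: close this chunk
            chunks.append(current)
            current, cum = [], 0
    if current:                # last, possibly undersized, chunk
        chunks.append(current)
    return chunks

records = [("ctg1", 60), ("ctg2", 50), ("ctg3", 10), ("ctg4", 200)]
print(chunk_contigs(records, 100))  # [['ctg1', 'ctg2'], ['ctg3', 'ctg4']]
```

The awk version writes each chunk directly to `prefix_chunk_N.bed`, with one full-contig interval (`0` to the contig length) per line.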
78 changes: 78 additions & 0 deletions modules/local/dvpolish/createFinalAsm.nf
@@ -0,0 +1,78 @@
process DVPOLISH_CREATE_FINALASM {
tag "$meta.id"
label 'process_single'

conda 'bioconda::seqkit=2.8.2'
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/seqkit:2.8.2--h9ee0642_0' :
'nf-core/seqkit:2.8.2--h9ee0642_0' }"

input:
tuple val(meta), path(unpol_fasta), path(unpol_merqury_csv) // meta map, unpolished assembly, corresponding merqury qv file
tuple val(meta2), path(pol_fasta), path(pol_merqury_csv) // meta map, polished assembly, corresponding merqury qv file

output:
tuple val(meta), path('*.fasta.gz') , emit: fasta_gz
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def unpol_name = unpol_fasta.getBaseName()
def unpol_ext = unpol_fasta.getExtension()
def pol_name = pol_fasta.getBaseName()
def pol_ext = pol_fasta.getExtension()

"""
nl_unpol_ASM=\$(wc -l < ${unpol_merqury_csv})
nl_pol_ASM=\$(wc -l < ${pol_merqury_csv})

if [[ \${nl_unpol_ASM} -ne \${nl_pol_ASM} ]]
then
>&2 echo "[ERROR] DVPOLISH_CREATE_FINALASM: merqury files have different lines: ${unpol_merqury_csv}: \${nl_unpol_ASM} != ${pol_merqury_csv}: \${nl_pol_ASM}"
exit 1
fi

#split unpolished assembly by sequence ID
seqkit split -i -O unpolished_asm ${unpol_fasta}

#split polished assembly by sequence ID
seqkit split -i -O polished_asm ${pol_fasta}

l=1
while [[ \$l -le \${nl_pol_ASM} ]]
do
# read the QV record on line \$l of the unpolished and polished merqury files into bash arrays l_uasm and l_pasm respectively
IFS='\t' read -r -a l_uasm <<< "\$(sed -n \${l}p ${unpol_merqury_csv})"
IFS='\t' read -r -a l_pasm <<< "\$(sed -n \${l}p ${pol_merqury_csv})"

# check if the contig names (column 1) are the same
if [[ "\${l_uasm[0]}" != "\${l_pasm[0]}" ]]
then
>&2 echo "[ERROR] DVPOLISH_CREATE_FINALASM: merqury files are not in the same order!"
>&2 echo "[ERROR] file: ${unpol_fasta} line \$l: \${l_uasm[*]}"
>&2 echo "[ERROR] file: ${pol_fasta} line \$l: \${l_pasm[*]}"
exit 2
fi

# compare the number of erroneous k-mers (column 2)
if [[ \${l_uasm[1]} -le \${l_pasm[1]} ]] # unpolished assembly has fewer errors, or no difference -> go with the unpolished assembly
then
cat unpolished_asm/${unpol_name}.part_\${l_uasm[0]}.${unpol_ext}
>&2 echo "[WARNING] DVPOLISH_CREATE_FINALASM: unpolished contig \${l_uasm[0]} has better or equal QV: \${l_uasm[3]} vs \${l_pasm[3]}"
else # polished assembly has fewer errors
cat polished_asm/${pol_name}.part_\${l_pasm[0]}.${pol_ext}
fi

l=\$((l+1))
done | gzip -c > ${prefix}.fasta.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
dvpolish: \$(seqkit version)
END_VERSIONS
"""
}
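The selection loop above reduces to a simple per-contig rule, sketched here in Python (illustration only; the function name and row format are assumptions): both merqury tables must list the same contigs in the same order, and for each contig the version with fewer erroneous k-mers (column 2) wins, with ties going to the unpolished sequence.

```python
# Sketch of the per-contig selection rule in DVPOLISH_CREATE_FINALASM.
def pick_sources(unpol_rows, pol_rows):
    """Each row: (contig_name, erroneous_kmers). Returns 'unpolished'/'polished' per contig."""
    if len(unpol_rows) != len(pol_rows):
        raise ValueError("merqury files have different numbers of rows")
    choices = []
    for (u_name, u_err), (p_name, p_err) in zip(unpol_rows, pol_rows):
        if u_name != p_name:   # files must list contigs in the same order
            raise ValueError(f"contig order mismatch: {u_name} != {p_name}")
        choices.append("unpolished" if u_err <= p_err else "polished")
    return choices

unpol = [("ctg1", 5), ("ctg2", 0)]
pol   = [("ctg1", 2), ("ctg2", 3)]
print(pick_sources(unpol, pol))  # ['polished', 'unpolished']
```

In the module, each chosen contig is then concatenated from the corresponding seqkit-split part file into the final gzipped assembly.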
7 changes: 7 additions & 0 deletions modules/local/dvpolish/environment.yml
@@ -0,0 +1,7 @@
name: dvpolish
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- conda-forge::sed=4.7
42 changes: 42 additions & 0 deletions modules/local/dvpolish/pbmm2_align.nf
@@ -0,0 +1,42 @@
process DVPOLISH_PBMM2_ALIGN {
tag "$meta.id"
label 'process_medium'

// Note: the versions here need to match the versions used in pbmm2/index
conda 'bioconda::pbmm2=1.13.1'
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/pbmm2:1.13.1--h9ee0642_0' :
'biocontainers/pbmm2:1.13.1--h9ee0642_0' }"

input:
tuple val(meta), path(reads)
tuple val(meta2), path(reference)

output:
tuple val(meta), path("*.bam"), path("*.bai"), emit: bam_bai
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def args2 = task.ext.args2 ?: ''

def out_name_part1 = reference.name.endsWith(".gz") ? reference.getBaseName(2) : reference.baseName
def out_name_part2 = reads.name.endsWith(".gz") ? reads.getBaseName(2) : reads.baseName

"""
pbmm2 align --sort \\
$args \\
-j $task.cpus \\
"$reference" \\
"$reads" \\
${out_name_part1}_${out_name_part2}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
pbmm2: \$(pbmm2 --version 2>&1 | head -n 1)
END_VERSIONS
"""
}
34 changes: 34 additions & 0 deletions modules/local/dvpolish/pbmm2_index.nf
@@ -0,0 +1,34 @@
process DVPOLISH_PBMM2_INDEX {
label 'process_medium'

// Note: the versions here need to match the versions used in pbmm2/align
conda 'bioconda::pbmm2=1.13.1'
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/pbmm2:1.13.1--h9ee0642_0' :
'biocontainers/pbmm2:1.13.1--h9ee0642_0' }"

input:
tuple val(meta), path(fasta)

output:
tuple val(meta), path("*.mmi"), emit: index
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
"""
pbmm2 index \\
-j $task.cpus \\
$args \\
$fasta \\
${fasta.baseName}.mmi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
pbmm2: \$(pbmm2 --version 2>&1 | head -n 1)
END_VERSIONS
"""
}
7 changes: 7 additions & 0 deletions modules/nf-core/bcftools/consensus/environment.yml
