Merge dev features to main #308
Draft: CormacKinsella wants to merge 82 commits into main from merge_dev
Changes from 71 commits
6873749 initial commit of dvpolish workflow (MartinPippel)
79cb1bd add modules for error polishing workflow (MartinPippel)
226e688 update modules.json file and add missing patch files (MartinPippel)
635aefe fix typo (MartinPippel)
a18184e fix typo (MartinPippel)
083bd6f add some code pieces to get started on nac (MartinPippel)
5acc002 add nac profile (MartinPippel)
28f7061 first trial to include dvpolish workflow into EBP-pilot (MartinPippel)
4bd979c add some debug info to understand why consensus step is not executed (MartinPippel)
6d0f383 call DVpolish subworkflow in refactored main.nf (MartinPippel)
7e93ca0 bugfix, DVpolish use two input channels (MartinPippel)
fe8e8c4 bugfix input (MartinPippel)
3ec6392 bugfix typo (MartinPippel)
6981dc3 finished dvpolish pipeline - ready for testing (MartinPippel)
5b62bbb code cleanup, bugfix dvpolish when multiple read files are present (MartinPippel)
0c2ea8c bugfix dvpolish: use unique_assembly_ch (MartinPippel)
f9e2a2d add default chunk_size of 50M (MartinPippel)
6e9f78a bugfix: remove optional flag for pbmm2 align output files (MartinPippel)
08bc4a0 use better branch names muliples and singleton (MartinPippel)
3ccafa7 add chunk_size via ext.args, convert size units via NextFlow getBytes… (MartinPippel)
d96ae59 bugfix for parsing chunk_size values (MartinPippel)
945006b refactor channel handling for samtools view process (MartinPippel)
d6abfbc refactor channel handling for samtools view process v2 (MartinPippel)
c379b77 refactor channel handling for samtools view process v3 (MartinPippel)
9a5965d refactor pbmm2 align module - combine bam and bai into one output tuple (MartinPippel)
98ca0c1 refactor dvpolish main.nf (MartinPippel)
3c39a85 code cleanuo (MartinPippel)
ee64e8f bugfix misplaced comma, rename channels (MartinPippel)
b8e142d samtools view: provide qname to -L argument which is now the bed chun… (MartinPippel)
62e9d2e refactor input channel for samtools view process (MartinPippel)
797532c bugfix deepvariant_ch channel logic (MartinPippel)
9608ab4 cleanup debug messages (MartinPippel)
23241a2 add modified usage of samtools view process (MartinPippel)
1d99b3a remove nf-core todo comments (MartinPippel)
e692c24 bugfix: sync input channels of deepvariant process (MartinPippel)
8457b41 bugfix typo (MartinPippel)
ff93e67 bugfix: sync input channels of bcftools merge process (MartinPippel)
d36c4aa bugfix typo muliMap (MartinPippel)
60d4b7c bugfix use combineByMetaKeys instead of joinByMetaKeys for dv_input c… (MartinPippel)
eb469a9 start to integrate meryl + merqury into dvpolish pipeline (MartinPippel)
180b5c7 run merqury on dv_input assembly (MartinPippel)
ca2f4d5 bufgix module path (MartinPippel)
a51e886 bugfix merquery integration into dvpolish (MartinPippel)
2716ac2 bufgix correct input channel order for merqury call (MartinPippel)
42ac63d run merqury on polished assembly (MartinPippel)
ec88039 add new module that creates the best polished assembly (MartinPippel)
f5abfef add new channels for testing createFinalAsm process (MartinPippel)
28c0c12 comment out some code, that needs to be executed at runtime (MartinPippel)
3341d20 comment out more code, that needs to be executed at runtime (MartinPippel)
6793ecc make unpolished and polished assembly names unique (MartinPippel)
9205daf add last step of dvpolish pipeline (MartinPippel)
9a166ca add publishDir for dvpolish processes (MartinPippel)
663f79c correct final output assembly of dvpolish subworkflow (MartinPippel)
4019f77 bugfix - use scaffold_qv instead of assembly_qv (MartinPippel)
6e0bf22 bugfix typos and comment out internal variable nl_pol_ASM (MartinPippel)
94359bd bugfix - FastK seems to work on fasta.gz files but not on fa.gz files… (MartinPippel)
ce01b41 swtich from getSimpleName to getBaseName, because polished was remove… (MartinPippel)
b95ffeb bugfix file naming: _part -> .part (MartinPippel)
40663ec bugfix, createFinalAsm.nf should append contigs to result file, and n… (MartinPippel)
e06c279 bugfix, update expected output files of dvpolish: fa.gz -> fasta.gz (MartinPippel)
bf2b8f5 Update configs/modules.config (MartinPippel)
2944627 add meta.assembly.build to dvpolish publish path (MartinPippel)
39436c5 add small description to cryptic bash lines of createFinalAsm process (MartinPippel)
8951f02 replace path_closure with transpose channel operator -> needs to be t… (MartinPippel)
3379abf fix typo in MINIMAP2_ALIGN_READS configuration (MartinPippel)
04cbce0 manually reformat indentation (MartinPippel)
ee00abc align polish subwf with refactored structure (CormacKinsella)
6f655d9 move polish section, remove redundant evaluate section (CormacKinsella)
555a08f whitespace (CormacKinsella)
b03c661 mix polished with other assemblies for evaluate (CormacKinsella)
6e46ab1 compatible with function refactor (CormacKinsella)
dd2a9ee Merge branch 'main' into merge_dev (CormacKinsella)
db56baa update arg (CormacKinsella)
9bba989 relocate nf-core module to subtool dir (CormacKinsella)
fa49f25 remove redundant file (CormacKinsella)
f28607a update nf-core manifest & ensure all patches found. Update dv patch (CormacKinsella)
ea012b8 syntax (CormacKinsella)
b086005 Merge branch 'main' into merge_dev (CormacKinsella)
3027773 Merge branch 'main' into merge_dev (CormacKinsella)
94a838e Merge branch 'main' into merge_dev (CormacKinsella)
781f659 Merge_dev suggestions (#343) (CormacKinsella)
b37c873 rm nac from nextflow.config (CormacKinsella)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| process DVPOLISH_CHUNKFA { | ||
| tag "$meta.id" | ||
| label 'process_single' | ||
|
|
||
| conda "${moduleDir}/environment.yml" | ||
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
| 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : | ||
| 'nf-core/ubuntu:20.04' }" | ||
|
|
||
|
|
||
| input: | ||
| tuple val(meta), path(fai) | ||
|
|
||
| output: | ||
| tuple val(meta), path ("*.bed", arity: '1..*') , emit: bed | ||
| path "versions.yml" , emit: versions | ||
|
|
||
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: | ||
| def args = task.ext.args ?: '' | ||
| def prefix = task.ext.prefix ?: "${meta.id}" | ||
|
|
||
| // Normalize 'key = value' to 'key=value', split on whitespace, and collect the pairs into a map | ||
| // Assumption: all arguments are provided as key1 = value1 key2 = value2 ... | ||
| def args_map = [:] | ||
| args.replaceAll(' = ', '=').split().each { | ||
| def (key, value) = it.split('=') | ||
| args_map[key.trim()] = value.trim() | ||
| } | ||
| def chunk_size = MemoryUnit.of(args_map['chunk_size'] ?: '90MB').toBytes() | ||
|
|
||
| """ | ||
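| # write one BED record per contig and start a new chunk once the cumulative length reaches chunk_size (the byte count is interpreted as a number of bases) | ||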
| awk -v chunk_size_inBases=${chunk_size} -v prefix=$prefix 'BEGIN { | ||
| block=1 | ||
| cum_basecount=0 | ||
| }{ | ||
| output_file = sprintf("%s_chunk_%d.bed", prefix, block) | ||
| printf("%s\\t0\\t%s\\n", \$1, \$2) > output_file | ||
| cum_basecount+=\$2 | ||
|
|
||
| if (cum_basecount >= chunk_size_inBases) | ||
| { | ||
| cum_basecount=0 | ||
| block+=1 | ||
| } | ||
| }' $fai | ||
|
|
||
| cat <<-END_VERSIONS > versions.yml | ||
| "${task.process}": | ||
| dvpolish: \$(awk --version |& sed '1!d') | ||
| END_VERSIONS | ||
| """ | ||
|
|
||
| stub: | ||
| def args = task.ext.args ?: '' | ||
| def prefix = task.ext.prefix ?: "${meta.id}" | ||
| """ | ||
| touch ${prefix}_chunk_1.bed | ||
|
|
||
| cat <<-END_VERSIONS > versions.yml | ||
| "${task.process}": | ||
| dvpolish: \$(awk --version |& sed '1!d') | ||
| END_VERSIONS | ||
| """ | ||
| } |
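
The chunking rule in the awk block above can be tried outside Nextflow. The following is a minimal sketch, assuming a made-up four-contig `toy.fai` and a hypothetical threshold of 100 bases (neither is part of the PR); the awk program itself is the one from the module with the Groovy escaping removed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so the toy files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir"

# Toy .fai (hypothetical data): name, length, offset, linebases, linewidth.
# Only columns 1 and 2 are read by the awk program.
printf 'ctg1\t60\t6\t60\t61\nctg2\t50\t73\t60\t61\nctg3\t30\t130\t60\t61\nctg4\t200\t167\t60\t61\n' > toy.fai

chunk_size=100   # hypothetical threshold in bases
prefix=toy       # hypothetical output prefix

# Each contig is written to the current chunk; once the cumulative length
# reaches the threshold, the counter resets and the next contig opens a new chunk.
awk -v chunk_size_inBases="$chunk_size" -v prefix="$prefix" 'BEGIN {
    block = 1
    cum_basecount = 0
}{
    output_file = sprintf("%s_chunk_%d.bed", prefix, block)
    printf("%s\t0\t%s\n", $1, $2) > output_file
    cum_basecount += $2
    if (cum_basecount >= chunk_size_inBases) {
        cum_basecount = 0
        block += 1
    }
}' toy.fai

# Show what ended up in each chunk.
for f in toy_chunk_*.bed; do
    echo "== $f"
    cat "$f"
done
```

With these toy values, `toy_chunk_1.bed` holds ctg1 and ctg2 (110 bases, which crosses the threshold) and `toy_chunk_2.bed` holds ctg3 and ctg4. A chunk can therefore exceed the threshold by up to one contig length, and a contig longer than the threshold is never split.
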
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| process DVPOLISH_CREATE_FINALASM { | ||
| tag "$meta.id" | ||
| label 'process_single' | ||
|
|
||
| conda 'bioconda::seqkit=2.8.2' | ||
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
| 'https://depot.galaxyproject.org/singularity/seqkit:2.8.2--h9ee0642_0' : | ||
| 'nf-core/seqkit:2.8.2--h9ee0642_0' }" | ||
|
|
||
| input: | ||
| tuple val(meta), path(unpol_fasta), path(unpol_merqury_csv) // meta map, unpolished assembly, corresponding merqury qv file | ||
| tuple val(meta2), path(pol_fasta), path(pol_merqury_csv) // meta map, polished assembly, corresponding merqury qv file | ||
|
|
||
| output: | ||
| tuple val(meta), path('*.fasta.gz') , emit: fasta_gz | ||
| path "versions.yml" , emit: versions | ||
|
|
||
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: | ||
| def args = task.ext.args ?: '' | ||
| def prefix = task.ext.prefix ?: "${meta.id}" | ||
| def unpol_name = unpol_fasta.getBaseName() | ||
| def unpol_ext = unpol_fasta.getExtension() | ||
| def pol_name = pol_fasta.getBaseName() | ||
| def pol_ext = pol_fasta.getExtension() | ||
|
|
||
| """ | ||
| nl_unpol_ASM=\$(wc -l < ${unpol_merqury_csv}) | ||
| nl_pol_ASM=\$(wc -l < ${pol_merqury_csv}) | ||
|
|
||
| if [[ \${nl_unpol_ASM} -ne \${nl_pol_ASM} ]] | ||
| then | ||
| >&2 echo "[ERROR] DVPOLISH_CREATE_FINALASM: merqury files have different lines: ${unpol_merqury_csv}: \${nl_unpol_ASM} != ${pol_merqury_csv}: \${nl_pol_ASM}" | ||
| exit 1 | ||
| fi | ||
|
|
||
| #split unpolished assembly by sequence ID | ||
| seqkit split -i -O unpolished_asm ${unpol_fasta} | ||
|
|
||
| #split polished assembly by sequence ID | ||
| seqkit split -i -O polished_asm ${pol_fasta} | ||
|
|
||
| l=1 | ||
| while [[ \$l -le \${nl_pol_ASM} ]] | ||
| do | ||
| # read the qv record at line l of the unpolished and polished merqury files into the bash arrays l_uasm and l_pasm respectively | ||
| IFS='\t' read -r -a l_uasm <<< "\$(sed -n \${l}p ${unpol_merqury_csv})" | ||
| IFS='\t' read -r -a l_pasm <<< "\$(sed -n \${l}p ${pol_merqury_csv})" | ||
|
|
||
| # check if the contig names (column 1) are the same | ||
| if [[ "\${l_uasm[0]}" != "\${l_pasm[0]}" ]] | ||
| then | ||
| >&2 echo "[ERROR] DVPOLISH_CREATE_FINALASM: merqury files are not in the same order!" | ||
| >&2 echo "[ERROR] file: ${unpol_merqury_csv} line \$l: \${l_uasm[*]}" | ||
| >&2 echo "[ERROR] file: ${pol_merqury_csv} line \$l: \${l_pasm[*]}" | ||
| exit 2 | ||
| fi | ||
|
|
||
| # compare number of erroneous kmers (column 2) | ||
| if [[ \${l_uasm[1]} -le \${l_pasm[1]} ]] # unpolished assembly has fewer errors, or no difference -> go with the unpolished assembly | ||
| then | ||
| cat unpolished_asm/${unpol_name}.part_\${l_uasm[0]}.${unpol_ext} | ||
| >&2 echo "[WARNING] DVPOLISH_CREATE_FINALASM: unpolished contig \${l_uasm[0]} has better or equal QV: \${l_uasm[3]} vs \${l_pasm[3]}" | ||
| else # polished assembly has fewer errors | ||
| cat polished_asm/${pol_name}.part_\${l_pasm[0]}.${pol_ext} | ||
| fi | ||
|
|
||
| l=\$((l+1)) | ||
| done | gzip -c > ${prefix}.fasta.gz | ||
|
|
||
| cat <<-END_VERSIONS > versions.yml | ||
| "${task.process}": | ||
| dvpolish: \$(seqkit version) | ||
| END_VERSIONS | ||
| """ | ||
| } |
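
The per-contig selection above can be exercised on toy data. This is a rough sketch, not the module's exact code: it reads both QV tables in lockstep through file descriptors instead of calling `sed -n` once per line, and the file names, k-mer counts, and the stand-in per-contig FASTA files (replacing the output of `seqkit split -i`) are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Toy per-contig QV tables (hypothetical values, tab-separated):
# name, erroneous k-mers, total k-mers, QV
printf 'ctg1\t10\t1000\t40.0\nctg2\t5\t1000\t43.0\n' > unpol.qv
printf 'ctg1\t2\t1000\t47.0\nctg2\t7\t1000\t41.5\n'  > pol.qv

# Stand-ins for the per-contig files that a split by sequence ID would produce.
mkdir unpolished_asm polished_asm
printf '>ctg1\nAAAA\n' > unpolished_asm/ctg1.fasta
printf '>ctg2\nCCCC\n' > unpolished_asm/ctg2.fasta
printf '>ctg1\nGGGG\n' > polished_asm/ctg1.fasta
printf '>ctg2\nTTTT\n' > polished_asm/ctg2.fasta

# Read one record from each table per iteration (fd 3 = unpolished, fd 4 = polished).
while IFS=$'\t' read -r -u 3 u_name u_err u_rest &&
      IFS=$'\t' read -r -u 4 p_name p_err p_rest
do
    # Both tables must list the contigs in the same order.
    if [[ "$u_name" != "$p_name" ]]; then
        >&2 echo "[ERROR] QV tables are not in the same order"
        exit 2
    fi
    # Keep the unpolished contig on a tie or when it has fewer erroneous k-mers.
    if [[ "$u_err" -le "$p_err" ]]; then
        cat "unpolished_asm/${u_name}.fasta"
    else
        cat "polished_asm/${p_name}.fasta"
    fi
done 3< unpol.qv 4< pol.qv | gzip -c > final.fasta.gz

gzip -dc final.fasta.gz
```

Here ctg1 is taken from the polished assembly (2 versus 10 erroneous k-mers) and ctg2 from the unpolished one (5 versus 7). Because the loop feeds a pipe, an `exit` inside it only ends the loop's subshell; the sketch sets `pipefail` so that such a failure still fails the whole pipeline. Whether the module's script runs with `pipefail` depends on the pipeline's shell settings, which are not visible in this diff.
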
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| name: cat_fastq | ||
| channels: | ||
| - conda-forge | ||
| - bioconda | ||
| - defaults | ||
| dependencies: | ||
| - conda-forge::sed=4.7 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| process DVPOLISH_PBMM2_ALIGN { | ||
| tag "$meta.id" | ||
| label 'process_medium' | ||
|
|
||
| // Note: the versions here need to match the versions used in pbmm2/index | ||
| conda 'bioconda::pbmm2=1.13.1' | ||
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
| 'https://depot.galaxyproject.org/singularity/pbmm2:1.13.1--h9ee0642_0' : | ||
| 'biocontainers/pbmm2:1.13.1--h9ee0642_0' }" | ||
|
|
||
| input: | ||
| tuple val(meta), path(reads) | ||
| tuple val(meta2), path(reference) | ||
|
|
||
| output: | ||
| tuple val(meta), path("*.bam"), path("*.bai"), emit: bam_bai | ||
| path "versions.yml" , emit: versions | ||
|
|
||
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: | ||
| def args = task.ext.args ?: '' | ||
| def args2 = task.ext.args2 ?: '' | ||
|
|
||
| def out_name_part1 = reference.name.endsWith(".gz") ? reference.getBaseName(2) : reference.baseName | ||
| def out_name_part2 = reads.name.endsWith(".gz") ? reads.getBaseName(2) : reads.baseName | ||
|
|
||
| """ | ||
| pbmm2 align --sort \\ | ||
| $args \\ | ||
| -j $task.cpus \\ | ||
| "$reference" \\ | ||
| "$reads" \\ | ||
| ${out_name_part1}_${out_name_part2}.bam | ||
|
|
||
| cat <<-END_VERSIONS > versions.yml | ||
| "${task.process}": | ||
| pbmm2: \$(pbmm2 --version 2>&1 | head -n 1 | sed 's/^pbmm2 //') | ||
| END_VERSIONS | ||
| """ | ||
| } |
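
The BAM name is built from the reference and reads file names, dropping a trailing `.gz` together with one further extension. A small bash sketch of the same rule, assuming hypothetical helper names `strip_ext` and `bam_name` and made-up file names, shows which names result:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Strip an optional .gz suffix and then one more extension from a file name.
# (Hypothetical helper; it mirrors the getBaseName(2)/baseName choice in the module.)
strip_ext() {
    local name
    name=$(basename "$1")
    name=${name%.gz}
    echo "${name%.*}"
}

# Compose the output name as <reference>_<reads>.bam.
bam_name() {
    echo "$(strip_ext "$1")_$(strip_ext "$2").bam"
}

bam_name asm.fasta.gz reads.hifi.bam   # prints asm_reads.hifi.bam
bam_name asm.fa reads.fastq.gz         # prints asm_reads.bam
```

Only one extension is removed from uncompressed inputs, so a dotted name such as `reads.hifi.bam` keeps its `hifi` part in the output name.
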
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| process DVPOLISH_PBMM2_INDEX { | ||
| label 'process_medium' | ||
|
|
||
| // Note: the versions here need to match the versions used in pbmm2/align | ||
| conda 'bioconda::pbmm2=1.13.1' | ||
| container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
| 'https://depot.galaxyproject.org/singularity/pbmm2:1.13.1--h9ee0642_0' : | ||
| 'biocontainers/pbmm2:1.13.1--h9ee0642_0' }" | ||
|
|
||
| input: | ||
| tuple val(meta), path(fasta) | ||
|
|
||
| output: | ||
| tuple val(meta), path("*.mmi"), emit: index | ||
| path "versions.yml" , emit: versions | ||
|
|
||
| when: | ||
| task.ext.when == null || task.ext.when | ||
|
|
||
| script: | ||
| def args = task.ext.args ?: '' | ||
| """ | ||
| pbmm2 index \\ | ||
| -j $task.cpus \\ | ||
| $args \\ | ||
| $fasta \\ | ||
| ${fasta.baseName}.mmi | ||
|
|
||
| cat <<-END_VERSIONS > versions.yml | ||
| "${task.process}": | ||
| pbmm2: \$(pbmm2 --version 2>&1 | head -n 1 | sed 's/^pbmm2 //') | ||
| END_VERSIONS | ||
| """ | ||
| } |