diff --git a/CHANGELOG.md b/CHANGELOG.md
index d42fcdde3..47f3c00bf 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Enhancements and fixes
- [PR #1608](https://github.com/nf-core/rnaseq/pull/1608) - Bump version after release 3.21.0
+- [PR #1616](https://github.com/nf-core/rnaseq/pull/1616) - Add Sylph for contamination detection.
+
+| Old parameter | New parameter |
+| ------------- | -------------------|
+| | `--sylph_db` |
+| | `--sylph_taxonomy` |
+
+### Software dependencies
+
+| Dependency | Old version | New version |
+| -----------| ----------- | ----------- |
+| `sylph` | | 0.7.0 |
+| `sylph-tax`| | 1.2.0 |
## [[3.21.0](https://github.com/nf-core/rnaseq/releases/tag/3.21.0)] - 2025-09-18
diff --git a/CITATIONS.md b/CITATIONS.md
index 5eaeea7f8..79a31ca13 100644
--- a/CITATIONS.md
+++ b/CITATIONS.md
@@ -88,6 +88,10 @@
> Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2 Genome Biol. 2019 Dec 16;20(1):278. doi: 10.1186/s13059-019-1910-1. PubMed PMID: 31842956; PubMed Central PMCID: PMC6912988.
+- [Sylph](https://pubmed.ncbi.nlm.nih.gov/39379646/)
+
+ > Shaw J, Yu YW. Rapid species-level metagenome profiling and containment estimation with sylph. Nat Biotechnol. 2025 Aug;43(8):1348-1359. doi: 10.1038/s41587-024-02412-y. Epub 2024 Oct 8. PMID: 39379646; PMCID: PMC12339375.
+
- [Trim Galore!](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)
- [UMI-tools](https://pubmed.ncbi.nlm.nih.gov/28100584/)
diff --git a/README.md b/README.md
index 8b309334f..8c122d248 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,9 @@
3. [`dupRadar`](https://bioconductor.org/packages/release/bioc/html/dupRadar.html)
4. [`Preseq`](http://smithlabresearch.org/software/preseq/)
5. [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)
- 6. [`Kraken2`](https://ccb.jhu.edu/software/kraken2/) -> [`Bracken`](https://ccb.jhu.edu/software/bracken/) on unaligned sequences; _optional_
+ 6. Contamination detection on unaligned sequences; _optional_
+ 1. [`Kraken2`](https://ccb.jhu.edu/software/kraken2/) -> [`Bracken`](https://ccb.jhu.edu/software/bracken/)
+ 2. [`Sylph`](https://sylph-docs.github.io/)
15. Pseudoalignment and quantification ([`Salmon`](https://combine-lab.github.io/salmon/) or ['Kallisto'](https://pachterlab.github.io/kallisto/); _optional_)
16. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
diff --git a/docs/output.md b/docs/output.md
index b47593a05..e985eae94 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -57,6 +57,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [featureCounts](#featurecounts)
- [DESeq2](#deseq2)
- [Kraken2/Bracken](#kraken2bracken)
+ - [Sylph](#sylph)
- [MultiQC](#multiqc)
- [Pseudoalignment and quantification](#pseudoalignment-and-quantification)
- [Pseudoalignment](#pseudoalignment)
@@ -737,6 +738,21 @@ The plot on the left hand side shows the standard PC plot - notice the variable

+### Sylph
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `<ALIGNER>/contaminants/sylph/`
+  - `*.tsv`: Summary of the containment ANI and abundances of the species detected in the sample. See the [Sylph documentation](https://sylph-docs.github.io/Output-format/) for full details on the output format.
+  - `*.sylphmpa`: Taxonomic profile of the unaligned reads produced by `sylph-tax`. See the [sylph-tax documentation](https://sylph-docs.github.io/sylph-tax-output-format/) for full details on the output format.
+
+</details>
+
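+[Sylph](https://sylph-docs.github.io/) is a metagenomic profiler that determines which species are present in a set of reads by statistically estimating the containment average nucleotide identity (ANI) between the reads and the genomes in a database; `sylph-tax` then adds taxonomic information to the resulting profile. Both steps are run on unaligned sequences to detect potential contamination of samples.
+
+<!-- TODO: add MultiQC information -->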
+
### MultiQC
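As a minimal sketch, assuming the tab-separated layout and the column names shown in the `SYLPH_PROFILE` test snapshot (`Genome_file`, `Taxonomic_abundance`, `Adjusted_ANI`), the Sylph `*.tsv` profile described above could be summarised in Python as follows. The function name and the `min_abundance` cut-off are hypothetical and are not part of the pipeline.

```python
import csv
import io


def summarize_sylph_profile(tsv_text, min_abundance=0.0):
    """Summarise a sylph profile table (hypothetical helper, not part of the pipeline).

    Returns (genome_file, taxonomic_abundance, adjusted_ani) tuples for rows whose
    Taxonomic_abundance is at least `min_abundance`, most abundant first.
    Column names are assumed from the header row in the module's test snapshot.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        abundance = float(row["Taxonomic_abundance"])
        if abundance >= min_abundance:
            hits.append((row["Genome_file"], abundance, float(row["Adjusted_ANI"])))
    # Highest taxonomic abundance first
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

Reading the whole file into a string keeps the helper easy to test; for very large profiles, passing an open file handle to `csv.DictReader` directly would avoid holding the text in memory.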
diff --git a/docs/usage.md b/docs/usage.md
index 00cac88b2..eb8300ab9 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -401,11 +401,15 @@ By default, the input GTF file will be filtered to ensure that sequence names co
The `--contaminant_screening` option is not currently available using ARM architecture ('-profile arm')
:::
-The pipeline provides the option to scan unaligned reads for contamination from other species using [Kraken2](https://ccb.jhu.edu/software/kraken2/), with the possibility of applying corrections from [Bracken](https://ccb.jhu.edu/software/bracken/). Since running Bracken is not computationally expensive, we recommend always using it to refine the abundance estimates generated by Kraken2.
+The pipeline provides the option to scan unaligned reads for contamination from other species using either [Sylph](https://sylph-docs.github.io/) (`--contaminant_screening sylph`) or the [Kraken2](https://ccb.jhu.edu/software/kraken2/)/[Bracken](https://ccb.jhu.edu/software/bracken/) suite (`--contaminant_screening kraken2` or `--contaminant_screening kraken2_bracken`).
-It is important to note that the accuracy of Kraken2 is [highly dependent on the database](https://doi.org/10.1099/mgen.0.000949) used. Specifically, it is [crucial](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is included in the database. If you are particularly concerned about certain contaminants, it may be beneficial to use a smaller, more focused database containing primarily those contaminants instead of the full standard database. Various pre-built databases [are available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Additionally, genomes of contaminants detected in previous sequencing experiments are available on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php).
+Sylph is [considerably faster and more memory-efficient](https://doi.org/10.1038/s41587-024-02412-y) than Kraken2/Bracken, with comparable or better precision in species detection and fewer false positives. However, Sylph does not assign individual reads to species; it only provides overall abundance estimates. Its abundance estimates also [do not report a fraction of reads as unclassified](https://github.com/bluenote-1577/sylph/issues/49).
-While Kraken2 is capable of detecting low-abundance contaminants in a sample, false positives can occur. Therefore, if only a very small number of reads from a contaminating species are detected, these results should be interpreted with caution.
+Pre-built Sylph databases are listed in the [Sylph documentation](https://sylph-docs.github.io/pre%E2%80%90built-databases/), and the corresponding taxonomies in the [sylph-tax documentation](https://sylph-docs.github.io/sylph-tax/), which also has instructions on creating custom databases and taxonomies. The taxonomy passed with `--sylph_taxonomy` must correspond to the database passed with `--sylph_db`. Because Sylph is a newer tool, the effect of database choice on its performance has been explored less thoroughly than for Kraken2/Bracken. However, the following comments on choosing databases for Kraken2 are likely to apply, at least in part, to Sylph as well.
+
+The accuracy of Kraken2 is [highly dependent on the database](https://doi.org/10.1099/mgen.0.000949) used. Specifically, it is [crucial](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome/transcriptome is included in the database (note that the pre-built Sylph databases do _not_ appear to contain the human genome/transcriptome). If you are particularly concerned about certain contaminants, it may be beneficial to use a smaller, more focused database containing primarily those contaminants instead of the full standard database. Various pre-built databases [are available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Additionally, genomes of contaminants detected in previous sequencing experiments are available on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php).
+
+While Kraken2 is capable of detecting low-abundance contaminants in a sample, false positives can occur. Therefore, if only a very small number of reads from a contaminating species are detected, these results should be interpreted with caution. Lastly, although Kraken2 can be run without Bracken, Bracken is computationally inexpensive, so we recommend always using it (`--contaminant_screening kraken2_bracken`) to refine the abundance estimates generated by Kraken2.
## Running the pipeline
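The caution above about low-abundance hits can also be applied to the Sylph `*.sylphmpa` output with a simple cut-off. The sketch below assumes a MetaPhlAn-style layout (lines starting with `#` are comments, an optional `clade_name` header row, then tab-separated rows whose first column is a `|`-joined lineage such as `d__...|s__...` and whose second column is the relative abundance); check the [sylph-tax output format](https://sylph-docs.github.io/sylph-tax-output-format/) before relying on it. The helper name and threshold are hypothetical. A relative abundance is not a read count, so this is only a rough analogue of discarding Kraken2 hits supported by very few reads.

```python
def species_abundances(mpa_text, min_abundance=0.0):
    """Return {species_name: relative_abundance} for species-level rows (hypothetical helper).

    Assumes a MetaPhlAn-style layout: '#' comment lines, an optional 'clade_name'
    header row, then tab-separated rows whose first column is a '|'-joined lineage
    and whose second column is the relative abundance.
    """
    result = {}
    for line in mpa_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 2 or fields[0] == "clade_name":
            continue  # skip the column header row
        last_rank = fields[0].split("|")[-1]
        if not last_rank.startswith("s__"):
            continue  # skip higher ranks and strain-level ('t__') rows
        abundance = float(fields[1])
        if abundance >= min_abundance:
            result[last_rank[len("s__"):]] = abundance
    return result
```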
diff --git a/modules.json b/modules.json
index aa20e9320..07c7b56e2 100644
--- a/modules.json
+++ b/modules.json
@@ -262,6 +262,16 @@
"git_sha": "1f008221e451e7a4738226c49e69aaa2eb731369",
"installed_by": ["modules", "quantify_pseudo_alignment"]
},
+ "sylph/profile": {
+ "branch": "master",
+ "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
+ "installed_by": ["modules"]
+ },
+ "sylphtax/taxprof": {
+ "branch": "master",
+ "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
+ "installed_by": ["modules"]
+ },
"trimgalore": {
"branch": "master",
"git_sha": "05954dab2ff481bcb999f24455da29a5828af08d",
diff --git a/modules/nf-core/sylph/profile/environment.yml b/modules/nf-core/sylph/profile/environment.yml
new file mode 100644
index 000000000..ae8337cc8
--- /dev/null
+++ b/modules/nf-core/sylph/profile/environment.yml
@@ -0,0 +1,7 @@
+---
+# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - bioconda::sylph=0.7.0
diff --git a/modules/nf-core/sylph/profile/main.nf b/modules/nf-core/sylph/profile/main.nf
new file mode 100644
index 000000000..1231bab39
--- /dev/null
+++ b/modules/nf-core/sylph/profile/main.nf
@@ -0,0 +1,51 @@
+process SYLPH_PROFILE {
+ tag "$meta.id"
+ label 'process_high'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/sylph:0.7.0--h919a2d8_0' :
+ 'biocontainers/sylph:0.7.0--h919a2d8_0' }"
+
+ input:
+ tuple val(meta), path(reads)
+ path(database)
+
+ output:
+ tuple val(meta), path('*.tsv'), emit: profile_out
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def input = meta.single_end ? "${reads}" : "-1 ${reads[0]} -2 ${reads[1]}"
+ """
+ sylph profile \\
+ -t $task.cpus \\
+ $args \\
+ $database\\
+ $input \\
+ -o ${prefix}.tsv
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ sylph: \$(sylph -V | awk '{print \$2}')
+ END_VERSIONS
+ """
+
+ stub:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ def input = meta.single_end ? "${reads}" : "-1 ${reads[0]} -2 ${reads[1]}"
+ """
+ touch ${prefix}.tsv
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ sylph: \$(sylph -V | awk '{print \$2}')
+ END_VERSIONS
+ """
+
+}
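To make the single-end/paired-end branch in the `script:` block above easier to follow, here is a hypothetical Python helper that builds the same argument list: single-end reads are passed positionally, paired-end reads with `-1`/`-2`, and `task.ext.args` is inserted before the database. It only mirrors the Nextflow string template and is not used by the module.

```python
def sylph_profile_command(reads, database, cpus, prefix, single_end=False, extra_args=""):
    """Mirror of the command assembled in SYLPH_PROFILE's script block (hypothetical helper)."""
    cmd = ["sylph", "profile", "-t", str(cpus)]
    if extra_args:
        cmd += extra_args.split()  # naive whitespace split of task.ext.args
    cmd.append(database)
    if single_end:
        cmd += list(reads)  # single-end reads are positional arguments
    else:
        cmd += ["-1", reads[0], "-2", reads[1]]  # paired-end mates
    cmd += ["-o", f"{prefix}.tsv"]
    return cmd
```

Note that when `meta.single_end` is absent (as in the paired-end tests above), the Groovy ternary treats it as false, which corresponds to the `single_end=False` default here.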
diff --git a/modules/nf-core/sylph/profile/meta.yml b/modules/nf-core/sylph/profile/meta.yml
new file mode 100644
index 000000000..c78b0f33c
--- /dev/null
+++ b/modules/nf-core/sylph/profile/meta.yml
@@ -0,0 +1,59 @@
+name: "sylph_profile"
+description: Sylph profile command for taxonomic profiling
+keywords:
+ - profile
+ - metagenomics
+ - sylph
+ - classification
+tools:
+ - sylph:
+ description: Sylph quickly enables querying of genomes against even low-coverage
+ shotgun metagenomes to find nearest neighbour ANI.
+ homepage: https://github.com/bluenote-1577/sylph
+ documentation: https://github.com/bluenote-1577/sylph
+ tool_dev_url: https://github.com/bluenote-1577/sylph
+ doi: 10.1038/s41587-024-02412-y
+ licence: ["MIT"]
+ identifier: biotools:sylph
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. `[ id:'test', single_end:false ]`
+ - reads:
+ type: file
+ description: |
+ List of input FastQ/FASTA files of size 1 and 2 for single-end and paired-end data,
+ respectively. They are automatically sketched to .sylsp/.syldb
+ ontologies: []
+ - database:
+ type: file
+ description: Pre-sketched *.syldb/*.sylsp files. Raw single-end fastq/fasta are
+ allowed and will be automatically sketched to .sylsp/.syldb.
+ pattern: "*.{syldb,sylsp,fasta,fastq}"
+ ontologies:
+ - edam: http://edamontology.org/format_1930 # FASTQ
+output:
+ profile_out:
+ - - meta:
+ type: map
+ description: Groovy Map containing sample information
+ - "*.tsv":
+ type: file
+ description: Output file of species-level taxonomic profiling with abundances
+ and ANIs.
+ pattern: "*.tsv"
+ ontologies: []
+ versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+ ontologies:
+ - edam: http://edamontology.org/format_3750 # YAML
+authors:
+ - "@jiahang1234"
+ - "@sofstam"
+maintainers:
+ - "@sofstam"
diff --git a/modules/nf-core/sylph/profile/nextflow.config b/modules/nf-core/sylph/profile/nextflow.config
new file mode 100644
index 000000000..f54f711c0
--- /dev/null
+++ b/modules/nf-core/sylph/profile/nextflow.config
@@ -0,0 +1,12 @@
+if (!params.skip_qc) {
+ if (params.contaminant_screening in ['sylph']) {
+ process {
+ withName: 'SYLPH_PROFILE' {
+ publishDir = [
+ path: { "${params.outdir}/${params.aligner}/contaminants/sylph" },
+ mode: params.publish_dir_mode
+ ]
+ }
+ }
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/sylph/profile/tests/main.nf.test b/modules/nf-core/sylph/profile/tests/main.nf.test
new file mode 100644
index 000000000..cfdddf685
--- /dev/null
+++ b/modules/nf-core/sylph/profile/tests/main.nf.test
@@ -0,0 +1,80 @@
+nextflow_process {
+
+ name "Test Process SYLPH_PROFILE"
+ script "../main.nf"
+ process "SYLPH_PROFILE"
+ tag "modules"
+ tag "modules_nfcore"
+ tag "sylph"
+ tag "sylph/profile"
+
+ test("sarscov2 illumina single-end [fastq_gz]") {
+ when {
+ process {
+ """
+ input[0] = [ [ id:'test',single_end:true ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = file(params.modules_testdata_base_path +'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ """
+ }
+ }
+
+ then {
+ assert process.success
+ assert snapshot(
+ process.out.versions,
+ file(process.out.profile_out[0][1]).readLines()[0]
+ ).match()
+ }
+ }
+
+ test("sarscov2 illumina paired-end [fastq_gz]") {
+ when {
+ process {
+ """
+ input[0] = [ [ id:'test' ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = file(params.modules_testdata_base_path +'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ """
+ }
+ }
+
+ then {
+ assert process.success
+ assert snapshot(
+ process.out.versions,
+ file(process.out.profile_out[0][1]).readLines()[0]
+ ).match()
+ }
+ }
+
+ test("sarscov2 illumina paired-end [fastq_gz]-stub") {
+ options "-stub"
+
+ when {
+ process {
+ """
+ input[0] = [ [ id:'test' ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = file(params.modules_testdata_base_path +'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ """
+ }
+ }
+
+ then {
+ assert process.success
+ assert snapshot(process.out).match()
+ }
+ }
+}
diff --git a/modules/nf-core/sylph/profile/tests/main.nf.test.snap b/modules/nf-core/sylph/profile/tests/main.nf.test.snap
new file mode 100644
index 000000000..5541ce615
--- /dev/null
+++ b/modules/nf-core/sylph/profile/tests/main.nf.test.snap
@@ -0,0 +1,61 @@
+{
+ "sarscov2 illumina paired-end [fastq_gz]": {
+ "content": [
+ [
+ "versions.yml:md5,7b5a545483277cc0ff9189f8891e737f"
+ ],
+ "Sample_file\tGenome_file\tTaxonomic_abundance\tSequence_abundance\tAdjusted_ANI\tEff_cov\tANI_5-95_percentile\tEff_lambda\tLambda_5-95_percentile\tMedian_cov\tMean_cov_geq1\tContainment_ind\tNaive_ANI\tkmers_reassigned\tContig_name"
+ ],
+ "meta": {
+ "nf-test": "0.9.2",
+ "nextflow": "24.10.4"
+ },
+ "timestamp": "2025-03-05T11:07:00.061876287"
+ },
+ "sarscov2 illumina single-end [fastq_gz]": {
+ "content": [
+ [
+ "versions.yml:md5,7b5a545483277cc0ff9189f8891e737f"
+ ],
+ "Sample_file\tGenome_file\tTaxonomic_abundance\tSequence_abundance\tAdjusted_ANI\tEff_cov\tANI_5-95_percentile\tEff_lambda\tLambda_5-95_percentile\tMedian_cov\tMean_cov_geq1\tContainment_ind\tNaive_ANI\tkmers_reassigned\tContig_name"
+ ],
+ "meta": {
+ "nf-test": "0.9.2",
+ "nextflow": "24.10.4"
+ },
+ "timestamp": "2025-03-05T11:05:21.230604092"
+ },
+ "sarscov2 illumina paired-end [fastq_gz]-stub": {
+ "content": [
+ {
+ "0": [
+ [
+ {
+ "id": "test"
+ },
+ "test.tsv:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "1": [
+ "versions.yml:md5,7b5a545483277cc0ff9189f8891e737f"
+ ],
+ "profile_out": [
+ [
+ {
+ "id": "test"
+ },
+ "test.tsv:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ],
+ "versions": [
+ "versions.yml:md5,7b5a545483277cc0ff9189f8891e737f"
+ ]
+ }
+ ],
+ "meta": {
+ "nf-test": "0.9.2",
+ "nextflow": "24.10.4"
+ },
+ "timestamp": "2025-03-05T11:08:35.882851964"
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/sylphtax/taxprof/environment.yml b/modules/nf-core/sylphtax/taxprof/environment.yml
new file mode 100644
index 000000000..517edcad5
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/environment.yml
@@ -0,0 +1,7 @@
+---
+# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
+channels:
+ - conda-forge
+ - bioconda
+dependencies:
+ - "bioconda::sylph-tax=1.2.0"
diff --git a/modules/nf-core/sylphtax/taxprof/main.nf b/modules/nf-core/sylphtax/taxprof/main.nf
new file mode 100644
index 000000000..d7508b3a5
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/main.nf
@@ -0,0 +1,53 @@
+
+process SYLPHTAX_TAXPROF {
+ tag "$meta.id"
+ label 'process_medium'
+
+ conda "${moduleDir}/environment.yml"
+ container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/sylph-tax:1.2.0--pyhdfd78af_0':
+ 'biocontainers/sylph-tax:1.2.0--pyhdfd78af_0' }"
+
+ input:
+ tuple val(meta), path(sylph_results)
+ path taxonomy
+
+ output:
+ tuple val(meta), path("*.sylphmpa"), emit: taxprof_output
+ path "versions.yml" , emit: versions
+
+ when:
+ task.ext.when == null || task.ext.when
+
+ script:
+ def args = task.ext.args ?: ''
+ def prefix = task.ext.prefix ?: "${meta.id}"
+
+ """
+ export SYLPH_TAXONOMY_CONFIG="/tmp/config.json"
+ sylph-tax \\
+ taxprof \\
+ $sylph_results \\
+ $args \\
+ -t $taxonomy
+
+ mv *.sylphmpa ${prefix}.sylphmpa
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ sylph-tax: \$(sylph-tax --version)
+ END_VERSIONS
+ """
+
+ stub:
+ def prefix = task.ext.prefix ?: "${meta.id}"
+ """
+ export SYLPH_TAXONOMY_CONFIG="/tmp/config.json"
+ touch ${prefix}.sylphmpa
+
+ cat <<-END_VERSIONS > versions.yml
+ "${task.process}":
+ sylph-tax: \$(sylph-tax --version)
+ END_VERSIONS
+ """
+}
diff --git a/modules/nf-core/sylphtax/taxprof/meta.yml b/modules/nf-core/sylphtax/taxprof/meta.yml
new file mode 100644
index 000000000..c254b608b
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/meta.yml
@@ -0,0 +1,57 @@
+name: sylphtax_taxprof
+description: Incorporates taxonomic information into sylph metagenomic profiles
+keywords:
+ - taxonomy
+ - sylph
+ - metagenomics
+tools:
+ - sylphtax:
+ description: Integrating taxonomic information into the sylph metagenome profiler.
+ homepage: https://github.com/bluenote-1577/sylph-tax?tab=readme-ov-file
+ documentation: https://sylph-docs.github.io/sylph-tax/
+ licence: ["MIT"]
+ identifier: ""
+input:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. `[ id:'sample1', single_end:false ]`
+ - sylph_results:
+ type: file
+ description: Output results from sylph classifier. The database file(s) used
+ to create this file with sylph must be the same as those of the taxonomy input
+ channel of this module.
+ pattern: "*.{tsv}"
+ ontologies:
+ - edam: http://edamontology.org/format_3475 # TSV
+ - taxonomy:
+ type: file
+ description: A list of sylph-tax identifiers (e.g. GTDB_r220 or IMGVR_4.1). Multiple
+ taxonomy metadata files can be input. Custom taxonomy files are also possible.
+ ontologies: []
+output:
+ taxprof_output:
+ - - meta:
+ type: map
+ description: |
+ Groovy Map containing sample information
+ e.g. [ id:'test', single_end:false ]
+ - "*.sylphmpa":
+ type: file
+ description: Taxonomic profile produced by sylph-tax taxprof
+ pattern: "*.sylphmpa"
+ ontologies: []
+ versions:
+ - versions.yml:
+ type: file
+ description: File containing software versions
+ pattern: "versions.yml"
+ ontologies:
+ - edam: http://edamontology.org/format_3750 # YAML
+authors:
+ - "@sofstam"
+maintainers:
+ - "@sofstam"
diff --git a/modules/nf-core/sylphtax/taxprof/nextflow.config b/modules/nf-core/sylphtax/taxprof/nextflow.config
new file mode 100644
index 000000000..505f70dc2
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/nextflow.config
@@ -0,0 +1,12 @@
+if (!params.skip_qc) {
+ if (params.contaminant_screening in ['sylph']) {
+ process {
+ withName: 'SYLPHTAX_TAXPROF' {
+ publishDir = [
+ path: { "${params.outdir}/${params.aligner}/contaminants/sylph" },
+ mode: params.publish_dir_mode
+ ]
+ }
+ }
+ }
+}
\ No newline at end of file
diff --git a/modules/nf-core/sylphtax/taxprof/tests/main.nf.test b/modules/nf-core/sylphtax/taxprof/tests/main.nf.test
new file mode 100644
index 000000000..0f0f9724d
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/tests/main.nf.test
@@ -0,0 +1,91 @@
+nextflow_process {
+
+ name "Test Process SYLPHTAX_TAXPROF"
+ script "../main.nf"
+ process "SYLPHTAX_TAXPROF"
+
+ tag "modules"
+ tag "modules_nfcore"
+ tag "sylph"
+ tag "sylph/profile"
+ tag "sylphtax"
+ tag "sylphtax/taxprof"
+
+
+ test("sarscov2 illumina single-end [fastq_gz]") {
+ setup {
+ run("SYLPH_PROFILE") {
+ script "../../../sylph/profile/main.nf"
+ process {
+ """
+ input[0] = [ [ id:'test', single_end:true ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = file(params.modules_testdata_base_path +'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ """
+ }
+ }
+ }
+ when {
+ process {
+ """
+ input[0] = SYLPH_PROFILE.out.profile_out
+ input[1] = file('https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/sylph/test_taxonomy.tsv.gz', checkIfExists: true)
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(
+ process.out.versions,
+ process.out.taxprof_output
+ ).match() }
+ )
+ }
+
+ }
+
+ test("stub sarscov2 illumina single-end [fastq_gz]") {
+
+ options '-stub'
+
+ setup {
+ run("SYLPH_PROFILE") {
+ script "../../../sylph/profile/main.nf"
+ process {
+ """
+ input[0] = [ [ id:'test' ], // meta map
+ [
+ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
+ ]
+ ]
+ input[1] = file(params.modules_testdata_base_path +'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
+ """
+ }
+ }
+ }
+ when {
+ process {
+ """
+ input[0] = SYLPH_PROFILE.out.profile_out
+ input[1] = file('https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/sylph/test_taxonomy.tsv.gz', checkIfExists: true)
+ """
+ }
+ }
+
+ then {
+ assertAll(
+ { assert process.success },
+ { assert snapshot(
+ process.out.versions,
+ process.out.taxprof_output
+ ).match() }
+ )
+ }
+ }
+
+}
diff --git a/modules/nf-core/sylphtax/taxprof/tests/main.nf.test.snap b/modules/nf-core/sylphtax/taxprof/tests/main.nf.test.snap
new file mode 100644
index 000000000..3c26e75ec
--- /dev/null
+++ b/modules/nf-core/sylphtax/taxprof/tests/main.nf.test.snap
@@ -0,0 +1,43 @@
+{
+ "stub sarscov2 illumina single-end [fastq_gz]": {
+ "content": [
+ [
+ "versions.yml:md5,bdbbd22b3e721ba2027d3e6cb1dc4bb4"
+ ],
+ [
+ [
+ {
+ "id": "test"
+ },
+ "test.sylphmpa:md5,d41d8cd98f00b204e9800998ecf8427e"
+ ]
+ ]
+ ],
+ "meta": {
+ "nf-test": "0.9.2",
+ "nextflow": "24.10.5"
+ },
+ "timestamp": "2025-04-07T15:28:04.026470884"
+ },
+ "sarscov2 illumina single-end [fastq_gz]": {
+ "content": [
+ [
+ "versions.yml:md5,bdbbd22b3e721ba2027d3e6cb1dc4bb4"
+ ],
+ [
+ [
+ {
+ "id": "test",
+ "single_end": true
+ },
+ "test.sylphmpa:md5,a9743c21a53ba766226e57d2a25f6167"
+ ]
+ ]
+ ],
+ "meta": {
+ "nf-test": "0.9.2",
+ "nextflow": "24.10.5"
+ },
+ "timestamp": "2025-04-07T15:27:55.45776116"
+ }
+}
\ No newline at end of file
diff --git a/nextflow.config b/nextflow.config
index 5aa7a7524..b48a0d780 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -100,6 +100,8 @@ params {
save_kraken_assignments = false
save_kraken_unassigned = false
bracken_precision = "S"
+ sylph_db = null
+ sylph_taxonomy = null
skip_rseqc = false
skip_biotype_qc = false
skip_deseq2_qc = false
diff --git a/nextflow_schema.json b/nextflow_schema.json
index d2bc59295..761ddd8d7 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -601,7 +601,7 @@
"type": "string",
"description": "Tool to use for detecting contaminants in unaligned reads - available options are 'kraken2' and 'kraken2_bracken'",
"fa_icon": "fas fa-virus-slash",
- "enum": ["kraken2", "kraken2_bracken"]
+ "enum": ["kraken2", "kraken2_bracken", "sylph"]
},
"kraken_db": {
"type": "string",
@@ -617,6 +617,20 @@
"description": "Taxonomic level for Bracken abundance estimations.",
"help_text": "Use the first letter of taxonomic levels: Domain, Phylum, Class, Order, Family, Genus, or Species.",
"enum": ["D", "P", "C", "O", "F", "G", "S"]
+ },
+ "sylph_db": {
+ "type": "string",
+ "format": "file-path",
+ "description": "Database when using Sylph for contamination detection",
+ "help_text": "See the usage documentation for more information on setting up and using Sylph databases.",
+ "fa_icon": "fas fa-database"
+ },
+ "sylph_taxonomy": {
+ "type": "string",
+ "format": "file-path",
+ "description": "Taxonomy when using Sylph for contamination detection",
+ "help_text": "See the usage documentation for more information on Sylph taxonomies.",
+ "fa_icon": "fas fa-tree"
}
}
},
diff --git a/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf b/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf
index e5dd439ea..229572bca 100644
--- a/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf
+++ b/subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf
@@ -303,7 +303,7 @@ def validateInputParameters() {
// Check that Kraken/Bracken database provided if using kraken2/bracken
if (params.contaminant_screening in ['kraken2', 'kraken2_bracken']) {
if (!params.kraken_db) {
- error("Contaminant screening set to kraken2 but not database is provided. Please provide a database with the --kraken_db option.")
+ error("Contaminant screening set to kraken2 but no database was provided. Please provide a database with the --kraken_db option.")
}
// Check that Kraken/Bracken parameters are not provided when Kraken2 is not being used
} else {
@@ -316,6 +316,20 @@ def validateInputParameters() {
}
}
+ // Check that Sylph database and taxonomy are provided if using Sylph
+ if (params.contaminant_screening == 'sylph') {
+ if (!params.sylph_db) {
+ error("Contaminant screening is set to Sylph but no database was provided. Please provide a database with the --sylph_db option.")
+ }
+ if (!params.sylph_taxonomy) {
+ error("Contaminant screening is set to Sylph but no taxonomy was provided. Please provide a taxonomy with the --sylph_taxonomy option.")
+ }
+ } else {
+ if (params.sylph_db || params.sylph_taxonomy) {
+ sylphArgumentsWithoutSylphUsageWarn()
+ }
+ }
+
// Check which RSeQC modules we are running
def valid_rseqc_modules = ['bam_stat', 'inner_distance', 'infer_experiment', 'junction_annotation', 'junction_saturation', 'read_distribution', 'read_duplication', 'tin']
def rseqc_modules = params.rseqc_modules ? params.rseqc_modules.split(',').collect{ it.trim().toLowerCase() } : []
@@ -533,17 +547,18 @@ def additionaFastaIndexWarn(index) {
}
//
-// Print a warning if --save_kraken_assignments or --save_kraken_unassigned is provided without --kraken_db
+// Print a warning if --save_kraken_assignments, --save_kraken_unassigned or --kraken_db
+// is provided without setting --contaminant_screening to 'kraken2' or 'kraken2_bracken'
//
def krakenArgumentsWithoutKrakenDBWarn() {
log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" +
- " 'Kraken2 related arguments have been provided without setting contaminant\n" +
+ " Kraken2 related arguments have been provided without setting contaminant\n" +
" screening to Kraken2. Kraken2 is not being run so these will not be used.\n" +
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
}
///
-/// Print a warning if --bracken-precision is provided without --kraken_db
+/// Print a warning if --bracken_precision is provided without contaminant screening using Kraken2
///
def brackenPrecisionWithoutKrakenDBWarn() {
log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" +
@@ -552,6 +567,16 @@ def brackenPrecisionWithoutKrakenDBWarn() {
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
}
+//
+// Print a warning if --sylph_db or --sylph_taxonomy is provided without contaminant screening set to 'sylph'
+//
+def sylphArgumentsWithoutSylphUsageWarn() {
+ log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" +
+ " Sylph related arguments have been provided without setting contaminant\n" +
+ " screening to Sylph. Sylph is not being run so these will not be used.\n" +
+ "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
+}
+
//
// Function to generate an error if contigs in genome fasta file > 512 Mbp
//
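The Sylph checks added to `validateInputParameters()` above amount to a small decision table. The following Python sketch restates it for illustration only; the function and exception names are hypothetical, and the real pipeline calls `error(...)` and `sylphArgumentsWithoutSylphUsageWarn()` instead.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for Nextflow's error(...)."""


def validate_sylph_params(contaminant_screening, sylph_db=None, sylph_taxonomy=None):
    """Return a list of warnings, or raise ValidationError, following the Sylph checks above."""
    warnings = []
    if contaminant_screening == "sylph":
        # Both a database and a matching taxonomy are mandatory for Sylph
        if not sylph_db:
            raise ValidationError("no database was provided; use --sylph_db")
        if not sylph_taxonomy:
            raise ValidationError("no taxonomy was provided; use --sylph_taxonomy")
    elif sylph_db or sylph_taxonomy:
        # Sylph options are ignored (with a warning) when another screener is selected
        warnings.append("Sylph arguments provided but contaminant screening is not set to 'sylph'")
    return warnings
```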
diff --git a/workflows/rnaseq/main.nf b/workflows/rnaseq/main.nf
index d3a3ded0c..753c5c493 100755
--- a/workflows/rnaseq/main.nf
+++ b/workflows/rnaseq/main.nf
@@ -43,6 +43,8 @@ include { STRINGTIE_STRINGTIE } from '../../modules/nf-core/stringtie/str
include { SUBREAD_FEATURECOUNTS } from '../../modules/nf-core/subread/featurecounts'
include { KRAKEN2_KRAKEN2 as KRAKEN2 } from '../../modules/nf-core/kraken2/kraken2/main'
include { BRACKEN_BRACKEN as BRACKEN } from '../../modules/nf-core/bracken/bracken/main'
+include { SYLPH_PROFILE } from '../../modules/nf-core/sylph/profile/main'
+include { SYLPHTAX_TAXPROF } from '../../modules/nf-core/sylphtax/taxprof/main'
include { MULTIQC } from '../../modules/nf-core/multiqc'
include { BEDTOOLS_GENOMECOV as BEDTOOLS_GENOMECOV_FW } from '../../modules/nf-core/bedtools/genomecov'
include { BEDTOOLS_GENOMECOV as BEDTOOLS_GENOMECOV_REV } from '../../modules/nf-core/bedtools/genomecov'
@@ -661,7 +663,21 @@ workflow RNASEQ {
ch_versions = ch_versions.mix(BRACKEN.out.versions)
ch_multiqc_files = ch_multiqc_files.mix(BRACKEN.out.txt.collect{it[1]})
}
- }
+ } else if (params.contaminant_screening == 'sylph') {
+ SYLPH_PROFILE (
+ ch_unaligned_sequences,
+ params.sylph_db
+ )
+ ch_sylph_profile = SYLPH_PROFILE.out.profile_out.filter{!it[1].isEmpty()}
+ ch_versions = ch_versions.mix(SYLPH_PROFILE.out.versions)
+
+ SYLPHTAX_TAXPROF (
+ ch_sylph_profile,
+ params.sylph_taxonomy
+ )
+ ch_versions = ch_versions.mix(SYLPHTAX_TAXPROF.out.versions)
+ ch_multiqc_files = ch_multiqc_files.mix(SYLPHTAX_TAXPROF.out.taxprof_output.collect{it[1]})
+ }
}
//
diff --git a/workflows/rnaseq/nextflow.config b/workflows/rnaseq/nextflow.config
index 7bd96fc31..10cf9f6bb 100644
--- a/workflows/rnaseq/nextflow.config
+++ b/workflows/rnaseq/nextflow.config
@@ -10,6 +10,8 @@ includeConfig "../../modules/nf-core/stringtie/stringtie/nextflow.config"
includeConfig "../../modules/nf-core/subread/featurecounts/nextflow.config"
includeConfig "../../modules/nf-core/kraken2/kraken2/nextflow.config"
includeConfig "../../modules/nf-core/bracken/bracken/nextflow.config"
+includeConfig "../../modules/nf-core/sylph/profile/nextflow.config"
+includeConfig "../../modules/nf-core/sylphtax/taxprof/nextflow.config"
includeConfig "../../subworkflows/local/align_star/nextflow.config"
includeConfig "../../subworkflows/local/quantify_rsem/nextflow.config"
includeConfig "../../subworkflows/nf-core/quantify_pseudo_alignment/nextflow.config"