nickp60
diff --git a/CITATIONS.md b/CITATIONS.md
Lines changed: 4 additions & 0 deletions
diff --git a/docs/output.md b/docs/output.md
Lines changed: 21 additions & 0 deletions
diff --git a/docs/usage.md b/docs/usage.md
Lines changed: 21 additions & 0 deletions
diff --git a/modules.json b/modules.json
Lines changed: 5 additions & 0 deletions
diff --git a/modules/nf-core/diamond/blastx/environment.yml b/modules/nf-core/diamond/blastx/environment.yml
Lines changed: 7 additions & 0 deletions
diff --git a/modules/nf-core/diamond/blastx/main.nf b/modules/nf-core/diamond/blastx/main.nf
Lines changed: 127 additions & 0 deletions
diff --git a/modules/nf-core/diamond/blastx/meta.yml b/modules/nf-core/diamond/blastx/meta.yml
Lines changed: 161 additions & 0 deletions
@@ -10,6 +10,10 @@
 
 ## Pipeline tools
 
+- [DIAMOND](https://github.com/bbuchfink/diamond/)
+
+> Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021 Apr;18(4):366-368. doi: 10.1038/s41592-021-01101-x. PubMed PMID: 33828273.
+
 - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
 
 > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
 
@@ -10,6 +10,27 @@ The directories listed below will be created in the results directory after the
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
+- [FastQC](#fastqc) - Raw read QC
+- [DIAMOND blastx](#diamond-blastx) - Translated alignment against a protein database _(optional)_
+- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
+- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
+
+### DIAMOND blastx
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `diamond/`
+  - `*.tsv`: Tabular alignment results (BLAST tabular format 6) with one row per query-subject hit.
+  - `*.log`: DIAMOND run log containing alignment statistics (query count, alignment rate, etc.).
+
+</details>
+
+[DIAMOND](https://github.com/bbuchfink/diamond/wiki/) performs fast translated alignment of metagenomic reads against a protein reference database. Each read is translated in all six reading frames and aligned against the database; only hits passing the e-value cutoff (DIAMOND's default is 0.001, adjustable with `--evalue`) are reported. The output is a tab-separated file compatible with standard BLAST tabular output parsers.
+
+Enable with `--run_diamond`. Requires a pre-built `.dmnd` database (see [usage docs](usage.md#diamond-blastx)).
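+
+As a sketch of post-processing this file, the shell function below keeps the highest-scoring hit for each read. It assumes the default 12-column BLAST tabular layout (`qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`), so the query ID is column 1 and the bitscore is column 12; adjust the column numbers if custom output fields are configured. `best_hits` is a hypothetical helper, not part of the pipeline.
+
+```bash
+# Hypothetical helper: print the best (highest-bitscore) hit per query from a
+# DIAMOND BLAST-tabular file. Assumes the default 12 columns (bitscore = $12).
+best_hits() {
+    awk -F'\t' '
+        # Record the order in which queries first appear, for stable output.
+        !($1 in best) { order[++n] = $1 }
+        # Keep the row if the query is new or its bitscore beats the stored one.
+        !($1 in best) || $12 + 0 > score[$1] { best[$1] = $0; score[$1] = $12 + 0 }
+        END { for (i = 1; i <= n; i++) print best[order[i]] }
+    ' "$1"
+}
+```
+
+For example, `best_hits diamond/sample1.tsv > sample1.best.tsv` (the sample name is illustrative). When two hits for a read have the same bitscore, the first row seen is kept.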
+
+### FastQC
 - [Short reads QC and preprocessing](https://nf-co.re/subworkflows/fastq_shortreads_preprocess_qc/), see [Output section](https://nf-co.re/subworkflows/fastq_shortreads_preprocess_qc/#output) for details.
 - Long reads QC and preprocessing (WIP)
 - [HUMANn v3 / v4](#humann-v3--v4) — functional profiling via MetaPhlAn + HUMANn (`--run_humann_v3`, `--run_humann_v4`)
 
@@ -94,6 +94,27 @@ humann_v4,uniref90_v4,humann_utility,,,/data/databases/utility_mapping_v4
 fmhfunprofiler,kegg_v1,,,short;long,/data/databases/fmhfunprofiler_kegg.sig.zip
 ```
 
+### DIAMOND blastx
+
+[DIAMOND](https://github.com/bbuchfink/diamond/wiki/) is a high-throughput sequence aligner for translated (nucleotide-vs-protein) alignment. Enable it with `--run_diamond`.
+
+#### Database preparation
+
+The database supplied in the `--databases` CSV must already be in DIAMOND binary format (`.dmnd`). Build it from a protein FASTA using `diamond makedb`:
+
+```bash
+diamond makedb --in proteins.faa --db proteins
+# produces proteins.dmnd
+```
+
+See the [DIAMOND makedb documentation](https://github.com/bbuchfink/diamond/wiki/3.-Command-line-options#makedb-options) for all available options (e.g. `--taxonmap`, `--taxonnodes` and `--taxonnames` for adding taxonomy information).
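+
+Because the `--databases` CSV expects a directory rather than a file (see the note below), one convenient layout is to build each database into its own directory and check it before adding it to the CSV. The sketch below treats "exactly one `.dmnd` file per directory" as a rule; that is our assumption, not a documented pipeline requirement, and `check_dmnd_dir` and `diamond_db/` are hypothetical names.
+
+```bash
+# Build into a dedicated directory first, e.g.:
+#   mkdir -p diamond_db && diamond makedb --in proteins.faa --db diamond_db/proteins
+# Hypothetical check: the path is a directory holding exactly one .dmnd file.
+check_dmnd_dir() {
+    local dir="$1" count
+    if [ ! -d "$dir" ]; then
+        echo "error: '$dir' is not a directory" >&2
+        return 1
+    fi
+    count=$(find "$dir" -maxdepth 1 -type f -name '*.dmnd' | wc -l)
+    if [ "$count" -ne 1 ]; then
+        echo "error: expected one .dmnd file in '$dir', found $count" >&2
+        return 1
+    fi
+    echo "ok: $dir"
+}
+```
+
+Once `makedb` has written `diamond_db/proteins.dmnd`, `check_dmnd_dir diamond_db` prints `ok: diamond_db`; pointing it at the `.dmnd` file itself fails, mirroring the directory requirement.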
+
+> [!IMPORTANT]
+> In the `--databases` CSV, the path for the DIAMOND database should point to the **directory** containing the `.dmnd` file, not to the file itself. The pipeline locates the `.dmnd` file within that directory automatically.
+
 ## Running the pipeline
 
 The typical command for running the pipeline is as follows:
 
@@ -10,6 +10,11 @@
                         "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
                         "installed_by": ["modules"]
                     },
+                    "diamond/blastx": {
+                        "branch": "master",
+                        "git_sha": "4012b87ef8f242b7aa1eb17e165aa003b86b49c0",
+                        "installed_by": ["modules"]
+                    },
                     "fastqc": {
                         "branch": "master",
                         "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",