Commit 56ca705

Merge pull request #17 from vinisalazar/add-diamond
feat: add module diamond/blastx from nf-core
2 parents 91215a3 + aed2b16 commit 56ca705

12 files changed, 489 additions and 5 deletions

CITATIONS.md

Lines changed: 4 additions & 0 deletions
@@ -10,6 +10,10 @@

 ## Pipeline tools

+- [DIAMOND](https://github.com/bbuchfink/diamond/)
+
+> Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021 Apr;18(4):366-368. doi: 10.1038/s41592-021-01101-x. PubMed PMID: 33828273.
+
 - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

 > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

docs/output.md

Lines changed: 21 additions & 0 deletions
@@ -10,6 +10,27 @@ The directories listed below will be created in the results directory after the

 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

+- [FastQC](#fastqc) - Raw read QC
+- [DIAMOND blastx](#diamond-blastx) - Translated alignment against a protein database _(optional)_
+- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
+- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
+
+### DIAMOND blastx
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `diamond/`
+  - `*.tsv`: Tabular alignment results (BLAST tabular format 6) with one row per query-subject hit.
+  - `*.log`: DIAMOND run log containing alignment statistics (query count, alignment rate, etc.).
+
+</details>
+
+[DIAMOND](https://github.com/bbuchfink/diamond/wiki/) performs fast translated alignment of metagenomic reads against a protein reference database. Each read is aligned in all six reading frames against the database and only significant hits are reported. The output is a tab-separated file compatible with standard BLAST tabular output parsers.
+
+Enable with `--run_diamond`. Requires a pre-built `.dmnd` database (see [usage docs](usage.md#diamond-blastx)).
+
+### FastQC
 - [Short reads QC and preprocessing](https://nf-co.re/subworkflows/fastq_shortreads_preprocess_qc/), see [Output section](https://nf-co.re/subworkflows/fastq_shortreads_preprocess_qc/#output) for details.
 - Long reads QC and preprocessing (WIP)
 - [HUMANn v3 / v4](#humann-v3--v4) — functional profiling via MetaPhlAn + HUMANn (`--run_humann_v3`, `--run_humann_v4`)
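
As a rough illustration of consuming the `*.tsv` output described in the docs/output.md change above, the sketch below keeps the highest-scoring hit per query. It assumes DIAMOND's default 12-column tabular layout (query ID in column 1, bit score in column 12). The module may be configured with different columns, so check the file before relying on these positions. The function name `best_hits` is hypothetical and not part of the pipeline.

```bash
# Sketch: keep the highest-bitscore hit per query from a BLAST tabular
# (format 6) file. ASSUMPTION: the 12 default columns are present, so the
# query ID is column 1 and the bit score is column 12.
best_hits() {
    awk -F'\t' '
        # Store a row when its query is new or its bit score beats the stored one.
        !($1 in best) || $12 + 0 > score[$1] {
            if (!($1 in best)) order[++n] = $1
            best[$1] = $0
            score[$1] = $12 + 0
        }
        # Print one row per query, in order of first appearance.
        END { for (i = 1; i <= n; i++) print best[order[i]] }
    ' "$1"
}
```

Something like `best_hits results/diamond/sample.tsv` would then print one row per read; the path is only a placeholder. Ties keep the first row seen, which is a simplification rather than a recommendation.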

docs/usage.md

Lines changed: 21 additions & 0 deletions
@@ -94,6 +94,27 @@ humann_v4,uniref90_v4,humann_utility,,,/data/databases/utility_mapping_v4
 fmhfunprofiler,kegg_v1,,,short;long,/data/databases/fmhfunprofiler_kegg.sig.zip
 ```

+### DIAMOND blastx
+
+[DIAMOND](https://github.com/bbuchfink/diamond/wiki/) is a high-throughput sequence aligner for translated (nucleotide-vs-protein) alignment. Enable it with `--run_diamond`.
+
+#### Database preparation
+
+The database supplied in the `--databases` CSV must already be in DIAMOND binary format (`.dmnd`). Build it from a protein FASTA using `diamond makedb`:
+
+```bash
+diamond makedb --in proteins.faa --db proteins
+# produces proteins.dmnd
+```
+
+See the [DIAMOND makedb documentation](https://github.com/bbuchfink/diamond/wiki/3.-Command-line-options#makedb-options) for all available options (e.g. adding taxonomy, setting block size).
+
+```
+
+> [!IMPORTANT]
+> The path should point to the **directory** containing the `.dmnd` file, not the file itself. The pipeline will automatically locate the `.dmnd` file within that directory.
+
+
 ## Running the pipeline

 The typical command for running the pipeline is as follows:
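
The directory convention from the docs/usage.md change above can be checked before launching a run. The sketch below is one possible pre-flight check, not the pipeline's own lookup logic, which is not shown in this diff. The function name `find_dmnd` and the rule that exactly one `.dmnd` file must be present are assumptions made for illustration.

```bash
# Sketch: resolve the .dmnd file inside a database directory.
# ASSUMPTION: the directory should hold exactly one *.dmnd file; the
# pipeline itself may resolve the file differently.
find_dmnd() {
    db_dir="$1"
    count=0
    found=""
    for f in "$db_dir"/*.dmnd; do
        # When nothing matches, the glob stays literal and is skipped here.
        [ -f "$f" ] || continue
        count=$((count + 1))
        found="$f"
    done
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one .dmnd file in $db_dir, found $count" >&2
        return 1
    fi
    printf '%s\n' "$found"
}
```

Calling `find_dmnd /path/to/db_dir` (a placeholder path) prints the resolved file or fails with a message on stderr, which makes a misconfigured `--databases` entry visible before any alignment starts.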

modules.json

Lines changed: 5 additions & 0 deletions
@@ -10,6 +10,11 @@
     "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",
     "installed_by": ["modules"]
 },
+"diamond/blastx": {
+    "branch": "master",
+    "git_sha": "4012b87ef8f242b7aa1eb17e165aa003b86b49c0",
+    "installed_by": ["modules"]
+},
 "fastqc": {
     "branch": "master",
     "git_sha": "41dfa3f7c0ffabb96a6a813fe321c6d1cc5b6e46",

modules/nf-core/diamond/blastx/environment.yml

Lines changed: 7 additions & 0 deletions
Diff not rendered by default (generated file).

modules/nf-core/diamond/blastx/main.nf

Lines changed: 127 additions & 0 deletions
Diff not rendered by default (generated file).

modules/nf-core/diamond/blastx/meta.yml

Lines changed: 161 additions & 0 deletions
Diff not rendered by default (generated file).
