Commit 122b70e

update readme
1 parent bc30cee commit 122b70e

1 file changed: +9 -1 lines changed

README.md

Lines changed: 9 additions & 1 deletion
Original file line number / Diff line number / Diff line change
@@ -143,6 +143,7 @@ Downloaded files are stored in `OUTDIR/DB_NAME` directory, which can be provided
143143
---
144144

145145
## Classification
146+
> [!NOTE] We recommend running software like `fastp` or `fastplong` to remove adapters and low-quality reads before classification.
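
The preprocess-then-classify flow suggested in the note could look roughly like the sketch below. It is an illustration under assumptions, not the project's documented workflow: the file names, `dbdir`, `outdir`, and `jobid` are placeholders, the input is assumed to be single-end, and passing `1` to `--validate-input` is assumed to switch the check on (the README only states that the default is 0). The `run` helper is hypothetical; it prints each command and executes it only when `RUN=1` is set.

```shell
#!/bin/sh
# Sketch only: the paths below are placeholders, not files shipped with Metabuli.
READS="${READS:-reads.fastq.gz}"
CLEAN="${CLEAN:-reads.clean.fastq.gz}"
DBDIR="${DBDIR:-dbdir}"
OUTDIR="${OUTDIR:-outdir}"
JOBID="${JOBID:-jobid}"

# Hypothetical helper: print the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# 1) Trim adapters and drop low-quality reads (single-end input assumed).
run fastp -i "$READS" -o "$CLEAN"

# 2) Classify the cleaned reads; "--validate-input 1" is assumed to enable the format check.
run metabuli classify --seq-mode 1 --validate-input 1 "$CLEAN" "$DBDIR" "$OUTDIR" "$JOBID"
```

Running the script as-is only prints the two commands, which makes it easy to review them before setting `RUN=1` on a machine where `fastp` and `metabuli` are installed.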
146147
```
147148
metabuli classify <i:FASTA/Q> <i:DBDIR> <o:OUTDIR> <Job ID> [options]
148149
- INPUT : FASTA/Q file of reads you want to classify. (gzip supported)
@@ -160,6 +161,7 @@ metabuli classify --seq-mode 1 read.fna dbdir outdir jobid
160161
metabuli classify --seq-mode 3 read.fna dbdir outdir jobid
161162
162163
* Important parameters:
164+
--validate-input : Validate query file format (0 by default)
163165
--threads : The number of threads used (all by default)
164166
--max-ram : The maximum RAM usage. (128 GiB by default)
165167
--min-score : The minimum score to be classified
@@ -307,6 +309,7 @@ metabuli build --gtdb 1 <DBDIR> <FASTA_LIST> <GTDB_TAXDUMP/taxid.map> --taxonomy
307309
--max-ram : The maximum RAM usage. (128 GiB by default)
308310
--accession-level : Set 1 to create a DB for accession level classification (0 by default).
309311
--cds-info : List of absolute paths to CDS files.
312+
--validate-input : Validate FASTA file format (0 by default)
310313
311314
```
312315
This will generate **diffIdx**, **info**, **split**, **taxID_list**, and some other files. You can delete `*_diffIdx` and `*_info` files.
@@ -333,6 +336,7 @@ metabuli updateDB --gtdb 1 <NEW DBDIR> <FASTA_LIST> <GTDB_TAXDUMP/taxid.map> <OL
333336
--max-ram: The maximum RAM usage. (128 GiB by default)
334337
--accession-level: Set 1 to add new sequences for accession level classification (0 by default).
335338
--cds-info: List of absolute paths to CDS files.
339+
--validate-input : Validate FASTA file format (0 by default)
336340
```
337341

338342
#### \<Add sequences of new taxa>
@@ -421,6 +425,7 @@ metabuli build <DBDIR> <FASTA_LIST> <accession2taxid> --taxonomy-path <TAXDUMP>
421425
--max-ram: The maximum RAM usage. (128 GiB by default)
422426
--accession-level: Set 1 to create a DB for accession level classification (0 by default).
423427
--cds-info: List of absolute paths to CDS files.
428+
--validate-input : Validate FASTA file format (0 by default)
424429
```
425430
This will generate **diffIdx**, **info**, **split**, **taxID_list**, and some other files. You can delete the `*_diffIdx` and `*_info` files, as well as the `DATE-TIME` folder (e.g., `2025-1-24-10-32`) if one was generated.
426431

@@ -456,6 +461,7 @@ metabuli updateDB <NEW DBDIR> <FASTA_LIST> <accession2taxid> <OLD DBDIR> [option
456461
--accession-level : Set 1 to create a DB for accession level classification (0 by default).
457462
--make-library : Make species library for faster execution (1 by default).
458463
--new-taxa : List of new taxa to be added.
464+
--validate-input : Validate FASTA file format (0 by default)
459465
```
460466
461467
#### \<Add sequences of new taxa> - Please refer to [this section](#add-sequences-of-new-taxa).
@@ -489,4 +495,6 @@ fasterq-dump --split-files SRR14484345
489495
```
490496
491497
## Reference
492-
Shen, W., Ren, H., TaxonKit: a practical and efficient NCBI Taxonomy toolkit, Journal of Genetics and Genomics, https://doi.org/10.1016/j.jgg.2021.03.006
498+
- **Taxonomy dump**: [Shen W, Ren H. TaxonKit: a practical and efficient NCBI Taxonomy toolkit. Journal of Genetics and Genomics (2021).](https://doi.org/10.1016/j.jgg.2021.03.006)
499+
- **FASTA format validation**: [Edwards R.A. fasta_validate: a fast and efficient fasta validator written in pure C. Zenodo.](https://doi.org/10.5281/zenodo.2532044)
500+
- **FASTQ format validation**: [Fonseca N, Manning J. nunofonseca/fastq_utils: 0.25.2. Zenodo.](https://doi.org/10.5281/zenodo.7755574)
