7. Output

Andreas Sjödin edited this page Nov 8, 2023 · 1 revision

Guide for Outputting FlexTaxD Databases

A FlexTaxD database must be exported in the right format before downstream applications such as metagenomic classifiers can use it. The instructions below are grouped by tool.

Preparation

Before exporting your FlexTaxD database, ensure it's clean:

flextaxd --database yourdatabase.fdb --clean  # For NCBI-based databases
flextaxd --database yourdatabase.fdb --purge_database  # For GTDB-based databases

This step removes nodes that do not contribute information relevant to your analysis.
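
Because the flag depends only on the source taxonomy, the choice can be wrapped in a small helper. The sketch below is illustrative: the function name clean_cmd and the labels ncbi and gtdb are assumptions, not part of FlexTaxD. It only prints the command, so you can inspect it before running it.

```shell
#!/bin/sh
# Print the cleaning command for a database, based on its source taxonomy.
# clean_cmd and the source labels (ncbi, gtdb) are hypothetical helpers.
clean_cmd() {
    db=$1
    source_type=$2
    case "$source_type" in
        ncbi) printf 'flextaxd --database %s --clean\n' "$db" ;;
        gtdb) printf 'flextaxd --database %s --purge_database\n' "$db" ;;
        *)    printf 'unknown source: %s\n' "$source_type" >&2; return 1 ;;
    esac
}

# Prints the command for a GTDB-based database without executing it.
clean_cmd yourdatabase.fdb gtdb
```

Printing rather than executing keeps the helper safe to try on a database you have not backed up yet.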

Exporting to Metagenomic Read Classifiers

After ensuring your FlexTaxD database is ready, follow these commands to create databases for different classifiers.

Kraken2 Database

flextaxd --database gtdb.fdb --genomes_path genomes --dbprogram kraken2 --create_db --db_name kraken2_gtdb --processes 20

Krakenuniq Database

flextaxd --database gtdb.fdb --genomes_path genomes --dbprogram krakenuniq --create_db --db_name krakenuniq_gtdb --processes 20

Ganon Database

flextaxd --database gtdb.fdb --genomes_path genomes --dbprogram ganon --create_db --db_name ganon_gtdb --processes 20

Remember to replace gtdb.fdb and genomes with the actual paths to your FlexTaxD database and genome directory, respectively.
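
The three commands differ only in the classifier name, so they can be generated from one template. This is a minimal sketch, assuming you want the same database, genome directory, and process count for every classifier; the variable names DB, GENOMES, and PROCESSES and the function export_cmd are illustrative and not part of FlexTaxD. The script prints the commands instead of running them.

```shell
#!/bin/sh
# Illustrative defaults; override them in the environment if needed.
DB=${DB:-gtdb.fdb}            # FlexTaxD database file
GENOMES=${GENOMES:-genomes}   # directory containing the genome files
PROCESSES=${PROCESSES:-20}    # value passed to --processes

# export_cmd is a hypothetical helper that prints the export command
# for one classifier, using the same flags as the commands above.
export_cmd() {
    prog=$1
    printf 'flextaxd --database %s --genomes_path %s --dbprogram %s --create_db --db_name %s_gtdb --processes %s\n' \
        "$DB" "$GENOMES" "$prog" "$prog" "$PROCESSES"
}

for classifier in kraken2 krakenuniq ganon; do
    export_cmd "$classifier"  # print only; review before executing
done
```

Once the printed commands look right, they can be pasted into the terminal or piped to sh.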

Exporting in Other Formats

FlexTaxD can also dump the database in formats suited to other downstream applications:

Centrifuge Format

flextaxd --database gtdb.fdb --genomes_path genomes --dbprogram centrifuge --dump

Bracken Format

flextaxd --database gtdb.fdb --genomes_path genomes --dbprogram bracken --dump

Custom Formatting

You can customize the output format with a variety of flags:

  • --dump_sep: Define a custom separator for the output file (default mimics NCBI format).
  • --dump_descriptions: Output the textual descriptions rather than the numeric identifiers.
  • --dump_genomes: Produce a list of genomes with their sources in a separate file.
  • --dump_genome_annotations: Include taxonomic annotations alongside genome listings.

Example command with custom formatting:

flextaxd --database gtdb.fdb --dump_sep "\t" --dump_descriptions --dump_genomes --dump_genome_annotations
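
One shell detail is worth checking here: inside double quotes, sh and bash pass \t as two characters (a backslash and a t), not as a tab. Whether FlexTaxD converts that sequence itself is not stated on this page, so if the output does not look tab-separated, passing a real tab character is a possible alternative. The snippet below only demonstrates the difference; the variable names are illustrative.

```shell
#!/bin/sh
literal="\t"          # a backslash followed by t: two characters
tab=$(printf '\t')    # a single real tab character

echo "literal length: ${#literal}"
echo "tab length: ${#tab}"
```

The real tab could then be passed as --dump_sep "$tab" in place of "\t".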

In all of the commands above, gtdb.fdb and genomes are placeholders for your FlexTaxD database and the directory containing your genome files. Adjust the paths and names for your environment and database.
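
A quick pre-flight check can catch a wrong path before a long export starts. The sketch below is an assumption about what is worth checking, not a FlexTaxD feature: the hypothetical helper check_inputs verifies that the database file exists and that the genome directory is present and not empty.

```shell
#!/bin/sh
# check_inputs is a hypothetical helper: it returns 0 only if the database
# file exists and the genome directory contains at least one entry.
check_inputs() {
    db=$1
    genomes=$2
    if [ ! -f "$db" ]; then
        echo "database not found: $db" >&2
        return 1
    fi
    if [ ! -d "$genomes" ] || [ -z "$(ls -A "$genomes")" ]; then
        echo "genome directory missing or empty: $genomes" >&2
        return 1
    fi
    return 0
}

if check_inputs gtdb.fdb genomes; then
    echo "inputs look fine"
fi
```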

Lastly, make sure the programs you are exporting to (Kraken2, Krakenuniq, Ganon, etc.) are installed, properly configured, and available on your $PATH. If you use Conda, you can install them in the same environment as FlexTaxD.
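
To confirm that the tools are reachable, you can look them up on $PATH before exporting. This is a minimal sketch: missing_tools is a hypothetical helper, and the executable names kraken2-build, krakenuniq-build, and ganon are assumptions about what the classifiers install, so adjust them to your setup.

```shell
#!/bin/sh
# missing_tools prints each of its arguments that is not found on $PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names are assumptions; adjust to the classifiers you use.
missing=$(missing_tools kraken2-build krakenuniq-build ganon)
if [ -n "$missing" ]; then
    echo "not found on PATH:" $missing
else
    echo "all classifier tools found"
fi
```

Run it inside the activated Conda environment so that the lookup uses the same $PATH as FlexTaxD.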
