
2. Workflow concepts and structures

Andreas Sjödin edited this page Nov 8, 2023 · 1 revision

Introduction

FlexTaxD is a multipurpose tool for managing taxonomic databases. It integrates taxonomies from databases such as NCBI and GTDB, as well as custom taxonomies, and prepares data for downstream applications such as metagenomic read classification. With FlexTaxD, you can consolidate taxonomic data into a FlexTaxD database (Fdb), visualize it, and compile or format it for various classification tools.

Detailed pages for FlexTaxD operations are listed in the menu to the right.

Workflow Overview

Using FlexTaxD involves two primary commands:

  1. flextaxd: For importing taxonomies, modifying the database, and exporting data.
  2. flextaxd-create: For creating and compiling the Fdb.

These commands are modular and may be used repeatedly at different stages of your workflow to manage taxonomy data, including:

  • Importing external taxonomy.
  • Modifying taxonomy within your Fdb.
  • Adding external or custom taxonomy to an existing Fdb.
  • Visualizing taxonomic trees.
  • Downloading genome files.
  • Compiling data for metagenomic read classification tools.

Core FlexTaxD File Structure

A standard FlexTaxD directory includes:

  • Taxonomic input file(s) for creating the Fdb.
    • Additional taxonomic file(s) for modifying the Fdb.
  • The FlexTaxD database (Fdb) itself.
  • An optional directory containing genome files.
  • A temporary directory for intermediate files, which can optionally be retained after execution.
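
The layout above could be prepared with a short script such as the following. This is a minimal sketch: the sub-directory names and the Fdb file name are hypothetical choices for illustration, not names that FlexTaxD requires. The script only creates empty folders; the Fdb itself is created by FlexTaxD.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Hypothetical sub-directory names, chosen for illustration only.
LAYOUT = {
    "taxonomy": "taxonomic input file(s) used to create the Fdb",
    "modifications": "additional taxonomic file(s) used to modify the Fdb",
    "genomes": "optional folder of genome files",
    "tmp": "intermediate files, optionally kept after a run",
}


def create_layout(root: Path) -> dict[str, Path]:
    """Create the sub-directories under root and return their paths.

    The Fdb path is only returned, not created; the database is built by FlexTaxD.
    """
    paths: dict[str, Path] = {}
    for name in LAYOUT:
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        paths[name] = directory
    # Hypothetical file name for the FlexTaxD database (Fdb).
    paths["fdb"] = root / "example_fdb.db"
    return paths


if __name__ == "__main__":
    # Demonstrate on a throwaway directory that is deleted afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        for key, path in create_layout(Path(tmp)).items():
            print(f"{key:14} {path.name}")
```

Keeping the genome folder and the temporary folder separate from the taxonomy files makes it easy to delete intermediate files without touching the inputs.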

Importing Taxonomy

FlexTaxD supports a range of taxonomy file formats:

  • NCBI Format:

    • names.dmp
    • nodes.dmp
    • Optional: *_accession2taxid files.
  • GTDB Format:

    • *_taxonomy.tsv
  • FlexTaxD Format:

    • tree2tax.tsv
    • genome2tax.tsv
  • CanSNPer Format:

    • tree2tax.txt
    • genome2tax.tsv

For detailed file format specifications, please refer to the "File Formats" page.
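
To illustrate how the NCBI and GTDB formats encode a taxonomic tree, the sketch below parses small in-memory samples of nodes.dmp, names.dmp and a GTDB *_taxonomy.tsv file into child-to-parent links. It is not FlexTaxD's own import code (FlexTaxD reads these files itself); the function names are hypothetical and the sample rows are illustrative. It assumes the standard NCBI .dmp layout, where fields are separated by a tab, a pipe and a tab, and each line ends with a tab and a pipe, and the GTDB layout of a genome accession, a tab, and a semicolon-separated lineage.

```python
from __future__ import annotations

from typing import Iterable


def dmp_fields(line: str) -> list[str]:
    # NCBI .dmp fields are separated by "\t|\t" and each line ends with "\t|".
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def read_ncbi(nodes: Iterable[str], names: Iterable[str]):
    """Return parent, rank and scientific-name dicts keyed by tax_id."""
    parent: dict[str, str] = {}
    rank: dict[str, str] = {}
    name: dict[str, str] = {}
    for line in nodes:
        if line.strip():
            tax_id, parent_id, node_rank = dmp_fields(line)[:3]
            parent[tax_id] = parent_id
            rank[tax_id] = node_rank
    for line in names:
        if line.strip():
            fields = dmp_fields(line)
            # names.dmp also lists synonyms and common names; keep the scientific name.
            if fields[3] == "scientific name":
                name[fields[0]] = fields[1]
    return parent, rank, name


def ncbi_lineage(tax_id: str, parent: dict[str, str], name: dict[str, str]) -> list[str]:
    """Walk from tax_id up to the root; in nodes.dmp the root is its own parent."""
    path = [name[tax_id]]
    while parent[tax_id] != tax_id:
        tax_id = parent[tax_id]
        path.append(name[tax_id])
    return path[::-1]


def read_gtdb(lines: Iterable[str]):
    """Turn GTDB rows (accession<TAB>d__...;s__...) into child->parent edges and genome->species links."""
    edges: dict[str, str] = {}
    genome2species: dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue
        accession, lineage = line.rstrip("\n").split("\t", 1)
        ranks = lineage.split(";")
        for parent_node, child_node in zip(ranks, ranks[1:]):
            edges[child_node] = parent_node
        genome2species[accession] = ranks[-1]
    return edges, genome2species


if __name__ == "__main__":
    nodes = ["1\t|\t1\t|\tno rank\t|\n",
             "131567\t|\t1\t|\tno rank\t|\n",
             "2\t|\t131567\t|\tsuperkingdom\t|\n"]
    names = ["1\t|\troot\t|\t\t|\tscientific name\t|\n",
             "131567\t|\tcellular organisms\t|\t\t|\tscientific name\t|\n",
             "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n"]
    parent, rank, name = read_ncbi(nodes, names)
    print(ncbi_lineage("2", parent, name))  # ['root', 'cellular organisms', 'Bacteria']

    gtdb = ["RS_GCF_000005845.2\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
            "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli\n"]
    edges, genome2species = read_gtdb(gtdb)
    print(edges["s__Escherichia coli"])  # g__Escherichia
```

The NCBI files give each node a numeric tax_id and a parent tax_id, while GTDB encodes the whole lineage as a string per genome, so the GTDB tree has to be reconstructed from consecutive ranks.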

Importing Genomes

Genome sequences are essential for creating or formatting outputs for metagenomic classification databases. These sequences should be organized in a single folder and can be sourced as follows:

  • Manually or with external software, for example:

    • NCBI genome sets.
    • GTDB representative genomes.
    • Custom genome files.
  • Through FlexTaxD's genome download functionality:

    • Using the NCBI datasets command-line tool (see NCBI's documentation for details).
    • From the GTDB representative genome tarball (available from the GTDB data download site).

Note: Older versions of FlexTaxD may use the ncbi-genome-download tool instead.
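
Before building a classification database, it can help to check which genomes the folder actually contains. The sketch below indexes FASTA files anywhere under a genome folder by their NCBI assembly accession (GCF_ or GCA_ followed by nine digits and a version). The suffix list and the function name are hypothetical, and the sketch assumes the accession appears in each file name, as it does in typical NCBI genome file names; it is not how FlexTaxD itself locates genomes.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Hypothetical list of FASTA suffixes; adjust to match your genome files.
FASTA_SUFFIXES = (".fna", ".fa", ".fasta", ".fna.gz", ".fa.gz", ".fasta.gz")
# NCBI assembly accessions look like GCF_000005845.2 or GCA_000001405.15.
ACCESSION = re.compile(r"GC[AF]_\d{9}\.\d+")


def index_genomes(folder: Path) -> dict[str, Path]:
    """Map assembly accession -> FASTA file, searching the folder recursively."""
    index: dict[str, Path] = {}
    for path in sorted(folder.rglob("*")):
        if not path.is_file() or not path.name.endswith(FASTA_SUFFIXES):
            continue
        match = ACCESSION.search(path.name)
        if match:  # files without an accession in their name are skipped
            index[match.group()] = path
    return index


if __name__ == "__main__":
    # Demonstrate on a throwaway folder with one NCBI-style file and one custom file.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "GCF_000005845.2_ASM584v2_genomic.fna").write_text(">seq1\nACGT\n")
        (folder / "custom_isolate.fasta").write_text(">seq2\nACGT\n")
        print(sorted(index_genomes(folder)))  # ['GCF_000005845.2']
```

Custom genome files usually lack an NCBI accession, so a real workflow would need its own naming convention for them; this sketch simply leaves them out of the index.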

Exporting the FlexTaxD Database

To compile the Fdb into a metagenomic read classification database or to create structured files for other software, use:

  • flextaxd-create with --create_db and --dbprogram arguments for compiling databases.
  • flextaxd with --outdir and --dump, and optionally --dbprogram arguments for exporting files.

For additional formatting options and further help, run flextaxd --help.
