Release 2.1.0 · Ecogenomics/GTDBTk

Major changes:

GTDB-Tk now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple class-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 55 GB. A manuscript describing this approach is in preparation. To continue using the full GTDB reference tree, use the --full_tree flag. The main change from v2.0.0 is that the split-tree approach now uses class-level trees instead of order-level trees, which resolves specific classification issues (see #383).
Genomes that cannot be assigned to a domain (e.g. genomes with no bacterial or archaeal markers, or genomes with no genes called by Prodigal) are now reported in gtdbtk.bac120.summary.tsv as 'Unclassified'.
Genomes filtered out during the alignment step are now reported in gtdbtk.bac120.summary.tsv or gtdbtk.ar53.summary.tsv as 'Unclassified Bacteria/Archaea'.
The --write_single_copy_genes flag is now available in the classify_wf and de_novo_wf workflows.
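
The new 'Unclassified' labels can be picked out of a summary file programmatically. The following is a minimal sketch, not part of GTDB-Tk: it assumes the summary file is tab-separated with `user_genome` and `classification` columns, and that 'Unclassified Bacteria/Archaea' means the two labels 'Unclassified Bacteria' and 'Unclassified Archaea'. The function name `find_unclassified` and the example rows are hypothetical.

```python
import csv
import io

# Assumed label spellings; check them against your own summary files.
UNCLASSIFIED_LABELS = {
    "Unclassified",
    "Unclassified Bacteria",
    "Unclassified Archaea",
}


def find_unclassified(handle):
    """Return {genome_id: label} for rows carrying an 'Unclassified' label.

    `handle` is any open text file in the (assumed) summary layout:
    tab-separated, with 'user_genome' and 'classification' columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["user_genome"]: row["classification"].strip()
        for row in reader
        if row["classification"].strip() in UNCLASSIFIED_LABELS
    }


# Made-up rows standing in for gtdbtk.bac120.summary.tsv.
example = io.StringIO(
    "user_genome\tclassification\n"
    "genome_a\td__Bacteria;p__Proteobacteria\n"
    "genome_b\tUnclassified\n"
    "genome_c\tUnclassified Bacteria\n"
)
unclassified = find_unclassified(example)
print(unclassified)
```

With a real run, the same function can be given `open("gtdbtk.bac120.summary.tsv", newline="")` in place of the in-memory example.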

Features:

(#392) --write_single_copy_genes flag available in workflows.
(#387) classify_wf now sets specific memory requirements depending on the classification approach.
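
For scripted runs, the workflow flags mentioned in these notes can be assembled into a command line. This is a rough sketch under stated assumptions rather than an official wrapper: the helper `build_classify_cmd` is hypothetical, the directory names are placeholders, and the full-tree option is spelled `--full_tree` here on the assumption that this matches the installed CLI (confirm with `gtdbtk classify_wf -h`).

```python
import shlex


def build_classify_cmd(genome_dir, out_dir, cpus=1,
                       full_tree=False, write_single_copy_genes=False):
    """Build an argument list for `gtdbtk classify_wf` (hypothetical helper)."""
    cmd = [
        "gtdbtk", "classify_wf",
        "--genome_dir", genome_dir,
        "--out_dir", out_dir,
        "--cpus", str(cpus),
    ]
    if full_tree:
        # Unsplit reference tree: the high-memory mode described in these notes.
        cmd.append("--full_tree")
    if write_single_copy_genes:
        cmd.append("--write_single_copy_genes")
    return cmd


# Placeholder paths; the default (split-tree) approach is used here.
cmd = build_classify_cmd("genomes/", "gtdbtk_out/", cpus=8,
                         write_single_copy_genes=True)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.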

Important:

This version is not backwards compatible with GTDB package R207 v1.
This version requires a new reference package.
