Releases: Ecogenomics/GTDBTk

2.2.6

23 Mar 03:48
890835f

Bug Fixes:

  • (#493) Fix issue with --full-tree flag (related to skipping ANI steps)

Minor changes:

2.2.5

16 Mar 04:11
9ec7071

Bug Fixes:

  • gtdbtk.json is now reset when the pipeline is re-run and the status of ani_screen is not 'complete'

Minor changes:

  • When using --genes, ANI steps are skipped and a warning informs the user that
    classification is less accurate.
  • (#486) Environment variables can be used in GTDBTK_DATA_PATH
  • is_consistent function in mash.py compares only the filenames, not the full paths
  • Add cutoff arguments to PfamScan (thanks to @AroneyS for the contribution)
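
The two path-related changes above can be illustrated with a short sketch. This is not GTDB-Tk's own code: `resolve_data_path` and `same_filenames` are hypothetical helpers, and the `REFDATA_ROOT` variable and the paths are made up for the example. It assumes that expansion behaves like Python's `os.path.expandvars` and that the consistency check only needs the file basenames to match.

```python
import os


def resolve_data_path(raw_path: str) -> str:
    """Expand $VAR / ${VAR} and ~ in a configured path (hypothetical helper)."""
    return os.path.expanduser(os.path.expandvars(raw_path))


def same_filenames(paths_a, paths_b) -> bool:
    """Compare two collections of paths by basename only, ignoring directories."""
    names_a = sorted(os.path.basename(p) for p in paths_a)
    names_b = sorted(os.path.basename(p) for p in paths_b)
    return names_a == names_b


if __name__ == "__main__":
    # Made-up variable and paths, for illustration only.
    os.environ["REFDATA_ROOT"] = "/srv/refdata"
    print(resolve_data_path("$REFDATA_ROOT/gtdbtk/release"))
    # Same file names under different directories still count as consistent.
    print(same_filenames(["/old/run/g1.fna"], ["/new/run/g1.fna"]))
```

Comparing basenames rather than full paths means a run directory can be moved or remounted without the check failing, at the cost of not noticing two different files that share a name.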

2.2.4

28 Feb 00:38

Bug Fixes:

  • (#475) If all genomes are classified using ANI, GTDB-Tk will skip the identify and align steps

Minor changes:

  • Add hidden '--skip_pplacer' flag to skip the pplacer step (useful for debugging)
  • Improve documentation
  • Convert stage_logger to a Singleton class
  • Use existing ANI results if available
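
As an illustration of the singleton change above, here is a minimal sketch of one common way to write a singleton in Python. It is not the GTDB-Tk implementation: the `StageLogger` class name, its `steps` attribute, and the `record` method are assumptions made for the example.

```python
class StageLogger:
    """Hypothetical stand-in: every instantiation returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.steps = []  # initialised once, on first creation
        return cls._instance

    def record(self, step_name: str) -> None:
        """Append a step name to the shared list."""
        self.steps.append(step_name)


if __name__ == "__main__":
    StageLogger().record("identify")
    StageLogger().record("align")
    # Both calls above wrote to the same shared instance.
    print(StageLogger().steps)
```

With this pattern, any module can call `StageLogger()` and reach the same state without passing the object around explicitly.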

2.2.3

15 Feb 04:20

Bug Fixes:

  • Fix prodigal_fail_counter issue

2.2.2

14 Feb 21:00

Bug Fixes:

  • (#471) Fix pplacer issue

2.2.1

14 Feb 11:37

  • (#470) Add missing Pydantic dependency.

2.2.0

14 Feb 00:33
3d7e936

Minor changes:

  • (#433) Added additional checks to ensure that the --outgroup_taxon cannot be set to a domain (root, de_novo_wf).
  • (#459/#462) Fix deprecated np.bool in prodigal_biolib.py. Special thanks to @neoformit for his contribution.
  • (#466) RED values are now rounded to 5 decimal places.
  • (#451) Extra checks have been added for when Prodigal fails.
  • (#448) Warning has been added when all the genomes are filtered out and not classified.
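
To make the rounding change (#466) concrete, here is a minimal sketch that assumes plain Python `round` semantics. `format_red` is a hypothetical helper, and GTDB-Tk itself may format the value differently.

```python
def format_red(red: float) -> float:
    """Round a relative evolutionary divergence (RED) value to 5 decimal places."""
    return round(red, 5)


if __name__ == "__main__":
    print(format_red(0.123456789))  # prints 0.12346
```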

Bug Fixes:

  • (#420) Fixed an issue where GTDB-Tk might hang when classifying TIGRFAM markers (identify, classify_wf, de_novo_wf). Special thanks to @lfenske-93 and @sjaenick for their contribution.
  • (#428) Fixed an issue where the --gtdbtk_classification_file would raise an error trying to read the classify summary (root, de_novo_wf).
  • (#439) Fix the pipeline when using protein files instead of nucleotide files; the symlink now uses an absolute path.

2.1.1

11 Jul 05:11
3656d71

  • (#399) Fix --genes options
  • (#400) Modify config.py file to resolve this issue
  • Updated documentation (including #410, documentation for iTOL)

2.1.0

12 May 03:23

Major changes:

  • GTDB-Tk now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple class-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 55 GB. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the --full-tree flag. This is the main change from v2.0.0: the split tree approach has been modified from order-level trees to class-level trees to resolve specific classification issues (see #383).
  • Genomes that cannot be assigned to a domain (e.g. genomes with no bacterial or archaeal markers or genomes with no genes called by Prodigal) are now reported in the gtdbtk.bac120.summary.tsv as 'Unclassified'.
  • Genomes filtered out during the alignment step are now reported in the gtdbtk.bac120.summary.tsv or gtdbtk.ar53.summary.tsv as 'Unclassified Bacteria/Archaea'.
  • The --write_single_copy_genes flag is now available in the classify_wf and de_novo_wf workflows.

Features:

  • (#392) --write_single_copy_genes flag available in workflows.
  • (#387) specific memory requirements set in classify_wf depending on the classification approach.

Important

This version is not backwards compatible with GTDB package R207 v1.
This version requires a new reference package.

2.0.0

08 Apr 01:54

Major changes:

  • GTDB-Tk now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple order-level subtrees. This reduces the memory requirements of GTDB-Tk from 320 GB of RAM when using the full GTDB R07-RS207 reference tree to approximately 35 GB. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the --full-tree flag.
  • Archaeal classification now uses a refined set of 53 archaeal-specific marker genes based on the recent publication by Dombrowski et al., 2020. This set of archaeal marker genes is now used by GTDB for curating the archaeal taxonomy.
  • All directories containing intermediate results are now removed by default at the end of the classify_wf and de_novo_wf pipelines. If you wish to retain these intermediate files, use the --keep_intermediates flag.
  • All MSA files produced by the align step are now compressed with gzip.
  • The classification summary and failed genomes files are now the only files linked in the root directory of classify_wf.

Features:

  • convert_to_itol to convert trees into iTOL format (#373)
  • Output FASTA files are compressed by default (#369)
  • Intermediate files will be removed by default when using classify/de-novo workflows unless specified by --keep_intermediates (#369)
  • Add --genes flag for Error (#362)
  • A warning will be displayed if pplacer fails to place a genome (#360 / #356)

Important

  • This version is not backwards compatible with GTDB release 202.
  • This version requires a new reference package.