README.md: 8 additions & 8 deletions
@@ -7,8 +7,6 @@
[](https://hub.docker.com/r/ecogenomic/gtdbtk)
<b>GTDB-Tk v2.1.0+ requires an updated reference package ([R207_v2](https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_v2_data.tar.gz)), [read more](https://ecogenomics.github.io/GTDBTk/installing/index.html#gtdb-tk-reference-data).</b>
-
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based
on the Genome Database Taxonomy ([GTDB](https://gtdb.ecogenomic.org/)). It is designed to work with recent advances that
allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples.
@@ -39,13 +37,15 @@ Documentation for GTDB-Tk can be found [here](https://ecogenomics.github.io/GTDB

## ✨ New Features

-GTDB-Tk v2.1.0 includes the following new features:
-- GTDB-Tk now uses a **divide-and-conquer** approach where the bacterial reference tree is split into multiple **class**-level subtrees. This reduces the memory requirements of GTDB-Tk from **320 GB** of RAM when using the full GTDB R07-RS207 reference tree to approximately **55 GB**. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the `--full-tree` flag.
-This is the main change from v2.0.0. The split tree approach has been modified from order-level trees to class-level trees to resolve specific classification issues (see [#383](https://github.com/Ecogenomics/GTDBTk/issues/383)).
-- Genomes that cannot be assigned to a domain (e.g. genomes with no bacterial or archaeal markers or genomes with no genes called by Prodigal) are now reported in the `gtdbtk.bac120.summary.tsv` as 'Unclassified'
-- Genomes filtered out during the alignment step are now reported in the `gtdbtk.bac120.summary.tsv` or `gtdbtk.ar53.summary.tsv` as 'Unclassified Bacteria/Archaea'
-- `--write_single_copy_genes` flag is now available in the `classify_wf` and `de_novo_wf` workflows.
+GTDB-Tk v2.2.0+ includes the following new features:
+- GTDB-Tk `classify` and `classify_wf` have changed in version 2.2.0+. There is now an ANI classification stage (`ANI screen`) that precedes classification by placement in a reference tree.
+- **This is now the default behavior for `classify` and `classify_wf`.**
+- In `classify`, user genomes are first compared against a Mash database composed of all GTDB representative genomes, and genome pairs of sufficient similarity are then processed by FastANI. User genomes classified to a GTDB representative based on FastANI results are not run through pplacer.
+- In the `classify_wf` workflow, genomes are classified using Mash and FastANI before executing the identify step. User genomes classified with FastANI are not run through the remainder of the pipeline (identify, align, classify).
+- To classify genomes without the additional `ani_screen` step, use the `--skip_ani_screen` flag.

+## 📈 Performance
+Using the ANI screen can reduce computation by more than 50%, although the benefit depends on the set of input genomes. A set of input genomes consisting primarily of new species will not benefit from the ANI screen as much as a set of genomes that are largely assigned to GTDB species clusters. In the latter case, the ANI screen reduces the number of genomes that need to be classified by pplacer, which reduces computation time substantially (between 25% and 60% in our testing).
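The two-stage ANI screen described in the README changes above can be sketched roughly as follows. This is a minimal illustration, not GTDB-Tk's implementation: the function name `ani_screen`, the `ScreenResult` type, the thresholds `MASH_MAX_DIST` and `ANI_MIN`, and the toy distance tables are all assumptions made for the example. In a real run the distances and ANI values would come from Mash and FastANI.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; not GTDB-Tk's actual values.
MASH_MAX_DIST = 0.1
ANI_MIN = 95.0


@dataclass
class ScreenResult:
    classified: dict    # genome -> representative assigned by the ANI check
    to_placement: list  # genomes that still need placement in the tree


def ani_screen(genomes, mash_dist, ani):
    """Cheap sketch-distance prefilter followed by ANI verification.

    mash_dist and ani map (genome, representative) -> value. They are
    precomputed toy numbers here, standing in for Mash and FastANI output.
    """
    classified, to_placement = {}, []
    for g in genomes:
        # Stage 1: keep only representatives that are close by Mash distance.
        candidates = [rep for (q, rep), d in mash_dist.items()
                      if q == g and d <= MASH_MAX_DIST]
        # Stage 2: verify the candidates with ANI and keep the best hit.
        verified = [(ani[(g, rep)], rep) for rep in candidates
                    if ani.get((g, rep), 0.0) >= ANI_MIN]
        if verified:
            classified[g] = max(verified)[1]
        else:
            to_placement.append(g)  # falls through to tree placement
    return ScreenResult(classified, to_placement)


mash_dist = {
    ("g1", "repA"): 0.02, ("g1", "repB"): 0.30,
    ("g2", "repA"): 0.25,
    ("g3", "repB"): 0.05,
}
ani = {("g1", "repA"): 98.7, ("g3", "repB"): 91.0}

result = ani_screen(["g1", "g2", "g3"], mash_dist, ani)
print(result.classified)    # genomes resolved by the screen
print(result.to_placement)  # genomes that still go through placement
```

In this toy input only `g1` is resolved by the screen, so two of three genomes still need placement. This mirrors the point made under Performance: the saving scales with the share of input genomes that already match a GTDB representative.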
docs/src/changelog.rst: 32 additions & 0 deletions
@@ -2,6 +2,38 @@
Change log
==========

+2.2.0
+-----
+
+Minor changes:
+
+* (`#433 <https://github.com/Ecogenomics/GTDBTk/issues/433>`_) Added additional checks to ensure that the `--outgroup_taxon` cannot be set to a domain (`root`, `de_novo_wf`).
+* (`#459 <https://github.com/Ecogenomics/GTDBTk/issues/459>`_ / `#462 <https://github.com/Ecogenomics/GTDBTk/issues/462>`_) Fix deprecated np.bool in prodigal_biolib.py. Special thanks to @neoformit for his contribution.
+* (`#466 <http://github.com/Ecogenomics/GTDBTk/issues/466>`_) The RED value is now rounded to 5 decimal places.
+* (`#451 <http://github.com/Ecogenomics/GTDBTk/issues/451>`_) Extra checks have been added for when Prodigal fails.
+* (`#448 <http://github.com/Ecogenomics/GTDBTk/issues/448>`_) A warning has been added for when all the genomes are filtered out and not classified.
+
+Bug Fixes:
+
+* (`#420 <https://github.com/Ecogenomics/GTDBTk/issues/420>`_) Fixed an issue where GTDB-Tk might hang when classifying TIGRFAM markers (`identify`, `classify_wf`, `de_novo_wf`). Special thanks to @lfenske-93 and @sjaenick for their contribution.
+* (`#428 <https://github.com/Ecogenomics/GTDBTk/issues/428>`_) Fixed an issue where the `--gtdbtk_classification_file` would raise an error trying to read the `classify` summary (`root`, `de_novo_wf`).
+* (`#439 <https://github.com/Ecogenomics/GTDBTk/issues/439>`_) Fix the pipeline when using protein files instead of nucleotide files. The symlink now uses an absolute path.
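The changelog entry for #439 only says that the symlink now uses an absolute path. As a general illustration of why that matters (whether this was the exact failure in GTDB-Tk is an assumption), the sketch below shows that a relative symlink target is resolved against the directory containing the link, not the current working directory. All file and directory names are hypothetical, and the example assumes a POSIX system where creating symlinks needs no special privileges.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tmp = os.path.realpath(tmp)
    src_dir = os.path.join(tmp, "input")
    out_dir = os.path.join(tmp, "out", "genes")
    os.makedirs(src_dir)
    os.makedirs(out_dir)
    with open(os.path.join(src_dir, "genome.faa"), "w") as fh:
        fh.write(">p1\nMKT\n")

    old_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        # A path as a user might pass it: relative to the working directory.
        rel_target = os.path.join("input", "genome.faa")

        # Relative target: resolved relative to out/genes/, so it dangles.
        broken = os.path.join(out_dir, "broken.faa")
        os.symlink(rel_target, broken)

        # Absolute target: resolves regardless of where the link lives.
        fixed = os.path.join(out_dir, "fixed.faa")
        os.symlink(os.path.abspath(rel_target), fixed)

        broken_ok = os.path.exists(broken)  # follows the link
        fixed_ok = os.path.exists(fixed)
    finally:
        os.chdir(old_cwd)

print(broken_ok, fixed_ok)
```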
+
+
+
+
+2.1.1
+-----
+
+Documentation:
+
+* (`#410 <https://github.com/Ecogenomics/GTDBTk/issues/410>`_) Add documentation for `convert_to_itol`
+
+Bug Fixes:
+
+* (`#399 <https://github.com/Ecogenomics/GTDBTk/issues/399>`_) Fix `--genes` option attempting to create a directory.
+* (`#400 <https://github.com/Ecogenomics/GTDBTk/issues/400>`_) Updated contig.py to fix inconsistent pplacer paths causing the program to crash.
[2022-04-11 12:07:08] INFO: 2 genome(s) have been classified using FastANI and pplacer.
-[2022-04-11 12:07:08] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode.
[2023-02-08 12:57:40] INFO: 0 genome(s) have been classified using FastANI and pplacer.
+[2023-02-08 12:57:40] WARNING: 1 of 3 genome has a warning (see summary file).
+[2023-02-08 12:57:40] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode.
docs/src/commands/classify_wf.rst: 5 additions & 1 deletion
@@ -12,7 +12,11 @@ For arguments and output files, see each of the individual steps:
* :ref:`commands/align`
* :ref:`commands/classify`

-The classify workflow consists of three steps: ``identify``, ``align``, and ``classify``.
+The classify workflow consists of four steps: ``ani_screen``, ``identify``, ``align``, and ``classify``.
+
+The ``ani_screen`` step compares user genomes against a `Mash <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0997-x>`_ database composed of all GTDB representative genomes,
+then verifies the best Mash hits using `FastANI <https://www.nature.com/articles/s41467-018-07641-9>`_. User genomes classified with FastANI are not run through the rest of the pipeline (``identify``, ``align``, ``classify``)
+and are reported in the summary file.
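As a rough sketch of how the four-step workflow and the `--skip_ani_screen` flag relate, the helper below lists the steps that run and assembles a matching command line. Only `classify_wf`, the step names, and `--skip_ani_screen` come from the text above. The helper names and the `--genome_dir` / `--out_dir` options are assumptions for illustration, and other options may be required in practice, so check the `classify_wf` help output before relying on it.

```python
import shlex

# Step names as listed in the classify_wf documentation above.
ALL_STEPS = ["ani_screen", "identify", "align", "classify"]


def workflow_steps(skip_ani_screen=False):
    """Return the steps that run, dropping ani_screen when it is skipped."""
    if skip_ani_screen:
        return [s for s in ALL_STEPS if s != "ani_screen"]
    return list(ALL_STEPS)


def build_classify_wf_cmd(genome_dir, out_dir, skip_ani_screen=False):
    """Assemble an argument list; the option set here is an assumption."""
    cmd = ["gtdbtk", "classify_wf",
           "--genome_dir", genome_dir,
           "--out_dir", out_dir]
    if skip_ani_screen:
        cmd.append("--skip_ani_screen")
    return cmd


steps = workflow_steps(skip_ani_screen=True)
cmd = build_classify_wf_cmd("genomes", "classify_out", skip_ani_screen=True)
print(steps)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting problems.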

The ``identify`` step calls genes using `Prodigal <http://compbio.ornl.gov/prodigal/>`_,
and uses HMM models and the `HMMER <http://hmmer.org/>`_ package to identify the