Commit 3656d71

Merge pull request #414 from Ecogenomics/staging
Staging to Master for 2.1.1
2 parents c963d13 + d1085a1 commit 3656d71

18 files changed (+250 additions, -58 deletions)

docs/requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -4,3 +4,6 @@ sphinx-rtd-theme ~= 0.5.0
 recommonmark ~= 0.7.0
 sphinx-sitemap ~= 2.2.0
 nbsphinx ~= 0.8.0
+matplotlib ~= 3.5.2
+linuxdoc == 20211220
+jupyter ~= 1.0.0
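The new pins use the PEP 440 compatible-release operator: `matplotlib ~= 3.5.2` accepts any 3.5.x release at or above 3.5.2, while `linuxdoc == 20211220` pins one exact version. The following is a minimal sketch of the `~=` rule for purely numeric versions. The helper names are hypothetical, and it ignores pre-, post- and local release segments, which a real installer such as pip also handles.

```python
from typing import Tuple


def _release(version: str) -> Tuple[int, ...]:
    """Split a purely numeric version such as '3.5.2' into (3, 5, 2)."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(version: str, spec: str) -> bool:
    """Return True if `version` satisfies `~= spec` (numeric releases only).

    Per PEP 440, '~= 3.5.2' is equivalent to '>= 3.5.2, == 3.5.*'.
    """
    v, s = _release(version), _release(spec)
    if len(s) < 2:
        raise ValueError("'~=' requires at least two release components")
    # Zero-pad both sides so that, e.g., 3.5.2 and 3.5.2.0 compare as equal.
    width = max(len(v), len(s))
    v_padded = v + (0,) * (width - len(v))
    s_padded = s + (0,) * (width - len(s))
    # '3.5.2' -> (3, 5): the leading components that must not change.
    prefix = s[:-1]
    return v_padded >= s_padded and v_padded[: len(prefix)] == prefix


if __name__ == "__main__":
    for candidate in ("3.5.1", "3.5.2", "3.5.9", "3.6.0"):
        print(candidate, is_compatible(candidate, "3.5.2"))
```

Run as a script, this prints `False`, `True`, `True` and `False` for the four candidates: 3.5.1 is below the floor and 3.6.0 leaves the 3.5 series.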

docs/src/announcements.rst

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ Announcements
 
 
 GTDB-Tk 2.1.0 available
--------------------
+-----------------------
 
 *May 11, 2022*
 
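This change lengthens the section underline from 19 to 23 characters, so it is at least as long as the 23-character title `GTDB-Tk 2.1.0 available`. Otherwise docutils reports "Title underline too short". A rough, hypothetical pre-build check for this could look like the sketch below. It only inspects underline-style titles and ignores overlines and the finer points of docutils' section parsing.

```python
import re
from typing import List

# A candidate RST adornment line: one ASCII punctuation character, repeated.
_ADORNMENT = re.compile(r"^([!-/:-@\[-`{-~])\1*$")


def short_underlines(text: str) -> List[str]:
    """Return section titles whose underline is shorter than the title text."""
    lines = [line.rstrip() for line in text.splitlines()]
    flagged = []
    for title, underline in zip(lines, lines[1:]):
        if (
            title
            and not _ADORNMENT.match(title)      # the title itself is text
            and _ADORNMENT.match(underline)      # the next line is an adornment
            and len(underline) < len(title)      # ...but too short
        ):
            flagged.append(title)
    return flagged


if __name__ == "__main__":
    title = "GTDB-Tk 2.1.0 available"
    old = f"{title}\n{'-' * 19}\n\n*May 11, 2022*\n"
    new = f"{title}\n{'-' * len(title)}\n\n*May 11, 2022*\n"
    print(short_underlines(old))  # ['GTDB-Tk 2.1.0 available']
    print(short_underlines(new))  # []
```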

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+.. _commands/convert_to_itol:
+
+convert_to_itol
+===============
+
+The `convert_to_itol` command will remove internal labels from Newick tree, making it suitable for visualization in `iTOL <http://itol.embl.de/>`_.
+
+Arguments
+---------
+
+.. argparse::
+:module: gtdbtk.cli
+:func: get_main_parser
+:prog: gtdbtk
+:path: convert_to_itol
+:nodefaultconst:
+
+Example
+-------
+
+Input
+^^^^^
+
+
+.. code-block:: bash
+
+gtdbtk convert_to_itol --input some_tree.tree --output itol.tree
+
+
+Output
+^^^^^^
+
+
+.. code-block:: text
+
+[2022-06-30 18:44:54] INFO: GTDB-Tk v2.1.0
+[2022-06-30 18:44:54] INFO: gtdbtk convert_to_itol --input /tmp/decorated.tree --output new.tree
+[2022-06-30 18:44:54] INFO: Using GTDB-Tk reference data version r207: /gtdbtk-data
+[2022-06-30 18:44:54] INFO: Convert GTDB-Tk tree to iTOL format
+[2022-06-30 18:44:54] INFO: Done.
+
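The new page says that `convert_to_itol` removes internal-node labels from a Newick tree. The sketch below illustrates what that means. It is not GTDB-Tk's implementation, which this commit does not show. It deletes any label that directly follows a closing parenthesis, including quoted labels that contain colons (the label `'100.0:p__Example'` is hypothetical), and keeps leaf names and branch lengths. The function name is also hypothetical.

```python
import re

# An internal-node label follows a ')' and is either a quoted string (which may
# contain ':' and uses '' for a literal quote) or unquoted text running up to
# the next ':', ',', ';', parenthesis, bracket or whitespace.
_INTERNAL_LABEL = re.compile(r"\)(?:'(?:[^']|'')*'|[^'\s:,;()\[\]]+)")


def strip_internal_labels(newick: str) -> str:
    """Return `newick` with internal-node labels removed (sketch only)."""
    return _INTERNAL_LABEL.sub(")", newick)


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)'100.0:p__Example':0.05,C:0.3)root;"
    print(strip_internal_labels(tree))  # ((A:0.1,B:0.2):0.05,C:0.3);
```

A regular expression keeps the sketch short. A full Newick parser would be the more robust choice for trees that carry bracketed comments or other extensions.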

docs/src/commands/index.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ Below is a list of all GTDB-Tk command line options:
 check_install
 classify
 classify_wf
+convert_to_itol
 de_novo_wf
 decorate
 export_msa

docs/src/conf.py

Lines changed: 2 additions & 2 deletions
@@ -28,8 +28,8 @@
 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
-extensions = ['sphinxarg.ext', 'sphinx.ext.napoleon', 'sphinx.ext.autodoc',
-'recommonmark', 'sphinx_sitemap', 'nbsphinx']
+extensions = ['sphinxarg.ext', 'sphinx.ext.napoleon', 'sphinx.ext.autodoc', 'linuxdoc.rstFlatTable',
+'recommonmark', 'sphinx_sitemap', 'nbsphinx','matplotlib.sphinxext.plot_directive']
 
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
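The updated `extensions` list loads two modules that depend on the new `requirements.txt` entries. The first is `linuxdoc.rstFlatTable`, which provides the `flat-table` directive used in `accuracy.rst`. The second is `matplotlib.sphinxext.plot_directive`. A missing package only surfaces once Sphinx starts, so a pre-flight check like the sketch below can catch it earlier. The helper is hypothetical and not part of the GTDB-Tk repository. It only checks that each module can be located on the current Python path.

```python
import importlib.util
from typing import Iterable, List

# Extension modules listed in docs/src/conf.py after this commit.
EXTENSIONS = [
    "sphinxarg.ext",
    "sphinx.ext.napoleon",
    "sphinx.ext.autodoc",
    "linuxdoc.rstFlatTable",
    "recommonmark",
    "sphinx_sitemap",
    "nbsphinx",
    "matplotlib.sphinxext.plot_directive",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located on the current path."""
    missing = []
    for name in names:
        try:
            # find_spec imports the parent packages of a dotted name, so a
            # missing top-level package raises ModuleNotFoundError rather
            # than returning None.
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_modules(EXTENSIONS) or "all Sphinx extensions found")
```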

docs/src/files/summary.tsv.rst

Lines changed: 2 additions & 2 deletions
@@ -12,11 +12,11 @@ Classifications provided by the GTDB-Tk are in the files \<prefix>.bac120.summar
 * fastani_reference_radius: indicates the species-specific ANI circumscription radius of the reference genomes used to determine if a query genome should be classified to the same species as the reference.
 * fastani_taxonomy: indicates the GTDB taxonomy of the above reference genome.
 * fastani_ani: indicates the ANI between the query and above reference genome.
-* fastani_af: indicates the AF between the query and above reference genome.
+* fastani_af: indicates the alignment fraction (AF) between the query and above reference genome.
 * closest_placement_reference: indicates the accession number of the reference genome when a genome is placed on a terminal branch.
 * closest_placement_taxonomy: indicates the GTDB taxonomy of the above reference genome.
 * closest_placement_ani: indicates the ANI between the query and above reference genome.
-* closest_placement_af: indicates the AF between the query and above reference genome.
+* closest_placement_af: indicates the alignment fraction (AF) between the query and above reference genome.
 * pplacer_taxonomy: indicates the pplacer taxonomy of the query genome.
 * classification_method: indicates the rule used to classify the genome. This field will be one of: i) ANI, indicating a species assignement was based solely on the calculated ANI and AF with a reference genome; ii) ANI/Placement, indicating a species assignment was made based on both ANI and the placement of the genome in the reference tree; iii) taxonomic classification fully defined by topology, indicating that the classification could be determine based solely on the genome's position in the reference tree; or iv) taxonomic novelty determined using RED, indicating that the relative evolutionary divergence (RED) and placement of the genome in the reference tree were used to determine the classification.
 * note: provides additional information regarding the classification of the genome. Currently this field is only filled out when a species determination is made and indicates if the placement of the genome in the reference tree and closest reference according to ANI/AF are the same (congruent) or different (incongruent).
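Each row of the bacterial summary file (`<prefix>.bac120.summary.tsv`) records the rule used in its `classification_method` column. Tallying that column therefore gives a quick overview of a run. The sketch below assumes the file is tab-separated with a header row containing the column names listed above. The function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def tally_methods(handle: TextIO) -> Dict[str, int]:
    """Count genomes per `classification_method` value in a summary TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    return dict(Counter(row["classification_method"] for row in reader))


if __name__ == "__main__":
    # Hypothetical excerpt with a single column; real files contain many more.
    sample = (
        "classification_method\n"
        "ANI\n"
        "ANI/Placement\n"
        "taxonomic novelty determined using RED\n"
        "ANI\n"
    )
    print(tally_methods(io.StringIO(sample)))
    # Against a real output file (the path is illustrative):
    # with open("gtdbtk.bac120.summary.tsv", newline="") as fh:
    #     print(tally_methods(fh))
```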

docs/src/index.rst

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ We encourage you to cite GTDB-Tk and the third-party dependencies as described i
 :caption: Running GTDB-Tk
 :maxdepth: 1
 
+performance/index
 commands/index
 files/index
 examples/classify_wf

docs/src/installing/index.rst

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ Hardware requirements
 - ~65 GB
 - ~1 hour / 1,000 genomes @ 64 CPUs
 * - Bacteria
-- ~320 GB ( 55GB for divide-and-conquer)
+- ~55GB (320 GB when using --full_tree)
 - ~65 GB
 - ~1 hour / 1,000 genomes @ 64 CPUs
 
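After this change, the bacterial memory figure reflects the default divide-and-conquer mode (~55 GB). The ~320 GB figure applies only when `--full_tree` is passed. The sketch below turns the table row into a back-of-the-envelope planner. It assumes runtime scales linearly from the quoted ~1 hour per 1,000 genomes at 64 CPUs, which is an approximation rather than a documented guarantee. The function and constant names are hypothetical.

```python
# Approximate figures from the Bacteria row of the hardware table.
BACTERIA_MEMORY_GB = 55              # default divide-and-conquer placement
BACTERIA_FULL_TREE_MEMORY_GB = 320   # when running with --full_tree
HOURS_PER_1000_GENOMES_64_CPUS = 1.0


def bacterial_memory_gb(full_tree: bool) -> int:
    """Approximate RAM needed to classify bacterial genomes."""
    return BACTERIA_FULL_TREE_MEMORY_GB if full_tree else BACTERIA_MEMORY_GB


def estimated_hours(n_genomes: int) -> float:
    """Rough wall-clock estimate at 64 CPUs, assuming linear scaling."""
    return n_genomes / 1000 * HOURS_PER_1000_GENOMES_64_CPUS


if __name__ == "__main__":
    print(bacterial_memory_gb(full_tree=False), bacterial_memory_gb(full_tree=True))  # 55 320
    print(estimated_hours(25_000))  # 25.0
```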

docs/src/performance/accuracy.rst

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+.. _performance/Accuracy:
+
+
+Accuracy
+========
+
+The similarity of GTDB-Tk v1 and v2 classifications were first assessed using 16,710 bacterial genomes from the GEMs dataset (Nayfach et al., 2021) that represent novel taxa relative to GTDB R07-RS207.
+| Only 12 genomes (0.07%) did not have identical classifications between GTDB-Tk v1 and the divide-and-conquer approach used in GTDB-Tk v2.
+| The majority of incongruence was due to genomes being over- (6 genomes) or under-classified (4 genomes) by a single taxonomic rank. Only 2 genomes had conflicting taxonomic assignments, and these were both relatively poor-quality genomes assigned as new classes in alternative phyla.
+
+.. flat-table:: Table 1. Novelty of GEM genomes relative to GTDB R07-RS207 based on GTDB-Tk v1 classifications.
+:header-rows: 2
+
+* -
+-
+- :cspan:`4` GTDB-Tk v2 classifications relative to GTDB-Tk v1 classifications
+* - Toxon Novelty
+- No genomes
+- Congruent
+- Conflict
+- Underclassified
+- Overclassified
+* - Novel phylum
+- 3
+- 2
+- 0
+- 0
+- 1
+* - Novel class
+- 42
+- 35
+- 2
+- 2
+- 2
+* - Novel order
+- 144
+- 143
+- 0
+- 0
+- 1
+* - Novel family
+- 543
+- 540
+- 0
+- 1
+- 2
+* - Novel genus
+- 3,222
+- 3,219
+- 0
+- 1
+- 0
+* - Novel species
+- 12,756
+- 12,756
+- 0
+- 0
+- 0
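The headline figures on the new Accuracy page can be re-derived from Table 1. The sketch below transcribes the table's counts and sums the Conflict, Underclassified and Overclassified columns. This reproduces the 2, 4 and 6 genomes quoted in the text and the 12 incongruent genomes (0.07% of 16,710). The dictionary layout and function name are illustrative choices.

```python
# Counts transcribed from Table 1:
# rank -> (genomes, congruent, conflict, underclassified, overclassified)
TABLE_1 = {
    "Novel phylum": (3, 2, 0, 0, 1),
    "Novel class": (42, 35, 2, 2, 2),
    "Novel order": (144, 143, 0, 0, 1),
    "Novel family": (543, 540, 0, 1, 2),
    "Novel genus": (3222, 3219, 0, 1, 0),
    "Novel species": (12756, 12756, 0, 0, 0),
}


def column_total(index: int) -> int:
    """Sum one column of Table 1 across all novelty ranks."""
    return sum(row[index] for row in TABLE_1.values())


if __name__ == "__main__":
    total = column_total(0)
    conflict, under, over = (column_total(i) for i in (2, 3, 4))
    incongruent = conflict + under + over
    print(total, conflict, under, over, incongruent)  # 16710 2 4 6 12
    print(f"{incongruent / total:.2%}")  # 0.07%
```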

docs/src/performance/index.rst

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+.. _performance:
+
+############################
+Performance and Accuracy
+############################
+
+
+.. toctree::
+:maxdepth: 1
+
+performance
+accuracy
