Commit baf61c0

📝 Update databases docs
📝 Fix typos in databases docs
🔥📝 Remove misleading note on autometa being packaged with markers
📝 Add code block template commands to download/format markers
1 parent 70fd052 commit baf61c0

1 file changed

+35
-17
lines changed

docs/source/databases.rst

Lines changed: 35 additions & 17 deletions
@@ -2,10 +2,31 @@
22
Databases
33
=========
44

5+
If you are running Autometa for the first time you will need to download and format a few databases.
6+
You may do this manually or using a few Autometa helper scripts. If you would like to use Autometa's
7+
scripts for this, you will first need to install Autometa (See :ref:`Installation`).
8+
9+
Each of the following sections uses a pair of commands: the first sets the database path in Autometa's configuration,
10+
and the second downloads or updates the database at that path.
11+
512
Markers
613
#######
714

8-
Autometa comes packaged with the necessary markers files. Links to these markers files and their associated cutoff values are below:
15+
.. code-block:: bash
16+
17+
# Point Autometa to where you would like your markers database directory
18+
autometa-config \
19+
--section databases --option markers \
20+
--value <path/to/your/markers/database/directory>
21+
22+
# Update your markers database directory
23+
autometa-update-databases --update-markers
24+
25+
.. alert::
26+
27+
Do NOT use a trailing slash, e.g. NO ``/`` for the database directory paths!
28+
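One way to guard against the trailing-slash problem is to clean the path in the shell before passing it to ``autometa-config``. The following is a minimal sketch, not part of Autometa: ``strip_trailing_slash`` is a hypothetical helper name and the example directory is an assumption.

```shell
# Hypothetical helper (not part of Autometa): strip trailing slashes from a path.
strip_trailing_slash() {
    path="$1"
    # Drop one trailing "/" at a time, but never reduce the root directory "/" to an empty string.
    while [ "${#path}" -gt 1 ] && [ "${path%/}" != "$path" ]; do
        path="${path%/}"
    done
    printf '%s\n' "$path"
}

# Example with an assumed location; substitute your own markers directory.
markers_dir="$(strip_trailing_slash "$HOME/autometa/databases/markers/")"
echo "Cleaned markers path: $markers_dir"
```

The cleaned ``$markers_dir`` can then be supplied as the ``--value`` argument of the ``autometa-config`` command shown above.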
29+
Links to these markers files and their associated cutoff values are below:
930

1031
- bacteria single-copy-markers - `link <https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.hmm>`__
1132
- bacteria single-copy-markers cutoffs - `link <https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.cutoffs>`__
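If you prefer to fetch the markers manually, the two bacteria files linked above could be retrieved with a small shell function. This is a sketch under stated assumptions: ``marker_url`` and ``download_bacteria_markers`` are hypothetical helper names, ``curl`` and HMMER are assumed to be installed, and indexing the HMM file with ``hmmpress`` is assumed to be the required formatting step.

```shell
# Base URL taken from the markers links listed above.
MARKERS_BASE_URL="https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers"

# Hypothetical helper: print the download URL for one markers file name.
marker_url() {
    printf '%s/%s\n' "$MARKERS_BASE_URL" "$1"
}

# Hypothetical helper: download the bacteria markers into the directory given as $1.
download_bacteria_markers() {
    outdir="$1"
    mkdir -p "$outdir" || return 1
    for name in bacteria.single_copy.hmm bacteria.single_copy.cutoffs; do
        # -f fails on HTTP errors, -L follows redirects, -o sets the output file.
        curl -fL -o "$outdir/$name" "$(marker_url "$name")" || return 1
    done
    # Assumption: the HMM file is indexed with HMMER's hmmpress before use.
    hmmpress -f "$outdir/bacteria.single_copy.hmm"
}

# Usage (requires network access, curl and HMMER):
#   download_bacteria_markers <path/to/your/markers/database/directory>
```

Nothing is downloaded until ``download_bacteria_markers`` is called explicitly, so the functions can be defined safely in a setup script.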
@@ -15,10 +36,6 @@ Autometa comes packaged with the necessary markers files. Links to these markers
1536
NCBI
1637
####
1738

18-
If you are running Autometa for the first time you will need to download the NCBI databases.
19-
You may do this manually or using a few Autometa helper scripts. If you would like to use Autometa's
20-
scripts for this, you will first need to download Autometa (See :ref:`Installation`).
21-
2239
.. code-block:: bash
2340
2441
# First configure where you want to download the NCBI databases
@@ -31,7 +48,7 @@ scripts for this, you will first need to download Autometa (See :ref:`Installati
3148
3249
.. note::
3350

34-
You can check the default config paths using ``autometa-config --print``.
51+
You can check the config paths using ``autometa-config --print``.
3552

3653
See ``autometa-update-databases -h`` and ``autometa-config -h`` for the full list of options.
3754

@@ -41,7 +58,7 @@ The previous command will download the following NCBI databases:
4158
- `ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz <https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz>`_
4259
- prot.accession2taxid.gz
4360
- `ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz <https://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz>`_
44-
- nodes.dmp, names.dmp and merged.dmp - Found within
61+
- nodes.dmp, names.dmp, merged.dmp and delnodes.dmp - Found within
4562
- `ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz <https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz>`_
4663

4764
After these files are downloaded, the ``taxdump.tar.gz`` tarball's files are extracted and the non-redundant protein database (``nr.gz``)
@@ -50,11 +67,8 @@ is formatted as a diamond database (i.e. ``nr.dmnd``). This will significantly s
5067
Genome Taxonomy Database (GTDB)
5168
###############################
5269

53-
If you would like to incorporate the benefits of using the Genome Taxonomy Database.
54-
You may do this manually or using a few Autometa helper scripts. If you would like to use Autometa's
55-
scripts for this, you will first need to install Autometa (See :ref:`Installation`).
56-
57-
You can either run the following script or manually download the respective databases.
70+
If you would like to incorporate the benefits of using the Genome Taxonomy Database,
71+
you can either run the following script or manually download the respective databases.
5872

5973
.. code-block:: bash
6074
@@ -87,16 +101,20 @@ The previous command will download the following GTDB databases and format the `
87101
- `gtdb-taxdump.tar.gz <https://github.com/shenwei356/gtdb-taxdump/releases/latest/download/gtdb-taxdump.tar.gz>`_
88102

89103

90-
Once unzipped `gtdb-taxdump.tar.gz` will have the taxdump files of all the respective GTDB releases. Make sure that the release you use is in line with the `gtdb_proteins_aa_reps.tar.gz` release version. It's better to always use the latest version.
104+
Once extracted, `gtdb-taxdump.tar.gz` contains the taxdump files for all of the GTDB releases.
105+
Make sure that the release you use is in line with the `gtdb_proteins_aa_reps.tar.gz` release version.
106+
It's better to always use the latest version.
91107

92-
All the taxonomy files for a specific taxonomy database should be in a single directory. You can now copy the taxdump files of the desired release version in the sample directory as `gtdb.dmnd`
108+
All the taxonomy files for a specific taxonomy database should be in a single directory.
109+
You can now copy the taxdump files of the desired release version into the same directory as `gtdb.dmnd`.
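The copy step could be scripted along the following lines. This is a hedged sketch rather than an Autometa command: ``copy_gtdb_taxdump`` is a hypothetical helper, and it assumes the extracted tarball holds one sub-directory per GTDB release. Check the actual layout of your download before relying on it.

```shell
# Hypothetical helper: copy the taxdump files of one GTDB release next to gtdb.dmnd.
#   $1: directory created by extracting gtdb-taxdump.tar.gz
#   $2: release sub-directory name (assumed layout: one folder per release)
#   $3: directory that holds gtdb.dmnd
copy_gtdb_taxdump() {
    taxdump_root="$1"
    release="$2"
    dbdir="$3"
    release_dir="$taxdump_root/$release"
    if [ ! -d "$release_dir" ]; then
        echo "Release directory not found: $release_dir" >&2
        return 1
    fi
    mkdir -p "$dbdir" || return 1
    # Copy every regular file of the release; fail if the release contains none.
    found=0
    for f in "$release_dir"/*; do
        [ -f "$f" ] || continue
        cp "$f" "$dbdir/" || return 1
        found=1
    done
    [ "$found" -eq 1 ]
}

# Usage:
#   copy_gtdb_taxdump <path/to/extracted/gtdb-taxdump> <release> <path/to/gtdb/database/directory>
```

The function returns a non-zero status when the requested release directory is missing or empty, which makes a mismatched release name easy to spot.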
93110

94-
Alternatively if you have manually downloaded `gtdb_proteins_aa_reps.tar.gz` and `gtdb-taxdump.tar.gz` you can run the following command to format the `gtdb_proteins_aa_reps.tar.gz` to generate `gtdb.dmnd` and make it ready for Autometa.
111+
Alternatively, if you have manually downloaded `gtdb_proteins_aa_reps.tar.gz` and `gtdb-taxdump.tar.gz`, you can run the
112+
following command to format the `gtdb_proteins_aa_reps.tar.gz` to generate `gtdb.dmnd` and make it ready for Autometa.
95113

96114
.. code-block:: bash
97115
98-
python -m autometa.taxonomy.gtdb --reps-faa <path/to/gtdb_proteins_aa_reps.tar.gz> --dbdir <path/to/output_directory> --cpus 20
116+
autometa-setup-gtdb --reps-faa <path/to/gtdb_proteins_aa_reps.tar.gz> --dbdir <path/to/output_directory> --cpus 20
99117
100118
.. note::
101119

102-
Again Make sure that the formatted `gtdb_proteins_aa_reps.tar.gz` databse and gtdb taxdump files are in the same directory.
120+
Again, make sure that the formatted `gtdb_proteins_aa_reps.tar.gz` database and gtdb taxdump files are in the same directory.
