Commit 19610af

Update README.md
1 parent ccd707b commit 19610af

1 file changed

+8
-5
lines changed

README.md

Lines changed: 8 additions & 5 deletions
@@ -6,10 +6,10 @@ The taxdump files from NCBI, along with the 'nr' database, are often used in met
66
Many researchers in this situation will have custom databases of genomic/transcriptomic data and want to use it, but may still find their organism(s) unavailable within the NCBI taxonomy DB. If your organism does not have a valid TaxID in NCBI then you are unable to use many of the software packages that rely on 'taxdump' to extract taxonomic lineage and naming information with your custom DBs.
77

88
## What?
9-
This tool will allow you to modify the 'taxdump' (names.dmp and nodes.dmp) files from NCBI, to temporarily include your organisms - until they find representation of their own in the NCBI taxonomy lineage.
9+
This tool will allow you to modify the 'taxdump' files from NCBI (by appending new data to names.dmp and nodes.dmp) to temporarily include your organisms, until they gain representation of their own in the NCBI taxonomy lineage.
1010

1111
## How?
12-
The script will automatically find the largest taxonomic ID in nodes.dmp and increment from that point (with a 10^length-1 addition) and assign it to your new taxa. This large addition is to avoid future conflicts with taxdump updates. If you are adding a group....
12+
The script will automatically find the largest taxonomic ID in nodes.dmp and increment from that point (with a 10^length-1 addition) and assign it to your new taxa. This large addition is to avoid future conflicts with taxdump updates. Once added, you can then run *makeblastdb* with the '-taxid' option and your newly assigned TaxID.
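
As a rough illustration of the ID assignment described above, the sketch below reads nodes.dmp, finds the largest TaxID, and adds a padding of 10^(digits-1) plus one. It is written in Python rather than the tool's own Perl, and the function name `next_taxid` is hypothetical. Reading '10^length-1' as 10^(length-1) and adding a final +1 are both assumptions, not a description of what taxdump_edit.pl actually does.

```python
def next_taxid(nodes_path):
    """Return a candidate TaxID well above the largest one in nodes.dmp.

    Assumption: the padding is 10^(number of digits - 1), followed by +1.
    """
    largest = 0
    with open(nodes_path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            # Fields in nodes.dmp are separated by "|"; the first is the TaxID.
            tax_id = int(line.split("|", 1)[0].strip())
            largest = max(largest, tax_id)
    # Padding leaves a gap for TaxIDs that future taxdump releases may add.
    padding = 10 ** (len(str(largest)) - 1)
    return largest + padding + 1
```

Under these assumptions, a nodes.dmp whose largest TaxID is 2304348 (seven digits) would give 2304348 + 1000000 + 1 = 3304349.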
1313

1414
## Usage
1515
```
@@ -41,10 +41,11 @@ The script will automatically find the largest taxonomic ID in nodes.dmp and inc
4141
hidden subtree root flag = 1
4242
```
4343
## Example
44-
### New 'species'
45-
Adding a new 'species' lineage, for example, MAST-4A. We know by looking at the NCBI Taxonomy that there is a group for "Stramenopiles MAST-4" at TaxID:[1735725](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1735725) with a lineage of "cellular organisms; Eukaryota; Stramenopiles; unclassified stramenopiles". This is correct for our new organism, so we need to note down the TaxID of '1735725'. Then use the script as below:
44+
### New 'Species'
45+
Adding a new 'species' lineage, for example, MAST-4A. We know by looking at the NCBI Taxonomy that there is already a group for "Stramenopiles MAST-4" at TaxID:[1735725](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1735725) with a lineage of "cellular organisms; Eukaryota; Stramenopiles; unclassified stramenopiles". This is correct for our new organism, so we need to note down the TaxID of '1735725'. Then use the script as below; this assumes some default options, which are noted in the [Usage](https://github.com/guyleonard/taxdump_edit/blob/master/README.md#usage) section above:
4646

4747
taxdump_edit.pl -names names.dmp -nodes nodes.dmp -taxa MAST-4A -parent 1735725 -rank species -division 11
48+
We have given the script the location of both names.dmp and nodes.dmp, along with the new taxon name of 'MAST-4A'. We are saying that the parental lineage is TaxID:1735725 and that the rank of the organism is 'species'. The division number comes from the [Division](https://github.com/guyleonard/taxdump_edit/blob/master/README.md#divisions) list below; here it is 11, 'Environmental Samples', to reflect the provenance of our sample. This is unlike many other Stramenopiles in NCBI, which are listed as 4, 'Plants and Fungi'. :/
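
To make the division number concrete, here is a small Python sketch that maps division IDs to their names as they appear in NCBI's division.dmp. The names `NCBI_DIVISIONS` and `describe_division` are hypothetical helpers for illustration only; they are not part of taxdump_edit.pl.

```python
# Division IDs and names as listed in NCBI's division.dmp.
NCBI_DIVISIONS = {
    0: "Bacteria",
    1: "Invertebrates",
    2: "Mammals",
    3: "Phages",
    4: "Plants and Fungi",
    5: "Primates",
    6: "Rodents",
    7: "Synthetic and Chimeric",
    8: "Unassigned",
    9: "Viruses",
    10: "Vertebrates",
    11: "Environmental samples",
}


def describe_division(division_id):
    """Return the division name, or raise ValueError for an unknown ID."""
    try:
        return NCBI_DIVISIONS[division_id]
    except KeyError:
        raise ValueError(f"unknown division ID: {division_id}") from None


# The value passed as -division in the example command.
print(describe_division(11))
```

A check like this could be run before calling the script, so that a mistyped division number is caught early.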
4849

4950
This will show the output:
5051

@@ -63,9 +64,11 @@ At the end of the names.dmp file, you will now have a new record:
6364
Along with the corresponding record in nodes.dmp
6465

6566
3304349 | 1735725 | species | | 11 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
66-
6767
The original nodes.dmp and names.dmp have been backed up in the same location as nodes_backup.dmp and names_backup.dmp.
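
To sanity-check the record appended to nodes.dmp, the following Python sketch splits a line into named fields. The field order follows the column list in NCBI's taxdump readme (without the trailing comments column, which the record above does not include). `NODE_FIELDS` and `parse_node` are hypothetical names, not part of the tool.

```python
# Column order of nodes.dmp according to NCBI's taxdump readme.
NODE_FIELDS = [
    "tax_id", "parent_tax_id", "rank", "embl_code", "division_id",
    "inherited_div_flag", "genetic_code_id", "inherited_gc_flag",
    "mito_genetic_code_id", "inherited_mgc_flag",
    "genbank_hidden_flag", "hidden_subtree_root_flag",
]


def parse_node(line):
    """Split one nodes.dmp line into a dict of field name -> string value."""
    values = [field.strip() for field in line.rstrip("\n").split("|")]
    # The trailing "|" leaves one empty element after the last field; drop it.
    if values and values[-1] == "":
        values.pop()
    return dict(zip(NODE_FIELDS, values))


record = parse_node(
    "3304349 | 1735725 | species | | 11 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |"
)
print(record["tax_id"], record["parent_tax_id"], record["rank"])
```

Reading the last line of nodes.dmp through a parser like this makes it easy to confirm that the parent TaxID, rank, and division match what was passed on the command line.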
6868

69+
### New Group
70+
This is done in much the same way, but you will have to add the different lineage levels one by one in order to build the taxonomic relationships. For example, imagine we found a new species
71+
6972
### Variable Options
7073
#### Divisions
7174
0 -> Bacteria
