Commit e375305

dryad deposit: remove unneeded tarballs, trim down otts
1 parent 1f34ff3 commit e375305

6 files changed, 177 additions and 77 deletions

doc/Interim-taxonomy-file-format.md

Lines changed: 19 additions & 4 deletions
@@ -43,7 +43,7 @@ Example (from NCBI):
 * _uniqueName_: a human-readable string that is unique to this taxon, typically the taxon name if it is unique, or taxon name followed by "([rank] in [ancestor])" where rank is the taxon's rank and ancestor is an ancestor that is unique to this taxon (among the taxa that have the same name).
 * _flags_: a comma-separated list of flags or markers. Usually these are generated by taxonomy synthesis and are used to decide whether a taxon is 'hidden' or not. For example, if there's an 'extinct' flag then it may be desirable to suppress the taxon in an application. See [here](https://github.com/OpenTreeOfLife/taxomachine/blob/master/src/main/java/org/opentree/taxonomy/OTTFlag.java).
 
-### Synonyms
+### File `synonyms.tsv`
 
 Usually there are synonyms. These go into a second file, `synonyms.tsv`. This file must have a header row
 
@@ -60,11 +60,26 @@ Example from NCBI:
 
 89373 | Flexibacteraceae | synonym | |
 
-### Metadata
+### File `forwards.tsv`
 
-Overall metadata for the taxonomy is placed in a separate file. The metadata format is currently under development. `Smasher` generates this in JSON format as `about.json`, but this file is currently not used programmatically, and is in the process of being overhauled. When generating a taxonomy according to this format in external tools, for now it is best to simply write a markdown or plain text file called `about.md` (in the same directory as `taxonomy.tsv` and `synonyms.tsv`).
+This file provides aliases, resulting from a situation where one taxon
+id has been discovered to be equivalent to another. This can be due
+to changes in the way the taxonomy is processed, discovery of new
+synonyms, or due to merge events ("lumping"). For example:
 
-The metadata provided in the file should include the source of the taxonomy (article or database) as a URL and any other descriptive information that's available. The purpose of the metadata is not just explanatory but also to explain how to check the correctness of the taxonomy against its source and make corrections and other improvements should the source be updated. When using information from changing sources (databases) the date or dates of retrieval should be recorded.
+id replacement
+3434315 3434301
+5255304 828663
+
+The second line says that older id 3434315 (occurring in one or more
+previous versions) should be replaced by newer 3434301 (defined in
+this version).
+
+
+### File `version.txt`
+
+When OTT is generated, the version number is placed in this file,
+e.g. `ott3.0draft6`.
 
 ***
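
The `forwards.tsv` description in the hunk above can be illustrated with a minimal sketch of how a consumer might apply the aliases. It assumes whitespace- or tab-separated columns with an `id replacement` header, as in the example rows; the function names `load_forwards` and `resolve` are hypothetical and are not part of the OTT tooling. Following chains of forwards is also an assumption, since the text does not say whether a replacement id can itself be forwarded.

```python
import io


def load_forwards(lines):
    """Build an {old_id: replacement_id} mapping from forwards.tsv lines.

    Assumption: two whitespace/tab-separated columns and an optional
    'id replacement' header row, as in the documented example.
    """
    forwards = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue                      # skip blank or malformed lines
        if fields[0] == "id":
            continue                      # skip the header row
        forwards[fields[0]] = fields[1]
    return forwards


def resolve(taxon_id, forwards):
    """Return the current id for taxon_id, following forwards if any.

    Chains are followed (an assumption); the 'seen' set guards against
    accidental cycles in the data.
    """
    seen = set()
    while taxon_id in forwards and taxon_id not in seen:
        seen.add(taxon_id)
        taxon_id = forwards[taxon_id]
    return taxon_id


# The two example rows from the documentation above.
sample = io.StringIO("id\treplacement\n3434315\t3434301\n5255304\t828663\n")
forwards = load_forwards(sample)

print(resolve("3434315", forwards))   # older id, forwarded to 3434301
print(resolve("828663", forwards))    # not forwarded, returned unchanged
```

Ids are kept as strings here so that they compare directly with the id column of `taxonomy.tsv` without any conversion.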

doc/method/data-package/Makefile

Lines changed: 56 additions & 23 deletions
@@ -1,29 +1,74 @@
 # Creates a bunch of .tgz files, to be uploaded to Dryad one at a time.
 # Be sure to upload README as well.
 
-# This Makefile needs to be run in a directory that has just successfully built OTT 3.0 draft 3.
+# This Makefile needs to be run in a directory that has successfully
+# built OTT 3.0, so that all the processed source taxonomies are available.
 
 F=files
 REF=../../..
 DRAFT=
 
-all: $F/ott3.0.tgz $F/separation.tgz $F/silva.tgz $F/worms.tgz $F/fung.tgz $F/ncbi.tgz $F/gbif.tgz $F/irmng.tgz \
-	$F/accessions.tgz $F/amendments.tgz $F/ott2.10.tgz $F/by_qid.tgz $F/amendments.tgz $F/README
+all: $F/README $F/ott3.0.tgz $F/separation.tgz \
+	$F/accessions.tgz $F/amendments.tgz $F/ott2.10.tgz $F/by_qid.tgz
+
+# Decided not to include these
+# $F/silva.tgz $F/worms.tgz $F/fung.tgz $F/ncbi.tgz $F/gbif.tgz $F/irmng.tgz
 
 $F/README: README
+	mkdir -p $F
 	cp -p $< $@
 
-$F/ott3.0.tgz:
-	wget -O $@ http://files.opentreeoflife.org/ott/ott3.0/ott3.0.tgz
-
-$F/ott2.10.tgz:
-	wget -O $@ http://files.opentreeoflife.org/ott/ott2.10/ott2.10.tgz
-
+$F/ott3.0.tgz: cache/ott3.0.tgz readme-ott-3.0.txt
+	rm -rf ott tax/ott3.0
+	mkdir -p tax $F
+	tar xzf cache/ott3.0.tgz \
+	  ott/taxonomy.tsv ott/synonyms.tsv ott/forwards.tsv ott/version.txt
+	mv ott tax/ott3.0
+	cp -p readme-ott-3.0.txt tax/ott3.0/README
+	tar czf $@ tax/ott3.0
+
+# For assigning identifiers only
+
+$F/ott2.10.tgz: cache/ott2.10.tgz readme-ott-2.10.txt
+	rm -rf ott tax/ott2.10
+	mkdir -p tax $F
+	tar xzf cache/ott2.10.tgz \
+	  ott/taxonomy.tsv ott/synonyms.tsv ott/forwards.tsv ott/version.txt
+	mv ott tax/ott2.10
+	cp -p readme-ott-2.10.txt tax/ott2.10/README
+	tar czf $@ tax/ott2.10
+
+cache/ott3.0.tgz:
+	mkdir -p cache
+	wget -O $@.new http://files.opentreeoflife.org/ott/ott3.0/ott3.0.tgz
+	mv $@.new $@
+
+cache/ott2.10.tgz:
+	mkdir -p cache
+	wget -O $@.new http://files.opentreeoflife.org/ott/ott2.10/ott2.10.tgz
+	mv $@.new $@
+
+# The file is misnamed on the server. It's really a .tar.gz file, not a .csv.gz file.
 $F/by_qid.tgz:
-	wget -O $@ http://files.opentreeoflife.org/idlist/idlist-20161118/by_qid.csv.gz
+	wget -O $@.new http://files.opentreeoflife.org/idlist/idlist-20161118/by_qid.csv.gz
+	mv $@.new $@
 
+# Copy the separation taxonomy from the git repo.
 $F/separation.tgz: $(REF)/tax/separation/taxonomy.tsv
-	tar czf $@ -C $(REF) tax/separation
+	tar czf $@ -C $(REF) --exclude "*~" tax/separation
+
+# Genbank accessions
+$F/accessions.tgz: $(REF)/feed/silva/work/accessions.tsv
+	tar czf $@ -C $(REF) feed/silva/work/accessions.tsv
+
+# Curator additions
+$F/amendments.tgz: $(REF)/feed/amendments/amendments-1
+	tar czf $@ -C $(REF) --exclude "\\.git" --exclude LICENSE feed/amendments
+
+clean:
+	rm -rf files tax
+
+# Don't put these in the data package.
 
 $F/silva.tgz: $(REF)/tax/silva/taxonomy.tsv
 	tar czf $@ -C $(REF) tax/silva
@@ -43,15 +88,3 @@ $F/gbif.tgz: $(REF)/tax/gbif/taxonomy.tsv
 $F/irmng.tgz: $(REF)/tax/irmng/taxonomy.tsv
 	tar czf $@ -C $(REF) tax/irmng
 
-# genbank accessions
-$F/accessions.tgz: $(REF)/feed/silva/work/accessions.tsv
-	tar czf $@ -C $(REF) feed/silva/work/accessions.tsv
-
-# amendments
-$F/amendments.tgz: $(REF)/feed/amendments/amendments-1
-	tar czf $@ -C $(REF) --exclude "\\.git" --exclude LICENSE feed/amendments
-
-# /amendments-1
-# --exclude "\\.git" --exclude LICENSE
-
-clean: rm *.tgz
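
The new `cache/` rules in this Makefile download to `$@.new` and only then `mv` the file onto the real target, so an interrupted `wget` never leaves a truncated archive that `make` would treat as up to date. A minimal sketch of the same pattern follows; the helper name `fetch_atomically` and the stand-in `fake_fetch` are hypothetical, and unlike the Makefile the sketch also deletes the partial file on failure, which is an added assumption rather than something the commit does.

```python
import os
import tempfile


def fetch_atomically(dest, fetch):
    """Write to dest + '.new' via fetch(path), then rename it onto dest.

    'fetch' is any callable that writes the complete file to the path it
    is given; dest only ever appears once the write has finished.
    """
    directory = os.path.dirname(dest)
    if directory:
        os.makedirs(directory, exist_ok=True)   # like `mkdir -p cache`
    partial = dest + ".new"
    try:
        fetch(partial)                           # like `wget -O $@.new URL`
        os.replace(partial, dest)                # like `mv $@.new $@`
    except BaseException:
        # Extra step not in the Makefile: do not leave a partial file behind.
        if os.path.exists(partial):
            os.remove(partial)
        raise


def fake_fetch(path):
    """Stand-in for a real download so the sketch runs offline."""
    with open(path, "wb") as handle:
        handle.write(b"example bytes")


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "cache", "ott3.0.tgz")
fetch_atomically(target, fake_fetch)
print(os.path.exists(target), os.path.exists(target + ".new"))
```

For a real download, `fetch` could be something like `lambda path: urllib.request.urlretrieve(url, path)` from the standard library, with `url` set to one of the archive URLs in the rules above.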

doc/method/data-package/README

Lines changed: 11 additions & 50 deletions
@@ -17,62 +17,20 @@ not the original sources. Information about sources is provided.
 "Source date" refers to the write date of the source file. "Access
 date" is the date on which the source file was retrieved.
 
-All files are GNU 'tar' archives, compressed using gzip compression.
+Except where noted, all files are GNU 'tar' archives, compressed using
+gzip compression.
 
 -----
 
 File: ott3.0.tgz
-Description: Open Tree Taxonomy version 3.0
+Description: Open Tree Taxonomy version 3.0 (draft 6)
+Created: 26 February 2017
+Note: See https://github.com/OpenTreeOfLife/reference-taxonomy/blob/master/doc/Interim-taxonomy-file-format.md
 
 File: separation.tgz
 Description: Separation taxonomy
 Note: Same as what's found in the code repository.
 
-File: silva.tgz
-Description: SILVA Taxonomy version 115
-Source URL: ftp://ftp.arb-silva.de/release_115/Exports/SSURef_NR99_115_tax_silva.fasta.tgz
-Source date: 7 September 2013
-Access date: 1 November 2013
-Source length: 816923384 bytes
-
-File: fung.tgz
-Description: Index Fungorum
-Source URL: derived from database query result files provided by Paul Kirk
-Access date: 7 April 2014
-
-File: worms.tgz
-Description: WoRMS taxonomy
-Source URL: accessed via web service described at http://www.marinespecies.org/aphia.php?p=webservice
-Access date: 1 October 2015
-
-File: ncbi.tgz
-Description: NCBI Taxonomy
-Source URL: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
-Source date: 9 November 2016
-Access date: 9 November 2016
-Source length: 37595767 bytes
-
-File: gbif.tgz
-Description: GBIF backbone taxonomy
-Source URL: http://rs.gbif.org/datasets/backbone/backbone.zip
-Source date: 29 July 2016
-Access date: 11 November 2016
-Source length: 347031944 bytes
-
-File: irmng.tgz
-Description: IRMNG
-Source URL: http://www.cmar.csiro.au/datacentre/downloads/IRMNG_DWC.zip
-Source date: 31 January 2014
-Access date: 7 June 2014
-Note: We obtained a separate extinct/extant annotations file from Tony Rees of CSIRO
-Source length: IRMNG_DWC.zip 111780936, IRMNG_DWC_SP_PROFILE.csv 62845750 bytes
-
-File: prev_ott.tgz
-Description: OTT version 2.10 (used only for id assignment)
-
-File: by_qid.csv.gz
-Description: Cumulative source id to OTT id mappings (used only for id assignment)
-
 File: amendments.tgz
 Description: Taxonomy additions submitted using Open Tree curation tool
 Source URL: https://github.com/opentreeoflife/additions-1
@@ -86,7 +44,10 @@ Source URL: ftp://ftp.ncbi.nlm.nih.gov/genbank/
 Access date: 28 June 2016
 Note: see script in source code repository
 
-"The selected license applies to all of your files displayed in the
-top of the form. If you want to upload some files under a different
-license, please do so in two separate uploads."
+File: ott2.10.tgz
+Description: OTT version 2.10 (used by version 3.0 only for id assignment)
+Created: 1 October 2016
 
+File: by_qid.tgz
+Description: Cumulative source id to OTT id mappings (used only for id assignment)
+Created: 18 November 2016
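
The README in this diff describes each archive as a block of `Key: value` lines that starts with `File:`. A minimal sketch of reading such blocks into dictionaries follows, assuming the field names that appear in this commit (`File`, `Description`, `Created`, `Note`, `Source URL`, `Source date`, `Access date`, `Source length`) and treating any other line as a continuation of the previous field. The function name `parse_records` is hypothetical; the README itself defines no formal grammar.

```python
KNOWN_KEYS = {
    "File", "Description", "Created", "Note",
    "Source URL", "Source date", "Access date", "Source length",
}


def parse_records(text):
    """Split README-style text into one dict per 'File:' block.

    Lines before the first 'File:' entry are ignored. A line that does
    not begin with a known 'Key: ' prefix is appended to the previous
    field (an assumption covering wrapped 'Note:' text).
    """
    records = []
    current = None
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(": ")
        if sep and key in KNOWN_KEYS:
            if key == "File":
                current = {}
                records.append(current)
            if current is not None:
                current[key] = value.strip()
                last_key = key
        elif current is not None and last_key is not None:
            current[last_key] += " " + line.strip()
    return records


# Two blocks copied from the README text in this commit.
sample = """\
File: ott2.10.tgz
Description: OTT version 2.10 (used by version 3.0 only for id assignment)
Created: 1 October 2016

File: silva.tgz
Description: SILVA Taxonomy version 115
Note: File ncbi_to_silva.tsv is a mapping from NCBI ids to SILVA
reference sequence genbank ids.
"""

records = parse_records(sample)
for record in records:
    print(record["File"], "|", record["Description"])
```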
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+File: silva.tgz
+Description: SILVA Taxonomy version 115
+Source URL: ftp://ftp.arb-silva.de/release_115/Exports/SSURef_NR99_115_tax_silva.fasta.tgz
+Source date: 7 September 2013
+Access date: 1 November 2013
+Source length: 816923384 bytes
+Note: File ncbi_to_silva.tsv is a mapping from NCBI ids to SILVA
+reference sequence genbank ids.
+
+File: fung.tgz
+Description: Index Fungorum
+Source URL: derived from database query result files provided by Paul Kirk
+Access date: 7 April 2014
+
+File: worms.tgz
+Description: WoRMS taxonomy
+Source URL: accessed via web service described at http://www.marinespecies.org/aphia.php?p=webservice
+Access date: 1 October 2015
+
+File: ncbi.tgz
+Description: NCBI Taxonomy
+Source URL: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
+Source date: 9 November 2016
+Access date: 9 November 2016
+Source length: 37595767 bytes
+
+File: gbif.tgz
+Description: GBIF backbone taxonomy
+Source URL: http://rs.gbif.org/datasets/backbone/backbone.zip
+Source date: 29 July 2016
+Access date: 11 November 2016
+Source length: 347031944 bytes
+
+File: irmng.tgz
+Description: IRMNG
+Source URL: http://www.cmar.csiro.au/datacentre/downloads/IRMNG_DWC.zip
+Source date: 31 January 2014
+Access date: 7 June 2014
+Note: We obtained a separate extinct/extant annotations file from Tony Rees of CSIRO
+Source length: IRMNG_DWC.zip 111780936, IRMNG_DWC_SP_PROFILE.csv 62845750 bytes
+
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+Open Tree Taxonomy 2.10
+
+Release notes for this version of the taxonomy:
+https://github.com/OpenTreeOfLife/reference-taxonomy/blob/master/doc/ott2.10.md
+
+Files:
+
+taxonomy.tsv
+The taxonomy itself, with taxon id, parent, rank, source links, and other
+information
+
+synonyms.tsv
+Synonyms
+
+forwards.tsv
+List of id aliases
+
+version.txt
+Version number for this OTT version
+
+For descriptions of the file formats, see
+
+https://github.com/OpenTreeOfLife/reference-taxonomy/blob/master/doc/Interim-taxonomy-file-format.md
+https://github.com/OpenTreeOfLife/reference-taxonomy/blob/master/doc/taxon-flags.md
+
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+Open Tree Taxonomy 3.0
+
+Release notes for this version of the taxonomy:
+https://github.com/OpenTreeOfLife/reference-taxonomy/blob/master/doc/ott3.0.md
+
+Files:
+
+taxonomy.tsv
+The taxonomy itself, with taxon id, parent, rank, source links, and other
+information
+
+synonyms.tsv
+Synonyms
+
+forwards.tsv
+List of id aliases
+
+version.txt
+Version number for this OTT version
+
+For descriptions of the file formats, see
+
+https://github.com/OpenTreeOfLife/reference-taxonomy/blob/master/doc/Interim-taxonomy-file-format.md
+https://github.com/OpenTreeOfLife/reference-taxonomy/blob/master/doc/taxon-flags.md
+
