-Smasher is a Java program so it requires some version of Java to be installed. It has been tested with Java 1.6 and 1.7. To compile Smasher:
+Smasher is a Java program so it requires some version of Java to be
+installed. It has been tested with Java 1.6, 1.7, and 1.8. To
+compile Smasher:

-    make compile
+    make

-(Don't just say 'make' unless you want to build the Open Tree reference
-taxonomy! That takes a while and is not to be done casually.)
-
-You can test that Smasher functions with
+If you make changes to Smasher and want to test that it still functions:

     make test

@@ -33,10 +32,13 @@ Smasher is invoked as follows
     bin/jython script.py

 where the current directory is the home directory of the repository
-clone, and script.py is the name of a script file.
-Or if you like you can skip the script.py parameter, and you'll get an interactive jython prompt.
+clone, and `script.py` is the name of a script file.
+
+If you like you can skip the `script.py` parameter, and you'll get an interactive jython prompt.
+
+    bin/jython script.py

-You may have a need to set the Java memory limit, which might be too large or too small for your purposes. To do this, edit JAVAFLAGS in the bin/jython script (or edit the Makefile and force re-creation of bin/jython). The default is currently 14G. I like to set it a bit smaller than the actual physical memory available on the machine.
+You may have a need to set the Java memory limit, which might be too large or too small for your purposes. To do this, edit JAVAFLAGS in the bin/jython script (or edit the Makefile and force re-creation of bin/jython). The default is currently 14G. For memory-intensive runs it should be set near the actual physical memory available on the machine.

 ## Using the library

@@ -46,15 +48,20 @@ modules:
     from org.opentreeoflife.taxa import Taxonomy
     from org.opentreeoflife.smasher import UnionTaxonomy

-
 ## Taxonomies

 If you want to synthesize a new taxonomy, initiate the build by creating a new UnionTaxonomy object:

-    tax = UnionTaxonomy.newTaxonomy()
+    winner = UnionTaxonomy.newTaxonomy('winner')
+
+`winner` is just an arbitrary name; pick one that's appropriate for
+your project.
+The argument gives the taxonomy's 'idspace', which is a prefix applied to node
+identifiers in log files and certain other places. E.g. for OTT, this
+is `'ott'`.

 Taxonomies are usually built starting with one or more existing
-taxonomies (although they needn't be), obtained as follows:
+source taxonomies (although they needn't be), obtained as follows:
@@ -64,9 +71,9 @@ see wiki). There can be as many of these retrievals as you like.
 The first argument is a directory name and must end in a '/'.
 The directory must contain a file 'taxonomy.tsv'.

-The second argument is a short tag that will appear in the
-'sourceinfo' column of the final taxonomy file to notate the source of taxa
-that came from that taxonomy, e.g. 'ncbi:1234'.
+The second argument is an 'idspace' prefix that will appear in the
+'sourceinfo' column of the merged taxonomy file when a node derives from
+this source taxonomy, e.g. 'ncbi:1234'.

 The format of taxonomy files (taxonomy.tsv and so on) is given [here](https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Interim-taxonomy-file-format).

@@ -78,119 +85,131 @@ See [wikipedia](https://en.wikipedia.org/wiki/Newick_format) for a description o
-Delete all the descendants of a given taxon: (this is useful for grafting one taxonomy into another)
+Delete all the descendants of a given node: (this is useful for grafting one taxonomy into another)

     taxon.trim()

-Delete a taxon, moving all of its children up one level (e.g. delete a
+Delete a node, moving all of its children up one level (e.g. delete a
 subfamily making all of its genus children children of the family):

     taxon.elide()
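
To illustrate the difference between the `trim` and `elide` operations described in the hunk above, here is a minimal toy sketch in plain Python. It does not use Smasher at all: the `Node` class, its fields, and the example names are hypothetical stand-ins chosen for illustration, and the real library may behave differently in details such as ordering or bookkeeping.

```python
class Node(object):
    """Hypothetical toy tree node; this is NOT Smasher's own node class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = []
        if parent is not None:
            parent.add_child(self)

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def trim(self):
        """Delete all descendants of this node, keeping the node itself."""
        for child in self.children:
            child.parent = None
        self.children = []

    def elide(self):
        """Delete this node, moving its children up one level."""
        parent = self.parent
        if parent is None:
            raise ValueError("this sketch cannot elide a root node")
        index = parent.children.index(self)
        for child in self.children:
            child.parent = parent
        # Splice this node's children into the parent's child list
        # at the position this node used to occupy.
        parent.children[index:index + 1] = self.children
        self.children = []
        self.parent = None


# Build a small family -> subfamily -> genus -> species tree.
family = Node("Felidae")
subfamily = Node("Felinae", family)
felis = Node("Felis", subfamily)
lynx = Node("Lynx", subfamily)
Node("Felis catus", felis)

# Eliding the subfamily makes its genera direct children of the family.
subfamily.elide()

# Trimming a genus removes everything below it but keeps the genus.
felis.trim()
```

In this sketch `elide` preserves the position of the removed node's children among their new siblings; that ordering is a choice made here for clarity, not something the source specifies.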
@@ -199,35 +218,31 @@ Select a subset of a taxonomy:

     taxonomy.select(taxon)

-This returns a new taxonomy whose root is (a copy of) the given taxon.
+This returns a new taxonomy whose root is (a copy of) the given node.

 (TBD: Need a way to add a root to the forest, or change the root.)

 ## Alignment

-Taxonomy alignment ('absorb') establishes correspondences between taxa in taxonomy A with taxa in taxonomy B, based on taxon names and topology. Most of the complexity of this operation has to do with the handling of homonyms.
+Taxonomy alignment establishes correspondences between nodes in taxonomy A with nodes in taxonomy B, based on node names and topology. Most of the complexity of this operation has to do with the handling of homonyms.
 Sometimes the automatic alignment
 logic makes mistakes. It is then desirable to manually specify that
-taxon X in taxonomy A is the same as taxon X in taxonomy B (they are not homonyms), or not (they *are* homonyms).
+node X in taxonomy A is the same as node X in taxonomy B (they are not homonyms).