Commit d0e75d4

fix errors in scripting doc
1 parent 6b5a235 commit d0e75d4

1 file changed

+101
-88
lines changed

doc/scripting.md

Lines changed: 101 additions & 88 deletions
@@ -17,14 +17,13 @@ To use Smasher, you should clone the reference-taxonomy repository on github:
1717
You can of course clone using https: instead of ssh, see
1818
[here](https://github.com/OpenTreeOfLife/reference-taxonomy).
1919

20-
Smasher is a Java program so it requires some version of Java to be installed. It has been tested with Java 1.6 and 1.7. To compile Smasher:
20+
Smasher is a Java program so it requires some version of Java to be
21+
installed. It has been tested with Java 1.6, 1.7, and 1.8. To
22+
compile Smasher:
2123

22-
make compile
24+
make
2325

24-
(Don't just say 'make' unless you want to build the Open Tree reference
25-
taxonomy! That takes a while and is not to be done casually.)
26-
27-
You can test that Smasher functions with
26+
If you make changes to Smasher and want to test that it still functions:
2827

2928
make test
3029

@@ -33,10 +32,13 @@ Smasher is invoked as follows
3332
bin/jython script.py
3433

3534
where the current directory is the home directory of the repository
36-
clone, and script.py is the name of a script file.
37-
Or if you like you can skip the script.py parameter, and you'll get an interactive jython prompt.
35+
clone, and `script.py` is the name of a script file.
36+
37+
If you omit the `script.py` parameter, you get an interactive Jython prompt:
38+
39+
bin/jython
3840

39-
You may have a need to set the Java memory limit, which might be too large or too small for your purposes. To do this, edit JAVAFLAGS in the bin/jython script (or edit the Makefile and force re-creation of bin/jython). The default is currently 14G. I like to set it a bit smaller than the actual physical memory available on the machine.
41+
You may need to adjust the Java memory limit, which might be too large or too small for your purposes. To do this, edit JAVAFLAGS in the bin/jython script (or edit the Makefile and force re-creation of bin/jython). The default is currently 14G. For memory-intensive runs, set it close to the physical memory available on the machine.
4042

4143
## Using the library
4244

@@ -46,15 +48,20 @@ modules:
4648
from org.opentreeoflife.taxa import Taxonomy
4749
from org.opentreeoflife.smasher import UnionTaxonomy
4850

49-
5051
## Taxonomies
5152

5253
If you want to synthesize a new taxonomy, initiate the build by creating a new UnionTaxonomy object:
5354

54-
tax = UnionTaxonomy.newTaxonomy()
55+
winner = UnionTaxonomy.newTaxonomy('winner')
56+
57+
`winner` is just an arbitrary name; pick one that's appropriate for
58+
your project.
59+
The argument gives the taxonomy's 'idspace', which is a prefix applied to node
60+
identifiers in log files and certain other places. E.g. for OTT, this
61+
is `'ott'`.
5562

5663
Taxonomies are usually built starting with one or more existing
57-
taxonomies (although they needn't be), obtained as follows:
64+
source taxonomies (although they needn't be), obtained as follows:
5865

5966
ncbi = Taxonomy.getTaxonomy('t/tax/ncbi_aster/', 'ncbi')
6067

@@ -64,9 +71,9 @@ see wiki). There can be as many of these retrievals as you like.
6471
The first argument is a directory name and must end in a '/'.
6572
The directory must contain a file 'taxonomy.tsv'.
6673

67-
The second argument is a short tag that will appear in the
68-
'sourceinfo' column of the final taxonomy file to notate the source of taxa
69-
that came from that taxonomy, e.g. 'ncbi:1234'.
74+
The second argument is an 'idspace' prefix that will appear in the
75+
'sourceinfo' column of the merged taxonomy file when a node derives from
76+
this source taxonomy, e.g. 'ncbi:1234'.
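
To illustrate the `idspace:id` form, a script that post-processes the merged taxonomy might split a 'sourceinfo' value as sketched below. This is plain Python rather than anything provided by Smasher: the helper name `parse_sourceinfo` is hypothetical, and the handling of several comma-separated entries is an assumption that should be checked against the file format description.

```python
def parse_sourceinfo(sourceinfo):
    """Split a sourceinfo string into (idspace, id) pairs.

    Assumes entries look like 'ncbi:1234' and that multiple entries,
    if present, are comma-separated (an assumption, not confirmed here).
    """
    pairs = []
    for entry in sourceinfo.split(','):
        entry = entry.strip()
        if not entry:
            continue
        # partition() splits on the first ':' only
        idspace, _, local_id = entry.partition(':')
        pairs.append((idspace, local_id))
    return pairs

print(parse_sourceinfo('ncbi:1234'))
```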
7077

7178
The format of taxonomy files (taxonomy.tsv and so on) is given [here](https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Interim-taxonomy-file-format).
7279

@@ -78,119 +85,131 @@ See [wikipedia](https://en.wikipedia.org/wiki/Newick_format) for a description o
7885

7986
('Subclass=Sordariomycetidae','Subclass=Hypocreomycetidae')
8087

81-
'absorb' merges the given taxonomy (e.g. ncbi) into the one under construction (e.g. the reference taxonomy).
88+
To align and merge a source taxonomy (e.g. NCBI) into the one under
89+
construction (e.g. `winner`):
8290

83-
tax.absorb(ncbi)
91+
alignment = winner.alignment(source)
92+
winner.align(alignment)
93+
winner.merge(alignment)
8494

85-
To write the taxonomy to a directory:
95+
To write out the taxonomy:
8696

87-
tax.dump('mytaxonomy/')
97+
winner.dump('winner/')
8898

8999
The directory name must end with '/'. The taxonomy is written to
90-
'mytaxonomy/taxonomy.tsv', synonyms file to 'mytaxonomy/synonyms.tsv',
100+
'winner/taxonomy.tsv', synonyms file to 'winner/synonyms.tsv',
91101
and so on.
92102

93-
The one-argument form dump(filename) generates files with columns
103+
The one-argument form `winner.dump(filename)` generates files with columns
94104
separated by tab-verticalbar-tab. To generate with columns simply
95105
separated by tabs, use
96106

97-
tax.dump('mytaxonomy/', '\t')
107+
winner.dump('winner/', '\t')
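
The difference between the two separators matters mostly to whatever reads the files afterwards. The following is a minimal sketch of a reader in plain Python, not part of Smasher. The function name `split_row` and the sample rows are made up for illustration, and the trailing tab-verticalbar at the end of a row is an assumption about the file layout.

```python
def split_row(line, sep='\t|\t'):
    """Split one line of a dumped file into column values."""
    line = line.rstrip('\n')
    # With the tab-verticalbar-tab form, a row may also end with a
    # trailing tab-verticalbar; strip it if present (an assumption).
    if sep == '\t|\t' and line.endswith('\t|'):
        line = line[:-2]
    return line.split(sep)

# Hypothetical rows; the values are invented for this example.
bar_row = '123\t|\t45\t|\tPseudacris\t|\tgenus\t|\n'
tab_row = '123\t45\tPseudacris\tgenus\n'

print(split_row(bar_row))
print(split_row(tab_row, '\t'))
```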
98108

99109
Taxonomies can also be written as Newick:
100110

101-
tax.dumpNewick('mytaxonomy.tre')
111+
winner.dumpNewick('winner.tre')
102112

103113
but beware that this loses all information about synonyms and sources.
104114
Also beware that if the taxonomy contains homonyms, the Newick file
105115
will contain multiple nodes with the same label, and most tools that
106116
consume Newick don't like this.
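
One way to check a Newick file for this problem before handing it to another tool is to look for repeated labels. The sketch below is plain Python and independent of Smasher. The function name `duplicate_labels` is hypothetical, and the regular expression only handles simple unquoted labels, so treat it as a rough check rather than a Newick parser.

```python
import re
from collections import Counter

def duplicate_labels(newick):
    """Return the sorted list of labels that occur more than once.

    A label is taken to be any run of characters that follows '(',
    ',' or ')' and contains none of '(),:;'. Branch lengths follow
    ':' and are therefore skipped. Quoted labels are not handled.
    """
    labels = re.findall(r"(?<=[(,)])[^(),:;]+", newick)
    counts = Counter(label.strip() for label in labels)
    return sorted(name for name, n in counts.items() if n > 1 and name)

# Made-up tree in which 'Pieris' appears under two different parents.
tree = "((Pseudacris,Rana)Anura,(Pieris)Pieridae,(Pieris)Ericaceae)life;"
print(duplicate_labels(tree))
```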
107117

108-
## Referring to taxa
109118

110-
A number of scripting commands take taxa as parameters. There are two
111-
ways to specify a taxon: by finding it in a taxonomy, or by creating
119+
## Referring to nodes
120+
121+
A number of scripting commands take nodes as parameters. There are two
122+
ways to specify a node: by finding it in a taxonomy, or by creating
112123
it anew.
113124

114-
The taxon() method looks up a taxon in a taxonomy. It takes two
125+
The `taxon()` method looks up a node in a taxonomy by name. It takes two
115126
forms:
116127

117-
ott.taxon('Pseudacris')
118-
ott.taxon('Pseudacris', 'Anura')
128+
winner.taxon('Pseudacris', 'Anura')
129+
winner.taxon('Pseudacris')
119130

120-
Use the first form if that name is unique within the taxonomy. If the
121-
name is ambiguous (a homonym), use the second form, which provides
122-
context. The context can be any ancestor of the intended taxon that is not shared with the other homonyms.
131+
Use the second form if you're in a hurry and sure the name is unique
132+
within the taxonomy. If the name might be ambiguous (a homonym), use
133+
the first form, which provides context. The context can be any
134+
ancestor of the intended node that is not shared with the other
135+
homonyms - usually something at the class or phylum level.
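
The idea of disambiguating by ancestor can be illustrated with a toy model. This is plain Python, not Smasher's implementation: the `Node` class and `lookup` function are hypothetical, and raising an error on an ambiguous name is a choice made for this sketch only.

```python
class Node(object):
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        """Yield the parent, grandparent, and so on up to the root."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

def lookup(nodes, name, context=None):
    """Return the unique node called `name`, optionally requiring an
    ancestor called `context`; raise ValueError if none or ambiguous."""
    matches = [n for n in nodes if n.name == name]
    if context is not None:
        matches = [n for n in matches
                   if any(a.name == context for a in n.ancestors())]
    if len(matches) != 1:
        raise ValueError('%d nodes match %r' % (len(matches), name))
    return matches[0]

# Two homonyms named 'Pieris', one under each of two unrelated groups.
life = Node('life')
insecta = Node('Insecta', life)
ericaceae = Node('Ericaceae', life)
butterfly = Node('Pieris', insecta)
shrub = Node('Pieris', ericaceae)
nodes = [life, insecta, ericaceae, butterfly, shrub]
```

Here `lookup(nodes, 'Pieris', 'Insecta')` selects the node under Insecta, while `lookup(nodes, 'Pieris')` fails because the name alone is ambiguous. Note that `'life'` would not work as context, since both homonyms share that ancestor.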
123136

124-
A variant on this is to specify any descendent of the taxon, as
125-
opposed to ancestor:
137+
If there is no such taxon, the `taxon` method throws an exception.
138+
To return null instead, use `maybeTaxon`:
126139

127-
ott.taxonThatContains('Anura', 'Pseudacris') #designates Anura
140+
winner.maybeTaxon('Pseudacris', 'Anura')
141+
winner.maybeTaxon('Pseudacris')
128142

129-
It is also possible to use a taxon identifier in a source taxonomy:
143+
A variant on `taxon` is to name a descendant of the node, as opposed to
144+
an ancestor:
145+
146+
winner.taxonThatContains('Anura', 'Pseudacris') #designates Anura
147+
148+
It is also possible to use a node identifier relative to a source taxonomy:
130149

131150
ncbi.taxon('173133')
132151

133152
but this is brittle as identifiers may change from one version of a
134-
source taxonomy to another.
153+
source taxonomy to the next.
135154

136-
To add a new taxon, provide its name, rank, and source information to
137-
the newTaxon() method. The source information should be a URL or
138-
CURIE that is specific to that taxon.
155+
To add a new node, provide its name, rank, and source information to
156+
the `newTaxon()` method. The source information should be a URL or
157+
CURIE that is specific to that node.
139158

140-
ott.newTaxon('Euacris', 'genus', 'http://mongotax.org/12345')
159+
winner.newTaxon('Euacris', 'genus', 'http://mongotax.org/12345')
141160

142-
If the taxon has no particular rank, put 'no rank'.
161+
If the node has no particular rank, put 'no rank'.
143162

144163
## Counts
145164

146165
taxon.count() => integer
147166
taxon.tipCount() => integer
148167

149-
count() returns the number of taxa (nodes) tipward of the given taxon.
168+
count() returns the number of nodes tipward of the given node, including the node itself.
150169

151-
tipCount() returns the number of tips (leaf nodes) tipward of the given taxon.
170+
tipCount() returns the number of tips (leaf nodes) tipward of the given node.
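
The difference between the two counts can be shown on a toy tree. This is a sketch in plain Python, not Smasher code: the `Node` class and the method name `tip_count` are hypothetical, and treating a leaf as having a tip count of 1 is an assumption of this model.

```python
class Node(object):
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def count(self):
        """Number of nodes in the subtree, including this node."""
        return 1 + sum(child.count() for child in self.children)

    def tip_count(self):
        """Number of leaf nodes in the subtree (a leaf counts as 1)."""
        if not self.children:
            return 1
        return sum(child.tip_count() for child in self.children)

# Small made-up subtree: one order, two families, three genera.
anura = Node('Anura', [
    Node('Hylidae', [Node('Pseudacris'), Node('Hyla')]),
    Node('Ranidae', [Node('Rana')]),
])
print(anura.count(), anura.tip_count())
```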
152171

153172
## Surgery
154173

155174
Whenever making ad hoc modifications to the taxonomy please leave a pointer (i.e. a URL) to some
156-
evidence or source of evidence for the correctness of the change. If
175+
source of evidence for the correctness of the change. If
157176
the evidence doesn't go in as the source information in a newTaxon() call, put
158177
it in a comment in the script file. (Probably the evidence should be an argument to the
159178
various surgery commands; maybe later.)
160179

161-
Add a new taxon as a daughter of a given one: (would be used with newTaxon)
180+
Add a new node as a daughter of a given one: (would be used with `newTaxon`)
162181

163-
taxon.add(othertaxon)
182+
taxon.addChild(othertaxon)
164183
e.g.
165-
ott.taxon('Parentia').add(ott.newTaxon('Parentia daughtera',
184+
winner.taxon('Parentia').addChild(winner.newTaxon('Parentia daughtera',
166185
'species', 'http://www.marinespecies.org/aphia.php?p=taxdetails&id=557120'))
167186

168-
Detach an existing taxon from its current location, and add it as a
187+
Detach an existing node from its current location, and add it as a
169188
daughter of a different parent:
170189

171190
taxon.take(othertaxon)
172191
e.g.
173192
# From http://www.marinespecies.org/aphia.php?p=taxdetails&id=556811
174-
ott.taxon('Ammoniinae').take(ott.taxon('Asiarotalia'))
193+
winner.taxon('Ammoniinae').take(winner.taxon('Asiarotalia'))
175194

176-
Move the children of taxon A into taxon B, and make B be a synonym of
195+
Move the children of node B into node A, and make B a synonym of
177196
A: (I.e. the names are synonyms, but not previously recorded as
178197
such):
179198

180199
taxon.absorb(othertaxon)
181200
e.g.
182201
# From http://www.marinespecies.org/aphia.php?p=taxdetails&id=557120
183-
ott.taxon('Parentia').absorb(ott.taxon('Parentiola'))
202+
winner.taxon('Parentia').absorb(winner.taxon('Parentiola'))
184203

185-
Delete a taxon and all of its descendants:
204+
Delete a node and all of its descendants:
186205

187206
taxon.prune()
188207

189-
Delete all the descendants of a given taxon: (this is useful for grafting one taxonomy into another)
208+
Delete all the descendants of a given node: (this is useful for grafting one taxonomy into another)
190209

191210
taxon.trim()
192211

193-
Delete a taxon, moving all of its children up one level (e.g. delete a
212+
Delete a node, moving all of its children up one level (e.g. delete a
194213
subfamily making all of its genus children children of the family):
195214

196215
taxon.elide()
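
The three deletion commands differ only in what is removed and what is kept. The toy model below sketches that difference in plain Python. It is not Smasher's implementation: the `Node` class is hypothetical and ignores synonyms, flags, and everything else a real node carries.

```python
class Node(object):
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = []
        if parent is not None:
            parent.add_child(self)

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def prune(self):
        """Remove this node and, with it, all of its descendants."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def trim(self):
        """Remove all descendants but keep this node."""
        for child in self.children:
            child.parent = None
        self.children = []

    def elide(self):
        """Remove this node, moving its children up to its parent."""
        parent = self.parent
        if parent is None:
            raise ValueError('cannot elide a root in this toy model')
        for child in list(self.children):
            parent.add_child(child)
        self.children = []
        self.prune()

# Made-up family with one subfamily holding two genera.
family = Node('Familia')
sub = Node('Subfamilia', family)
g1 = Node('Genus1', sub)
g2 = Node('Genus2', sub)

sub.elide()
print([child.name for child in family.children])
```

After `sub.elide()`, the two genera are direct children of the family and the subfamily is gone, which matches the subfamily example in the text.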
@@ -199,35 +218,31 @@ Select a subset of a taxonomy:
199218

200219
taxonomy.select(taxon)
201220

202-
This returns a new taxonomy whose root is (a copy of) the given taxon.
221+
This returns a new taxonomy whose root is (a copy of) the given node.
203222

204223
(TBD: Need a way to add a root to the forest, or change the root.)
205224

206225
## Alignment
207226

208-
Taxonomy alignment ('absorb') establishes correspondences between taxa in taxonomy A with taxa in taxonomy B, based on taxon names and topology. Most of the complexity of this operation has to do with the handling of homonyms.
227+
Taxonomy alignment establishes correspondences between nodes in taxonomy A and nodes in taxonomy B, based on node names and topology. Most of the complexity of this operation has to do with the handling of homonyms.
209228
Sometimes the automatic alignment
210229
logic makes mistakes. It is then desirable to manually specify that
211-
taxon X in taxonomy A is the same as taxon X in taxonomy B (they are not homonyms), or not (they *are* homonyms).
230+
node X in taxonomy A is the same as node X in taxonomy B (they are not homonyms).
212231

213-
tax.same(tax1, tax2)
214-
tax.notSame(tax1, tax2)
232+
alignment.same(tax1, tax2)
215233
e.g.
216-
tax.same(A.taxon("X"), B.taxon("X"))
234+
alignment.same(ncbi.taxon('X'), winner.taxon('X'))
217235

218-
These methods are a bit fussy.
219-
One of the arguments to same or notSame should be a taxon from a taxonomy that is about to be
220-
'absorbed' but hasn't been yet, and the other should be from the
221-
taxonomy under construction, after it has had other source taxonomies
222-
absorbed into it. (Equivalently it is possible to specify a taxon in a taxonomy that has already been 'absorbed'.) The taxa may occur in either order.
236+
The first argument must be in the source taxonomy, and the second must be in the merged taxonomy.
223237

224-
same(gbif.taxon('Plantae'), ott.taxon('Viridiplantae'))
225-
ott.absorb(gbif)
238+
alignment = winner.alignment(gbif)
239+
alignment.same(gbif.taxon('Plantae'), winner.taxon('Archaeplastida'))
240+
winner.align(alignment)
241+
winner.merge(alignment)
226242

227-
Should the need arise you can find out what a source taxon maps to, using image():
243+
Should the need arise you can find out what a source node maps to, using `image()`:
228244

229-
tax2.absorb(tax1)
230-
tax2.image(tax1.taxon('Sample'))
245+
alignment.image(gbif.taxon('Sample'))
231246

232247

233248
## Annotation
@@ -236,50 +251,48 @@ Add a synonym:
236251

237252
taxon.synonym('Alternate name')
238253

239-
Rename a taxon, leaving old name behind as a synonym:
254+
Rename a node, leaving old name behind as a synonym:
240255

241256
taxon.rename('Newname')
242257

243-
Mark a taxon as being 'incertae sedis' i.e. not classified. It will
244-
be retained for use in OTU matching but will not show up in the
245-
browsable tree unless mentioned in a source tree:
258+
Mark a node as being 'incertae sedis' i.e. not fully classified:
246259

247260
taxon.incertaeSedis()
248261

249-
Mark as extinct:
262+
Mark as extinct or extant:
250263

251264
taxon.extinct()
265+
taxon.extant()
252266

253-
Force an 'incertae sedis' or otherwise hidden taxon to become visible
254-
in spite of other information:
255-
256-
taxon.forceVisible()
257-
258-
Mark a taxon as 'hidden' so that it can be suppressed by tools downstream:
267+
Mark a node as 'hidden' so that it can be suppressed by tools downstream:
259268

260269
taxon.hide()
261270

262271
## Looking at taxonomies
263272

264-
Taxonomies are iterable.
273+
One can iterate over the nodes of a taxonomy:
265274

266-
for taxon in taxonomy: ...
275+
for taxon in taxonomy.taxa(): ...
267276

268-
Taxa have lots of properties you might want to look at in a script.
277+
Nodes have lots of properties you might want to look at in a script.
269278

270279
taxon.parent
271-
taxon.children
280+
taxon.children # null, or a List of nodes
272281
taxon.isHidden()
273282

274-
You can select only the visible (non-hidden) taxa:
283+
You can make a copy of a taxonomy, selecting only the visible (non-hidden) nodes:
275284

276285
taxonomy.selectVisible("my taxonomy but only visible")
277286

287+
or make a new taxonomy that is a subtree of a given one:
288+
289+
rana = ncbi.select('Rana')
290+
278291
## Debugging
279292

280293
Not much here yet, just
281294

282295
taxon.show()
283296

284-
Displays information about this taxon: its lineage and children, sources,
297+
Displays information about this node: its lineage and children, sources,
285298
flags, etc.
