fix errors in scripting doc

jar398 · jar398 · commit d0e75d4023ab · 2017-03-28T18:02:42.000-04:00
diff --git a/doc/scripting.md b/doc/scripting.md
@@ -17,14 +17,13 @@ To use Smasher, you should clone the reference-taxonomy repository on github:
 You can of course clone using https: instead of ssh, see 
 [here](https://github.com/OpenTreeOfLife/reference-taxonomy).
 
-Smasher is a Java program so it requires some version of Java to be installed.  It has been tested with Java 1.6 and 1.7.  To compile Smasher:
+Smasher is a Java program so it requires some version of Java to be
+installed.  It has been tested with Java 1.6, 1.7, and 1.8.  To
+compile Smasher:
 
-    make compile
+    make
 
-(Don't just say 'make' unless you want to build the Open Tree reference
-taxonomy!  That takes a while and is not to be done casually.)
-
-You can test that Smasher functions with
+If you make changes to Smasher and want to test that it still functions:
 
     make test
 
@@ -33,10 +32,13 @@ Smasher is invoked as follows
     bin/jython script.py
 
 where the current directory is the home directory of the repository
-clone, and script.py is the name of a script file.  
-Or if you like you can skip the script.py parameter, and you'll get an interactive jython prompt.
+clone, and `script.py` is the name of a script file.  
+
+If you like you can skip the `script.py` parameter, and you'll get an interactive jython prompt.
+
+    bin/jython
 
-You may have a need to set the Java memory limit, which might be too large or too small for your purposes.  To do  this, edit JAVAFLAGS in the bin/jython script (or edit the Makefile and force re-creation of bin/jython).  The default is currently 14G.  I like to set it a bit smaller than the actual physical memory available on the machine.
+You may need to change the Java memory limit, which might be too large or too small for your purposes.  To do this, edit JAVAFLAGS in the bin/jython script (or edit the Makefile and force re-creation of bin/jython).  The default is currently 14G.  For memory-intensive runs, set it near the actual physical memory available on the machine.
 
 ## Using the library
 
@@ -46,15 +48,20 @@ modules:
     from org.opentreeoflife.taxa import Taxonomy
     from org.opentreeoflife.smasher import UnionTaxonomy
 
-
 ## Taxonomies
 
 If you want to synthesize a new taxonomy, initiate the build by creating a new UnionTaxonomy object:
 
-    tax = UnionTaxonomy.newTaxonomy()
+    winner = UnionTaxonomy.newTaxonomy('winner')
+
+`winner` is just an arbitrary name; pick one that's appropriate for
+your project.
+The argument gives the taxonomy's 'idspace', which is a prefix applied to node
+identifiers in log files and certain other places.  E.g. for OTT, this
+is `'ott'`.
 
 Taxonomies are usually built starting with one or more existing
-taxonomies (although they needn't be), obtained as follows:
+source taxonomies (although they needn't be), obtained as follows:
 
     ncbi = Taxonomy.getTaxonomy('t/tax/ncbi_aster/', 'ncbi')
 
@@ -64,9 +71,9 @@ see wiki).  There can be as many of these retrievals as you like.
 The first argument is a directory name and must end in a '/'.  
 The directory must contain a file 'taxonomy.tsv'.
 
-The second argument is a short tag that will appear in the
-'sourceinfo' column of the final taxonomy file to notate the source of taxa
-that came from that taxonomy, e.g. 'ncbi:1234'.
+The second argument is an 'idspace' prefix that will appear in the
+'sourceinfo' column of the merged taxonomy file when a node derives from
+this source taxonomy, e.g. 'ncbi:1234'.
 
 The format of taxonomy files (taxonomy.tsv and so on) is given [here](https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Interim-taxonomy-file-format).
 
@@ -78,119 +85,131 @@ See [wikipedia](https://en.wikipedia.org/wiki/Newick_format) for a description o
 
     ('Subclass=Sordariomycetidae','Subclass=Hypocreomycetidae')
 
-'absorb' merges the given taxonomy (e.g. ncbi) into the one under construction (e.g. the reference taxonomy).
+To align and merge a source taxonomy (e.g. NCBI) into the one under
+construction (e.g. `winner`):
 
-    tax.absorb(ncbi)
+    alignment = winner.alignment(ncbi)
+    winner.align(alignment)
+    winner.merge(alignment)
 
-To write the taxonomy to a directory:
+To write out the taxonomy:
 
-    tax.dump('mytaxonomy/')
+    winner.dump('winner/')
 
 The directory name must end with '/'.  The taxonomy is written to
-'mytaxonomy/taxonomy.tsv', synonyms file to 'mytaxonomy/synonyms.tsv',
+'winner/taxonomy.tsv', synonyms file to 'winner/synonyms.tsv',
 and so on.
 
-The one-argument form dump(filename) generates files with columns
+The one-argument form `winner.dump(filename)` generates files with columns
 separated by tab-verticalbar-tab.  To generate with columns simply
 separated by tabs, use
 
-    tax.dump('mytaxonomy/', '\t')
+    winner.dump('winner/', '\t')
 
 Taxonomies can also be written as Newick:
 
-    tax.dumpNewick('mytaxonomy.tre')
+    winner.dumpNewick('winner.tre')
 
 but beware that this loses all information about synonyms and sources.
 Also beware that if the taxonomy contains homonyms, the Newick file
 will contain multiple nodes with the same label, and most tools that
 consume Newick don't like this.
 
-## Referring to taxa
 
-A number of scripting commands take taxa as parameters.  There are two
-ways to specify a taxon: by finding it in a taxonomy, or by creating
+## Referring to nodes
+
+A number of scripting commands take nodes as parameters.  There are two
+ways to specify a node: by finding it in a taxonomy, or by creating
 it anew.
 
-The taxon() method looks up a taxon in a taxonomy.  It takes two
+The `taxon()` method looks up a node in a taxonomy by name.  It takes two
 forms:
 
-    ott.taxon('Pseudacris')
-    ott.taxon('Pseudacris', 'Anura')
+    winner.taxon('Pseudacris', 'Anura')
+    winner.taxon('Pseudacris')
 
-Use the first form if that name is unique within the taxonomy.  If the
-name is ambiguous (a homonym), use the second form, which provides
-context.  The context can be any ancestor of the intended taxon that is not shared with the other homonyms.
+Use the second form if you're in a hurry and sure the name is unique
+within the taxonomy.  If the name might be ambiguous (a homonym), use
+the first form, which provides context.  The context can be any
+ancestor of the intended node that is not shared with the other
+homonyms - usually something at the class or phylum level.
 
-A variant on this is to specify any descendent of the taxon, as
-opposed to ancestor:
+If there is no such taxon, the `taxon` method throws an exception.
+To return null instead, use `maybeTaxon`:
 
-    ott.taxonThatContains('Anura', 'Pseudacris') #designates Anura
+    winner.maybeTaxon('Pseudacris', 'Anura')
+    winner.maybeTaxon('Pseudacris')
 
-It is also possible to use a taxon identifier in a source taxonomy:
+A variant on `taxon` is to name a descendant of the node, as opposed to
+an ancestor:
+
+    winner.taxonThatContains('Anura', 'Pseudacris') #designates Anura
+
+It is also possible to use a node identifier relative to a source taxonomy:
 
     ncbi.taxon('173133')
 
 but this is brittle as identifiers may change from one version of a
-source taxonomy to another.
+source taxonomy to the next.
 
-To add a new taxon, provide its name, rank, and source information to
-the newTaxon() method.  The source information should be a URL or
-CURIE that is specific to that taxon.
+To add a new node, provide its name, rank, and source information to
+the `newTaxon()` method.  The source information should be a URL or
+CURIE that is specific to that node.
 
-    ott.newTaxon('Euacris', 'genus', 'http://mongotax.org/12345')
+    winner.newTaxon('Euacris', 'genus', 'http://mongotax.org/12345')
 
-If the taxon has no particular rank, put 'no rank'.
+If the node has no particular rank, put 'no rank'.
 
 ## Counts
 
     taxon.count()  =>  integer
     taxon.tipCount()  =>  integer
     
-count() returns the number of taxa (nodes) tipward of the given taxon.
+count() returns the number of nodes tipward of the given node, including the node itself.
 
-tipCount() returns the number of tips (leaf nodes) tipward of the given taxon.
+tipCount() returns the number of tips (leaf nodes) tipward of the given node.
 
 ## Surgery
 
 Whenever making ad hoc modifications to the taxonomy please leave a pointer (i.e. a URL) to some
-evidence or source of evidence for the correctness of the change.  If
+source of evidence for the correctness of the change.  If
 the evidence doesn't go in as the source information in a newTaxon() call, put
 it in a comment in the script file.  (Probably the evidence should be an argument to the
 various surgery commands; maybe later.)
 
-Add a new taxon as a daughter of a given one: (would be used with newTaxon)
+Add a new node as a daughter of a given one: (would be used with `newTaxon`)
 
-    taxon.add(othertaxon)
+    taxon.addChild(othertaxon)
     e.g.
-    ott.taxon('Parentia').add(ott.newTaxon('Parentia daughtera', 
+    winner.taxon('Parentia').addChild(winner.newTaxon('Parentia daughtera', 
        'species', 'http://www.marinespecies.org/aphia.php?p=taxdetails&id=557120'))
 
-Detach an existing taxon from its current location, and add it as a
+Detach an existing node from its current location, and add it as a
 daughter of a different parent:
 
     taxon.take(othertaxon)
     e.g. 
     # From http://www.marinespecies.org/aphia.php?p=taxdetails&id=556811
-    ott.taxon('Ammoniinae').take(ott.taxon('Asiarotalia'))
+    winner.taxon('Ammoniinae').take(winner.taxon('Asiarotalia'))
 
-Move the children of taxon A into taxon B, and make B be a synonym of
+Move the children of node A into node B, and make B be a synonym of
 A:  (I.e. the names are synonyms, but not previously recorded as
 such):
 
     taxon.absorb(othertaxon)
     e.g. 
     # From http://www.marinespecies.org/aphia.php?p=taxdetails&id=557120
-    ott.taxon('Parentia').absorb(ott.taxon('Parentiola'))
+    winner.taxon('Parentia').absorb(winner.taxon('Parentiola'))
 
-Delete a taxon and all of its descendants:
+Delete a node and all of its descendants:
 
     taxon.prune()
 
-Delete all the descendants of a given taxon: (this is useful for grafting one taxonomy into another)
+Delete all the descendants of a given node: (this is useful for grafting one taxonomy into another)
 
     taxon.trim()
 
-Delete a taxon, moving all of its children up one level (e.g. delete a
+Delete a node, moving all of its children up one level (e.g. delete a
 subfamily making all of its genus children children of the family):
 
     taxon.elide()
@@ -199,35 +218,31 @@ Select a subset of a taxonomy:
 
     taxonomy.select(taxon)
 
-This returns a new taxonomy whose root is (a copy of) the given taxon.
+This returns a new taxonomy whose root is (a copy of) the given node.
 
 (TBD: Need a way to add a root to the forest, or change the root.)
 
 ## Alignment
 
-Taxonomy alignment ('absorb') establishes correspondences between taxa in taxonomy A with taxa in taxonomy B, based on taxon names and topology.  Most of the complexity of this operation has to do with the handling of homonyms.
+Taxonomy alignment establishes correspondences between nodes in taxonomy A and nodes in taxonomy B, based on node names and topology.  Most of the complexity of this operation has to do with the handling of homonyms.
 Sometimes the automatic alignment
 logic makes mistakes.  It is then desirable to manually specify that
-taxon X in taxonomy A is the same as taxon X in taxonomy B (they are not homonyms), or not (they *are* homonyms).
+node X in taxonomy A is the same as node X in taxonomy B (they are not homonyms).
 
-    tax.same(tax1, tax2)
-    tax.notSame(tax1, tax2)
+    alignment.same(tax1, tax2)
       e.g.
-    tax.same(A.taxon("X"), B.taxon("X"))
+    alignment.same(ncbi.taxon('X'), winner.taxon('X'))
 
-These methods are a bit fussy.
-One of the arguments to same or notSame should be a taxon from a taxonomy that is about to be
-'absorbed' but hasn't been yet, and the other should be from the
-taxonomy under construction, after it has had other source taxonomies
-absorbed into it.  (Equivalently it is possible to specify a taxon in a taxonomy that has already been 'absorbed'.)  The taxa may occur in either order.
+The first argument must be in the source taxonomy, and the second must be in the merged taxonomy.
 
-    same(gbif.taxon('Plantae'), ott.taxon('Viridiplantae'))
-    ott.absorb(gbif)
+    alignment = winner.alignment(gbif)
+    alignment.same(gbif.taxon('Plantae'), winner.taxon('Archaeplastida'))
+    winner.align(alignment)
+    winner.merge(alignment)
 
-Should the need arise you can find out what a source taxon maps to, using image():
+Should the need arise you can find out what a source node maps to, using `image()`:
 
-    tax2.absorb(tax1)
-    tax2.image(tax1.taxon('Sample'))
+    alignment.image(gbif.taxon('Sample'))
 
 
 ## Annotation
@@ -236,50 +251,48 @@ Add a synonym:
 
     taxon.synonym('Alternate name')
 
-Rename a taxon, leaving old name behind as a synonym:
+Rename a node, leaving old name behind as a synonym:
 
     taxon.rename('Newname')
 
-Mark a taxon as being 'incertae sedis' i.e. not classified.  It will
-be retained for use in OTU matching but will not show up in the
-browsable tree unless mentioned in a source tree:
+Mark a node as being 'incertae sedis' i.e. not fully classified:
 
     taxon.incertaeSedis()
 
-Mark as extinct:
+Mark as extinct or extant:
 
     taxon.extinct()
+    taxon.extant()
 
-Force an 'incertae sedis' or otherwise hidden taxon to become visible
-in spite of other information:
-
-    taxon.forceVisible()
-
-Mark a taxon as 'hidden' so that it can be suppressed by tools downstream:
+Mark a node as 'hidden' so that it can be suppressed by tools downstream:
 
     taxon.hide()
     
 ## Looking at taxonomies
 
-Taxonomies are iterable.
+One can iterate over the nodes of a taxonomy:
 
-    for taxon in taxonomy: ...
+    for taxon in taxonomy.taxa(): ...
     
-Taxa have lots of properties you might want to look at in a script.
+Nodes have lots of properties you might want to look at in a script.
     
     taxon.parent
-    taxon.children
+    taxon.children     # null, or a List of nodes
     taxon.isHidden()
     
-You can select only the visible (non-hidden) taxa:
+You can make a copy of a taxonomy, selecting only the visible (non-hidden) nodes:
 
     taxonomy.selectVisible("my taxonomy but only visible")
 
+or make a new taxonomy that is a subtree of a given one:
+
+    rana = ncbi.select('Rana')
+
 ## Debugging
 
 Not much here yet, just
 
     taxon.show()
 
-Displays information about this taxon: its lineage and children, sources,
+Displays information about this node: its lineage and children, sources,
 flags, etc.