
Wish list

Jonathan A Rees edited this page Apr 6, 2020 · 6 revisions

Wish list for 'checklist diff' tool

Please read the README first.

  • Documentation of method
  • Try out more taxonomic sources e.g. CoL, MSW 2/3/4
  • Translation to RCC-5
  • Method
    • Handle 'unclassified' ('incertae sedis') containers and child rank inconsistencies
    • Compute RCC-5 '>' rather than the usual '=' when a B-TNU has no exact A-TNU MRCA match
    • Eliminate NCBI 'containers' without losing implicit disjointness claims (e.g. via ranks or via 'mutexes')
    • Quantified uncertainty (confidence in RCC-5 judgments)...
    • Match even when the genus differs (if closely related, e.g. a sibling also matches in the same way, and unambiguous)
    • Match even when there is a gender error (-us vs. -a)
    • Enlist genetic and occurrence data, and perhaps other data, in understanding TNUs and/or assigning data records to TNUs
  • Display
    • Fix display of synonyms; maybe 3 columns: accepted TNU in A hierarchy, a name that's in both A and B that enables the match (it's either A or an A synonym, and also either B or a B synonym), accepted TNU in B hierarchy
    • Stop eliding repeated names with =; maybe put name comparison status (= vs. blank) in a separate column
    • Show B nesting level somehow? Show A nesting better (indentation??)? Deal with ranks better? (NCBI ranks are not always consistent, sometimes not given at all)
    • Group A-TNU children according to their B-TNU parent, when the implied A-group is split into multiple B-groups
    • Maybe show all consistent RCC-5 options, as opposed to just suggesting one? Hmm...
    • Nice HTML output similar to Avibase checklist comparator
    • Graphviz? (that might be better done by separate downstream tools)
  • Questions
    • Is there a sensible way to distinguish 'wholly new' from, say, splitting? Or to suggest to the user that either is a possibility for 'no match'? (in the case of GenBank, we could look at the sequence records maybe, as a heuristic??)
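
As a rough illustration of the 'Translation to RCC-5' and '>' vs. '=' items above, here is a minimal sketch of deriving an RCC-5 relation between an A-TNU and a B-TNU. It assumes each TNU can be summarized as the non-empty set of tip names it contains after matching, which is a simplification and not necessarily how the tool works. The function name `rcc5` and the symbols `><` (overlap) and `!` (disjoint) are assumptions chosen for this example.

```python
def rcc5(a, b):
    """Return the RCC-5 relation of set `a` to set `b` as a symbol.

    Hypothetical helper: `a` and `b` stand for the matched tips under an
    A-TNU and a B-TNU. The symbols are an assumed notation:
    '=' equal, '<' proper part, '>' properly contains,
    '><' partial overlap, '!' disjoint.
    """
    a, b = frozenset(a), frozenset(b)
    if not a or not b:
        # RCC-5 is only defined here for non-empty regions.
        raise ValueError("both TNU extensions must be non-empty")
    if a == b:
        return "="
    if a < b:   # proper subset
        return "<"
    if a > b:   # proper superset
        return ">"
    if a & b:   # some shared tips, neither contains the other
        return "><"
    return "!"


# Placeholder tip names, not real taxa.
print(rcc5({"x", "y", "z"}, {"x", "y"}))  # >
print(rcc5({"x", "y"}, {"y", "z"}))       # ><
print(rcc5({"x"}, {"z"}))                 # !
```

Under this assumption, reporting '>' instead of '=' amounts to noticing that the A-TNU holds at least one matched tip that the B-TNU lacks. Unmatched tips, synonyms, and 'containers' would need separate handling.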
