Poor matches against SO, likely due to use of underscores #57

@cmungall

Searching for "splice site", I would expect HIGH confidence matches for both SIO and SO, since the query exactly matches the main name of each term:

$ curl -L -s 'http://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate?propertyValue=splice+site' | jq '.[] | .confidence, .semanticTags, .annotatedProperty.propertyValue'
"MEDIUM"
[
  "http://semanticscience.org/resource/SIO_010451"
]
"splice site"
"MEDIUM"
[
  "http://purl.obolibrary.org/obo/SO_0000162"
]
"splice_site"

SO uses underscores in names (arguably a bug in SO, which I may be partly to blame for, but it is how it is), and indeed if I search using underscores, the SO match comes back with GOOD confidence:

$ curl -L -s 'http://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate?propertyValue=splice_site' | jq '.[] | .confidence, .semanticTags, .annotatedProperty.propertyValue'
"GOOD"
[
  "http://purl.obolibrary.org/obo/SO_0000162"
]
"splice_site"

However, a poor user is unlikely to know that underscores are needed when searching SO.

Recommendations/questions:

  1. Treat underscores as identical to spaces, both when indexing and when searching.
  2. The first hit should be returned as a HIGH confidence match to SIO.
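
To illustrate recommendation 1, here is a minimal Python sketch of applying the same normalization at index time and at query time. It is not ZOOMA's actual code: `normalize_label`, `index`, and `lookup` are hypothetical names, and the in-memory dictionary only stands in for whatever index ZOOMA really uses. The two labels and IRIs are the ones from the output above.

```python
import re
from collections import defaultdict


def normalize_label(text: str) -> str:
    """Hypothetical helper: treat underscores as spaces, collapse runs of
    whitespace, and lowercase, so 'splice_site' and 'splice site' agree."""
    return re.sub(r"[_\s]+", " ", text).strip().lower()


# Toy in-memory index (an assumption, not ZOOMA's real data structure):
# normalized label -> list of term IRIs.
index = defaultdict(list)
for label, iri in [
    ("splice site", "http://semanticscience.org/resource/SIO_010451"),
    ("splice_site", "http://purl.obolibrary.org/obo/SO_0000162"),
]:
    index[normalize_label(label)].append(iri)


def lookup(query: str) -> list:
    """Normalize the query the same way the labels were normalized."""
    return index.get(normalize_label(query), [])


# Both spellings of the query now reach both terms.
print(lookup("splice site"))
print(lookup("splice_site"))
```

Because the same function is applied on both sides, the user never has to know which convention a given ontology follows. How an exact match on the normalized form should map to a confidence level is a separate decision that this sketch does not address.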
