Suggestions for improving Getting Started guide #435

@graybeal

I'm having a problem with the generally excellent documentation. For people who think they might like to do mappings with SSSOM, but are not ontology/software experts, there is no great place to start. Let's call our example non-expert Antoniv (onto-naïve, get it?).

The Introduction is very long and goes into publication-level detail about challenges and approaches. Antoniv doesn't need or care to know about those things. (I suggest this is more of a Technical Introduction or Research Introduction.)

The How to (no stand-alone link; it's just the second possibility starting from the master menu) starts with Mapping Justifications and gets more obscure from there. None of them are actually about starting to do mappings, so Antoniv won't find joy there. (I would call these 'Advanced How-To Topics' rather than simply 'How to'.)

The Overview is quite promising, with an elevator speech and a link to Tutorials and Guides and, if that fails, [Related tutorials](https://mapping-commons.github.io/sssom/training/#related). Unfortunately none of those entries are actually a tutorial about how to use SSSOM, so Antoniv is again frustrated. (Making the first thing under Overview a section called 'Basic Tutorial on Using SSSOM' and pointing to the following Tutorial would be really good. But it would also be good to have an even more basic Quick Start item, and to point to both that and the Basic Tutorial from (a) the main documentation page, (b) the main GitHub page, and (c) the Quick links page.)

The Tutorial starts off as the document closest to what Antoniv wants. But it is a very detailed tutorial; in fact I would call it an Advanced Tutorial in places. Everything here is really valuable, mostly as someone digs a little deeper into SSSOM, or wants to understand how to be a competent mapper. But Antoniv isn't there yet; Antoniv just wants to get practice trying out the tooling to see if it is understandable, useful, and low-friction. And there is some really basic stuff in this Tutorial too ("What are we mapping?") that is not appropriate if you expect the reader to have a basic understanding of ontologies (prerequisites at top of the document). Mostly this Tutorial suffers from putting everything on one page, including many advanced things, so it is very intimidating. (I'd include these subjects as their own training pages, reordered from yours: Creating SSSOM Identifiers; Mapping Metadata; Basic Concept Mapping with SSSOM; Metadata Describing Your Mapping Set; Metadata Describing Each Mapping; Mapping Automation; Ontology Alignment and Matching. Then the beginning is the Basic Tutorial as described below.)

I'd argue that what most people want is a quick basic tutorial with examples. These are the sections I, I mean Antoniv, needs. (This assumes the user knows what mapping is and what a semantic identifier looks like.) A few links here are fake; Ed indicates an editorial comment by me.

  1. Mapping format and examples "Mappings are CSV files, like this one. They can also be expressed as RDF files, or [JSON files]. For detailed information see The SSSOM/TSV serialization format."
  2. Identifiers "As you can see in the example, SSSOM CSV files require the use of CURIEs (compact URIs) as the subject and object entities being mapped. CURIE prefixes are specified in the header. If you want to use full IRIs, you have to use the RDF format for your SSSOM mappings." (Ed: Really??)
  3. Mapping relations "The predicate_ID specifies the mapping relation. Any predicate identifier may be specified, though it's best to use the more common predicates at first until you have expertise in mapping."
  4. Mapping set metadata "The required and optional metadata fields for the mapping set are as follows (see Mapping Set metadata for the description of these fields): …"
  5. Mappings metadata "The required and optional fields for each mapping are as follows (see Quick reference for mapping metadata for the description of these fields): …"
  6. "Note that the ID of the subject of the mapping is not required, because …" (Ed: I am really surprised it is not required per the mapping metadata page, but it seems important to explain why so the user can take advantage of it.)
  7. Mapping justifications "You can find mapping justifications (the only other required term aside from the mapping object ID) in the semapv vocabulary (Ed: Except I can't; the word 'justification' does not appear in that document, so how can one look up the justifications? I eventually found the following), specifically the terms under the 'matching process' concept, also named 'Matching'. For manually created mappings a simple justification is semapv:ManualMappingCuration." (Ed: Except said identifier/link doesn't resolve directly to the term, only to the vocabulary. This is a general problem with the format used for terms in the semapv vocabulary.)
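
To make items 1, 2, 3 and 7 concrete, here is a rough sketch of what a Quick Start example could show: a tiny mapping file and a few lines of plain Python that read it. Everything in it is an assumption for illustration. The `EX1`/`EX2` prefixes, the `example.org` IRIs and the entity IDs are made up; the column names and the commented `curie_map` header are my reading of the SSSOM/TSV format and should be checked against the spec; and the file is tab-separated because the format linked in item 1 is called SSSOM/TSV. It is not the SSSOM Toolkit, just the standard library.

```python
import csv
import io

# Hypothetical example file: prefixes, IRIs and entity IDs are invented.
EXAMPLE = (
    "# curie_map:\n"
    "#   EX1: http://example.org/vocab1/\n"
    "#   EX2: http://example.org/vocab2/\n"
    "#   skos: http://www.w3.org/2004/02/skos/core#\n"
    "#   semapv: https://w3id.org/semapv/vocab/\n"
    "# mapping_set_id: http://example.org/mappings/demo\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "EX1:0001\tskos:exactMatch\tEX2:A17\tsemapv:ManualMappingCuration\n"
    "EX1:0002\tskos:broadMatch\tEX2:B03\tsemapv:ManualMappingCuration\n"
)


def read_mappings(text):
    """Split the commented header from the table; return (prefixes, rows)."""
    header, body = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            content = line[1:]
            # Drop the single space that conventionally follows '#'.
            if content.startswith(" "):
                content = content[1:]
            header.append(content)
        else:
            body.append(line)

    # Very small hand-rolled reader for the indented curie_map entries.
    # A real tool would use a YAML parser here.
    prefixes, in_map = {}, False
    for line in header:
        if line.strip() == "curie_map:":
            in_map = True
        elif in_map and line.startswith(" "):
            prefix, iri = line.strip().split(": ", 1)
            prefixes[prefix] = iri
        else:
            in_map = False

    rows = list(csv.DictReader(io.StringIO("\n".join(body)), delimiter="\t"))
    return prefixes, rows


def expand(curie, prefixes):
    """Turn a CURIE such as 'EX1:0001' into a full IRI using the prefix map."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local


prefixes, rows = read_mappings(EXAMPLE)
for row in rows:
    print(expand(row["subject_id"], prefixes), row["predicate_id"],
          expand(row["object_id"], prefixes))
```

Something this small would let Antoniv see the header, the CURIEs, the predicate and the justification in one screen before reading any of the longer pages.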

To validate your SSSOM files you will need to install and work with the SSSOM Toolkit, or the related Ontology Development Kit documented on that page.
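
Before installing anything, a beginner could at least run a sanity check along these lines. This is only a minimal sketch, not real SSSOM validation and not a substitute for the SSSOM Toolkit. The function name `check_rows` is hypothetical, and the default list of required columns simply follows what item 7 above says is required (object ID and mapping justification), so treat it as an assumption and adjust it to whatever the spec actually requires.

```python
import csv
import io

# Assumption: taken from the description above, not from the spec itself.
REQUIRED_COLUMNS = ("object_id", "mapping_justification")


def check_rows(text, required=REQUIRED_COLUMNS, delimiter="\t"):
    """Return a list of human-readable problems; an empty list means none found."""
    # Ignore the commented metadata header.
    data_lines = [line for line in text.splitlines() if not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter=delimiter)

    columns = reader.fieldnames or []
    missing = [name for name in required if name not in columns]
    problems = [f"missing column: {name}" for name in missing]

    # Report empty cells in the required columns that are present.
    for number, row in enumerate(reader, start=1):
        for name in required:
            if name in missing:
                continue
            if not (row.get(name) or "").strip():
                problems.append(f"row {number}: empty {name}")
    return problems


sample = (
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "EX1:0001\tskos:exactMatch\tEX2:A17\tsemapv:ManualMappingCuration\n"
    "EX1:0002\tskos:broadMatch\tEX2:B03\t\n"
)
for problem in check_rows(sample):
    print(problem)
```

It only checks that the named columns exist and are filled in; it says nothing about whether the CURIEs, predicates or justifications are valid.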

Of course your mapping sets (CSV, RDF, or JSON) are text files, and can be stored in a repository such as GitHub. If your mappings are converted to valid RDF format, you can store the RDF in any suitable ontology repository. For now no other mapping-dedicated repositories exist, though some are under development or proposed.

Of course you may choose to develop your mapping file in some other columnar format like Excel or Google Sheets, and then convert it to CSV. For many people this will be the easiest way to work with mapping files, and those with GitHub Actions experience can convert their files from those formats into CSV files in GitHub automatically whenever the source files are changed.

With this basic information Antoniv is quite happy, as work can proceed apace: even if there is no immediate tool for working easily with the mappings, at least the format will be nominally computable.
