1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@
  - Add `mapping_tool_id` slot to the `Mapping` and `MappingSet` classes ([issue](https://github.com/mapping-commons/sssom/issues/449)).
  - Add `record_id` slot to the `Mapping` class ([issue](https://github.com/mapping-commons/sssom/issues/359)).
  - Change all URI-typed slots to clarify that they expect _non-relative_ URIs as values ([issue](https://github.com/mapping-commons/sssom/issues/448)).
+ - Add specification for the RDF serialisation ([discussion](https://github.com/mapping-commons/sssom/discussions/454)).

## SSSOM version 1.0.0

3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -39,8 +39,9 @@ nav:
  - Serialisations:
  - Introduction: spec-formats.md
  - SSSOM/TSV serialisation: spec-formats-tsv.md
+ - SSSOM/JSON serialisation: spec-formats-json.md
+ - SSSOM/RDF serialisation: spec-formats-rdf.md
  - OWL/RDF serialisation: spec-formats-owl.md
- - JSON serialisation: spec-formats-json.md
  - Resources for contributors: contributing.md
  - Resources for users:
  - FAQ: faq.md
366 changes: 366 additions & 0 deletions src/docs/spec-formats-rdf.md
@@ -0,0 +1,366 @@
# The SSSOM/RDF serialisation format

This section defines how to represent a SSSOM mapping set as a [RDF
model](https://www.w3.org/TR/rdf11-concepts/).


## RDF formats
The RDF model that represents a SSSOM mapping set is independent of the
concrete format that may be used to serialise the model.

It is RECOMMENDED that implementations support reading and writing a
SSSOM set from and to the [RDF Turtle](https://www.w3.org/TR/turtle/)
format at least. They MAY support any other RDF concrete format (e.g.
RDF/XML, TriG, N-Triples, etc.).

This specification does not mandate how a concrete RDF syntax is to be
used. For example, if the RDF syntax allows named resources and
predicates to be serialised as either IRIs or CURIEs, it is left to the
discretion of the implementations (or their users) to decide which form
to use.

<a id="sssom-slots"></a>
## Representation of slots
A metadata slot on any given SSSOM object (such as a `Mapping` or a
`MappingSet`) is represented as a RDF triple where:

* the subject is the resource representing the SSSOM object;
* the predicate is either:
* the property indicated by the `URI` field in the LinkML
description of the slot, if such a field is present;
* or a property constructed by catenating the
Collaborator

Suggested change
* or a property constructed by catenating the
* or a property constructed by concatenating the

`https://w3id.org/sssom/` namespace and the name of the slot;
* the object is the value of the slot.
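
The predicate-selection rule above can be sketched as follows. This is only an illustration, not part of the specification: the function name `predicate_for_slot` and its `slot_uri` parameter are hypothetical, and the sketch assumes that the `URI` field of the LinkML description has already been expanded to a full IRI.

```python
SSSOM_NS = "https://w3id.org/sssom/"


def predicate_for_slot(slot_name, slot_uri=None):
    """Return the predicate IRI used to represent a slot.

    slot_uri (hypothetical parameter) is the expanded `URI` field of the
    slot's LinkML description, or None when the slot has no such field.
    """
    if slot_uri is not None:
        return slot_uri
    # Fallback: concatenate the SSSOM namespace and the slot name.
    return SSSOM_NS + slot_name


# A slot without a `URI` field falls back to the SSSOM namespace.
print(predicate_for_slot("confidence"))
# A slot with a `URI` field uses that property instead.
print(predicate_for_slot("license", "http://purl.org/dc/terms/license"))
```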

### Representation of slot values
The following rules determine how the value of a slot is represented as
the object of a RDF triple.

#### For slots typed as `sssom:EntityReference`
(e.g. `subject_id`, `mapping_justification`, `subject_source`…)

The value is rendered as a named RDF resource (IRI).

#### For slots typed as `sssom:NonRelativeURI`
(e.g. `license`, `mapping_provider`, `issue_tracker`…)

The value is rendered as a named RDF resource (IRI).
Collaborator

why use "is" instead of MUST BE lingo here?


#### For slots typed as `linkml:date`
(e.g. `mapping_date`, `publication_date`)

The value is represented as a `xsd:date` literal.

#### For slots typed as `linkml:double`
(e.g. `mapping_set_confidence`, `confidence`, `similarity_score`)

The value is represented as a `xsd:double` literal.

#### For slots typed as an enumeration
(e.g. `sssom_version`, `mapping_cardinality`, `subject_type`…)

If the permissible values for the enumeration are defined in the LinkML
model as having an associated `meaning` property, then the value is
represented as a named RDF resource with the indicated property.
Otherwise, the value is represented as a string literal.

#### For slots typed as a SSSOM object
(e.g. `mappings`, `extension_definitions`)

The value is represented as a RDF resource. Whether the resource is
named (IRI) or not (blank node) will depend on the type of the object,
see the [section on representing SSSOM objects](#sssom-objects) below
for details.
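
The value rules above can be summarised as a small dispatch function. This is a sketch under stated assumptions: the type labels (`"EntityReference"`, `"date"`, ...) and the `(kind, value, datatype)` tuple are hypothetical conventions chosen for the illustration, identifiers are assumed to be already expanded to IRIs, and slots typed as SSSOM objects are left out.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def render_value(slot_type, value, meaning=None):
    """Describe the RDF term for a slot value as (kind, value, datatype).

    meaning is the expanded IRI of the `meaning` property of an enumeration
    value, or None if the LinkML model does not define one.
    """
    if slot_type in ("EntityReference", "NonRelativeURI"):
        return ("iri", value, None)
    if slot_type == "date":
        return ("literal", value, XSD + "date")
    if slot_type == "double":
        return ("literal", repr(float(value)), XSD + "double")
    if slot_type == "enum":
        if meaning is not None:
            return ("iri", meaning, None)
        # No `meaning` in the model: fall back to a string literal.
        return ("literal", value, XSD + "string")
    raise ValueError("unsupported slot type: " + slot_type)
```

For instance, `render_value("date", "2025-07-14")` describes an `xsd:date` literal, while an enumeration value with a `meaning` is described as a named resource.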

### Representation of multi-valued slots
(e.g. `creator_id`, `see_also`, `object_match_field`…)

As an exception to the general principle that slots are represented by a
single RDF triple, multi-valued slots are represented by as many
triples as there are values, each value being the object of one triple.
Collaborator

i stumbled across this sentence multiple times. Maybe this can be written clearer like:

for each value {v1,v..,vn} represented by a set of individual triples {a,b,v1; a,b,v2,...a,b,vn}.

Collaborator

Also, is this true for mappings slot as well? Probably not right?

Contributor Author

> for each value {v1,v..,vn} represented by a set of individual triples {a,b,v1; a,b,v2,...a,b,vn}.

Does not sound any clearer to me. On the contrary, it sounds like each value is represented by a set of triples, which is certainly not the case.

> Also, is this true for mappings slot as well? Probably not right?

Of course it is. Mappings are represented as follows:

?mappingset sssom:mappings [ a owl:Axiom ;
                               owl:annotatedSource ...
                           ] ,
                           [ a owl:Axiom ;
                               owl:annotatedSource ...
                           ] .

which fits the description for multi-valued slots: one triple per value.

This is what SSSOM-Py has always done, so I had assumed you were fine with that.

Let me guess: you are no longer happy with that and want to radically change the format?

Collaborator

No, it seems I misunderstood

> multi-valued slots are represented by as many
> triples as there are values, each value being the object of one triple.

I thought this literally meant:

> for each value {v1,v..,vn} represented by a set of individual triples {a,b,v1; a,b,v2,...a,b,vn}.

Which is why I was confused.

Maybe just add an example here?


> Non-normative notes:
>
> 1. This means, in particular, that RDF complex structures intended to
> represent collections of values, such as `rdfs:Container` or
> `rdf:List`, MUST NOT be used to represent multi-valued SSSOM
> slots.
> 2. This also implies that values in multi-valued slots are _not_
> ordered.

The other rules above apply to determine how each single value is to be
represented.
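
As a sketch of the multi-valued rule (one triple per value, all sharing the same subject and predicate), assuming triples are modelled as plain Python tuples. The function name is hypothetical, the blank node label and the second ORCID identifier are made up for the illustration, and `author_id` is assumed to be a multi-valued slot represented by `pav:authoredBy`.

```python
def triples_for_slot(subject, predicate, values):
    """Return one (subject, predicate, object) triple per value.

    No RDF list or container is built; the order of the result carries no
    meaning, since values of a multi-valued slot are not ordered.
    """
    return [(subject, predicate, value) for value in values]


authors = ["ORCID:0000-0002-7356-1779", "ORCID:0000-0000-0000-0000"]
for triple in triples_for_slot("_:mapping1", "pav:authoredBy", authors):
    print(triple)
```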

<a id="extension-slots"></a>
### Representation of extension slots
An [extension slot](spec-model.md#non-standard-slots) is represented in
a similar way to a standard slot, with the following specific rules.

The predicate is the property associated to the extension slot, as
indicated by the `property` slot in the set’s
[definition](ExtensionDefinition.md) of the extension.

The value of the extension is represented:

* as a named RDF resource, if the `type_hint` of the extension
definition is `linkml:uriOrCurie`;
* otherwise, as a literal of the type indicated by the `type_hint`.
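
The two rules for extension slots could look like this in code. It is a sketch only: the dictionary layout of the extension definition and the `(kind, value, datatype)` tuple are hypothetical conventions, CURIEs are kept unexpanded for brevity, and the case of a definition without any `type_hint` is not addressed by the text above and is simply passed through as `None` here.

```python
def extension_triple(subject, definition, value):
    """Build the triple representing one extension slot value.

    definition is a dict with a 'property' key and an optional 'type_hint'
    key, mirroring the slots of an extension definition.
    """
    type_hint = definition.get("type_hint")
    if type_hint == "linkml:uriOrCurie":
        obj = ("iri", value, None)
    else:
        # Literal typed according to the type hint.
        obj = ("literal", value, type_hint)
    return (subject, definition["property"], obj)


fooable = {
    "slot_name": "ext_fooable",
    "property": "EXT:isFooable",
    "type_hint": "xsd:boolean",
}
print(extension_triple("_:mapping1", fooable, "true"))
```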


<a id="sssom-objects"></a>
## Representation of SSSOM objects

### Representation of a `Mapping` object
The RDF type of a `Mapping` object is `owl:Axiom`.

If the `Mapping` object has a `record_id` slot, then the value of that
slot is used as the named RDF resource that represents the object (and
consequently, that slot MUST NOT be represented using the [general
rules](#sssom-slots) for the representation of slots as defined above).
Collaborator

maybe better to phrase this positively, like "rules don't apply".

Otherwise, the `Mapping` object is represented as a blank node.
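
A minimal sketch of how a writer might choose the node for a `Mapping`, assuming mappings are held as plain dictionaries of slot values. Both function names and the blank node labelling scheme are hypothetical.

```python
import itertools

_blank_ids = itertools.count(1)


def mapping_node(mapping):
    """Return the RDF node that represents a Mapping."""
    record_id = mapping.get("record_id")
    if record_id:
        # The record identifier itself names the resource.
        return record_id
    # No record_id: use a fresh blank node label.
    return "_:m%d" % next(_blank_ids)


def slots_to_serialise(mapping):
    """Return the slots still subject to the general rules.

    record_id is excluded because it is already conveyed by the node.
    """
    return {k: v for k, v in mapping.items() if k != "record_id"}
```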

### Representation of a `MappingSet` object
The RDF type of a `MappingSet` object is `sssom:MappingSet`.

A `MappingSet` object is represented by a named RDF resource
corresponding to the value of the `mapping_set_id` slot (which
consequently MUST NOT be represented using the [general
rules](#sssom-slots) for the representation of slots as defined above).

The `curie_map` slot MUST NOT be represented using the [general
rules](#sssom-slots). Instead, if it is needed it MUST be represented
using whatever mechanism is provided by the concrete RDF serialisation
format (e.g. `@prefix` declarations in [RDF
Turtle](https://www.w3.org/TR/turtle/) or [RDF
TriG](https://www.w3.org/TR/trig/), or `xmlns` namespace declarations in
[RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/)).
Collaborator

I am a bit against any relationship between curie map and the various RDF prefix syntaxes you list. cc @cthoyt

My main concern is this: two of the most important serialisations (RDF/XML, OWL/XML) can't even accurately represent a sssom:curie_map. Because in these XML serializations the prefix system is hooked into the XML namespacing system, the local identifier part has severe syntactic constraints. In particular, it must correspond to an NCNAME, which means it MUST start with a letter (not a number, so you cant actually represent UBERON:123 in RDF XML).

My vote is to represent the prefix map using the SHACL prefixmap:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:sh="http://www.w3.org/ns/shacl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:ex="http://example.org/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xml:base="http://example.org/shapes">

  <!-- Define the prefix map -->
  <sh:NodeShape rdf:about="#MyPrefixMap">
    <sh:prefixes>
      <sh:PrefixDeclaration>
        <sh:prefix>ex</sh:prefix>
        <sh:namespace>http://example.org/</sh:namespace>
      </sh:PrefixDeclaration>
      <sh:PrefixDeclaration>
        <sh:prefix>foaf</sh:prefix>
        <sh:namespace>http://xmlns.com/foaf/0.1/</sh:namespace>
      </sh:PrefixDeclaration>
    </sh:prefixes>
  </sh:NodeShape>

</rdf:RDF>

This ensures 100% faithful roundtripping.

Contributor Author

@gouttegd, Jul 16, 2025

> My main concern is this: two of the most important serialisations (RDF/XML, OWL/XML) can't even accurately represent a sssom:curie_map. Because in these XML serializations the prefix system is hooked into the XML namespacing system, the local identifier part has severe syntactic constraints. In particular, it must correspond to an NCNAME, which means it MUST start with a letter (not a number, so you cant actually represent UBERON:123 in RDF XML).

Point taken, but that’s a limitation of RDF/XML. Other prefix-supporting formats such as Turtle don’t have that limitation.

You want to preserve the CURIE map accurately and preserve the ability to roundtrip? Then don’t use RDF/XML.

> My vote is to represent the prefix map using the SHACL prefixmap:

You are joking. You are joking, right?

You are not seriously proposing something completely different from what we currently have? Something for which we don’t have the inkling of the beginning of support in either SSSOM-Py or SSSOM-Java?

Because if you are not joking, I give up.

Design your dream format all you want, and give me a sign when you’ll be done moving the goalposts from one day to the next.

Collaborator

> You are joking. You are joking, right?

I did not expect such a strong disagreement :P

Alright, instead of this suggestion, we should then make SSSOM-RDF an explicitly write-only format. I am totally fine with this as well, and then we don't need to concern ourselves with the curie_map at all. This way we can circumvent the limitations outlined above?

Contributor Author

> I did not expect such a strong disagreement :P

Try sparing some thoughts for the people who have to implement your last-minute ideas that you sprout out of nowhere, maybe you’ll understand.

Two days ago you were reluctant to just changing the predicates to use to represent the core triple (e.g. changing from owl:annotatedSource to sssom:subject_id, which would have been a much smaller change and easier change than what you’re suggesting here). I thought we were in agreement that the SSSOM/RDF format should be as close as possible to what we already have, both to avoid needlessly breaking things and to keep the work needed to make SSSOM-Py support the updated format minimal (given that we have very little developer time available to work on SSSOM-Py). The spec I proposed is tailored for that; it deviates only minimally from the existing, de facto SSSOM/RDF format that SSSOM-Py has been producing since forever; it is already completely supported by SSSOM-Java and I am hopeful that SSSOM-Py could be made to support it relatively easily (precisely due to the minimal deviations). You were fine with the initial proposal, on which you said that you had merely “minor stylistic comments”.

And now, you’re here suggesting that we should in fact do something completely different, that has never even been casually mentioned in any discussion related to the SSSOM/RDF format. Because all of a sudden you are concerned about roundtripping back from RDF/XML, even though the RDF/XML produced by SSSOM-Py has never been roundtrippable and AFAIK nobody has ever complained about that.

So yeah, I disagree with your proposal, because it contradicts everything that you seemed to want just two days ago.

So what do you want now?

A. A SSSOM/RDF format that is minimally different from what we already have, that can be supported rapidly (that is already supported by one implementation), but that (oh, the horror!) does not guarantee that a set written in RDF/XML can be roundtripped back to another SSSOM format?

B. A SSSOM/RDF format that is a clean break from the existing stuff, that will initially not be supported by any implementation (and in fact I doubt it will ever be implemented by SSSOM-Py, given the lack of activity on that implementation)?

If you want B, fine. But then I’ll leave you to design the format. I won’t get involved in any of it, I’ll just wait until you have designed the perfect format of your dream, and then I promise I’ll do my best to implement it.

Contributor Author

> we should then make SSSOM-RDF explicitly write only format..

Now you are throwing the baby out with the bathwater. Just because the RDF/XML concrete serialisation may not guarantee that the prefix map is preserved does not mean that we should give up on SSSOM/RDF being a read/write format.

Again: As currently written, the spec does allow a SSSOM/RDF set to be fully converted back to any other SSSOM format, provided that:

  • you do not serialise into RDF/XML;
  • you serialise identifiers as CURIEs and make sure to include the appropriate prefix declarations.

As I said in another comment below, I wrote the spec to be flexible (“à la carte”): if you want the ability to roundtrip between RDF and another format, you can have it; if you are not interested in that ability, you can ignore it.

Contributor Author

>   • you do not serialise into RDF/XML;
>   • you serialise identifiers as CURIEs and make sure to include the appropriate prefix declarations.

In fact even if you do serialise into RDF/XML and write identifiers as IRIs, you will still be able to convert back to SSSOM/TSV, unless you happen to run your RDF/XML file through a RDF processor that decides to strip away any unused namespace declarations. Not sure if that is a common behaviour among RDF tools, but Jena and RDFLib do not seem to do it – they are happy to let unused namespace declarations pass through unchanged.

Collaborator

Let's tackle the NCName issue with a comment in the documentation: XML formats might not be able to roundtrip

(@gouttegd says: longest URI extension will win)

Maybe just document a rule on conflicting uri prefixes for roundtrip

  • cant control curiefication in RDF based formats
  • add "longest uri expansion" assumption to canonicalisation and refer to canonicalisation as "preprocessing" for RDF generation.


> Non-normative notes
>
> 1. The CURIE map may not be needed at all if all named resources and
> predicates are always serialised as full-length IRIs.
Collaborator

This is only true if we decide that SSSOM-RDF is export only format. I know we said this for SSSOM-OWL, forgot where we ended up with RDF.

Contributor Author

This is why there is a subsection ”serialisation of identifiers” in the “Special considerations” section.

The SSSOM/RDF format can be both a read/write format that is equivalent to SSSOM/TSV or SSSOM/JSON (meaning that can roundtrip between all those formats without loss of information) and an export format. It all depends on what you want to do with the output file – something the spec cannot know in advance.

In a sense, SSSOM/RDF is a “à la carte” format. You want to preserve the ability to roundtrip back to SSSOM/TSV? You can, just make sure to include the CURIE map and any extension definitions. You are not interested in being able to come back (say, because all you want is to load the set into a graph database and forget about it)? Then you don’t have to worry about the CURIE map or extension definitions at all.

> 2. If at least some named resources or predicates are serialised as
> CURIEs, the RDF requirement that all used prefix names must be
> declared (using the appropriate mechanism for the chosen concrete
> syntax) takes precedence over the possibility of omitting the
> declarations of prefix names that are considered
> [built-in](spec-intro.md#iri-prefixes) in the context of SSSOM.
Collaborator

I would compact this sentence which has many redundant parts (the RDF requirement that all used prefix names must be declared) to:

Prefixes considered built-in by other serialisations MUST be rendered using their fully qualified name (IRI / CURIE).


### Representation of a `ExtensionDefinition` object
The RDF type of a `ExtensionDefinition` object is
`sssom:ExtensionDefinition`.

A `ExtensionDefinition` object has no identifier of any kind and is
always represented by a blank node.

## Special considerations for serialising to RDF
Collaborator

mark as not normative?

When serialising a mapping set to SSSOM/RDF, implementations should
consider how the resulting RDF file is intended to be used. In
particular, they should ponder whether it is expected that the RDF
serialisation can at any time be converted back to any other SSSOM
format (e.g. SSSOM/TSV), or if it is only intended to be used by
“generic”, non-SSSOM-aware RDF applications.

Depending on that intended usage (if it is known), implementations may
adopt slightly different behaviours as described in the following
subsections.

### Serialisations of identifiers
If the serialisation is intended to be convertible back to another SSSOM
format (especially the SSSOM/TSV format), implementations MUST
serialise identifiers as CURIEs and include the required prefix
declarations.

> Non-normative explanation
>
> This is because, if all identifiers are serialised as full-length
> IRIs, then even if the RDF file includes prefix declarations, they may
> be stripped away by a RDF reader, since they are not needed. And
> without those prefix declarations, it would not be possible to
> serialise the set back as a SSSOM/TSV file (remember that the
> SSSOM/TSV format _requires_ that identifiers be serialised as CURIEs).

Conversely, if the ability to convert the RDF file back to another SSSOM
format is not required, implementations can freely decide whether to
serialise identifiers as IRIs or CURIEs (assuming the concrete RDF
syntax allows that of course).
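
When identifiers are to be written as CURIEs, a writer needs to contract IRIs using the prefix map. The sketch below assumes the "longest expansion wins" rule that was suggested in the review discussion for the case where several prefixes match; that rule is an assumption here, not something the text above mandates. The function name and the `OBO` prefix are hypothetical.

```python
def contract(iri, prefix_map):
    """Return a CURIE for iri, or the IRI unchanged if no prefix matches.

    When several expansions match, the longest one is used (assumed rule).
    """
    best = None
    for prefix, expansion in prefix_map.items():
        if iri.startswith(expansion):
            if best is None or len(expansion) > len(prefix_map[best]):
                best = prefix
    if best is None:
        return iri
    return best + ":" + iri[len(prefix_map[best]):]


prefix_map = {
    "OBO": "http://purl.obolibrary.org/obo/",  # hypothetical extra prefix
    "FOODON": "http://purl.obolibrary.org/obo/FOODON_",
}
print(contract("http://purl.obolibrary.org/obo/FOODON_00002473", prefix_map))
```

An identifier that cannot be contracted is returned as a full IRI, which a writer aiming for roundtripping would probably want to report to the user.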

### Extension definitions
Extension definitions MAY be omitted if the RDF file is only intended to
be used by RDF applications.

Conversely, they SHOULD be included if the set is intended to be
convertible back to another SSSOM format.

> Non-normative explanation
>
> The whole point of an extension definition in SSSOM is to provide (1)
> a property that confers some meaning to the extension, and (2) the
> type of the expected values. In RDF, as described
> [above](#extension-slots), those two bits of information are already
> contained in the triple that represents the extension slot, so there is
> no need for an additional definition.
>
> But the extension definition also provides the `slot_name` which is
> used to represent the extension slot in other formats (especially
> SSSOM/TSV), so if conversion back to other SSSOM formats is required,
> ensuring that the extension definitions are present in the RDF
> serialisation is helpful.

### Propagation and condensation
Propagatable slots can be represented in RDF indifferently in their
propagated or condensed form, following the [normal
rules](spec-model.md#propagation-of-mapping-set-slots) for propagation
and condensation.

But if the RDF file is intended to be used by generic, non-SSSOM-aware
RDF applications, then implementations SHOULD serialise propagatable
slots in their propagated form.

> Non-normative explanation
>
> Propagation is a SSSOM-specific concept. If a RDF application is
> provided with a RDF file representing a set with condensed slots, the
> application will not know to propagate the condensed slots at the set
> level down to the level of the individual mappings, which will result
> in the application having an incomplete view of the mappings.
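
A simplified sketch of propagation before writing RDF for generic consumers. It assumes, as a simplification of the rules in the model specification, that a set-level value is copied to every mapping only when no mapping already carries its own value for that slot; the authoritative rules are those of the model specification. The function name and the dictionary-based data layout are hypothetical, and whether the set-level value is kept afterwards is left out.

```python
def propagate(set_slots, mappings, propagatable):
    """Return copies of the mappings with set-level slots propagated.

    set_slots: dict of slot values set on the mapping set.
    mappings: list of dicts of slot values, one per mapping.
    propagatable: names of the slots that may be propagated.
    """
    result = [dict(m) for m in mappings]
    for slot in propagatable:
        if slot not in set_slots:
            continue
        if any(slot in m for m in result):
            # Assumed rule: never propagate over mapping-level values.
            continue
        for m in result:
            m[slot] = set_slots[slot]
    return result


set_slots = {"mapping_date": "2025-07-14"}
mappings = [{"subject_id": "KF_FOOD:F001"}, {"subject_id": "KF_FOOD:F002"}]
print(propagate(set_slots, mappings, ["mapping_date"]))
```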


## Compatibility with pre-standard RDF representations
The present specification of the SSSOM/RDF format differs slightly from
what several implementations of SSSOM have been producing before the
format was formally specified.

In the name of backward compatibility, implementations MAY support the
alternative rules described in the following subsections when
deserialising from RDF.

Implementations MUST NOT follow these rules when serialising to RDF.

### Representation of slots typed as `sssom:NonRelativeURI`
Implementations MAY accept a value represented as a `xsd:anyURI`
literal.

### Representation of slots typed as an enumeration
Implementations MAY accept a value represented as a string literal, even
if the value is defined in the LinkML model as having an associated
`meaning` property.

For example, implementations MAY accept

```ttl
?mapping sssom:predicate_modifier "Not"^^xsd:string .
```

as an alternative to

```ttl
?mapping sssom:predicate_modifier sssom:NegatedPredicate .
```

Member

why not just decide on one and say "this is the standard"?

Contributor Author

@gouttegd, Aug 5, 2025

That’s what we did. This:

?mapping sssom:predicate_modifier sssom:NegatedPredicate .

is the standard.

But the decision to standardize that form has only been made a few weeks ago. Before that, both SSSOM-Java and SSSOM-Py have been producing the string literal form (SSSOM-Java for the past 8 months – since version 1.1, which introduced RDF support – and SSSOM-Py for as long as it has existed). In fact SSSOM-Py still produces the string literal form to this day.

So for backwards compatibility (which is the entire point of this section, “compatibility with pre-standard representations”), implementations MAY support the old string literal form, even though it is not the standard form.

Collaborator

Makes sense.
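
A lenient reader could normalise the pre-standard string literal form of an enumeration value to the standard resource form, roughly as sketched here. The function name, the `(kind, value)` tuple and the lookup table are hypothetical; the table only contains the single pair given in the example for `predicate_modifier`.

```python
# Pre-standard literal values and the resource they stand for.
LEGACY_ENUM_LITERALS = {
    ("predicate_modifier", "Not"): "https://w3id.org/sssom/NegatedPredicate",
}


def read_enum_value(slot_name, term):
    """Normalise an enumeration value read from RDF.

    term is ("iri", value) for a named resource or ("literal", value)
    for a string literal.
    """
    kind, value = term
    if kind == "iri":
        return value
    # Backward compatibility: map a known legacy literal to its resource;
    # unknown literals are returned unchanged.
    return LEGACY_ENUM_LITERALS.get((slot_name, value), value)
```

A writer following the specification would only ever emit the resource form; the lookup is needed on the reading side alone.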

### Representation of a `MappingSet` object
Implementations MAY accept a `MappingSet` object represented as a blank
node, with the `mapping_set_id` slot being represented as any other
slot.

For example, instead of

```ttl
<https://example.org/myset> a sssom:MappingSet .
```

implementations MAY accept

```ttl
[] a sssom:MappingSet ;
sssom:mapping_set_id <https://example.org/myset> .
```

or even (by also applying the alternative rule regarding the
representation of slots typed as `sssom:NonRelativeURI`)

```ttl
[] a sssom:MappingSet ;
sssom:mapping_set_id "https://example.org/myset"^^xsd:anyURI .
```

## Examples

> This section is non-normative.

Considering the following set in the SSSOM/TSV format:

```
#curie_map:
# EXT: https://example.org/properties/
# FOODON: http://purl.obolibrary.org/obo/FOODON_
# KF_FOOD: https://kewl-foodie.inc/food/
# ORCID: https://orcid.org/
#mapping_set_id: https://example.org/sample-set
#mapping_set_description: Manually curated alignment of KEWL FOODIE INC internal food and nutrition database with Food Ontology (FOODON). Intended to be used for ontological analysis and grouping of KEWL FOODIE INC related data.
#license: https://creativecommons.org/licenses/by/4.0/
#mapping_date: 2025-07-14
#extension_definitions:
# - slot_name: ext_fooable
# property: EXT:isFooable
# type_hint: xsd:boolean
subject_id subject_label predicate_id object_id object_label mapping_justification author_id confidence ext_fooable
KF_FOOD:F001 apple skos:exactMatch FOODON:00002473 apple (whole) semapv:ManualMappingCuration ORCID:0000-0002-7356-1779 0.95 true
KF_FOOD:F002 gala skos:exactMatch FOODON:00003348 Gala apple (whole) semapv:ManualMappingCuration ORCID:0000-0002-7356-1779 1 false
```

A valid serialisation of that set in RDF/Turtle would be:

```ttl
@prefix EXT: <https://example.org/properties/> .
@prefix FOODON: <http://purl.obolibrary.org/obo/FOODON_> .
@prefix KF_FOOD: <https://kewl-foodie.inc/food/> .
@prefix ORCID: <https://orcid.org/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix pav: <http://purl.org/pav/> .
@prefix semapv: <https://w3id.org/semapv/vocab/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix sssom: <https://w3id.org/sssom/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://example.org/sample-set> a sssom:MappingSet;
dcterms:description "Manually curated alignment of KEWL FOODIE INC internal food and nutrition database with Food Ontology (FOODON). Intended to be used for ontological analysis and grouping of KEWL FOODIE INC related data.";
dcterms:license <https://creativecommons.org/licenses/by/4.0/>;
sssom:extension_definitions [
sssom:property EXT:isFooable;
sssom:slot_name "ext_fooable";
sssom:type_hint xsd:boolean
];
sssom:mappings [ a owl:Axiom;
pav:authoredBy ORCID:0000-0002-7356-1779;
dcterms:created "2025-07-14"^^xsd:date;
owl:annotatedProperty skos:exactMatch;
owl:annotatedSource KF_FOOD:F001;
owl:annotatedTarget FOODON:00002473;
EXT:isFooable true;
sssom:confidence 9.5E-1;
sssom:mapping_justification semapv:ManualMappingCuration;
sssom:object_label "apple (whole)";
sssom:subject_label "apple"
], [ a owl:Axiom;
pav:authoredBy ORCID:0000-0002-7356-1779;
dcterms:created "2025-07-14"^^xsd:date;
owl:annotatedProperty skos:exactMatch;
owl:annotatedSource KF_FOOD:F002;
owl:annotatedTarget FOODON:00003348;
EXT:isFooable false;
sssom:confidence 1.0E0;
sssom:mapping_justification semapv:ManualMappingCuration;
sssom:object_label "Gala apple (whole)";
sssom:subject_label "gala"
] .
```
Member

@cthoyt, Aug 5, 2025

I would expect that the triples represented by the axioms also to show up somewhere in the RDF

Suggested change
```
KF_FOOD:F001 skos:exactMatch FOODON:00002473 .
KF_FOOD:F002 skos:exactMatch FOODON:00003348 .
```

though it's not clear what to do for negated triples

Contributor Author

They appear, but only as reified OWL axioms.

This has been the RDF output produced by SSSOM-Py since the beginning.

Collaborator

It's a valid question though. Without the direct triple, triple stores might not be able to return all terms mapped to ?x with a simple triple pattern - this is only possible because, in many of the pipelines I had built that are now gathering dust, I loaded the sssom file into robot and saved it, which I believe automatically injects that triple.

In the OWL serialisation it seems I have injected them in sssom py:
https://github.com/mapping-commons/sssom-py/blob/master/tests/validate_data/cob-to-external.tsv.owl

Contributor Author

> Without the direct triple, triples stores might not be able to return all terms mapped to ?x with a simple triple pattern

If that is needed, once the set has been exported to RDF it shouldn’t be hard to process it with some SPARQL to construct a `?subject_id ?predicate_id ?object_id` triple from every `?mapping owl:annotatedSource ?subject_id ; owl:annotatedProperty ?predicate_id ; owl:annotatedTarget ?object_id` set of triples, before loading it into a triple store.

I wouldn’t mind adding that as an OPTIONAL behaviour for RDF writers, on the condition that it is really optional – that is, if those `?subject_id ?predicate_id ?object_id` triples are absent the set must still be accepted by a SSSOM/RDF reader, as long as the `?mapping owl:annotatedSource ?subject_id ; owl:annotatedProperty ?predicate_id ; owl:annotatedTarget ?object_id` triples are present.

Something like:

A SSSOM/RDF writer MAY additionally inject for every mapping record a triple of the form ?subject_id ?predicate_id ?object_id. A SSSOM/RDF reader MUST NOT expect the presence of such triples.

If we do allow that, this raises the question (as hinted by @cthoyt) of what to do about negated mappings. Two possibilities:

(A) Don’t care. Negated mappings are treated in the same way as any other mappings. If users don’t want ?subject_id ?predicate_id ?object_id triples for negated mappings, it’s up to them to filter out negated mappings before exporting the set to RDF.

(B) Explicitly exclude negated mappings, as in

A SSSOM/RDF writer MAY additionally inject for every mapping record a triple of the form ?subject_id ?predicate_id ?object_id, only for mapping records that do not have a predicate_modifier of Not.

Contributor Author

There is also the question of mappings with sssom:NoTermFound, e.g.

record_id subject_id predicate_id object_id object_source
MYMAP:1 HP:1234 skos:exactMatch sssom:NoTermFound obo:doid.owl

Rendering them as

MYMAP:1 a owl:Axiom ;
           owl:annotatedSource HP:1234 ;
           owl:annotatedProperty skos:exactMatch ;
           owl:annotatedTarget sssom:NoTermFound ;
           sssom:object_source obo:doid.owl .

should be perfectly fine, but do we also want a

HP:1234 skos:exactMatch sssom:NoTermFound .

triple as well?

I’d say, we should do for them the same thing as we do for negated mappings (the two possibilities outlined in my previous message).

Collaborator

I am now on board with:

> I wouldn’t mind adding that as an OPTIONAL behaviour for RDF writers, on the condition that it is really optional – that is, if those `?subject_id ?predicate_id ?object_id` triples are absent the set must still be accepted by a SSSOM/RDF reader, as long as the `?mapping owl:annotatedSource ?subject_id ; owl:annotatedProperty ?predicate_id ; owl:annotatedTarget ?object_id` triples are present.

With regards to the special cases:

> I’d say, we should do for them the same thing as we do for negated mappings (the two possibilities outlined in my previous message).

I agree. Just because both are very different use cases I would favour it if implementations would be injecting triples to sssom:NoTermFound versus negated mappings based on separate conditionals; I would probably never add either one, but there may be some use cases to do so.


Note that the two `Mapping` objects are represented as blank nodes,
since the original set does not contain any `record_id` slot.

Note also that (1) identifiers are serialised as CURIEs whenever
possible, and (2) the definition for the `EXT:isFooable` extension is
included. This means that the set can be fully converted back to
SSSOM/TSV without any loss of information.
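
The optional behaviour discussed in the review thread on this example (a writer additionally emitting a direct `?subject_id ?predicate_id ?object_id` triple for each mapping) could be sketched as below. This reflects a proposal from the discussion, not the specification text. The function name and both flags are hypothetical; negated mappings and `sssom:NoTermFound` mappings are controlled by separate conditions, as suggested in the thread, and the sketch only looks for `sssom:NoTermFound` on the object side.

```python
def direct_triples(mappings, include_negated=False, include_no_term_found=False):
    """Return the direct triples a writer could inject for a set.

    mappings: list of dicts with subject_id, predicate_id, object_id and,
    optionally, predicate_modifier.
    """
    triples = []
    for m in mappings:
        negated = m.get("predicate_modifier") == "Not"
        no_term = m["object_id"] == "sssom:NoTermFound"
        if negated and not include_negated:
            continue
        if no_term and not include_no_term_found:
            continue
        triples.append((m["subject_id"], m["predicate_id"], m["object_id"]))
    return triples


mappings = [
    {"subject_id": "KF_FOOD:F001", "predicate_id": "skos:exactMatch",
     "object_id": "FOODON:00002473"},
    {"subject_id": "KF_FOOD:F002", "predicate_id": "skos:exactMatch",
     "object_id": "FOODON:00003348"},
]
for triple in direct_triples(mappings):
    print(triple)
```

A reader would not rely on these triples being present; the reified `owl:Axiom` form remains the representation of record.
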
3 changes: 2 additions & 1 deletion src/docs/spec-formats.md
@@ -3,7 +3,8 @@
  The SSSOM standard defines the following serialisation formats for storing and exchanging mapping sets:

  * the [SSSOM/TSV](spec-formats-tsv.md) format;
- * the [SSSOM JSON](spec-formats-json.md) format;
+ * the [SSSOM/JSON](spec-formats-json.md) format;
+ * the [SSSOM/RDF](spec-formats-rdf.md) format;
  * and the [OWL/RDF](spec-formats-owl.md) format.

  Implementations MUST support the SSSOM/TSV format. They MAY support the other formats.