Annotating API to make it queryable
This page describes how you annotate your OpenAPI/Swagger definition to be queryable by the OpenRiskNet SPARQL service.
An important concept of OpenRiskNet is the idea that we should try to make REST or similar HTTP based APIs semantically understandable for machines and humans. To fulfill this purpose, we combine several technologies:
- OpenAPI for documenting the technical API structure (HTTP endpoints, json structure of responses and requests, ...). This can be parsed by various tools, e.g. to generate client libraries automatically for specific APIs, but also to generate interactive documentation for human developers (e.g. you can follow the View in SwaggerUI link on any service in the ORN registry).
- JsonLD in the OpenAPI json document to be able to transform the entire OpenAPI document into the RDF data model and use ontology annotations. This can be done on a high level of operations as well as on a fine-grained level of individual json keys in the request and response descriptions by mapping json keys to ontology terms.
The basic premise of JsonLD is that it allows you to map a tree-like Json data model to a triple-based RDF data model. The mapping starts with a root node as the initial subject, then maps every json key to a predicate to arrive at the next level of nesting, which is either a value or again a node (if the json entity is an object). From there, nested json keys are mapped in a similar fashion. Arrays are resolved simply as repeated triples with the same predicate. Nodes (like the root node) can have a URI if one was assigned using the `@id` syntax (or rather `x-orn-@id` in OpenRiskNet, as described below), but are otherwise assigned anonymous node ids.
Here is a simple plain JsonLD example that should hopefully clarify the concept:
{
"@context": {"@vocab":"http://openrisknet.org/schema#"} ,
"name": "Jane Doe",
"address": {
"street": "Wall street"
}
}
This will be transcribed to the following triples:
subject | predicate | object |
---|---|---|
_:b0 | http://openrisknet.org/schema#address | _:b1 |
_:b0 | http://openrisknet.org/schema#name | Jane Doe |
_:b1 | http://openrisknet.org/schema#street | Wall street |
The @vocab definition in the @context header helps us here because it gives a default mapping of json keys to URIs if no explicit mapping has been given (like a concrete ontology term for a json key - this is described in more detail in the fine grained annotation section below).
There are no node ids given, so anonymous ids of the form `_:b0`, `_:b1` etc. are used. From the root node we map with the predicate "http://openrisknet.org/schema#name" to the value "Jane Doe" and with the predicate "http://openrisknet.org/schema#address" to another node, `_:b1`. From there another triple links with the predicate `http://openrisknet.org/schema#street` to the value "Wall street".
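The walk described above can be sketched in a few lines of Python. This is only a toy illustration that honours `@vocab` and `@id`; it is not a JSON-LD processor, and the function name `to_triples` is made up for this example. A real implementation should use a JSON-LD library.

```python
import itertools


def to_triples(doc):
    """Flatten a JSON-LD-like dict into (subject, predicate, object) triples.

    Toy version: only '@vocab' and '@id' are honoured.
    """
    vocab = doc.get("@context", {}).get("@vocab", "")
    counter = itertools.count()
    triples = []

    def visit(node):
        # Use the explicit @id if present, otherwise mint a blank node id.
        subject = node.get("@id") or f"_:b{next(counter)}"
        for key, value in node.items():
            if key.startswith("@"):
                continue  # keywords such as @context and @id are not predicates
            predicate = vocab + key
            # An array becomes repeated triples with the same predicate.
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, dict):
                    triples.append((subject, predicate, visit(item)))
                else:
                    triples.append((subject, predicate, item))
        return subject

    visit(doc)
    return triples


example = {
    "@context": {"@vocab": "http://openrisknet.org/schema#"},
    "name": "Jane Doe",
    "address": {"street": "Wall street"},
}
for triple in to_triples(example):
    print(triple)
```

The three printed triples correspond to the rows of the table above, although the order in which they are emitted may differ.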
One important requirement is that the combined OpenAPI/JsonLD document is a valid OpenAPI document. Unfortunately, JsonLD requires certain annotations that are not allowed under the OpenAPI spec (e.g. a top level `"@context":{...}` key). To reconcile these two worlds, we prefix such invalid JsonLD keys with `x-orn-`, as most entities in the OpenAPI spec can be extended with arbitrary json if they use the `x-` prefix. The minimum recommended JsonLD context looks like this and should be included in every combined OpenAPI/JsonLD document at the root level:
"x-orn-@context": {
"@vocab": "http://openrisknet.org/schema#",
"x-orn": "http://openrisknet.org/schema#",
"x-orn-@id": "@id",
"x-orn-@type":"@type"
}
The `x-orn-@context` definition has to be renamed to `@context` manually if you want to parse the document as valid JsonLD (e.g. in the JsonLD playground). The registry does this step as part of the automated preprocessing.
Here again we use the @vocab definition to give a default mapping of Json keys to URIs. We then alias `x-orn-@id` to be functionally equivalent to the JsonLD construct `@id`, and likewise `x-orn-@type` to `@type`. Finally, we define a convenience URI shortcut `x-orn` to mean "http://openrisknet.org/schema#" so we can write shorter mappings to the default URIs.
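The renaming step can be sketched as follows. This is a minimal illustration, not the registry's actual preprocessing code (which is not shown on this page); the function name `to_plain_jsonld` is hypothetical. Only the top-level context key is renamed, since the aliases inside the context already take care of `x-orn-@id` and `x-orn-@type`.

```python
import json


def to_plain_jsonld(openapi_doc):
    """Return a copy where the top-level 'x-orn-@context' key is named '@context'."""
    converted = dict(openapi_doc)  # shallow copy; the input dict keeps its keys
    if "x-orn-@context" in converted:
        converted["@context"] = converted.pop("x-orn-@context")
    return converted


openapi_doc = {
    "openapi": "3.0.0",
    "x-orn-@context": {
        "@vocab": "http://openrisknet.org/schema#",
        "x-orn": "http://openrisknet.org/schema#",
        "x-orn-@id": "@id",
        "x-orn-@type": "@type",
    },
    "x-orn-@id": "https://lazar.prod.openrisknet.org",
}
print(json.dumps(to_plain_jsonld(openapi_doc), indent=2))
```

The printed document can then be pasted into a JSON-LD tool such as the JsonLD playground.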
When starting to annotate a service on a semantic level, it can be tricky to find good ontology terms to describe concepts. The Ontology Lookup Service by the EBI is a very useful tool for finding terms via full-text search and for understanding how the various ontologies categorize them. By convention, the members of the OpenRiskNet consortium preferentially use the eNanoMapper ontology, which re-uses many other ontologies to define terms relevant for toxicology.
When the search for a fitting ontology term comes up short or if there is too much ambiguity, a temporary solution is to define an ad-hoc term with the `x-orn:` prefix from the context above, e.g. `x-orn:my-ad-hoc-term` (which will be expanded to a URI of the form "http://openrisknet.org/schema#my-ad-hoc-term"). Such a term can then be queried via SPARQL etc., even if it is not defined in an ontology and thus can't be reasoned about further.
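To make the expansion concrete, here is a rough Python sketch of how a prefixed term is turned into a full URI, using the `x-orn` prefix from the recommended context. The helper `expand_term` is invented for illustration and covers only prefixes and the `@vocab` fallback; the real JSON-LD expansion algorithm has many more rules.

```python
def expand_term(term, context):
    """Expand 'prefix:suffix' or a bare term into a full URI (toy version)."""
    if "://" in term:
        return term  # already an absolute URI
    prefix, sep, suffix = term.partition(":")
    if sep and isinstance(context.get(prefix), str):
        # Known prefix: glue the prefix URI and the suffix together.
        return context[prefix] + suffix
    # No usable prefix: fall back to the default vocabulary.
    return context.get("@vocab", "") + term


context = {
    "@vocab": "http://openrisknet.org/schema#",
    "x-orn": "http://openrisknet.org/schema#",
}
print(expand_term("x-orn:my-ad-hoc-term", context))
```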
The high level annotations are concerned with the basic building blocks of the API: the id of the service and the inputs and outputs of the endpoints (on a coarse level). Here we don't use fine-grained mappings inside the top level `x-orn-@context` section but instead use explicit annotations like `x-orn-@id` to specify ids for nodes like the top level, or `x-orn:returns` to describe the high level output of an endpoint.
Below we give a top level `x-orn-@id` definition so that triples on the top level will have an explicit URI instead of an anonymous node identifier (i.e. the root node id resolves to the given URL instead of a blank node like in the examples above). We also define the RDF type of the top level entity to be "http://openrisknet.org/schema#Service". This is an example of an ad-hoc term being used instead of a URI from an ontology, as a temporary measure until a fitting definition is found.
openapi: 3.0.0
x-orn-@id: 'https://lazar.prod.openrisknet.org'
x-orn-@type: 'x-orn:Service'
This will be translated into the following triple:
subject | predicate | object |
---|---|---|
https://lazar.prod.openrisknet.org/ | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://openrisknet.org/schema#Service |
/molWeight:
  x-orn-@type: "x-orn:Prediction"
  x-orn:expects:
    x-orn-@id: "x-orn:Compound"
  x-orn:returns:
    x-orn-@id: "http://semanticscience.org/resource/CHEMINF_000088" # Enanomapper ontology term for molecular weight
  get:
    summary: Returns the molecular weight (calculated with rdkit)
    parameters:
      - "$ref": "#/parameters/smiles"
Here the JSON-LD type of the /molWeight endpoint is set to the ad-hoc term http://openrisknet.org/schema#Prediction, and the concepts of "expects" and "returns" are defined as the high level semantic inputs and outputs (using an ad-hoc term and an actual ontology term respectively).
For `parameters`, an OpenAPI reference was used. Below is the definition it points to in this example. Note that the semantic annotations (an @type definition that gives the ontology term for SMILES in the CHEMINF ontology) are preserved: even though the semantic annotation is given in a different place in the OpenAPI definition, it will show up in the place where you would expect it logically in the JSON-LD.
parameters:
  smiles:
    x-orn-@type: "http://semanticscience.org/resource/CHEMINF_000018"
    name: Smiles
    in: path
    description: Smiles String
    required: true
    schema:
      type: string
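How an annotation on a referenced definition ends up at the place of use can be illustrated with a small dereferencing sketch. It assumes only local references of the form `#/...` and no circular references; the helper names `resolve_pointer` and `dereference` are hypothetical and this is not the registry's own code.

```python
import copy


def resolve_pointer(document, ref):
    """Follow a local JSON pointer such as '#/parameters/smiles'."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are handled in this sketch")
    target = document
    for part in ref[2:].split("/"):
        # Undo JSON Pointer escaping (RFC 6901): ~1 is '/', ~0 is '~'.
        target = target[part.replace("~1", "/").replace("~0", "~")]
    return target


def dereference(node, document):
    """Return a copy of node with every local '$ref' replaced by its target."""
    if isinstance(node, list):
        return [dereference(item, document) for item in node]
    if isinstance(node, dict):
        if "$ref" in node:
            target = copy.deepcopy(resolve_pointer(document, node["$ref"]))
            return dereference(target, document)
        return {key: dereference(value, document) for key, value in node.items()}
    return node


spec = {
    "paths": {"/molWeight": {"get": {"parameters": [{"$ref": "#/parameters/smiles"}]}}},
    "parameters": {
        "smiles": {
            "x-orn-@type": "http://semanticscience.org/resource/CHEMINF_000018",
            "name": "Smiles",
            "in": "path",
        }
    },
}
resolved = dereference(spec, spec)
# The annotation now sits directly inside the endpoint's parameter list.
print(resolved["paths"]["/molWeight"]["get"]["parameters"][0]["x-orn-@type"])
```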
The above high level annotation is a useful overview, but services can also give semantic annotations on a finer level, namely that of the individual json object keys. Using the JSON-LD context, every json key can be mapped to an ontology term that will then be used as a predicate in the RDF data model. This allows developers of APIs to annotate complex json request and response types on the level of individual keys, and thus at a much finer granularity than the x-orn:expects and x-orn:returns structures above.
Here `inchi` is a json key that is mapped to an ontology URI in the JSON-LD context, so it will automatically be converted to that URI wherever it is used as a predicate.
{"x-orn-@context": {
"@vocab": "http://openrisknet.org/schema#",
"x-orn": "http://openrisknet.org/schema#",
"x-orn-@id": "@id",
"x-orn-@type":"@type",
"smiles": "http://semanticscience.org/resource/CHEMINF_000018",
"inchi": "http://semanticscience.org/resource/CHEMINF_000113",
"inchikey": "http://semanticscience.org/resource/CHEMINF_000059",
"cas": "http://semanticscience.org/resource/CHEMINF_000446"
},
...
"properties": {
"inchi": {
"description": "Compound structure notated using InChI notation",
"type": "string"
}
}
}
subject | predicate | object |
---|---|---|
_:b98 | http://semanticscience.org/resource/CHEMINF_000113 | _:b99 |
_:b99 | http://openrisknet.org/schema#description | Compound structure notated using InChI notation^^http://www.w3.org/2001/XMLSchema#string |
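The predicate column of the table follows a simple lookup rule: a key with an explicit entry in the context gets that URI, and any other key falls back to `@vocab`. A minimal Python sketch of that rule is shown here; the function name `predicate_for` is made up, and it handles only plain string mappings, not the expanded term definitions that JSON-LD also allows.

```python
def predicate_for(key, context):
    """Return the predicate URI for a json key (toy version of the lookup)."""
    mapping = context.get(key)
    if isinstance(mapping, str):
        return mapping  # explicit ontology term given in the context
    return context.get("@vocab", "") + key  # default vocabulary fallback


context = {
    "@vocab": "http://openrisknet.org/schema#",
    "inchi": "http://semanticscience.org/resource/CHEMINF_000113",
}
print(predicate_for("inchi", context))
print(predicate_for("description", context))
```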
A good source of examples is the set of existing services running on the OpenRiskNet reference instance, which you can browse in the OpenRiskNet registry at http://orn-registry-openrisknet-registry.prod.openrisknet.org/. There, you can see active services and look at their OpenRiskNet annotated OpenAPI definitions (both the raw definition provided by the services and the "dereferenced" one after the preprocessing has been done to turn it into a valid JSON-LD document); you can look at SwaggerUI renderings of the OpenAPI definitions as a more human friendly view onto the APIs; and you can run SPARQL queries to experiment with the kind of queries that can be done using the semantic annotations that were added to the services. At the time of writing, OpenRiskNet is still an ongoing project and different services still experiment with slightly different approaches to semantic annotation, so you will see some services doing things slightly differently from others. For annotating your own services, the best guide is to think about what queries could be interesting for potential users to run and to plan your annotations along these lines.