Annotating API to make it queryable
This page describes how you annotate your OpenAPI/Swagger definition to be queryable by the OpenRiskNet SPARQL service.
An important concept of OpenRiskNet is the idea that we should try to make REST or similar HTTP based APIs semantically understandable for machines and humans. To fulfill this purpose, we combine several technologies:
- OpenAPI for documenting the technical API structure (HTTP endpoints, json structure of responses and requests, ...). This can be parsed by various tools, e.g. to generate client libraries automatically for specific APIs, but also to generate interactive documentation for human developers (e.g. you can follow the "View in SwaggerUI" link on any service in the ORN registry).
- JsonLD in the OpenAPI json document to be able to transform the entire OpenAPI document into the RDF data model and use ontology annotations. This can be done on a high level of operations as well as on a fine-grained level of individual json keys in the request and response descriptions by mapping json keys to ontology terms.
The basic premise of JsonLD is that it allows you to map a tree-like json data model to a triple-based RDF data model. The mapping starts with a root node as the initial subject and maps every json key to a predicate; the object of each resulting triple is either a value or, if the json entity is an object, another node. Nested json keys are then mapped in the same fashion from that node. Arrays are resolved simply as repeated triples with the same predicate. Nodes (like the root node) can have a URI if one was assigned using the @id syntax (or rather x-orn-@id in OpenRiskNet, as described below), but are otherwise assigned anonymous node ids.
Here is a simple plain JsonLD example that should hopefully clarify the concept:
{
  "@context": {"@vocab": "http://openrisknet.org/schema#"},
  "name": "Jane Doe",
  "address": {
    "street": "Wall street"
  }
}
This will be transcribed to the following triples:
subject | predicate | object |
---|---|---|
_:b0 | http://openrisknet.org/schema#address | _:b1 |
_:b0 | http://openrisknet.org/schema#name | Jane Doe |
_:b1 | http://openrisknet.org/schema#street | Wall street |
The @vocab definition in the @context header helps us here because it gives a default mapping of json keys to URIs if no explicit mapping has been given (like a concrete ontology term for a json key - this is described in more detail in the fine grained annotation section below).
There are no node ids given, so anonymous ids of the form _:b0, _:b1 etc. are used. From the root node _:b0 we map with the predicate "http://openrisknet.org/schema#name" to the value "Jane Doe" and with the predicate "http://openrisknet.org/schema#address" to another node, _:b1. From there, another triple links with the predicate "http://openrisknet.org/schema#street" to the value "Wall street".
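The mapping described above can be sketched in a few lines of Python. This is only a toy illustration, not a JsonLD processor: the function name to_triples is made up for this example, and it handles only @vocab, @id, nested objects and arrays.

```python
from itertools import count


def to_triples(doc):
    """Flatten a json tree into (subject, predicate, object) triples.

    Toy illustration only: it supports @vocab, @id, nested objects and
    arrays, and ignores everything else a real JsonLD processor does.
    """
    vocab = doc.get("@context", {}).get("@vocab", "")
    blank_ids = count()
    triples = []

    def visit(node):
        # Use the explicit @id if present, otherwise mint a blank node id.
        subject = node.get("@id") or f"_:b{next(blank_ids)}"
        for key, value in node.items():
            if key.startswith("@"):
                continue  # keywords such as @context and @id are not predicates
            predicate = vocab + key
            # An array is the same predicate repeated once per item.
            for item in value if isinstance(value, list) else [value]:
                obj = visit(item) if isinstance(item, dict) else item
                triples.append((subject, predicate, obj))
        return subject

    visit(doc)
    return triples


example = {
    "@context": {"@vocab": "http://openrisknet.org/schema#"},
    "name": "Jane Doe",
    "address": {"street": "Wall street"},
}
for triple in to_triples(example):
    print(triple)
```

The three printed triples correspond to the rows of the table above, although the order in which they are emitted may differ from a real JsonLD tool.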
One important requirement is that the combined OpenAPI/JsonLD document is a valid OpenAPI document. Unfortunately, JsonLD requires certain annotations that are not allowed under the OpenAPI spec (e.g. a top level "@context":{...} key). To reconcile these two worlds, we prefix such invalid JsonLD keys with x-orn-, as most entities in the OpenAPI spec can be extended with arbitrary json if the key uses the x- prefix. The minimum recommended JsonLD context looks like this and should be included at the root level of every combined OpenAPI/JsonLD document:
"x-orn-@context": {
  "@vocab": "http://openrisknet.org/schema#",
  "x-orn": "http://openrisknet.org/schema#",
  "x-orn-@id": "@id",
  "x-orn-@type": "@type"
}
The x-orn-@context definition has to be renamed to @context manually if you want to parse the document as valid JsonLD (e.g. in the JsonLD playground). The registry does this step as part of its automated preprocessing.
Here again we use the @vocab definition to give a default mapping of json keys to URIs. We then alias x-orn-@id to be functionally equivalent to the JsonLD construct @id, and likewise x-orn-@type to @type. Finally, we define a convenience URI shortcut x-orn to mean "http://openrisknet.org/schema#" so we can write shorter mappings to the default URIs.
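The manual renaming of the context key mentioned above could be done with a small helper like the following. This is a minimal sketch: the function name to_plain_jsonld is hypothetical, and the registry's actual preprocessing is not shown here and may do more than this.

```python
import json


def to_plain_jsonld(openapi_doc):
    """Return a shallow copy with the root 'x-orn-@context' renamed to '@context'.

    Only the context key is renamed. The other x-orn- keys (x-orn-@id,
    x-orn-@type) are left alone because the context aliases them to the
    corresponding JsonLD keywords.
    """
    result = dict(openapi_doc)
    if "x-orn-@context" in result:
        result["@context"] = result.pop("x-orn-@context")
    return result


doc = {
    "openapi": "3.0.0",
    "x-orn-@context": {
        "@vocab": "http://openrisknet.org/schema#",
        "x-orn": "http://openrisknet.org/schema#",
        "x-orn-@id": "@id",
        "x-orn-@type": "@type",
    },
    "x-orn-@id": "https://lazar.prod.openrisknet.org",
    "x-orn-@type": "x-orn:Service",
}
print(json.dumps(to_plain_jsonld(doc), indent=2))
```

The printed document can then be pasted into a JsonLD tool for experimentation, while the original dictionary is left unchanged.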
When starting to annotate a service on a semantic level, it can be tricky to find good ontology terms to describe concepts. The Ontology Lookup Service by the EBI is a very useful tool for finding terms via fulltext search and to understand the categorization of various ontologies further. By convention, the members of the OpenRiskNet consortium preferentially use the Enanomapper ontology that re-uses many other ontologies to define terms relevant for Toxicology.
When the search for a fitting ontology term comes up short or if there is too much ambiguity, a temporary solution is to define an ad-hoc term with the x-orn: prefix, e.g. x-orn:my-ad-hoc-term (which will be expanded to a URI of the form "http://openrisknet.org/schema#my-ad-hoc-term"). Such a term can then be queried via SPARQL etc., even if it is not defined in an ontology and thus can't be reasoned about further.
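To make the expansion of an ad-hoc term concrete, here is a rough Python sketch of how a prefixed term is turned into a full URI, assuming the x-orn prefix from the recommended context above. The helper name expand_term is made up for this illustration; a real JsonLD processor applies more rules than this.

```python
def expand_term(term, context):
    """Expand a compact term such as 'x-orn:Service' using prefixes from the context."""
    prefix, sep, suffix = term.partition(":")
    namespace = context.get(prefix)
    # Only expand when the part before the colon is a prefix defined in the
    # context; full URIs ("http://...") are returned unchanged.
    if sep and isinstance(namespace, str) and not suffix.startswith("//"):
        return namespace + suffix
    return term


context = {
    "@vocab": "http://openrisknet.org/schema#",
    "x-orn": "http://openrisknet.org/schema#",
}
print(expand_term("x-orn:my-ad-hoc-term", context))
# prints: http://openrisknet.org/schema#my-ad-hoc-term
```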
The OpenRiskNet Consortium has developed two strategies for semantically annotating OpenAPI service descriptions. One called the "high level annotation" is intended to be used to give a single ontology term for the input and output of every HTTP endpoint. The "fine grained annotation" on the other hand can be used to attach semantic annotations to individual json keys in the request or response body. Either or both approaches can be used to describe any given service and both are explained in more detail below. An example of a simple service using both approaches is the ChemIDConvert service.
The high level annotations are concerned with the basic building blocks of the API: the id of the service and the inputs and outputs of the endpoints (on a coarse level). Basically we want to give a single ontology term for the input and output of every HTTP endpoint. For this we don't use fine-grained mappings inside the top level x-orn-@context section but instead use explicit annotations like x-orn-@id to specify ids for nodes such as the top level, or x-orn:returns to describe the high level output of an endpoint.
Below we give a top level x-orn-@id definition so that triples on the top level will have an explicit URI instead of an anonymous node identifier (i.e. the root node id resolves to the given URL instead of a blank node as in the examples above). We also define the RDF type of the top level entity to be "http://openrisknet.org/schema#Service". This is an example of an ad-hoc term being used instead of a URI from an ontology as a temporary measure until a fitting definition is found.
openapi: 3.0.0
x-orn-@id: 'https://lazar.prod.openrisknet.org'
x-orn-@type: 'x-orn:Service'
This will be translated into:
subject | predicate | object |
---|---|---|
https://lazar.prod.openrisknet.org/ | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://openrisknet.org/schema#Service |
Now to annotate an actual HTTP endpoint (annotations like this can be usefully added either at the endpoint level or one level down at the request verb):
/molWeight:
  x-orn-@type: "x-orn:Prediction"
  x-orn:expects:
    x-orn-@id: "x-orn:Compound"
  x-orn:returns:
    x-orn-@id: "http://semanticscience.org/resource/CHEMINF_000088" # Enanomapper ontology term for molecular weight
  get:
    summary: Returns the molecular weight (calculated with rdkit)
Here the JsonLD type of the /molWeight endpoint is set to the ad-hoc term http://openrisknet.org/schema#Prediction, and "expects" and "returns" are defined as the high level semantic input and output (using an ad-hoc term and an actual ontology term respectively, for demonstration).
parameters:
  smiles:
    name: Smiles
    in: path
    description: Smiles String
    required: true
    schema:
      type: string
The above high level annotation gives a useful overview, but it can be very useful for services to give semantic annotations on a finer level, namely that of the individual json object keys. Using the JsonLD context, every json key can be mapped to an ontology term that will then be used as a predicate in the RDF data model. This allows developers of APIs to annotate complex json request and response types on the level of individual keys and thus at a much finer granularity than the x-orn:expects and x-orn:returns structures above.
Here inchi is a json key that is mapped to an ontology URI in the JsonLD context, so it will automatically be converted to that URI when the document is translated to RDF:
{
  "x-orn-@context": {
    "@vocab": "http://openrisknet.org/schema#",
    "x-orn": "http://openrisknet.org/schema#",
    "x-orn-@id": "@id",
    "x-orn-@type": "@type",
    "smiles": "http://semanticscience.org/resource/CHEMINF_000018",
    "inchi": "http://semanticscience.org/resource/CHEMINF_000113",
    "inchikey": "http://semanticscience.org/resource/CHEMINF_000059",
    "cas": "http://semanticscience.org/resource/CHEMINF_000446"
  },
  ...
  "properties": {
    "inchi": {
      "description": "Compound structure notated using InChI notation",
      "type": "string"
    }
  }
}
This yields triples such as:
subject | predicate | object |
---|---|---|
_:b98 | http://semanticscience.org/resource/CHEMINF_000113 | _:b99 |
_:b99 | http://openrisknet.org/schema#description | Compound structure notated using InChI notation^^http://www.w3.org/2001/XMLSchema#string |
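The way a json key is turned into a predicate in the triples above can be approximated as follows. This is a simplified sketch under the assumption that the context contains only plain string mappings as in the example; the function name predicate_for is hypothetical and not part of any JsonLD library.

```python
def predicate_for(key, context):
    """Return the predicate URI a json key maps to, or None for keyword aliases."""
    mapped = context.get(key)
    if isinstance(mapped, str):
        if mapped.startswith("@"):
            return None  # alias for a JsonLD keyword such as @id, not a predicate
        return mapped  # explicit mapping to an ontology term
    return context.get("@vocab", "") + key  # default mapping via @vocab


context = {
    "@vocab": "http://openrisknet.org/schema#",
    "x-orn": "http://openrisknet.org/schema#",
    "x-orn-@id": "@id",
    "x-orn-@type": "@type",
    "smiles": "http://semanticscience.org/resource/CHEMINF_000018",
    "inchi": "http://semanticscience.org/resource/CHEMINF_000113",
}
for key in ("inchi", "description", "x-orn-@id"):
    print(key, "->", predicate_for(key, context))
```

The key inchi resolves to its explicit ontology term, while description has no explicit mapping and falls back to the @vocab default, matching the two rows of the table.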
This section describes a few details and "gotchas" that you should be aware of when annotating services.
The annotation approach described under Fine Grained Annotation works well for json formatted request bodies and responses (since here the json keys can easily be mapped in the x-orn-@context section), but it falls short when it comes to describing request parameters passed in the query string. These are of course commonly used to pass parameters for HTTP requests. While often very technical in nature and thus not very interesting targets for semantic annotation (e.g. paging parameters for searches like limit or offset), there are cases where e.g. a chemical identifier is passed in this way. However, because the parameter section of an OpenAPI document does not contain the parameter name as a json key but instead as a string value, the fine grained annotation approach does not work. To clarify, look at this parameter description:
name: smiles
in: path
description: Smiles String
required: true
schema:
  type: string
"smiles" only occurs as a value and thus will not be mapped to the ontology term http://semanticscience.org/resource/CHEMINF_000018 when using the above x-orn-@context.
To enable semantic annotations for request parameters as well, you can instead assign a named id, like this:
name: smiles
x-orn-@id: http://semanticscience.org/resource/CHEMINF_000018
in: path
description: Smiles String
required: true
schema:
  type: string
This way, SPARQL queries can be constructed that will pick up nodes with this id as well as predicates that are generated using the usual x-orn-@context mapping.
When authoring OpenAPI documents it is common practice to use references to define elements like schema definitions, parameters or entire responses once in the components section and then re-use these elements like this:
parameters:
- "$ref": "#/components/parameters/smiles"
Such a reference is a problem for a naive Json-LD interpretation because the referencing exists only on the logical level of the OpenAPI document model. This means that if the smiles parameter in the above example were annotated with the x-orn-@id as described in the previous section it would only be understood like this for the components section, not for each of the endpoints referencing it.
To enable such use, the OpenRiskNet registry will dereference references where possible by copying the contents to the place of use. However, there is one important caveat: OpenAPI schemas can be recursive (reference themselves) or mutually recursive. An example of the former is a tree structure response where the tree schema is described in the components section and its children property is again defined by referencing the same tree schema (using a $ref like above). Such recursive types are not expanded; instead the original $ref entries are left in place. This means that for endpoints etc. using such definitions, the RDF triple representation will not contain these schema definitions at the place of use, and their annotations might therefore not show up in queries as expected. Queries that want to discover semantic annotations in recursive schema types have to be implemented by parsing the OpenAPI structure instead of using the SPARQL query language.
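The dereferencing behaviour described above can be illustrated with the following sketch. It is not the registry's implementation, which is not shown on this page; the function names and the example document are made up. The sketch copies the target of each local $ref to the place of use and stops as soon as a reference points back to something that is already being expanded, leaving that $ref in place.

```python
def resolve_pointer(root, ref):
    """Look up a local reference such as '#/components/schemas/Tree'.

    Simplification: the JSON Pointer escapes (~0, ~1) are not handled.
    """
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def dereference(root):
    """Return a copy of root with local $refs inlined, except recursive ones."""

    def walk(node, active):
        if isinstance(node, list):
            return [walk(item, active) for item in node]
        if not isinstance(node, dict):
            return node
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            if ref in active:
                return dict(node)  # recursive reference: keep the $ref as-is
            return walk(resolve_pointer(root, ref), active + (ref,))
        return {key: walk(value, active) for key, value in node.items()}

    return walk(root, ())


TREE_REF = "#/components/schemas/Tree"
spec = {
    "paths": {
        "/molWeight": {
            "get": {"parameters": [{"$ref": "#/components/parameters/smiles"}]}
        }
    },
    "components": {
        "parameters": {
            "smiles": {
                "name": "smiles",
                "x-orn-@id": "http://semanticscience.org/resource/CHEMINF_000018",
                "in": "path",
            }
        },
        "schemas": {
            "Tree": {
                "type": "object",
                "properties": {
                    "children": {"type": "array", "items": {"$ref": TREE_REF}}
                },
            }
        },
    },
}

flat = dereference(spec)
# The parameter annotation is now present at the place of use.
print(flat["paths"]["/molWeight"]["get"]["parameters"][0]["x-orn-@id"])
```

In this sketch the annotated smiles parameter is copied into the endpoint, whereas the self-referencing Tree schema still contains a $ref after dereferencing, which is why its annotations would be missing from the triples at that point.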
A good source of examples is the set of existing services running in the OpenRiskNet reference instance, which you can find by visiting the OpenRiskNet registry at http://orn-registry-openrisknet-registry.prod.openrisknet.org/. There, you can see active services and look at their OpenRiskNet annotated OpenAPI definitions (both the raw definition provided by the services and the "dereferenced" one after the preprocessing has been done to turn it into a valid JsonLD document); you can look at SwaggerUI renderings of the OpenAPI definitions as a more human friendly view onto the APIs; and you can run SPARQL queries to experiment with the kind of queries that can be done using the semantic annotations that were added to the services. At the time of writing, OpenRiskNet is still an ongoing project and different services still experiment with slightly different approaches to semantic annotation, so you will see some services doing things slightly differently from others. For annotating your own services, the best guide is to think about what queries could be interesting to run for potential users and plan your annotations along these lines.