Modelling IC50 results
At the OpenRiskNet Hackathon in Brussels on 13-14 Dec 2018 we undertook an exercise on how to model IC50 assay results with JSON Schema (with respect to OpenAPI definitions) and how to annotate them semantically using JSON-LD. OpenAPI (along with JSON Schema) provides the structured definition of the data, whilst JSON-LD adds semantic meaning to this payload using ontologies.
For:
- IC50 (in ENM, from BAO)
- Hill model fitting (Hill equation in BAO; added)
- tcpl (too specific, won't do)
- SubclassOf (not equivalent to, since we are not including the information about TCPL) IC50 AND 'has curve fit specification' some 'Hill equation'
- SubclassOf http://www.bioassayontology.org/bao#BAO_0000190 AND ( http://www.bioassayontology.org/bao#BAO_0000335 SOME http://www.bioassayontology.org/bao#BAO_0000435 )
There is now a pull request.
The idea was to define some key components of an OpenAPI data model for describing IC50 results that can be used by applications.
It is not mandatory to use the whole model. For instance, assays are highly variable in nature, so it might be more appropriate to create your own custom assay model that uses the model for an IC50 result. Alternatively, the quantity and quantityRange schema objects can be re-used in many other cases.
The JSON Schema we defined is this (note: this is a draft and will be updated soon):
components:
  schemas:
    quantity:
      required:
        - value
        - unit
      type: object
      description: A value with units
      properties:
        value:
          type: number
          format: float
        unit:
          type: string
        modifier:
          type: string
          enum:
            - '>'
            - '<'
            - '~'
    quantityRange:
      type: object
      properties:
        lowerValue:
          $ref: '#/components/schemas/quantity'
        upperValue:
          $ref: '#/components/schemas/quantity'
    ic50:
      type: object
      required:
        - ic50
      properties:
        ic50:
          $ref: '#/components/schemas/quantity'
        slope:
          type: number
          format: float
          description: The calculated Hill constant
        numReplicates:
          type: integer
          format: int32
          description: The number of replicates that generate the value.
        rmse:
          type: number
          format: float
          description: Optional RMSE value for the replicates. The units are implicitly the same as those of the value
        standardDeviation:
          type: number
          format: float
          description: Optional standard deviation value for the replicates. The units are implicitly the same as those of the value
    assay: # very limited description of an assay, primarily as a holder for a set of ic50 values
      type: object
      properties:
        protocol:
          type: string
          description: The description of the assay
        assayRange:
          description: The min and max concentrations for the assay
          allOf:
            - $ref: '#/components/schemas/quantityRange'
        results:
          type: array
          items:
            $ref: '#/components/schemas/ic50'
With this OpenAPI definition your JSON payloads might look like this:
For a simple quantity:
{"value": 1.23, "unit": "mg/ml"}
For a quantity with an optional modifier:
{"value": 1.23, "unit": "mg/ml", "modifier": ">"}
For an array of quantities:
[{"value": 1.23, "unit": "mg/ml"}, {"value": 1.23, "unit": "mg/ml", "modifier": ">"}]
For a quantity range:
{"lowerValue": {"value": 1.23, "unit": "mg/ml"}, "upperValue": {"value": 3.45, "unit": "mg/ml"}}
For a minimal ic50 with only a value:
{"ic50": {"value": 1.23, "unit": "mg/ml"}}
For a more complete ic50:
{
  "ic50": {"value": 1.23, "unit": "mg/ml"},
  "slope": 0.97,
  "numReplicates": 3,
  "standardDeviation": 0.57
}
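To make the shape of these payloads concrete, here is a minimal Python sketch that mirrors the quantity and ic50 schema objects as dataclasses and performs the same basic checks (required fields, allowed modifiers). The class and function names are illustrative only and are not part of the OpenAPI definition. It also assumes that the ic50 quantity is the one required field of an ic50 result, as the minimal example suggests.

```python
from dataclasses import dataclass
from typing import Optional

# The three modifiers listed in the enum of the quantity schema.
ALLOWED_MODIFIERS = {">", "<", "~"}


@dataclass
class Quantity:
    """Illustrative counterpart of the 'quantity' schema object."""
    value: float
    unit: str
    modifier: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict) -> "Quantity":
        # 'value' and 'unit' are required by the schema.
        missing = [key for key in ("value", "unit") if key not in data]
        if missing:
            raise ValueError(f"quantity is missing required field(s): {missing}")
        modifier = data.get("modifier")
        if modifier is not None and modifier not in ALLOWED_MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier!r}")
        return cls(float(data["value"]), str(data["unit"]), modifier)


@dataclass
class IC50Result:
    """Illustrative counterpart of the 'ic50' schema object."""
    ic50: Quantity
    slope: Optional[float] = None
    num_replicates: Optional[int] = None
    rmse: Optional[float] = None
    standard_deviation: Optional[float] = None

    @classmethod
    def from_dict(cls, data: dict) -> "IC50Result":
        # Assumption: the nested ic50 quantity is the only required field.
        if "ic50" not in data:
            raise ValueError("ic50 result is missing the 'ic50' quantity")
        return cls(
            ic50=Quantity.from_dict(data["ic50"]),
            slope=data.get("slope"),
            num_replicates=data.get("numReplicates"),
            rmse=data.get("rmse"),
            standard_deviation=data.get("standardDeviation"),
        )


payload = {
    "ic50": {"value": 1.23, "unit": "mg/ml"},
    "slope": 0.97,
    "numReplicates": 3,
    "standardDeviation": 0.57,
}
result = IC50Result.from_dict(payload)
print(result.ic50.value, result.ic50.unit, result.num_replicates)
```

In practice a service would more likely validate payloads against the OpenAPI definition itself with a schema validator; the hand-written checks here only illustrate what the schema requires.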
Defining types such as quantity and ic50 that allow information to be structured is inherently better than having it in non-structured form, as it permits inference of meaning and reduces the chance of misusing data. However, most data is only available in unstructured or semi-structured form. A common example might be IC50 data in a CSV file where the value, units and modifier are present in separate columns, with the association between the columns being inferred from a naming convention, or left to the user to assume.
We want to allow this sort of information to be annotated in such a way that the structure of the data can be inferred automatically. With this it then becomes possible to convert the data from the unstructured form to the structured form.
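As a rough illustration of that conversion, the Python sketch below turns CSV rows into ic50 payloads of the structured form. The CSV content and the column naming convention (ic50_value, ic50_unit, ic50_modifier) are made up for this example. The convention is hard-coded in the code, which is exactly the kind of implicit knowledge an annotation would need to make explicit.

```python
import csv
import io

# Hypothetical CSV content; the column names are an assumed naming convention.
CSV_TEXT = """compound,ic50_modifier,ic50_value,ic50_unit
cpd-1,,1.23,mg/ml
cpd-2,>,10.0,mg/ml
"""


def row_to_ic50(row: dict, prefix: str = "ic50") -> dict:
    """Build an ic50 payload from the columns that share the given prefix."""
    quantity = {
        "value": float(row[f"{prefix}_value"]),
        "unit": row[f"{prefix}_unit"],
    }
    # The modifier is optional, so empty or missing cells are skipped.
    modifier = (row.get(f"{prefix}_modifier") or "").strip()
    if modifier:
        quantity["modifier"] = modifier
    return {"ic50": quantity}


def csv_to_ic50_results(text: str) -> list:
    """Convert every CSV row into a structured ic50 payload."""
    reader = csv.DictReader(io.StringIO(text))
    return [row_to_ic50(row) for row in reader]


results = csv_to_ic50_results(CSV_TEXT)
for item in results:
    print(item)
```

With a semantic annotation of the columns, the prefix-based lookup could in principle be replaced by a lookup driven by the annotation rather than by column names.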
As an example, we have some IC50 data in CSV format. The file is here (link needed), and the first few lines are reproduced below:
bla,bla,bla
The challenge: to provide a semantic annotation that allows the unstructured CSV data to be converted to the form defined by the OpenAPI definition described above.
- Review the IC50 model and see how it works for some real data sets.
- Mock up some use cases
- Performing curve fit on assay data to generate IC50s
- Sending IC50 data to application e.g. for generating a predictive model
- Investigate how to annotate this with JSON-LD descriptions. Will too much end up being duplicated in the JSON Schema and the ontology definitions?
- Investigate how this impacts existing APIs such as OpenTox.
- Write up the benefits for service providers
- show how we are using the annotation
- show how it makes their work more FAIR
- Develop (SPARQL) queries that show the adoption of the annotation and summarize how it is used
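As a possible starting point for the JSON-LD investigation listed above, the following Python sketch attaches a JSON-LD @context to an ic50 payload. Only the BAO_0000190 IRI is taken from the class expression earlier on this page; the choice to map the ic50 property directly to it, and the helper name annotate, are assumptions made for illustration. How value, unit and the other fields should be annotated is left open here.

```python
import json

BAO = "http://www.bioassayontology.org/bao#"

# Assumed mapping: the 'ic50' property is tied to the BAO IC50 term.
# No other field is mapped, since those annotations are still undecided.
context = {
    "bao": BAO,
    "ic50": {"@id": "bao:BAO_0000190"},
}


def annotate(payload: dict) -> dict:
    """Return a copy of the payload with a JSON-LD @context attached."""
    return {"@context": context, **payload}


doc = annotate({"ic50": {"value": 1.23, "unit": "mg/ml"}})
print(json.dumps(doc, indent=2))
```

Note that a JSON-LD processor drops properties that have no mapping in the context when it expands a document, so value and unit would need their own term definitions before this annotation is useful.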