
Modelling IC50 results

Tim Dudgeon edited this page Dec 18, 2018 · 11 revisions

At the OpenRiskNet Hackathon in Brussels on 13-14 Dec 2018 we undertook an exercise on how to model an IC50 assay with JSON Schema (with respect to OpenAPI definitions) and how to annotate it semantically using JSON-LD. OpenAPI (along with JSON Schema) provides the structured definition of the data, whilst JSON-LD adds semantic meaning to this payload using ontologies.
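
As a rough illustration of that division of labour, the sketch below attaches a JSON-LD @context to a plain JSON payload. The payload keys are what the JSON Schema would constrain; the context maps each key to an IRI. The http://example.org/ IRIs and the helper names annotate and strip_context are placeholders invented for this sketch, not real ontology terms or an agreed API.

```python
import json

# Plain payload: its shape is what JSON Schema / OpenAPI would describe.
payload = {"value": 1.23, "unit": "mg/ml"}

# Hypothetical JSON-LD context mapping each key to an IRI.
# The example.org IRIs are placeholders, not real ontology terms.
context = {
    "value": "http://example.org/terms/value",
    "unit": "http://example.org/terms/unit",
}


def annotate(data, ctx):
    """Return a copy of data with a JSON-LD @context added."""
    annotated = {"@context": ctx}
    annotated.update(data)
    return annotated


def strip_context(data):
    """Return the plain payload again, without the @context."""
    return {k: v for k, v in data.items() if k != "@context"}


annotated = annotate(payload, context)
print(json.dumps(annotated, indent=2))
```

A consumer that does not understand JSON-LD can ignore the @context key and still read the payload, which is why the two layers can be developed separately.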

Ontology IRIs needed

For:

  • IC50 (in ENM, from BAO)
  • Hill model fitting (Hill equation in BAO; added)
  • tcpl (too specific, won't do)

Combined class

Class Axiom

There is now a pull request.

Further reading

Modelling

The idea was to define some key components of an OpenAPI data model for describing IC50 results that can be used by applications. It is not mandatory to use the whole model. For instance, assays are highly variable in nature, so it might be more appropriate to create your own custom assay model that re-uses the model for an IC50 result. Alternatively, the quantity and quantityRange schema objects can be re-used in many other cases.

The JSON Schema we defined is this (note: this is a draft that will be updated):

components:
  schemas:
    quantity:
      required:
      - value
      - unit
      type: object
      description: A value with units
      properties:
        value:
          type: number
          format: float
        unit:
          type: string
        modifier:
          type: string
          enum:
          - '>'
          - '<'
          - '~'

    quantityRange:
      type: object
      properties:
        lowerValue:
          # $ref values are quoted because an unquoted '#' starts a YAML comment
          $ref: '#/components/schemas/quantity'
        upperValue:
          $ref: '#/components/schemas/quantity'

    ic50:
      type: object
      required:
      - ic50  # the ic50 quantity is the only mandatory property
      properties:
        ic50:
          $ref: '#/components/schemas/quantity'
        slope:
          type: number
          format: float
          description: The calculated Hill constant
        numReplicates:
          type: integer
          format: int32
          description: The number of replicates that generate the value.
        rmse:
          type: number
          format: float
          description: Optional RMSE value for the replicates. The units are implicitly the same as those of the value.
        standardDeviation:
          type: number
          format: float
          description: Optional standard deviation value for the replicates. The units are implicitly the same as those of the value.

    assay: # very limited description of an assay, primarily as a holder for a set of ic50 values
      type: object
      properties:
        protocol:
          type: string
          description: The description of the assay
        assayRange:
          description: The min and max concentrations for the assay
          # allOf is used because OpenAPI 3.0 ignores siblings of a $ref
          allOf:
          - $ref: '#/components/schemas/quantityRange'
        results:
          type: array
          items:
            $ref: '#/components/schemas/ic50'

With this OpenAPI definition your JSON payloads might look like this:

For a simple quantity:

{"value": 1.23, "unit": "mg/ml"}

For a quantity with an optional modifier:

{"value": 1.23, "unit": "mg/ml", "modifier": ">"}

For an array of quantities:

[{"value": 1.23, "unit": "mg/ml"},{"value": 1.23, "unit": "mg/ml", "modifier": ">"}]

For a quantity range:

{ "lowerValue": {"value": 1.23, "unit": "mg/ml"}, "upperValue": {"value": 3.45, "unit": "mg/ml"}}

For a minimal ic50 with only a value:

{"ic50": {"value": 1.23, "unit": "mg/ml"}}

For a more complete ic50:

{
  "ic50": {"value": 1.23, "unit": "mg/ml"},
  "slope": 0.97,
  "numReplicates": 3,
  "standardDeviation": 0.57
}
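
Payloads like the ones above can be sanity-checked against the constraints in the schema. The following is a minimal hand-written sketch using only the Python standard library; it is not a full JSON Schema validator. The function names check_quantity and check_ic50 are invented here, and the sketch assumes that the one required property of an ic50 object is the nested ic50 quantity, as in the minimal example above.

```python
# Allowed modifier values, taken from the enum in the quantity schema.
MODIFIERS = (">", "<", "~")


def _is_number(x):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def check_quantity(obj):
    """Return a list of problems found in a quantity payload (empty if none)."""
    if not isinstance(obj, dict):
        return ["quantity must be an object"]
    problems = []
    if not _is_number(obj.get("value")):
        problems.append("value is required and must be a number")
    if not isinstance(obj.get("unit"), str):
        problems.append("unit is required and must be a string")
    if "modifier" in obj and obj["modifier"] not in MODIFIERS:
        problems.append("modifier must be one of '>', '<', '~'")
    return problems


def check_ic50(obj):
    """Return a list of problems found in an ic50 payload (empty if none)."""
    if not isinstance(obj, dict):
        return ["ic50 result must be an object"]
    if "ic50" not in obj:
        # Assumption: the nested ic50 quantity is the required property.
        return ["ic50 is required"]
    problems = ["ic50." + p for p in check_quantity(obj["ic50"])]
    for key in ("slope", "rmse", "standardDeviation"):
        if key in obj and not _is_number(obj[key]):
            problems.append(key + " must be a number")
    n = obj.get("numReplicates")
    if "numReplicates" in obj and (not isinstance(n, int) or isinstance(n, bool)):
        problems.append("numReplicates must be an integer")
    return problems


complete = {
    "ic50": {"value": 1.23, "unit": "mg/ml"},
    "slope": 0.97,
    "numReplicates": 3,
    "standardDeviation": 0.57,
}
print(check_ic50(complete))
```

In practice a generic JSON Schema validation library would replace these functions; the point here is only to make the constraints concrete.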

Use cases

Mapping between schemas

Defining types such as quantity and ic50 so that information is structured is inherently better than leaving it unstructured, because it permits inference of meaning and reduces the chance of misusing data. However, most data is only available in unstructured or semi-structured form. A common example is IC50 data in a CSV file, where the value, units and modifier are in separate columns and the association between the columns is either implied by a naming convention or left to the user to assume.

We want this sort of information to be annotated so that the structure of the data can be inferred automatically. It then becomes possible to convert the data from unstructured to structured form.

As an example we have some IC50 data in CSV format. The file is here (link needed), and the first few lines are reproduced below:

bla,bla,bla

The challenge: to provide a semantic annotation that allows the unstructured CSV data to be converted to the form defined by the OpenAPI definition described above.
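
One possible shape for such a conversion is sketched below. Because the real CSV file is not shown here, the column names (ic50_value, ic50_unit, ic50_modifier), the sample rows and the COLUMN_MAP dictionary are all hypothetical; the dictionary simply stands in for whatever semantic annotation ends up linking columns to schema fields.

```python
import csv
import io

# Hypothetical annotation: which CSV column feeds which field of the
# quantity schema. The column names are invented for this sketch.
COLUMN_MAP = {
    "value": "ic50_value",
    "unit": "ic50_unit",
    "modifier": "ic50_modifier",
}


def row_to_ic50(row, column_map=COLUMN_MAP):
    """Convert one flat CSV row (a dict) into a structured ic50 object."""
    quantity = {
        "value": float(row[column_map["value"]]),
        "unit": row[column_map["unit"]],
    }
    # The modifier is optional, so skip it when the cell is empty or absent.
    modifier = (row.get(column_map["modifier"]) or "").strip()
    if modifier:
        quantity["modifier"] = modifier
    return {"ic50": quantity}


def csv_to_ic50s(text, column_map=COLUMN_MAP):
    """Parse CSV text and return a list of structured ic50 objects."""
    reader = csv.DictReader(io.StringIO(text))
    return [row_to_ic50(row, column_map) for row in reader]


# Made-up sample data, only to exercise the functions.
SAMPLE = (
    "compound,ic50_modifier,ic50_value,ic50_unit\n"
    "A,,1.23,mg/ml\n"
    "B,>,3.45,mg/ml\n"
)

results = csv_to_ic50s(SAMPLE)
print(results)
```

Keeping the column mapping as data rather than code means the same converter could, in principle, be driven by an annotation published alongside the CSV file.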

Actions that remain

  • Review the IC50 model and see how it works for some real data sets.
  • Mock up some use cases
    • Performing curve fit on assay data to generate IC50s
    • Sending IC50 data to application e.g. for generating a predictive model
  • Investigate how to annotate this with JSON-LD descriptions. Will too much end up being duplicated between the JSON Schema and the ontology definitions?
  • Investigate how this impacts existing APIs such as OpenTox.
  • Write up the benefits for service providers
    • show how we are using the annotation
    • show how it makes their work more FAIR
  • Develop (SPARQL) queries that show the adoption of annotation and summarize how it is used