Commit d650906

dweindl, FFroehlich, dilpath, and fbergmann committed
Proposal: Different languages for model specification (#538)
# Motivation

There are a number of formats for specifying models in systems biology, each with their specific strengths and weaknesses. PEtab version 1.0.0 only allows Systems Biology Markup Language (SBML) models. While SBML is supported by a large number of tools, there are good reasons to use other formats. For example, rule-based model formats (e.g., BioNetGenLanguage) permit more abstract and compact specification of models based on rules, which are generalisations of reactions.

Therefore, and based on user request (#436), we propose to lift PEtab’s restriction to SBML models and allow arbitrary model formats.

# Proposed changes

* Changes to the PEtab YAML file:
  * Change `sbml_files` to `models`
  * `models` entries will be model IDs (following the existing conventions for PEtab IDs) mapping to:
    * `location`: path / URL to the model
    * `language`: model format

      Initial set of model format identifiers (to be extended as needed):
      * SBML: `sbml`
      * CellML: `cellml`
      * BNGL: `bngl`
      * PySB: `pysb`
  * An additional entry for mapping tables (see below) is added

Example:

**Before:**

```yaml
format_version: 1
parameter_file: parameters.tsv
problems:
- condition_files:
  - conditions.tsv
  measurement_files:
  - measurements.tsv
  observable_files:
  - observables.tsv
  sbml_files:
  - model1.xml
```

**After:**

```yaml
format_version: 2.0.0
parameter_file: parameters.tsv
problems:
- condition_files:
  - conditions.tsv
  measurement_files:
  - measurements.tsv
  observable_files:
  - observables.tsv
  mapping_file: mappings.tsv # optional
  models:
    id_for_model1:
      location: model1.xml
      language: sbml
```

* Changes to the format of existing tables/files:
  * Condition/Observable/Parameter Table

    All symbols that previously referenced the ID of SBML entities, such as parameter IDs or compartment IDs, now refer to (globally unique) named entities in the model, such as parameters, observables, expressions. For example, condition table columns may correspond to parameters, states, species of the referenced model.

    For species, assignments in the condition table set the initial value at the beginning of the simulation for that condition, potentially replacing the initialization from preequilibration. For all other entities, values are statically replaced at all time points. For entities that assign values to other entities, such as SBML AssignmentRules, the value of the target of that rule is statically replaced at all time points.

* Additional files
  * Mapping Table:

    Mapping PEtab entity IDs to entity IDs in the model. This optional file may be used to reference model entities in PEtab files where the ID in the model would not be a valid identifier in PEtab (e.g., due to containing blanks, dots, or other special characters).

    The TSV file has two mandatory columns: `petabEntityId`, `modelEntityId`. Additional columns are allowed. modelEntityIds must be unique identifiers in the model. The mapping table must not map modelEntityIds to petabEntityIds that are also defined in any other part of the PEtab problem. modelEntityId may not refer to other petabEntityIds, including those defined in the mapping table. petabEntityIds defined in the mapping table may be referenced in condition, measurement, parameter and observable tables, but cannot be referenced in the model itself.

    For example, in SBML, local parameters may be referenced as `$reactionId.$localParameterId`, which are not valid PEtab IDs as they contain a `.` character. Similarly, this table may be used to reference specific species in a BNGL model which may contain many unsupported characters such as `,`, `(` or `.`. However, please note that IDs must exactly match the species names in the BNGL generated network file and no pattern matching will be performed.

# Implications

* Tools need to check the model format and provide an informative message if the given format cannot be handled
* Validators will skip model-dependent validation when encountering unknown model types - ideally there would be some plugin mechanisms to provide validation

---

Co-authored by @FFroehlich @fbergmann. Also thanks to everybody participating in these discussions during the last COMBINE meeting.

---------

Co-authored-by: FFroehlich <[email protected]>
Co-authored-by: Dilan Pathirana <[email protected]>
Co-authored-by: Frank T. Bergmann <[email protected]>
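The Before/After example in the proposal can be read as a mechanical rewrite of each problem entry. The following is a minimal sketch of that rewrite on an already-parsed YAML dictionary, not part of this commit. The function names, the rule for deriving a model ID from the file name, and the default of `language: sbml` are assumptions made for illustration. The key name defaults to `models` as written in the proposal text; the schema diff in this commit uses `model_files`, so it is left configurable.

```python
import re
from pathlib import PurePosixPath


def model_id_from_path(path):
    """Derive an ID from a file name (hypothetical convention, not from PEtab)."""
    stem = PurePosixPath(path).stem
    # Replace anything that is not a letter, digit or underscore.
    candidate = re.sub(r"\W", "_", stem, flags=re.ASCII)
    # IDs must not start with a digit (and must not be empty).
    if not re.match(r"[a-zA-Z_]", candidate):
        candidate = "_" + candidate
    return candidate


def upgrade_problem(problem, models_key="models"):
    """Return a copy of a v1 problem entry with `sbml_files` rewritten."""
    upgraded = {k: v for k, v in problem.items() if k != "sbml_files"}
    models = {}
    for path in problem.get("sbml_files", []):
        model_id = model_id_from_path(path)
        if model_id in models:
            raise ValueError(f"duplicate model ID derived from {path!r}")
        # Version 1 only allowed SBML, so the language is known here.
        models[model_id] = {"location": path, "language": "sbml"}
    upgraded[models_key] = models
    return upgraded
```

The input dictionary is left unmodified, so a caller can compare the old and new entries before writing the file back.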
1 parent fcccbcf commit d650906

2 files changed: 124 additions, 44 deletions


doc/_static/petab_schema.yaml

Lines changed: 29 additions & 8 deletions
@@ -38,13 +38,26 @@ properties:
         files and optional visualization files.
       properties:

-        sbml_files:
-          type: array
-          description: List of PEtab SBML files.
-
-          items:
-            type: string
-            description: PEtab SBML file name or URL.
+        model_files:
+          type: object
+          description: One or multiple models
+
+          # the model ID
+          patternProperties:
+            "^[a-zA-Z_]\\w*$":
+              type: object
+              properties:
+                location:
+                  type: string
+                  description: Model file name or URL
+                language:
+                  type: string
+                  description: |
+                    Model language, e.g., 'sbml', 'cellml', 'bngl', 'pysb'
+              required:
+                - location
+                - language
+          additionalProperties: false

         measurement_files:
           type: array
@@ -78,8 +91,16 @@ properties:
             type: string
             description: PEtab visualization file name or URL.

+        mapping_files:
+          type: array
+          description: List of PEtab mapping files.
+
+          items:
+            type: string
+            description: PEtab mapping file name or URL.
+
       required:
-        - sbml_files
+        - model_files
         - observable_files
         - measurement_files
         - condition_files
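To make the new `model_files` schema entry concrete, here is a minimal hand-written check that mirrors its rules: every key must match the model ID pattern, and every entry must provide string values for `location` and `language`. This is a sketch only. The function name `check_model_files` is hypothetical, and a real tool would more likely validate against `petab_schema.yaml` with a JSON Schema library. `re.ASCII` is used on the assumption that `\w` in the schema is meant in its ASCII sense.

```python
import re

# Same pattern as in the schema; fullmatch() stands in for the ^...$ anchors.
MODEL_ID = re.compile(r"[a-zA-Z_]\w*", re.ASCII)


def check_model_files(model_files):
    """Return a list of human-readable problems; empty if none were found."""
    errors = []
    for model_id, entry in model_files.items():
        if not isinstance(model_id, str) or not MODEL_ID.fullmatch(model_id):
            errors.append(f"invalid model ID: {model_id!r}")
            continue
        if not isinstance(entry, dict):
            errors.append(f"{model_id}: entry must be a mapping")
            continue
        # Both keys are listed under `required` in the schema.
        for key in ("location", "language"):
            if not isinstance(entry.get(key), str):
                errors.append(f"{model_id}: missing or non-string {key!r}")
    return errors
```

Collecting all problems in a list, rather than raising on the first one, lets a validator report every issue in a single pass.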

doc/documentation_data_format.rst

Lines changed: 95 additions & 36 deletions
@@ -2,7 +2,7 @@ PEtab data format specification
 ===============================


-Format version: 1
+Format version: 2.0.0

 This document explains the PEtab data format.

@@ -41,12 +41,11 @@ Overview
 ---------

 The PEtab data format specifies a parameter estimation problem using a number
-of text-based files (`Systems Biology Markup Language (SBML) <http://sbml.org>`_
-and
+of text-based files (
 `Tab-Separated Values (TSV) <https://www.iana.org/assignments/media-types/text/tab-separated-values>`_)
 (Figure 2), i.e.

-- An SBML model [SBML]
+- A model

 - A measurement file to fit the model to [TSV]

@@ -67,6 +66,9 @@ and
 - (optional) A visualization file, which contains specifications how the data
   and/or simulations should be plotted by the visualization routines [TSV]

+- (optional) A mapping file, which allows mapping PEtab entity IDs to entity
+  IDs in the model, which might not have valid PEtab IDs themselves [TSV]
+
 .. figure:: gfx/petab_files.png
    :alt: Files constituting a PEtab problem

@@ -91,11 +93,11 @@ problem as such.
 - Fields in "[]" are optional and may be left empty.


-SBML model definition
----------------------
-
-The model must be specified as valid SBML. There are no further restrictions.
+Model definition
+----------------

+PEtab 2.0.0 is agnostic of specific model formats. A model file is referenced
+in the PEtab problem description (YAML) via its file name or a URL.

 Condition table
 ---------------
@@ -107,7 +109,7 @@ different experimental conditions).
 This is specified as a tab-separated value file in the following way:

 +--------------+------------------+------------------------------------+-----+---------------------------------------+
-| conditionId  | [conditionName]  | parameterOrSpeciesOrCompartmentId1 | ... | parameterOrSpeciesOrCompartmentId${n} |
+| conditionId  | [conditionName]  | modelEntityId1                     | ... | modelEntityId${n}                     |
 +==============+==================+====================================+=====+=======================================+
 | STRING       | [STRING]         | NUMERIC\|STRING                    | ... | NUMERIC\|STRING                       |
 +--------------+------------------+------------------------------------+-----+---------------------------------------+
@@ -140,32 +142,44 @@ Detailed field description
   Condition names are arbitrary strings to describe the given condition.
   They may be used for reporting or visualization.

-- ``${parameterOrSpeciesOrCompartmentId1}``
-
-  Further columns may be global parameter IDs, IDs of species or compartments
-  as defined in the SBML model. Only one column is allowed per ID.
-  Values for these condition parameters may be provided either as numeric
-  values, or as IDs defined in the SBML model, the parameter table or both.
-
-  - ``${parameterId}``
-
-    The values will override any parameter values specified in the model.
-
-  - ``${speciesId}``
-
-    If a species ID is provided, it is interpreted as the initial
-    condition of that species (as amount if `hasOnlySubstanceUnits` is set to `True`
-    for the respective species, as concentration otherwise) and will override the
-    initial condition given in the SBML model or given by a preequilibration
-    condition. If no value is provided for a condition, the result of the
-    preequilibration (or initial condition from the SBML model, if
-    no preequilibration is defined) is used.
-
-  - ``${compartmentId}``
-
-    If a compartment ID is provided, it is interpreted as the initial
-    compartment size.
-
+- ``${modelEntityId}``
+
+  Further columns may be the IDs of model entities that have globally unique
+  IDs, such as parameters, species or compartments defined in the model to set
+  condition-specific values. Only one column is allowed per ID.
+  Values for these entities may be provided either as numeric values, or as IDs
+  of globally unique entity IDs as defined in the model, the mapping table or
+  the parameter table.
+
+  Any non-``NaN`` value will override the original values of the model, or if
+  preequilibration was used, they will override the value obtained from
+  preequilibration. A ``NaN`` value indicates that the original value of the
+  model is to be used (when used in the preequilibration condition, or in the
+  simulation condition if no preequilibration is used) or that the result of
+  preequilibration is to be used (when used in the simulation condition after
+  preequilibration).
+
+  The value in the condition table either replaces the initial value or the
+  value at all timepoints based on whether the model entity has a rate law
+  assigned or not:
+
+  * For model entities that have constant algebraic assignments
+    (but not necessarily constant values), i.e, that do not have a rate of
+    change with respect to time assigned and that are not subject to event
+    assignments, the algebraic assignment is replaced statically at all
+    timepoints. Examples for such model entities are the targets of SBML
+    `AssignmentRules`.
+
+  * For all other entities, e.g., those that are assigned by SBML `RateRules`,
+    only the initial value can be assigned in the condition table. If an
+    assignment of the rate of change with respect to time or event assignment
+    is desired, the values of model entities that are used to define rate of
+    change or event assignments must be assigned in the condition table.
+    If no such model entities exist, assignment is not possible.
+
+  If the model has a concept of species and a species ID is provided, its
+  value is interpreted as amount or concentration in the same way as anywhere
+  else in the model.

 Measurement table
 -----------------
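The ``NaN`` rule added to the condition table description can be summarised as a small decision function. The sketch below is one possible reading of that paragraph for a single numeric entry, not code from any PEtab tool. The function name and the use of ``None`` to signal that no preequilibration was run are assumptions for illustration. Entries that reference parameter IDs rather than numbers are not covered.

```python
import math


def resolve_initial_value(condition_value, model_value, preeq_value=None):
    """Pick the value a simulation starts from for one model entity.

    condition_value: numeric entry from the condition table (may be NaN)
    model_value:     value defined in the model itself
    preeq_value:     result of preequilibration, or None if none was run
    """
    # Any non-NaN entry overrides both the model and the preequilibration.
    if not math.isnan(condition_value):
        return condition_value
    # NaN: fall back to preequilibration if it was run, else to the model.
    return model_value if preeq_value is None else preeq_value
```

For example, with a model value of `1.0` and a preequilibration result of `0.3`, a table entry of `NaN` yields `0.3`, while an entry of `5.0` yields `5.0`.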
@@ -705,6 +719,49 @@ Detailed field description
   legend and which defaults to the value in ``datasetId``.


+Mapping table
+-------------
+
+Mapping PEtab entity IDs to entity IDs in the model. This optional file may be
+used to reference model entities in PEtab files where the ID in the model would
+not be a valid identifier in PEtab (e.g., due to inclusion of blanks, dots, or
+other special characters).
+
+The TSV file has two mandatory columns, ``petabEntityId`` and
+``modelEntityId``. Additional columns are allowed.
+
++---------------+---------------+
+| petabEntityId | modelEntityId |
++===============+===============+
+| STRING        | STRING        |
++---------------+---------------+
+| reaction1_k1  | reaction1.k1  |
++---------------+---------------+
+
+
+Detailed field description
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- ``petabEntityId`` [STRING, NOT NULL]
+
+  A valid PEtab identifier that is not defined in any other part of the PEtab
+  problem. This identifier may be referenced in condition, measurement,
+  parameter and observable tables, but cannot be referenced in the model
+  itself.
+
+- ``modelEntityId`` [STRING, NOT NULL]
+
+  A globally unique identifier defined in the model,
+  *that is not a valid PEtab ID* (see :ref:`identifiers`).
+
+  For example, in SBML, local parameters may be referenced as
+  ``$reactionId.$localParameterId``, which are not valid PEtab IDs as they
+  contain a ``.`` character. Similarly, this table may be used to reference
+  specific species in a BNGL model that may contain many unsupported
+  characters such as ``,``, ``(`` or ``.``. However, please note that IDs must
+  exactly match the species names in the BNGL-generated network file, and no
+  pattern matching will be performed.
+
 Extensions
 ~~~~~~~~~~
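The mapping table rules above can be sketched as a small loader that reads the two mandatory columns and rejects ``petabEntityId`` values that are invalid or already taken. This is an illustration under stated assumptions, not an implementation from the PEtab library. The function name `load_mapping` and the `other_ids` argument are hypothetical, and the identifier pattern is assumed to be the same one the schema uses for model IDs. Whether a ``modelEntityId`` actually exists in the model is not checked, since that depends on the model language.

```python
import csv
import io
import re

# Assumed to match the PEtab identifier convention (see lead-in).
PETAB_ID = re.compile(r"[a-zA-Z_]\w*", re.ASCII)


def load_mapping(tsv_text, other_ids=frozenset()):
    """Parse a mapping table and return {petabEntityId: modelEntityId}.

    other_ids: IDs already defined elsewhere in the PEtab problem.
    """
    mapping = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Additional columns are allowed and simply ignored here.
        petab_id, model_id = row["petabEntityId"], row["modelEntityId"]
        if not PETAB_ID.fullmatch(petab_id):
            raise ValueError(f"not a valid PEtab ID: {petab_id!r}")
        if petab_id in mapping or petab_id in other_ids:
            raise ValueError(f"PEtab ID defined more than once: {petab_id!r}")
        mapping[petab_id] = model_id
    return mapping
```

Applied to the example row from the table, `load_mapping("petabEntityId\tmodelEntityId\nreaction1_k1\treaction1.k1\n")` returns a dictionary mapping `reaction1_k1` to `reaction1.k1`.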

@@ -743,7 +800,7 @@ Parameter estimation problems combining multiple models
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Parameter estimation problems can comprise multiple models. For now, PEtab
-allows to specify multiple SBML models with corresponding condition and
+allows one to specify multiple models with corresponding condition and
 measurement tables, and one joint parameter table. This means that the parameter
 namespace is global. Therefore, parameters with the same ID in different models
 will be considered identical.
@@ -1070,6 +1127,8 @@ float values are demoted to boolean values. For example, in ``1 + true``,
 the expression is interpreted as ``true && true = true``.


+.. _identifiers:
+
 Identifiers
 -----------
