Merge pull request #984 from SciML/improve_si_find_ident_funcs_display

TorkelE · web-flow · commit cae7118dbba7 · 2024-07-16T11:37:47.000-05:00
Improve conservation law handling in `find_identifiable_functions`
diff --git a/docs/src/inverse_problems/structural_identifiability.md b/docs/src/inverse_problems/structural_identifiability.md
@@ -21,9 +21,9 @@ StructuralIdentifiability currently assesses identifiability for the first two o
 !!! note
     Currently, the StructuralIdentifiability.jl extension only considers structural identifiability for the ODE generated by the reaction rate equation. It is possible that for the SDE model (generated by the chemical Langevin equation) and the jump model (generated by stochastic chemical kinetics) the identifiability of model quantities is different.
 
-## Global identifiability analysis
+## [Global identifiability analysis](@id structural_identifiability_gi)
 
-### Basic example
+### [Basic example](@id structural_identifiability_gi_example)
 
 Global identifiability can be assessed using the `assess_identifiability` function. For each model quantity (parameters and variables), it will assess whether they are:
 - Globally identifiable.
@@ -48,7 +48,7 @@ assess_identifiability(gwo; measured_quantities = [:M, :P, :E], loglevel = Loggi
 ```
 in which case all species trajectories and parameters become identifiable.
 
-### Indicating known parameters
+### [Indicating known parameters](@id structural_identifiability_gi_known_ps)
 In the previous case we assumed that all parameters are unknown, however, this is not necessarily true. If there are parameters with known values, we can supply these using the `known_p` argument. Providing this additional information might also make other, previously unidentifiable, parameters identifiable. Let us consider the previous example, where we measure the concentration of $M$ only, but now assume we also know the production rate of $E$ ($pₑ$):
 ```@example si1
 assess_identifiability(gwo; measured_quantities = [:M], known_p = [:pₑ], loglevel = Logging.Error)
@@ -57,7 +57,7 @@ Not only does this turn the previously non-identifiable `pₑ` (globally) identi
 
 To, in a similar manner, indicate that certain initial conditions are known is a work in progress. Hopefully this feature should be an available in the near future.
 
-### Providing non-trivial measured quantities
+### [Providing non-trivial measured quantities](@id structural_identifiability_gi_nontrivial_mq)
 Sometimes, ones may not have measurements of species, but rather some combinations of species (or possibly parameters). To account for this, `measured_quantities` accepts any algebraic expression (and not just single species). To form such expressions, species and parameters have to first be `@unpack`'ed from the model. Say that we have a model where an enzyme ($E$) is converted between an active and inactive form, which in turns activates the production of a product, $P$:
 ```@example si1
 rs = @reaction_network begin
@@ -72,30 +72,30 @@ assess_identifiability(rs; measured_quantities = [Eᵢ + Eₐ, :P], loglevel = L
 nothing # hide
 ```
 
-### Assessing identifiability for specified quantities only
+### [Assessing identifiability for specified quantities only](@id structural_identifiability_gi_ftc)
 By default, StructuralIdentifiability assesses identifiability for all parameters and variables. It is, however, possible to designate precisely which quantities you want to check using the `funcs_to_check` option. This both includes selecting a smaller subset of parameters and variables to check, or defining customised expressions. Let us consider the Goodwind from previously, and say that we would like to check whether the production parameters ($pₘ$, $pₑ$, and $pₚ$) and the total amount of the three species ($P + M + E$) are identifiable quantities. Here, we would first unpack these (allowing us to form algebraic expressions) and then use the following code:
 ```@example si1
 @unpack pₘ, pₑ, pₚ, M, E, P = gwo
 assess_identifiability(gwo; measured_quantities = [:M], funcs_to_check = [pₘ, pₑ, pₚ, M + E + P], loglevel = Logging.Error)
 nothing # hide
 ```
 
-### Probability of correctness
+### [Probability of correctness](@id structural_identifiability_gi_probs)
 The identifiability methods used can, in theory, produce erroneous results. However, it is possible to adjust the lower bound for the probability of correctness using the argument `prob_threshold` (by default set to `0.99`, that is, at least a $99\%$ chance of correctness). We can e.g. increase the bound through:
 ```@example si1
 assess_identifiability(gwo; measured_quantities=[:M], prob_threshold = 0.999, loglevel = Logging.Error)
 nothing # hide
 ```
 giving a minimum bound of $99.9\%$ chance of correctness. In practise, the bounds used by StructuralIdentifiability are very conservative, which means that while the minimum guaranteed probability of correctness in the default case is $99\%$, in practise it is much higher. While increasing the value of `prob_threshold` increases the certainty of correctness, it will also increase the time required to assess identifiability.
 
-## Local identifiability analysis
+## [Local identifiability analysis](@id structural_identifiability_lit)
 Local identifiability can be assessed through the `assess_local_identifiability` function. While this is already determined by `assess_identifiability`, assessing local identifiability only has the advantage that it is easier to compute. Hence, there might be models where global identifiability analysis fails (or takes a prohibitively long time), where instead `assess_local_identifiability` can be used. This function takes the same inputs as `assess_identifiability` and returns, for each quantity, `true` if it is locally identifiable (or `false` if it is not). Here, for the Goodwind oscillator, we assesses it for local identifiability only:
 ```@example si1
 assess_local_identifiability(gwo; measured_quantities = [:M], loglevel = Logging.Error)
 ```
 We note that the results are consistent with those produced by `assess_identifiability` (with globally or locally identifiable quantities here all being assessed as at least locally identifiable).
 
-## Finding identifiable functions
+## [Finding identifiable functions](@id structural_identifiability_identifiabile_funcs)
 Finally, StructuralIdentifiability provides the `find_identifiable_functions` function. Rather than determining the identifiability of each parameter and unknown of the model, it finds a set of identifiable functions, such as any other identifiable expression of the model can be generated by these. Let us again consider the Goodwind oscillator, using the `find_identifiable_functions` function we find that identifiability can be reduced to five globally identifiable expressions:
 ```@example si1
 find_identifiable_functions(gwo; measured_quantities = [:M], loglevel = Logging.Error)
@@ -104,7 +104,7 @@ Again, these results are consistent with those produced by `assess_identifiabili
 
 `find_identifiable_functions` tries to simplify its output functions to create nice expressions. The degree to which it does this can be adjusted using the `simplify` keywords. Using the `:weak`, `:standard` (default), and `:strong` arguments, increased simplification can be forced (at the expense of longer runtime).
 
-## Creating StructuralIdentifiability compatible ODE models from Catalyst `ReactionSystem`s
+## [Creating StructuralIdentifiability compatible ODE models from Catalyst `ReactionSystem`s](@id structural_identifiability_si_odes)
 While the functionality described above covers the vast majority of analysis that user might want to perform, the StructuralIdentifiability package supports several additional features. While these does not have inherent Catalyst support, we do provide the `make_si_ode` function to simplify their use. Similar to the previous functions, it takes a `ReactionSystem`, lists of measured quantities, and known parameter values. The output is a [ODE of the standard form supported by StructuralIdentifiability](https://docs.sciml.ai/StructuralIdentifiability/stable/tutorials/creating_ode/#Defining-the-model-using-@ODEmodel-macro). It can be created using the following syntax:
 ```@example si1
 si_ode = make_si_ode(gwo; measured_quantities = [:M])
@@ -116,28 +116,17 @@ print_for_DAISY(si_ode)
 nothing # hide
 ```
 
-## Notes on systems with conservation laws
+## [Notes on systems with conservation laws](@id structural_identifiability_conslaws)
 Several reaction network models, such as
 ```@example si3
 using Catalyst, Logging, StructuralIdentifiability # hide
 rs = @reaction_network begin
     (k1,k2), X1 <--> X2
 end
 ```
-contain conservation laws (in this case $Γ = X1 + X2$, where $Γ = X1(0) + X2(0)$ is a constant). Because the presence of such conservation laws makes structural identifiability analysis prohibitively computationally expensive (for all but the simplest of cases), these are automatically eliminated by Catalyst (removing one ODE from the resulting ODE system for each conservation law). For the `assess_identifiability` and `assess_local_identifiability` functions, this will be unnoticed by the user. However, for the `find_identifiable_functions` and `make_si_ode` functions, this may result in one, or several, parameters of the form `Γ[i]` (where `i` is an integer) appearing in the produced expressions. These correspond to the conservation law constants and can be found through
-```@example si3
-conservedequations(rs)
-```
-E.g. if you run:
-```@example si3
-find_identifiable_functions(rs; measured_quantities = [:X1, :X2])
-```
-we see that `Γ[1]` (`= X1(0) + X2(0)`) is detected as an identifiable expression. If we want to disable this feature for any function, we can use the `remove_conserved = false` option:
-```@example si3
-find_identifiable_functions(rs; measured_quantities = [:X1, :X2], remove_conserved = false)
-```
+contain [conservation laws](@ref network_analysis_deficiency) (in this case $Γ = X1 + X2$, where $Γ = X1(0) + X2(0)$ is a constant). Because the presence of such conservation laws makes structural identifiability analysis prohibitively computationally expensive (for all but the simplest of cases), these are automatically eliminated by Catalyst. This is handled internally, and should not be noticeable to the user. The exception is the `make_si_ode` function. For each conservation law, its output will have one ODE removed, and instead have a conservation parameter (of the form `Γ[i]`) added to its equations. This feature can be disabled through the `remove_conserved = false` option.
 
-## Systems with exponent parameters
+## [Systems with exponent parameters](@id structural_identifiability_exp_params)
 Structural identifiability cannot currently be applied to systems with parameters (or species) in exponents. E.g. this
 ```julia
 rn = @reaction_network begin
diff --git a/ext/CatalystStructuralIdentifiabilityExtension/structural_identifiability_extension.jl b/ext/CatalystStructuralIdentifiabilityExtension/structural_identifiability_extension.jl
@@ -30,7 +30,7 @@ function Catalyst.make_si_ode(rs::ReactionSystem; measured_quantities = [], know
         ignore_no_measured_warn = false, remove_conserved = true)
     # Creates a MTK ODESystem, and a list of measured quantities (there are equations).
     # Gives these to SI to create an SI ode model of its preferred form.
-    osys, conseqs, _ = make_osys(rs; remove_conserved)
+    osys, conseqs, _, _ = make_osys(rs; remove_conserved)
     measured_quantities = make_measured_quantities(rs, measured_quantities, known_p,
         conseqs; ignore_no_measured_warn)
     return SI.mtk_to_si(osys, measured_quantities)[1]
@@ -67,15 +67,15 @@ function SI.assess_local_identifiability(rs::ReactionSystem, args...;
         measured_quantities = [], known_p = [], funcs_to_check = Vector(),
         remove_conserved = true, ignore_no_measured_warn = false, kwargs...)
     # Creates a ODESystem, list of measured quantities, and functions to check, of SI's preferred form.
-    osys, conseqs, vars = make_osys(rs; remove_conserved)
+    osys, conseqs, consconsts, vars = make_osys(rs; remove_conserved)
     measured_quantities = make_measured_quantities(rs, measured_quantities, known_p,
         conseqs; ignore_no_measured_warn)
     funcs_to_check = make_ftc(funcs_to_check, conseqs, vars)
 
     # Computes identifiability and converts it to a easy to read form.
     out = SI.assess_local_identifiability(osys, args...; measured_quantities,
         funcs_to_check, kwargs...)
-    return make_output(out, funcs_to_check, reverse.(conseqs))
+    return make_output(out, funcs_to_check, consconsts)
 end
 
 """
@@ -107,15 +107,15 @@ function SI.assess_identifiability(rs::ReactionSystem, args...;
         measured_quantities = [], known_p = [], funcs_to_check = Vector(),
         remove_conserved = true, ignore_no_measured_warn = false, kwargs...)
     # Creates a ODESystem, list of measured quantities, and functions to check, of SI's preferred form.
-    osys, conseqs, vars = make_osys(rs; remove_conserved)
+    osys, conseqs, consconsts, vars = make_osys(rs; remove_conserved)
     measured_quantities = make_measured_quantities(rs, measured_quantities, known_p,
         conseqs; ignore_no_measured_warn)
     funcs_to_check = make_ftc(funcs_to_check, conseqs, vars)
 
     # Computes identifiability and converts it to a easy to read form.
     out = SI.assess_identifiability(osys, args...; measured_quantities,
         funcs_to_check, kwargs...)
-    return make_output(out, funcs_to_check, reverse.(conseqs))
+    return make_output(out, funcs_to_check, consconsts)
 end
 
 """
@@ -147,13 +147,13 @@ function SI.find_identifiable_functions(rs::ReactionSystem, args...;
         measured_quantities = [], known_p = [], remove_conserved = true,
         ignore_no_measured_warn = false, kwargs...)
     # Creates a ODESystem, and list of measured quantities, of SI's preferred form.
-    osys, conseqs = make_osys(rs; remove_conserved)
+    osys, conseqs, consconsts, _ = make_osys(rs; remove_conserved)
     measured_quantities = make_measured_quantities(rs, measured_quantities, known_p,
         conseqs; ignore_no_measured_warn)
 
     # Computes identifiable functions and converts it to a easy to read form.
     out = SI.find_identifiable_functions(osys, args...; measured_quantities, kwargs...)
-    return vector_subs(out, reverse.(conseqs))
+    return vector_subs(out, consconsts)
 end
 
 ### Helper Functions ###
@@ -172,22 +172,24 @@ function make_osys(rs::ReactionSystem; remove_conserved = true)
 
     # Computes equations for system conservation laws.
     # If there are no conserved equations, the `conseqs` variable must still have the `Vector{Pair{Any, Any}}` type.
-    if !remove_conserved
-        conseqs = Vector{Pair{Any, Any}}[]
-    else
+    if remove_conserved
         conseqs = [ceq.lhs => ceq.rhs for ceq in conservedequations(rs)]
+        consconsts = [cconst.lhs => cconst.rhs for cconst in conservationlaw_constants(rs)]
         isempty(conseqs) && (conseqs = Vector{Pair{Any, Any}}[])
+        isempty(consconsts) && (consconsts = Vector{Pair{Any, Any}}[])
+    else
+        conseqs = Vector{Pair{Any, Any}}[]
+        consconsts = Vector{Pair{Any, Any}}[]
     end
 
-    return osys, conseqs, vars
+    return osys, conseqs, consconsts, vars
 end
 
 # Creates a list of measured quantities of a form that SI can read.
 # Each measured quantity must have a form like:
 # `obs_var ~ X` # (Here, `obs_var` is a variable, and X is whatever we can measure).
-function make_measured_quantities(
-        rs::ReactionSystem, measured_quantities::Vector{T}, known_p::Vector{S},
-        conseqs; ignore_no_measured_warn = false) where {T, S}
+function make_measured_quantities(rs::ReactionSystem, measured_quantities::Vector{T},
+        known_p::Vector{S}, conseqs; ignore_no_measured_warn = false) where {T, S}
     # Warning if the user didn't give any measured quantities.
     if !ignore_no_measured_warn && isempty(measured_quantities)
         @warn "No measured quantity provided to the `measured_quantities` argument, any further identifiability analysis will likely fail. You can disable this warning by setting `ignore_no_measured_warn = true`."
@@ -218,9 +220,9 @@ end
 # Processes the outputs to a better form.
 # Replaces conservation law equations back in the output (so that e.g. Γ are not displayed).
 # Sorts the output according to their input order (defaults to the `[unknowns; parameters]` order).
-function make_output(out, funcs_to_check, conseqs)
-    funcs_to_check = vector_subs(funcs_to_check, conseqs)
-    out = OrderedDict(zip(vector_subs(keys(out), conseqs), values(out)))
+function make_output(out, funcs_to_check, consconsts)
+    funcs_to_check = vector_subs(funcs_to_check, consconsts)
+    out = OrderedDict(zip(vector_subs(keys(out), consconsts), values(out)))
     sortdict = Dict(ftc => i for (i, ftc) in enumerate(funcs_to_check))
     return sort!(out; by = x -> sortdict[x])
 end