Added support for edge_error_scaling
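
The per-edge scaling added to the minimum-error-flow objective can be illustrated with a minimal standalone sketch. It does not import flowpaths or call a solver. The helper names `normalize_edge_error_scaling` and `scaled_total_error` and the toy edge values are hypothetical, chosen only to mirror the validation and the weighted sum in this diff.

```python
def normalize_edge_error_scaling(edge_error_scaling, edges_to_ignore=()):
    """Validate per-edge factors; edges with factor 0 join the ignore set.

    Hypothetical helper mirroring the checks added in this commit.
    """
    ignored = set(edges_to_ignore)
    for edge, factor in edge_error_scaling.items():
        if factor < 0 or factor > 1:
            raise ValueError(
                f"Edge error scaling factor for edge {edge} must be between 0 and 1."
            )
        if factor == 0:
            ignored.add(edge)
    return ignored


def scaled_total_error(weights, corrected, edge_error_scaling, edges_to_ignore=()):
    """Sum lambda_(u,v) * |corrected - weight| over the non-ignored edges.

    Edges missing from edge_error_scaling use a factor of 1.
    """
    ignored = normalize_edge_error_scaling(edge_error_scaling, edges_to_ignore)
    return sum(
        edge_error_scaling.get(edge, 1) * abs(corrected[edge] - weights[edge])
        for edge in weights
        if edge not in ignored
    )


# Toy data (assumed for illustration): original weights and a corrected flow.
weights = {("s", "a"): 5, ("a", "b"): 7, ("b", "t"): 5}
corrected = {("s", "a"): 5, ("a", "b"): 5, ("b", "t"): 6}
scaling = {("a", "b"): 0.5, ("b", "t"): 0}

total = scaled_total_error(weights, corrected, scaling)
print(total)
```

Here the edge `("a", "b")` contributes half of its error of 2, the edge `("b", "t")` is dropped because its factor is 0, and the unlisted edge `("s", "a")` keeps the default factor of 1.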

alexandrutomescu · alexandrutomescu · commit 792c96e31957 · 2025-03-30T16:19:22.000+03:00
diff --git a/docs/k-min-path-error.md b/docs/k-min-path-error.md
@@ -31,14 +31,14 @@ This class implements a more general version, as follows:
 
 1. The paths can start/end not only in source/sink nodes, but also in given sets of start/end nodes (set parameters `additional_starts` and `additional_ends`). See also [Additional start/end nodes](additional-start-end-nodes.md).
 2. This class supports adding subpath constraints, that is, lists of edges that must appear in some solution path. See [Subpath constraints](subpath-constraints.md) for details.
-3. The above constraint can happen only over a given subset $E' \subseteq E$ of the edges (set parameter `edges_to_ignore` to be $E \setminus E'$), 
-4. The error (i.e. the above absolute of the difference) of every edge can contribute differently to the objective function, according to a scale factor $\in [0,1]$. Set these via a dictionary that you pass to `edge_error_scaling`, which stores the scale factor $\lambda_{(u,v)} \in [0,1]$ of each edge $(u,v)$ in the dictionary. Setting $\lambda_{(u,v)} = 0$ will add the edge $(u,v)$ to `edges_to_ignore`, because the constraint for $(u,v)$ becomes always true.
+3. The above constraint can happen only over a given subset $E' \subseteq E$ of the edges (set parameter `edges_to_ignore` to be $E \setminus E'$). See also [ignoring edges documentation](ignoring-edges.md).
+4. The error (i.e. the above absolute value of the difference) of every edge can contribute differently to the objective function, according to a scale factor $\lambda_{(u,v)} \in [0,1]$. Pass these factors as a dictionary to `edge_error_scaling`, mapping each edge $(u,v)$ to its factor $\lambda_{(u,v)}$. Setting $\lambda_{(u,v)} = 0$ will add the edge $(u,v)$ to `edges_to_ignore`, because the constraint for $(u,v)$ then always holds. See also [ignoring edges documentation](ignoring-edges.md).
 5. Another way to relax the constraint is to allow also some looseness in the slack value, based on the length of the solution path. Intuitively, suppose that longer paths have even higher variance in their weight across the edges of the path, while shorter paths less. Formally, suppose that we have a function $\alpha : \mathbb{N} \rightarrow \mathbb{R}^+$ that for every solution path length $\ell$, it returns a multiplicative factor $\alpha(\ell)$. Then, we can multiply each path slack $\rho_i$ by $\alpha(|P_i|)$ in the constraint of the problem (where $|P_i|$ denotes the length of solution path $P_i$). In the above example, we could set $\alpha(\ell) > 1$ for "large" lengths $\ell$. Note that in this model we keep the same objective function (i.e. sum of slacks), and thus this multiplier has no effect on the objective value. You can pass the function $\alpha$ to the class as a piecewise encoding, via parameters `path_length_ranges` and `path_length_factors`, see [kMinPathError()](k-min-path-error.md#flowpaths.kminpatherror.kMinPathError).
 
 !!! info "Generalized constraint"
     Formally, the constraint generalized as in 3., 4. and 5. above is:
     $$
-    \lambda_{u,v} \cdot \left|f(u,v) - \sum_{i \in \\{1,\dots,k\\} : (u,v) \in P_i }w_i\right| \leq \sum_{i \in \\{1,\dots,k\\} : (u,v) \in P_i }\rho_i \cdot \alpha(|P_i|), ~\forall (u,v) \in E'.
+    \lambda_{(u,v)} \cdot \left|f(u,v) - \sum_{i \in \\{1,\dots,k\\} : (u,v) \in P_i }w_i\right| \leq \sum_{i \in \\{1,\dots,k\\} : (u,v) \in P_i }\rho_i \cdot \alpha(|P_i|), ~\forall (u,v) \in E'.
     $$
 
 !!! warning "A lowerbound on $k$"
diff --git a/docs/minimum-error-flow.md b/docs/minimum-error-flow.md
@@ -1,8 +1,8 @@
 # Minimum Correction of Weights to a Flow
 
-Often, the edge weights of a graph are not a flow (i.e. do not satisfy flow conservation for non- source/sink nodes). While the models [k-Minimum Path Error](k-min-path-error.md) or [k-Least Absolute Errors](k-least-absolute-errors.md) can decompose such graphs, as a less principled approach, one can first minimally correct the graph weights to become a flow, and then optimally decompose the resulting flow flow using the [Minimum Flow Decomposition](minimum-flow-decomposition.md) model. 
+Often, the edge weights of a graph are not a flow (i.e. do not satisfy flow conservation for non-source/sink nodes). While the models [k-Minimum Path Error](k-min-path-error.md) or [k-Least Absolute Errors](k-least-absolute-errors.md) can decompose such graphs, as a less principled approach, one can first minimally correct the graph weights to become a flow, and then optimally decompose the resulting flow using the [Minimum Flow Decomposition](minimum-flow-decomposition.md) model.
 
-This is faster in practice, because the Minimum Flow Decomposition solver is faster than the ones decomposing graphs without flow conservation. In some sense, we are delegating error correction to a pre-processing step, and then remove the error-resolution when decomposing the resulting graph.
+This is faster in practice, because the Minimum Flow Decomposition solver is faster than the ones decomposing graphs without flow conservation. We are thus delegating error correction to a pre-processing step, and then avoiding the error-handling difficulty when decomposing the resulting graph.
 
 ## 1. Definition
 
@@ -134,13 +134,14 @@ flowchart LR
 This class implements a more general version, as follows:
 
 1. The corrected flow can start/end not only in source/sink nodes, but also in given sets of start/end nodes (set parameters `additional_starts` and `additional_ends`). See also [Additional start/end nodes](additional-start-end-nodes.md).
-2. The error can count only for a given subset $E' \subseteq E$ of the edges (set parameter `edges_to_ignore` to be $E \setminus E'$), 
-3. One can also ensure some "sparsity" in the solution, meaning the total corrected flow exiting the source node is counts also in the minimization function, with a given multiplier $\lambda$ (see ref. [2]). If $\lambda = 0$, this has no effect.
+2. The error can count only for a given subset $E' \subseteq E$ of the edges (set parameter `edges_to_ignore` to be $E \setminus E'$). See also [ignoring edges documentation](ignoring-edges.md).
+3. The error (i.e. the above absolute value of the difference) of every edge can contribute differently to the objective function, according to a scale factor $\lambda_{(u,v)} \in [0,1]$. Pass these factors as a dictionary to `edge_error_scaling`, mapping each edge $(u,v)$ to its factor $\lambda_{(u,v)}$. Setting $\lambda_{(u,v)} = 0$ will add the edge $(u,v)$ to `edges_to_ignore`, because the error of $(u,v)$ then no longer contributes to the objective. See also [ignoring edges documentation](ignoring-edges.md).
+4. One can also ensure some "sparsity" in the solution, meaning that the total corrected flow exiting the source node also counts in the minimization function, with a given multiplier $\lambda$ (see ref. [2]). If $\lambda = 0$, this has no effect.
 
 !!! info "Generalized objective function"
-    Formally, the objective function generalized as in 2. and 3. above is:
+    Formally, the objective function generalized as in 2., 3. and 4. above is:
     $$
-    \sum_{(u,v) \in E'}\Big|f(u,v) - w(u,v)\Big| + \lambda \sum_{(s,v) \in E} f(s,v).
+    \sum_{(u,v) \in E'}\lambda_{(u,v)} \cdot \Big|f(u,v) - w(u,v)\Big| + \lambda \sum_{(s,v) \in E} f(s,v).
     $$
 
 ## 4. References
diff --git a/flowpaths/kminpatherror.py b/flowpaths/kminpatherror.py
@@ -90,7 +90,7 @@ def __init__(
             
             Dictionary `edge: factor` storing the error scale factor (in [0,1]) of every edge, which scale the allowed difference between edge weight and path weights.
             Default is an empty dict. If an edge has a missing error scale factor, it is assumed to be 1. The factors are used to scale the 
-            difference between the flow value of the edge and the sum of the weights of the paths going through the edge.
+            difference between the flow value of the edge and the sum of the weights of the paths going through the edge. See [ignoring edges documentation](ignoring-edges.md).
 
         - `path_length_ranges: list`, optional
             
@@ -114,19 +114,19 @@ def __init__(
 
         - `additional_starts: list`, optional
             
-            List of additional start nodes of the paths. Default is an empty list.
+            List of additional start nodes of the paths. Default is an empty list. See [additional start/end nodes documentation](additional-start-end-nodes.md).
 
         - `additional_ends: list`, optional
             
-            List of additional end nodes of the paths. Default is an empty list.
+            List of additional end nodes of the paths. Default is an empty list. See [additional start/end nodes documentation](additional-start-end-nodes.md).
 
         - `optimization_options: dict`, optional
 
             Dictionary with the optimization options. Default is `None`. See [optimization options documentation](solver-options-optimizations.md).
 
         - `solver_options: dict`, optional
 
-            Dictionary with the solver options. Default is `None`. See [solver options documentation](solver-options-optimizations.md).
+            Dictionary with the solver options. Default is `{}`. See [solver options documentation](solver-options-optimizations.md).
 
         Raises
         ----------
diff --git a/flowpaths/minerrorflow.py b/flowpaths/minerrorflow.py
@@ -12,6 +12,7 @@ def __init__(
             weight_type: type = float,
             sparsity_lambda: float = 0,
             edges_to_ignore: list = [],
+            edge_error_scaling: dict = {},
             additional_starts: list = [],
             additional_ends: list = [],
             solver_options: dict = {},
@@ -33,31 +34,37 @@ def __init__(
 
             The name of the attribute in the edges of the graph that contains the weight of the edge.
 
-        - `weight_type: type`
+        - `weight_type: type`, optional
 
             The type of the weights of the edges. It can be either `int` or `float`. Default is `float`.
 
-        - `sparsity_lambda: float`
+        - `sparsity_lambda: float`, optional
 
             The sparsity parameter. It is used to control the trade-off between the sparsity of the solution and the closeness to the original weights. Default is `0`.
             If `sparsity_lambda` is set to `0`, then the solution will be as close as possible to the original weights. If `sparsity_lambda` is set to a positive value, then the solution will be sparser, i.e. it will have less flow going out of the source.
             The higher the value of `sparsity_lambda`, the sparser the solution will be.
 
-        - `edges_to_ignore: list`
+        - `edges_to_ignore: list`, optional
 
-            A list of edges to ignore. The weights of these edges will still be corrected, but their error will not count in the objective function that is being minimized. Default is `[]`.
+            A list of edges to ignore. The weights of these edges will still be corrected, but their error will not count in the objective function that is being minimized. Default is `[]`. See [ignoring edges documentation](ignoring-edges.md).
 
-        - `additional_starts: list`
+        - `edge_error_scaling: dict`, optional
+            
+            Dictionary `edge: factor` storing the error scale factor (in [0,1]) of every edge, which scales the contribution of the edge's error to the objective function.
+            Default is an empty dict. If an edge has a missing error scale factor, it is assumed to be 1. The factors are used to scale the 
+            absolute difference between the original weight of the edge and its corrected flow value. An edge with factor 0 is added to `edges_to_ignore`. See [ignoring edges documentation](ignoring-edges.md).
+
+        - `additional_starts: list`, optional
 
-            A list of nodes to be added as additional sources. Flow is allowed to start start at these nodes, meaning that their out going flow can be greater than their incoming flow. Default is `[]`.
+            A list of nodes to be added as additional sources. Flow is allowed to start at these nodes, meaning that their outgoing flow can be greater than their incoming flow. Default is `[]`. See also [additional start/end nodes documentation](additional-start-end-nodes.md).
 
-        - `additional_ends: list`
+        - `additional_ends: list`, optional
 
-            A list of nodes to be added as additional sinks. Flow is allowed to end at these nodes, meaning that their incoming flow can be greater than their outgoing flow. Default is `[]`.
+            A list of nodes to be added as additional sinks. Flow is allowed to end at these nodes, meaning that their incoming flow can be greater than their outgoing flow. Default is `[]`. See also [additional start/end nodes documentation](additional-start-end-nodes.md).
 
-        - `solver_options: dict`
+        - `solver_options: dict`, optional
 
-            A dictionary containing the options for the solver. The options are passed to the solver wrapper. Default is `{}`.
+            A dictionary containing the options for the solver. The options are passed to the solver wrapper. Default is `{}`. See [solver options documentation](solver-options-optimizations.md).
         """
         
         self.original_graph_copy = deepcopy(G)
@@ -66,9 +73,18 @@ def __init__(
         if weight_type not in [int, float]:
             raise ValueError(f"weight_type must be either int or float, not {weight_type}")
         self.weight_type = weight_type
+        self.solver_options = solver_options
+
         self.sparsity_lambda = sparsity_lambda
         self.edges_to_ignore = set(edges_to_ignore).union(self.G.source_sink_edges)
-        self.solver_options = solver_options
+        self.edge_error_scaling = edge_error_scaling
+        # Checking that every entry in self.edge_error_scaling is between 0 and 1
+        for key, value in self.edge_error_scaling.items():
+            if value < 0 or value > 1:
+                raise ValueError(f"Edge error scaling factor for edge {key} must be between 0 and 1.")
+            if value == 0:
+                self.edges_to_ignore.add(key)
+        
 
         self.__solution = None
         self.__is_solved = None
@@ -173,7 +189,7 @@ def __encode_objective(self):
         # plus the sparsity of the solution (i.e. sparsity_lambda * sum of the corrected flow going out of the source)
         self.solver.set_objective(
             self.solver.quicksum(
-                self.edge_error_vars[(u, v)]
+                self.edge_error_vars[(u, v)] * self.edge_error_scaling.get((u, v), 1)
                 for (u, v) in self.G.edges()
                 if (u, v) not in self.edges_to_ignore
             ) + self.sparsity_lambda * self.solver.quicksum(
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "flowpaths"
-version = "0.1.12" 
+version = "0.1.13" 
 description = "A Python package to quickly decompose weighted graphs into weights paths, under various models."
 readme = "README.md"
 authors = [{name="Graph Algorithms and Bioinformatics Group @ University of Helsinki, and external collaborators"}]