Commit 792c96e

Added support for edge_error_scaling
1 parent eaf1a4a commit 792c96e

5 files changed

+43
-26
lines changed

docs/k-min-path-error.md

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -31,14 +31,14 @@ This class implements a more general version, as follows:
3131

3232
1. The paths can start/end not only in source/sink nodes, but also in given sets of start/end nodes (set parameters `additional_starts` and `additional_ends`). See also [Additional start/end nodes](additional-start-end-nodes.md).
3333
2. This class supports adding subpath constraints, that is, lists of edges that must appear in some solution path. See [Subpath constraints](subpath-constraints.md) for details.
34-
3. The above constraint can happen only over a given subset $E' \subseteq E$ of the edges (set parameter `edges_to_ignore` to be $E \setminus E'$),
35-
4. The error (i.e. the above absolute of the difference) of every edge can contribute differently to the objective function, according to a scale factor $\in [0,1]$. Set these via a dictionary that you pass to `edge_error_scaling`, which stores the scale factor $\lambda_{(u,v)} \in [0,1]$ of each edge $(u,v)$ in the dictionary. Setting $\lambda_{(u,v)} = 0$ will add the edge $(u,v)$ to `edges_to_ignore`, because the constraint for $(u,v)$ becomes always true.
34+
3. The above constraint can be required to hold only over a given subset $E' \subseteq E$ of the edges (set parameter `edges_to_ignore` to be $E \setminus E'$). See also [ignoring edges documentation](ignoring-edges.md).
35+
4. The error (i.e. the above absolute value of the difference) of every edge can contribute differently to the objective function, according to a scale factor $\in [0,1]$. Set these via a dictionary that you pass to `edge_error_scaling`, which stores the scale factor $\lambda_{(u,v)} \in [0,1]$ of each edge $(u,v)$ in the dictionary. Setting $\lambda_{(u,v)} = 0$ will add the edge $(u,v)$ to `edges_to_ignore`, because the constraint for $(u,v)$ is then always true. See also [ignoring edges documentation](ignoring-edges.md).
3636
5. Another way to relax the constraint is to also allow some looseness in the slack value, based on the length of the solution path. Intuitively, suppose that longer paths have higher variance in their weight across the edges of the path, while shorter paths have less. Formally, suppose that we have a function $\alpha : \mathbb{N} \rightarrow \mathbb{R}^+$ that, for every solution path length $\ell$, returns a multiplicative factor $\alpha(\ell)$. Then, we can multiply each path slack $\rho_i$ by $\alpha(|P_i|)$ in the constraint of the problem (where $|P_i|$ denotes the length of solution path $P_i$). In the above example, we could set $\alpha(\ell) > 1$ for "large" lengths $\ell$. Note that in this model we keep the same objective function (i.e. sum of slacks), and thus this multiplier has no effect on the objective value. You can pass the function $\alpha$ to the class as a piecewise encoding, via parameters `path_length_ranges` and `path_length_factors`, see [kMinPathError()](k-min-path-error.md#flowpaths.kminpatherror.kMinPathError).
3737

3838
!!! info "Generalized constraint"
3939
Formally, the constraint generalized as in 3., 4. and 5. above is:
4040
$$
41-
\lambda_{u,v} \cdot \left|f(u,v) - \sum_{i \in \\{1,\dots,k\\} : (u,v) \in P_i }w_i\right| \leq \sum_{i \in \\{1,\dots,k\\} : (u,v) \in P_i }\rho_i \cdot \alpha(|P_i|), ~\forall (u,v) \in E'.
41+
\lambda_{(u,v)} \cdot \left|f(u,v) - \sum_{i \in \\{1,\dots,k\\} : (u,v) \in P_i }w_i\right| \leq \sum_{i \in \\{1,\dots,k\\} : (u,v) \in P_i }\rho_i \cdot \alpha(|P_i|), ~\forall (u,v) \in E'.
4242
$$
4343

4444
!!! warning "A lowerbound on $k$"
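Reading aid for the generalized constraint in the `docs/k-min-path-error.md` hunk above: the following is a minimal sketch, not the library's implementation, that checks the constraint for a candidate solution. The function name `constraint_holds` and its parameters are hypothetical, paths are assumed to be given as node lists, and $|P_i|$ is assumed to be the number of edges of the path.

```python
def constraint_holds(flow, paths, weights, slacks,
                     scaling=None, alpha=lambda length: 1.0, ignore=()):
    """Check lambda_(u,v) * |f(u,v) - sum w_i| <= sum rho_i * alpha(|P_i|) on every non-ignored edge.

    Hypothetical helper for illustration only (not part of flowpaths).
    """
    scaling = scaling or {}
    ignore = set(ignore)
    for edge, f in flow.items():
        if edge in ignore:
            continue  # edges outside E' impose no constraint
        lam = scaling.get(edge, 1)  # a missing factor is treated as 1
        covered = 0.0  # sum of weights of the paths using this edge
        allowed = 0.0  # sum of (slack * length factor) of those paths
        for path, w, rho in zip(paths, weights, slacks):
            path_edges = list(zip(path, path[1:]))
            if edge in path_edges:
                covered += w
                # Assumption: path length = number of edges on the path
                allowed += rho * alpha(len(path_edges))
        if lam * abs(f - covered) > allowed:
            return False
    return True


flow = {("s", "a"): 5, ("a", "t"): 5, ("s", "b"): 3, ("b", "t"): 4}
paths = [["s", "a", "t"], ["s", "b", "t"]]

# Edge (b, t) has error |4 - 3| = 1, so zero slack is not enough ...
print(constraint_holds(flow, paths, weights=[5, 3], slacks=[0, 0]))
# ... unless that edge's error is scaled down to 0.
print(constraint_holds(flow, paths, weights=[5, 3], slacks=[0, 0],
                       scaling={("b", "t"): 0}))
```

With a scale factor of 0 the left-hand side vanishes, which is consistent with the documentation's remark that such an edge can simply be added to `edges_to_ignore`.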

docs/minimum-error-flow.md

Lines changed: 7 additions & 6 deletions
@@ -1,8 +1,8 @@
11
# Minimum Correction of Weights to a Flow
22

3-
Often, the edge weights of a graph are not a flow (i.e. do not satisfy flow conservation for non- source/sink nodes). While the models [k-Minimum Path Error](k-min-path-error.md) or [k-Least Absolute Errors](k-least-absolute-errors.md) can decompose such graphs, as a less principled approach, one can first minimally correct the graph weights to become a flow, and then optimally decompose the resulting flow flow using the [Minimum Flow Decomposition](minimum-flow-decomposition.md) model.
3+
Often, the edge weights of a graph are not a flow (i.e. do not satisfy flow conservation for non- source/sink nodes). While the models [k-Minimum Path Error](k-min-path-error.md) or [k-Least Absolute Errors](k-least-absolute-errors.md) can decompose such graphs, as a less principled approach, one can first minimally correct the graph weights to become a flow, and then optimally decompose the resulting flow using the [Minimum Flow Decomposition](minimum-flow-decomposition.md) model.
44

5-
This is faster in practice, because the Minimum Flow Decomposition solver is faster than the ones decomposing graphs without flow conservation. In some sense, we are delegating error correction to a pre-processing step, and then remove the error-resolution when decomposing the resulting graph.
5+
This is faster in practice, because the Minimum Flow Decomposition solver is faster than the ones decomposing graphs without flow conservation. We are thus delegating error correction to a pre-processing step, and then avoiding the error-handling difficulty when decomposing the resulting graph.
66

77
## 1. Definition
88

@@ -134,13 +134,14 @@ flowchart LR
134134
This class implements a more general version, as follows:
135135

136136
1. The corrected flow can start/end not only in source/sink nodes, but also in given sets of start/end nodes (set parameters `additional_starts` and `additional_ends`). See also [Additional start/end nodes](additional-start-end-nodes.md).
137-
2. The error can count only for a given subset $E' \subseteq E$ of the edges (set parameter `edges_to_ignore` to be $E \setminus E'$),
138-
3. One can also ensure some "sparsity" in the solution, meaning the total corrected flow exiting the source node is counts also in the minimization function, with a given multiplier $\lambda$ (see ref. [2]). If $\lambda = 0$, this has no effect.
137+
2. The error can count only for a given subset $E' \subseteq E$ of the edges (set parameter `edges_to_ignore` to be $E \setminus E'$). See also [ignoring edges documentation](ignoring-edges.md).
138+
3. The error (i.e. the above absolute value of the difference) of every edge can contribute differently to the objective function, according to a scale factor $\in [0,1]$. Set these via a dictionary that you pass to `edge_error_scaling`, which stores the scale factor $\lambda_{(u,v)} \in [0,1]$ of each edge $(u,v)$ in the dictionary. Setting $\lambda_{(u,v)} = 0$ will add the edge $(u,v)$ to `edges_to_ignore`, because the error of $(u,v)$ then contributes nothing to the objective function. See also [ignoring edges documentation](ignoring-edges.md).
139+
4. One can also ensure some "sparsity" in the solution, meaning the total corrected flow exiting the source node also counts in the minimization function, with a given multiplier $\lambda$ (see ref. [2]). If $\lambda = 0$, this has no effect.
139140

140141
!!! info "Generalized objective function"
141-
Formally, the objective function generalized as in 2. and 3. above is:
142+
Formally, the objective function generalized as in 2., 3. and 4. above is:
142143
$$
143-
\sum_{(u,v) \in E'}\Big|f(u,v) - w(u,v)\Big| + \lambda \sum_{(s,v) \in E} f(s,v).
144+
\sum_{(u,v) \in E'}\lambda_{(u,v)} \cdot \Big|f(u,v) - w(u,v)\Big| + \lambda \sum_{(s,v) \in E} f(s,v).
144145
$$
145146

146147
## 4. References
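Reading aid for the generalized objective function in the `docs/minimum-error-flow.md` hunk above: a minimal sketch that evaluates the objective for a given correction. The function `generalized_objective` and its parameter names are hypothetical (not part of flowpaths), and it assumes that one of the two quantities inside the absolute value is the original edge weight and the other the corrected flow, with the sparsity term summing the corrected flow leaving the source.

```python
def generalized_objective(weights, corrected, source,
                          scaling=None, ignore=(), sparsity_lambda=0.0):
    """Scaled absolute error over non-ignored edges plus the sparsity term.

    Hypothetical helper for illustration only.
    """
    scaling = scaling or {}
    ignore = set(ignore)
    # Error term: each edge's absolute correction, scaled by its factor (default 1)
    error = sum(
        scaling.get(edge, 1) * abs(corrected[edge] - w)
        for edge, w in weights.items()
        if edge not in ignore
    )
    # Sparsity term: total corrected flow on edges leaving the source
    out_of_source = sum(f for (u, _), f in corrected.items() if u == source)
    return error + sparsity_lambda * out_of_source


weights = {("s", "a"): 4, ("a", "t"): 6}      # original, non-conserving weights
corrected = {("s", "a"): 5, ("a", "t"): 5}    # a candidate corrected flow

print(generalized_objective(weights, corrected, "s"))                              # 2
print(generalized_objective(weights, corrected, "s", scaling={("a", "t"): 0.5}))   # 1.5
print(generalized_objective(weights, corrected, "s", sparsity_lambda=2))           # 12
```

In the real model the corrected flow is chosen by the solver to minimize this value; the sketch only evaluates it for fixed inputs.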

flowpaths/kminpatherror.py

Lines changed: 4 additions & 4 deletions
@@ -90,7 +90,7 @@ def __init__(
9090
9191
Dictionary `edge: factor` storing the error scale factor (in [0,1]) of every edge, which scales the allowed difference between edge weight and path weights.
9292
Default is an empty dict. If an edge has a missing error scale factor, it is assumed to be 1. The factors are used to scale the
93-
difference between the flow value of the edge and the sum of the weights of the paths going through the edge.
93+
difference between the flow value of the edge and the sum of the weights of the paths going through the edge. See [ignoring edges documentation](ignoring-edges.md).
9494
9595
- `path_length_ranges: list`, optional
9696
@@ -114,19 +114,19 @@ def __init__(
114114
115115
- `additional_starts: list`, optional
116116
117-
List of additional start nodes of the paths. Default is an empty list.
117+
List of additional start nodes of the paths. Default is an empty list. See [additional start/end nodes documentation](additional-start-end-nodes.md).
118118
119119
- `additional_ends: list`, optional
120120
121-
List of additional end nodes of the paths. Default is an empty list.
121+
List of additional end nodes of the paths. Default is an empty list. See [additional start/end nodes documentation](additional-start-end-nodes.md).
122122
123123
- `optimization_options: dict`, optional
124124
125125
Dictionary with the optimization options. Default is `None`. See [optimization options documentation](solver-options-optimizations.md).
126126
127127
- `solver_options: dict`, optional
128128
129-
Dictionary with the solver options. Default is `None`. See [solver options documentation](solver-options-optimizations.md).
129+
Dictionary with the solver options. Default is `{}`. See [solver options documentation](solver-options-optimizations.md).
130130
131131
Raises
132132
----------

flowpaths/minerrorflow.py

Lines changed: 28 additions & 12 deletions
@@ -12,6 +12,7 @@ def __init__(
1212
weight_type: type = float,
1313
sparsity_lambda: float = 0,
1414
edges_to_ignore: list = [],
15+
edge_error_scaling: dict = {},
1516
additional_starts: list = [],
1617
additional_ends: list = [],
1718
solver_options: dict = {},
@@ -33,31 +34,37 @@ def __init__(
3334
3435
The name of the attribute in the edges of the graph that contains the weight of the edge.
3536
36-
- `weight_type: type`
37+
- `weight_type: type`, optional
3738
3839
The type of the weights of the edges. It can be either `int` or `float`. Default is `float`.
3940
40-
- `sparsity_lambda: float`
41+
- `sparsity_lambda: float`, optional
4142
4243
The sparsity parameter. It is used to control the trade-off between the sparsity of the solution and the closeness to the original weights. Default is `0`.
4344
If `sparsity_lambda` is set to `0`, then the solution will be as close as possible to the original weights. If `sparsity_lambda` is set to a positive value, then the solution will be sparser, i.e. it will have less flow going out of the source.
4445
The higher the value of `sparsity_lambda`, the sparser the solution will be.
4546
46-
- `edges_to_ignore: list`
47+
- `edges_to_ignore: list`, optional
4748
48-
A list of edges to ignore. The weights of these edges will still be corrected, but their error will not count in the objective function that is being minimized. Default is `[]`.
49+
A list of edges to ignore. The weights of these edges will still be corrected, but their error will not count in the objective function that is being minimized. Default is `[]`. See [ignoring edges documentation](ignoring-edges.md).
4950
50-
- `additional_starts: list`
51+
- `edge_error_scaling: dict`, optional
52+
53+
Dictionary `edge: factor` storing the error scale factor (in [0,1]) of every edge, which scales the contribution of the edge's error to the objective function.
54+
Default is an empty dict. If an edge has a missing error scale factor, it is assumed to be 1. The factors are used to scale the
55+
difference between the weight of the edge and its corrected flow value. See [ignoring edges documentation](ignoring-edges.md).
56+
57+
- `additional_starts: list`, optional
5158
52-
A list of nodes to be added as additional sources. Flow is allowed to start start at these nodes, meaning that their out going flow can be greater than their incoming flow. Default is `[]`.
59+
A list of nodes to be added as additional sources. Flow is allowed to start at these nodes, meaning that their outgoing flow can be greater than their incoming flow. Default is `[]`. See also [additional start/end nodes documentation](additional-start-end-nodes.md).
5360
54-
- `additional_ends: list`
61+
- `additional_ends: list`, optional
5562
56-
A list of nodes to be added as additional sinks. Flow is allowed to end at these nodes, meaning that their incoming flow can be greater than their outgoing flow. Default is `[]`.
63+
A list of nodes to be added as additional sinks. Flow is allowed to end at these nodes, meaning that their incoming flow can be greater than their outgoing flow. Default is `[]`. See also [additional start/end nodes documentation](additional-start-end-nodes.md).
5764
58-
- `solver_options: dict`
65+
- `solver_options: dict`, optional
5966
60-
A dictionary containing the options for the solver. The options are passed to the solver wrapper. Default is `{}`.
67+
A dictionary containing the options for the solver. The options are passed to the solver wrapper. Default is `{}`. See [solver options documentation](solver-options-optimizations.md).
6168
"""
6269

6370
self.original_graph_copy = deepcopy(G)
@@ -66,9 +73,18 @@ def __init__(
6673
if weight_type not in [int, float]:
6774
raise ValueError(f"weight_type must be either int or float, not {weight_type}")
6875
self.weight_type = weight_type
76+
self.solver_options = solver_options
77+
6978
self.sparsity_lambda = sparsity_lambda
7079
self.edges_to_ignore = set(edges_to_ignore).union(self.G.source_sink_edges)
71-
self.solver_options = solver_options
80+
self.edge_error_scaling = edge_error_scaling
81+
# Checking that every entry in self.edge_error_scaling is between 0 and 1
82+
for key, value in self.edge_error_scaling.items():
83+
if value < 0 or value > 1:
84+
raise ValueError(f"Edge error scaling factor for edge {key} must be between 0 and 1.")
85+
if value == 0:
86+
self.edges_to_ignore.add(key)
87+
7288

7389
self.__solution = None
7490
self.__is_solved = None
@@ -173,7 +189,7 @@ def __encode_objective(self):
173189
# plus the sparsity of the solution (i.e. sparsity_lambda * sum of the corrected flow going out of the source)
174190
self.solver.set_objective(
175191
self.solver.quicksum(
176-
self.edge_error_vars[(u, v)]
192+
self.edge_error_vars[(u, v)] * self.edge_error_scaling.get((u, v), 1)
177193
for (u, v) in self.G.edges()
178194
if (u, v) not in self.edges_to_ignore
179195
) + self.sparsity_lambda * self.solver.quicksum(
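Reading aid for the `flowpaths/minerrorflow.py` hunks above: the validation added to `__init__` can be sketched as a standalone function. This is an illustration under assumptions, not the class's code; the name `edges_to_ignore_after_scaling` is hypothetical, and unlike the diff it returns a new set instead of updating an attribute.

```python
def edges_to_ignore_after_scaling(edge_error_scaling, edges_to_ignore=()):
    """Validate scale factors and fold zero-factor edges into the ignore set.

    Hypothetical standalone version of the check shown in the diff.
    """
    ignored = set(edges_to_ignore)
    for edge, factor in edge_error_scaling.items():
        # Every factor must lie in the closed interval [0, 1]
        if factor < 0 or factor > 1:
            raise ValueError(
                f"Edge error scaling factor for edge {edge} must be between 0 and 1."
            )
        # A factor of 0 removes the edge's error from the objective entirely
        if factor == 0:
            ignored.add(edge)
    return ignored


scaling = {("a", "b"): 0, ("b", "c"): 0.5}
print(edges_to_ignore_after_scaling(scaling, edges_to_ignore=[("s", "a")]))
```

In the objective itself, the diff then multiplies each remaining edge's error variable by `self.edge_error_scaling.get((u, v), 1)`, so edges absent from the dictionary keep a factor of 1.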

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
[project]
22
name = "flowpaths"
3-
version = "0.1.12"
3+
version = "0.1.13"
44
description = "A Python package to quickly decompose weighted graphs into weighted paths, under various models."
55
readme = "README.md"
66
authors = [{name="Graph Algorithms and Bioinformatics Group @ University of Helsinki, and external collaborators"}]
