Skip to content

Commit a54928b

Browse files
author
Marc TEYSSIER
committed
Merge pull request #2 from moufmouf/1.0
Working on ambiguity detection
2 parents 56d1952 + bbbd38d commit a54928b

8 files changed

+792
-18
lines changed

README.md

Lines changed: 46 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -55,7 +55,7 @@ It will return the list of foreign keys it used to link the 2 tables.
5555
Internals:
5656

5757
- Each foreign key has a *cost* of 1
58-
- Junction tables have a cost of 1.5, instead of 2 (one for each foreign key)
58+
- Junction tables have a *cost* of 1.5, instead of 2 (one for each foreign key)
5959

6060
```php
6161
// $conn is the DBAL connection.
@@ -66,4 +66,48 @@ $fks = $schemaAnalyzer->getShortestPath("users", "rights");
6666
// This will return an array of Doctrine\DBAL\Schema\ForeignKeyConstraint objects
6767
```
6868

69-
// TODO: Ambiguity exception!
69+
<div class="alert alert-info"><strong>Heads up!</strong> The shortest path is based on the <em>cost</em> of the
70+
foreign keys. It is perfectly possible to have several shortest paths (if several paths have the same total cost).
71+
If there are several shortest paths, rather than choosing one path amongst the others, SchemaAnalyzer will throw
72+
a <code>ShortestPathAmbiguityException</code>. The exception message details all the possible shortest
73+
paths.</div>
74+
75+
## Caching results
76+
77+
Analyzing the full data model and looking for shortest paths can take a long time. For anything that should run
78+
in a production environment, it is recommended to cache the result. `SchemaAnalyzer` can be passed a Doctrine cache,
79+
along a cache prefix. The cache prefix is a string that will be used to prefix all cache keys. It is useful to
80+
avoid cache collisions between several databases.
81+
82+
Usage:
83+
84+
```php
85+
// $conn is the DBAL connection.
86+
// Let's use the ApcCache (or any other Doctrine cache...)
87+
$cache = new ApcCache();
88+
$schemaAnalyzer = new SchemaAnalyzer($conn->getSchemaManager(), $cache, "my_prefix");
89+
```
90+
91+
## Changing the cost of the foreign keys to alter the shortest path
92+
93+
If you are facing an ambiguity exception or if the shortest path simply does not suit you, you can alter the
94+
cost of the foreign keys.
95+
96+
```php
97+
$schemaAnalyzer->setForeignKeyCost($tableName, $columnName, $cost);
98+
```
99+
100+
The `$cost` can be any number. Remember that the default cost for a foreign key is **1**.
101+
102+
SchemaAnalyzer comes with a set of default constants to help you work with costs:
103+
104+
- `SchemaAnalyzer::WEIGHT_IMPORTANT` (0.75) for foreign keys that should be followed in priority
105+
- `SchemaAnalyzer::WEIGHT_IRRELEVANT` (2) for foreign keys that should be generally avoided
106+
- `SchemaAnalyzer::WEIGHT_IGNORE` (Infinity) for foreign keys that should never be used as part of the shortest path
107+
108+
Another option is to add a cost modifier to a table. This will alter the cost of all foreign keys pointing to or
109+
originating from this table.
110+
111+
```php
112+
$schemaAnalyzer->setTableCostModifier($tableName, $cost);
113+
```

src/MultiDijkstra.php

Lines changed: 200 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,200 @@
1+
<?php
2+
3+
namespace Mouf\Database\SchemaAnalyzer;
4+
5+
use Fhaculty\Graph\Edge;
6+
use Fhaculty\Graph\Exception\UnexpectedValueException;
7+
use Fhaculty\Graph\Vertex;
8+
use SplPriorityQueue;
9+
10+
/**
11+
* Dijkstra's shortest path algorithm modified to measure all possible shortest paths.
12+
*/
13+
class MultiDijkstra
14+
{
15+
/**
16+
* Get all edges on shortest path for this vertex.
17+
*
18+
* @throws UnexpectedValueException when encountering an Edge with negative weight
19+
* @throws MultiDijkstraNoPathException
20+
*
21+
* @return array<string, Vertex[]> where key is the destination vertex name and value is an array of possible origin vertex
22+
*/
23+
public static function findShortestPaths(Vertex $startVertex, Vertex $endVertex)
24+
{
25+
$totalCostOfCheapestPathTo = [];
26+
// start node distance
27+
$totalCostOfCheapestPathTo[$startVertex->getId()] = 0;
28+
29+
$endVertexId = $endVertex->getId();
30+
31+
// just to get the cheapest vertex in the correct order
32+
$cheapestVertex = new SplPriorityQueue();
33+
$cheapestVertex->setExtractFlags(SplPriorityQueue::EXTR_BOTH);
34+
$cheapestVertex->insert($startVertex, 0);
35+
36+
// predecessors
37+
$predecesEdgeOfCheapestPathTo = [];
38+
39+
// mark vertices when their cheapest path has been found
40+
$usedVertices = [$startVertex->getId() => true];
41+
42+
$isFirst = true;
43+
44+
// Repeat until all vertices have been marked
45+
$totalCountOfVertices = count($startVertex->getGraph()->getVertices());
46+
for ($i = 0; $i < $totalCountOfVertices; ++$i) {
47+
$currentVertex = null;
48+
$currentVertexId = null;
49+
$isEmpty = false;
50+
if ($isFirst) {
51+
$currentVertex = $startVertex;
52+
$currentCost = 0;
53+
$currentVertexId = $currentVertex->getId();
54+
} else {
55+
do {
56+
// if the priority queue is empty there are isolated vertices, but the algorithm visited all other vertices
57+
if ($cheapestVertex->isEmpty()) {
58+
$isEmpty = true;
59+
break;
60+
}
61+
// Get cheapest unmarked vertex
62+
$cheapestResult = $cheapestVertex->extract();
63+
$currentVertex = $cheapestResult['data'];
64+
$currentCost = $cheapestResult['priority'];
65+
$currentVertexId = $currentVertex->getId();
66+
// Vertices can be in the priority queue multiple times, with different path costs (if vertex is already marked, this is an old unvalid entry)
67+
} while (isset($usedVertices[$currentVertexId]));
68+
}
69+
70+
// Check premature end condition
71+
// If the end vertex is marked as done and the next lowest possible weight is bigger than end vertix,
72+
// we are done processing.
73+
74+
75+
// catch "algorithm ends" condition
76+
if ($isEmpty) {
77+
break;
78+
}
79+
80+
if ($isFirst) {
81+
$isFirst = false;
82+
}
83+
84+
// mark this vertex
85+
$usedVertices[$currentVertexId] = true;
86+
87+
if (isset($usedVertices[$endVertexId]) && $totalCostOfCheapestPathTo[$endVertexId] < -$currentCost) {
88+
break;
89+
}
90+
91+
// check for all edges of current vertex if there is a cheaper path (or IN OTHER WORDS: Add reachable nodes from currently added node and refresh the current possible distances)
92+
foreach ($currentVertex->getEdgesOut() as $edge) {
93+
$weight = $edge->getWeight();
94+
if ($weight < 0) {
95+
throw new UnexpectedValueException('Dijkstra not supported for negative weights - Consider using MooreBellmanFord');
96+
}
97+
98+
$targetVertex = $edge->getVertexToFrom($currentVertex);
99+
$targetVertexId = $targetVertex->getId();
100+
101+
// if the targetVertex is marked, the cheapest path for this vertex has already been found (no negative edges) {
102+
if (!isset($usedVertices[$targetVertexId])) {
103+
// calculate new cost to vertex
104+
$newCostsToTargetVertex = $totalCostOfCheapestPathTo[$currentVertexId] + $weight;
105+
106+
if ((!isset($predecesEdgeOfCheapestPathTo[$targetVertexId]))
107+
// is the new path cheaper?
108+
|| $totalCostOfCheapestPathTo[$targetVertexId] > $newCostsToTargetVertex) {
109+
110+
// Not an update, just a new insert with lower cost
111+
$cheapestVertex->insert($targetVertex, -$newCostsToTargetVertex);
112+
// so the lowest cost will be extracted first
113+
// and higher cost will be skipped during extraction
114+
115+
// update/set costs found with the new connection
116+
$totalCostOfCheapestPathTo[$targetVertexId] = $newCostsToTargetVertex;
117+
// update/set predecessor vertex from the new connection
118+
$predecesEdgeOfCheapestPathTo[$targetVertexId] = [$edge];
119+
} elseif ($totalCostOfCheapestPathTo[$targetVertexId] == $newCostsToTargetVertex) {
120+
// Same length paths. We need to add the predecessor to the list of possible predecessors.
121+
$predecesEdgeOfCheapestPathTo[$targetVertexId][] = $edge;
122+
}
123+
}
124+
}
125+
}
126+
127+
if (!isset($totalCostOfCheapestPathTo[$endVertexId])) {
128+
throw new MultiDijkstraNoPathException("No path found between vertex '".$startVertex->getId()."' and vertex '".$endVertex->getId()."'");
129+
}
130+
131+
// algorithm is done, return resulting edges
132+
return $predecesEdgeOfCheapestPathTo;
133+
}
134+
135+
/**
136+
* @param array<string, Vertex[]> $predecesEdgesArray key is the destination vertex name and value is an array of possible origin vertex
137+
*
138+
* @return Edge\Base[]
139+
*/
140+
public static function getCheapestPathFromPredecesArray(Vertex $startVertex, Vertex $endVertex, array $predecesEdgesArray)
141+
{
142+
$edges = [];
143+
$currentVertex = $endVertex;
144+
while ($currentVertex !== $startVertex) {
145+
$predecessorEdges = $predecesEdgesArray[$currentVertex->getId()];
146+
if (count($predecessorEdges) > 1) {
147+
throw new MultiDijkstraAmbiguityException("There are many possible shortest paths to link vertex '".$startVertex->getId()."' to '".$endVertex->getId()."'");
148+
}
149+
/* @var $edge \Fhaculty\Graph\Edge\Base */
150+
$edge = $predecessorEdges[0];
151+
$edges[] = $edge;
152+
if ($currentVertex === $edge->getVerticesStart()->getVertexFirst()) {
153+
$currentVertex = $edge->getVerticesTarget()->getVertexFirst();
154+
} else {
155+
$currentVertex = $edge->getVerticesStart()->getVertexFirst();
156+
}
157+
}
158+
159+
return array_reverse($edges);
160+
}
161+
162+
/**
163+
* @param Vertex $startVertex
164+
* @param Vertex $endVertex
165+
* @param array $predecesEdgesArray
166+
*
167+
* @return Edge\Base[][]
168+
*/
169+
public static function getAllPossiblePathsFromPredecesArray(Vertex $startVertex, Vertex $endVertex, array $predecesEdgesArray)
170+
{
171+
$edgesPaths = [];
172+
173+
if ($startVertex === $endVertex) {
174+
return [];
175+
}
176+
177+
$predecessorEdges = $predecesEdgesArray[$endVertex->getId()];
178+
179+
foreach ($predecessorEdges as $edge) {
180+
if ($endVertex === $edge->getVerticesStart()->getVertexFirst()) {
181+
$nextVertex = $edge->getVerticesTarget()->getVertexFirst();
182+
} else {
183+
$nextVertex = $edge->getVerticesStart()->getVertexFirst();
184+
}
185+
186+
$edgesPaths2 = self::getAllPossiblePathsFromPredecesArray($startVertex, $nextVertex, $predecesEdgesArray);
187+
if ($edgesPaths2) {
188+
foreach ($edgesPaths2 as &$edges2) {
189+
$edges2[] = $edge;
190+
}
191+
} else {
192+
$edgesPaths2 = [[$edge]];
193+
}
194+
195+
$edgesPaths = array_merge($edgesPaths, $edgesPaths2);
196+
}
197+
198+
return $edgesPaths;
199+
}
200+
}
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
<?php
2+
3+
namespace Mouf\Database\SchemaAnalyzer;
4+
5+
class MultiDijkstraAmbiguityException extends \Exception
6+
{
7+
}
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
<?php
2+
3+
namespace Mouf\Database\SchemaAnalyzer;
4+
5+
class MultiDijkstraNoPathException extends \Exception
6+
{
7+
}

0 commit comments

Comments
 (0)