Add routing optimisation options #65
periodically-makes-puns wants to merge 14 commits into mockingbirdnest:master
I think such checkbox/opt-in ideas are fine for development and for specific users who are coached through setting them -- for instance, when testing on networks that aren't the solutions you considered. However, they're very bad for typical UX, particularly because the answer to "What boxes should be checked?" is "Read the output compute time and pick the one with the lowest number. Unless your network isn't doing what you think it should, in which case, try something else." And "what you think it should" is meaningless to 99% of users I have interacted with in RA or Skopos. As for the 1-hop optimization: why is this faster than just finding the intersection of {nodes linked from source} and {nodes linked from destination}? That's the definition of 1-hop. You can brute-force that list rather than brute-force every vessel with [some antenna characteristic that is only valid with current configs]. I'll otherwise defer to @eggrobin on the perf/algorithmic complexity he's looking for and on general graph theory topics, since I'm 25 years out of date on doing that.
And for maintenance, because we’re maintaining one more algorithm (and worse, one that most people might not use). I will have to take a close look at the A* & Floyd–Warshall stuff; I have seen but not followed the discussion on Discord, and this will take some time to properly think about. Maybe this week-end, maybe the one after that.
Should now be fixed. When I wrote that I was working under the incorrect assumption that vessel count is significantly less than the average degree of a ground station node, which is demonstrably false.
Noted. I suspect the end state for this might look something like an automatic switching algorithm between the routing functions, with metrics and information from that function hidden in a debug panel: something like the state machine you mentioned in #skopos, driven by a formula based on node count and contract count. In terms of maintainability, I think we could remove one-hop optimization from consideration. It's not much of a speedup compared to A*, and it introduces awkward correctness considerations. I think there's a relatively simple sufficient condition (2 * min(ground-to-vessel edge latencies) >= max(ground-to-vessel edge latencies)), but unless we filter out non-Skopos vessels, that condition is unlikely to ever fire.
I don't normally comment on Skopos PRs, but there is a lot of complicated algorithmic code in this PR and virtually no new tests. Do the existing tests cover enough cases that they would surely catch a bug in the new code? Maybe, but somehow I am skeptical. (But remember that I don't understand any of this.) Also, given that performance is the driver for these changes, benchmarks would be a good way to validate that this is actually an improvement. It would also help assess under what circumstances each algorithm should be used. Abstract reasoning about asymptotic complexity or the multiplicative constant I find totally unconvincing (that must be my inexperience).
Good point. I'm going to start working on some actual benchmarks for the routing algorithm that aren't "run it against my save and see how it does". In terms of correctness tests, I could write a random graph test that compares A* routing to the current Dijkstra's routing to ensure they always match (because they should always match if my correctness argument holds), but I don't have any immediate thoughts on edge-case bugs that could break A*.
I think that a random graph of reasonable size would be a nice proof of correctness. And the same could also be used to compare the performance of the algorithms.
… the ShortestPath shortcut assign Channel.latency if it succeeds
I added […]. Yes, there is a Keplerian-orbit-to-ECEF coordinate calculator in there, because I wanted the ability to simulate routing against other comm network layouts. There are tests for the test infrastructure. These are very much not unit tests, but I do not know the best way to split out performance/overall correctness tests from unit tests. It does use two different Routing instances at the same time, which might cause issues? I'm not sure. It's definitely able to detect routing discrepancies, but the second instance seems to run much faster when timed. It's not a random graph by any means, but it is a graph that we might see in practice, which I think may be slightly more effective as a test. I am surprised by how sparse the network ends up being in most cases. The average degree appears to be around 4-8, which I suppose makes sense if you consider that the majority of the graph consists of ground stations, and ground stations only link to vessels, of which there are very few. Perhaps I should implement a Dijkstra's-based heuristic layer and see how it stacks up performance-wise. Currently one of the tests is failing due to a different choice of links. The latencies of the channels are the same between the two, but that discrepancy causes a snowballing effect that leads to different routing later (if allowed to continue). I suspect this is due to the grid-like layout of the satellites in that test, which causes a tie for the correct solution. I do not know how to reprioritize A* such that it yields the exact same result as Dijkstra's even in the case of a tie. I will see if I can write a smaller unit test that demonstrates this behaviour later.
Dropping one-bounce optimisation from consideration. As noted earlier, it is an extra algorithm that would need to be maintained, and is mostly impractical to use due to its strict correctness condition on edge weights. A single L-band omni in LEO breaks the latency assumption needed for one-bounce. I would much rather spend more development time on improving the A* search and its associated precomputation steps.
This has become an incredibly bloated PR, and I have more optimisations in separate branches now, so I'm going to spend some time restructuring my Git branches so that I can split this into separate PRs with specific intent.
Summary
This implements a few new routing methods and optimisations for use by FindChannels: one-hop prioritization and A* search.
The main motivation behind these methods is that Dijkstra's expands around 40 nodes per search, half of which are usually in completely the wrong direction. Reducing the number of links made and nodes expanded drastically improves search performance.
One-Hop Prioritization
Most connections in my networks pass through only one intermediate node between source and destination. These particular routes (which I call one-hop) are common and simple, which makes them easy targets for prioritization in the pathfinding algorithm.
This optimisation, when enabled, adds a check prior to Dijkstra's that runs through every vessel around Earth that carries a wideband antenna, and finds the best one to relay the connection (via brute force). If no solution exists with only one intermediate node, then it falls back to the current Dijkstra's implementation (or A*, if that is configured).
This is a significant speed-up compared to Dijkstra's, as it shortcuts evaluation for most simplex and duplex connections by evaluating a small, fixed number of links.
However, this prioritization does not preserve correctness in all cases. For example, a user could have a LEO or MEO constellation with strong relaying capability that achieves lower latency than a single hop through a geostationary satellite, in which case the correct route is through the lower constellations. That being said, if the Dijkstra's output would have been a single-bounce route, then this optimisation is correct.
Therefore, I currently leave it to the user's discretion whether to use this routing optimisation. For networks where it can be shown that no multi-hop path beats the best working single-hop path (e.g. pure geostationary), this offers a substantial speedup with no impact on correctness. For other networks, it may yield different and incorrect behaviour.
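To illustrate the shortcut and its correctness caveat, here is a minimal Python sketch (the mod itself is C#, and none of these names come from the PR). It assumes a hypothetical adjacency map `latency[u][v]` of usable links, and it enumerates the relays linked from the source instead of every wideband vessel, which is one possible way to do the brute force. `best_one_hop`, `find_route`, and the `fallback` callable are all illustrative names.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical adjacency map: latency[u][v] is the latency of a usable link u -> v.
LatencyMap = Dict[str, Dict[str, float]]
Route = Tuple[List[str], float]


def best_one_hop(latency: LatencyMap, source: str, destination: str) -> Optional[Route]:
    """Return the lowest-latency route with exactly one relay, or None."""
    best: Optional[Route] = None
    for relay, first_leg in latency.get(source, {}).items():
        second_leg = latency.get(relay, {}).get(destination)
        # Skip the direct link and relays that cannot reach the destination.
        if relay == destination or second_leg is None:
            continue
        total = first_leg + second_leg
        if best is None or total < best[1]:
            best = ([source, relay, destination], total)
    return best


def find_route(
    latency: LatencyMap,
    source: str,
    destination: str,
    fallback: Callable[[LatencyMap, str, str], Optional[Route]],
) -> Optional[Route]:
    """Try the one-hop shortcut first; otherwise defer to the full search."""
    shortcut = best_one_hop(latency, source, destination)
    if shortcut is not None:
        return shortcut
    return fallback(latency, source, destination)


# Assumed example network (latencies in arbitrary units): one GEO relay
# plus a two-satellite LEO chain that is faster overall.
network: LatencyMap = {
    "src": {"geo": 120.0, "leo1": 5.0},
    "geo": {"dst": 120.0},
    "leo1": {"leo2": 10.0},
    "leo2": {"dst": 5.0},
}
one_hop = best_one_hop(network, "src", "dst")
```

In this assumed network the shortcut picks the GEO relay even though the three-link LEO route has lower total latency, which is the kind of case where the prioritization gives a different answer from Dijkstra's.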
A* Search
We solve a relaxed version of the problem that does not consider achievable data rates. This is solvable with any all-pairs shortest-path algorithm, of which I chose Floyd-Warshall. Johnson-Dijkstra was considered but discarded since the network graph is usually fairly dense, and the better constant factor of Floyd-Warshall is likely more important than the better asymptotic complexity of Johnson-Dijkstra.
It is possible that a good or parallel implementation of Johnson-Dijkstra considering fewer destination nodes than Floyd-Warshall (since it can limit itself to destination nodes that are actually marked as a destination by some contract) will run faster. I have not tested this, since the APSP portion of this optimisation remains a small fraction of the overall FixedUpdate time.
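For reference, the destination-limited alternative could look roughly like the Python sketch below. It is an untested idea here too, not code from the PR: since latencies are non-negative, Johnson's reweighting step is unnecessary and the approach reduces to one Dijkstra run per contract destination on the reversed graph. The link tuples `(u, v, latency)` and the names `distances_to` and `heuristic_tables` are assumptions made for the example.

```python
import heapq
import math


def distances_to(destination, nodes, links):
    """Shortest relaxed latency from every node to `destination`.

    Runs Dijkstra backwards from the destination over reversed links.
    `links` is an iterable of (u, v, latency) with non-negative latency.
    """
    incoming = {n: [] for n in nodes}  # links that arrive at each node
    for u, v, latency in links:
        incoming[v].append((u, latency))
    dist = {n: math.inf for n in nodes}
    dist[destination] = 0.0
    frontier = [(0.0, destination)]
    while frontier:
        d, v = heapq.heappop(frontier)
        if d > dist[v]:
            continue  # stale queue entry
        for u, latency in incoming[v]:
            if d + latency < dist[u]:
                dist[u] = d + latency
                heapq.heappush(frontier, (dist[u], u))
    return dist


def heuristic_tables(contract_destinations, nodes, links):
    """One distance table per destination that some contract actually uses."""
    return {dest: distances_to(dest, nodes, links) for dest in contract_destinations}


# Assumed toy graph; "e" is isolated and therefore unreachable.
nodes = ["a", "b", "c", "d", "e"]
links = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0), ("c", "d", 1.0)]
tables = heuristic_tables(["d"], nodes, links)
```

Whether this beats Floyd-Warshall in practice depends on the number of contract destinations and on graph density, so it would need benchmarking before any conclusion is drawn.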
The solution to this relaxed problem is a consistent heuristic for the original problem, which can be easily proven from the definition of a shortest path. Therefore we can implement A* using this heuristic with no impact on correctness. This heuristic is a fairly tight lower bound on the correct answer - in fact it is tight enough that trying the heuristic route first before starting A* saves time on graphs I tested, and we can provably filter nodes and edges that have no hope of improving the best connection latency. Necessary data structures for this optimisation are in Routing.APSPHeuristic.
Currently there are two constants presented as default arguments to Routing.APSPHeuristic.FindNodes and Routing.APSPHeuristic.GenerateShortestPaths.
The first, bandwidth_filter, filters the band types we care about. If a vessel has no antennas with a band ChannelWidth of at least bandwidth_filter, the vessel is discarded from consideration. By default this filters to L, C, and Ku wideband. This can be incorrect if a vessel relays one of the low-datarate connections through an intermediary science band (e.g. Moscow-Washington Hotline, or the Tracking Ship contracts).
Discarding unnecessary nodes helps improve the performance of Floyd-Warshall significantly, though I observed minimal impact from pushing this down to include all science bands and therefore all of the ground stations associated with the various launch sites.
The second, minimum_link_data_rate, filters edges based on their maximum data rate. By default this is 1000, which matches the link data rate of the weakest contracted connection currently in RP-1 (Moscow-Washington Hotline - Level 1). I don't think this one actually does that much other than make the graph sparser, which might improve Johnson-Dijkstra in comparison.
Currently this is also configured as a separate option to allow the user to experiment with its effects on FixedUpdate runtime. I could not test this against sparse networks with many vessels in varying altitudes that would dramatically increase the Floyd-Warshall cost, since I did not have any save with such a network. By default this toggle is off to preserve current behavior.
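The A* idea in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the contents of Routing.APSPHeuristic: links are hypothetical `(u, v, latency, max_rate)` tuples, `relaxed_distances` is a plain Floyd-Warshall that ignores data rates, and `search` is a best-first search that skips links below the required rate. Passing a zero heuristic to `search` gives ordinary Dijkstra, which makes the two easy to compare.

```python
import heapq
import math


def relaxed_distances(nodes, links):
    """Floyd-Warshall on latency alone; data rates are ignored (the relaxation)."""
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v, latency, _rate in links:
        dist[u][v] = min(dist[u][v], latency)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def search(links, source, destination, required_rate, heuristic):
    """Best-first search over links whose max_rate meets required_rate.

    `heuristic(node)` must be a consistent lower bound on remaining latency.
    Returns (latency, path, expanded_count); latency is inf if no route exists.
    """
    adjacency = {}
    for u, v, latency, rate in links:
        if rate >= required_rate:
            adjacency.setdefault(u, []).append((v, latency))
    best = {source: 0.0}
    parent = {}
    frontier = [(heuristic(source), source)]
    closed = set()
    while frontier:
        _, u = heapq.heappop(frontier)
        if u in closed:
            continue  # stale queue entry
        closed.add(u)
        if u == destination:
            path = [u]
            while path[-1] != source:
                path.append(parent[path[-1]])
            return best[u], path[::-1], len(closed)
        for v, latency in adjacency.get(u, ()):
            cand = best[u] + latency
            # Nodes that cannot reach the destination even in the relaxed
            # graph (infinite heuristic) are pruned outright.
            if cand < best.get(v, math.inf) and heuristic(v) < math.inf:
                best[v] = cand
                parent[v] = u
                heapq.heappush(frontier, (cand + heuristic(v), v))
    return math.inf, [], len(closed)


# Assumed toy network: a fast but low-rate direct link, a usable relay,
# and a branch ("west", "far") that leads away from the destination.
nodes = ["src", "relay", "west", "far", "dst"]
links = [
    ("src", "relay", 10.0, 5000),
    ("relay", "dst", 10.0, 5000),
    ("src", "west", 1.0, 5000),
    ("west", "far", 1.0, 5000),
    ("src", "dst", 5.0, 100),
]
table = relaxed_distances(nodes, links)
with_heuristic = search(links, "src", "dst", 1000, lambda n: table[n]["dst"])
plain_dijkstra = search(links, "src", "dst", 1000, lambda n: 0.0)
```

On this toy network both calls return the same route through `relay`, but the heuristic version never expands the `west` branch, while the zero-heuristic version does. The relaxed distance from `src` (5.0, via the low-rate direct link) is lower than the true answer (20.0), showing that the heuristic is a lower bound and not the answer itself.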