Replace implementation of maximum_weighted_matching() by jorisvr · Pull Request #400 · boostorg/graph

jorisvr · 2024-12-05T18:57:58Z

This is a re-implementation of maximum_weighted_matching, based on a paper by Zvi Galil.
The new code runs in time O(V^3).
A new set of test cases is also added.

Resolves #199 #223 #399

The code has been tested extensively on large random graphs, using LEMON as a reference.
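For small graphs, an exhaustive search gives an independent reference for checking an optimized implementation, in the same spirit as comparing against LEMON. The sketch below is a standalone illustration under these assumptions:

- It uses a hypothetical `WEdge` edge list and the names `best_from` and `brute_force_max_weight`.
- It is not BGL's `brute_force_maximum_weighted_matching()`.
- It returns only the optimal weight, not the matching itself.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical edge type for this sketch: undirected edge {u, v} with weight w.
struct WEdge {
    int u, v;
    long long w;
};

// Exhaustive search over all matchings. The lowest-numbered undecided vertex is
// either left unmatched or matched to a still-free neighbour, then we recurse.
// Exponential time: only suitable as a test oracle for small graphs.
long long best_from(int n, const std::vector<WEdge>& edges,
                    std::vector<bool>& used, int start) {
    int v = start;
    while (v < n && used[v]) ++v;  // skip vertices that are already decided
    if (v == n) return 0;
    used[v] = true;
    long long best = best_from(n, edges, used, v + 1);  // option 1: v stays unmatched
    for (const WEdge& e : edges) {
        int other = (e.u == v) ? e.v : (e.v == v) ? e.u : -1;
        if (other < 0 || other == v || used[other]) continue;
        used[other] = true;  // option 2: match v with other
        best = std::max(best, e.w + best_from(n, edges, used, v + 1));
        used[other] = false;
    }
    used[v] = false;
    return best;
}

// Weight of a maximum weight matching on vertices 0..n-1.
long long brute_force_max_weight(int n, const std::vector<WEdge>& edges) {
    std::vector<bool> used(n, false);
    return best_from(n, edges, used, 0);
}
```

On the path 0-1-2-3 with weights 3, 5, 3, the two outer edges (total 6) beat the heavier middle edge (5). This is the kind of case a greedy method gets wrong.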

Faster algorithms are known for this problem. I initially planned to implement the O(V E log(V)) algorithm by Galil, Micali and Gabow. However, it needs a mutable heap with features that are not readily available in the BGL, and it needs a special kind of mergeable priority queue. While possible, I feel that the amount of code would be disproportionate. So I decided to fall back to a simpler O(V^3) algorithm, essentially the same algorithm that inspired the previous implementation.

Feedback is very welcome. I will mention up front a few points that may draw criticism:

I kept brute_force_maximum_weighted_matching() unchanged. This function is not very useful in my opinion, but it was part of the public API and there is no need to change it.
The documented API of maximum_weighted_matching() is backwards compatible with the previous code. But I removed the class weighted_augmenting_path_finder, which was essentially an internal detail although it lived in the global boost namespace.
I do runtime checks on some aspects of the input graph (vertex indices and range of edge weights). I don't see this much in the BGL and I guess it may be frowned upon. The thing is, the code will trigger undefined behavior if these preconditions are violated, and I feel like I can't let that happen.
Once a matching has been computed, a separate (much faster) algorithm can verify that the matching is optimal. If the primary algorithm is correct, this verification will never fail. I enabled the verification step by default, even though it is redundant and never fails. The primary algorithm is tricky. I feel that the certainty provided by the verification step is worth more than the clock cycles it costs.
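Optimality can be checked in one pass via LP duality, provided the algorithm exposes its dual variables. This is a minimal sketch under these assumptions:

- Every vertex has a dual u_i, and every odd vertex set (blossom) B has a dual z_B.
- It uses the textbook LP form: an edge's slack is u_i + u_j + (the sum of z_B over blossoms containing both ends) - w_ij.
- The graph is simple.
- `Edge`, `OddSet` and `verify_optimum` are hypothetical names.

The PR's own verification code, later removed in commit 13c5fd7, may use a different dual scaling.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical input types for the sketch.
struct Edge {
    int u, v;
    long long w;
};
struct OddSet {
    std::vector<int> vertices;  // an odd set of vertices (a blossom)
    long long z;                // its dual variable z_B
};

// Checks an LP-duality certificate for a maximum weight matching in a simple graph.
// mate[v] is v's partner or -1; u[v] is the vertex dual.
bool verify_optimum(int n, const std::vector<Edge>& edges, const std::vector<int>& mate,
                    const std::vector<long long>& u, const std::vector<OddSet>& blossoms) {
    const std::size_t nb = blossoms.size();
    std::vector<std::vector<bool>> member(nb, std::vector<bool>(n, false));
    for (std::size_t b = 0; b < nb; ++b) {
        if (blossoms[b].z < 0 || blossoms[b].vertices.size() % 2 == 0) return false;
        for (int v : blossoms[b].vertices) member[b][v] = true;
    }
    int matched_vertices = 0;
    for (int v = 0; v < n; ++v) {
        if (u[v] < 0) return false;  // dual feasibility
        if (mate[v] == -1) {
            if (u[v] != 0) return false;  // an exposed vertex needs u = 0
            continue;
        }
        if (mate[v] < 0 || mate[v] >= n || mate[v] == v || mate[mate[v]] != v)
            return false;  // mate must be a valid, symmetric pairing
        ++matched_vertices;
    }
    std::vector<std::size_t> matched_inside(nb, 0);
    int matched_edges = 0;
    for (const Edge& e : edges) {
        long long slack = u[e.u] + u[e.v] - e.w;
        for (std::size_t b = 0; b < nb; ++b)
            if (member[b][e.u] && member[b][e.v]) slack += blossoms[b].z;
        if (slack < 0) return false;  // dual feasibility
        if (mate[e.u] == e.v) {       // a matched edge must be tight
            if (slack != 0) return false;
            ++matched_edges;
            for (std::size_t b = 0; b < nb; ++b)
                if (member[b][e.u] && member[b][e.v]) ++matched_inside[b];
        }
    }
    if (2 * matched_edges != matched_vertices) return false;  // every pair is a graph edge
    for (std::size_t b = 0; b < nb; ++b)  // positive z_B: B must be full
        if (blossoms[b].z > 0 && matched_inside[b] != (blossoms[b].vertices.size() - 1) / 2)
            return false;
    return true;
}
```

If every check passes, the matching's weight equals the dual objective (sum of u_i plus the sum of z_B (|B|-1)/2). By weak duality that value bounds the weight of every matching, so the matching is maximum. The check is cheap compared with the O(V^3) search.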

jeremy-murphy · 2025-01-14T09:19:47Z

Are the heap data structures in Boost.Heap not sufficient? They are mergeable and mutable.
However, even if they are sufficient, I would be happy to simply get a correct implementation first that fixes the bugs and get a fast implementation later.
What do you think?

jorisvr · 2025-01-14T12:24:38Z

Are the heap data structures in Boost.Heap not sufficient?

The mutable heap in Boost.Heap would work as the "plain" type of heap in the matching algo. But it looks like BGL currently does not use Boost.Heap and I don't know how you feel about adding that dependency.

I also need a concatenable heap which does not currently exist in Boost. The merge feature of Boost.Heap is not sufficient. I need to merge heaps in O(log(n)) time with the option to unmerge them later in O(log(n)). The typical way to implement this is with a custom balanced binary tree. It's not rocket science but it adds another 800 lines or so.
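The sketch below illustrates the operations a concatenable queue needs: concatenate two queues, split them again at the same point, and query the minimum. Note what it swaps in and leaves out:

- It uses an implicit randomized treap, which gives expected rather than worst-case O(log n) bounds. The comment above describes a custom balanced binary tree instead.
- `Node`, `Pool`, `concat` and `split` are hypothetical names.
- A real blossom implementation would also need per-element key updates and element lookup, which are omitted here.

```cpp
#include <algorithm>
#include <climits>
#include <deque>
#include <random>
#include <utility>

// One element of the queue, stored as a node of an implicit treap: the in-order
// sequence of nodes is the queue order, and each subtree caches its minimum key.
struct Node {
    int key;        // priority value of this element (e.g. an edge slack)
    unsigned prio;  // random treap priority; keeps the tree balanced in expectation
    int size = 1;   // number of elements in this subtree
    int min_key;    // smallest key in this subtree
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int k, unsigned p) : key(k), prio(p), min_key(k) {}
};

int size_of(const Node* t) { return t ? t->size : 0; }
int min_of(const Node* t) { return t ? t->min_key : INT_MAX; }

// Recompute the cached size and minimum after a child pointer changed.
void update(Node* t) {
    t->size = 1 + size_of(t->left) + size_of(t->right);
    t->min_key = std::min({t->key, min_of(t->left), min_of(t->right)});
}

// Concatenate: every element of a comes before every element of b. Expected O(log n).
Node* concat(Node* a, Node* b) {
    if (!a) return b;
    if (!b) return a;
    if (a->prio > b->prio) {
        a->right = concat(a->right, b);
        update(a);
        return a;
    }
    b->left = concat(a, b->left);
    update(b);
    return b;
}

// Split into the first k elements and the rest ("unmerge"). Expected O(log n).
std::pair<Node*, Node*> split(Node* t, int k) {
    if (!t) return {nullptr, nullptr};
    if (size_of(t->left) >= k) {
        auto parts = split(t->left, k);
        t->left = parts.second;
        update(t);
        return {parts.first, t};
    }
    auto parts = split(t->right, k - size_of(t->left) - 1);
    t->right = parts.first;
    update(t);
    return {t, parts.second};
}

// Owns the nodes; std::deque never moves existing elements on emplace_back.
struct Pool {
    std::deque<Node> nodes;
    std::mt19937 rng{12345};
    Node* make(int key) {
        nodes.emplace_back(key, static_cast<unsigned>(rng()));
        return &nodes.back();
    }
};
```

Splitting a merged queue at `size_of(a)`, the old length of the first queue, restores the two original queues. This corresponds to the unmerge step described above.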

LEMON and LEDA implement the O(V E log(V)) matching algorithm. It is much faster than O(V^3) on certain classes of sparse graphs. The speedup on random sparse graphs is fairly modest in my experience. And it can be slower on dense graphs.

The new code is already an order of magnitude faster than the previous version for graphs with V > 200. My feeling is that the faster algorithm adds a lot of code in exchange for little benefit. But I'm up for the challenge. If you want the best matching algorithm in BGL, I will be happy to work on it.

jeremy-murphy · 2025-01-15T00:37:25Z

Thanks for the explanation. There's no problem with adding Boost.Heap as a dependency, as Boost.Graph already depends on many other parts of Boost. Sounds like the efficient algorithm would require adding a new data structure to Boost.Heap to start with, which shouldn't be too difficult, although I'm aware that the maintainer is not all that active any more. Given that the new implementation is much faster anyway, let's defer the efficient algorithm to later. Ultimately it would be nice to have a top-level algorithm that uses a heuristic to pick the fastest algorithm but users are still free to call specific algorithms. (Best of both worlds.)
Ok, now I have to find time to look over this code. Honestly, it could take a couple of weeks, so please be patient.
Thanks for the work!

jorisvr · 2025-01-15T08:02:53Z

Sounds like the efficient algorithm would require adding a new data structure to Boost.Heap to start with

I think the concatenable queue may be so special-purpose that it could just stay in BGL, but generalizing it is definitely also a valid option.

Given that the new implementation is much faster anyway, let's defer the efficient algorithm to later.

Agreed. It occurs to me that the O(V E log(V)) algorithm also needs an edge_index property, which breaks backward compatibility. There may be ways to deal with this, but it seems like it will be a more difficult road than the current PR.

Ok, now I have to find time to look over this code.

I understand. There is no hurry from my side. Thanks for supporting this effort.

jeremy-murphy · 2025-01-22T09:30:43Z

What's the "nearest ancestor" problem referred to in the documentation for the fast Gabow algorithm? Is that LCA or something else?

jorisvr · 2025-01-22T20:20:20Z

What's the "nearest ancestor" problem referred to in the documentation for the fast Gabow algorithm?

To be honest, I don't know. I kept this comment from the documentation by Yi Ji as I saw no reason to remove it.

I know about the existence of that fast algorithm. I tried to read the paper by Gabow but I cannot make heads or tails of it. Mehlhorn and Schaefer made an offhand remark that this algorithm may be impractical (https://dl.acm.org/doi/10.1145/944618.944622 page 7) but that was a long time ago. I'm not aware of any publicly available implementation.

jeremy-murphy · 2025-01-24T08:49:39Z

Doesn't surprise me too much. Bender et al. made a similar remark about the theoretically optimal algorithm for LCA: it just ain't worth it.
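For contrast with the theoretically optimal LCA structures, the simplest approach has no preprocessing at all. It marks the path from one vertex to the root, then walks up from the other vertex until it reaches a marked vertex. Blossom algorithms do something similar when they find the base of a new blossom. This is a generic sketch on a hypothetical parent-array tree (`naive_lca` is an illustrative name), not code from the PR, and it does not settle what Gabow's "nearest ancestor" problem is.

```cpp
#include <vector>

// Naive lowest common ancestor on a rooted forest given by parent pointers
// (parent[root] == -1): mark every ancestor of a, then walk up from b until a
// marked vertex appears. O(n) time per query, no preprocessing.
int naive_lca(const std::vector<int>& parent, int a, int b) {
    std::vector<bool> on_path(parent.size(), false);
    for (int x = a; x != -1; x = parent[x]) on_path[x] = true;
    for (int y = b; y != -1; y = parent[y])
        if (on_path[y]) return y;
    return -1;  // a and b lie in different trees
}
```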

jeremy-murphy · 2025-01-25T22:14:56Z

Citation for the LCA remark: https://www.sciencedirect.com/science/article/abs/pii/S0196677405000854

jeremy-murphy

I haven't even got to the code proper yet but here are a few requests and questions to start with.

test/weighted_matching_test2.cpp

doc/maximum_weighted_matching.html

test/Jamfile.v2

jeremy-murphy

Sorry, just a trivial request, but since the library is C++14 now, can you please use using instead of typedef and remove the spaces inside template angle brackets <...>? Thanks. I will enable clang-format one day...
PS. I mean, a template should be written as template <typename Foo>, etc.
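The requested style change, shown on a standalone trait so that the snippet compiles without BGL. `std::iterator_traits` stands in for `property_traits`, and `old_style`, `new_style` and `value_of` are hypothetical names.

```cpp
#include <iterator>
#include <type_traits>

// Before: typedef, with padding inside the template brackets.
template <typename Iterator>
struct old_style {
    typedef typename std::iterator_traits< Iterator >::value_type value_t;
};

// After: alias declaration, no padding.
template <typename Iterator>
struct new_style {
    using value_t = typename std::iterator_traits<Iterator>::value_type;
};

// An alias template, which has no typedef equivalent.
template <typename Iterator>
using value_of = typename std::iterator_traits<Iterator>::value_type;

static_assert(std::is_same<old_style<int*>::value_t, new_style<int*>::value_t>::value,
              "both spellings name the same type");
```

Beyond readability, `using` also supports alias templates such as `value_of`, which `typedef` cannot express.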

jeremy-murphy

Not quite finished...

test/weighted_matching_test2.cpp

include/boost/graph/maximum_weighted_matching.hpp

jeremy-murphy · 2025-02-05T11:29:26Z

include/boost/graph/maximum_weighted_matching.hpp

+    typedef typename property_traits< VertexIndexMap >::value_type index_t;
+    typedef typename std::make_unsigned<index_t>::type unsigned_index_t;
+    auto nv = num_vertices(g);
+    std::vector<bool> got_vertex(nv);


Make sure this is really what you want, as opposed to a bitset.

I believe std::bitset<N> requires its size to be fixed at compile time. But I need an array which is sized at run time to match the number of vertices of the graph.

Yes, sorry, I meant Boost's dynamic_bitset.

Ah ok. vector<bool> and dynamic_bitset both provide the functionality I need. I don't see a specific reason to prefer one or the other.

I forgot to address this one, whoops.
vector<bool> is generally avoided because of its unusual behaviour. It tries to satisfy the standard container interface (providing a reference to each element, e.g. via iteration) whilst only using one bit per value. That requires a proxy class instead of a real bool&, etc.
A dynamic bitset is just more honest about what it is and does.
It's not a big deal, might not need to be changed until someone makes some other changes in there.
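Below is a sketch of the kind of index check the snippet above appears to perform. This reading is an assumption, based on the runtime checks of vertex indices mentioned in the PR description. The sketch differs from the PR code as follows:

- It works on a plain `std::vector` of indices instead of a `VertexIndexMap`.
- `indices_form_permutation` is a hypothetical name.

The `static_assert` shows the proxy reference that makes `vector<bool>` unusual. `boost::dynamic_bitset<>` should be usable for `got_vertex` here with the same indexing syntax.

```cpp
#include <type_traits>
#include <vector>

// Every vertex index must lie in [0, nv) and no index may occur twice.
// Casting to the unsigned type first turns a negative index into a huge value,
// so a single comparison rejects it.
template <typename Index>
bool indices_form_permutation(const std::vector<Index>& index_of_vertex) {
    using unsigned_index_t = typename std::make_unsigned<Index>::type;
    const auto nv = index_of_vertex.size();
    std::vector<bool> got_vertex(nv);
    for (Index idx : index_of_vertex) {
        auto i = static_cast<unsigned_index_t>(idx);
        if (i >= nv || got_vertex[i]) return false;  // out of range or duplicate
        got_vertex[i] = true;
    }
    return true;
}

// std::vector<bool> packs bits, so operator[] returns a proxy object, not bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool> uses a proxy reference");
```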

include/boost/graph/maximum_weighted_matching.hpp

jeremy-murphy · 2025-02-05T11:33:53Z

include/boost/graph/maximum_weighted_matching.hpp

+    template < typename Func >
+    static void for_vertices_in_blossom(const blossom_t* blossom, Func func)
+    {
+        const nontrivial_blossom_t* ntb = blossom->nontrivial();


Just use auto?

I don't understand.
Do you mean auto func instead of template <typename Func> ? That was not allowed before C++20.

Or do you mean auto ntb instead of const nontrivial_blossom_t* ntb ?

The latter.

Ok. I changed it to auto for these and a few similar verbose declarations.

Is that what you had in mind, or do you want to push further towards the almost-always-auto style?

I personally use and recommend AAA style, but what you've done here is fine. I just felt those long typenames with qualifiers were not helping the readability.
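A small illustration of the change, using hypothetical stand-ins for `blossom_t` and `nontrivial_blossom_t` (the PR's real classes differ). `auto` deduces exactly the pointer type that the verbose declaration spelled out, including `const`. `const auto*` would be an equivalent, slightly more explicit alternative.

```cpp
#include <type_traits>

// Hypothetical stand-ins; the PR's actual blossom classes are more involved.
struct nontrivial_blossom_t;

struct blossom_t {
    nontrivial_blossom_t* ntb = nullptr;  // null for a single-vertex (trivial) blossom
    const nontrivial_blossom_t* nontrivial() const { return ntb; }
};

struct nontrivial_blossom_t : blossom_t {};

bool is_trivial(const blossom_t* blossom) {
    // Verbose form: const nontrivial_blossom_t* ntb = blossom->nontrivial();
    auto ntb = blossom->nontrivial();
    static_assert(std::is_same<decltype(ntb), const nontrivial_blossom_t*>::value,
                  "auto deduces the same pointer-to-const type");
    return ntb == nullptr;
}
```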

jeremy-murphy

Still going...

jeremy-murphy · 2025-02-05T11:42:45Z

include/boost/graph/maximum_weighted_matching.hpp

+        blossom_label_t label;
+
+        /** True if this is an instance of nontrivial_blossom. */
+        const bool is_nontrivial_blossom;


No const member variables either, same reason as references, but I'll review again on a proper screen to decide if it's worth changing.

It's a good general principle, so yeah, please change it.
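Why `const` and reference members are a problem: they make the implicit copy and move assignment operators deleted. Such objects cannot be reassigned, swapped with `std::swap`, or rearranged by algorithms like `std::sort`. The struct names below are hypothetical.

```cpp
#include <type_traits>

// Hypothetical structs contrasting member declarations.
struct with_const_member { const bool is_nontrivial_blossom; };
struct with_reference_member { int& owner; };
struct with_plain_member { bool is_nontrivial_blossom; };
struct with_pointer_member { int* owner; };

// const and reference members make the implicit copy/move assignment operators deleted.
static_assert(!std::is_copy_assignable<with_const_member>::value, "const member blocks assignment");
static_assert(!std::is_move_assignable<with_reference_member>::value, "reference member blocks assignment");
// Plain and pointer members keep the class assignable.
static_assert(std::is_copy_assignable<with_plain_member>::value, "plain member is assignable");
static_assert(std::is_copy_assignable<with_pointer_member>::value, "pointer member is assignable");
```

This is also why the review asked for pointer members instead of reference members (commit 8600b33).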

include/boost/graph/maximum_weighted_matching.hpp

jeremy-murphy · 2025-02-05T22:47:44Z

Yes, sorry, I meant Boost's dynamic_bitset.

include/boost/graph/maximum_weighted_matching.hpp

jeremy-murphy · 2025-02-05T22:49:56Z

The latter.

jeremy-murphy · 2025-02-05T22:56:53Z

include/boost/graph/maximum_weighted_matching.hpp

-                = sub_blossom->vertices();
-            for (vertex_vec_iter_t v = sub_vertices.begin();
-                 v != sub_vertices.end(); ++v)
+            if ((! edge.has_value()) || (s < slack))


I'm personally not in favour of a space between a unary operator such as ! and its operand, same as for other unary operators like dereference *, negation -, etc.
I don't think BGL has a history of it, but maybe I'm wrong?

Oh, I probably came up with this. I don't see it anywhere else in BGL.
I removed the spaces now.

I really like the space, but this is not the time to start a discussion about code formatting.
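The hunk above appears to keep the edge with the smallest slack seen so far, starting from an empty optional. Below is a standalone sketch of that pattern, with these assumptions:

- It uses the plain LP slack dual[i] + dual[j] - w(i, j); the PR's own slack formula may differ, for example by scaling the weights.
- It ignores blossom duals.
- `SEdge`, `BestEdge` and `least_slack_edge` are hypothetical names.
- A small struct stands in for the optional so the code compiles as C++14.

```cpp
#include <vector>

// Hypothetical edge type: undirected edge {u, v} with weight w.
struct SEdge {
    int u, v;
    long long w;
};

// Result holder mirroring an optional edge: has_value stays false until a candidate is found.
struct BestEdge {
    bool has_value = false;
    int index = -1;       // position in the edge list
    long long slack = 0;  // slack of that edge
};

// Least-slack edge between vertex x and any vertex flagged in is_target.
BestEdge least_slack_edge(int x, const std::vector<SEdge>& edges,
                          const std::vector<long long>& dual,
                          const std::vector<bool>& is_target) {
    BestEdge best;
    for (int i = 0; i < static_cast<int>(edges.size()); ++i) {
        const SEdge& e = edges[i];
        int other = (e.u == x) ? e.v : (e.v == x) ? e.u : -1;
        if (other < 0 || !is_target[other]) continue;
        long long s = dual[e.u] + dual[e.v] - e.w;
        if (!best.has_value || s < best.slack) {  // same test as in the hunk
            best.has_value = true;
            best.index = i;
            best.slack = s;
        }
    }
    return best;
}
```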

jeremy-murphy · 2025-02-06T01:52:12Z

There are some functions that take a blossom* but require that it is not null, in which case they should take a reference instead.

jeremy-murphy · 2025-02-06T02:02:26Z

That's all from me, once the last comments are resolved I'll merge it in.
Overall I love the code, so thank you!

jorisvr · 2025-02-06T23:39:44Z

There are some functions that take a blossom* but require that it is not null, in which case they should take a reference instead.

I now changed these into blossom&.
This involved adding a number of * and & operators in the code. Because blossom* is still the native type in several data structures, I end up going back-and-forth between pointer and reference. In my opinion, it makes the code less clear. But I can live with it.

jeremy-murphy · 2025-02-10T02:16:28Z

There are some functions that take a blossom* but require that it is not null, in which case they should take a reference instead.

I now changed these into blossom&. This involved adding a number of * and & operators in the code. Because blossom* is still the native type in several data structures, I end up going back-and-forth between pointer and reference. In my opinion, it makes the code less clear. But I can live with it.

I know what you mean, sometimes you just have to pay a price for doing things correctly, but it also might mean that those other places using blossom* could do with some refactoring.

PS. And I realize that sounds hypocritical because I asked you to change the member variables from reference to pointer. Sometimes C++ is just ugly.
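Below is a sketch of the convention under discussion, using a hypothetical `blossom_t`:

- A pointer parameter signals that null is a valid input.
- A reference parameter documents in the signature that it is not.
- Call sites that hold a `blossom_t*` dereference explicitly, which is the back-and-forth described above.

```cpp
#include <vector>

// Hypothetical stand-in for the PR's blossom type.
struct blossom_t {
    long long dual = 0;
};

// Null is a meaningful input here, so a pointer parameter is appropriate.
long long dual_or_zero(const blossom_t* b) { return b ? b->dual : 0; }

// Null is never valid here; taking a reference puts that precondition in the signature.
void add_delta(blossom_t& b, long long delta) { b.dual += delta; }
```

With a container such as `std::vector<blossom_t*>`, a call looks like `add_delta(*parents[0], 5)`. The `*` at the call site is where the non-null assumption becomes visible.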

jorisvr added 5 commits December 3, 2024 21:38

weighted_matching_test check input file

f98edae

Without this check, the test program declares all tests passed if it fails to open the input file.

Add more tests for maximum_weighted_matching

f30ba13

- Hand-picked graphs to explore basic functionality. - Graphs that are known to trigger bugs in the old implementation of maximum_weighted_matching(). - Random small graphs.

Replace maximum_weighted_matching implementation

fb66ac4

The new code runs in O(V^3). It also solves a number of known issues.

Update maximum_weighted_matching documentation

059365f

Add weighted_matching_example to Jamfile

5fa8626

jorisvr force-pushed the matching_joris branch from dbe033a to 5fa8626 January 13, 2025 20:44

jorisvr changed the title from "Add tests for maximum_weighted_matching()" to "Replace implementation of maximum_weighted_matching()" Jan 13, 2025

jorisvr mentioned this pull request in "multiple bugs in maximum_weighted_matching()" #399 (Closed) Jan 13, 2025

jeremy-murphy self-assigned this Jan 13, 2025

jeremy-murphy requested changes Jan 25, 2025

jorisvr added 5 commits January 26, 2025 12:04

Fix include order after review

c8bd42d

Fix include order after review

fa388e2

Fix grammar in doc after review

26efe66

Remove verification code

13c5fd7

Clarify algorithm doc after review

88a4f85

jeremy-murphy requested changes Feb 5, 2025

jeremy-murphy reviewed Feb 5, 2025

jorisvr added 2 commits February 5, 2025 19:21

Remove random test cases

6396f82

Rewrite "typedef" as "using"

69f393a

Also remove unnecessary < > spaces around template arguments.

jeremy-murphy reviewed Feb 5, 2025

jorisvr added 6 commits February 6, 2025 20:09

Remove unused #include

3db393b

Use pointer instead of reference member variable

8600b33

Avoid const member variables

cbcefb2

Use auto for local variables

ef60a85

Avoid space after "!" operator

dc9886b

Use reference arguments instead of pointers

6d9f239

jeremy-murphy approved these changes Feb 10, 2025

jeremy-murphy merged commit 167ac18 into boostorg:develop Feb 10, 2025
22 checks passed
