Skip to content

Conversation

@bwbarrett
Copy link
Member

@bwbarrett bwbarrett commented Sep 18, 2017

The reachable framework was largely some code copied from usNIC by Jeff and Ralph, but never got finished. This patch series makes the reachable framework usable, at least for the purposes of the TCP BTL.

After lots of discussion with Jeff, it became clear that the primary use for the reachable framework should be as input to the bipartite graph code also recently moved from usNIC to util/, so the interface for the framework has been tweaked slightly to better match the inputs of the graph solver.

@bwbarrett bwbarrett added this to the Future milestone Sep 18, 2017
@bwbarrett bwbarrett self-assigned this Sep 18, 2017
@bwbarrett bwbarrett changed the title Reachable Clean up reachable framework / components Sep 18, 2017
Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty minor nits, and some compiler warnings:

base/reachable_base_alloc.c:32: warning: no previous prototype for ‘opal_reachable_allocate’
base/reachable_base_alloc.c: In function ‘opal_reachable_allocate’:
base/reachable_base_alloc.c:53: warning: assignment from incompatible pointer type

CFLAGS = -g
CXX = ortec++
CXXFLAGS = -g
FFLAGS = -g
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: are FFLAGS and CXXFLAGS necessary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh :)


clean:
rm -f $(PROGS) *~

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: blank line at end of file.

CQ_NO_CONNECTION = 0,
CQ_DIFFERENT_NETWORK = 50,
CQ_SAME_NETWORK = 100
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You define the same values in reachable_weighted.c (and also the TCP BTL). Should these be put in reachable.h and prefixed with OPAL_REACHABILITY_ or somesuch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but I don't actually want them to be part of the public interface; it means we can't tweak the algorithms and have to live with those choices going forward. Imagine a netlink version that counted router hops as part of the weight. Or ping latency, or something like that. Those would be hard if we baked the constants into the API.

If you look at the usNIC BTL (which is where we should be moving the TCP BTL), there's no need for these constants, because it's the relative weights that matter. So in time, we'll be removing them from the TCP BTL.

bwbarrett and others added 10 commits September 19, 2017 15:05
Add a check for link-local IPv6 addresses to the net
interface to support better computation of network
pairings in the weighted reachable component.

Signed-off-by: Brian Barrett <[email protected]>
Ralph and Jeff created the reachable framework and added the
netlink component based on code copied from the usnic btl.
However, they never renamed all the symbols from the libnl
compatibility code.  This patch finishes the rename.

Signed-off-by: Brian Barrett <[email protected]>
Initialize the reachable framework during opal_init() and tear
it back down during opal_finalize().  The framework was never
used, so the lack of initialization didn't matter, but this is
a required step in using the framework.

Signed-off-by: Brian Barrett <[email protected]>
Based on work from usNIC, the best way to use the reachability
information the reachable components return is to build a
connectivity graph between the two peers and run a bipartite
graph solver.  Rather than returning the "best" pairing,
the reachability framework now returns the entire mapping,
allowing a (soon to be added) graph solver to build the
"optimal" connectivity pairing.

Practically, this means changing the return type of the
reachable() function and rewriting the weighted_reachable()
function to return the full mapping.  The netlink_reachable()
function still always returns NULL.

At the same time, fix bit-rot in the weighted component and
enable builds of the component by removing the opal_ignore.
Also, add IPv6 support to the weighted component to support
both use cases in the TCP BTL.

Signed-off-by: Brian Barrett <[email protected]>
The netlink reachable component has never been released in a usable
form, but had code copied from usNIC to support both libnl-1 and
libnl-3.  If nothing else, this code was a little buggy in
handling the case where libnl-3 but not libnl-route-3 were
installed.  Jeff and I decided to drop libnl-1 support from the
netlink reachable component, given that it's getting pretty old
and the weighted component provides the same information that
the TCP BTL and OOB are using today, so libnl-1 customers won't
see a step backwards from where they are today.

Signed-off-by: Brian Barrett <[email protected]>
The netlink component's libnl wrapper code returned the
next hop in the route table to allow the calling code
to differentiate between same and different networks,
which is a fine comparison for IPv4, but is pretty
expensive for IPv6 (coming soon to a netlink component
near you).  Rather than provide extra information
(the address of the next hop), just provide whether
there is a gateway or not, which is all the netlink
component actually needs.

Signed-off-by: Brian Barrett <[email protected]>
Add IPv6 support to the netlink component's utility
wrappers around libnl-3.

Signed-off-by: Brian Barrett <[email protected]>
Wire up the libnl utilities Jeff and Ralph added previously to
the netlink reachable component so that it actually does work.
The algorithm is a bit simplistic, but should work for our use
cases.  If there's a route, assume the two interfaces can talk.
If there's no gateway, assume the two interfaces are in the
same subnet, and give preference to that connection.  If there's
a gateway, assume there's a route, but the interfaces are not
in the same subnet.

Signed-off-by: Brian Barrett <[email protected]>
Amazon is going to use the reachable framework to fix some connection
bugs in the TCP BTL, so claim support  ownership of the weighted and
netlink components.

Signed-off-by: Brian Barrett <[email protected]>
Add test suite for netlink and weighted reachable components.  We
don't have a great way of running components through unit tests
today, so make them stand-alone tests that are run with mpirun
and such.

Signed-off-by: Brian Barrett <[email protected]>
@bwbarrett
Copy link
Member Author

we really need to get warnings to the point where "make > /dev/null" pops the new warnings easily... Anyway, comments fixed.

@bwbarrett bwbarrett merged commit c6235a9 into open-mpi:master Sep 20, 2017
@bwbarrett bwbarrett deleted the reachable branch September 20, 2017 02:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants