feat: convert execs to ip to netlink calls#1697
38733d4 to 7a58331
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 10 days.
4657546 to cc734e5
3a4526c to 08f07c7
08f07c7 to b0246ef
Not making direct exec calls to user binary interfaces has long been a principle of kube-router. When kube-router was first coded, the netlink library was missing significant features that forced us to exec out. However, now netlink seems to have most of the functionality that we need. This converts all of the places where we can use netlink to use the netlink functionality.
The rt_tables list is already ordered by priority. Once a table is found, treat it as the optimal one and stop looking for additional tables.
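A minimal sketch of the "first match wins" lookup described above. The paths in `candidateRTTables` and the helper names are assumptions for illustration only, not necessarily what kube-router uses.

```go
package main

import (
	"fmt"
	"os"
)

// candidateRTTables is a hypothetical list of rt_tables locations,
// ordered from highest to lowest priority.
var candidateRTTables = []string{
	"/etc/iproute2/rt_tables",
	"/usr/lib/iproute2/rt_tables",
	"/usr/share/iproute2/rt_tables",
}

// firstExisting returns the first path for which exists reports true.
// Because the list is already in priority order, the first hit is the
// best one and the search stops there.
func firstExisting(paths []string, exists func(string) bool) (string, bool) {
	for _, p := range paths {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	if p, ok := firstExisting(candidateRTTables, fileExists); ok {
		fmt.Println("using", p)
	} else {
		fmt.Println("no rt_tables file found")
	}
}
```

Passing the existence check in as a function keeps the priority logic testable without touching the filesystem.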
Previously we were accidentally deleting all routes that were found. This change mimics the previous functionality better by deleting only the external IPs found in the externalIPRouteTable that are no longer in the activeExternalIPs map. It also improves logging around any routes that are deleted, as this is likely of interest to all kube-router administrators.
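The selection logic amounts to a set difference. The sketch below assumes the routed IPs are available as a slice of strings and the active IPs as a map; `staleExternalIPs` is a hypothetical helper, and the real code works against netlink routes rather than plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// staleExternalIPs returns the IPs that have a route in the external IP
// route table but are no longer present in the active set. Only these
// should be deleted.
func staleExternalIPs(routed []string, active map[string]bool) []string {
	var stale []string
	for _, ip := range routed {
		if !active[ip] {
			stale = append(stale, ip)
		}
	}
	sort.Strings(stale) // stable order makes the log output predictable
	return stale
}

func main() {
	routed := []string{"203.0.113.10", "203.0.113.11", "203.0.113.12"}
	active := map[string]bool{"203.0.113.10": true, "203.0.113.12": true}

	for _, ip := range staleExternalIPs(routed, active) {
		// Log every deletion so administrators can see what was removed.
		fmt.Printf("deleting stale external IP route: %s\n", ip)
	}
}
```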
In order for a local route to be valid it needs to have the scope set to host. When we were executing ip commands, iproute2 just did this for us to make the command accurate. Now that we're communicating with the netlink socket, we need to do this conversion ourselves. Without it we get an "invalid argument" error from the netlink subsystem. But if the route isn't local, then most of the routing logic for services doesn't work correctly, because it incorrectly acts upon external traffic as well as local traffic.
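To illustrate the conversion, here is a rough sketch using a stand-in `route` struct rather than the netlink library's own type. The constants mirror the kernel's RTN_* and RT_SCOPE_* values; the struct and the `withLocalScope` helper are hypothetical.

```go
package main

import "fmt"

// These mirror the kernel's rtnetlink values (RTN_UNICAST, RTN_LOCAL,
// RT_SCOPE_UNIVERSE, RT_SCOPE_HOST). They are redefined here only to
// keep the sketch self-contained.
const (
	rtnUnicast    = 1
	rtnLocal      = 2
	scopeUniverse = 0
	scopeHost     = 254
)

// route is a stand-in for the route object sent over netlink.
type route struct {
	Dst   string
	Type  int
	Scope int
}

// withLocalScope sets the host scope on local routes, which is what
// iproute2 did implicitly for "ip route add local ...". Other route
// types are returned unchanged.
func withLocalScope(r route) route {
	if r.Type == rtnLocal {
		r.Scope = scopeHost
	}
	return r
}

func main() {
	r := withLocalScope(route{Dst: "203.0.113.10/32", Type: rtnLocal})
	fmt.Printf("dst=%s type=%d scope=%d\n", r.Dst, r.Type, r.Scope)
}
```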
It has proven to be tricky to insert new rules without calling the designated NewRule() function from the netlink library. Attempts to do so usually fail with an "operation not supported" message. This improves the reliability of rule insertion.
Consolidate IP utility functions into a new file and add proper unit testing. Additionally consolidate logic and references to default route subnets.
When ip rules are evaluated in the netlink library, default routes for src and dst are equated to nil. This makes them difficult to evaluate and requires additional handling. I filed an issue upstream so that this could potentially get fixed: vishvananda/netlink#1080. If it doesn't get resolved, this change should still allow us to move forward.
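One way to handle the nil case is to normalize prefixes before comparing them. This is only a sketch with hypothetical helper names; it assumes the caller knows the address family of the rule, since a nil prefix carries no family information.

```go
package main

import (
	"fmt"
	"net"
)

// normalizeRulePrefix returns the CIDR string a rule prefix stands for,
// mapping nil to the default route of the given address family.
func normalizeRulePrefix(prefix *net.IPNet, ipv6 bool) string {
	if prefix == nil {
		if ipv6 {
			return "::/0"
		}
		return "0.0.0.0/0"
	}
	return prefix.String()
}

// samePrefix compares two rule prefixes, treating nil and the explicit
// default route as equal.
func samePrefix(a, b *net.IPNet, ipv6 bool) bool {
	return normalizeRulePrefix(a, ipv6) == normalizeRulePrefix(b, ipv6)
}

func main() {
	_, defaultV4, err := net.ParseCIDR("0.0.0.0/0")
	if err != nil {
		panic(err)
	}
	// A rule read back with a nil Src matches one created with 0.0.0.0/0.
	fmt.Println(samePrefix(nil, defaultV4, false))
}
```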
Removes repeated logic of calculating IP address subnets for single subnet hosts and consolidates it in one place.
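For reference, the single-host subnet calculation is small enough to sketch with the standard library alone. `hostCIDR` is a hypothetical name, not the function added in this commit.

```go
package main

import (
	"fmt"
	"net"
)

// hostCIDR returns the single-host subnet for an address: /32 for IPv4
// and /128 for IPv6.
func hostCIDR(addr string) (string, error) {
	ip := net.ParseIP(addr)
	if ip == nil {
		return "", fmt.Errorf("invalid IP address: %q", addr)
	}
	if v4 := ip.To4(); v4 != nil {
		return (&net.IPNet{IP: v4, Mask: net.CIDRMask(32, 32)}).String(), nil
	}
	return (&net.IPNet{IP: ip, Mask: net.CIDRMask(128, 128)}).String(), nil
}

func main() {
	for _, addr := range []string{"10.0.0.1", "2001:db8::1"} {
		cidr, err := hostCIDR(addr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(cidr)
	}
}
```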
When we were using iproute2's CLI, we needed the fwmark as a hex number, so we were passing it as a string in that format. Now that we use the netlink library directly, we already have the fwmark in the form that we need. So instead of doing all of these string <-> int conversions, let's just keep this simpler.
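For context, this is roughly the round trip that goes away. Both helpers are hypothetical and only illustrate the hex string form the CLI needed versus the integer that can be used as-is.

```go
package main

import (
	"fmt"
	"strconv"
)

// fwmarkToHex renders a mark the way it had to be passed on the ip
// command line, e.g. 0x3500.
func fwmarkToHex(mark uint32) string {
	return fmt.Sprintf("0x%x", mark)
}

// hexToFwmark parses the CLI-style string back into an integer.
// Base 0 lets ParseUint honor the 0x prefix.
func hexToFwmark(s string) (uint32, error) {
	v, err := strconv.ParseUint(s, 0, 32)
	if err != nil {
		return 0, err
	}
	return uint32(v), nil
}

func main() {
	var mark uint32 = 13568

	// Old flow: int -> hex string -> (later) int again.
	hex := fwmarkToHex(mark)
	back, err := hexToFwmark(hex)
	if err != nil {
		panic(err)
	}
	fmt.Println(hex, back)

	// New flow: the uint32 is used directly, with no conversion at all.
	fmt.Println(mark)
}
```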
Instead of deleting and just hoping for the best, this change makes it so that we check first whether or not a route exists. This helps to reduce needless warnings that the user receives and is just all around more accurate.
Rather than yolo'ing a delete of the IP on the interface, check to see if it exists and save the user some warnings in their logs.
Previously, kube-router was only considering externalIPs when setting up source routing policy. Notably absent was consideration of LoadBalancer IPs, which are equally important to get right with DSR. This appears to have been a long-standing use case that was never correctly considered since kube-router added a LoadBalancer controller.
Over time this function has grown to be way too large and difficult to read. This refactor splits out this function into smaller chunks and makes it easier to follow what's going on.
b0246ef to f0244b5
This PR is almost ready to go. There is just one small problem with IPv4 DSR service routing from a worker node in the cluster to an LB IP when the destination gets load-balanced to another node in the cluster. In this scenario the service request times out instead of being fulfilled. This doesn't seem to affect IPv6 traffic, non-DSR services, or IPv4 DSR services when the traffic policy is local. Brief testing shows that this scenario also works OK with the current stable kube-router build.
This was originally added in PR #210, but it appears to cause more problems in my testing scenarios than it solves. When this is enabled, requests from kube workers to DSR-enabled services fail when they are routed to other nodes in the cluster.
There is a minor outstanding case where there are some problems with DSR traffic (see #1870). However, this was true whether netlink or the iproute2 user-space tooling was used, so I think that we're fine leaving that issue alone for now.
The current state of this PR is untested and still needs to undergo significant testing: