-
Notifications
You must be signed in to change notification settings - Fork 6
Add flow-filter stage to assert packets belong to a valid peering connection #1158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0e29c5a to
999f163
Compare
mvachhar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really nice. When can we remove the dst_vpcd_lookup stage?
There's no real point in keeping the two. I can add a commit to this PR to remove it, if you like the changes. |
Add two methods insert_public_ip() and public_ips_mut() to VpcExpose in order to manipulate "public IPs" for the expose, that is, the exposed prefix if NAT is not in use (or empty), or the translated prefixes otherwise. These methods will be used in a follow-up commit. Signed-off-by: Quentin Monnet <[email protected]>
We'll have a merge() method for several objects in the next commit, and this method will return the merged version regardless of the order of the two objects. In preparation for this, let's rename the current "merge()" methods as "extend_right()", to avoid name conflicts and to make it more explicit that self is lower than the object we merge into it. Signed-off-by: Quentin Monnet <[email protected]>
Add a merge() method to trait IpRangeWithPorts and implement it for various objects implementing the trait (PrefixWithPorts, PrefixWithOptionalPorts), or not implementing it (Prefix, PortRange). This method will be used in follow-up work to manipulate lists of prefixes. Signed-off-by: Quentin Monnet <[email protected]>
This trie is used with IP prefixes associated with port ranges, to determine what prefix + optional port range a given tuple (IP address, port) belongs to. We need such data structures in a few locations in the code for NAT or destination VPC lookup, so we should as well make it generic so that we can optimise or replace it in an easy fashion, if needed. Values for the trie must implement trait ValueWithAssociatedRanges for the lookups to succeed. Signed-off-by: Quentin Monnet <[email protected]>
999f163 to
65e69b0
Compare
Rather than implementing longest-prefix-match lookups for IP prefixes with associated port ranges, reuse the generic struct that we recently introduced in the flow-filter crate. There should be no functional change. Signed-off-by: Quentin Monnet <[email protected]>
65e69b0 to
e131863
Compare
|
So I have reworked the new pipeline stage. Here's what we're looking at:
THIS PR DOES NOT ACCOUNT FOR TODAY'S DISCUSSION ON THE SHARED IPS. The related adjustments will come on top of this (can be the same PR, although it's maybe easier have them in a new one). But given that I have this code ready with support for destination VPC lookup for some edge cases, I think it's worth committing this code to the Git history, even if we remove some of the processing right after. |
The new flow-filter crate introduced in this commit contains a new
pipeline stage that addresses multiple purposes:
- It validates that a packet matches an existing peering connection
- It retrieves the destination VPC for the packet
- It improves prefix overlap handling, when compared to the existing
destination VPC lookup stage
Flow validation
---------------
This flow-filter stage validates that a packet matches an existing
peering connection, as defined in the configuration provided by the
user. All packets that do not have a source IP, port and destination IP,
port corresponding to existing, valid connections between the prefixes
in exposed lists of peerings, get dropped.
This allows us to enforce that traffic matches the peering rule:
- Generically, whether or not the appropriate rules are set
- For stateless NAT, which would just let the packet go un-NATed if
there was no NAT rule to be found, so far
- For stateful NAT, allowing us to remove the dirty implementation of
similar checks in the stateful NAT code (in future work).
Destination VPC lookup
----------------------
Since filtering flow involves retrieving the source VPC, source IP/port
and destination IP/port, then looking in a table, we've got everything
we need to reimplement a destination VPC lookup stage. It's trivial to
store the destination VPC in the context table, and to attach it to the
packet's metadata on successful lookups. Doing so, we replicate the
function of the dst_vpcd_lookup stage: the legacy, redundant stage will
be removed in a follow-up commit. This will make one stage fewer in the
pipeline, and will improve accuracy, as described in the next section.
Improved resiliency for "shared IPs"
------------------------------------
When looking up for the destination VPC for a packet, we need a special
treatment when IP/port ranges are shared across peerings. In the legacy
dst_vpcd_lookup stage, we already had the checks in place to make sure
that the lookup stage does not return an answer when there are multiple
possible matching destination VPCs (based on source VPC and destination
address).
However, the restriction in that stage is too strong: when we have
overlapping, but distinct IP ranges shared across peerings, we can tell
the destination VPC for some portion of the ranges.
Consider the following case, for example:
VPC A <-> VPC B <-> VPC C
1.0.0.0/24 2.0.0.0/24 2.0.0.0/24 1.0.0.0/24
In this case, we just can't tell, for packets coming from VPC B, what
the destination VPC is (unless there's some stateful NAT session in
place, but that's beyond the destination VPC lookup stage's scope).
But if, instead, we have:
VPC A <-> VPC B <-> VPC C
1.0.0.0/23 2.0.0.0/24 2.0.0.0/24 1.0.0.0/24
Then for packets going to 1.0.1.0/24, we know the destination is VPC A.
Before this commit, the destination VPC lookup would not account for the
unambiguous portion of prefixes.
Similarly, in a configuration such as this one:
VPC A <-> VPC B <-> VPC C
1.0.0.0/24 5.0.0.0/24 2.0.0.0/24 1.0.0.0/24
Then we can find the relevant destination VPC for packets coming from B,
because even if the destination prefixes are the same, we can look at
the source address too, and find which (non-overlapping, in our example)
prefix it belongs to. Before this commit, the destination VPC lookup
would never account for the source address.
This commit does not change the legacy stage, but the flow-filter stage
is able to identify the correct destination VPC in both situations.
Unit tests for the new pipeline stage will come in a follow-up commit.
Signed-off-by: Quentin Monnet <[email protected]>
Add unit tests to test the various modules in the recent flow-filter crate. Many of these were generated by Claude, with manual control and adjustments. Some were added without Claude's intervention. Co-Authored-By: Claude <[email protected]> Signed-off-by: Quentin Monnet <[email protected]>
Now that we've transitioned to the new, more powerful/accurate flow-filter stage for looking up the destination VPC discriminant for packets, remove the legacy, redundant dst_vpcd_lookup stage. Signed-off-by: Quentin Monnet <[email protected]>
e131863 to
4dc9978
Compare
The new flow-filter crate introduced in this commit contains a new pipeline stage for validating that a packet matches an existing peering connection, as defined in the configuration provided by the user. All packets that do not have a source IP, port and destination IP, port corresponding to existing, valid connections between the prefixes in exposed lists of peerings, get dropped.
This allows us to enforce that traffic matches the peering rule:
Because the new stage gets the whole context about the authorized connections (as per the peering configuration), it can also replace the destination VPC lookup, to avoid redundant structures and lookups.
In the future, we could also easily adjust and use the stage to mark whether packets should be processed for stateless or stateful NAT, which would allow us to drop the "NAT exempt list" hack used for stateful NAT.
PR is not ready to be merged because some points need discussion, but it's ready for review and comments.