
Conversation

@Nieuwejaar (Collaborator):

I've added both Ry and Levon as reviewers. I'm particularly interested in Ry's thoughts on the P4 (which is implemented slightly differently on sidecar-lite) and Levon's input on the API, but I'll take any feedback you have to offer.

@rcgoodfellow added this to the 18 milestone on Jan 8, 2026
@FelixMcFelix (Contributor) left a comment:

Thanks Nils, I've had a look over the P4 and the test machinery. I didn't see anything on the API side to give me pause.

# Note: this now seems silly since we have maxed out the number of stages, but
# we want to leave this check and note in place should we ever find a way to
# reduce our footprint below 20 stages.
TOFINO_STAGES=20
@FelixMcFelix (Contributor):

Yikes 😬

Comment on lines +334 to +343
// This is identical to the more generically named InstanceTarget, which should
// be used for new APIs.
#[derive(
Clone, Copy, Debug, PartialEq, Eq, Deserialize, JsonSchema, Serialize,
)]
pub struct NatTarget {
pub internal_ip: Ipv6Addr,
pub inner_mac: MacAddr,
pub vni: Vni,
}
@FelixMcFelix (Contributor):

Is this type being kept around due to versioned OpenAPI specs? I haven't had much interaction with that machinery yet.

Comment on lines +62 to +63
// Remove non-existent entry and be sure it fails
Ok(())
@FelixMcFelix (Contributor):

Missing a test here?

Comment on lines +104 to +106
// On egress, we decapsulate the geneve packet as we do for NAT, but we don't
// rewrite the source address before forwarding the packet.
async fn test_egress(switch: &Switch, test: &ExternalTest) -> TestResult {
@FelixMcFelix (Contributor):

This comment confused me a bit. The first 'we' refers to the switch, but the second 'we' is the test harness I'd assume (since sidecar.p4 just decaps on egress)?

Comment on lines +212 to +215
// On ingress, we wrap the incoming packet with a geneve header and forward to
// the correct gimlet. Unlike NAT ingress, there is no manipulation of the
// target address.
async fn test_ingress(switch: &Switch, test: &ExternalTest) -> TestResult {
@FelixMcFelix (Contributor):

I don't think NatIngress adjusts any target address, other than the destination MAC (which external subnets do too).

Comment on lines +547 to +552
pkt_src_ip: "fa00:22::1".to_string(),
pkt_src_mac: "a1:a2:a3:a4:a5:a6".to_string(),
pkt_src_port: 3333,
pkt_dst_ip: "fc00:13::1".to_string(),
pkt_dst_mac: "b1:b2:b3:b4:b5:b6".to_string(),
pkt_dst_port: 4444,
@FelixMcFelix (Contributor):

src_ip and dst_ip seem swapped here wrt. the diagram, also placing pkt_src_ip outside of the external_subnet for egress.

(I think that sidecar.p4 will just blindly decap traffic directed at the switch address anyhow? Which would explain this test passing.)

meta.nat_ingress_tgt = target;
meta.nat_inner_mac = inner_mac;
meta.nat_geneve_vni = vni;
meta.encap_needed = true;
@FelixMcFelix (Contributor):

meta.encap_needed is always equal to meta.nat_ingress_hit. Can we remove nat_ingress_hit and update its last remaining reference (in table service) to use encap_needed instead?
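
As a rough sketch of the consolidation (the metadata fields come from the diff above, but the action shape and the apply logic are guesses, not the real sidecar.p4 code):

// Sketch only: the NAT ingress action keeps setting encap_needed...
action forward_ipv4_to(bit<128> target, bit<48> inner_mac, bit<24> vni) {
	meta.nat_ingress_tgt = target;
	meta.nat_inner_mac = inner_mac;
	meta.nat_geneve_vni = vni;
	meta.encap_needed = true;	// nat_ingress_hit no longer set anywhere
}

// ...while the remaining check around table service (whatever its real
// condition looks like) reads encap_needed instead, letting nat_ingress_hit
// be deleted from the metadata struct:
if (!meta.encap_needed) {	// was: !meta.nat_ingress_hit
	service.apply();
}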

Comment on lines +148 to +154
table external_subnets_v4 {
	key = { hdr.ipv4.dst_addr: lpm; }
	actions = { forward_extsub_v4_to; }

	const size = EXTERNAL_SUBNETS_V4_SIZE + 1;
	counters = external_subnets_v4_ctr;
}
@FelixMcFelix (Contributor):

Possibly unviable suggestion if we're looking to save on stages -- do we need new tables at all (and thus multiple paths to set meta.encap_needed)? I.e., we could update NatIngress to the form:

control NatIngress (...)
{
	// ...
	table ingress_ipv4 {
		key = {
			hdr.ipv4.dst_addr : lpm;
			meta.l4_dst_port : range;
		}
		actions = { forward_ipv4_to; }

		const size = IPV4_NAT_TABLE_SIZE;
		counters = ipv4_ingress_ctr;
	}
	// ...
}

We'd then program external subnets with port ranges always set to 0–65535, and individual NAT/SNAT IPs as /32s or /128s.

The downside is that we no longer have dedicated counters for external subnets. But the reason I'm thinking of this as an approach is that there's no difference in dataplane operations applied by sidecar for these packets whether they get standard NAT or an external subnet (it's always [ingress] push encap and rewrite inner dst MAC or [egress] strip encap). I also feel as though the control plane shouldn't be allowing the pools for these concepts to overlap, so there shouldn't be any ambiguity on when to remove such table entries?
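
To make the programming concrete, entries in the merged ingress_ipv4 table might look like the following (the addresses and action arguments are invented, and const entries is used only for illustration -- in practice dpd would install these at runtime):

	// Illustration only: an external subnet spans the whole port range,
	// while an individual NAT/SNAT mapping is a /32 with its port slice.
	const entries = {
		// 203.0.113.0/24, any port -> the external subnet's target
		(0xcb007100 &&& 0xffffff00, 0 .. 65535) :
			forward_ipv4_to(EXTSUB_TGT, EXTSUB_MAC, EXTSUB_VNI);
		// 198.51.100.7/32, ports 1024-2047 -> one guest's NAT slice
		(0xc6336407 &&& 0xffffffff, 1024 .. 2047) :
			forward_ipv4_to(NAT_TGT, NAT_MAC, NAT_VNI);
	}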

@Nieuwejaar (Collaborator, Author):

This is a good idea, and it took me a couple of minutes to figure out why it wouldn't work. The problem is that we drop any packets arriving at the switch that aren't addressed to it; this is what we use the Filter control for. All packets arriving for a NATed guest have a switch port's address as their destination IP, while external subnet traffic has an address on that subnet as its destination IP. So, we either need a table containing all the allowed subnet targets, or we have to allow any packet into our Ingress pipeline and hope it isn't aimed at one of our underlay addresses.

I could probably have the external_subnet table in the Filter control contain just the allowed subnets, and then let the NatIngress control actually plug in the forwarding data; a sketch of that split follows. I'm concerned about the complexity (in dpd, omicron, or both) required to work with an external subnet mechanism that involves multiple tables, or a single ingress_ipv4 table populated with both external subnets and NAT ranges. I'm also not sure that ends up saving us any space in the end, but it's worth exploring in the future.
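
A rough sketch of that split, with invented names (the real Filter control is certainly shaped differently):

// Hypothetical sketch: the Filter control's table only answers "is this
// destination an allowed external subnet?", so the packet survives the
// "not addressed to the switch" drop; NatIngress still owns the encap data.
control Filter( /* ... */ ) {
	action allow_extsub() { meta.extsub_allowed = true; }

	table external_subnets_v4 {
		key = { hdr.ipv4.dst_addr: lpm; }
		actions = { allow_extsub; }
		const size = EXTERNAL_SUBNETS_V4_SIZE;
	}

	apply {
		external_subnets_v4.apply();
		// ... existing filter logic ...
	}
}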
