feat(api): add Flat VPC virtualization type for zero-DPU hosts#1775
Conversation
|
🌿 Preview your docs: https://nvidia-preview-pull-request-1775.docs.buildwithfern.com/infra-controller |
| .ensure_supports_segment(&new_network_segment) | ||
| .map_err(CarbideError::from)?; | ||
| virtualization_type.allocates_svi_for(&new_network_segment) | ||
| } else { |
There was a problem hiding this comment.
Should we fail if there's no VPC and new_network_segment.segment_type is HostInband? Your new code in api/src/handlers/instance/mod.rs will fail allocations onto HostInband segments if there's no vpc_id for the segment, so maybe it's better to validate that here?
There was a problem hiding this comment.
Oh yeah good call out -- so, I think a HostInband segment should be able to exist without being bound to a VPC.
For example, maybe we want a HostInband segment for playing around with zero DPU provisioning, but maybe we'd never bind it to a VPC, and that should be ok; if we required a VPC, we'd have this weird inter-dependency thing where we'd need to create a VPC just for getting zero DPU hosts provisioned. This is kind of similar to Admin maybe?
All that to say, you're right in calling this out. I think the actual adjustment is to update the comments in api/src/handlers/instance/mod.rs to explain that better, and improve the error handling a bit to return an error specific to the segment not being bound to a VPC at allocation time.
I guess TLDR is it should be fine to have a HostInband segment not within a VPC, BUT, once it comes time to allocate an instance from the host into a VPC, the segment the host is in needs to be bound to a VPC?
There was a problem hiding this comment.
And yeah to close this out (going back through this), I updated the comment, and now return a FailedPrecondition if the segment is VPC-less at the point of instance allocation.
| segment: &NewNetworkSegment, | ||
| ) -> Result<(), VpcCapabilityError> { | ||
| self.ensure_supports_segment_type(segment.segment_type)?; | ||
| if segment.prefixes.iter().any(|p| p.prefix.is_ipv6()) { |
There was a problem hiding this comment.
It feels like there's an assumption here that things will always be v4 or dual-stack but never pure-v6. 🤔
There was a problem hiding this comment.
...yeah. I've been trying to get away from that with a lot of the other v6 work. Like a lot of the flow takes an IpAddress to support both, and in other cases I've got an optional v6-specific flag (and subsequently made the v4 flag generic and support v4 or v6) to allow things to be pure v6, but there's probably some to be desired in there. There's even the older cases where certain variables were a Vec with the idea it would support up to 2 "items", to allow for v4 or v6 or dual stack, lol. Let me see what I can do to clean this up.
There was a problem hiding this comment.
Okay, I was trying to decide between implicit and explicit, and I think it [probably??] makes sense to add a supports_ipv4_prefix bool w/ matching check per family. Really it makes sense not to even need the check to begin with, but until all virtualization types actually support v6, I guess we need the check.
There was a problem hiding this comment.
Dealer's choice. Something about seeing one explicitly check and not the other threw me, so maybe comment would have been enough. supports_ipv4_prefix doesn't seem bad, though.
| NetworkSegmentType::Underlay, | ||
| ], | ||
| supports_ipv6_prefix: false, | ||
| supports_routing_profiles: true, |
There was a problem hiding this comment.
I think I need to split this out into two -- I sort of conflated "supports accepting a routing profile type" and "supports applying a routing profile" into a single parameter. In this case yeah, the latter is definitely false (like you're saying), but the former would be true (which is what I'm true-ing) for here. Let me fix!
There was a problem hiding this comment.
I sort of conflated "supports accepting a routing profile type" and "supports applying a routing profile" into a single parameter.
How could one of those be true and the other false, (or maybe, how would it make sense)? 🤔
There was a problem hiding this comment.
@bcavnvidia Soooooo... "supports accepting routing profile type" means the API accepts the profile type string, which ETV and FNN both do.
...but the difference is the extent to which "supports applying a routing profile" differs between ETV and FNN.
For ETV, it means that it uses resolve_vpc_routing to look up the FnnRoutingProfileConfig, and then looks at access_tier, internal, and the profile type gets stored on the VPC... but that's.. it?
Like the agent (in the ETV case) doesn't do anything to apply route_target_imports / route_targets_on_exports / leak_* params.
..so I was trying, but failing, at trying to express that, lol.
There was a problem hiding this comment.
Aaaaah, I don't think the rest API will allow a routing profile for non-FNN VPCs, so maybe we just need to enforce that on the -core side if it doesn't already.
433147c to
425e9a6
Compare
This closes NVIDIA#1522. Adds `Flat` to `VpcVirtualizationType` for VPCs hosted on zero-DPU machines. ETV and FNN both presume a Carbide-managed DPU data plane, so using them for zero-DPU hosts meant allocating overlay machinery that nothing consumed. Flat just records the VPC and lets the network operator's switch fabric own reachability. Per-variant policy lives in a new `VpcCapabilities` profile in `model::vpc::capability`: which host fabric interface the type attaches to (`Dpu` or `Nic`), which segment types it accepts, whether it supports IPv6 / routing profiles / stretched-L2 SVI, and which other types it peers with. Each variant maps to one profile constant; handlers consult capability methods that just read from the profile. Adding a future VPC type is a six-field profile plus one match arm, no handler edits. Flat VPCs and `HostInband` segments are mutually bound -- a Flat VPC can only hold HostInband segments, and HostInband segments can only live in Flat VPCs. Tenants pick `FLAT` through the same VPC create flow as any other type. Docs in a separate PR. Tests added! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
This closes NVIDIA#351, and is the REST-side counterpart to the Flat VPC and `auto` interface work that landed in Core (NVIDIA/infra-controller#1775, NVIDIA/infra-controller#1576, and NVIDIA/infra-controller#1674), which together let tenants put instances on zero-DPU hosts (or hosts with their DPU in NIC mode). The discussion on NVIDIA#351 was originally about whether to model this as `vpcId: null` or a special "unmanaged VPC" type. What landed in Core is closer to the latter -- Flat VPCs are real VPCs with VNIs, NSGs, and peering, but NICo doesn't drive their data plane. How it's done: - Brought the two new fields into `nico_nico.proto`. - `Flat` is a new value in the `network_virtualization_type` enum across the API, DB, OpenAPI, and proto layers. - `auto` is a new boolean on instance create / update that flips the interface configuration model from "explicit list" to "NICo resolves from HostInband segments." Mutually exclusive with `interfaces`, persisted on the instance row so partial updates can re-issue the signal without the caller re-supplying it. - Defense-in-depth check at the REST layer that `auto: true` requires a Flat VPC. Core enforces the same rule, but we fail-fast here too to avoid round-tripping the site for an obviously bad request. Tests added! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
This closes NVIDIA#351, and is the REST-side counterpart to the Flat VPC and `auto` interface work that landed in Core (NVIDIA/infra-controller#1775, NVIDIA/infra-controller#1576, and NVIDIA/infra-controller#1674), which together let tenants put instances on zero-DPU hosts (or hosts with their DPU in NIC mode). The discussion on NVIDIA#351 was originally about whether to model this as `vpcId: null` or a special "unmanaged VPC" type. What landed in Core is closer to the latter -- Flat VPCs are real VPCs with VNIs, NSGs, and peering, but NICo doesn't drive their data plane. How it's done: - Brought the two new fields into `nico_nico.proto`. - `Flat` is a new value in the `network_virtualization_type` enum across the API, DB, OpenAPI, and proto layers. - `auto` is a new boolean on instance create / update that flips the interface configuration model from "explicit list" to "NICo resolves from HostInband segments." Mutually exclusive with `interfaces`, persisted on the instance row so partial updates can re-issue the signal without the caller re-supplying it. - Defense-in-depth check at the REST layer that `auto: true` requires a Flat VPC. Core enforces the same rule, but we fail-fast here too to avoid round-tripping the site for an obviously bad request. This also adds some "VPC capability-like" mappings (similar to what we have in Core) to drive logic decisions at the REST API layer. Tests added! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
This closes NVIDIA#351, and is the REST-side counterpart to the Flat VPC and `auto` interface work that landed in Core (NVIDIA/infra-controller#1775, NVIDIA/infra-controller#1576, and NVIDIA/infra-controller#1674), which together let tenants put instances on zero-DPU hosts (or hosts with their DPU in NIC mode). The discussion on NVIDIA#351 was originally about whether to model this as `vpcId: null` or a special "unmanaged VPC" type. What landed in Core is closer to the latter -- Flat VPCs are real VPCs with VNIs, NSGs, and peering, but NICo doesn't drive their data plane. How it's done: - Brought the two new fields into `nico_nico.proto`. - `Flat` is a new value in the `network_virtualization_type` enum across the API, DB, OpenAPI, and proto layers. - `auto` is a new boolean on instance create / update that flips the interface configuration model from "explicit list" to "NICo resolves from HostInband segments." Mutually exclusive with `interfaces`, persisted on the instance row so partial updates can re-issue the signal without the caller re-supplying it. - Defense-in-depth check at the REST layer that `auto: true` requires a Flat VPC. Core enforces the same rule, but we fail-fast here too to avoid round-tripping the site for an obviously bad request. This also adds some "VPC capability-like" mappings (similar to what we have in Core) to drive logic decisions at the REST API layer. Tests added! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
This closes NVIDIA#351, and is the REST-side counterpart to the Flat VPC and `auto` interface work that landed in Core (NVIDIA/infra-controller#1775, NVIDIA/infra-controller#1576, and NVIDIA/infra-controller#1674), which together let tenants put instances on zero-DPU hosts (or hosts with their DPU in NIC mode). The discussion on NVIDIA#351 was originally about whether to model this as `vpcId: null` or a special "unmanaged VPC" type. What landed in Core is closer to the latter -- Flat VPCs are real VPCs with VNIs, NSGs, and peering, but NICo doesn't drive their data plane. How it's done: - Brought the two new fields into `nico_nico.proto`. - `Flat` is a new value in the `network_virtualization_type` enum across the API, DB, OpenAPI, and proto layers. - `auto` is a new boolean on instance create / update that flips the interface configuration model from "explicit list" to "NICo resolves from HostInband segments." Mutually exclusive with `interfaces`, persisted on the instance row so partial updates can re-issue the signal without the caller re-supplying it. - Defense-in-depth check at the REST layer that `auto: true` requires a Flat VPC. Core enforces the same rule, but we fail-fast here too to avoid round-tripping the site for an obviously bad request. This also adds some "VPC capability-like" mappings (similar to what we have in Core) to drive logic decisions at the REST API layer. Tests added! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
This closes NVIDIA#351, and is the REST-side counterpart to the Flat VPC and `auto` interface work that landed in Core (NVIDIA/infra-controller#1775, NVIDIA/infra-controller#1576, and NVIDIA/infra-controller#1674), which together let tenants put instances on zero-DPU hosts (or hosts with their DPU in NIC mode). The discussion on NVIDIA#351 was originally about whether to model this as `vpcId: null` or a special "unmanaged VPC" type. What landed in Core is closer to the latter -- Flat VPCs are real VPCs with VNIs, NSGs, and peering, but NICo doesn't drive their data plane. How it's done: - Brought the two new fields into `nico_nico.proto`. - `Flat` is a new value in the `network_virtualization_type` enum across the API, DB, OpenAPI, and proto layers. - `auto` is a new boolean on instance create / update that flips the interface configuration model from "explicit list" to "NICo resolves from HostInband segments." Mutually exclusive with `interfaces`, persisted on the instance row so partial updates can re-issue the signal without the caller re-supplying it. - Defense-in-depth check at the REST layer that `auto: true` requires a Flat VPC. Core enforces the same rule, but we fail-fast here too to avoid round-tripping the site for an obviously bad request. This also adds some "VPC capability-like" mappings (similar to what we have in Core) to drive logic decisions at the REST API layer. Tests added! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
This closes NVIDIA#351, and is the REST-side counterpart to the Flat VPC and `auto` interface work that landed in Core (NVIDIA/infra-controller#1775, NVIDIA/infra-controller#1576, and NVIDIA/infra-controller#1674), which together let tenants put instances on zero-DPU hosts (or hosts with their DPU in NIC mode). The discussion on NVIDIA#351 was originally about whether to model this as `vpcId: null` or a special "unmanaged VPC" type. What landed in Core is closer to the latter -- Flat VPCs are real VPCs with VNIs, NSGs, and peering, but NICo doesn't drive their data plane. How it's done: - Brought the two new fields into `nico_nico.proto`. - `Flat` is a new value in the `network_virtualization_type` enum across the API, DB, OpenAPI, and proto layers. - `auto` is a new boolean on instance create / update that flips the interface configuration model from "explicit list" to "NICo resolves from HostInband segments." Mutually exclusive with `interfaces`, persisted on the instance row so partial updates can re-issue the signal without the caller re-supplying it. - Defense-in-depth check at the REST layer that `auto: true` requires a Flat VPC. Core enforces the same rule, but we fail-fast here too to avoid round-tripping the site for an obviously bad request. This also adds some "VPC capability-like" mappings (similar to what we have in Core) to drive logic decisions at the REST API layer. Tests added! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Description
This closes #1522.
Adds
FlattoVpcVirtualizationTypefor VPCs hosted on zero-DPU machines. ETV and FNN both presume a Carbide-managed DPU data plane, so using them for zero-DPU hosts meant allocating overlay machinery that nothing consumed. Flat just records the VPC and lets the network operator's switch fabric own reachability.Also, as I had mentioned to @Matthias247 and @bcavnvidia on the side:
Per-variant policy lives in a new
VpcCapabilitiesprofile inmodel::vpc::capability: which host fabric interface the type attaches to (DpuorNic), which segment types it accepts, whether it supports IPv6 / routing profiles / stretched-L2 SVI, and which other types it peers with. Each variant maps to one profile constant; handlers consult capability methods that just read from the profile. Adding a future VPC type is a six-field profile plus one match arm, no handler edits.Flat VPCs and
HostInbandsegments are mutually bound -- a Flat VPC can only hold HostInband segments, and HostInband segments can only live in Flat VPCs. Tenants pickFLATthrough the same VPC create flow as any other type.Docs in a separate PR. Tests added!
Signed-off-by: Chet Nichols III chetn@nvidia.com
Type of Change
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes