
Tracking issue: Moving system-level networking reconciliation from Nexus to sled-agent #10167

@jgallagher

(Much of this was originally written by @rcgoodfellow and @davepacheco; I'm copying details, with some editing, from a Google Doc (linked below) into this tracking issue.)

System-level networking is the collective networking machinery that is required for the rack as a whole to function. From the perspective of Omicron, this includes:

  • Synchronizing switch port settings to both dpd and uplinkd. This includes things like physical link settings as well as addressing.
  • Synchronizing NAT entries for control plane services that require external communication (today: boundary NTP, Nexus, and external DNS) to dpd.
  • Synchronizing routing configuration (which is currently part of switch port settings) to mgd. This includes static routes, BGP configuration, and BFD configuration.
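To make the three categories above concrete, here is a rough Rust sketch of how the rack-wide system networking configuration might be modeled, with each section mapped to the daemon(s) it must be synchronized to. All type and field names (`SystemNetworkingConfig`, `PortSettings`, `ServiceNat`, `RoutingConfig`, `Section`, `Daemon`) are hypothetical and are not Omicron's actual types; the `generation` field is likewise an assumption about how versions of the config might be tracked.

```rust
use std::net::IpAddr;

/// Switch zone daemons that consume system-level networking config (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Daemon {
    Dpd,
    Uplinkd,
    Mgd,
}

/// Physical link settings plus addressing for one switch port (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct PortSettings {
    pub port: String,
    pub speed_gbps: u32,
    pub addresses: Vec<IpAddr>,
}

/// NAT entry for a control plane service that needs external connectivity
/// (today: boundary NTP, Nexus, and external DNS).
#[derive(Debug, Clone, PartialEq)]
pub struct ServiceNat {
    pub service: String,
    pub external_ip: IpAddr,
    pub first_port: u16,
    pub last_port: u16,
}

/// Routing configuration: static routes, BGP, and BFD.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct RoutingConfig {
    /// (destination prefix, next hop)
    pub static_routes: Vec<(String, IpAddr)>,
    pub bgp_peers: Vec<IpAddr>,
    pub bfd_peers: Vec<IpAddr>,
}

/// The rack-wide config that Nexus would write and sled agents would persist.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct SystemNetworkingConfig {
    /// Assumed monotonically increasing version number.
    pub generation: u64,
    pub ports: Vec<PortSettings>,
    pub service_nat: Vec<ServiceNat>,
    pub routing: RoutingConfig,
}

/// One section of `SystemNetworkingConfig`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Section {
    Ports,
    Nat,
    Routing,
}

impl Section {
    /// Daemons this section must be synchronized to, per the list above.
    pub fn consumers(self) -> &'static [Daemon] {
        match self {
            Section::Ports => &[Daemon::Dpd, Daemon::Uplinkd],
            Section::Nat => &[Daemon::Dpd],
            Section::Routing => &[Daemon::Mgd],
        }
    }
}
```

Keeping the section-to-daemon mapping explicit is one way to make it obvious which reconciler owns which piece of state.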

Here, synchronizing means communicating with dpd, uplinkd, and mgd to set up state that has been configured in the control plane. Today this is done both in Nexus RPWs and in sled-agent's early networking code. The split exists because, while we originally wanted all of this code to live in Nexus, the rack needs a functioning network both for initialization and for cold boot, when Nexus may not yet be available. As a result, a lot of functionality has been duplicated. Beyond the duplication of complex logic, there are several fundamental problems with the current design, including #9700, #9708, and #9954, as well as long-standing issues with the implementation and complexity of the relevant Nexus RPWs, like #8579.

The plan to address this is to move all the system-level networking machinery to a set of reconcilers in sled-agent that operate based on data in the bootstore. There will still be a system networking RPW in Nexus, but its responsibility will stop at updating the data in the bootstore. The work of synchronizing that configuration to the various networking daemons will be the responsibility of sled-agent, namely the sled-agent instances on the scrimlets. More specifically:

  • The control plane (Nexus/CockroachDB) remains the source of truth for all this data (configuration of external networking, which system zones exist, whatever NAT configuration those zones require, etc.)
  • Nexus propagates this rack-wide networking configuration to all sled agents. (It’s not enough to propagate it just to scrimlets because sleds can always be swapped and we need whatever sled is put in the scrimlet position to have this configuration.)
  • Sled agents persist this configuration. (This is already true of the bootstore today.)
  • On scrimlets only, a set of reconcilers within sled-agent continuously attempts to realize this configuration in the services of the local switch zone.
  • All of this only applies to “system networking”. Instance networking is still driven from Nexus directly to switch zone services.
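A minimal Rust sketch of the flow described above, under stated assumptions: a sled agent accepts a configuration from Nexus only if its generation number is newer than what it already persisted, and on scrimlets a reconciler diffs the desired service NAT entries against what the local dpd reports, adding and removing entries until they match. `DesiredConfig`, `LocalStore`, `NatClient`, `reconcile_nat`, `SledRole`, and `FakeDpd` are hypothetical names; the real bootstore and dpd client APIs differ, and only one section (service NAT) is shown.

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;

/// Hypothetical NAT entry: (external IP, first port, last port).
pub type NatEntry = (IpAddr, u16, u16);

/// Desired system networking config as persisted by a sled agent (hypothetical).
#[derive(Debug, Clone, Default)]
pub struct DesiredConfig {
    pub generation: u64,
    pub service_nat: BTreeSet<NatEntry>,
}

/// Local persisted copy; assumes only newer generations from Nexus are accepted.
#[derive(Debug, Default)]
pub struct LocalStore {
    current: DesiredConfig,
}

impl LocalStore {
    /// Returns true if `incoming` replaced the stored config.
    pub fn update(&mut self, incoming: DesiredConfig) -> bool {
        if incoming.generation > self.current.generation {
            self.current = incoming;
            true
        } else {
            false
        }
    }

    pub fn current(&self) -> &DesiredConfig {
        &self.current
    }
}

/// Stand-in for the local switch zone's dpd NAT API (hypothetical trait).
/// Assumes `list` returns only system-level entries owned by this reconciler.
pub trait NatClient {
    fn list(&self) -> BTreeSet<NatEntry>;
    fn add(&mut self, entry: &NatEntry);
    fn remove(&mut self, entry: &NatEntry);
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SledRole {
    Gimlet,
    Scrimlet,
}

/// One reconciliation pass, meant to be run repeatedly. Returns (added, removed).
pub fn reconcile_nat<C: NatClient>(
    role: SledRole,
    desired: &DesiredConfig,
    client: &mut C,
) -> (usize, usize) {
    if role != SledRole::Scrimlet {
        return (0, 0); // Only scrimlets have a local switch zone to configure.
    }
    let actual = client.list();
    let to_add: Vec<NatEntry> = desired.service_nat.difference(&actual).cloned().collect();
    let to_remove: Vec<NatEntry> = actual.difference(&desired.service_nat).cloned().collect();
    for entry in &to_add {
        client.add(entry);
    }
    for entry in &to_remove {
        client.remove(entry);
    }
    (to_add.len(), to_remove.len())
}

/// In-memory fake for exercising the reconciler without a real dpd.
#[derive(Debug, Default)]
pub struct FakeDpd {
    pub entries: BTreeSet<NatEntry>,
}

impl NatClient for FakeDpd {
    fn list(&self) -> BTreeSet<NatEntry> {
        self.entries.clone()
    }
    fn add(&mut self, entry: &NatEntry) {
        self.entries.insert(*entry);
    }
    fn remove(&mut self, entry: &NatEntry) {
        self.entries.remove(entry);
    }
}
```

One point this glosses over: because Nexus still programs instance NAT directly, a real reconciler must only diff and remove entries it owns (the system-level ones). The sketch assumes `list` returns only those; otherwise the removal step would delete customer instance NAT.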

This approach ensures:

  • Sled agents on scrimlets have the configuration that they require to configure switch zone services for both external networking and system zones (like NTP) even when the control plane is not online. In other words: rack cold start still works, including bringing up NTP and synchronizing clocks.
  • There is no overlap in responsibility between Nexus and sled agents. They do not make the same API calls to switch zone services (at least, not for the same data: Nexus is still responsible for configuring switch zone services with NAT, etc. for customer instances).
  • Sled agents no longer call switch zone services on other sleds, which fixes a major upgrade risk uncovered in #9708 (dpd, MGS APIs marked server-side-versioned, but that's not valid).

I'll add implementation tasks as sub-issues. (This list will certainly be incomplete at first, and will grow as we get into the implementation details.)


Background reading:
