Skip to content

Commit 9cfae2c

Browse files
authored
Break EarlyNetworkConfig up to support cleaner bootstore type changes (#9944)
This PR is pretty widespread and touches some easily-broken bits, so I'll go into some background here. The replicated bootstore (soon to be backed by trust quorum instead, but that's not relevant to the context here) gives an eventually-consistent copy of data to all sleds. The type exposed by the bootstore is https://github.com/oxidecomputer/omicron/blob/7c56d21f5fb8127a619cb8557cbcc286e8ae0af7/bootstore/src/schemes/v0/storage.rs#L119-L126 `generation` ensures any write requests from an out-of-date Nexus can't overwrite newer changes. Prior to this PR, `blob` contains a JSON-serialized `EarlyNetworkConfig`, defined as https://github.com/oxidecomputer/omicron/blob/7c56d21f5fb8127a619cb8557cbcc286e8ae0af7/sled-agent/types/versions/src/bgp_v6/early_networking.rs#L23-L35 `EarlyNetworkConfigBody` contains a variety of networking configuration information required to bring up connectivity on a rack cold boot, including BGP details, which switches have transceivers and in which slots, etc. This type is quite complex, and historically changing it has been quite painful - addressing that is the point of this PR and #9801 in general. This PR does not change `NetworkConfig` or `EarlyNetworkConfigBody` - the focus is on `EarlyNetworkConfig`. It has a couple of problems: * `schema_version` is supposed to tell us what version of `body` is present, but it's defined _in line_ with the body. That means any time we rev a new version of `EarlyNetworkConfigBody`, we also have to rev a new version of `EarlyNetworkConfig`, and don't have an opportunity to inspect `schema_version` before already needing to know how to deserialize `body`. * `generation` is duplicated with the `generation` in `NetworkConfig`. (This is not nearly as much of a problem in practice as the previous point, but is an opportunity for illegal states - what would it mean for a `NetworkConfig` at generation N to hold a blobified `EarlyNetworkConfig` with a different generation?) `EarlyNetworkConfig` is used in three places on `main`: * Serialized as `blob` in the bootstore as described above * Serialized as `data` in the `bootstore_config` CRDB table (this has the same "duplicated generation" problem as `NetworkConfig` - this table also stores `generation` as a separate column next to `data`) * As the type for the `write_network_bootstore_config()` sled-agent OpenAPI This PR breaks `EarlyNetworkConfig` up into two types to address the problems above. The serialized form is now `EarlyNetworkConfigEnvelope`: https://github.com/oxidecomputer/omicron/blob/a8de498858cbeb2a58ebf835ccb11b21536de59b/sled-agent/types/versions/src/bootstore_versioning/early_networking.rs#L45-L55 This has `schema_version` and an _opaque_ `body` (typed as `serde_json::Value`). This fixes both problems with `EarlyNetworkConfig`: * `EarlyNetworkConfigEnvelope` does not have to have a new type revved any time `EarlyNetworkConfigBody`, because `body` is opaque. We can deserialize the envelope, then inspect `schema_version` to know which version of `EarlyNetworkConfigBody` it contains. (My hope is we _never_ have to rev this type.) * No duplicated `generation`. This is fully backwards-compatible as far as _deserialization_ is concerned: any existing `EarlyNetworkConfig` can be safely deserialized as an `EarlyNetworkConfigEnvelope`: * `schema_version` is unchanged * `generation` will be ignored * `body` will be read as a `serde_json::Value` instead of an `EarlyNetworkConfigBody` `EarlyNetworkConfigEnvelope` does not implement `JsonSchema`, because it should not be used in HTTP / OpenAPI contexts; it's only meant to be a serialization wrapper. That brings us to the second type added in this PR, `WriteNetworkConfigRequest`: https://github.com/oxidecomputer/omicron/blob/a8de498858cbeb2a58ebf835ccb11b21536de59b/sled-agent/types/versions/src/bootstore_versioning/early_networking.rs#L38-L42 This is now the type accepted by sled-agent's `write_network_bootstore_config()` endpoint. sled-agent will convert this request into a `bootstore::NetworkConfig` in the straightforward way: 1. Wrap `body` in an `EarlyNetworkConfigEnvelope` 2. Serialize the now-wrapped `body` as `NetworkConfig::blob` 3. Include `generation` as `NetworkConfig::generation` I believe this gets us most of the way to #9801. I want to do an actual rev of `EarlyNetworkConfigBody` to confirm this puts us in a good place (and also work on the mechanics of ensuring that any revs made are required to update the relevant bits of implementation that need to be updated, such as `EarlyNetworkConfigEnvelope` knowing how to deserialize the new body type) before closing the issue. I'll also do some update testing on a racklette to confirm I didn't break anything w.r.t. backwards compatibility, but I believe all of these changes should be safe. This PR also fixes #9943: sled-agent should never "upconvert" `EarlyNetworkConfigBody` requests from Nexus into a newer version; it needs to replicate the exact version it's given.
1 parent 9dd2309 commit 9cfae2c

File tree

31 files changed

+1013
-587
lines changed

31 files changed

+1013
-587
lines changed

Cargo.lock

Lines changed: 3 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

clients/sled-agent-client/src/lib.rs

Lines changed: 14 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -29,17 +29,7 @@ progenitor::generate_api!(
2929
}),
3030
derives = [schemars::JsonSchema, PartialEq],
3131
patch = {
32-
BfdPeerConfig = { derives = [Eq, Hash] },
33-
BgpConfig = { derives = [Eq, Hash] },
34-
BgpPeerConfig = { derives = [Eq, Hash] },
35-
MaxPathConfig = { derives = [Eq, Hash] },
36-
LldpPortConfig = { derives = [Eq, Hash, PartialOrd, Ord] },
37-
TxEqConfig = { derives = [Eq, Hash] },
3832
OmicronPhysicalDiskConfig = { derives = [Eq, Hash, PartialOrd, Ord] },
39-
PortConfig = { derives = [Eq, Hash] },
40-
RouteConfig = { derives = [Eq, Hash] },
41-
RouterLifetimeConfig = { derives = [Eq, Hash] },
42-
UplinkAddressConfig = { derives = [Eq, Hash] },
4333
VirtualNetworkInterfaceHost = { derives = [Eq, Hash] },
4434
},
4535
crates = {
@@ -50,6 +40,9 @@ progenitor::generate_api!(
5040
Attestation = sled_agent_types_versions::latest::rot::Attestation,
5141
Baseboard = sled_agent_types_versions::latest::inventory::Baseboard,
5242
BaseboardId = sled_hardware_types::BaseboardId,
43+
BfdPeerConfig = sled_agent_types_versions::latest::early_networking::BfdPeerConfig,
44+
BgpConfig = sled_agent_types_versions::latest::early_networking::BgpConfig,
45+
BgpPeerConfig = sled_agent_types_versions::latest::early_networking::BgpPeerConfig,
5346
ByteCount = omicron_common::api::external::ByteCount,
5447
CertificateChain = sled_agent_types_versions::latest::rot::CertificateChain,
5548
CommitRequest = trust_quorum_types::messages::CommitRequest,
@@ -62,6 +55,7 @@ progenitor::generate_api!(
6255
DiskManagementStatus = omicron_common::disk::DiskManagementStatus,
6356
DiskManagementError = omicron_common::disk::DiskManagementError,
6457
DiskVariant = omicron_common::disk::DiskVariant,
58+
EarlyNetworkConfigBody = sled_agent_types_versions::latest::early_networking::EarlyNetworkConfigBody,
6559
Epoch = trust_quorum_types::types::Epoch,
6660
ExternalIpGatewayMap = omicron_common::api::internal::shared::ExternalIpGatewayMap,
6761
ExternalIpConfig = omicron_common::api::internal::shared::ExternalIpConfig,
@@ -73,8 +67,11 @@ progenitor::generate_api!(
7367
Inventory = sled_agent_types_versions::latest::inventory::Inventory,
7468
InventoryDisk = sled_agent_types_versions::latest::inventory::InventoryDisk,
7569
InventoryZpool = sled_agent_types_versions::latest::inventory::InventoryZpool,
70+
LldpAdminStatus = sled_agent_types_versions::latest::early_networking::LldpAdminStatus,
71+
LldpPortConfig = sled_agent_types_versions::latest::early_networking::LldpPortConfig,
7672
LrtqUpgradeMsg = trust_quorum_types::messages::LrtqUpgradeMsg,
7773
MacAddr = omicron_common::api::external::MacAddr,
74+
MaxPathConfig = sled_agent_types_versions::latest::early_networking::MaxPathConfig,
7875
Measurement = sled_agent_types_versions::latest::rot::Measurement,
7976
MeasurementLog = sled_agent_types_versions::latest::rot::MeasurementLog,
8077
MupdateOverrideBootInventory = sled_agent_types_versions::latest::inventory::MupdateOverrideBootInventory,
@@ -89,24 +86,31 @@ progenitor::generate_api!(
8986
OmicronZoneImageSource = sled_agent_types_versions::latest::inventory::OmicronZoneImageSource,
9087
OmicronZoneType = sled_agent_types_versions::latest::inventory::OmicronZoneType,
9188
OmicronZonesConfig = sled_agent_types_versions::latest::inventory::OmicronZonesConfig,
89+
PortConfig = sled_agent_types_versions::latest::early_networking::PortConfig,
9290
PortFec = sled_agent_types_versions::latest::early_networking::PortFec,
9391
PortSpeed = sled_agent_types_versions::latest::early_networking::PortSpeed,
9492
PrepareAndCommitRequest = trust_quorum_types::messages::PrepareAndCommitRequest,
93+
RackNetworkConfig = sled_agent_types_versions::latest::early_networking::RackNetworkConfig,
9594
ReconfigureMsg = trust_quorum_types::messages::ReconfigureMsg,
9695
ResolvedVpcFirewallRule = omicron_common::api::internal::shared::ResolvedVpcFirewallRule,
9796
ResolvedVpcRoute = omicron_common::api::internal::shared::ResolvedVpcRoute,
9897
ResolvedVpcRouteSet = omicron_common::api::internal::shared::ResolvedVpcRouteSet,
9998
Rot = sled_agent_types_versions::latest::rot::Rot,
99+
RouteConfig = sled_agent_types_versions::latest::early_networking::RouteConfig,
100100
RouterId = omicron_common::api::internal::shared::RouterId,
101+
RouterLifetimeConfig = sled_agent_types_versions::latest::early_networking::RouterLifetimeConfig,
101102
RouterTarget = omicron_common::api::internal::shared::RouterTarget,
102103
RouterVersion = omicron_common::api::internal::shared::RouterVersion,
103104
Sha3_256Digest = sled_agent_types_versions::latest::rot::Sha3_256Digest,
104105
SledRole = sled_agent_types_versions::latest::inventory::SledRole,
105106
SourceNatConfigGeneric = omicron_common::api::internal::shared::SourceNatConfigGeneric,
106107
SwitchLocation = sled_agent_types_versions::latest::early_networking::SwitchLocation,
107108
Threshold = trust_quorum_types::types::Threshold,
109+
TxEqConfig = sled_agent_types_versions::latest::early_networking::TxEqConfig,
110+
UplinkAddressConfig = sled_agent_types_versions::latest::early_networking::UplinkAddressConfig,
108111
Vni = omicron_common::api::external::Vni,
109112
VpcFirewallIcmpFilter = omicron_common::api::external::VpcFirewallIcmpFilter,
113+
WriteNetworkConfigRequest = sled_agent_types_versions::latest::early_networking::WriteNetworkConfigRequest,
110114
ZpoolKind = omicron_common::zpool_name::ZpoolKind,
111115
ZpoolName = omicron_common::zpool_name::ZpoolName,
112116
}

nexus/db-model/src/bfd.rs

Lines changed: 11 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -50,11 +50,20 @@ impl From<sled_agent_types::early_networking::BfdMode> for BfdMode {
5050
fn from(value: sled_agent_types::early_networking::BfdMode) -> Self {
5151
match value {
5252
sled_agent_types::early_networking::BfdMode::SingleHop => {
53-
BfdMode::SingleHop
53+
Self::SingleHop
5454
}
5555
sled_agent_types::early_networking::BfdMode::MultiHop => {
56-
BfdMode::MultiHop
56+
Self::MultiHop
5757
}
5858
}
5959
}
6060
}
61+
62+
impl From<BfdMode> for sled_agent_types::early_networking::BfdMode {
63+
fn from(value: BfdMode) -> Self {
64+
match value {
65+
BfdMode::SingleHop => Self::SingleHop,
66+
BfdMode::MultiHop => Self::MultiHop,
67+
}
68+
}
69+
}

nexus/db-model/src/bgp.rs

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,8 +13,14 @@ use nexus_types::identity::Resource;
1313
use omicron_common::api::external::Error;
1414
use omicron_common::api::external::IdentityMetadataCreateParams;
1515
use serde::{Deserialize, Serialize};
16+
use sled_agent_types::early_networking::BgpPeerConfig;
17+
use sled_agent_types::early_networking::ImportExportPolicy;
1618
use sled_agent_types::early_networking::MaxPathConfig;
19+
use sled_agent_types::early_networking::RouterLifetimeConfig;
20+
use sled_agent_types::early_networking::RouterLifetimeConfigError;
1721
use slog_error_chain::InlineErrorChain;
22+
use std::net::IpAddr;
23+
use std::net::Ipv6Addr;
1824
use uuid::Uuid;
1925

2026
#[derive(
@@ -160,3 +166,60 @@ pub struct BgpPeerView {
160166
pub vlan_id: Option<SqlU16>,
161167
pub router_lifetime: SqlU16,
162168
}
169+
170+
#[derive(Debug, thiserror::Error)]
171+
pub enum BgpPeerConfigDataError {
172+
#[error("database contains illegal router lifetime value")]
173+
RouterLifetime(#[source] RouterLifetimeConfigError),
174+
#[error("database contains illegal min_ttl value: {0}")]
175+
MinTtl(u32),
176+
}
177+
178+
impl TryFrom<BgpPeerView> for BgpPeerConfig {
179+
type Error = BgpPeerConfigDataError;
180+
181+
fn try_from(value: BgpPeerView) -> Result<Self, Self::Error> {
182+
// For unnumbered peers (addr is None), use UNSPECIFIED
183+
let addr = match value.addr {
184+
None => IpAddr::V6(Ipv6Addr::UNSPECIFIED),
185+
Some(addr) => addr.ip(),
186+
};
187+
188+
// TODO-correctness We should have db constraints to ensure these can't
189+
// fail.
190+
let router_lifetime =
191+
RouterLifetimeConfig::new(value.router_lifetime.0)
192+
.map_err(BgpPeerConfigDataError::RouterLifetime)?;
193+
let min_ttl = value
194+
.min_ttl
195+
.map(|val| {
196+
u8::try_from(*val)
197+
.map_err(|_| BgpPeerConfigDataError::MinTtl(*val))
198+
})
199+
.transpose()?;
200+
201+
Ok(Self {
202+
asn: *value.asn,
203+
port: value.port_name,
204+
addr,
205+
hold_time: Some(value.hold_time.0.into()),
206+
idle_hold_time: Some(value.idle_hold_time.0.into()),
207+
delay_open: Some(value.delay_open.0.into()),
208+
connect_retry: Some(value.connect_retry.0.into()),
209+
keepalive: Some(value.keepalive.0.into()),
210+
enforce_first_as: value.enforce_first_as,
211+
local_pref: value.local_pref.map(|x| x.into()),
212+
md5_auth_key: value.md5_auth_key,
213+
min_ttl,
214+
multi_exit_discriminator: value
215+
.multi_exit_discriminator
216+
.map(|x| x.into()),
217+
remote_asn: value.remote_asn.map(|x| x.into()),
218+
communities: Vec::new(),
219+
allowed_export: ImportExportPolicy::NoFiltering,
220+
allowed_import: ImportExportPolicy::NoFiltering,
221+
vlan_id: value.vlan_id.map(|x| x.0),
222+
router_lifetime,
223+
})
224+
}
225+
}

nexus/mgs-updates/Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -52,5 +52,6 @@ rand.workspace = true
5252
repo-depot-api.workspace = true
5353
sled-agent-api.workspace = true
5454
sled-agent-types.workspace = true
55+
sled-agent-types-versions.workspace = true
5556
sled-diagnostics.workspace = true
5657
sp-sim.workspace = true

nexus/mgs-updates/src/test_util/host_phase_2_test_state.rs

Lines changed: 23 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -177,7 +177,6 @@ impl HostPhase2SledAgentContext {
177177
struct HostPhase2SledAgentImpl;
178178

179179
mod api_impl {
180-
181180
use super::HostPhase2SledAgentContext;
182181
use super::HostPhase2SledAgentImpl;
183182
use camino::Utf8PathBuf;
@@ -224,7 +223,6 @@ mod api_impl {
224223
use sled_agent_types::diagnostics::SledDiagnosticsLogsDownloadQueryParam;
225224
use sled_agent_types::disk::DiskEnsureBody;
226225
use sled_agent_types::disk::DiskPathParam;
227-
use sled_agent_types::early_networking::EarlyNetworkConfig;
228226
use sled_agent_types::firewall_rules::VpcFirewallRulesEnsureBody;
229227
use sled_agent_types::instance::InstanceEnsureBody;
230228
use sled_agent_types::instance::InstanceExternalIpBody;
@@ -269,6 +267,9 @@ mod api_impl {
269267
use sled_agent_types::zone_bundle::ZoneBundleId;
270268
use sled_agent_types::zone_bundle::ZoneBundleMetadata;
271269
use sled_agent_types::zone_bundle::ZonePathParam;
270+
use sled_agent_types_versions::v1;
271+
use sled_agent_types_versions::v20;
272+
use sled_agent_types_versions::v25;
272273
use sled_diagnostics::SledDiagnosticsQueryOutput;
273274
use std::collections::BTreeMap;
274275
use std::collections::BTreeSet;
@@ -761,13 +762,30 @@ mod api_impl {
761762

762763
async fn read_network_bootstore_config_cache(
763764
_rqctx: RequestContext<Self::Context>,
764-
) -> Result<HttpResponseOk<EarlyNetworkConfig>, HttpError> {
765+
) -> Result<
766+
HttpResponseOk<v20::early_networking::EarlyNetworkConfig>,
767+
HttpError,
768+
> {
769+
unimplemented!()
770+
}
771+
772+
async fn write_network_bootstore_config_v25(
773+
_rqctx: RequestContext<Self::Context>,
774+
_body: TypedBody<v25::early_networking::WriteNetworkConfigRequest>,
775+
) -> Result<HttpResponseUpdatedNoContent, HttpError> {
776+
unimplemented!()
777+
}
778+
779+
async fn write_network_bootstore_config_v20(
780+
_rqctx: RequestContext<Self::Context>,
781+
_body: TypedBody<v20::early_networking::EarlyNetworkConfig>,
782+
) -> Result<HttpResponseUpdatedNoContent, HttpError> {
765783
unimplemented!()
766784
}
767785

768-
async fn write_network_bootstore_config(
786+
async fn write_network_bootstore_config_v1(
769787
_rqctx: RequestContext<Self::Context>,
770-
_body: TypedBody<EarlyNetworkConfig>,
788+
_body: TypedBody<v1::early_networking::EarlyNetworkConfig>,
771789
) -> Result<HttpResponseUpdatedNoContent, HttpError> {
772790
unimplemented!()
773791
}

0 commit comments

Comments
 (0)