Skip to content

add image source to OmicronZoneConfig#7555

Merged
iliana merged 4 commits intomainfrom
iliana/omicron-zone-image-source
Feb 20, 2025
Merged

add image source to OmicronZoneConfig#7555
iliana merged 4 commits intomainfrom
iliana/omicron-zone-image-source

Conversation

@iliana
Copy link
Contributor

@iliana iliana commented Feb 18, 2025

This adds the necessary bit of plumbing to be able to work on #7281 from both the Reconfigurator and Sled Agent sides simultaneously.

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick turnaround! I have just a few suggestions, none of them blockers.

zone::MockZones,
};

use nexus_sled_agent_shared::inventory::OmicronZoneImageSource;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for having this PR not try to also fix sled agent to honor this field at the same time, but what do you think about having it reject any use of OmicronZoneImageSource::Artifact in the meantime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might make it more difficult to test the Nexus side of it if omicron_zones_put is rejecting anything using the Artifact variant, and I'm not sure I want to figure out how to word and wire up a new error variant to rip out in an upcoming PR.

pub enum OmicronZoneImageSource {
InstallDataset,
Artifact { hash: ArtifactHash },
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be more of a question about the repo depot API - can sled-agent check that the hash provided here matches the zone type? E.g., I'd expect if I PUT /omicron-zones with a nonsense hash I'd get back an error that sled-agent has no such artifact, but ideally I'd also expect sled-agent to return an error if I tell it to run a Nexus zone with an artifact hash that points to an Oximeter zone image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sled Agent doesn't currently know what type of artifact is what but I think it probably makes sense to either tell Sled Agent what the artifact types are (this could be added to the artifact config introduced in #7519) or have Sled Agent quickly inspect the artifact to make sure it looks like a zone (tar.gz, oxide.json is present with the correct type and perhaps even correct zone name).

@iliana
Copy link
Contributor Author

iliana commented Feb 19, 2025

Hmm. Does it make more sense to update the representative blueprint files (e.g. madrid-sled14.json) or to implement a default value while deserializing OmicronZoneConfig? I assume the latter?

@davepacheco
Copy link
Collaborator

Hmm. Does it make more sense to update the representative blueprint files (e.g. madrid-sled14.json) or to implement a default value while deserializing OmicronZoneConfig? I assume the latter?

I could go either way on this. In the limit, we'll need something better than those files to achieve what they're doing because we want to be able to change the blueprint incompatibly without having to support older versions. But I'm not sure you want to build a replacement for whatever that right now.

@jgallagher
Copy link
Contributor

Hmm. Does it make more sense to update the representative blueprint files (e.g. madrid-sled14.json) or to implement a default value while deserializing OmicronZoneConfig? I assume the latter?

I think this isn't limited to those tests, right? sled-agent has ledgered OmicronZoneConfigs on disk that we have to be able to read, which I think means we must provide a default value (or have some other kind of backcompat story to fill that in across releases).

@davepacheco
Copy link
Collaborator

Hmm. Does it make more sense to update the representative blueprint files (e.g. madrid-sled14.json) or to implement a default value while deserializing OmicronZoneConfig? I assume the latter?

I think this isn't limited to those tests, right? sled-agent has ledgered OmicronZoneConfigs on disk that we have to be able to read, which I think means we must provide a default value (or have some other kind of backcompat story to fill that in across releases).

Yes, at load-time, we should fill those in.

It'd be nice if we could drop that in the next release. Maybe we could have sled agent rewrite this ledger after loading it?

@jgallagher
Copy link
Contributor

If we keep the standard release procedure of "generate a new blueprint and execute it after upgrade", we might get them filled in for free if we make any other changes that will necessitate a bumped OmicronZonesConfig (e.g., #7214, which isn't at the top of my list at the moment but might bubble up there in the next few weeks).

Copy link
Contributor

@sunshowers sunshowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is lovely, thank you!

@sunshowers
Copy link
Contributor

If we keep the standard release procedure of "generate a new blueprint and execute it after upgrade", we might get them filled in for free if we make any other changes that will necessitate a bumped OmicronZonesConfig (e.g., #7214, which isn't at the top of my list at the moment but might bubble up there in the next few weeks).

Does this suggest we should keep at least one json fixture around that's representative of what customer systems would have as of the current release?

@jgallagher
Copy link
Contributor

If we keep the standard release procedure of "generate a new blueprint and execute it after upgrade", we might get them filled in for free if we make any other changes that will necessitate a bumped OmicronZonesConfig (e.g., #7214, which isn't at the top of my list at the moment but might bubble up there in the next few weeks).

Does this suggest we should keep at least one json fixture around that's representative of what customer systems would have as of the current release?

That's a good idea. Maybe we could grab the ledgers from each sled after updates to the colo rack?

@sunshowers
Copy link
Contributor

That's a good idea. Maybe we could grab the ledgers from each sled after updates to the colo rack?

That sounds good to me -- in general this kind of coverage across versions is quite difficult, so having an easy way to do this (having json fixtures) is valuable.

@iliana iliana merged commit 5e55b15 into main Feb 20, 2025
17 checks passed
@iliana iliana deleted the iliana/omicron-zone-image-source branch February 20, 2025 21:51
hawkw pushed a commit that referenced this pull request Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants