MCO-1898: MCS serves image-aware first-boot config #5357
Conversation
@dkhater-redhat: This pull request references MCO-1898 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dkhater-redhat. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from ebca2ba to a4fbb2c
Force-pushed from 7023ffe to c35accf
Generally looks good! Just left a few questions/cleanups that we could do 😄
Could we also squash the second commit?
```go
mosbList, err := cs.machineOSBuildLister.List(labels.Everything())
if err != nil {
	klog.Infof("Could not list MachineOSBuilds for pool %s: %v", pool.Name, err)
	return ""
}

var currentConf string
if pool.Status.UpdatedMachineCount > 0 {
	currentConf = pool.Spec.Configuration.Name
} else {
	currentConf = pool.Status.Configuration.Name
}

var mosb *mcfgv1.MachineOSBuild
for _, build := range mosbList {
	if build.Spec.MachineOSConfig.Name == mosc.Name &&
		build.Spec.MachineConfig.Name == currentConf {
		mosb = build
		break
	}
}

if mosb == nil {
	klog.Infof("Pool %s has MachineOSConfig but no matching MachineOSBuild for MC %s", pool.Name, currentConf)
	return ""
}
```
Couldn't we use the MOSB ref field from the MOSC object directly in the lister? Why do we need to step through the whole list? Sorry if I'm missing something here!
You're right that we could simplify this, but there's an important nuance. The current-machine-os-build annotation does track the "latest successful build", but there can be multiple MOSBs for a single MOSC (one per rendered MachineConfig), and the code in cluster_server.go needs the MOSB for a specific rendered config that depends on the pool's rollout state (either pool.Spec.Configuration.Name or pool.Status.Configuration.Name).
For example, during a rollout you might have:
worker-abc123 (MOSB for old rendered-worker-1) - successful
worker-def456 (MOSB for new rendered-worker-2) - newly successful
During a rollout, there could be two successful MOSBs (one for the old config, one for the new), and mosc.Status.MachineOSBuild would point to the newer one. But if no nodes have started updating, we need to serve the image for the old config, not the new one.
So if someone scaled a node during a MOSB update (pre-node boot), this would be a safer approach.
ack, that makes sense to me! I guess in this case, the node would go through another reboot, once the new image build is complete. I suppose it doesn't make sense to wait to serve the new node until the MOSB is done building?
I'm not sure that I can block it at the bootstrap level
I don't think I follow - couldn't the MCS just not respond to the new node until the image is ready? I'd imagine the new node will just keep sending requests until it gets a response from the MCS?
I'm also okay with handling this case as a follow-up; it's just not clear to me how serving an older image is any better than serving this node a legacy MachineConfig. It would just result in an additional reboot for the node once the build is complete.
From my experience with this, it would block the node scale-up. I haven't monitored the node logs closely in this case (during scale-up), but when I did (during debugging) I never saw any retries; the console would just stop showing logs. So intuitively speaking, if we have the MCS block waiting for the new MOSB to finish, we will time out and the node boot will fail. This is not something I'm confident about and will need to follow up with Jerry/try it out myself.
Agree, I think it's worth experimenting in a different card; it's definitely not a blocking issue for me!
In my experience, the node should continually retry if the MCS doesn't respond to its requests. If the MCS does serve it something that is not usable (such as an internal registry image), it will bork the node, yeah. And I also don't think we'd want the MCS to block while processing the request until the build is complete; it can silently exit the request (perhaps we just log for debugging) if the currently ref'd MOSB is building. When the build does complete later, a node request after that can be responded to. I might be making some bad assumptions here, so it is definitely worth testing 😅
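For concreteness, the "decline to serve while the build is in progress" idea could look roughly like this. A minimal sketch with stand-in types only; the real MCO API tracks build state via status conditions, not the simplified State field used here:

```go
package main

import "fmt"

// Simplified stand-ins for the real MCO types; names are illustrative.
type BuildState string

const (
	BuildSucceeded BuildState = "Succeeded"
	BuildBuilding  BuildState = "Building"
)

type MachineOSBuild struct {
	Name      string
	State     BuildState
	ImageSpec string // final image pullspec once the build succeeds
}

// imageForServe returns the image pullspec to embed in the served config,
// or "" when the MCS should decline to answer yet. The caller logs and
// drops the request; the booting node retries its config fetch later.
func imageForServe(mosb *MachineOSBuild) string {
	if mosb == nil || mosb.State != BuildSucceeded {
		// Build missing or still running: don't hand the node an image
		// that does not exist yet.
		return ""
	}
	return mosb.ImageSpec
}

func main() {
	building := &MachineOSBuild{Name: "worker-def456", State: BuildBuilding}
	done := &MachineOSBuild{Name: "worker-def456", State: BuildSucceeded,
		ImageSpec: "quay.io/example/os@sha256:abc"}
	fmt.Println(imageForServe(building) == "") // declined while building
	fmt.Println(imageForServe(done))           // served once succeeded
}
```

The key design point matches the comment above: the server exits the request quietly rather than blocking in the handler, and a later request (after the build completes) gets the image.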
I'd like to pitch something slightly different, which is that we should potentially match the existing node scale-up behaviour for nodes that are not in on-cluster image mode.
For the "regular" MCS, we serve the new version iff the new render successfully rolled out to one node. To match this behaviour, we should potentially increase the requirement the other way and only roll out the build if it's successful and any node has updated to it.
This will cause more reboots generally but comes with safety for node scaling while updates are happening. We should do it as a followup regardless, but just wanted to add that there.
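That proposed gating could be sketched as follows. This is only an illustration of the condition being described, with made-up types; the real check would live against the mcfgv1 pool and build objects:

```go
package main

import "fmt"

// Illustrative stand-in; the real pool status lives in the mcfgv1 API.
type Pool struct {
	UpdatedMachineCount int // nodes already updated to the new rendered config
}

// shouldServeNewImage mirrors the "regular" MCS behaviour for plain
// MachineConfigs: only hand a scaled-up node the new image once the build
// succeeded AND at least one existing node has rolled out to it.
func shouldServeNewImage(buildSucceeded bool, pool Pool) bool {
	return buildSucceeded && pool.UpdatedMachineCount > 0
}

func main() {
	// Build done but nothing has updated yet: keep serving the old image.
	fmt.Println(shouldServeNewImage(true, Pool{UpdatedMachineCount: 0}))
	// Build done and proven out on nodes: safe to serve.
	fmt.Println(shouldServeNewImage(true, Pool{UpdatedMachineCount: 2}))
}
```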
thank you @yuqi-zhang - @djoshy, if it's ok with you, I'm going to write up and test the proposed thoughts around this and, if successful, add them to a follow-up PR. This work (as-is) will get the core functionality in, and we can add the modifications later (considering this is for an edge case[s]). LMK if that sounds ok with you?
ack, I think that makes sense to me! That does make it cleanly fit into our existing methodology.
@dkhater-redhat Could I trouble you to write a briefer version of this comment above this block of code to explain why we don't directly use the MOSB ref? I know it would be helpful for future me 😄 And perhaps make a card for the follow-up work and link it there as a TODO too?
EDIT: @dkhater-redhat ninja'd me, but I'm good with your proposal!
@dkhater-redhat: This pull request references MCO-1898 which is a valid jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
yes! my plan is to squash everything when the team is ready to give an LGTM (in case I need to go back and debug) 😄
Force-pushed from 580bc72 to cb5982e
Nice change, thanks for considering the feedback around image registry URL detection.
Force-pushed from cb5982e to 11ccab8
Force-pushed from 11ccab8 to c17ebcd
@dkhater-redhat: The following tests failed; say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
- What I did
  - The MCS now writes two new image annotations:
    a. machineconfiguration.openshift.io/currentImage
    b. machineconfiguration.openshift.io/desiredImage
    c. alongside the existing MC/state keys when generating the initial node-annotations file (/etc/machine-config-daemon/node-annotations.json).
  - The server can now pass the resolved rendered OS image (e.g., MOSC/MOSB output) into appendNodeAnnotations(...) so new nodes start with authoritative image annotations.
Impact:
New nodes pivot/validate directly against the intended layered image during bootstrap. In image mode, new nodes successfully deploy the rendered image and can complete without an extra reboot when no reboot-requiring changes are present.
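For illustration, the initial node-annotations file could then look something like the following. The currentImage/desiredImage keys are from this PR and the currentConfig/desiredConfig keys are the existing MC/state keys; the rendered-config name and image digest are made-up placeholders:

```json
{
  "machineconfiguration.openshift.io/currentConfig": "rendered-worker-abc123",
  "machineconfiguration.openshift.io/desiredConfig": "rendered-worker-abc123",
  "machineconfiguration.openshift.io/currentImage": "quay.io/example/os-image@sha256:0123abcd",
  "machineconfiguration.openshift.io/desiredImage": "quay.io/example/os-image@sha256:0123abcd"
}
```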
- How to verify it
  1. Create a MachineOSConfig with renderedImagePushSpec pointing at your Quay repo/tag.
  2. Scale up a node: oc scale machineset.machine.openshift.io -n openshift-machine-api dkhater-10-15-2025-a-cpb2n-worker-us-east-1c --replicas=1
  Expected:
  - /etc/machine-config-daemon/currentimage contains your Quay digest.
  - rpm-ostree status shows the same digest as the booted deployment.
  Please note: if you use the internal image registry, you will see the legacy two node boots occur.
- Description for the changelog
MCS now embeds current/desired image annotations in the initial node annotations at bootstrap. This makes the MCD pivot/validate directly against the rendered layered OS image. New nodes can complete image-mode provisioning without an unnecessary reboot when no reboot-requiring changes are present.