MCO-1898: MCS serves image-aware first-boot config #5357
Conversation
@dkhater-redhat: This pull request references MCO-1898 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dkhater-redhat. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from ebca2ba to a4fbb2c
Force-pushed from 7023ffe to c35accf
Generally looks good! Just left a few questions/cleanups that we could do 😄
Could we also squash the second commit?
```go
mosbList, err := cs.machineOSBuildLister.List(labels.Everything())
if err != nil {
	klog.Infof("Could not list MachineOSBuilds for pool %s: %v", pool.Name, err)
	return ""
}

var currentConf string
if pool.Status.UpdatedMachineCount > 0 {
	currentConf = pool.Spec.Configuration.Name
} else {
	currentConf = pool.Status.Configuration.Name
}

var mosb *mcfgv1.MachineOSBuild
for _, build := range mosbList {
	if build.Spec.MachineOSConfig.Name == mosc.Name &&
		build.Spec.MachineConfig.Name == currentConf {
		mosb = build
		break
	}
}

if mosb == nil {
	klog.Infof("Pool %s has MachineOSConfig but no matching MachineOSBuild for MC %s", pool.Name, currentConf)
	return ""
}
```
Couldn't we use the MOSB ref field from the MOSC object directly in the lister? Why do we need to step through the whole list? Sorry if I'm missing something here!
You're right that we could simplify this, but there's an important nuance. The current-machine-os-build annotation does track the "latest successful build", but there can be multiple MOSBs for a single MOSC (one per rendered MachineConfig), and the code in cluster_server.go needs the MOSB for a specific rendered config that depends on the pool's rollout state (either pool.Spec.Configuration.Name or pool.Status.Configuration.Name).
For example, during a rollout you might have:
worker-abc123 (MOSB for old rendered-worker-1) - successful
worker-def456 (MOSB for new rendered-worker-2) - newly successful
During a rollout, there could be two successful MOSBs (one for the old config, one for the new), and mosc.Status.MachineOSBuild would point to the newer one. But if no nodes have started updating, we need to serve the image for the old config, not the new one.
So if someone scaled a node during a MOSB update (pre-node boot), this would be a safer approach.
ack, that makes sense to me! I guess in this case, the node would go through another reboot, once the new image build is complete. I suppose it doesn't make sense to wait to serve the new node until the MOSB is done building?
I'm not sure that I can block it at the bootstrap level
I don't think I follow - couldn't the MCS just not respond to the new node until the image is ready? I'd imagine the new node will just keep sending requests until it gets a response from the MCS?
I'm also okay with handling this case as a follow-up; it's just not clear to me how serving an older image is any better than serving this node a legacy MachineConfig. It would just result in an additional reboot for the node once the build is complete.
From my experience with this, it would block the node scale-up. I haven't monitored the node logs closely in this case (during scale-up), but when I did (during debugging) I never saw any retries; the console would just stop showing logs. So intuitively speaking, if we have the MCS block waiting for the new MOSB to finish, we will time out and the node boot will fail. This is not something I'm confident about and will need to follow up with Jerry/try it out myself.
Agree, I think it's worth experimenting in a different card; it's definitely not a blocking issue for me!
In my experience, the node should continually retry if the MCS doesn't respond to its requests. If the MCS does serve it something that is not usable (such as an internal registry image), it will bork the node, yeah. And I also don't think we'd want the MCS to block while processing the request until the build is complete; it can silently exit the request (perhaps we just log for debugging) if the currently ref'd MOSB is building. When the build does complete later, a node request after that can be responded to. I might be making some bad assumptions here, so it is definitely worth testing 😅
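For concreteness, the "decline to serve while the build is in progress" idea could look roughly like this. A minimal sketch with stand-in types only; the real MCO API tracks build state via status conditions, not the simplified State field used here:

```go
package main

import "fmt"

// Simplified stand-ins for the real MCO types; names are illustrative.
type BuildState string

const (
	BuildSucceeded BuildState = "Succeeded"
	BuildBuilding  BuildState = "Building"
)

type MachineOSBuild struct {
	Name      string
	State     BuildState
	ImageSpec string // final image pullspec once the build succeeds
}

// imageForServe returns the image pullspec to embed in the served config,
// or "" when the MCS should decline to answer yet. The caller logs and
// drops the request; the booting node retries its config fetch later.
func imageForServe(mosb *MachineOSBuild) string {
	if mosb == nil || mosb.State != BuildSucceeded {
		// Build missing or still running: don't hand the node an image
		// that does not exist yet.
		return ""
	}
	return mosb.ImageSpec
}

func main() {
	building := &MachineOSBuild{Name: "worker-def456", State: BuildBuilding}
	done := &MachineOSBuild{Name: "worker-def456", State: BuildSucceeded,
		ImageSpec: "quay.io/example/os@sha256:abc"}
	fmt.Println(imageForServe(building) == "") // declined while building
	fmt.Println(imageForServe(done))           // served once succeeded
}
```

The key design point matches the comment above: the server exits the request quietly rather than blocking in the handler, and a later request (after the build completes) gets the image.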
I'd like to pitch something slightly different, which is that we should potentially match the existing node scale-up behaviour for nodes that are not in on-cluster image mode.
For the "regular" MCS, we serve the new version iff the new render successfully rolled out to one node. To match this behaviour, we should potentially increase the requirement the other way and only roll out the build if it's successful and any node has updated to it.
This will cause more reboots generally but comes with safety for node scaling while updates are happening. We should do it as a followup regardless, but just wanted to add that there.
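That proposed gating could be sketched as follows. This is only an illustration of the condition being described, with made-up types; the real check would live against the mcfgv1 pool and build objects:

```go
package main

import "fmt"

// Illustrative stand-in; the real pool status lives in the mcfgv1 API.
type Pool struct {
	UpdatedMachineCount int // nodes already updated to the new rendered config
}

// shouldServeNewImage mirrors the "regular" MCS behaviour for plain
// MachineConfigs: only hand a scaled-up node the new image once the build
// succeeded AND at least one existing node has rolled out to it.
func shouldServeNewImage(buildSucceeded bool, pool Pool) bool {
	return buildSucceeded && pool.UpdatedMachineCount > 0
}

func main() {
	// Build done but nothing has updated yet: keep serving the old image.
	fmt.Println(shouldServeNewImage(true, Pool{UpdatedMachineCount: 0}))
	// Build done and proven out on nodes: safe to serve.
	fmt.Println(shouldServeNewImage(true, Pool{UpdatedMachineCount: 2}))
}
```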
thank you @yuqi-zhang - @djoshy, if it's ok with you, I'm going to write up and test the proposed thoughts around this and, if successful, add them to a follow-up PR. This work (as-is) will get the core functionality in, and we can add the modifications later (considering this is for an edge case[s]). LMK if that sounds ok with you?
ack, I think that makes sense to me! That does make it cleanly fit into our existing methodology.
@dkhater-redhat Could I trouble you to write a briefer version of this comment above this block of code to explain why we don't directly use the MOSB ref? I know it would be helpful for future me 😄 And perhaps make a card for the follow-up work and link it there as a TODO too?
EDIT: @dkhater-redhat ninja'd me, but I'm good with your proposal!
@dkhater-redhat: This pull request references MCO-1898 which is a valid jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
yes! my plan is to squash everything when the team is ready to give an LGTM (in case I need to go back and debug) 😄
Force-pushed from 580bc72 to cb5982e
Nice change, thanks for considering the feedback around image registry URL detection.
Force-pushed from cb5982e to 11ccab8
Force-pushed from 11ccab8 to c17ebcd
@dkhater-redhat: The following tests failed; say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
- What I did
  - The MCS now writes two new image annotations:
    a. machineconfiguration.openshift.io/currentImage
    b. machineconfiguration.openshift.io/desiredImage
    c. alongside the existing MC/state keys when generating the initial node-annotations file (/etc/machine-config-daemon/node-annotations.json).
  - The server can now pass the resolved rendered OS image (e.g., MOSC/MOSB output) into appendNodeAnnotations(...) so new nodes start with authoritative image annotations.
Impact:
New nodes pivot/validate directly against the intended layered image during bootstrap. In image mode, new nodes successfully deploy the rendered image and can complete without an extra reboot when no reboot-requiring changes are present.
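For illustration, the initial node-annotations file could then look something like the following. The currentImage/desiredImage keys are from this PR and the currentConfig/desiredConfig keys are the existing MC/state keys; the rendered-config name and image digest are made-up placeholders:

```json
{
  "machineconfiguration.openshift.io/currentConfig": "rendered-worker-abc123",
  "machineconfiguration.openshift.io/desiredConfig": "rendered-worker-abc123",
  "machineconfiguration.openshift.io/currentImage": "quay.io/example/os-image@sha256:0123abcd",
  "machineconfiguration.openshift.io/desiredImage": "quay.io/example/os-image@sha256:0123abcd"
}
```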
- How to verify it
  1. Create a MachineOSConfig with renderedImagePushSpec pointing at your Quay repo/tag.
  2. Scale up a node: oc scale machineset.machine.openshift.io -n openshift-machine-api dkhater-10-15-2025-a-cpb2n-worker-us-east-1c --replicas=1
  Expected:
  - /etc/machine-config-daemon/currentimage contains your Quay digest.
  - rpm-ostree status shows the same digest as the booted deployment.
  Please note: if you use the internal image registry, you will see the legacy two node boots occur.
- Description for the changelog
MCS now embeds current/desired image annotations in the initial node annotations at bootstrap. This makes the MCD pivot/validate directly against the rendered layered OS image. New nodes can complete image-mode provisioning without an unnecessary reboot when no reboot-requiring changes are present.