OTA-1531: [3/x] cvo: read version from release metadata on startup #1188

petr-muller
Member

@petr-muller petr-muller commented May 7, 2025

Builds on #1185

CVO is typically executed in a Pod using the release payload image, so its filesystem contains (among other content) the release metadata. The CVO can read that metadata to determine its OCP version, which in turn is needed to establish the feature gate enablement data that drives gated CVO behaviors.

It is useful to establish the feature gate enablement checker early in the CVO execution because doing so broadens the part of the CVO code that can contain gated behavior: until the checker is established, it is not possible to determine whether a gate is enabled or disabled.

Loading the full payload is still quite a heavy operation and can only be done later in the execution (after the CVO acquires the leader lease). The early metadata peek should therefore read and use as little data as necessary to establish the gate checker.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 7, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 7, 2025

@petr-muller: This pull request references OTA-1531 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

CVO is typically executed in a Pod using the release payload image, which means its filesystem contains (among other content) the release metadata which the CVO can use to determine its OCP version, which is helpful to establish the feature gate enablement data to drive gated CVO behaviors.

It is useful to establish the feature gate enablement checker early in the CVO execution because it broadens the part of CVO code that can contain gated behavior (until the checker is established it is not possible to determine whether a gate is enabled or disabled)

Loading the full payload is still quite heavy operation and can only be done later in the execution (after the CVO acquires leader lease). The early metadata peek should read and utilize as few data as is necessary to establish the gate checker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented May 7, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2025

@petr-muller
Member Author

/test unit lint images

@petr-muller
Member Author

/test e2e-agnostic-operator

@petr-muller petr-muller changed the title OTA-1531: cvo: read version from release metadata on startup OTA-1531: [3/x] cvo: read version from release metadata on startup May 7, 2025
@petr-muller
Member Author

/test ci/prow/e2e-agnostic-operator

Contributor

openshift-ci bot commented May 7, 2025

@petr-muller: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-agnostic-operator
/test e2e-agnostic-operator-techpreview
/test e2e-agnostic-ovn
/test e2e-agnostic-ovn-upgrade-into-change
/test e2e-agnostic-ovn-upgrade-into-change-techpreview
/test e2e-agnostic-ovn-upgrade-out-of-change
/test e2e-agnostic-ovn-upgrade-out-of-change-techpreview
/test e2e-aws-ovn-techpreview
/test e2e-hypershift
/test e2e-hypershift-conformance
/test gofmt
/test images
/test lint
/test unit
/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-agnostic-operator-devpreview
/test okd-scos-e2e-aws-ovn
/test okd-scos-images

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator
pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-operator-devpreview
pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn
pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn-upgrade-into-change
pull-ci-openshift-cluster-version-operator-main-e2e-agnostic-ovn-upgrade-out-of-change
pull-ci-openshift-cluster-version-operator-main-e2e-aws-ovn-techpreview
pull-ci-openshift-cluster-version-operator-main-e2e-hypershift
pull-ci-openshift-cluster-version-operator-main-e2e-hypershift-conformance
pull-ci-openshift-cluster-version-operator-main-gofmt
pull-ci-openshift-cluster-version-operator-main-images
pull-ci-openshift-cluster-version-operator-main-lint
pull-ci-openshift-cluster-version-operator-main-okd-scos-e2e-aws-ovn
pull-ci-openshift-cluster-version-operator-main-unit
pull-ci-openshift-cluster-version-operator-main-verify-deps

In response to this:

/test ci/prow/e2e-agnostic-operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@petr-muller
Member Author

/test e2e-agnostic-operator

@petr-muller
Member Author

/test e2e-agnostic-operator e2e-agnostic-ovn-update-into-change

@petr-muller
Member Author

/test e2e-agnostic-ovn-upgrade-into-change e2e-agnostic-ovn-upgrade-into-change-techpreview

@petr-muller petr-muller force-pushed the ota-1531-03-load-metadata-earlyy branch from e2edeb6 to a6cf93c Compare May 9, 2025 12:45
@petr-muller petr-muller marked this pull request as ready for review May 9, 2025 12:45
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 9, 2025
	if err != nil {
		return nil, err
	}

Member Author


This move only addresses a review comment from an earlier PR: #1185 (comment)

Member


It is interesting that GitHub chose to interpret the change as the stanza at line 311 (the release part) moving down to 318, instead of the stanza at line 318 (the arch part) moving up.

GitHub did not read our conversation carefully enough. 😄

Contributor

openshift-ci bot commented May 9, 2025

@petr-muller: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-agnostic-operator-devpreview | a6cf93c | link | false | /test e2e-agnostic-operator-devpreview

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@petr-muller
Member Author

/label no-qe

@openshift-ci openshift-ci bot added the no-qe Allows PRs to merge without qe-approved label label May 9, 2025
@petr-muller
Member Author

@hongkailiu
Member

/cc

@openshift-ci openshift-ci bot requested a review from hongkailiu May 12, 2025 12:58
Member

@hongkailiu hongkailiu left a comment


The code looks good to me.
I was kind of expecting releaseMetadata.Version to be used somewhere, because its value has been determined earlier.
Will I see it in another PR soon?

Comment on lines +213 to +214
cvoOcpVersion = releaseMetadata.Version
klog.Infof("Determined OCP version for this CVO: %q", cvoOcpVersion)
Member


NIT:

Suggested change
- cvoOcpVersion = releaseMetadata.Version
- klog.Infof("Determined OCP version for this CVO: %q", cvoOcpVersion)
+ klog.Infof("Determined OCP version for this CVO: %q", releaseMetadata.Version)

(Definitely just a matter of choice) I would use "0.0.1-snapshot" in the other 2 places as well (and we could remove cvoOcpVersion). For example,

klog.Warningf("Failed to read release metadata to determine OCP version for this CVO (will use placeholder '0.0.1-snapshot'): %v", err)

I feel the code would be a bit easier to read that way, without having to check whether cvoOcpVersion is used later.

Member Author


I was kind of expecting releaseMetadata.Version to be used somewhere, because its value has been determined earlier. Will I see it in another PR soon?

Not releaseMetadata.Version directly, but cvoOcpVersion will be used later: it will hold either the value read from the release metadata or the placeholder. It makes sense to log the value that will eventually be used, and we should also keep the placeholder value in a single place rather than inline it in logging strings.

Check out the followups:

The "determine OCP version" code will be made into a method:

// getOcpVersion peeks at the local release metadata to determine the version of OCP this CVO belongs to. This assumes
// the CVO is executing in a container from the payload image. This does not and should not load the whole payload
// content; that is only loaded later once the leader lease is acquired. Here we should read as little data as possible
// to determine the version so we can establish the enabled feature gate checker for all following code.
func (o *Options) getOcpVersion() string {
	payloadRoot := payload.DefaultRootPath
	if o.PayloadOverride != "" {
		payloadRoot = payload.RootPath(o.PayloadOverride)
	}
	cvoOcpVersion := "0.0.1-snapshot"
	// We cannot refuse to start CVO if for some reason we cannot determine the OCP version on startup from the local
	// release metadata. The only consequence is we fail to determine enabled/disabled feature gates and will have to use
	// some defaults.
	releaseMetadata, err := payloadRoot.LoadReleaseMetadata()
	switch {
	case err != nil:
		klog.Warningf("Failed to read release metadata to determine OCP version for this CVO (will use placeholder version %q): %v", cvoOcpVersion, err)
	case releaseMetadata.Version == "":
		klog.Warningf("Version missing from release metadata, cannot determine OCP version for this CVO (will use placeholder version %q): %v", cvoOcpVersion, err)
	default:
		cvoOcpVersion = releaseMetadata.Version
		klog.Infof("Determined OCP version for this CVO: %q", cvoOcpVersion)
	}
	return cvoOcpVersion
}

And the determined version will be used to determine the enabled CVO feature gates:

cvoOcpVersion := o.getOcpVersion()
cvoGates := featuregates.DefaultCvoGates(cvoOcpVersion)
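The startup comment notes that an undetermined version only means falling back to some default gate settings. A hypothetical sketch of that shape: a checker is built once from the version, uses a known gate set when the version is recognized, and otherwise uses defaults. gateChecker, newGateChecker and the gate name ExampleGate are invented for illustration; this is not the featuregates package API.

```go
package main

import "fmt"

// gateChecker is a hypothetical checker built once at startup from the CVO's OCP version.
type gateChecker struct {
	enabled map[string]bool
}

// newGateChecker picks the gate set known for version, falling back to defaults
// when the version is not recognized (for example, the placeholder version).
func newGateChecker(version string, known map[string]map[string]bool, defaults map[string]bool) gateChecker {
	if gates, ok := known[version]; ok {
		return gateChecker{enabled: gates}
	}
	return gateChecker{enabled: defaults}
}

// Enabled reports whether gate is on; gates absent from the set are treated as disabled.
func (c gateChecker) Enabled(gate string) bool {
	return c.enabled[gate]
}

func main() {
	known := map[string]map[string]bool{
		"4.20.0": {"ExampleGate": true},
	}
	defaults := map[string]bool{}
	fmt.Println(newGateChecker("4.20.0", known, defaults).Enabled("ExampleGate"))
	fmt.Println(newGateChecker("0.0.1-snapshot", known, defaults).Enabled("ExampleGate"))
}
```

Building the checker once, right after the version is determined, is what lets code that runs before the full payload load consult gates at all.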

	case err != nil:
		klog.Warningf("Failed to read release metadata to determine OCP version for this CVO (will use placeholder version %q): %v", cvoOcpVersion, err)
	case releaseMetadata.Version == "":
		klog.Warningf("Version missing from release metadata, cannot determine OCP version for this CVO (will use placeholder version %q): %v", cvoOcpVersion, err)
Member


Would err have to be nil when it reaches this case?

Suggested change
- klog.Warningf("Version missing from release metadata, cannot determine OCP version for this CVO (will use placeholder version %q): %v", cvoOcpVersion, err)
+ klog.Warningf("Version missing from release metadata, cannot determine OCP version for this CVO (will use placeholder version %q)", cvoOcpVersion)

Member Author


Good point, will address

@hongkailiu
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 12, 2025
Contributor

openshift-ci bot commented May 12, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongkailiu, petr-muller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 6fe4a71 into openshift:main May 12, 2025
14 of 15 checks passed
@petr-muller petr-muller deleted the ota-1531-03-load-metadata-earlyy branch May 12, 2025 18:11
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: cluster-version-operator
This PR has been included in build cluster-version-operator-container-v4.20.0-202505121910.p0.g6fe4a71.assembly.stream.el9.
All builds following this will include this PR.

petr-muller added a commit to petr-muller/cluster-version-operator that referenced this pull request May 13, 2025
Address an earlier review comment: openshift#1188 (comment)
petr-muller added a commit to petr-muller/cluster-version-operator that referenced this pull request May 13, 2025
Address an earlier review comment: openshift#1188 (comment)