Skip to content

Conversation

kurlov
Copy link
Member

@kurlov kurlov commented Jul 1, 2025

  • Adds a new Go script to generates catalog-template.yaml from a new bundles.yaml file.
  • The bundles.yaml has a lists operator bundle images with versions, the oldest_supported_version and a list of broken_versions.
    • oldest_supported_version specifies what is the lowest supported version. Any version or channel before oldest_supported_version will be marked as deprecated.
    • broken_versions a list of versions which should be skipped. For each broken version X.Y.Z the script adds "skips" for all versions > X.Y.Z and < X.Y+2.0
  • Add CI step to checks which validates that catalog-template.yaml is up-to-date with bundles.yaml
  • Add CI to run go unit tests if cmd folder is changed

catalog-template.yaml changes:

  • Now it's autogenerated
  • All yaml anchors are dropped
  • Add schema: olm.bundle deprecation references for all version < oldest_supported_version. So not only channels are deprecated.
  • Add rhacs-3.63 channel
  • skips is only added for broken version + 2 minor version
  • Channels keeps all previous version within the same major version (e.g. 3.66 channel has all version <= 3.66.x)
  • latest channel has all 3.X.X versions
  • stable channel has all 4.X.X versions

Testing this PR catalogs on OCP 4.19:

image: quay.io/rhacs-eng/stackrox-operator-index@sha256:9345ef4e16b463205ab9dcf4446c5a34babc8c497ad9cbbeae327fb44b2f356d
commit: ca7b4bd

Now not only channels but versions are also deprecated.

deprecations

Test locally:

make generate-catalog-template
make clean && make valid-catalogs

This comment was marked as off-topic.

@kurlov kurlov changed the title WIP script to generate catalog-template.yaml DO NOT REVIEW YET: script to generate catalog-template.yaml Jul 1, 2025
@kurlov kurlov changed the title DO NOT REVIEW YET: script to generate catalog-template.yaml WIP: script to generate catalog-template.yaml Jul 1, 2025
@kurlov kurlov changed the title WIP: script to generate catalog-template.yaml Script to generate catalog-template.yaml Jul 9, 2025
@kurlov kurlov marked this pull request as ready for review July 10, 2025 12:23
@kurlov kurlov changed the title Script to generate catalog-template.yaml ROX-30064: Script to generate catalog-template.yaml Jul 10, 2025
@kurlov kurlov requested review from porridge and mclasmeier July 11, 2025 12:57
@msugakov msugakov self-requested a review July 11, 2025 13:46
Copy link
Contributor

@msugakov msugakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratched the surface a bit. Got enough comments for the first round of review.
Please expect more review rounds and more comments.

@kurlov kurlov requested a review from msugakov July 15, 2025 10:05
@kurlov kurlov requested a review from msugakov August 18, 2025 14:32
Copy link
Contributor

@msugakov msugakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments on inter-diff.

Copy link
Contributor

@msugakov msugakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more thoughts for today. Will make another pass after you update the PR.

Comment on lines 232 to 234
validForChannel := channel.YStreamVersion != nil &&
entry.Version.Major() == channel.YStreamVersion.Major() &&
entry.Version.Minor() <= channel.YStreamVersion.Minor()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be consistent with what addReplaces does (ref. versionsWithoutReplaces). Right now it is not if you consider what happens when we'd introduce 5.0.0. Do you have ideas how to make these two parts more consistent and at least make sure that when we change one we don't forget the other?

I at this point don't have a suggestion. Need to think about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find a nice solution, probably will come with some idea tomorrow

Copy link
Member Author

@kurlov kurlov Aug 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add 5.0.0 to versionsWithoutReplaces that means that we want to introduce yet another non versioned channel (like stable and latest). So it's hard to implement a nice solution unless we add a new configuration to bundles.yaml (e.g.:

...
pivot_channels:
  - name: latest
    start_version: 3.62.0
  - name: stable
    start_version: 4.0.0
  - name: new
    start_version: 5.0.0
...

I think it's fine to keep the same deprecation messages for deprecated non versioned (pivotal) channels

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe there are plans to do that: introduce another rolling channel for 5.0.0+. stable will remain and will get new versions.

Comment on lines +9 to +10
- image: registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:7fd7595e6a61352088f9a3a345be03a6c0b9caa0bbc5ddd8c61ba1d38b2c3b8e
version: 3.62.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This remains stuck in my head and I keep thinking we need it. Right now, we have no validation that version is really the correct one for the image. We can check that by pulling the image's manifest and inspecting its labels. One of them should be version.
I think, we need such a check in CI. If you agree, could you please create a follow-up ticket?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it will protect from one scenario:
if all versions and images are correct but shuffled (e.g. 4.8.2 image tagged with version 4.8.1 and vise versa). This scenario will not break output catalogs though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's still worth to introduce such check? It will significantly increase script execution time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the case when I do

- image: registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:<wrong-digest>
  version: <correct-version>

If this image really exists, I'm not sure if there's any subsequent check that catches that.

Also, and may be even worse is the situation when we'd test wrong image before the release:

- image: quay.io/rhacs-eng/release-operator-bundle@sha256:<wrong-digest>
  version: <correct-version>

Here I may test the wrong thing and conclude it's ready to be released whereas the right thing may not be ready.

Yes, the check will delay the execution but it does not have to be invoked as part of the same generation command, or at least not by default. It can be set to only run in CI.
Besides, we don't need to pull image blobs. Pulling just the manifests should be enough. If we parallelize calls, may be it won't take more than a couple of minutes to gather all images.

Copy link
Member Author

@kurlov kurlov Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If digest is wrong and image doesn't exist then valid-catalogs will fail with:

registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:94a00394bc5a8ef503fb59db0a7d0ae9e1110866e8aee8ba40cd864cea69ea1a: not found

If image is not from registry.redhat.io/advanced-cluster-security then our CI check-operator-images-are-only-released-images will fail.

If image exists and it's from registry.redhat.io/advanced-cluster-security but its image version is not presented in the catalog then valid-catalogs will fail. I tested it locally with removing 4.8.2 version and using 4.8.2 image for 4.8.1 and I got this error:

package "rhacs-operator", bundle "rhacs-operator.v4.8.2" not found in any channel entries

There is no 4.8.2 version in the bundles.yaml but I used 4.8.2 image for 4.8.1 version

I can add a TODO, I just want to highlight that pulling images and checking them will only protect us from shuffling versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@porridge and I today briefly discussed https://issues.redhat.com/browse/ROX-30604 which is about the same thing. We should discuss it when all three of us are online.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to highlight that pulling images and checking them will only protect us from shuffling versions.

The code currently uses images and versions independently. There's no actual problem (currently) in case versions and images are shuffled.


It seems opm will fail in a couple of cases such as referencing the same image twice or the test you did with 4.8.1 and 4.8.2.

opm error messages aren't the most readable ones though.


If we don't implement the check we're at mercy of opm and have to hope it will keep failing where we need it to fail.
I'm getting more inclined to agree that we shouldn't implement the check, but like I wrote above let's discuss it together.

@kurlov kurlov requested a review from msugakov August 19, 2025 17:28
Copy link
Contributor

@msugakov msugakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick incomplete check.

if err != nil {
return fmt.Errorf("failed to read bundle list file: %v", err)
}
versions := getAllVersions(input.Images)
versions := getAllVersions(config.Images)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand your reply.

If it is more convenient for some functions to work with []*semver.Version instead of []InputBundleImage (and I agree it is), provide them with []*semver.Version.

My concern is that the line versions := getAllVersions(config.Images) is quite low-level in this high-level generateCatalogTemplateFile() function so that it makes me feel uncomfortable about it.

My suggestion is to push it lower in the stack: either to declare Versions []*semver.Version attribute as part of the Input struct and populate that in the readInputFile() call, or declare func (i *Input) Versions() []*semver.Version { ... } to construct that slice dynamically.
Whichever way, the line versions := getAllVersions(config.Images) will disappear from generateCatalogTemplateFile() and that's what I advocate for.

Comment on lines +9 to +10
- image: registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:7fd7595e6a61352088f9a3a345be03a6c0b9caa0bbc5ddd8c61ba1d38b2c3b8e
version: 3.62.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the case when I do

- image: registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:<wrong-digest>
  version: <correct-version>

If this image really exists, I'm not sure if there's any subsequent check that catches that.

Also, and may be even worse is the situation when we'd test wrong image before the release:

- image: quay.io/rhacs-eng/release-operator-bundle@sha256:<wrong-digest>
  version: <correct-version>

Here I may test the wrong thing and conclude it's ready to be released whereas the right thing may not be ready.

Yes, the check will delay the execution but it does not have to be invoked as part of the same generation command, or at least not by default. It can be set to only run in CI.
Besides, we don't need to pull image blobs. Pulling just the manifests should be enough. If we parallelize calls, may be it won't take more than a couple of minutes to gather all images.

Copy link
Contributor

@msugakov msugakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made another incomplete pass. I still skip unit tests.

@kurlov kurlov requested a review from msugakov August 21, 2025 15:28
Copy link
Contributor

@msugakov msugakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things I spotted in the inter-diff

Comment on lines +109 to +117
// generatePackageWithIcon creates a new "olm.package" object with an operator icon.
func generatePackageWithIcon() (Package, error) {
data, err := os.ReadFile(iconFile)
if err != nil {
return Package{}, fmt.Errorf("failed to read %s: %v", iconFile, err)
}
iconBase64 := base64.StdEncoding.EncodeToString(data)

return Package{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is not what I meant in #137 (comment)

I meant that you may have a

func newPackage(iconBase64) {
	// doing this and only this
	return Package{
		Schema:         olmPackageSchema,
		Name:           rhacsOperator,
		DefaultChannel: stableChannelName,
		Icon: Icon{
			Base64data: iconBase64,
			MediaType:  "image/png",
		},
	}
}

Reading icon file in types.go feels wrong.
I suggest to keep generatePackageWithIcon() in generate.go but introduce newPackage() in types.go.

Comment on lines +211 to +214
for i, channel := range channels {
for _, entry := range channelEntries {
if channelShouldHaveEntry(channel, entry) {
channels[i].Entries = append(channels[i].Entries, entry)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a mutation of the original channel list.

I can't spot it. Please show it to me.

I see a mutation of .Entries in blah.Entries; blah stays the same, doesn't it?

Comment on lines +221 to +225
func (e *ChannelEntry) addSkipRange(version, previousYStreamVersion *semver.Version) {
skipRangeFrom := previousYStreamVersion
skipRangeTo := version
e.SkipRange = fmt.Sprintf(">= %s < %s", skipRangeFrom, skipRangeTo)
}
Copy link
Contributor

@msugakov msugakov Sep 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably there's no need to alias variables any more

Suggested change
func (e *ChannelEntry) addSkipRange(version, previousYStreamVersion *semver.Version) {
skipRangeFrom := previousYStreamVersion
skipRangeTo := version
e.SkipRange = fmt.Sprintf(">= %s < %s", skipRangeFrom, skipRangeTo)
}
func (e *ChannelEntry) addSkipRange(skipRangeFrom, skipRangeTo *semver.Version) {
e.SkipRange = fmt.Sprintf(">= %s < %s", skipRangeFrom, skipRangeTo)
}

IMPORTANT: I swapped the order of arguments compared to the original one. You also need to change the order in the calls to addSkipRange(), I'll make a suggestion.

version: version,
}
entry.addReplaces(version, previousEntryVersion)
entry.addSkipRange(version, previousYStreamVersion)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
entry.addSkipRange(version, previousYStreamVersion)
entry.addSkipRange(previousYStreamVersion, version)

Comment on lines +207 to +208
func TestValidateImageReferences(t *testing.T) {
tests := []struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can suggest a couple more cases. Just providing the references to test:

  1. bare-image-name-without-registry@sha256:6cdcf20771f9c46640b466f804190d00eaf2e59caee6d420436e78b283d177bf
  2. example.com/image:my-fancy-tag-v1@sha256:6cdcf20771f9c46640b466f804190d00eaf2e59caee6d420436e78b283d177bf
  3. example.com/image@md5:9241e37fcf7f3f88c5e944bd46b0a268
  4. example.com/image@sha256:0123456789

Comment on lines +288 to +304
if currentVersion.Major() != nextVersion.Major() {
if currentVersion.Major()+1 != nextVersion.Major() || nextVersion.Minor() != 0 || nextVersion.Patch() != 0 {
return fmt.Errorf("invalid version %s: a new major version should start from %d.0.0", nextVersion, currentVersion.Major()+1)
}
}

if currentVersion.Major() == nextVersion.Major() && currentVersion.Minor() != nextVersion.Minor() {
if currentVersion.Minor()+1 != nextVersion.Minor() || nextVersion.Patch() != 0 {
return fmt.Errorf("invalid version %s: a new minor version should be %s", currentVersion, nextVersion)
}
}

if currentVersion.Major() == nextVersion.Major() && currentVersion.Minor() == nextVersion.Minor() {
if nextVersion.Patch() != currentVersion.Patch()+1 {
return fmt.Errorf("invalid version %s: is not followed by %s", currentVersion, nextVersion)
}
}
Copy link
Contributor

@msugakov msugakov Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's quite a bit of repetition and error messages don't seem the most optimal to me.
How about something like this?

Suggested change
if currentVersion.Major() != nextVersion.Major() {
if currentVersion.Major()+1 != nextVersion.Major() || nextVersion.Minor() != 0 || nextVersion.Patch() != 0 {
return fmt.Errorf("invalid version %s: a new major version should start from %d.0.0", nextVersion, currentVersion.Major()+1)
}
}
if currentVersion.Major() == nextVersion.Major() && currentVersion.Minor() != nextVersion.Minor() {
if currentVersion.Minor()+1 != nextVersion.Minor() || nextVersion.Patch() != 0 {
return fmt.Errorf("invalid version %s: a new minor version should be %s", currentVersion, nextVersion)
}
}
if currentVersion.Major() == nextVersion.Major() && currentVersion.Minor() == nextVersion.Minor() {
if nextVersion.Patch() != currentVersion.Patch()+1 {
return fmt.Errorf("invalid version %s: is not followed by %s", currentVersion, nextVersion)
}
}
if currentVersion.Major() != nextVersion.Major() {
expectedNextVersion = semver.New(currentVersion.Major()+1, 0, 0, "", "")
}
if currentVersion.Major() == nextVersion.Major() && currentVersion.Minor() != nextVersion.Minor() {
expectedNextVersion = semver.New(currentVersion.Major(), currentVersion.Minor()+1, 0, "", "")
}
if currentVersion.Major() == nextVersion.Major() && currentVersion.Minor() == nextVersion.Minor() {
expectedNextVersion = semver.New(currentVersion.Major(), currentVersion.Minor(), currentVersion.Patch()+1, "", "")
}
if expectedNextVersion.Major() != nextVersion.Major() || expectedNextVersion.Minor() != nextVersion.Minor() || expectedNextVersion.Patch() != nextVersion.Patch() {
return fmt.Errorf("unexpected version sequence [%s, %s]: %s should be followed by %s", currentVersion, nextVersion, currentVersion, expectedNextVersion)
}

Comment on lines +132 to +137
if err = hasGapInVersions(versions); err != nil {
return Configuration{}, fmt.Errorf("version gap detected: %v", err)
}
if err = validateBrokenVersions(brokens, versions); err != nil {
return Configuration{}, fmt.Errorf("invalid broken versions: %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both hasGapInVersions and validateBrokenVersions seem to already produce meaningful error messages. I suggest just passing them through.

Suggested change
if err = hasGapInVersions(versions); err != nil {
return Configuration{}, fmt.Errorf("version gap detected: %v", err)
}
if err = validateBrokenVersions(brokens, versions); err != nil {
return Configuration{}, fmt.Errorf("invalid broken versions: %v", err)
}
if err = hasGapInVersions(versions); err != nil {
return Configuration{}, err
}
if err = validateBrokenVersions(brokens, versions); err != nil {
return Configuration{}, err
}


for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := validateImageReferences(tt.images)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll likely have less test setup if you'd tested validateImageReference() instead of validateImageReferences(). Feel free to resolve this comment as no-op.

Comment on lines +150 to +156
version: 4.3.6
- image: registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:1e8115bdc0ed2d01d7576524e9796606244963a1447eae55872674c0113fb9c4
version : 4.3.7
- image: registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:30f9384244819da013358325cec364bb210b3b33b4cf9680698531fe50562aac
version : 4.3.8
- image: registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:94fcee9ac22671bd18be77381214665d9289151703a9ed78c29cee02b92612f4
version : 4.4.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the deal about the whitespace between version and its colon : - version : instead of version:?

Comment on lines +9 to +10
- image: registry.redhat.io/advanced-cluster-security/rhacs-operator-bundle@sha256:7fd7595e6a61352088f9a3a345be03a6c0b9caa0bbc5ddd8c61ba1d38b2c3b8e
version: 3.62.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to highlight that pulling images and checking them will only protect us from shuffling versions.

The code currently uses images and versions independently. There's no actual problem (currently) in case versions and images are shuffled.


It seems opm will fail in a couple of cases such as referencing the same image twice or the test you did with 4.8.1 and 4.8.2.

opm error messages aren't the most readable ones though.


If we don't implement the check we're at mercy of opm and have to hope it will keep failing where we need it to fail.
I'm getting more inclined to agree that we shouldn't implement the check, but like I wrote above let's discuss it together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants