Skip to content

ART-14831: Fix require_consistency check missing duplicate package versions#2692

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift-eng:mainfrom
locriandev:build-sync
Apr 1, 2026
Merged

ART-14831: Fix require_consistency check missing duplicate package versions#2692
openshift-merge-bot[bot] merged 1 commit intoopenshift-eng:mainfrom
locriandev:build-sync

Conversation

@locriandev
Copy link
Copy Markdown
Contributor

Summary

Fixes the rhcos.require_consistency check to properly detect when payload images have multiple versions of the same package installed.

Problem

The consistency check between RHCOS and payload images (e.g., driver-toolkit) was failing to detect duplicate package installations. When driver-toolkit had both kernel-5.14.0-427.116.1 and kernel-5.14.0-427.117.1 installed, the check would randomly pass or fail depending on which version was processed last.

This allowed build-sync to succeed and publish nightly payloads with kernel mismatches, which were only caught later by Prow CI with errors like:

ERROR: driver-toolkit and node kernel version mismatch: 
(5.14.0-427.116.1.el9_4.x86_64 != 5.14.0-427.117.1.el9_4.x86_64)

Root Cause

get_all_installed_package_build_dicts() returns a Dict[package_name, build_dict]. When multiple versions of the same package are installed, the dictionary key collision causes only the last version to be retained, hiding the duplicate.

Expected Behavior

After this fix:

  1. When driver-toolkit has multiple kernel versions installed, gen-payload exits with code 1
  2. This causes build-sync to fail (red build) before the nightly payload is created
  3. ART is alerted immediately instead of finding out later from Prow CI failures

Test

https://art-jenkins.apps.prod-stable-spoke1-dc-iad2.itup.redhat.com/job/aos-cd-builds/job/build%252Fbuild-sync-konflux/17374/consoleFull

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 1, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 1, 2026

@locriandev: This pull request references ART-14831 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Summary

Fixes the rhcos.require_consistency check to properly detect when payload images have multiple versions of the same package installed.

Problem

The consistency check between RHCOS and payload images (e.g., driver-toolkit) was failing to detect duplicate package installations. When driver-toolkit had both kernel-5.14.0-427.116.1 and kernel-5.14.0-427.117.1 installed, the check would randomly pass or fail depending on which version was processed last.

This allowed build-sync to succeed and publish nightly payloads with kernel mismatches, which were only caught later by Prow CI with errors like:

ERROR: driver-toolkit and node kernel version mismatch: 
(5.14.0-427.116.1.el9_4.x86_64 != 5.14.0-427.117.1.el9_4.x86_64)

Root Cause

get_all_installed_package_build_dicts() returns a Dict[package_name, build_dict]. When multiple versions of the same package are installed, the dictionary key collision causes only the last version to be retained, hiding the duplicate.

Expected Behavior

After this fix:

  1. When driver-toolkit has multiple kernel versions installed, gen-payload exits with code 1
  2. This causes build-sync to fail (red build) before the nightly payload is created
  3. ART is alerted immediately instead of finding out later from Prow CI failures

Test

https://art-jenkins.apps.prod-stable-spoke1-dc-iad2.itup.redhat.com/job/aos-cd-builds/job/build%252Fbuild-sync-konflux/17374/consoleFull

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4cee01bd-ee92-43b3-acea-2cff246665b3

📥 Commits

Reviewing files that changed from the base of the PR and between b4f2435 and cd195a7.

📒 Files selected for processing (1)
  • doozer/doozerlib/cli/release_gen_payload.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • doozer/doozerlib/cli/release_gen_payload.py

Walkthrough

validate_pkg_consistency_req was changed to branch by self.runtime.build_system (Konflux vs Brew). Konflux now parses installed package NVRs from the build record and detects duplicate versions; Brew performs an installed-RPM pre-check for missing or multiple versions before falling back to existing lookup logic. Docstring expanded.

Changes

Cohort / File(s) Summary
Package Validation Logic
doozer/doozerlib/cli/release_gen_payload.py
Reworked validate_pkg_consistency_req to branch behavior by self.runtime.build_system. Konflux: retrieve and parse installed package NVRs from build record, fail on none or multiple matches, select single match and cache via rpm finder for comparison. Brew: scan installed RPM dicts for matching NVRs, fail on none or multiple versions, then fall back to prior get_all_installed_package_build_dicts-by-name lookup for RPM version comparison. Expanded docstring to document Konflux duplicate-version detection.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
doozer/doozerlib/cli/release_gen_payload.py (2)

2395-2400: Docstring is incomplete: Brew path also detects duplicate package versions.

The docstring mentions "For Konflux builds, this also detects when multiple versions..." but the Brew path (lines 2440-2468) also implements duplicate version detection now. Update the docstring to reflect that both build systems have this check.

📝 Proposed docstring update
     def validate_pkg_consistency_req(
         ...
     ) -> Optional[AssemblyIssue]:
         """
         Check that the specified package in the member is consistent with the RHCOS build.

-        For Konflux builds, this also detects when multiple versions of the same package
-        are installed (e.g., two different kernel versions), which should never happen.
+        This also detects when multiple versions of the same package are installed
+        (e.g., two different kernel versions), which should never happen. This check
+        applies to both Konflux and Brew builds.
         """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doozer/doozerlib/cli/release_gen_payload.py` around lines 2395 - 2400, Update
the function docstring (the docstring that starts "Check that the specified
package in the member is consistent with the RHCOS build.") to state that
duplicate-package-version detection applies to both Konflux builds and the Brew
code path as well; specifically mention that the Brew branch in this function
also detects multiple versions of the same package (e.g., two kernel versions).
Keep the rest of the docstring intact and ensure the wording references both
build systems so readers know duplicate-version checks run for Konflux and Brew
paths in this function.

2436-2438: Add a public method to PackageRpmFinder instead of accessing private attributes.

The code directly accesses private attributes _cache_packages() and _packages_build_dicts, coupling to internal implementation details. Consider adding a public method (e.g., get_package_build_dict(package_nvr)) to encapsulate this functionality. The class already has get_installed_packages_build_dicts() for a related use case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doozer/doozerlib/cli/release_gen_payload.py` around lines 2436 - 2438, The
code in release_gen_payload uses private members of PackageRpmFinder
(pkg_finder._cache_packages and pkg_finder._packages_build_dicts) which couples
to internals; add a public method on PackageRpmFinder (for example
get_package_build_dict(package_nvr)) that ensures packages are cached and
returns the build dict for a given package, reuse existing logic from
get_installed_packages_build_dicts where appropriate, then replace the calls in
release_gen_payload to call
pkg_finder.get_package_build_dict(matching_packages[0]) instead of touching
_cache_packages or _packages_build_dicts directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doozer/doozerlib/cli/release_gen_payload.py`:
- Around line 2471-2474: The code assumes member_nvrs[pkg] always exists but
get_all_installed_package_build_dicts() can return a different set than the
earlier finder, risking a KeyError; change the access to defensively check for
the key (use member_nvrs.get(pkg)) and if missing either raise a clear,
informative exception (include pkg, package_rpm_finder/self.package_rpm_finder
and that get_all_installed_package_build_dicts() did not return an entry) or
handle it gracefully (log the missing pkg and skip/continue), updating any
callers to expect the new error/flow; refer to
get_all_installed_package_build_dicts, member_nvrs, pkg and
package_rpm_finder/self.package_rpm_finder to locate the change.
- Around line 2446-2448: The parse_nvr call on rpm_nvr (from rpm_dict) can raise
ValueError for malformed NVRs; wrap the call to parse_nvr(rpm_nvr) in a
try/except ValueError block (around the code using rpm_dict['nvr'] and parsed)
and on exception log a warning (including rpm_nvr and pkg for context) and
continue/skip that entry so the loop does not crash; ensure you reference
rpm_dict, rpm_nvr, parse_nvr, parsed and pkg when adding the handling.
- Around line 2413-2416: The list comprehension that builds matching_packages
calls parse_nvr(pkg_nvr) which can raise ValueError for malformed NVRs; replace
the comprehension with an explicit loop in release_gen_payload.py that iterates
over installed_packages, calls parse_nvr(pkg_nvr) inside a try/except
ValueError, skips (and preferably logs a warning about) malformed pkg_nvr, and
only appends pkg_nvr when parsed['name'] == pkg; keep identifiers
matching_packages, installed_packages, pkg_nvr, pkg and parse_nvr so the change
is easy to locate.

---

Nitpick comments:
In `@doozer/doozerlib/cli/release_gen_payload.py`:
- Around line 2395-2400: Update the function docstring (the docstring that
starts "Check that the specified package in the member is consistent with the
RHCOS build.") to state that duplicate-package-version detection applies to both
Konflux builds and the Brew code path as well; specifically mention that the
Brew branch in this function also detects multiple versions of the same package
(e.g., two kernel versions). Keep the rest of the docstring intact and ensure
the wording references both build systems so readers know duplicate-version
checks run for Konflux and Brew paths in this function.
- Around line 2436-2438: The code in release_gen_payload uses private members of
PackageRpmFinder (pkg_finder._cache_packages and
pkg_finder._packages_build_dicts) which couples to internals; add a public
method on PackageRpmFinder (for example get_package_build_dict(package_nvr))
that ensures packages are cached and returns the build dict for a given package,
reuse existing logic from get_installed_packages_build_dicts where appropriate,
then replace the calls in release_gen_payload to call
pkg_finder.get_package_build_dict(matching_packages[0]) instead of touching
_cache_packages or _packages_build_dicts directly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9e13d5fe-17df-4fe8-9d90-0ba68323490b

📥 Commits

Reviewing files that changed from the base of the PR and between de37e29 and b4f2435.

📒 Files selected for processing (1)
  • doozer/doozerlib/cli/release_gen_payload.py

Comment on lines +2413 to +2416
matching_packages = [
pkg_nvr for pkg_nvr in installed_packages
if parse_nvr(pkg_nvr)['name'] == pkg
]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

parse_nvr can raise ValueError on malformed NVRs.

Per artcommonlib/rpm_utils.py:53, parse_nvr raises ValueError for invalid NVR formats. If installed_packages contains a malformed entry, this will cause an unhandled exception. While build records should have valid NVRs, defensive handling would improve robustness.

🛡️ Proposed defensive handling
             # Filter for packages matching the name we're checking
-            matching_packages = [
-                pkg_nvr for pkg_nvr in installed_packages
-                if parse_nvr(pkg_nvr)['name'] == pkg
-            ]
+            matching_packages = []
+            for pkg_nvr in installed_packages:
+                try:
+                    if parse_nvr(pkg_nvr)['name'] == pkg:
+                        matching_packages.append(pkg_nvr)
+                except ValueError:
+                    logger.warning(f"Skipping malformed NVR in installed_packages: {pkg_nvr}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
matching_packages = [
pkg_nvr for pkg_nvr in installed_packages
if parse_nvr(pkg_nvr)['name'] == pkg
]
matching_packages = []
for pkg_nvr in installed_packages:
try:
if parse_nvr(pkg_nvr)['name'] == pkg:
matching_packages.append(pkg_nvr)
except ValueError:
logger.warning(f"Skipping malformed NVR in installed_packages: {pkg_nvr}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doozer/doozerlib/cli/release_gen_payload.py` around lines 2413 - 2416, The
list comprehension that builds matching_packages calls parse_nvr(pkg_nvr) which
can raise ValueError for malformed NVRs; replace the comprehension with an
explicit loop in release_gen_payload.py that iterates over installed_packages,
calls parse_nvr(pkg_nvr) inside a try/except ValueError, skips (and preferably
logs a warning about) malformed pkg_nvr, and only appends pkg_nvr when
parsed['name'] == pkg; keep identifiers matching_packages, installed_packages,
pkg_nvr, pkg and parse_nvr so the change is easy to locate.

Comment on lines +2446 to +2448
rpm_nvr = rpm_dict['nvr']
parsed = parse_nvr(rpm_nvr)
if parsed['name'] == pkg:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Same parse_nvr ValueError risk in Brew path.

The parse_nvr call here has the same risk of raising ValueError on malformed NVRs as noted in the Konflux path.

🛡️ Proposed defensive handling
             for rpm_dict in all_rpm_dicts:
                 rpm_nvr = rpm_dict['nvr']
-                parsed = parse_nvr(rpm_nvr)
-                if parsed['name'] == pkg:
+                try:
+                    parsed = parse_nvr(rpm_nvr)
+                except ValueError:
+                    logger.warning(f"Skipping malformed RPM NVR: {rpm_nvr}")
+                    continue
+                if parsed['name'] == pkg:
                     # Extract package NVR from RPM NVR (RPMs have arch suffix)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rpm_nvr = rpm_dict['nvr']
parsed = parse_nvr(rpm_nvr)
if parsed['name'] == pkg:
rpm_nvr = rpm_dict['nvr']
try:
parsed = parse_nvr(rpm_nvr)
except ValueError:
logger.warning(f"Skipping malformed RPM NVR: {rpm_nvr}")
continue
if parsed['name'] == pkg:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doozer/doozerlib/cli/release_gen_payload.py` around lines 2446 - 2448, The
parse_nvr call on rpm_nvr (from rpm_dict) can raise ValueError for malformed
NVRs; wrap the call to parse_nvr(rpm_nvr) in a try/except ValueError block
(around the code using rpm_dict['nvr'] and parsed) and on exception log a
warning (including rpm_nvr and pkg for context) and continue/skip that entry so
the loop does not crash; ensure you reference rpm_dict, rpm_nvr, parse_nvr,
parsed and pkg when adding the handling.

Comment on lines +2471 to +2474
member_nvrs: Dict[str, Dict] = bri.get_all_installed_package_build_dicts(
package_rpm_finder or self.package_rpm_finder
) # by name
build = member_nvrs[pkg] # We know this exists now
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential KeyError if methods return inconsistent data.

The comment states "We know this exists now" based on finding the package via get_all_installed_rpm_dicts(), but get_all_installed_package_build_dicts() is a separate method that queries different data. If these methods have any filtering differences, member_nvrs[pkg] could raise KeyError.

🛡️ Proposed defensive check
             # Get the build dict using the old method (now we know there's only one)
             member_nvrs: Dict[str, Dict] = bri.get_all_installed_package_build_dicts(
                 package_rpm_finder or self.package_rpm_finder
             )  # by name
-            build = member_nvrs[pkg]  # We know this exists now
+            build = member_nvrs.get(pkg)
+            if not build:
+                return AssemblyIssue(
+                    f"Package '{pkg}' found in RPM list but not in package build dicts for '{payload_tag}'",
+                    payload_tag,
+                    AssemblyIssueCode.FAILED_CONSISTENCY_REQUIREMENT,
+                )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
member_nvrs: Dict[str, Dict] = bri.get_all_installed_package_build_dicts(
package_rpm_finder or self.package_rpm_finder
) # by name
build = member_nvrs[pkg] # We know this exists now
member_nvrs: Dict[str, Dict] = bri.get_all_installed_package_build_dicts(
package_rpm_finder or self.package_rpm_finder
) # by name
build = member_nvrs.get(pkg)
if not build:
return AssemblyIssue(
f"Package '{pkg}' found in RPM list but not in package build dicts for '{payload_tag}'",
payload_tag,
AssemblyIssueCode.FAILED_CONSISTENCY_REQUIREMENT,
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doozer/doozerlib/cli/release_gen_payload.py` around lines 2471 - 2474, The
code assumes member_nvrs[pkg] always exists but
get_all_installed_package_build_dicts() can return a different set than the
earlier finder, risking a KeyError; change the access to defensively check for
the key (use member_nvrs.get(pkg)) and if missing either raise a clear,
informative exception (include pkg, package_rpm_finder/self.package_rpm_finder
and that get_all_installed_package_build_dicts() did not return an entry) or
handle it gracefully (log the missing pkg and skip/continue), updating any
callers to expect the new error/flow; refer to
get_all_installed_package_build_dicts, member_nvrs, pkg and
package_rpm_finder/self.package_rpm_finder to locate the change.

@joepvd
Copy link
Copy Markdown
Contributor

joepvd commented Apr 1, 2026

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 1, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joepvd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 1, 2026
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 1, 2026

@locriandev: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/security cd195a7 link false /test security

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@joepvd
Copy link
Copy Markdown
Contributor

joepvd commented Apr 1, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2026
@openshift-merge-bot openshift-merge-bot bot merged commit 69878f0 into openshift-eng:main Apr 1, 2026
4 of 5 checks passed
@locriandev locriandev deleted the build-sync branch April 1, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants