CMP-4117: Expand ownership check for profile bundle controller by rhmdnd · Pull Request #1100 · ComplianceAsCode/compliance-operator

rhmdnd · 2026-02-28T14:37:01Z

The controller previously only watched ProfileBundle objects. When the profileparser Deployment's pods changed state, the controller was never notified.

Adding Owns means any change to the owned Deployment triggers a reconciliation of the parent ProfileBundle, so the controller is responsive to pod lifecycle events.
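For readers less familiar with owner-based watches, here is a minimal stand-alone sketch in Go of how an event on an owned object maps back to a reconcile request for its parent. It deliberately does not import controller-runtime: `OwnerRef`, `Object`, `Request`, and `requestsForOwner` are hypothetical names invented for illustration, and the Deployment name is only inferred from the pod names seen in this thread. It assumes that only the controller owner reference is matched.

```go
package main

import "fmt"

// OwnerRef is a pared-down, hypothetical stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	Kind       string
	Name       string
	Controller bool
}

// Object is a pared-down, hypothetical stand-in for a namespaced object.
type Object struct {
	Kind      string
	Namespace string
	Name      string
	Owners    []OwnerRef
}

// Request identifies the parent object that should be reconciled.
type Request struct {
	Namespace string
	Name      string
}

// requestsForOwner returns one reconcile request per controller owner of the
// given kind. An object without such an owner yields no requests, so a change
// to it would never wake the parent's reconciler.
func requestsForOwner(changed Object, ownerKind string) []Request {
	var reqs []Request
	for _, ref := range changed.Owners {
		if ref.Kind == ownerKind && ref.Controller {
			reqs = append(reqs, Request{Namespace: changed.Namespace, Name: ref.Name})
		}
	}
	return reqs
}

func main() {
	// Illustrative objects only; the names are assumptions.
	owned := Object{
		Kind:      "Deployment",
		Namespace: "openshift-compliance",
		Name:      "ocp4-openshift-compliance-pp",
		Owners:    []OwnerRef{{Kind: "ProfileBundle", Name: "ocp4", Controller: true}},
	}
	orphan := Object{Kind: "Deployment", Namespace: "openshift-compliance", Name: "unowned"}

	fmt.Println(requestsForOwner(owned, "ProfileBundle"))
	fmt.Println(requestsForOwner(orphan, "ProfileBundle"))
}
```

The second call returns nothing, which is the practical point: the watch only helps if the Deployment actually carries an owner reference to its ProfileBundle.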

Also, once the controller found an existing pod with no startup error, it exited the reconciliation loop without requeueing, regardless of whether the ProfileBundle was still in the PENDING state. If the profileparser hadn't finished (or never ran because of a rollout delay), the controller would never check again.

This commit also updates the profile bundle controller to requeue every 10 seconds while the status is still DataStreamPending, so the controller keeps monitoring until the profileparser either succeeds (sets VALID) or fails (sets INVALID, or a pod startup error is detected).
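The requeue rule can be sketched as a small decision function. This is an illustrative stand-alone sketch, not the controller's code: `Result` is a local stand-in shaped like a reconcile result, and `nextResult`, the status strings, and the constant names are assumptions. Only the 10-second interval comes from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// Result is a local stand-in shaped like a reconcile result; it is not the
// controller-runtime type.
type Result struct {
	RequeueAfter time.Duration
}

// Status values named in the description. The string forms are assumptions.
const (
	statusPending = "PENDING"
	statusValid   = "VALID"
	statusInvalid = "INVALID"
)

// pendingRecheckInterval is the 10-second interval from the description.
const pendingRecheckInterval = 10 * time.Second

// nextResult asks for another look after the interval while the bundle is
// pending, and returns a zero Result (no requeue) once the status is terminal.
func nextResult(status string) Result {
	if status == statusPending {
		return Result{RequeueAfter: pendingRecheckInterval}
	}
	return Result{}
}

func main() {
	for _, s := range []string{statusPending, statusValid, statusInvalid} {
		fmt.Printf("%s -> requeue after %s\n", s, nextResult(s).RequeueAfter)
	}
}
```

A fixed interval keeps the logic simple, at the cost of polling forever if the bundle never leaves the pending state.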

This should improve the resilience of profile bundle parsing, especially in testing, where we delete deployments after modifying the profile bundle image to simulate operator updates.

Assisted-By: Opus 4.6

openshift-ci · 2026-02-28T14:37:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhmdnd

Needs approval from an approver in each of these files:

~~OWNERS~~ [rhmdnd]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rhmdnd · 2026-02-28T14:41:49Z

Porting @xiaojiey's comment from #1098

I tested a special scenario: setting the ProfileBundle to a nonexistent image. The requeue works as expected. The only concern is the unconditional requeueing every 10 seconds while PENDING, for example in a CrashLoopBackOff scenario where the profileparser container crashes repeatedly.

$ kubectl patch profilebundle ocp4 -n openshift-compliance --type=merge \
    -p '{"spec":{"contentImage":"quay.io/nonexistent/invalid:latest"}}'
profilebundle.compliance.openshift.io/ocp4 patched
$ oc get pb -w
NAME     CONTENTIMAGE                                 CONTENTFILE         STATUS
ocp4     quay.io/nonexistent/invalid:latest           ssg-rhcos4-ds.xml   PENDING
rhcos4   ghcr.io/complianceascode/k8scontent:latest   ssg-rhcos4-ds.xml   VALID
$ oc get pod -w
NAME                                             READY   STATUS              RESTARTS      AGE
compliance-operator-69ccf667d-kknvb              1/1     Running             2 (22m ago)   22m
ocp4-openshift-compliance-pp-5489bf48bd-rrmnj    0/1     Init:ErrImagePull   0             29s
ocp4-openshift-compliance-pp-7489f9c4f8-pjtfg    1/1     Running             0             3m42s
rhcos4-openshift-compliance-pp-9d8c7f955-jtc64   1/1     Running             0             21m
ocp4-openshift-compliance-pp-5489bf48bd-rrmnj    0/1     Init:ImagePullBackOff   0             44s
ocp4-openshift-compliance-pp-5489bf48bd-rrmnj    0/1     Init:ErrImagePull       0             59s
ocp4-openshift-compliance-pp-5489bf48bd-rrmnj    0/1     Init:ImagePullBackOff   0             71s
ocp4-openshift-compliance-pp-5489bf48bd-rrmnj    0/1     Init:ErrImagePull       0             109s
ocp4-openshift-compliance-pp-5489bf48bd-rrmnj    0/1     Init:ImagePullBackOff   0             2m1s
ocp4-openshift-compliance-pp-5489bf48bd-rrmnj    0/1     Init:ErrImagePull       0             3m20s
$ oc logs pod/compliance-operator-69ccf667d-kknvb | grep requeueing | tail -n 10
{"level":"info","ts":"2026-02-28T12:02:56.892Z","logger":"profilebundlectrl","msg":"ProfileBundle still pending, requeueing to check status","Request.Namespace":"openshift-compliance","Request.Name":"ocp4"}
{"level":"info","ts":"2026-02-28T12:03:06.893Z","logger":"profilebundlectrl","msg":"ProfileBundle still pending, requeueing to check status","Request.Namespace":"openshift-compliance","Request.Name":"ocp4"}
{"level":"info","ts":"2026-02-28T12:03:16.894Z","logger":"profilebundlectrl","msg":"ProfileBundle still pending, requeueing to check status","Request.Namespace":"openshift-compliance","Request.Name":"ocp4"}
{"level":"info","ts":"2026-02-28T12:03:26.895Z","logger":"profilebundlectrl","msg":"ProfileBundle still pending, requeueing to check status","Request.Namespace":"openshift-compliance","Request.Name":"ocp4"}
{"level":"info","ts":"2026-02-28T12:03:36.896Z","logger":"profilebundlectrl","msg":"ProfileBundle still pending, requeueing to check status","Request.Namespace":"openshift-compliance","Request.Name":"ocp4"}
{"level":"info","ts":"2026-02-28T12:03:46.898Z","logger":"profilebundlectrl","msg":"ProfileBundle still pending, requeueing to check status","Request.Namespace":"openshift-compliance","Request.Name":"ocp4"}
{"level":"info","ts":"2026-02-28T12:03:56.899Z","logger":"profilebundlectrl","msg":"ProfileBundle still pending, requeueing to check status","Request.Namespace":"openshift-compliance","Request.Name":"ocp4"}
{"level":"info","ts":"2026-02-28T12:04:06.900Z","logger":"profilebundlectrl","msg":"ProfileBundle still pending, requeueing to check status","Request.Namespace":"openshift-compliance","Request.Name":"ocp4"}
{"level":"info","ts":"2026-02-28T12:04:16.902Z","logger":"profilebundlectrl","msg":"ProfileBundle still pending, requeueing to check status","Request.Namespace":"openshift-compliance","Request.Name":"ocp4"}
{"level":"info","ts":"2026-02-28T12:05:39.379Z","logger":"profilebundlectrl","msg":"ProfileBundle still pending, requeueing to check status","Request.Namespace":"openshift-compliance","Request.Name":"ocp4"}

Yeah - that's a good question. What's the most appropriate state for a ProfileBundle if the image is incorrect? I would think either PENDING or INVALID.

Without this PR, wouldn't the ProfileBundle be in PENDING state indefinitely since the profile parser container wouldn't be able to pull the content image? Is it essentially the same behavior but just more verbose about retrying every 10 seconds?

rhmdnd · 2026-02-28T14:42:18Z

I rebased this to pull in #1093 - which was affecting the parallel tests.

github-actions · 2026-02-28T14:44:30Z

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1100-61357833d573cc3690f59f3aa5b8ca1a7938a02b

rhmdnd · 2026-02-28T16:49:55Z

/retest-required

rhmdnd · 2026-02-28T19:10:24Z

/retest-required

rhmdnd · 2026-02-28T23:00:10Z

/retest-required

rhmdnd · 2026-03-01T13:06:54Z

Images are failing to build in CI for some reason. Looks unrelated to this change, hence all the rechecks.

INFO[2026-03-01T00:00:32Z] Ran for 1h0m10s                              
ERRO[2026-03-01T00:00:32Z] Some steps failed:                           
ERRO[2026-03-01T00:00:32Z] 
  * could not run steps: step src failed: error occurred handling build src-arm64: build didn't start running within 1h0m0s (phase: Pending):

rhmdnd · 2026-03-01T13:07:13Z

/retest-required

rhmdnd · 2026-03-01T15:25:11Z

/retest-required

rhmdnd · 2026-03-01T19:03:39Z

/retest-required

rhmdnd · 2026-03-02T01:27:34Z

Got one green run on the serial tests - rechecking to see if this helps with the transient issues we've been seeing recently.

/test e2e-aws-serial
/test e2e-aws-serial-arm

rhmdnd · 2026-03-02T14:17:15Z

Updated the controller with better logging to trace what happens to the pods when the profile bundle gets stuck in a pending state, even though the controller requeues the request as it should.

github-actions · 2026-03-02T14:23:31Z

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1100-56af6139fb44aef8820d6f08bc519dadd4275055

The controller previously only watched ProfileBundle objects. When the profileparser Deployment's pods changed state, the controller was never notified.

Adding Owns means any change to the owned Deployment triggers a reconciliation of the parent ProfileBundle, so the controller is responsive to pod lifecycle events.

Also, once the controller found an existing pod with no startup error, it exited the reconciliation loop without requeueing, regardless of whether the ProfileBundle was still in the PENDING state. If the profileparser hadn't finished (or never ran because of a rollout delay), the controller would never check again.

This commit also updates the profile bundle controller to requeue every 10 seconds while the status is still DataStreamPending, so the controller keeps monitoring until the profileparser either succeeds (sets VALID) or fails (sets INVALID, or a pod startup error is detected).

In particular, if the controller detects that the init containers for the profileparser have completed and the bundle is still in the PENDING state, we're deadlocked, and it should annotate the profileparser pod to rerun.

This should improve the resilience of profile bundle parsing, especially in testing, where we delete deployments after modifying the profile bundle image to simulate operator updates.

Assisted-By: Opus 4.6
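The deadlock condition described in this commit message (init containers finished, bundle still pending) can be expressed as a small predicate. This is a hedged stand-alone sketch: `InitContainer`, `parserLooksStuck`, the container names, and the status string are hypothetical, and it covers only the detection, not the annotation step.

```go
package main

import "fmt"

// InitContainer is a hypothetical summary of one init container's state.
type InitContainer struct {
	Name      string
	Completed bool // true once the container has terminated successfully
}

// parserLooksStuck reports whether every init container has completed while
// the bundle is still pending. With no init containers reported there is not
// enough information to call it stuck, so it returns false.
func parserLooksStuck(bundleStatus string, inits []InitContainer) bool {
	if bundleStatus != "PENDING" || len(inits) == 0 {
		return false
	}
	for _, c := range inits {
		if !c.Completed {
			return false
		}
	}
	return true
}

func main() {
	// Container names below are placeholders for illustration.
	done := []InitContainer{{Name: "content", Completed: true}, {Name: "parser", Completed: true}}
	running := []InitContainer{{Name: "content", Completed: true}, {Name: "parser", Completed: false}}

	fmt.Println(parserLooksStuck("PENDING", done))    // all init containers done, still pending
	fmt.Println(parserLooksStuck("PENDING", running)) // parser still working
	fmt.Println(parserLooksStuck("VALID", done))      // bundle already parsed
}
```

Only the first case would warrant triggering a rerun; the other two are a parser still in progress and a bundle that has already been processed.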

github-actions · 2026-03-02T21:31:18Z

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1100-918f62d3451f90a128b4a245da77c6dc80db979d

openshift-ci-robot · 2026-03-02T22:23:41Z

@rhmdnd: This pull request references CMP-4117 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

rhmdnd · 2026-03-03T01:20:28Z

Clean serial run - which is a good sign. Retesting to see if we can recreate the transient issue.

/test e2e-aws-serial-arm
/test e2e-aws-serial

rhmdnd · 2026-03-03T02:35:42Z

Failed to get a cluster, didn't make it to the serial tests:

/test e2e-aws-serial-arm
/test e2e-aws-serial

xiaojiey · 2026-03-03T09:23:21Z

/retest

openshift-ci · 2026-03-03T09:55:10Z

@rhmdnd: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-rosa	`918f62d`	link	true	`/test e2e-rosa`

rhmdnd · 2026-03-03T12:22:16Z

/test e2e-aws-serial-arm
/test e2e-aws-serial

Vincent056 · 2026-03-03T18:30:07Z

pkg/controller/profilebundle/profilebundle_controller.go

 	return ctrl.NewControllerManagedBy(mgr).
 		Named("profilebundle-controller").
 		For(&compliancev1alpha1.ProfileBundle{}).
+		Owns(&appsv1.Deployment{}).


Do we need this here? We would have to set ownership on the pb deployment for this to work. Could you try without this line? I think the logic at https://github.com/ComplianceAsCode/compliance-operator/pull/1100/changes#diff-60a126c27481c606ddae6ce2665cca69a035cd1bf59c98ee286752639ada0edaR280 is enough.

Vincent056 · 2026-03-03T18:35:29Z

Regarding the unconditional requeue every 10 seconds while PENDING: maybe we can use exponential backoff here?
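As a rough idea of what that could look like, here is a minimal stand-alone sketch of a capped exponential backoff in Go. The function name `backoffDelay`, the 5-minute cap, and the idea of tracking an attempt counter are all assumptions for illustration; only the 10-second starting interval comes from this PR.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = 10 * time.Second // starting interval taken from the PR
	maxDelay  = 5 * time.Minute  // cap chosen arbitrarily for this sketch
)

// backoffDelay returns baseDelay doubled once per previous attempt and capped
// at maxDelay. Negative attempts are treated as the first attempt.
func backoffDelay(attempt int) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	delay := baseDelay
	for i := 0; i < attempt; i++ {
		delay *= 2
		if delay >= maxDelay {
			// Return early so large attempt counts cannot overflow.
			return maxDelay
		}
	}
	return delay
}

func main() {
	for attempt := 0; attempt <= 6; attempt++ {
		fmt.Printf("attempt %d: wait %s\n", attempt, backoffDelay(attempt))
	}
}
```

With these constants the wait grows 10s, 20s, 40s, 80s, 160s and then stays at the 5-minute cap, so a bundle pointing at a bad image is still rechecked, just far less often.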

rhmdnd requested review from Vincent056, xiaojiey and yuumasato February 28, 2026 14:37

openshift-ci bot requested a review from Anna-Koudelkova February 28, 2026 14:37

openshift-ci bot added the approved label Feb 28, 2026

rhmdnd mentioned this pull request Feb 28, 2026

Expand ownership check for profile bundle controller #1098

Closed

rhmdnd force-pushed the fix-profile-bundle-ownership branch from 6135783 to 56af613 Compare March 2, 2026 14:16

rhmdnd mentioned this pull request Mar 2, 2026

CMP-3800: Add e2e test (33431) for ComplianceCheckResult label queries #981

Merged

rhmdnd force-pushed the fix-profile-bundle-ownership branch from 56af613 to 918f62d Compare March 2, 2026 21:23

rhmdnd changed the title ~~Expand ownership check for profile bundle controller~~ CMP-4117: Expand ownership check for profile bundle controller Mar 2, 2026

openshift-ci-robot added the jira/valid-reference label Mar 2, 2026

rhmdnd mentioned this pull request Mar 3, 2026

CMP-3846: Add test for successful compliance operator deletion (54055) #1007

Open

Vincent056 reviewed Mar 3, 2026

rhmdnd mentioned this pull request Mar 3, 2026

CMP-3814: Enhance TestScanHasProfileGUID to include OCP-47044 logic #1053

Merged
