CMP-4117: Expand ownership check for profile bundle controller#1100
CMP-4117: Expand ownership check for profile bundle controller#1100rhmdnd wants to merge 1 commit intoComplianceAsCode:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rhmdnd The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Porting @xiaojiey's comment from #1098
Yeah - that's a good question. What's the most appropriate state for a ProfileBundle if the image is incorrect? I would think either Without this PR, wouldn't the ProfileBundle be in |
|
I rebased this to pull in #1093 - which was affecting the parallel tests. |
|
🤖 To deploy this PR, run the following command: |
|
/retest-required |
2 similar comments
|
/retest-required |
|
/retest-required |
|
Images are failing to build in CI for some reason. Looks unrelated to this change, hence all the rechecks. |
|
/retest-required |
2 similar comments
|
/retest-required |
|
/retest-required |
|
Got one green run on the serial tests - rechecking to see if this helps with the transient issues we've been seeing recently. /test e2e-aws-serial |
6135783 to
56af613
Compare
|
Updating the controller with some better logging to trace through what's happening with the pods such that the profile bundle is stuck in a pending state, despite the controller requeuing the request when it should. |
|
🤖 To deploy this PR, run the following command: |
The controller previously only watched ProfileBundle objects. When the profileparser Deployment's pods changed state, the controller was never notified. Adding Owns means any change to the owned Deployment triggers a reconciliation of the parent ProfileBundle, so the controller is responsive to pod lifecycle events. Also, once the controller found an existing pod with no startup error, it exited the controller reconcilation loop without requeue — regardless of whether the ProfileBundle was still in PENDING state. If the profileparser hadn't finished (or never ran due to a rollout delay), the controller would never check again. This commit also updates the profile bundle controller to requeues every 10 seconds while the status is still DataStreamPending, ensuring the controller keeps monitoring until the profileparser either succeeds (sets VALID) or fails (sets INVALID / pod startup error detected). In particular, if the controller detects that the init containers for the profileparser have completed and the bundle is still in a PENDING state, we're deadlocked, and it should annotate the profileparser pod to rerun. This should improve the resilience of profile bundle parsing, especially in testing, where we delete deployments after modifying the profile bundle image to simulate operator updates. Assisted-By: Opus 4.6
56af613 to
918f62d
Compare
|
🤖 To deploy this PR, run the following command: |
|
@rhmdnd: This pull request references CMP-4117 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Clean serial run - which is a good sign. Retesting to see if we can recreate the transient issue. /test e2e-aws-serial-arm |
|
Failed to get a cluster, didn't make it to the serial tests: /test e2e-aws-serial-arm |
|
/retest |
|
@rhmdnd: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/test e2e-aws-serial-arm |
| return ctrl.NewControllerManagedBy(mgr). | ||
| Named("profilebundle-controller"). | ||
| For(&compliancev1alpha1.ProfileBundle{}). | ||
| Owns(&appsv1.Deployment{}). |
There was a problem hiding this comment.
do we need to have this here? we will need to set ownership to the pb deployment for this to work. I am wondering if you could try without this line, I think the logic https://github.com/ComplianceAsCode/compliance-operator/pull/1100/changes#diff-60a126c27481c606ddae6ce2665cca69a035cd1bf59c98ee286752639ada0edaR280 here is enough here
maybe we can use exponential backoff here? |
The controller previously only watched ProfileBundle objects. When the profileparser Deployment's pods changed state, the controller was never notified.
Adding Owns means any change to the owned Deployment triggers a reconciliation of the parent ProfileBundle, so the controller is responsive to pod lifecycle events.
Also, once the controller found an existing pod with no startup error, it exited the controller reconcilation loop without requeue — regardless of whether the ProfileBundle was still in PENDING state. If the profileparser hadn't finished (or never ran due to a rollout delay), the controller would never check again.
This commit also updates the profile bundle controller to requeues every 10 seconds while the status is still DataStreamPending, ensuring the controller keeps monitoring until the profileparser either succeeds (sets VALID) or fails (sets INVALID / pod startup error detected).
This should improve the resilience of profile bundle parsing, especially in testing, where we delete deployments after modifying the profile bundle image to simulate operator updates.
Assisted-By: Opus 4.6