-
Notifications
You must be signed in to change notification settings - Fork 18
Add RelatedObject property when compliant dry run overrides policy mismatch #377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RelatedObject property when compliant dry run overrides policy mismatch #377
Conversation
571d7b4 to
21bf25b
Compare
606467e to
6a59903
Compare
|
/cc @JustinKuli |
|
What is no-op mean? |
|
Are you going to add e2e test? |
| } | ||
|
|
||
| r.setEvaluatedObject(obj.policy, obj.existingObj, !throwSpecViolation, "") | ||
| r.setEvaluatedObject(obj.policy, obj.existingObj, !throwSpecViolation, "", false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain me "!throwSpecViolation", according to " // The spec didn't require a change but throwSpecViolation indicates the status didn't match. Handle" We set compliant when there is mismatch(changed by kube api, mutator) I am confused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throwSpecViolation is initially true when handleKeys() notices a difference between the policy fields and the actual object fields.
Afterwards, the dry run validation handles edge cases where handleKeys found a difference between unset and empty fields. If the dry run Update did not update any fields, the policy is assumed to be compliant, and throwSpecViolation is inverted.
| // There are situations where updateNeeded is wrong in either direction: an update might not be |
In this case ACM-14577, the bad scc annotation in the policy is technically invalid, but the dry run validation can’t detect the error. During the dry run update request, an admission controller intercepts the request body and silently fixes the invalid annotation before the api server returns a response. This means the dry run update thinks the object matches the policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!throwSpecViolation => noncompliant in setEvaluatedObject func right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also can you add comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ACM-14577, !throwSpecViolation evaluates to true, compliant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okie Can you add comment for this
api/v1/configurationpolicy_types.go
Outdated
| // DryRunNoOpOverride indicates that a difference was detected between the policy and the | ||
| // object, but a dry run update reported no changes. The dry run result overrides the policy | ||
| // assessment, meaning the object is considered compliant despite the initial mismatch. | ||
| DryRunNoOpOverride *bool `json:"dryRunNoOpOverride,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe MatchesAfterDryRun would be a better name? "Override" here makes me feel more like it's an option that the user could specify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this might not need to be a pointer.
|
@JustinKuli What do you think about skipping the E2E test for this? We could use a Gatekeeper mutation to test it instead, but if you think that's too much, I'm okay with skipping the E2E test. |
|
@yiraeChristineKim if creating an e2e test for this in a KinD cluster is too much of a burden, I would be ok with a test that just runs in our framework tests against an openshift cluster, since the openshift SCC annotation example would be relatively simple to test. However, I'd want us to try to make a test here first. I think that k8s will default the Or, @jan-law could find a better example for a test. |
5eca769 to
65d7d38
Compare
ref: https://issues.redhat.com/browse/ACM-14577 Tests scenarios where the policy reports a mismatched object, but the dry run validation reports a compliant object due to the apiserver adding or modifying fields in the request body. This commit also prevents timeouts in case40 by wrapping assertions with Eventually. Signed-off-by: Janelle Law <[email protected]>
65d7d38 to
f353b3a
Compare
| annotations: | ||
| test: test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Could you help me understand why the policy in case8 with metadataComplianceType: musthave is NonCompliant when both the policy and the service yaml do not contain any annotations:?
See diff below when I removed lines 6-7 annotations: test:test from the service.
compliant: NonCompliant
lastEvaluated: "2025-08-11T14:41:05Z"
lastEvaluatedGeneration: 1
relatedObjects:
- compliant: NonCompliant
object:
apiVersion: v1
kind: Service
metadata:
name: grc-policy-propagator-metrics
namespace: managed
properties:
createdByPolicy: false
diff: |
--- managed/grc-policy-propagator-metrics : existing
+++ managed/grc-policy-propagator-metrics : updated
@@ -1,9 +1,8 @@
apiVersion: v1
kind: Service
metadata:
- annotations: {}
creationTimestamp: "2025-08-11T14:40:35Z"
name: grc-policy-propagator-metrics
namespace: managed
resourceVersion: "2513"
uid: ddf71564-cba9-41fa-809f-b742bd85ab50
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very odd!
Maybe https://github.com/open-cluster-management-io/config-policy-controller/blob/main/controllers/configurationpolicy_controller.go#L3524 is to blame... it seems like that could make the existing object "look like" it has an empty annotations object there when it really doesn't. Maybe the SetAnnotations and SetLabels calls should only be done when there are annotations/labels.
I think we should remove the annotation from your test YAML, and make sure that this situation works as we'd expect. But I'm surprised that no other tests are hitting this odd behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to your test, I was able to narrow down what seems to be happening. When the service annotations are empty, the reflect.DeepEqual(dryRunUpdatedObj.Object, existingObjectCopy.Object) is coming back false because one of the objects has an empty annotations map, and the other has no annotations map. I don't exactly know why they are different, and especially why the same situation is not happening with the empty labels.
But adjusting this existing function seems like it could handle it (I've added the last 3 lines):
func removeFieldsForComparison(obj *unstructured.Unstructured) {
unstructured.RemoveNestedField(obj.Object, "metadata", "managedFields")
unstructured.RemoveNestedField(
obj.Object, "metadata", "annotations", "kubectl.kubernetes.io/last-applied-configuration",
)
// The generation might actually bump but the API output might be the same.
unstructured.RemoveNestedField(obj.Object, "metadata", "generation")
if len(obj.GetAnnotations()) == 0 {
unstructured.RemoveNestedField(obj.Object, "metadata", "annotations")
}
}
JustinKuli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent tests, and nice work adding that field! I'm suggesting changing many of the return cases in checkAndUpdateResource to default to "false" for the new field, but maybe I'm missing a reason for it to be true?
| CreatedByPolicy reports whether the object was created by the configuration policy, which is | ||
| important when pruning is configured. | ||
| type: boolean | ||
| matchesAfterDryRun: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OperatorPolicy doesn't need this field, does it?
We can add an additional patch to the CRD to remove it, see deploy/crds/kustomize_operatorpolicy/remove-diff.json (and it will need to be referenced in deploy/crds/kustomize_operatorpolicy/kustomization.yaml)
17decb0 to
d28f3a2
Compare
ref: https://issues.redhat.com/browse/ACM-14577 Signed-off-by: Janelle Law <[email protected]>
While testing https://issues.redhat.com/browse/ACM-14577 , some objects reported missing annotations as an empty map while other objects omitted the annotations key. Handle empty annotations before comparing objects. Signed-off-by: Janelle Law <[email protected]>
d28f3a2 to
981f733
Compare
JustinKuli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the diligent work here!
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jan-law, JustinKuli The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
d99383c
into
open-cluster-management-io:main
Previously, the policy status would incorrectly have a noncompliant event followed immediately by a compliant event when a dryrun update would not make any changes on that object. This would cause the policy status to repeatedly update if the policy was reconciling often - for example if any of its watched objects were updating. Now only the correct "compliant" event should be emitted in that case. NOTE: This is a backport, but not just a cherry-pick. This incorporates some of the changes from these upstream PRs, with updates for tests: - open-cluster-management-io/config-policy-controller#404 - open-cluster-management-io/config-policy-controller#377 Refs: - https://issues.redhat.com/browse/ACM-25742 Signed-off-by: Justin Kulikauskas <[email protected]>
Previously, the policy status would incorrectly have a noncompliant event followed immediately by a compliant event when a dryrun update would not make any changes on that object. This would cause the policy status to repeatedly update if the policy was reconciling often - for example if any of its watched objects were updating. Now only the correct "compliant" event should be emitted in that case. NOTE: This is a backport, but not just a cherry-pick. This incorporates some of the changes from these upstream PRs, with updates for tests: - open-cluster-management-io/config-policy-controller#404 - open-cluster-management-io/config-policy-controller#377 Refs: - https://issues.redhat.com/browse/ACM-25742 Signed-off-by: Justin Kulikauskas <[email protected]>
Previously, the policy status would incorrectly have a noncompliant event followed immediately by a compliant event when a dryrun update would not make any changes on that object. This would cause the policy status to repeatedly update if the policy was reconciling often - for example if any of its watched objects were updating. Now only the correct "compliant" event should be emitted in that case. NOTE: This is a backport, but not just a cherry-pick. This incorporates some of the changes from these upstream PRs, with updates for tests: - open-cluster-management-io/config-policy-controller#404 - open-cluster-management-io/config-policy-controller#377 Refs: - https://issues.redhat.com/browse/ACM-25742 Signed-off-by: Justin Kulikauskas <[email protected]>
Previously, the policy status would incorrectly have a noncompliant event followed immediately by a compliant event when a dryrun update would not make any changes on that object. This would cause the policy status to repeatedly update if the policy was reconciling often - for example if any of its watched objects were updating. Now only the correct "compliant" event should be emitted in that case. NOTE: This is a backport, but not just a cherry-pick. This incorporates some of the changes from these upstream PRs, with updates for tests: - open-cluster-management-io/config-policy-controller#404 - open-cluster-management-io/config-policy-controller#377 Refs: - https://issues.redhat.com/browse/ACM-25742 Signed-off-by: Justin Kulikauskas <[email protected]>
Previously, the policy status would incorrectly have a noncompliant event followed immediately by a compliant event when a dryrun update would not make any changes on that object. This would cause the policy status to repeatedly update if the policy was reconciling often - for example if any of its watched objects were updating. Now only the correct "compliant" event should be emitted in that case. NOTE: This is a backport, but not just a cherry-pick. This incorporates some of the changes from these upstream PRs, with updates for tests: - open-cluster-management-io/config-policy-controller#404 - open-cluster-management-io/config-policy-controller#377 Refs: - https://issues.redhat.com/browse/ACM-25742 Signed-off-by: Justin Kulikauskas <[email protected]>

When configuration policies detect desired vs actual object mismatches, the dry run update validation sometimes responds with a "no-op", aka "no changes required" due to apiserver webhooks or mutating admission controllers modifying requests. The policy is assumed to be compliant without indicating any potential mismatch to users.
Since the requests are modified at the apiserver, we do not have other means of validating the policy before making the dry run request. To improve user experience when troubleshooting this scenario, this PR adds the property
(edit:DryRunNoOpOverrideMatchesAfterDryRun) when the policy detects a mismatch AND the no-op dry run update overrode the policy assessment.The
RelatedObjectPropertiesobject will appear for policies in any remediation mode. Previously, all properties were hidden for all compliant objects inInformmode.ref: https://issues.redhat.com/browse/ACM-14577