test: add inline event assertions for resource template events #7300

nXtCyberNet wants to merge 1 commit into karmada-io:master
Conversation
@nXtCyberNet: The label(s) …

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Welcome @nXtCyberNet! It looks like this is your first PR to karmada-io/karmada 🎉
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly improves the end-to-end testing framework by embedding event assertions directly into existing test suites. The primary goal is to provide more robust validation of system behavior by checking for expected events during resource propagation and override operations. Additionally, it lays the groundwork for adopting the newer `events.k8s.io/v1` API.
Code Review
This pull request enhances E2E test coverage by adding inline event assertions for resource template events. It also introduces foundational support for the events.k8s.io/v1 API. The changes are good, but there are opportunities to improve code maintainability by reducing duplication in the test files. I've added a few suggestions to refactor the new test code into helper functions. Additionally, a small simplification in the new event waiting utility is proposed for consistency.
```go
ginkgo.By("checking ApplyOverridePolicySucceed event on deployment", func() {
	framework.WaitEventFitWith(kubeClient, deploymentNamespace, deploymentName,
		func(event corev1.Event) bool {
			return event.Reason == events.EventReasonApplyOverridePolicySucceed
		})
})
```
This event checking logic is duplicated in several places within this file (e.g., lines 295-300, 445-450, 733-738, 829-834, 989-994). To improve maintainability, consider extracting this into a helper function. For example:
```go
func expectEvent(kubeClient kubernetes.Interface, namespace, name, reason string) {
	framework.WaitEventFitWith(kubeClient, namespace, name, func(event corev1.Event) bool {
		return event.Reason == reason
	})
}
```

Then you can simplify the call site:

```go
ginkgo.By("checking ApplyOverridePolicySucceed event on deployment", func() {
	expectEvent(kubeClient, deploymentNamespace, deploymentName, events.EventReasonApplyOverridePolicySucceed)
})
```

```go
ginkgo.By("checking ApplyPolicySucceed event", func() {
	framework.WaitEventFitWith(kubeClient, deploymentNamespace, deploymentName,
		func(event corev1.Event) bool {
			return event.Reason == events.EventReasonApplyPolicySucceed
		})
})

ginkgo.By("checking GetReplicasSucceed event", func() {
	framework.WaitEventFitWith(kubeClient, deploymentNamespace, deploymentName,
		func(event corev1.Event) bool {
			return event.Reason == events.EventReasonGetReplicasSucceed
		})
})
```
There is significant code duplication in this test for event assertions (e.g., here and in subsequent ginkgo.By blocks for ScheduleBindingSucceed, SyncWorkSucceed, etc.). To make the test more readable and maintainable, I recommend creating a helper function for event checking. For example:
```go
func expectEvent(kubeClient kubernetes.Interface, namespace, name, reason string) {
	framework.WaitEventFitWith(kubeClient, namespace, name, func(event corev1.Event) bool {
		return event.Reason == reason
	})
}
```

This would simplify your test blocks to:

```go
ginkgo.By("checking ApplyPolicySucceed event", func() {
	expectEvent(kubeClient, deploymentNamespace, deploymentName, events.EventReasonApplyPolicySucceed)
})
ginkgo.By("checking GetReplicasSucceed event", func() {
	expectEvent(kubeClient, deploymentNamespace, deploymentName, events.EventReasonGetReplicasSucceed)
})
```
Pull request overview
This PR expands E2E test coverage for Karmada controller-emitted events by adding inline event assertions to existing propagation and override test flows, and adds initial framework support for querying events.k8s.io/v1 Events.
Changes:

- Add inline `WaitEventFitWith(...)` assertions for multiple event reasons in existing PropagationPolicy E2E scenarios (Deployment/Service/Job).
- Add inline `WaitEventFitWith(...)` assertions for `ApplyOverridePolicySucceed` in existing OverridePolicy E2E scenarios.
- Extend the E2E framework with a `WaitEventV1FitWith(...)` helper to query `events.k8s.io/v1` Events.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test/e2e/suites/base/propagationpolicy_test.go | Adds inline assertions validating expected event reasons during resource propagation flows. |
| test/e2e/suites/base/overridepolicy_test.go | Adds inline assertions validating override-application event emission in override scenarios. |
| test/e2e/framework/events.go | Introduces a helper to query events.k8s.io/v1 events (foundation for future migration). |
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #7300      +/-   ##
==========================================
+ Coverage   42.02%   42.04%   +0.01%
==========================================
  Files         874      874
  Lines       53547    53551       +4
==========================================
+ Hits        22502    22513      +11
+ Misses      29353    29348       -5
+ Partials     1692     1690       -2
```
This will close the target issue. Please adjust the wording to: "Fixes part of xx"
test/e2e/framework/events.go
Outdated
```go
}

// WaitEventV1FitWith uses events.k8s.io/v1 to do the same thing as WaitEventFitWith
func WaitEventV1FitWith(kubeClient kubernetes.Interface, namespace string, involvedObj string, fit func(eventsv1.Event) bool) {
```
Support for eventsv1 can be added later.
@XiShanYongYe-Chang By the way, I didn't use `slices.ContainsFunc` here because of the somewhat unpredictable nature of the events/v1 API (ordering, transient updates, etc.). Does this issue still exist in newer versions, or is it safe to use `ContainsFunc` now? Just asking for my knowledge, thanks.
I don't quite understand why we can't use ContainsFunc.
|
Once this PR is ready, you can squash all the commits into one and then ask me to review it.
befa68d to b0b462e
Hi, I've squashed all the commits into one. The PR is ready for review, please take a look.
Thanks
```go
		return true
	})
})
ginkgo.By("checking GetDependenciesSucceed event on deployment", func() {
```
There are also dependency distribution scenarios for other resources in this test file. Why were these not tested together?
You're right — those scenarios were missed earlier as each When block creates isolated deployments; I'll add the dependency propagation assertions for all remaining resources.
```go
	deployment.GetLabels()["foo"] == "exist" &&
		deployment.GetLabels()["app"] == "nginx"
})
ginkgo.By("checking ApplyOverridePolicySucceed event on deployment", func() {
```
The check on an event can be moved before the check on resource distribution, as shown in line 116.
Can the same test be performed on ClusterOverridePolicy?
Hi @XiShanYongYe-Chang, thank you for the feedback. I'll move the event assertion before the resource distribution check in all affected It blocks. Currently, this PR is focused on the resource template subtask; I'll add ApplyOverridePolicySucceed assertions to the ClusterOverridePolicy tests in another PR.
```go
framework.CreatePropagationPolicy(karmadaClient, highPriorityPolicy)
framework.WaitDeploymentPresentOnClusterFitWith(preemptingClusterName, deployment.Namespace, deployment.Name, func(*appsv1.Deployment) bool { return true })
})
ginkgo.By("checking PreemptPolicySucceed event on deployment", func() {
```
Similarly, I think event checks can be moved before resource checks.
```go
	return true
})

patch := []map[string]any{{"op": "replace", "path": "/spec/ports/0/port", "value": updateServicePort}}
```
Why do we need to remove these codes? Same as below.
```go
	return *deployment.Spec.Replicas == updateDeploymentReplicas
})
// resource template events — fired on the Deployment object itself
ginkgo.By("checking ApplyPolicySucceed event", func() {
```
I think the event check here can be divided into two parts. The first part is the event for resource distribution, which can be checked before WaitDeploymentPresentOnClustersFitWith. The second part is the event for status collection, which can be checked after WaitDeploymentPresentOnClustersFitWith. WDYT
One concern I want to raise before finalizing the grouping approach:
Since Kubernetes events are best-effort and not guaranteed to be delivered (they can be dropped under memory pressure or aggregated), is there a risk that WaitEventFitWith could cause a flaky test failure simply because an event was dropped rather than because the feature actually broke?
I noticed the existing event assertions in the codebase (e.g. WorkDispatching, DependencyPolicyConflict) make the same assumption. Just wanted to confirm this tradeoff is acceptable for this PR as well before proceeding — I can adjust the approach if needed.
One follow-up thought — if events are dropped, wouldn’t the existing By block checks also fail for the same reason? In that case, the flakiness risk isn’t specific to WaitEventFitWith but applies to the current approach as well.
Very good question.

> I can adjust the approach if needed.

Do you have any good ideas?
> One follow-up thought — if events are dropped, wouldn't the existing By block checks also fail for the same reason? In that case, the flakiness risk isn't specific to WaitEventFitWith but applies to the current approach as well.

@gemini-code-assist Do you have any good suggestions?
That's a very insightful question, nXtCyberNet. You've hit on a crucial point regarding the reliability of Kubernetes events in E2E tests. You are absolutely correct that Kubernetes events are best-effort and not guaranteed to be delivered, which can indeed introduce flakiness into tests that strictly rely on their presence.
Here are some suggestions to consider when dealing with event assertions in E2E tests, especially when flakiness is a concern:
1. **Prioritize State-Based Assertions:** For most E2E tests, the primary goal is to verify the observable end-state of the system. For instance, in the "deployment propagation testing" scenario, the most critical assertion is that the `Deployment` is correctly present and configured on the member clusters. The `ApplyPolicySucceed` event is an indication that the policy application process started or completed successfully, but the ultimate success is reflected in the resource's actual state.
2. **Event Checks for Diagnostics/Observability:** Events are invaluable for debugging and understanding the internal flow of the system. If an event assertion is primarily for diagnostic purposes (i.e., the test would still pass if the event was missed but the desired state was achieved), consider making it a "soft" assertion. This could involve:
   - **Non-blocking checks:** Instead of `WaitEventFitWith` (which blocks and fails on timeout), you could use a non-blocking check that attempts to find the event within a very short period and logs its presence or absence without failing the test. This would require a different helper function or a custom implementation within the test.
   - **Conditional logging:** Log if the event is found, but don't fail the test if it's not, as long as the primary state assertion passes.
3. **Critical Event Assertions:** If an event itself represents a critical, observable outcome that cannot be directly verified by the resource's state (e.g., an audit event that must be generated for compliance), then `WaitEventFitWith` is appropriate. In such cases, the flakiness risk due to event delivery guarantees is an inherent trade-off. To mitigate this, ensure that `WaitEventFitWith` has sufficiently generous timeouts and retry mechanisms to account for potential delays in event propagation, but be aware that a truly dropped event will still cause a failure.
For the ApplyPolicySucceed event in the current snippet, since its success is largely reflected by the WaitDeploymentPresentOnClustersFitWith call, it might be a candidate for a more lenient, diagnostic-focused check if the goal is to reduce flakiness. However, if the project explicitly wants to assert on the presence of this event as a strong signal, then the current WaitEventFitWith is acceptable, with the understanding of the potential flakiness.
Given the PR's focus on adding event assertions, a good compromise might be to ensure that WaitEventFitWith is configured with reasonable timeouts and retries, and to add comments clarifying the intent behind each event assertion (e.g., whether it's a critical functional check or a diagnostic aid).
Based on the above analysis, I think our current configuration with a longer timeout should be acceptable. If we find out later that events are being dropped, then we can consider the non-blocking checks again. WDYT @nXtCyberNet
Hi @nXtCyberNet, the first PR may have a lot of reviews, but subsequent PRs should be much better if they are based on this template.
63b48eb to a2448d4

b462171 to 6f47ecf
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: …

The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve`.
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
6f47ecf to bad794f
What type of PR is this?
/kind feature
/kind test
What this PR does / why we need it:
This PR adds inline event assertions to existing E2E test cases to improve event coverage for resource template events, as discussed in the related issue.
Following the community decision to use Approach 2 (Inline Event Assertions), event checks are added directly inside existing test cases at the point where the relevant operation completes — no new test cases or files are introduced.
What changed

- `test/e2e/framework/events.go`: added a `WaitEventV1FitWith` helper to support the `events.k8s.io/v1` Event API alongside the existing `core/v1` helper (foundation for future migration).
- `test/e2e/suites/base/propagationpolicy_test.go`: added inline event assertions to existing Deployment, Service, and Job propagation test cases:
  - `ApplyPolicySucceed` — verified on Deployment, Service, Job
  - `GetReplicasSucceed` — verified on Deployment, Job
  - `ScheduleBindingSucceed` — verified on Deployment, Service, Job
  - `SyncWorkSucceed` — verified on Deployment, Service, Job
  - `AggregateStatusSucceed` — verified on Deployment, Job
  - `ReflectStatusSucceed` — verified on Work object during Deployment propagation
  - `InterpretHealthSucceed` — verified on Work object during Deployment propagation
  - `SyncWorkloadSucceed` — verified on Work object during Deployment propagation
- `test/e2e/suites/base/overridepolicy_test.go`: added inline event assertions to existing override policy test cases:
  - `ApplyOverridePolicySucceed` — verified across OverridePolicy and OverrideRules test cases with Deployment
- `test/e2e/suites/base/preemption_test.go`: added inline event assertions to existing preemption test cases:
  - `PreemptPolicySucceed` — verified when a high-priority PropagationPolicy preempts a low-priority PropagationPolicy, when a PropagationPolicy preempts a ClusterPropagationPolicy, and when a high-priority ClusterPropagationPolicy preempts a low-priority ClusterPropagationPolicy
- `test/e2e/suites/base/dependenciesdistributor_test.go`: added inline event assertions to existing dependency propagation test cases:
  - `GetDependenciesSucceed` — verified in ConfigMap automatic propagation tests with both PropagationPolicy and ClusterPropagationPolicy

Which issue(s) this PR fixes:
Fixes part of #7252
Special notes for your reviewer:
How was this tested
All assertions were verified by running the E2E suite locally against a kind-based Karmada setup and confirming the expected events fire during the propagation flow.
Notes
- `GetDependenciesFailed`, `GetReplicasFailed`, and `ApplyPolicyFailed` are not covered in this PR as they require specific failure conditions that are difficult to construct reliably in E2E tests.
- The `events.k8s.io/v1` support added in `framework/events.go` is not yet used in assertions — it is foundation work for the planned migration from `GetEventRecorderFor` to `GetEventRecorder`.

Does this PR introduce a user-facing change?: