
test: add inline event assertions for resource template events#7300

Open
nXtCyberNet wants to merge 1 commit into karmada-io:master from nXtCyberNet:issue/e2e_test

Conversation


@nXtCyberNet nXtCyberNet commented Mar 18, 2026

What type of PR is this?

/kind feature
/kind test


What this PR does / why we need it:

This PR adds inline event assertions to existing E2E test cases to improve event coverage for resource template events, as discussed in the related issue.

Following the community decision to use Approach 2 (Inline Event Assertions), event checks are added directly inside existing test cases at the point where the relevant operation completes — no new test cases or files are introduced.

What changed

test/e2e/framework/events.go

  • Added WaitEventV1FitWith helper to support events.k8s.io/v1 Event API alongside the existing core/v1 helper (foundation for future migration)

test/e2e/suites/base/propagationpolicy_test.go

Added inline event assertions to existing Deployment, Service, and Job propagation test cases:

  • ApplyPolicySucceed — verified on Deployment, Service, Job
  • GetReplicasSucceed — verified on Deployment, Job
  • ScheduleBindingSucceed — verified on Deployment, Service, Job
  • SyncWorkSucceed — verified on Deployment, Service, Job
  • AggregateStatusSucceed — verified on Deployment, Job
  • ReflectStatusSucceed — verified on Work object during Deployment propagation
  • InterpretHealthSucceed — verified on Work object during Deployment propagation
  • SyncWorkloadSucceed — verified on Work object during Deployment propagation

test/e2e/suites/base/overridepolicy_test.go

Added inline event assertions to existing override policy test cases:

  • ApplyOverridePolicySucceed — verified across OverridePolicy and OverrideRules test cases with Deployment

test/e2e/suites/base/preemption_test.go

Added inline event assertions to existing preemption test cases:

  • PreemptPolicySucceed — verified when high-priority PropagationPolicy preempts low-priority PropagationPolicy, when PropagationPolicy preempts ClusterPropagationPolicy, and when high-priority ClusterPropagationPolicy preempts low-priority ClusterPropagationPolicy

test/e2e/suites/base/dependenciesdistributor_test.go

Added inline event assertions to existing dependency propagation test cases:

  • GetDependenciesSucceed — verified in configmap automatic propagation tests with both PropagationPolicy and ClusterPropagationPolicy

Which issue(s) this PR fixes:

Fixes part of #7252


Special notes for your reviewer:

How was this tested

All assertions were verified by running the E2E suite locally against a kind-based Karmada setup and confirming the expected events fire during the propagation flow.

export KUBECONFIG=$HOME/.kube/karmada.config
go test ./test/e2e/suites/base/... \
  -v \
  --ginkgo.focus="BasicCase|OverridePolicy|OverrideRules|Preemption|DependenciesDistributor" \
  --ginkgo.timeout=60m \
  -- \
  --host-context=karmada-host \
  --karmada-context=karmada-apiserver \
  --poll-timeout=300s

Notes

  • GetDependenciesFailed, GetReplicasFailed, and ApplyPolicyFailed are not covered in this PR as they require specific failure conditions that are difficult to construct reliably in E2E tests
  • events.k8s.io/v1 support added in framework/events.go is not yet used in assertions — it is foundation work for the planned migration from GetEventRecorderFor to GetEventRecorder

Does this PR introduce a user-facing change?:

NONE

Copilot AI review requested due to automatic review settings March 18, 2026 17:15
@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 18, 2026
@karmada-bot
Contributor

@nXtCyberNet: The label(s) kind/test cannot be applied, because the repository doesn't have them.

Details

In response to this:

What type of PR is this?

/kind feature
/kind test


What this PR does / why we need it:

This PR improves event coverage in existing E2E test cases by adding inline event assertions for resource template events.

Following the agreed community approach (Approach 2: Inline Event Assertions), event validations are integrated directly into existing test flows at the point where operations complete. This avoids introducing new test files while ensuring better observability and validation of system behavior.

Additionally, foundational support for the events.k8s.io/v1 API is introduced to enable future migration.


Which issue(s) this PR fixes:

Relates to #7252


Special notes for your reviewer:

  • Inline assertions are added without altering test structure or introducing new test cases
  • Focused on validating successful event flows across propagation, override, preemption, and dependency distribution scenarios
  • Failure event scenarios are intentionally excluded due to difficulty in reliably reproducing them in E2E environments
  • events.k8s.io/v1 support is added but not yet actively used in assertions (foundation for future migration)

Does this PR introduce a user-facing change?:

NONE


@karmada-bot
Contributor

Welcome @nXtCyberNet! It looks like this is your first PR to karmada-io/karmada 🎉

@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 18, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the end-to-end testing framework by embedding event assertions directly into existing test suites. The primary goal is to provide more robust validation of system behavior by checking for expected events during resource propagation and override operations. Additionally, it lays the groundwork for adopting the newer events.k8s.io/v1 API for events, ensuring future compatibility and capabilities.

Highlights

  • Enhanced E2E Test Coverage: Inline event assertions were added to existing E2E test cases to improve coverage for resource template events, focusing on successful event flows across propagation, override, preemption, and dependency distribution scenarios.
  • Future API Migration Support: Foundational support for the events.k8s.io/v1 API was introduced, preparing for future migration, though it is not yet actively used in assertions.
  • Streamlined Event Validation: Event validations are now integrated directly into existing test flows at the point where operations complete, avoiding new test files and enhancing observability.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enhances E2E test coverage by adding inline event assertions for resource template events. It also introduces foundational support for the events.k8s.io/v1 API. The changes are good, but there are opportunities to improve code maintainability by reducing duplication in the test files. I've added a few suggestions to refactor the new test code into helper functions. Additionally, a small simplification in the new event waiting utility is proposed for consistency.

Comment on lines +125 to +130
ginkgo.By("checking ApplyOverridePolicySucceed event on deployment", func() {
	framework.WaitEventFitWith(kubeClient, deploymentNamespace, deploymentName,
		func(event corev1.Event) bool {
			return event.Reason == events.EventReasonApplyOverridePolicySucceed
		})
})

medium

This event checking logic is duplicated in several places within this file (e.g., lines 295-300, 445-450, 733-738, 829-834, 989-994). To improve maintainability, consider extracting this into a helper function. For example:

func expectEvent(kubeClient kubernetes.Interface, namespace, name, reason string) {
	framework.WaitEventFitWith(kubeClient, namespace, name, func(event corev1.Event) bool {
		return event.Reason == reason
	})
}

Then you can simplify the call site:

ginkgo.By("checking ApplyOverridePolicySucceed event on deployment", func() {
	expectEvent(kubeClient, deploymentNamespace, deploymentName, events.EventReasonApplyOverridePolicySucceed)
})

Comment on lines +108 to +120
ginkgo.By("checking ApplyPolicySucceed event", func() {
	framework.WaitEventFitWith(kubeClient, deploymentNamespace, deploymentName,
		func(event corev1.Event) bool {
			return event.Reason == events.EventReasonApplyPolicySucceed
		})
})

ginkgo.By("checking GetReplicasSucceed event", func() {
	framework.WaitEventFitWith(kubeClient, deploymentNamespace, deploymentName,
		func(event corev1.Event) bool {
			return event.Reason == events.EventReasonGetReplicasSucceed
		})
})

medium

There is significant code duplication in this test for event assertions (e.g., here and in subsequent ginkgo.By blocks for ScheduleBindingSucceed, SyncWorkSucceed, etc.). To make the test more readable and maintainable, I recommend creating a helper function for event checking. For example:

func expectEvent(kubeClient kubernetes.Interface, namespace, name, reason string) {
	framework.WaitEventFitWith(kubeClient, namespace, name, func(event corev1.Event) bool {
		return event.Reason == reason
	})
}

This would simplify your test blocks to:

ginkgo.By("checking ApplyPolicySucceed event", func() {
	expectEvent(kubeClient, deploymentNamespace, deploymentName, events.EventReasonApplyPolicySucceed)
})

ginkgo.By("checking GetReplicasSucceed event", func() {
	expectEvent(kubeClient, deploymentNamespace, deploymentName, events.EventReasonGetReplicasSucceed)
})

Contributor

Copilot AI left a comment


Pull request overview

This PR expands E2E test coverage for Karmada controller-emitted events by adding inline event assertions to existing propagation and override test flows, and adds initial framework support for querying events.k8s.io/v1 Events.

Changes:

  • Add inline WaitEventFitWith(...) assertions for multiple event reasons in existing PropagationPolicy E2E scenarios (Deployment/Service/Job).
  • Add inline WaitEventFitWith(...) assertions for ApplyOverridePolicySucceed in existing OverridePolicy E2E scenarios.
  • Extend the E2E framework with a WaitEventV1FitWith(...) helper to query events.k8s.io/v1 Events.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
test/e2e/suites/base/propagationpolicy_test.go Adds inline assertions validating expected event reasons during resource propagation flows.
test/e2e/suites/base/overridepolicy_test.go Adds inline assertions validating override-application event emission in override scenarios.
test/e2e/framework/events.go Introduces a helper to query events.k8s.io/v1 events (foundation for future migration).



codecov-commenter commented Mar 18, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.04%. Comparing base (0529685) to head (bad794f).
⚠️ Report is 14 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7300      +/-   ##
==========================================
+ Coverage   42.02%   42.04%   +0.01%     
==========================================
  Files         874      874              
  Lines       53547    53551       +4     
==========================================
+ Hits        22502    22513      +11     
+ Misses      29353    29348       -5     
+ Partials     1692     1690       -2     
Flag Coverage Δ
unittests 42.04% <ø> (+0.01%) ⬆️


@XiShanYongYe-Chang
Member

Fixes #7252

This will close the target issue. Please adjust the wording: Fixes part of xx

}

// WaitEventV1FitWith uses events.k8s.io/v1 to do the same thing as WaitEventFitWith
func WaitEventV1FitWith(kubeClient kubernetes.Interface, namespace string, involvedObj string, fit func(eventsv1.Event) bool) {
Member


Support for eventsv1 can be added later.

Author


Okay, I will remove it.

Author

@nXtCyberNet nXtCyberNet Mar 19, 2026


@XiShanYongYe-Chang btw I didn't use slices.ContainsFunc here because of the somewhat unpredictable nature of the events/v1 API (ordering, transient updates, etc.). Does this issue still exist in newer versions, or is it safe to use ContainsFunc now? Just asking for my knowledge, thanks.

Member


I don't quite understand why we can't use ContainsFunc.

@XiShanYongYe-Chang
Member

Once this PR is ready, you can squash all the commits into one and then ask me to review it.

@nXtCyberNet
Author

Once this PR is ready, you can squash all the commits into one and then ask me to review it.

Hi, I've squashed all the commits into one. The PR is ready for review; please take a look.

@XiShanYongYe-Chang
Member

Thanks
/assign

return true
})
})
ginkgo.By("checking GetDependenciesSucceed event on deployment", func() {
Member


There are also dependency distribution scenarios for other resources in this test file. Why were these not tested together?

Author


You're right: those scenarios were missed earlier because each When block creates isolated deployments. I'll add the dependency propagation assertions for all remaining resources.

deployment.GetLabels()["foo"] == "exist" &&
deployment.GetLabels()["app"] == "nginx"
})
ginkgo.By("checking ApplyOverridePolicySucceed event on deployment", func() {
Member


The check on an event can be moved before the check on resource distribution, as shown in line 116.

Can the same test be performed on ClusterOverridePolicy?

Author


Hi @XiShanYongYe-Chang, thank you for the feedback. I'll move the event assertion before the resource distribution check in all affected It blocks. Currently, this PR is focused on the resource template subtask; I'll add ApplyOverridePolicySucceed assertions to the ClusterOverridePolicy tests in another PR.

framework.CreatePropagationPolicy(karmadaClient, highPriorityPolicy)
framework.WaitDeploymentPresentOnClusterFitWith(preemptingClusterName, deployment.Namespace, deployment.Name, func(*appsv1.Deployment) bool { return true })
})
ginkgo.By("checking PreemptPolicySucceed event on deployment", func() {
Member


Similarly, I think event checks can be moved before resource checks.

return true
})

patch := []map[string]any{{"op": "replace", "path": "/spec/ports/0/port", "value": updateServicePort}}
Member


Why do we need to remove this code? Same as below.

return *deployment.Spec.Replicas == updateDeploymentReplicas
})
// resource template events — fired on the Deployment object itself
ginkgo.By("checking ApplyPolicySucceed event", func() {
Member


I think the event check here can be divided into two parts. The first part is the event for resource distribution, which can be checked before WaitDeploymentPresentOnClustersFitWith. The second part is the event for status collection, which can be checked after WaitDeploymentPresentOnClustersFitWith. WDYT

Author


One concern I want to raise before finalizing the grouping approach:

Since Kubernetes events are best-effort and not guaranteed to be delivered (they can be dropped under memory pressure or aggregated), is there a risk that WaitEventFitWith could cause a flaky test failure simply because an event was dropped rather than because the feature actually broke?

I noticed the existing event assertions in the codebase (e.g. WorkDispatching, DependencyPolicyConflict) make the same assumption. Just wanted to confirm this tradeoff is acceptable for this PR as well before proceeding — I can adjust the approach if needed.

Author


One follow-up thought — if events are dropped, wouldn’t the existing By block checks also fail for the same reason? In that case, the flakiness risk isn’t specific to WaitEventFitWith but applies to the current approach as well.

Member


Very good question.

I can adjust the approach if needed.

Do you have any good ideas?

Member


One follow-up thought — if events are dropped, wouldn’t the existing By block checks also fail for the same reason? In that case, the flakiness risk isn’t specific to WaitEventFitWith but applies to the current approach as well.

@gemini-code-assist
Do you have any good suggestions?


That's a very insightful question, nXtCyberNet. You've hit on a crucial point regarding the reliability of Kubernetes events in E2E tests. You are absolutely correct that Kubernetes events are best-effort and not guaranteed to be delivered, which can indeed introduce flakiness into tests that strictly rely on their presence.

Here are some suggestions to consider when dealing with event assertions in E2E tests, especially when flakiness is a concern:

  1. Prioritize State-Based Assertions: For most E2E tests, the primary goal is to verify the observable end-state of the system. For instance, in the "deployment propagation testing" scenario, the most critical assertion is that the Deployment is correctly present and configured on the member clusters. The ApplyPolicySucceed event is an indication that the policy application process started or completed successfully, but the ultimate success is reflected in the resource's actual state.

  2. Event Checks for Diagnostics/Observability: Events are invaluable for debugging and understanding the internal flow of the system. If an event assertion is primarily for diagnostic purposes (i.e., the test would still pass if the event was missed but the desired state was achieved), consider making it a "soft" assertion. This could involve:

    • Non-blocking checks: Instead of WaitEventFitWith (which blocks and fails on timeout), you could use a non-blocking check that attempts to find the event within a very short period and logs its presence or absence without failing the test. This would require a different helper function or a custom implementation within the test.
    • Conditional logging: Log if the event is found, but don't fail the test if it's not, as long as the primary state assertion passes.
  3. Critical Event Assertions: If an event itself represents a critical, observable outcome that cannot be directly verified by the resource's state (e.g., an audit event that must be generated for compliance), then WaitEventFitWith is appropriate. In such cases, the flakiness risk due to event delivery guarantees is an inherent trade-off. To mitigate this, ensure that WaitEventFitWith has sufficiently generous timeouts and retry mechanisms to account for potential delays in event propagation, but be aware that a truly dropped event will still cause a failure.

For the ApplyPolicySucceed event in the current snippet, since its success is largely reflected by the WaitDeploymentPresentOnClustersFitWith call, it might be a candidate for a more lenient, diagnostic-focused check if the goal is to reduce flakiness. However, if the project explicitly wants to assert on the presence of this event as a strong signal, then the current WaitEventFitWith is acceptable, with the understanding of the potential flakiness.

Given the PR's focus on adding event assertions, a good compromise might be to ensure that WaitEventFitWith is configured with reasonable timeouts and retries, and to add comments clarifying the intent behind each event assertion (e.g., whether it's a critical functional check or a diagnostic aid).

Member


Based on the above analysis, I think our current configuration with a longer timeout should be acceptable. If we find out later that events are being dropped, we can consider non-blocking checks again. WDYT @nXtCyberNet

Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment


Thanks a lot

@XiShanYongYe-Chang
Member

Hi @nXtCyberNet The first PR may have a lot of reviews, but subsequent PRs should be much better if they are based on this template.

@nXtCyberNet nXtCyberNet force-pushed the issue/e2e_test branch 3 times, most recently from b462171 to 6f47ecf Compare March 21, 2026 07:54
@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from xishanyongye-chang. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>