Park-Jiyeonn
Contributor

What this PR does

This PR adds a FilterFunc to the Pod event handler registered in ResisterResultSavingToInformer to prevent scheduling results from being written to Pods that are being deleted.

Why

As described in issue #426, when a Pod is being deleted from the cluster, the informer may still receive update events, but attempts to annotate such Pods will fail because they no longer exist.

By adding a FilterFunc, we ensure that only Pods not marked for deletion (DeletionTimestamp == nil) are processed, preventing errors during update handling.

Related Issue

Fixes: #426

Additional Notes

  • This change prevents unnecessary update attempts on terminating Pods.
  • Ensures the simulator behaves more robustly in real cluster environments.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Jun 16, 2025
@k8s-ci-robot k8s-ci-robot requested a review from 196Ikuchil June 16, 2025 03:12
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 16, 2025
@k8s-ci-robot
Contributor

Hi @Park-Jiyeonn. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 16, 2025
Member

@sanposhiho sanposhiho left a comment

/cc @utam0k @ordovicia @saza-ku
/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jun 16, 2025
@k8s-ci-robot
Contributor

@sanposhiho: GitHub didn't allow me to request PR reviews from the following users: ordovicia, saza-ku.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @utam0k @ordovicia @saza-ku
/ok-to-test

@k8s-ci-robot k8s-ci-robot requested a review from utam0k June 16, 2025 19:43
@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 16, 2025
Contributor

@saza-ku saza-ku left a comment

Thank you for your contribution.

While filtering out Pods where DeletionTimestamp != nil can mitigate the issue, it doesn't prevent it completely. A Pod could still be deleted during the execution of UpdateFunc even if its DeletionTimestamp was nil at the beginning. This suggests that it might be impossible to avoid this race condition entirely.

As an alternative, what do you think about configuring the FilterFunc so that the UpdateFunc is executed only when necessary? The UpdateFunc should run when a Pod has been successfully scheduled. Therefore, I suggest triggering the UpdateFunc only when the Pod's nodeName has changed between the old and new objects in the update, as this indicates that scheduling is complete.
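A sketch of the nodeName-based check, again using a hypothetical `Pod` stand-in instead of `corev1.Pod`. One caveat: client-go's `FilteringResourceEventHandler` calls `FilterFunc` on the old and new objects separately, so an old-versus-new comparison like this one would have to run at the top of the update handler rather than inside `FilterFunc` itself.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for corev1.Pod; only spec.nodeName is modeled.
type Pod struct {
	Name     string
	NodeName string // empty until the scheduler binds the Pod
}

// nodeNameChanged reports whether an update changed the Pod's nodeName,
// which is the signal proposed for "scheduling is complete".
func nodeNameChanged(oldObj, newObj interface{}) bool {
	oldPod, ok := oldObj.(*Pod)
	if !ok {
		return false
	}
	newPod, ok := newObj.(*Pod)
	if !ok {
		return false
	}
	return oldPod.NodeName != newPod.NodeName
}

func main() {
	pending := &Pod{Name: "p"}
	bound := &Pod{Name: "p", NodeName: "node-1"}
	fmt.Println(nodeNameChanged(pending, pending), nodeNameChanged(pending, bound))
}
```

Note that a Pod rejected by every Filter plugin never gets a nodeName, so this check never fires for it.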

@Park-Jiyeonn
Contributor Author

As an alternative, what do you think about configuring the FilterFunc so that the UpdateFunc is executed only when necessary? The UpdateFunc should run when a Pod has been successfully scheduled. Therefore, I suggest triggering the UpdateFunc only when the Pod's nodeName has changed between the old and new objects in the update, as this indicates that scheduling is complete.

If a Pod fails the scheduler's Filter plugins during scheduling and we only react to nodeName changes, we won't be able to record the reason it was filtered out. Do you mean that we don't need to care about Pods that fail to be scheduled?

@saza-ku
Contributor

saza-ku commented Jun 23, 2025

Ah, you're right. This would prevent us from recording the scheduling results of Pods that were filtered out.

So I wonder if your fix eliminates the error reported by #426.
@LY-today Do you think this resolves your issue?

@LY-today
Contributor

Ah, you're right. This would prevent us from recording the scheduling results of Pods that were filtered out.

So I wonder if your fix eliminates the error reported by #426. @LY-today Do you think this resolves your issue?

Yes, it can be solved.

@Park-Jiyeonn
Contributor Author

Hi @saza-ku

Just wanted to kindly follow up on this PR 🙇
Let me know if there's anything else I should revise or clarify.

Thank you again for your time and feedback!

@Park-Jiyeonn
Contributor Author

@sanposhiho

Hope you're doing well! Just wanted to follow up on this PR again. I'd really appreciate it if you could take a look or let me know if there's anything else I should improve 🙇

Looking forward to your feedback. Thank you so much for your time!

@Park-Jiyeonn
Contributor Author

@sanposhiho @saza-ku PTAL

@Park-Jiyeonn
Contributor Author

@sanposhiho @saza-ku PTAL, Thx!

@sanposhiho
Member

Sorry, I noticed you pinged me many times, but my laptop broke, so I'll be away from OSS work until a new one arrives (this week or next).

@saza-ku
Contributor

saza-ku commented Aug 14, 2025

@Park-Jiyeonn Sorry for being late.

Actually I'm uncertain about the purpose of this change because of the following reasons.

Could you please provide more context or elaborate on the reasoning behind this change?

@Park-Jiyeonn
Contributor Author

@Park-Jiyeonn Sorry for being late.

Actually I'm uncertain about the purpose of this change because of the following reasons.

Could you please provide more context or elaborate on the reasoning behind this change?

Regarding the concern you mentioned, I believe I addressed this in my earlier comment (#issuecomment-2995142507), but let me briefly summarize again:

  • The main goal of this change is to prevent unnecessary update attempts on Pods that are already terminating (DeletionTimestamp != nil), which caused the error in [bug]:ResisterResultSavingToInformer err #426.
  • Even though the simulator keeps running despite the error, this fix improves robustness and reduces noise in the logs during scheduling simulations.

@sanposhiho
Member

but attempts to annotate such Pods will fail because they no longer exist.

Why not just ignore the NotFound error when applying?

@Park-Jiyeonn
Contributor Author

but attempts to annotate such Pods will fail because they no longer exist.

Why not just ignore the NotFound error when applying?

That's exactly the purpose of this change: I added the FilterFunc to avoid those NotFound errors.

@sanposhiho
Member

But that is not perfect, depending on the timing. If we just ignore NotFound errors at the fetch API, that's perfect prevention, no?
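A sketch of ignoring NotFound on the write path. The in-memory store, the annotation key, and `errNotFound` are hypothetical stand-ins; with client-go, the check would be `apierrors.IsNotFound(err)` from `k8s.io/apimachinery/pkg/api/errors`. The `main` function replays the race raised earlier in the thread: the Pod is still alive when the filter runs and is deleted before the write.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for the API server's
// NotFound status.
var errNotFound = errors.New("not found")

// fakePods is a hypothetical in-memory store: Pod name -> annotations.
type fakePods map[string]map[string]string

// annotate writes one annotation, failing with errNotFound if the Pod is gone.
func (s fakePods) annotate(name, key, value string) error {
	ann, ok := s[name]
	if !ok {
		return fmt.Errorf("annotate pod %q: %w", name, errNotFound)
	}
	ann[key] = value
	return nil
}

// saveResult writes a result annotation and treats NotFound as success:
// the Pod was deleted after the filter ran, so there is nothing left to record.
func saveResult(store fakePods, name, result string) error {
	err := store.annotate(name, "example/result", result) // hypothetical key
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

func main() {
	store := fakePods{"pod-a": {}}
	// The filter saw pod-a alive; it is deleted before the write happens.
	delete(store, "pod-a")
	fmt.Println(saveResult(store, "pod-a", "bound"))
}
```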

@Park-Jiyeonn
Contributor Author

But that is not perfect, depending on the timing. If we just ignore NotFound errors at the fetch API, that's perfect prevention, no?

Do you mean we don't need the FilterFunc?

@sanposhiho
Member

sanposhiho commented Aug 17, 2025

I'd say we can keep your filter to reduce the number of fetch API calls (a performance benefit), and we can add additional error handling that ignores NotFound errors to fully prevent the issue.

@Park-Jiyeonn
Contributor Author

I'd say we can keep your filter to reduce the number of fetch API calls (a performance benefit), and we can add additional error handling that ignores NotFound errors to fully prevent the issue.

How about adding a warning before returning false?

	FilterFunc: func(obj interface{}) bool {
		if pod, ok := obj.(*corev1.Pod); ok {
			if pod.DeletionTimestamp != nil {
				// add some warning or error log here
				return false
			}
		}
		return true
	},

@sanposhiho
Member

sanposhiho commented Aug 17, 2025

I don't think we need a warning. A warning implies something went wrong, but in this case nothing goes wrong: the Pod is simply being deleted (i.e., users would have no action to take to resolve this warning).

@Park-Jiyeonn
Contributor Author

I don't think we need a warning. A warning implies something went wrong, but in this case nothing goes wrong: the Pod is simply being deleted (i.e., users would have no action to take to resolve this warning).

OK! I've additionally made it ignore NotFound errors on Get/Update.

Member

@sanposhiho sanposhiho left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 18, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Park-Jiyeonn, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 18, 2025
@k8s-ci-robot k8s-ci-robot merged commit 2df9398 into kubernetes-sigs:master Aug 18, 2025
5 checks passed
Development

Successfully merging this pull request may close these issues.

[bug]:ResisterResultSavingToInformer err
5 participants