fix: add FilterFunc to skip deleted Pods in ResisterResultSavingToInf… #435

Park-Jiyeonn · 2025-06-16T03:11:55Z

What this PR does

This PR adds a FilterFunc to the ResisterResultSavingToInformer method to prevent scheduling results from being written to Pods that are being deleted.

Why

As described in issue #426, when a Pod is being deleted from the cluster, the informer may still receive update events, but attempts to annotate such Pods will fail because they no longer exist.

By adding a FilterFunc, we ensure that only Pods not marked for deletion (DeletionTimestamp == nil) are processed, preventing errors during update handling.

Related Issue

Fixes: #426

Additional Notes

- This change prevents unnecessary update attempts on terminating Pods.
- It makes the simulator behave more robustly in real cluster environments.

k8s-ci-robot · 2025-06-16T03:12:05Z

Hi @Park-Jiyeonn. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sanposhiho

/cc @utam0k @ordovicia @saza-ku
/ok-to-test

k8s-ci-robot · 2025-06-16T19:43:20Z

@sanposhiho: GitHub didn't allow me to request PR reviews from the following users: ordovicia, saza-ku.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

> /cc @utam0k @ordovicia @saza-ku
> /ok-to-test

saza-ku

Thank you for your contribution.

While filtering out Pods where DeletionTimestamp != nil can mitigate the issue, it doesn't prevent it completely. A Pod could still be deleted during the execution of UpdateFunc even if its DeletionTimestamp was nil at the beginning. This suggests that it might be impossible to avoid this race condition entirely.

As an alternative, what do you think about configuring the FilterFunc so that the UpdateFunc is executed only when necessary? The UpdateFunc should run when a Pod has been successfully scheduled. Therefore, I suggest triggering the UpdateFunc only when the nodeName of the Pod has changed between the update, as this indicates that scheduling is complete.
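The node-name check suggested above might look roughly like this. It is a sketch under assumptions: `Pod` and `nodeNameChanged` are hypothetical stand-ins, not code from the simulator. Because a client-go `FilterFunc` receives a single object, a comparison between the old and new Pod would most likely live at the top of the update handler rather than in the filter itself.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for corev1.Pod that keeps only the
// scheduled node name.
type Pod struct {
	Name     string
	NodeName string // empty until the scheduler binds the Pod
}

// nodeNameChanged reports whether an update changed the Pod's node
// name, which is what happens when the Pod gets bound to a node.
func nodeNameChanged(oldPod, newPod *Pod) bool {
	return oldPod.NodeName != newPod.NodeName
}

func main() {
	pending := &Pod{Name: "p"}
	bound := &Pod{Name: "p", NodeName: "node-1"}

	fmt.Println(nodeNameChanged(pending, bound))   // the Pod was bound in this update
	fmt.Println(nodeNameChanged(pending, pending)) // nothing changed
}
```

A Pod that is never bound keeps an empty node name, so a check like this would never fire for it.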

Park-Jiyeonn · 2025-06-23T06:45:34Z

> As an alternative, what do you think about configuring the FilterFunc so that the UpdateFunc is executed only when necessary? The UpdateFunc should run when a Pod has been successfully scheduled. Therefore, I suggest triggering the UpdateFunc only when the nodeName of the Pod has changed between the update, as this indicates that scheduling is complete.

If, during scheduling, a Pod doesn't pass the filter function and we only consider changes in the node name, then we won't be able to record the reason why it didn't pass the filter function. Do you mean that we don't need to care about Pods that fail to be scheduled?

saza-ku · 2025-06-23T07:33:11Z

Ah, you're right. This would prevent recording scheduling results for Pods that are filtered out.

So I wonder if your fix eliminates the error reported by #426.
@LY-today Do you think this resolves your issue?

LY-today · 2025-06-27T07:22:27Z

> Ah, you're right. This would prevent recording scheduling results for Pods that are filtered out.
>
> So I wonder if your fix eliminates the error reported by #426. @LY-today Do you think this resolves your issue?

Yes, it can be solved this way.

Park-Jiyeonn · 2025-08-01T02:29:03Z

Hi @saza-ku

Just wanted to kindly follow up on this PR 🙇
Let me know if there's anything else I should revise or clarify.

Thank you again for your time and feedback!

Park-Jiyeonn · 2025-08-03T12:43:21Z

@sanposhiho

Hope you're doing well! Just wanted to follow up on this PR again. I'd really appreciate it if you could take a look or let me know if there's anything else I should improve 🙇

Looking forward to your feedback. Thank you so much for your time!

Park-Jiyeonn · 2025-08-12T03:31:51Z

@sanposhiho @saza-ku PTAL

Park-Jiyeonn · 2025-08-13T02:24:52Z

@sanposhiho @saza-ku PTAL, Thx!

sanposhiho · 2025-08-13T02:39:24Z

Sorry, I noticed you pinged me many times, but my laptop broke, so I'll be away from OSS work until I get a new one (it should arrive this week or next).

saza-ku · 2025-08-14T02:32:22Z

@Park-Jiyeonn Sorry for being late.

Actually, I'm uncertain about the purpose of this change, for the following reasons:

- This PR does not fully eliminate the error (as I mentioned in my earlier review on this PR, #435).
- In my understanding, even with the error present, the simulator operates without any noticeable issues.

Could you please provide more context or elaborate on the reasoning behind this change?

Park-Jiyeonn · 2025-08-14T03:30:44Z

> @Park-Jiyeonn Sorry for being late.
>
> Actually, I'm uncertain about the purpose of this change, for the following reasons:
>
> - This PR does not fully eliminate the error (as I mentioned in my earlier review on this PR, #435).
> - In my understanding, even with the error present, the simulator operates without any noticeable issues.
>
> Could you please provide more context or elaborate on the reasoning behind this change?

Regarding the concern you mentioned — I believe I addressed this in my earlier comment #issuecomment-2995142507, but let me briefly summarize again:

- The main goal of this change is to prevent unnecessary update attempts on Pods that are already terminating (DeletionTimestamp != nil), which caused the error in #426 ([bug]：ResisterResultSavingToInformer err).
- Even though the simulator continues to operate with the error, this fix improves robustness and reduces noise in logs during scheduling simulations.

sanposhiho · 2025-08-16T05:40:21Z

> but attempts to annotate such Pods will fail because they no longer exist.

Why not just ignore NotFound error when applying?

Park-Jiyeonn · 2025-08-17T03:45:08Z

> but attempts to annotate such Pods will fail because they no longer exist.
>
> Why not just ignore NotFound error when applying?

That's exactly the purpose of this change: I added the FilterFunc to avoid those NotFound errors.

sanposhiho · 2025-08-17T04:11:08Z

But the filter isn't perfect, depending on the timing. If we just ignore the NotFound errors from the fetch API, that's perfect prevention, no?

Park-Jiyeonn · 2025-08-17T04:36:51Z

> But the filter isn't perfect, depending on the timing. If we just ignore the NotFound errors from the fetch API, that's perfect prevention, no?

Do you mean we don't need the FilterFunc?

sanposhiho · 2025-08-17T05:18:12Z

I'd say we can keep your filter to reduce the num of fetch API calls (performance benefit), and we can add an additional error handling to ignore not-found errors for a perfect prevention of the issue.

Park-Jiyeonn · 2025-08-17T06:19:11Z

> I'd say we can keep your filter to reduce the num of fetch API calls (performance benefit), and we can add an additional error handling to ignore not-found errors for a perfect prevention of the issue.

How about adding a warning before returning false?

	FilterFunc: func(obj interface{}) bool {
		if pod, ok := obj.(*corev1.Pod); ok {
			if pod.DeletionTimestamp != nil {
				// add some warning or error here
				return false
			}
		}
		return true
	},

sanposhiho · 2025-08-17T07:20:58Z

I don't think we need warning. Warning implies something goes wrong, but in this case, nothing goes wrong: It's just that the pod is being deleted (i.e., users would have no action to solve this warning)

Park-Jiyeonn · 2025-08-17T08:39:40Z

> I don't think we need warning. Warning implies something goes wrong, but in this case, nothing goes wrong: It's just that the pod is being deleted (i.e., users would have no action to solve this warning)

OK! I've also added handling to ignore NotFound on Get/Update.

sanposhiho

/lgtm
/approve

k8s-ci-robot · 2025-08-18T00:24:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Park-Jiyeonn, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [sanposhiho]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and sig/scheduling (categorizes an issue or PR as relevant to SIG Scheduling) labels Jun 16, 2025

k8s-ci-robot requested a review from 196Ikuchil June 16, 2025 03:12

k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Jun 16, 2025

k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Jun 16, 2025

LY-today mentioned this pull request Jun 16, 2025

[bug]：ResisterResultSavingToInformer err #426

Closed

sanposhiho reviewed Jun 16, 2025

k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) Jun 16, 2025

k8s-ci-robot requested a review from utam0k June 16, 2025 19:43

k8s-ci-robot removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Jun 16, 2025

saza-ku reviewed Jun 22, 2025

Park-Jiyeonn requested review from saza-ku and sanposhiho August 4, 2025 02:55

Park-Jiyeonn force-pushed the feat/filter-func branch from d89840d to a50456d Compare August 17, 2025 08:16

fix: add FilterFunc to skip deleted Pods in ResisterResultSavingToInf…

9fddd8c

…ormer

Park-Jiyeonn force-pushed the feat/filter-func branch from a50456d to 9fddd8c Compare August 17, 2025 08:29

sanposhiho approved these changes Aug 18, 2025

k8s-ci-robot assigned sanposhiho Aug 18, 2025

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Aug 18, 2025

k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Aug 18, 2025

k8s-ci-robot merged commit 2df9398 into kubernetes-sigs:master Aug 18, 2025
5 checks passed
