Skip to content

Comments

fix: prevent nil pointer panic in NodeLifecycle handleDisruption#2516

Open
Yashika0724 wants to merge 2 commits intoopenyurtio:masterfrom
Yashika0724:fix-nodelifecycle-nil-panic
Open

fix: prevent nil pointer panic in NodeLifecycle handleDisruption#2516
Yashika0724 wants to merge 2 commits intoopenyurtio:masterfrom
Yashika0724:fix-nodelifecycle-nil-panic

Conversation

@Yashika0724
Copy link

What does this PR do?

This PR fixes a potential nil pointer dereference in the NodeLifecycle controller when handling the transition out of disruption mode.


Background / Problem

The handleDisruption logic assumes that all nodes have corresponding entries in nodeHealthMap.
However, during edge or network-disrupted scenarios, this assumption may not always hold.

In particular, if node health updates return early due to API errors or pod listing failures, some nodes
may not be populated in the health map. When the controller later exits disruption mode and iterates
over all nodes, this can lead to a nil dereference and controller panic.


Why is this change needed?

This code path is exercised during recovery from disruption, which is a common and sensitive phase
in edge environments with intermittent connectivity. A controller panic at this point stops node
health processing, including tainting and eviction logic, until the controller is restarted.


What does this change do?

The controller now defensively initializes node health data when it is missing before updating
timestamps. This preserves existing behavior for healthy nodes while preventing a possible crash.


Scope of change

  • Adds a nil check in handleDisruption
  • Initializes node health data only when it is missing
  • No behavioral changes to normal node health processing

Testing

This change is small and defensive in nature. It follows existing controller patterns and does not
affect normal execution paths.

Signed-off-by: Yashika0724 <ssyashika1311@gmail.com>
@Yashika0724 Yashika0724 requested a review from a team as a code owner February 4, 2026 11:34
@Yashika0724
Copy link
Author

Hi @zyjhtangtang,
This PR fixes a potential nil pointer panic in the NodeLifecycle controller when exiting disruption mode.
In some edge or network-disrupted scenarios, a node may be missing from nodeHealthMap, and this change defensively handles that case without altering normal behavior.

…eDisruption

Signed-off-by: Yashika <ssyashika1311@gmail.com>
@sonarqubecloud
Copy link

sonarqubecloud bot commented Feb 5, 2026

@codecov
Copy link

codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.17%. Comparing base (4479acb) to head (2ed64d4).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2516      +/-   ##
==========================================
+ Coverage   44.16%   44.17%   +0.01%     
==========================================
  Files         399      399              
  Lines       26579    26584       +5     
==========================================
+ Hits        11738    11743       +5     
  Misses      13776    13776              
  Partials     1065     1065              
Flag Coverage Δ
unittests 44.17% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Yashika0724
Copy link
Author

Hi @zyjhtangtang
I pushed a follow-up commit that fixes the golangci-lint (gci formatting) issue in this PR.
All checks are passing on the latest commit, the remaining failing golangci-lint check appears to be from an earlier workflow run.a
Could you please re-run the golangci-lint CI job when convenient?
Thanks a lot for your time and review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant