Skip to content

fix(NSC): add mutex to protect nodeport healthcheck maps from data race#2021

Closed
Aprazor wants to merge 1 commit intocloudnativelabs:masterfrom
Aprazor:fix/nodeport-healthcheck-data-race
Closed

fix(NSC): add mutex to protect nodeport healthcheck maps from data race#2021
Aprazor wants to merge 1 commit intocloudnativelabs:masterfrom
Aprazor:fix/nodeport-healthcheck-data-race

Conversation

@Aprazor
Copy link
Copy Markdown
Contributor

@Aprazor Aprazor commented Mar 18, 2026

What type of PR is this?

bug

What this PR does / why we need it:

UpdateServicesInfo() writes serviceInfoMap and endpointsInfoMap from the controller's sync goroutine, while the HTTP handler Handler() reads them from net/http's goroutine pool. This is a data race — any concurrent NodePort health check HTTP request during a sync cycle races with the map write.

The fix adds a sync.RWMutex to nphcServicesInfo and guards writes with Lock/Unlock and reads with RLock/RUnlock, allowing concurrent health check reads while blocking only during sync writes.

Which issue(s) this PR is related to:

None found.

Was AI used during the creation of this PR?

  • What tool was used: Claude Code
  • To what extent was the tool used? Code review identified the race condition, human reviewed and confirmed the fix
  • If drafted, how detailed of a plan did you create for the AI? Detailed — traced the concurrent access pattern across goroutines, confirmed no existing synchronization, designed minimal mutex fix
  • Help us understand if a human was in the loop or not for this PR? Yes — human confirmed the finding, reviewed the diff, and approved before submission

What, if any, amount of integration testing was done with this change in a Kubernetes environment?

Unit tests pass. No integration testing — the race requires concurrent HTTP requests during sync to trigger.

Does this PR introduce a breaking change?

NONE

Anything else the reviewer should know that wasn't already covered?

The race would be detectable with go test -race under load. RWMutex was chosen over a regular Mutex because health check reads are far more frequent than sync writes.

UpdateServicesInfo writes serviceInfoMap and endpointsInfoMap from the
controller sync goroutine, while the HTTP handler reads them from
net/http's goroutine pool. This is a data race. Add sync.RWMutex to
nphcServicesInfo and guard writes with Lock/Unlock and reads with
RLock/RUnlock.
@aauren
Copy link
Copy Markdown
Collaborator

aauren commented Mar 23, 2026

Hi @Aprazor — please see my comment on #2020 (#2020 (comment)) for feedback on how to consolidate your PRs. Thanks!

@Aprazor
Copy link
Copy Markdown
Contributor Author

Aprazor commented Mar 23, 2026

Superseded by #2041 (consolidated NSC PR per @aauren's feedback)

@Aprazor Aprazor closed this Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants