Skip to content

[channel] Fix race condition on channel.Ready.Reset()#446

Merged
liran-funaro merged 1 commit intohyperledger:mainfrom
liran-funaro:channel-race
Mar 19, 2026
Merged

[channel] Fix race condition on channel.Ready.Reset()#446
liran-funaro merged 1 commit intohyperledger:mainfrom
liran-funaro:channel-race

Conversation

@liran-funaro
Copy link
Copy Markdown
Contributor

@liran-funaro liran-funaro commented Mar 18, 2026

Type of change

  • Bug fix

Description

Root cause: The Ready.Reset() method was experiencing a race condition due to non-atomic struct field replacement. The original implementation used *r = *NewReady() to reset the struct, which performs a multi-step memory copy operation that replaces all fields (channels ready, closed, and once). This struct copy is not atomic.

Concurrently, other goroutines calling WaitForReady() or WaitForAllReady() would read from the same channel fields (r.ready and r.closed) via select statements. This created a classic data race where one goroutine was writing to struct fields while others were simultaneously reading from them.

Race scenario:

  1. Goroutine A calls Reset() and begins copying new struct fields (non-atomic multi-step operation)
  2. Goroutine B calls WaitForReady() and attempts to read r.ready or r.closed channels
  3. Data race occurs: Goroutine A writes to memory while Goroutine B reads from the same memory locations

The fix: Uses atomic pointer indirection by moving channels into an inner ready struct and storing only an atomic.Pointer[ready] in the outer Ready struct. This enables atomic swapping of the entire internal state via CompareAndSwap(), eliminating the race condition while maintaining the same external API.

Related issues

@liran-funaro liran-funaro requested a review from cendhu March 18, 2026 15:03
@liran-funaro liran-funaro added the bug Something isn't working label Mar 18, 2026
@cendhu
Copy link
Copy Markdown
Contributor

cendhu commented Mar 18, 2026

Please add root cause in all PR description to save review time.

@cendhu
Copy link
Copy Markdown
Contributor

cendhu commented Mar 18, 2026

An example #410

It was generated by Bob.

@coveralls
Copy link
Copy Markdown

coveralls commented Mar 18, 2026

Coverage Status

coverage: 90.387% (+0.1%) from 90.284%
when pulling b1a49aa on liran-funaro:channel-race
into 355e2a9 on hyperledger:main.

Signed-off-by: Liran Funaro <liran.funaro@gmail.com>
@liran-funaro
Copy link
Copy Markdown
Contributor Author

Please add root cause in all PR description to save review time.

@cendhu done

Copy link
Copy Markdown
Contributor

@cendhu cendhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liran-funaro liran-funaro merged commit cf13f9f into hyperledger:main Mar 19, 2026
17 checks passed
@liran-funaro liran-funaro deleted the channel-race branch March 19, 2026 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants