-
Notifications
You must be signed in to change notification settings - Fork 13
[data-race] Monitoring the queues (sidecar) when the queue is updated with a new qeuue #449
Description
In the sidecar, we update the blockToBeCommitted queue upon recovery.
The monitoring loop might check the queue length concurrently.
This is only an apparent race. Not a real issue.
Technical Analysis
Race Location:
- Write (Line 188 in
sendBlocksAndReceiveStatus()): ReplacesblockToBeCommittedchannel during recovery - Read (Line 408 in
monitorQueues()): Reads channel length for Prometheus metrics
Root Cause:
During sidecar recovery/reconnection to coordinator, sendBlocksAndReceiveStatus() replaces the entire blockToBeCommitted channel to drop enqueued blocks from the previous session, while the monitoring goroutine concurrently reads the channel length.
Why It's Benign:
- Both
len()on a channel and channel pointer replacement are atomic operations in Go - Worst case: monitoring briefly reads from old channel before seeing new one
- No functional impact on block processing or status delivery
- Race window is extremely small (only during recovery)
Recommended Solution: atomic.Pointer with Generics
Use atomic.Pointer[chan *common.Block] for type-safe atomic operations:
import "sync/atomic"
type Service struct {
// ... existing fields ...
blockToBeCommitted atomic.Pointer[chan *common.Block]
committedBlock atomic.Pointer[chan *common.Block]
}
// In New():
blockToBeCommittedChan := make(chan *common.Block, c.ChannelBufferSize)
s.blockToBeCommitted.Store(&blockToBeCommittedChan)
committedBlockChan := make(chan *common.Block, c.ChannelBufferSize)
s.committedBlock.Store(&committedBlockChan)
// In sendBlocksAndReceiveStatus() (line 188):
newChan := make(chan *common.Block, s.config.ChannelBufferSize)
s.blockToBeCommitted.Store(&newChan)
// In monitorQueues() (line 408):
blockChan := s.blockToBeCommitted.Load()
if blockChan != nil {
promutil.SetGauge(m.yetToBeCommittedBlocksQueueSize, len(*blockChan))
}
// Wherever the channel is used:
outputBlock := *s.blockToBeCommitted.Load()Benefits:
- Type-safe with generics (Go 1.19+)
- No mutex overhead
- Lock-free atomic operations
- Clean, idiomatic Go code
Alternative Solutions:
Option 2: Mutex Protection
type Service struct {
blockToBeCommittedMu sync.RWMutex
blockToBeCommitted chan *common.Block
}Simpler but adds mutex overhead.
Option 3: Accept & Document
Document as known benign race; suppress in race detector config.
Priority: Low - cosmetic issue for tooling, not a production bug.
Technical Analysis
Race Location:
- Write (Line 188 in
sendBlocksAndReceiveStatus()): ReplacesblockToBeCommittedchannel during recovery - Read (Line 408 in
monitorQueues()): Reads channel length for Prometheus metrics
Root Cause:
During sidecar recovery/reconnection to coordinator, sendBlocksAndReceiveStatus() replaces the entire blockToBeCommitted channel to drop enqueued blocks from the previous session, while the monitoring goroutine concurrently reads the channel length.
Why It's Benign:
- Both
len()on a channel and channel pointer replacement are atomic operations in Go - Worst case: monitoring briefly reads from old channel before seeing new one
- No functional impact on block processing or status delivery
- Race window is extremely small (only during recovery)
Potential Solutions:
Option 1: Mutex Protection (Recommended for clean CI)
type Service struct {
blockToBeCommittedMu sync.RWMutex
blockToBeCommitted chan *common.Block
}Protect both write (channel replacement) and read (len()) operations with RWMutex.
Option 2: atomic.Value (More complex)
Store channel in atomic.Value for atomic pointer operations.
Option 3: Accept & Document (Simplest)
Document as known benign race; suppress in race detector config.
Recommendation: Option 1 provides clean race detector runs for CI (#442) with negligible overhead since monitoring is ticker-based and recovery is rare.
Priority: Low - cosmetic issue for tooling, not a production bug.
- discovered by [CI] Enable race detection for unit tests in CI #442
==================
WARNING: DATA RACE
Read at 0x00c0003b8508 by goroutine 129:
github.com/hyperledger/fabric-x-committer/service/sidecar.(*Service).monitorQueues()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/sidecar/sidecar.go:408 +0xfd
github.com/hyperledger/fabric-x-committer/service/sidecar.(*Service).monitorQueues-fm()
<autogenerated>:1 +0x3e
github.com/hyperledger/fabric-x-committer/utils/monitoring.(*Provider).StartPrometheusServer.func2()
/home/runner/work/fabric-x-committer/fabric-x-committer/utils/monitoring/provider.go:120 +0x58
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
Previous write at 0x00c0003b8508 by goroutine 112:
github.com/hyperledger/fabric-x-committer/service/sidecar.(*Service).sendBlocksAndReceiveStatus()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/sidecar/sidecar.go:188 +0x644
github.com/hyperledger/fabric-x-committer/service/sidecar.(*Service).Run.func4.1()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/sidecar/sidecar.go:157 +0x13c
github.com/hyperledger/fabric-x-committer/utils/connection.Sustain()
/home/runner/work/fabric-x-committer/fabric-x-committer/utils/connection/lifecycle.go:45 +0x159
github.com/hyperledger/fabric-x-committer/service/sidecar.(*Service).Run.func4()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/sidecar/sidecar.go:153 +0xf9
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
Goroutine 129 (running) created at:
golang.org/x/sync/errgroup.(*Group).Go()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11c
github.com/hyperledger/fabric-x-committer/utils/monitoring.(*Provider).StartPrometheusServer()
/home/runner/work/fabric-x-committer/fabric-x-committer/utils/monitoring/provider.go:119 +0xfe4
github.com/hyperledger/fabric-x-committer/service/sidecar.(*Service).Run.func1()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/sidecar/sidecar.go:122 +0x152
Goroutine 112 (running) created at:
golang.org/x/sync/errgroup.(*Group).Go()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11c
github.com/hyperledger/fabric-x-committer/service/sidecar.(*Service).Run()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/sidecar/sidecar.go:151 +0x944
github.com/hyperledger/fabric-x-committer/utils/connection.StartService.func1()
/home/runner/work/fabric-x-committer/fabric-x-committer/utils/connection/server_util.go:177 +0x9e
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
==================