-
Notifications
You must be signed in to change notification settings - Fork 13
[data race] VC #451
Description
This issue depicts two race conditions.
Comprehensive Root Cause Analysis: Data Races in Validator-Committer Pipeline
Executive Summary
Both data races originate from the same architectural flaw: the validator goroutine continues to access shared data structures after transferring ownership to the committer goroutine via a channel, violating Go's concurrency safety principles.
The Architecture
The system uses a pipeline pattern where:
- Validator goroutine validates transactions and creates a
validatedTransactionsstruct - Sends this struct through a channel to the Committer goroutine
- Committer processes and commits the transactions
The Shared Data Structure
type validatedTransactions struct {
validTxNonBlindWrites transactionToWrites // map[TxID]namespaceToWrites
validTxBlindWrites transactionToWrites // map[TxID]namespaceToWrites
newWrites transactionToWrites // map[TxID]namespaceToWrites
readToTxIDs readToTransactions
invalidTxStatus map[TxID]committerpb.Status
txIDToHeight transactionIDToHeight
}These are reference types (maps) - when passed through the channel, both goroutines share the same underlying data.
Root Cause: Ownership Violation
Location: validator.go, lines 108-125
// Step 2: construct validated transactions
vTxs := &validatedTransactions{
validTxNonBlindWrites: prepTx.txIDToNsNonBlindWrites, // Shared reference
validTxBlindWrites: prepTx.txIDToNsBlindWrites, // Shared reference
newWrites: prepTx.txIDToNsNewWrites, // Shared reference
readToTxIDs: prepTx.readToTxIDs,
invalidTxStatus: prepTx.invalidTxIDStatus,
txIDToHeight: prepTx.txIDToHeight,
}
if err := vTxs.invalidateTxsOnReadConflicts(nsToReadConflicts); err != nil {
return err
}
promutil.Observe(v.metrics.validatorTxBatchLatencySeconds, time.Since(start))
outgoingValidatedTransactions.Write(vTxs) // ← Ownership transferred here
logger.Debugf("Validator sent batch...")
vTxs.Debug() // ← BUG: Accessing after ownership transfer!The Problem: After Write(vTxs), the validator no longer owns this data, yet it immediately calls vTxs.Debug() which reads the maps.
Data Race #1: Read-Write Conflict on Map Insertion
Timeline:
- Validator (goroutine 5697): Calls
vTxs.Debug()→ readslen(v.validTxNonBlindWrites)at line 128 - Committer (goroutine 5704): Calls
populateVersionsAndCategorizeBlindWrites()→ writes tovalidTxNonBlindWritesviagetOrCreate()at line 225
Committer's Write Operation (committer.go:212-225):
for curTxID, txWrites := range vTx.validTxBlindWrites {
for ns, nsWrites := range txWrites {
nsState := state[ns]
for i, key := range nsWrites.keys {
if ver, present := nsState[string(key)]; present {
nextVer := ver + 1
vTx.validTxNonBlindWrites.getOrCreate(curTxID, ns).append(...) // ← WRITE
} else {
vTx.newWrites.getOrCreate(curTxID, ns).append(...)
}
}
}
}Validator's Read Operation (validator.go:46-56):
func (v *validatedTransactions) Debug() {
if !logger.IsEnabledFor(zapcore.DebugLevel) {
return
}
logger.Debugf("total validated: %d\n\t"+
"valid non-blind writes: %d\n\t"+ // ← READ: len(v.validTxNonBlindWrites)
...
}Race Condition: Concurrent map read (length check) and write (insertion) without synchronization.
Data Race #2: Read-Write Conflict on Map Deletion
Timeline:
- Validator (goroutine 2517): Calls
vTxs.Debug()→ readslen(v.validTxNonBlindWrites)at line 128 - Committer (goroutine 2485): Calls
updateInvalidTxs()→ deletes fromvalidTxNonBlindWritesat line 187
Committer's Write Operation (validator.go:184-193):
func (v *validatedTransactions) updateInvalidTxs(txIDs []TxID, status committerpb.Status) {
for _, tID := range txIDs {
delete(v.validTxNonBlindWrites, tID) // ← WRITE (delete)
delete(v.validTxBlindWrites, tID)
delete(v.newWrites, tID)
v.invalidTxStatus[tID] = status
}
}Called from committer.go:168:
vTx.updateInvalidTxs(duplicates, committerpb.Status_REJECTED_DUPLICATE_TX_ID)Race Condition: Concurrent map read (length check) and write (deletion) without synchronization.
Why This Is Dangerous
- Undefined Behavior: Go maps are not thread-safe. Concurrent read/write causes undefined behavior
- Potential Crashes: Can lead to runtime panics or memory corruption
- Data Corruption: May produce incorrect results silently
- Intermittent Failures: Race conditions are timing-dependent, making them hard to reproduce
The Fix: Respect Ownership Transfer
Current Code (validator.go:120-125):
promutil.Observe(v.metrics.validatorTxBatchLatencySeconds, time.Since(start))
outgoingValidatedTransactions.Write(vTxs) // Transfer ownership
logger.Debugf("Validator sent batch...")
vTxs.Debug() // ← BUG: Access after transferFixed Code:
promutil.Observe(v.metrics.validatorTxBatchLatencySeconds, time.Since(start))
// Debug BEFORE transferring ownership
logger.Debugf("Validator sent batch validated TXs to the committer (%d non blind writes and %d blind writes)",
len(vTxs.validTxNonBlindWrites), len(vTxs.validTxBlindWrites))
vTxs.Debug()
// After this point, vTxs is owned by the committer - DO NOT ACCESS
outgoingValidatedTransactions.Write(vTxs)Key Principles Violated
-
Channel Ownership Transfer: In Go, sending data through a channel transfers logical ownership. The sender must not access the data afterward.
-
Happens-Before Relationship: There's no happens-before guarantee between the channel send and subsequent operations in the sender goroutine vs. the receiver's processing.
-
Map Concurrency Safety: Go maps require external synchronization for concurrent access. Reading (even just
len()) while another goroutine writes is undefined behavior.
Prevention Strategy
For this codebase:
- Move all debugging/logging that accesses
vTxsto before the channel write - Add a comment:
// DO NOT ACCESS vTxs AFTER THIS POINT - Consider using
go vetwith race detector in CI:go test -race
General best practices:
- Treat channel sends as ownership transfers
- Never access data after sending through a channel
- Use
sync.Mutexif shared access is truly needed - Prefer message passing over shared memory
Case 1 output
==================
WARNING: DATA RACE
Write at 0x00c00595c6f8 by goroutine 5704:
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).populateVersionsAndCategorizeBlindWrites()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:225 +0x15b9
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).commitTransactions()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:105 +0xc4
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).commit()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:87 +0x533
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).run.func1()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:58 +0x58
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
Previous read at 0x00c00595c6f8 by goroutine 5697:
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).validate()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:128 +0x792
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).run.func1()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:81 +0x58
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
Goroutine 5704 (running) created at:
golang.org/x/sync/errgroup.(*Group).Go()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11c
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).run()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:57 +0x213
github.com/hyperledger/fabric-x-committer/service/vc.(*ValidatorCommitterService).Run.func5()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator_committer_service.go:157 +0xa4
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
Goroutine 5697 (running) created at:
golang.org/x/sync/errgroup.(*Group).Go()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11c
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).run()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:80 +0x213
github.com/hyperledger/fabric-x-committer/service/vc.(*ValidatorCommitterService).Run.func4()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator_committer_service.go:152 +0xa4
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
==================
Case 2 output
==================
WARNING: DATA RACE
Write at 0x00c0004fe3c0 by goroutine 2485:
runtime.mapdelete_faststr()
/opt/hostedtoolcache/go/1.26.1/x64/src/internal/runtime/maps/runtime_faststr.go:402 +0x0
github.com/hyperledger/fabric-x-committer/service/vc.(*validatedTransactions).updateInvalidTxs()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:187 +0x636
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).commitTransactions()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:168 +0x515
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).commit()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:87 +0x533
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).run.func1()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:58 +0x58
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
Previous read at 0x00c0004fe3c0 by goroutine 2517:
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).validate()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:128 +0x771
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).run.func1()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:81 +0x58
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
Goroutine 2485 (running) created at:
golang.org/x/sync/errgroup.(*Group).Go()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11c
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).run()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:57 +0x213
github.com/hyperledger/fabric-x-committer/service/vc.(*ValidatorCommitterService).Run.func5()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator_committer_service.go:157 +0xa4
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
Goroutine 2517 (running) created at:
golang.org/x/sync/errgroup.(*Group).Go()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11c
github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).run()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:80 +0x213
github.com/hyperledger/fabric-x-committer/service/vc.(*ValidatorCommitterService).Run.func4()
/home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator_committer_service.go:152 +0xa4
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
==================