Skip to content

[data race] VC #451

@liran-funaro

Description

@liran-funaro

This issue depicts two race conditions.


Comprehensive Root Cause Analysis: Data Races in Validator-Committer Pipeline

Executive Summary

Both data races originate from the same architectural flaw: the validator goroutine continues to access shared data structures after transferring ownership to the committer goroutine via a channel, violating Go's concurrency safety principles.


The Architecture

The system uses a pipeline pattern where:

  1. Validator goroutine validates transactions and creates a validatedTransactions struct
  2. Sends this struct through a channel to the Committer goroutine
  3. Committer processes and commits the transactions

The Shared Data Structure

type validatedTransactions struct {
    validTxNonBlindWrites transactionToWrites  // map[TxID]namespaceToWrites
    validTxBlindWrites    transactionToWrites  // map[TxID]namespaceToWrites
    newWrites             transactionToWrites  // map[TxID]namespaceToWrites
    readToTxIDs           readToTransactions
    invalidTxStatus       map[TxID]committerpb.Status
    txIDToHeight          transactionIDToHeight
}

These are reference types (maps) - when passed through the channel, both goroutines share the same underlying data.


Root Cause: Ownership Violation

Location: validator.go, lines 108-125

// Step 2: construct validated transactions
vTxs := &validatedTransactions{
    validTxNonBlindWrites: prepTx.txIDToNsNonBlindWrites,  // Shared reference
    validTxBlindWrites:    prepTx.txIDToNsBlindWrites,     // Shared reference
    newWrites:             prepTx.txIDToNsNewWrites,        // Shared reference
    readToTxIDs:           prepTx.readToTxIDs,
    invalidTxStatus:       prepTx.invalidTxIDStatus,
    txIDToHeight:          prepTx.txIDToHeight,
}
if err := vTxs.invalidateTxsOnReadConflicts(nsToReadConflicts); err != nil {
    return err
}

promutil.Observe(v.metrics.validatorTxBatchLatencySeconds, time.Since(start))
outgoingValidatedTransactions.Write(vTxs)  // ← Ownership transferred here

logger.Debugf("Validator sent batch...")
vTxs.Debug()  // ← BUG: Accessing after ownership transfer!

The Problem: After Write(vTxs), the validator no longer owns this data, yet it immediately calls vTxs.Debug() which reads the maps.


Data Race #1: Read-Write Conflict on Map Insertion

Timeline:

  1. Validator (goroutine 5697): Calls vTxs.Debug() → reads len(v.validTxNonBlindWrites) at line 128
  2. Committer (goroutine 5704): Calls populateVersionsAndCategorizeBlindWrites() → writes to validTxNonBlindWrites via getOrCreate() at line 225

Committer's Write Operation (committer.go:212-225):

for curTxID, txWrites := range vTx.validTxBlindWrites {
    for ns, nsWrites := range txWrites {
        nsState := state[ns]
        for i, key := range nsWrites.keys {
            if ver, present := nsState[string(key)]; present {
                nextVer := ver + 1
                vTx.validTxNonBlindWrites.getOrCreate(curTxID, ns).append(...)  // ← WRITE
            } else {
                vTx.newWrites.getOrCreate(curTxID, ns).append(...)
            }
        }
    }
}

Validator's Read Operation (validator.go:46-56):

func (v *validatedTransactions) Debug() {
    if !logger.IsEnabledFor(zapcore.DebugLevel) {
        return
    }
    logger.Debugf("total validated: %d\n\t"+
        "valid non-blind writes: %d\n\t"+  // ← READ: len(v.validTxNonBlindWrites)
        ...
}

Race Condition: Concurrent map read (length check) and write (insertion) without synchronization.


Data Race #2: Read-Write Conflict on Map Deletion

Timeline:

  1. Validator (goroutine 2517): Calls vTxs.Debug() → reads len(v.validTxNonBlindWrites) at line 128
  2. Committer (goroutine 2485): Calls updateInvalidTxs() → deletes from validTxNonBlindWrites at line 187

Committer's Write Operation (validator.go:184-193):

func (v *validatedTransactions) updateInvalidTxs(txIDs []TxID, status committerpb.Status) {
    for _, tID := range txIDs {
        delete(v.validTxNonBlindWrites, tID)  // ← WRITE (delete)
        delete(v.validTxBlindWrites, tID)
        delete(v.newWrites, tID)
        v.invalidTxStatus[tID] = status
    }
}

Called from committer.go:168:

vTx.updateInvalidTxs(duplicates, committerpb.Status_REJECTED_DUPLICATE_TX_ID)

Race Condition: Concurrent map read (length check) and write (deletion) without synchronization.


Why This Is Dangerous

  1. Undefined Behavior: Go maps are not thread-safe. Concurrent read/write causes undefined behavior
  2. Potential Crashes: Can lead to runtime panics or memory corruption
  3. Data Corruption: May produce incorrect results silently
  4. Intermittent Failures: Race conditions are timing-dependent, making them hard to reproduce

The Fix: Respect Ownership Transfer

Current Code (validator.go:120-125):

promutil.Observe(v.metrics.validatorTxBatchLatencySeconds, time.Since(start))
outgoingValidatedTransactions.Write(vTxs)  // Transfer ownership

logger.Debugf("Validator sent batch...")
vTxs.Debug()  // ← BUG: Access after transfer

Fixed Code:

promutil.Observe(v.metrics.validatorTxBatchLatencySeconds, time.Since(start))

// Debug BEFORE transferring ownership
logger.Debugf("Validator sent batch validated TXs to the committer (%d non blind writes and %d blind writes)",
    len(vTxs.validTxNonBlindWrites), len(vTxs.validTxBlindWrites))
vTxs.Debug()

// After this point, vTxs is owned by the committer - DO NOT ACCESS
outgoingValidatedTransactions.Write(vTxs)

Key Principles Violated

  1. Channel Ownership Transfer: In Go, sending data through a channel transfers logical ownership. The sender must not access the data afterward.

  2. Happens-Before Relationship: There's no happens-before guarantee between the channel send and subsequent operations in the sender goroutine vs. the receiver's processing.

  3. Map Concurrency Safety: Go maps require external synchronization for concurrent access. Reading (even just len()) while another goroutine writes is undefined behavior.


Prevention Strategy

For this codebase:

  • Move all debugging/logging that accesses vTxs to before the channel write
  • Add a comment: // DO NOT ACCESS vTxs AFTER THIS POINT
  • Consider using go vet with race detector in CI: go test -race

General best practices:

  • Treat channel sends as ownership transfers
  • Never access data after sending through a channel
  • Use sync.Mutex if shared access is truly needed
  • Prefer message passing over shared memory

Case 1 output

==================
  WARNING: DATA RACE
  Write at 0x00c00595c6f8 by goroutine 5704:
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).populateVersionsAndCategorizeBlindWrites()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:225 +0x15b9
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).commitTransactions()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:105 +0xc4
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).commit()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:87 +0x533
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).run.func1()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:58 +0x58
    golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
  
  Previous read at 0x00c00595c6f8 by goroutine 5697:
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).validate()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:128 +0x792
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).run.func1()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:81 +0x58
    golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
  
  Goroutine 5704 (running) created at:
    golang.org/x/sync/errgroup.(*Group).Go()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11c
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).run()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:57 +0x213
    github.com/hyperledger/fabric-x-committer/service/vc.(*ValidatorCommitterService).Run.func5()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator_committer_service.go:157 +0xa4
    golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
  
  Goroutine 5697 (running) created at:
    golang.org/x/sync/errgroup.(*Group).Go()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11c
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).run()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:80 +0x213
    github.com/hyperledger/fabric-x-committer/service/vc.(*ValidatorCommitterService).Run.func4()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator_committer_service.go:152 +0xa4
    golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
  ==================

Case 2 output

 ==================
  WARNING: DATA RACE
  Write at 0x00c0004fe3c0 by goroutine 2485:
    runtime.mapdelete_faststr()
        /opt/hostedtoolcache/go/1.26.1/x64/src/internal/runtime/maps/runtime_faststr.go:402 +0x0
    github.com/hyperledger/fabric-x-committer/service/vc.(*validatedTransactions).updateInvalidTxs()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:187 +0x636
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).commitTransactions()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:168 +0x515
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).commit()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:87 +0x533
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).run.func1()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:58 +0x58
    golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
  
  Previous read at 0x00c0004fe3c0 by goroutine 2517:
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).validate()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:128 +0x771
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).run.func1()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:81 +0x58
    golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
  
  Goroutine 2485 (running) created at:
    golang.org/x/sync/errgroup.(*Group).Go()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11c
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionCommitter).run()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/committer.go:57 +0x213
    github.com/hyperledger/fabric-x-committer/service/vc.(*ValidatorCommitterService).Run.func5()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator_committer_service.go:157 +0xa4
    golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
  
  Goroutine 2517 (running) created at:
    golang.org/x/sync/errgroup.(*Group).Go()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11c
    github.com/hyperledger/fabric-x-committer/service/vc.(*transactionValidator).run()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator.go:80 +0x213
    github.com/hyperledger/fabric-x-committer/service/vc.(*ValidatorCommitterService).Run.func4()
        /home/runner/work/fabric-x-committer/fabric-x-committer/service/vc/validator_committer_service.go:152 +0xa4
    golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/runner/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x86
  ==================

Metadata

Metadata

Assignees

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions