Skip to content

[BUG] Systematic data races: 20+ extension global maps unprotected, wrong lock in SetProviderService, RouterChain.Route lockless #3247

@AlexStocks

Description

@AlexStocks

Verification Checklist

Go Version

1.24

Dubbo-go Version

v3.3.1

OS

Linux / All platforms

Bug Description

A comprehensive audit of the dubbo-go codebase reveals 44 data race conditions across the framework. The most critical is a systematic pattern in the common/extension/ package where 20+ global maps are accessed without any synchronization, plus a confirmed bug where SetProviderService uses the wrong lock variable. These issues will cause fatal error: concurrent map read and map write crashes in production.

Running go test -race on any test that exercises dynamic service registration/deregistration, configuration updates, or concurrent RPC calls will reproduce these races.


Category 1: common/extension/ — 20+ global maps with zero lock protection (P0)

Every file in common/extension/ follows the exact same unsafe pattern:

var things = make(map[string]func() Thing)

func SetThing(name string, v func() Thing) {
    things[name] = v  // write: no lock
}

func GetThing(name string) Thing {
    if things[name] == nil {  // read: no lock
        panic("...")
    }
    return things[name]()
}

Affected files (all in common/extension/):

File Global map(s) Extra risk
filter.go filters, rejectedExecutionHandler Has UnregisterFilter + GetAllFilterNames (runtime mutation)
protocol.go protocols Hot path: every invoker creation
registry.go registries
cluster.go clusters Hot path: invoker reorganization
loadbalance.go loadbalances Has UnregisterLoadbalance (runtime mutation)
configurator.go configurator Config center hot reload path
config_center.go configCenters
config_center_factory.go configCenterFactories
config.go configs
metadata_report_factory.go metaDataReportFactories
proxy_factory.go proxyFactories
tps_limit.go tpsLimitStrategy, tpsLimiter
auth.go authenticators, accessKeyStorages
rest_client.go restClients
rest_server.go restServers
config_reader.go configReaders, defaults GetDefaultConfigReader returns internal map reference!
config_post_processor.go processors
router_factory.go routers GetRouterFactories returns internal map reference!
registry_directory.go directories, defaultDirectory
service_discovery.go discoveryCreatorMap
service_instance_selector_factory.go serviceInstanceSelectorMappings
logger.go logs
otel_trace.go traceExporterMap Iterated in shutdown callback

Worst cases: GetDefaultConfigReader() and GetRouterFactories() return the internal map reference directly — callers iterate the returned map without any lock, while Set* functions write to the same map concurrently. This is guaranteed to crash.

Recommended fix: Introduce a generic Registry[T] container with built-in sync.RWMutex:

type Registry[T any] struct {
    mu    sync.RWMutex
    items map[string]T
}

func (r *Registry[T]) Set(name string, v T) {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.items[name] = v
}

func (r *Registry[T]) Get(name string) (T, bool) {
    r.mu.RLock()
    defer r.mu.RUnlock()
    v, ok := r.items[name]
    return v, ok
}

func (r *Registry[T]) Snapshot() map[string]T {
    r.mu.RLock()
    defer r.mu.RUnlock()
    m := make(map[string]T, len(r.items))
    for k, v := range r.items {
        m[k] = v
    }
    return m
}

This would fix all 20+ files with a single refactor.


Category 2: dubbo.go SetProviderService uses wrong lock (P0)

File: dubbo.go:300-306

var (
    consumerServices = map[string]*client.ClientDefinition{}
    conLock          sync.RWMutex     // for consumerServices
    providerServices = map[string]*server.ServiceDefinition{}
    proLock          sync.RWMutex     // for providerServices
)

func SetProviderService(svc common.RPCService) {
    conLock.Lock()           // BUG: should be proLock!
    defer conLock.Unlock()
    providerServices[...] = ...  // writing providerServices with wrong lock
}

SetProviderService uses conLock (consumer lock) to protect writes to providerServices, but loadProvider() (L219-220) uses proLock.RLock() to protect reads of the same map. Two different locks protecting the same map = no protection at all. Concurrent SetProviderService and loadProvider will crash.

Also: GetConsumerConnection (L308-310) reads consumerServices without holding conLock at all.

Fix: Change L301 conLock.Lock()proLock.Lock(). Add conLock.RLock() to GetConsumerConnection.


Category 3: RouterChain.Route reads invokers without lock (P0)

File: cluster/router/chain/chain.go:55-67, 91-98

func (c *RouterChain) Route(...) []base.Invoker {
    finalInvokers := make([]base.Invoker, 0, len(c.invokers))  // read: NO LOCK
    for _, invoker := range c.invokers {                        // iterate: NO LOCK
        // ...
    }
}

func (c *RouterChain) SetInvokers(invokers []base.Invoker) {
    c.mutex.Lock()           // write: has lock
    defer c.mutex.Unlock()
    c.invokers = invokers
}

Route() is the hottest path in the framework (called on every RPC), but reads c.invokers without any lock. SetInvokers writes with mutex.Lock(). Slice header tear (reading partially-updated data/len/cap) can cause out-of-bounds panic or routing to wrong nodes.

Fix: Add c.mutex.RLock() in Route(), snapshot the slice before iteration.


Category 4: config/service.go returns internal map references (P0)

File: config/service.go:91-93, 102-104

func GetProviderServiceMap() map[string]common.RPCService {
    return proServices  // returns internal map without lock or copy
}

Caller provider_config.go:171 iterates the returned map with for range, while SetProviderService (L54-62, holding proServicesLock) writes to the same map concurrently. No lock held during iteration = crash.

Fix: Return a copy under lock.


Category 5: RegistryDirectory fields accessed without locks (P1)

Multiple fields in RegistryDirectory are accessed concurrently without proper synchronization:

  1. IsAvailable() reads cacheInvokers without invokersLock (L534-546) — setNewInvokers() writes with lock, but IsAvailable() reads without lock
  2. Destroy() reads/writes cacheInvokers without invokersLock (L549-573)
  3. configurators slice appended in convertUrl() (L379) without lock, read in overrideUrl() (L575-579) without lock
  4. SubscribedUrl written in goroutine (L165) without lock, read in Destroy() (L559)

Category 6: Other notable races (P1-P2)

Issue File Description
rootConfig global config/config_loader.go:37,56 Written by Load()/SetRootConfig(), read by GetRootConfig() everywhere, no atomic/lock
overrideSubscribeListener.configurator registry/protocol/protocol.go:338-394 Written in Notify() goroutine, read in doOverrideIfNecessary(), no lock
hystrix Filter.res map filter/hystrix/filter.go:130-178 Written under configLoadMutex.Lock(), read in hystrix.Do callback without lock
applicationName global metrics/common.go:46-63 Written by InitAppInfo(), read by metrics goroutine, no atomic
versionInt map × 2 protocol/dubbo/impl/hessian.go:166, hessian2/hessian_response.go:405 Concurrent reads from multiple connection goroutines
customizers slice extension/service_instance_customizer.go:28-41 AddCustomizers appends+sorts, GetCustomizers returns internal slice reference
scheduledLookUp goroutine registry/nacos/registry.go:233-238 Uses time.Sleep without context, can't exit promptly on shutdown
Subscribe timeout goroutine leak registry/directory/directory.go:162-211 Timeout returns error but goroutines continue running indefinitely

Summary

Category Count Severity
extension global maps (systematic) 20+ maps across 22 files P0
Wrong lock (SetProviderService) 1 P0
RouterChain.Route lockless 1 P0
config/service.go internal map exposure 2 maps P0
RegistryDirectory field races 4 fields P1
Other races 10+ P1-P2

Reproduction

go test -race ./...

Any test exercising concurrent service registration + RPC invocation will trigger race detector reports.

Suggested Approach

  1. Introduce extension.Registry[T] generic container — fixes 20+ files in one refactor
  2. Fix dubbo.go:301 wrong lock — one-line fix, highest ROI
  3. Add RLock to RouterChain.Route — critical hot path
  4. Return map copies from config/service.go — straightforward
  5. Add locks to RegistryDirectory field accesses — systematic review needed

Metadata

Metadata

Assignees

No one assigned

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions