-
Notifications
You must be signed in to change notification settings - Fork 994
Description
Verification Checklist
- I have searched the existing Issues and believe this is not a duplicate
Go Version
1.24
Dubbo-go Version
v3.3.1
OS
Linux / All platforms
Bug Description
A comprehensive audit of the dubbo-go codebase reveals 44 data race conditions across the framework. The most critical is a systematic pattern in the common/extension/ package where 20+ global maps are accessed without any synchronization, plus a confirmed bug where SetProviderService uses the wrong lock variable. These issues will cause fatal error: concurrent map read and map write crashes in production.
Running go test -race on any test that exercises dynamic service registration/deregistration, configuration updates, or concurrent RPC calls will reproduce these races.
Category 1: common/extension/ — 20+ global maps with zero lock protection (P0)
Every file in common/extension/ follows the exact same unsafe pattern:
var things = make(map[string]func() Thing)
func SetThing(name string, v func() Thing) {
things[name] = v // write: no lock
}
func GetThing(name string) Thing {
if things[name] == nil { // read: no lock
panic("...")
}
return things[name]()
}Affected files (all in common/extension/):
| File | Global map(s) | Extra risk |
|---|---|---|
filter.go |
filters, rejectedExecutionHandler |
Has UnregisterFilter + GetAllFilterNames (runtime mutation) |
protocol.go |
protocols |
Hot path: every invoker creation |
registry.go |
registries |
|
cluster.go |
clusters |
Hot path: invoker reorganization |
loadbalance.go |
loadbalances |
Has UnregisterLoadbalance (runtime mutation) |
configurator.go |
configurator |
Config center hot reload path |
config_center.go |
configCenters |
|
config_center_factory.go |
configCenterFactories |
|
config.go |
configs |
|
metadata_report_factory.go |
metaDataReportFactories |
|
proxy_factory.go |
proxyFactories |
|
tps_limit.go |
tpsLimitStrategy, tpsLimiter |
|
auth.go |
authenticators, accessKeyStorages |
|
rest_client.go |
restClients |
|
rest_server.go |
restServers |
|
config_reader.go |
configReaders, defaults |
GetDefaultConfigReader returns internal map reference! |
config_post_processor.go |
processors |
|
router_factory.go |
routers |
GetRouterFactories returns internal map reference! |
registry_directory.go |
directories, defaultDirectory |
|
service_discovery.go |
discoveryCreatorMap |
|
service_instance_selector_factory.go |
serviceInstanceSelectorMappings |
|
logger.go |
logs |
|
otel_trace.go |
traceExporterMap |
Iterated in shutdown callback |
Worst cases: GetDefaultConfigReader() and GetRouterFactories() return the internal map reference directly — callers iterate the returned map without any lock, while Set* functions write to the same map concurrently. This is guaranteed to crash.
Recommended fix: Introduce a generic Registry[T] container with built-in sync.RWMutex:
type Registry[T any] struct {
mu sync.RWMutex
items map[string]T
}
func (r *Registry[T]) Set(name string, v T) {
r.mu.Lock()
defer r.mu.Unlock()
r.items[name] = v
}
func (r *Registry[T]) Get(name string) (T, bool) {
r.mu.RLock()
defer r.mu.RUnlock()
v, ok := r.items[name]
return v, ok
}
func (r *Registry[T]) Snapshot() map[string]T {
r.mu.RLock()
defer r.mu.RUnlock()
m := make(map[string]T, len(r.items))
for k, v := range r.items {
m[k] = v
}
return m
}This would fix all 20+ files with a single refactor.
Category 2: dubbo.go SetProviderService uses wrong lock (P0)
File: dubbo.go:300-306
var (
consumerServices = map[string]*client.ClientDefinition{}
conLock sync.RWMutex // for consumerServices
providerServices = map[string]*server.ServiceDefinition{}
proLock sync.RWMutex // for providerServices
)
func SetProviderService(svc common.RPCService) {
conLock.Lock() // BUG: should be proLock!
defer conLock.Unlock()
providerServices[...] = ... // writing providerServices with wrong lock
}SetProviderService uses conLock (consumer lock) to protect writes to providerServices, but loadProvider() (L219-220) uses proLock.RLock() to protect reads of the same map. Two different locks protecting the same map = no protection at all. Concurrent SetProviderService and loadProvider will crash.
Also: GetConsumerConnection (L308-310) reads consumerServices without holding conLock at all.
Fix: Change L301 conLock.Lock() → proLock.Lock(). Add conLock.RLock() to GetConsumerConnection.
Category 3: RouterChain.Route reads invokers without lock (P0)
File: cluster/router/chain/chain.go:55-67, 91-98
func (c *RouterChain) Route(...) []base.Invoker {
finalInvokers := make([]base.Invoker, 0, len(c.invokers)) // read: NO LOCK
for _, invoker := range c.invokers { // iterate: NO LOCK
// ...
}
}
func (c *RouterChain) SetInvokers(invokers []base.Invoker) {
c.mutex.Lock() // write: has lock
defer c.mutex.Unlock()
c.invokers = invokers
}Route() is the hottest path in the framework (called on every RPC), but reads c.invokers without any lock. SetInvokers writes with mutex.Lock(). Slice header tear (reading partially-updated data/len/cap) can cause out-of-bounds panic or routing to wrong nodes.
Fix: Add c.mutex.RLock() in Route(), snapshot the slice before iteration.
Category 4: config/service.go returns internal map references (P0)
File: config/service.go:91-93, 102-104
func GetProviderServiceMap() map[string]common.RPCService {
return proServices // returns internal map without lock or copy
}Caller provider_config.go:171 iterates the returned map with for range, while SetProviderService (L54-62, holding proServicesLock) writes to the same map concurrently. No lock held during iteration = crash.
Fix: Return a copy under lock.
Category 5: RegistryDirectory fields accessed without locks (P1)
Multiple fields in RegistryDirectory are accessed concurrently without proper synchronization:
IsAvailable()readscacheInvokerswithoutinvokersLock(L534-546) —setNewInvokers()writes with lock, butIsAvailable()reads without lockDestroy()reads/writescacheInvokerswithoutinvokersLock(L549-573)configuratorsslice appended inconvertUrl()(L379) without lock, read inoverrideUrl()(L575-579) without lockSubscribedUrlwritten in goroutine (L165) without lock, read inDestroy()(L559)
Category 6: Other notable races (P1-P2)
| Issue | File | Description |
|---|---|---|
rootConfig global |
config/config_loader.go:37,56 |
Written by Load()/SetRootConfig(), read by GetRootConfig() everywhere, no atomic/lock |
overrideSubscribeListener.configurator |
registry/protocol/protocol.go:338-394 |
Written in Notify() goroutine, read in doOverrideIfNecessary(), no lock |
hystrix Filter.res map |
filter/hystrix/filter.go:130-178 |
Written under configLoadMutex.Lock(), read in hystrix.Do callback without lock |
applicationName global |
metrics/common.go:46-63 |
Written by InitAppInfo(), read by metrics goroutine, no atomic |
versionInt map × 2 |
protocol/dubbo/impl/hessian.go:166, hessian2/hessian_response.go:405 |
Concurrent reads from multiple connection goroutines |
customizers slice |
extension/service_instance_customizer.go:28-41 |
AddCustomizers appends+sorts, GetCustomizers returns internal slice reference |
scheduledLookUp goroutine |
registry/nacos/registry.go:233-238 |
Uses time.Sleep without context, can't exit promptly on shutdown |
Subscribe timeout goroutine leak |
registry/directory/directory.go:162-211 |
Timeout returns error but goroutines continue running indefinitely |
Summary
| Category | Count | Severity |
|---|---|---|
| extension global maps (systematic) | 20+ maps across 22 files | P0 |
Wrong lock (SetProviderService) |
1 | P0 |
RouterChain.Route lockless |
1 | P0 |
config/service.go internal map exposure |
2 maps | P0 |
RegistryDirectory field races |
4 fields | P1 |
| Other races | 10+ | P1-P2 |
Reproduction
go test -race ./...Any test exercising concurrent service registration + RPC invocation will trigger race detector reports.
Suggested Approach
- Introduce
extension.Registry[T]generic container — fixes 20+ files in one refactor - Fix
dubbo.go:301wrong lock — one-line fix, highest ROI - Add
RLocktoRouterChain.Route— critical hot path - Return map copies from
config/service.go— straightforward - Add locks to
RegistryDirectoryfield accesses — systematic review needed