-
Notifications
You must be signed in to change notification settings - Fork 994
Description
Background
After auditing the dubbo-go codebase for memory leaks (#3246) and data races (#3247), the root causes point to several structural issues that have accumulated over the project's 10-year history. This issue proposes a prioritized improvement roadmap.
1. Redesign common.URL — the core data structure needs a major overhaul
common.URL is the single most important type in the framework. Every invoker, exporter, registry, and configuration path revolves around it. But it has grown organically into a god object with serious problems:
Current issues:
params(url.Values) exposed viaGetParams()without lock or copy — callers mutate internalsattributes(map[string]any) is write-only — noDeleteAttribute, noClearAttributesAddParamvsSetParamsemantic confusion (Addappends, not overwrites) causing silent accumulationMergeURLreadsanotherUrl.attributeswithoutattributesLock— data raceGetCacheInvokerMapKeyallocates a full temporary URL on every call — unnecessary GC pressureSubURLpointer creates implicit reference chains that prevent GC- Multiple key generation methods (
Key(),GetCacheInvokerMapKey(),ServiceKey()) with different formats, causing key mismatch bugs (e.g., invoker blacklist)
Proposed redesign:
- Make
paramsimmutable after construction — provideWithParam(k, v)that returns a new URL (or at minimum, return copies from getters) - Add
DeleteAttribute(key)andClearAttributes()methods - Deprecate
AddParam— it's almost always misused whenSetParamis intended - Unify key generation into a single canonical
Key()method - Consider splitting URL into
URLSpec(immutable identity) +URLState(mutable runtime state) - Add
Clone()that does proper deep copy of attributes
2. Make common/extension/ concurrency-safe — introduce Registry[T]
The extension package has 20+ global maps, all following the same unsafe pattern (no locks, some returning internal map references). This is the single largest source of potential crashes.
Proposed solution: A generic thread-safe registry:
package extension
type Registry[T any] struct {
mu sync.RWMutex
items map[string]T
}
func NewRegistry[T any]() *Registry[T] { ... }
func (r *Registry[T]) Register(name string, v T) { ... }
func (r *Registry[T]) Get(name string) (T, bool) { ... }
func (r *Registry[T]) MustGet(name string) T { ... } // panics if missing
func (r *Registry[T]) Unregister(name string) { ... }
func (r *Registry[T]) Snapshot() map[string]T { ... } // returns copy
func (r *Registry[T]) Names() []string { ... } // returns copyThis single abstraction would fix all 22 files in common/extension/ and eliminate an entire class of bugs. It also standardizes the API (current code mixes Set*/Get*/Unregister*/GetAll* inconsistently).
3. Implement proper lifecycle management for invokers/exporters
Current state:
registryProtocol.Destroy()doesn't cleanoverrideListenersorserviceConfigurationListenersRegistryDirectory.Destroy()doesn't cleancacheInvokersMap- Invoker blacklist entries become permanently orphaned due to key mismatch
exporterChangeableWrapperURLs are set after being stored in the map, creating a race window- No
Closeable/Lifecycleinterface enforced — each component invents its own cleanup pattern
Proposed improvements:
- Define a
Lifecycleinterface:Start(),Stop(),IsRunning() bool - Enforce cleanup contracts: every
Storemust have a correspondingDeletepath - Add integration tests that verify zero goroutine/memory leaks after full startup-shutdown cycle
- Use
context.Contextconsistently for cancellation (replacetime.Sleeppolling patterns in nacos, etc.)
4. Enforce go test -race in CI
Currently, many data races go undetected because -race is not mandatory in CI. Given the scale of race conditions found (#3247), this is the single most impactful process change.
Action items:
- Add
-raceflag to CI test runs (at minimum for core packages) - Fix all existing race detector failures
- Block merges on race detector failures
5. API hygiene — stop exposing internal state
Several public APIs return internal references that callers can mutate, breaking encapsulation:
| API | Returns | Risk |
|---|---|---|
URL.GetParams() |
Internal url.Values map |
Caller mutation + race |
GetDefaultConfigReader() |
Internal defaults map |
Concurrent iteration crash |
GetRouterFactories() |
Internal routers map |
Concurrent iteration crash |
GetProviderServiceMap() |
Internal proServices map |
Concurrent iteration crash |
GetConsumerServiceMap() |
Internal conServices map |
Concurrent iteration crash |
GetAllCustomShutdownCallbacks() |
Internal *list.List |
Concurrent mutation |
GetCustomizers() |
Internal slice reference | Concurrent append+sort |
Rule: Public getters must return copies, never internal references. This should be a review checklist item.
6. Reduce config package coupling
The ongoing refactor (PRs #3231, #3232) to remove config package dependencies from protocol implementations is the right direction. But the approach needs more care:
- Don't silently change error semantics (e.g.,
LoadRegistrieschanging from panic to silent skip) - Don't break public APIs without deprecation (e.g.,
WithMethodsignature change) - Validate configuration at init time (don't remove init-time validation without replacement)
- Use URL attributes consistently — document which attributes are expected and where they're set
7. Standardize error handling — eliminate panic in library code
The extension package uses panic for missing registrations:
func GetProtocol(name string) base.Protocol {
if protocols[name] == nil {
panic("protocol for [" + name + "] is not existing, ...")
}
}This is inappropriate for a library/framework. A missing extension should return an error, not crash the entire process. The caller can decide whether to panic.
Rule: Reserve panic for truly unrecoverable programmer errors. Extension lookups should return (T, error).
Priority
| Priority | Item | Impact | Effort |
|---|---|---|---|
| P0 | Fix dubbo.go:301 wrong lock |
Crash prevention | 1 line |
| P0 | Enable -race in CI |
Prevent future regressions | Config change |
| P0 | extension.Registry[T] |
Fix 20+ race-prone files | Medium (1 week) |
| P1 | URL DeleteAttribute + cleanup |
Memory leak prevention | Small |
| P1 | Fix RouterChain.Route lockless read |
Crash prevention | Small |
| P1 | Return copies from public getters | Crash prevention | Small |
| P2 | URL redesign | Long-term maintainability | Large |
| P2 | Lifecycle interface | Leak prevention | Medium |
| P2 | Eliminate panics | API quality | Medium |
Happy to help with implementation of any of these items. Related issues: #3246, #3247.