Skip to content

[Enhancement] Codebase quality improvements: URL redesign, extension safety, lifecycle management, API hygiene #3248

@AlexStocks

Description

@AlexStocks

Background

After auditing the dubbo-go codebase for memory leaks (#3246) and data races (#3247), the root causes point to several structural issues that have accumulated over the project's 10-year history. This issue proposes a prioritized improvement roadmap.


1. Redesign common.URL — the core data structure needs a major overhaul

common.URL is the single most important type in the framework. Every invoker, exporter, registry, and configuration path revolves around it. But it has grown organically into a god object with serious problems:

Current issues:

  • params (url.Values) exposed via GetParams() without lock or copy — callers mutate internals
  • attributes (map[string]any) is write-only — no DeleteAttribute, no ClearAttributes
  • AddParam vs SetParam semantic confusion (Add appends, not overwrites) causing silent accumulation
  • MergeURL reads anotherUrl.attributes without attributesLock — data race
  • GetCacheInvokerMapKey allocates a full temporary URL on every call — unnecessary GC pressure
  • SubURL pointer creates implicit reference chains that prevent GC
  • Multiple key generation methods (Key(), GetCacheInvokerMapKey(), ServiceKey()) with different formats, causing key mismatch bugs (e.g., invoker blacklist)

Proposed redesign:

  1. Make params immutable after construction — provide WithParam(k, v) that returns a new URL (or at minimum, return copies from getters)
  2. Add DeleteAttribute(key) and ClearAttributes() methods
  3. Deprecate AddParam — it's almost always misused when SetParam is intended
  4. Unify key generation into a single canonical Key() method
  5. Consider splitting URL into URLSpec (immutable identity) + URLState (mutable runtime state)
  6. Add Clone() that does proper deep copy of attributes

2. Make common/extension/ concurrency-safe — introduce Registry[T]

The extension package has 20+ global maps, all following the same unsafe pattern (no locks, some returning internal map references). This is the single largest source of potential crashes.

Proposed solution: A generic thread-safe registry:

package extension

type Registry[T any] struct {
    mu    sync.RWMutex
    items map[string]T
}

func NewRegistry[T any]() *Registry[T] { ... }
func (r *Registry[T]) Register(name string, v T) { ... }
func (r *Registry[T]) Get(name string) (T, bool) { ... }
func (r *Registry[T]) MustGet(name string) T { ... }  // panics if missing
func (r *Registry[T]) Unregister(name string) { ... }
func (r *Registry[T]) Snapshot() map[string]T { ... }  // returns copy
func (r *Registry[T]) Names() []string { ... }         // returns copy

This single abstraction would fix all 22 files in common/extension/ and eliminate an entire class of bugs. It also standardizes the API (current code mixes Set*/Get*/Unregister*/GetAll* inconsistently).


3. Implement proper lifecycle management for invokers/exporters

Current state:

  • registryProtocol.Destroy() doesn't clean overrideListeners or serviceConfigurationListeners
  • RegistryDirectory.Destroy() doesn't clean cacheInvokersMap
  • Invoker blacklist entries become permanently orphaned due to key mismatch
  • exporterChangeableWrapper URLs are set after being stored in the map, creating a race window
  • No Closeable/Lifecycle interface enforced — each component invents its own cleanup pattern

Proposed improvements:

  1. Define a Lifecycle interface: Start(), Stop(), IsRunning() bool
  2. Enforce cleanup contracts: every Store must have a corresponding Delete path
  3. Add integration tests that verify zero goroutine/memory leaks after full startup-shutdown cycle
  4. Use context.Context consistently for cancellation (replace time.Sleep polling patterns in nacos, etc.)

4. Enforce go test -race in CI

Currently, many data races go undetected because -race is not mandatory in CI. Given the scale of race conditions found (#3247), this is the single most impactful process change.

Action items:

  1. Add -race flag to CI test runs (at minimum for core packages)
  2. Fix all existing race detector failures
  3. Block merges on race detector failures

5. API hygiene — stop exposing internal state

Several public APIs return internal references that callers can mutate, breaking encapsulation:

API Returns Risk
URL.GetParams() Internal url.Values map Caller mutation + race
GetDefaultConfigReader() Internal defaults map Concurrent iteration crash
GetRouterFactories() Internal routers map Concurrent iteration crash
GetProviderServiceMap() Internal proServices map Concurrent iteration crash
GetConsumerServiceMap() Internal conServices map Concurrent iteration crash
GetAllCustomShutdownCallbacks() Internal *list.List Concurrent mutation
GetCustomizers() Internal slice reference Concurrent append+sort

Rule: Public getters must return copies, never internal references. This should be a review checklist item.


6. Reduce config package coupling

The ongoing refactor (PRs #3231, #3232) to remove config package dependencies from protocol implementations is the right direction. But the approach needs more care:

  • Don't silently change error semantics (e.g., LoadRegistries changing from panic to silent skip)
  • Don't break public APIs without deprecation (e.g., WithMethod signature change)
  • Validate configuration at init time (don't remove init-time validation without replacement)
  • Use URL attributes consistently — document which attributes are expected and where they're set

7. Standardize error handling — eliminate panic in library code

The extension package uses panic for missing registrations:

func GetProtocol(name string) base.Protocol {
    if protocols[name] == nil {
        panic("protocol for [" + name + "] is not existing, ...")
    }
}

This is inappropriate for a library/framework. A missing extension should return an error, not crash the entire process. The caller can decide whether to panic.

Rule: Reserve panic for truly unrecoverable programmer errors. Extension lookups should return (T, error).


Priority

Priority Item Impact Effort
P0 Fix dubbo.go:301 wrong lock Crash prevention 1 line
P0 Enable -race in CI Prevent future regressions Config change
P0 extension.Registry[T] Fix 20+ race-prone files Medium (1 week)
P1 URL DeleteAttribute + cleanup Memory leak prevention Small
P1 Fix RouterChain.Route lockless read Crash prevention Small
P1 Return copies from public getters Crash prevention Small
P2 URL redesign Long-term maintainability Large
P2 Lifecycle interface Leak prevention Medium
P2 Eliminate panics API quality Medium

Happy to help with implementation of any of these items. Related issues: #3246, #3247.

Metadata

Metadata

Assignees

No one assigned

    Labels

    improveRefactor or improve

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions