Closed
Labels
verified: All test cases were verified successfully
Description
When new worker nodes join an NS8 cluster with multiple user domains, the cluster experiences significant load spikes. The ldapproxy instance on each newly joined node broadcasts one user-domain-changed event per user domain, and every application on every node processes each of these events.
Observed: 2 new nodes × 25 domains × 6 apps = 300 event handler invocations (only 12 should trigger actual service changes)
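The observed count can be reproduced with a short calculation. This is only a sketch of the arithmetic: it assumes, as the formula above does, that every app handles every event broadcast by every new node. The batched figure (one event per node) is an extrapolation from the proposal in this issue; it happens to equal the 12 quoted above, but the issue does not state how that number was derived.

```python
def handler_invocations(new_nodes: int, events_per_node: int, apps: int) -> int:
    """Every app handles every event broadcast by every new node."""
    return new_nodes * events_per_node * apps


# Figures reported in this issue.
NEW_NODES, DOMAINS, APPS = 2, 25, 6

# Current behaviour: one event per domain.
per_domain = handler_invocations(NEW_NODES, DOMAINS, APPS)
# Assumed behaviour after batching: one event per node.
batched = handler_invocations(NEW_NODES, 1, APPS)

print(per_domain, batched)  # prints: 300 12
```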
Root Cause
- ldapproxy generates one event per domain instead of batching
- Event handlers don't filter by node_id, so they process remote events unnecessarily
- Journal analysis: `journalctl --grep 'user-domain-changed is starting' | wc -l` → 300 entries
- Worker nodes hit load averages of 4.60+ on 4-CPU systems
- Leader's `alloy` process consumed high CPU reading excessive journal logs
Proposed Solution
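1. Batch events at source (ldapproxy):
   - Single domain: `{"node_id": 5, "domain": "example.com"}`
   - Multiple domains: `{"node_id": 5, "domains": ["domain1", "domain2", ...]}`
2. Add node filtering in event handlers:

       # current_node_id, domain_is_relevant and restart_or_reload_services
       # are placeholders that each app supplies.
       def handle_user_domain_changed(event):
           if event.get('node_id') != current_node_id:
               return  # Ignore remote events
           # Accept both the batched ("domains") and single ("domain") payloads.
           domains = event.get('domains', [event['domain']] if 'domain' in event else [])
           # Restart or reload at most once, however many domains changed.
           if any(domain_is_relevant(domain) for domain in domains):
               restart_or_reload_services()

Components Affected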
- ns8-core: ldapproxy event generation
- ns8-mail and other apps with user-domain-changed handlers
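
For the ldapproxy side of the proposed solution, the two payload shapes could be produced by a single helper. This is a minimal sketch, not the ns8-core implementation: `build_payload` and `domains_of` are hypothetical names, and how ldapproxy actually publishes the event is not shown in this issue.

```python
import json
from typing import Any


def build_payload(node_id: int, domains: list[str]) -> dict[str, Any]:
    """Build one user-domain-changed payload covering all changed domains (hypothetical helper)."""
    if not domains:
        raise ValueError("at least one domain is required")
    unique = list(dict.fromkeys(domains))  # drop duplicates, keep order
    if len(unique) == 1:
        return {"node_id": node_id, "domain": unique[0]}
    return {"node_id": node_id, "domains": unique}


def domains_of(payload: dict[str, Any]) -> list[str]:
    """Read the domain list from either payload shape (hypothetical helper)."""
    if "domains" in payload:
        return list(payload["domains"])
    return [payload["domain"]] if "domain" in payload else []


single = build_payload(5, ["example.com"])
batch = build_payload(5, ["domain1", "domain2"])
print(json.dumps(single))
print(json.dumps(batch))
```

Keeping the single-domain shape lets handlers written for the current payload read the domain unchanged, while `domains_of` gives updated handlers one code path for both shapes.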