Commit 25c23e1

author
Guido Trotter
committed
Fix language on instance numbers in HA docs
Signed-off-by: Guido Trotter <[email protected]>
1 parent 31328ae commit 25c23e1

1 file changed

+17
-5
lines changed

docs/high_availability.md

Lines changed: 17 additions & 5 deletions
@@ -312,13 +312,25 @@ This ensures:
 - **Independent processing** - Each instance independently evaluates routing, grouping, and deduplication
 - **No single point of failure** - Load balancers introduce a single point of failure
 
-### Cluster Size Recommendations
+### Cluster Size Considerations
 
-- **3 instances** - Recommended minimum for production (tolerates 1 failure)
-- **5 instances** - For critical environments (tolerates 2 failures)
-- **Odd numbers** - Preferred for simpler split-brain scenarios
+Since Alertmanager uses gossip without quorum or voting, **any N instances tolerate up to N-1 failures** - as long as one instance is alive, notifications will be sent.
 
-The gossip protocol scales to dozens of instances, but typical deployments use 3-5.
+However, cluster size involves tradeoffs:
+
+**Benefits of more instances:**
+- Greater resilience to simultaneous failures (hardware, network, datacenter outages)
+- Continued operation even during maintenance windows
+
+**Costs of more instances:**
+- In case of partitions there will be an increase in duplicate notifications
+- More gossip traffic
+
+**Typical deployments:**
+- **3-4 instances** - Common for single-datacenter production deployments
+- **4-5 instances** - Multi-datacenter or highly critical environments
+
+**Note**: Unlike consensus-based systems (etcd, Raft), odd vs. even cluster sizes make no difference - there is no voting or quorum.
 
 ### Monitoring Cluster Health
 
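
The arithmetic behind the corrected wording can be sketched as follows. This is an illustrative comparison only, not code from Alertmanager: the function names are hypothetical, and the quorum figure assumes a simple majority rule of the kind used by Raft-style systems. It shows why the removed text ("3 instances tolerates 1 failure", "5 instances tolerates 2 failures") described quorum behaviour rather than gossip behaviour.

```python
def gossip_tolerated_failures(n: int) -> int:
    """Failures tolerated when any single surviving instance can still notify.

    Hypothetical helper: models the commit's claim that N instances
    tolerate up to N-1 failures.
    """
    if n < 1:
        raise ValueError("cluster needs at least one instance")
    return n - 1


def quorum_tolerated_failures(n: int) -> int:
    """Failures tolerated when a strict majority must stay alive.

    Assumption: simple majority quorum, as in Raft-style consensus.
    """
    if n < 1:
        raise ValueError("cluster needs at least one instance")
    return (n - 1) // 2


if __name__ == "__main__":
    # Print both figures side by side for small cluster sizes.
    for n in range(1, 7):
        print(n, gossip_tolerated_failures(n), quorum_tolerated_failures(n))
```

Under the majority assumption, going from 3 to 4 instances adds no failure tolerance, which is why odd sizes matter there. Under the gossip model in this commit, every added instance raises the tolerated failure count by one, so parity is irrelevant.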
