Commit accaa3d
Count the connection failure as the condition of quarantine (#4727)
* Count the connection failure as the condition of quarantine
---
### Motivation
Currently, the BookieClient quarantine mechanism is triggered primarily by read and write error responses from Bookies. However, in multi-region deployments, a common failure mode is a region-level network partition or DNS resolution failure.
In such a scenario:

1. The Bookie remains registered in ZooKeeper, because it can still heartbeat to its local ZK observer.
2. The client (Broker) cannot resolve the Bookie's IP address or establish a TCP connection to it.
3. The EnsemblePlacementPolicy (especially RegionAwareEnsemblePlacementPolicy) sees the Bookie as "Available" and repeatedly selects it to satisfy minRack or E/Qw constraints.
4. The LedgerHandle fails to write because it cannot initialize a connection handle, which triggers an Ensemble Change.
5. Because the connection failure did not trigger a quarantine, the placement policy picks the same problematic Bookie again in the next iteration.

This creates an infinite Ensemble Change loop: the Ledger write hangs indefinitely, and the Ledger metadata in ZooKeeper bloats with thousands of segments.
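The loop described above can be illustrated with a deliberately simplified toy model. This is not BookKeeper code: the class, the method, and the bookie names (`remote-1`, `remote-2`) are hypothetical. It assumes the placement policy deterministically prefers candidates in list order and only skips quarantined bookies. With connection failures ignored, it keeps re-selecting the unreachable bookie until the iteration cap; when connection failures lead to quarantine, the next ensemble change picks a reachable bookie.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EnsembleChangeLoopDemo {

    // Hypothetical: the one bookie this client cannot connect to.
    static final String UNREACHABLE = "remote-1";

    /**
     * Simulates repeated ensemble changes in a toy model. The "placement policy"
     * picks the first non-quarantined candidate from the remote region (a stand-in
     * for a region/rack constraint). Returns the number of ensemble changes before
     * a write succeeds, or -1 if no bookie is eligible or the cap is reached.
     */
    static int ensembleChangesUntilSuccess(List<String> remoteCandidates,
                                           boolean quarantineOnConnectFailure,
                                           int maxIterations) {
        Set<String> quarantined = new HashSet<>();
        for (int i = 0; i < maxIterations; i++) {
            String picked = null;
            for (String bookie : remoteCandidates) {
                if (!quarantined.contains(bookie)) {
                    picked = bookie;
                    break;
                }
            }
            if (picked == null) {
                return -1; // nothing eligible; a real policy would need a fallback
            }
            if (!picked.equals(UNREACHABLE)) {
                return i; // connection established, the write can proceed
            }
            // Connection failure: only quarantine the bookie if the flag is on.
            if (quarantineOnConnectFailure) {
                quarantined.add(picked);
            }
        }
        return -1; // gave up: the same bookie was picked on every iteration
    }

    public static void main(String[] args) {
        List<String> remote = List.of("remote-1", "remote-2");
        System.out.println(ensembleChangesUntilSuccess(remote, false, 1000)); // -1
        System.out.println(ensembleChangesUntilSuccess(remote, true, 1000));  // 1
    }
}
```

The real RegionAwareEnsemblePlacementPolicy is randomized and has its own fallback rules; the toy only captures why excluding a bookie after a connection failure breaks the cycle.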
* Add configuration to control the behavior
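A minimal sketch of how a configuration switch could gate whether connection failures count toward quarantine. The class, the `FailureKind` enum, the `countConnectFailures` flag, and the simple count threshold are all hypothetical; this commit's actual configuration key and BookKeeper's real error accounting (which works over time intervals) are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

public class QuarantineTracker {

    // Hypothetical failure categories seen by the client.
    enum FailureKind { READ_ERROR, WRITE_ERROR, CONNECT_FAILURE }

    // Stand-in for the new client configuration option (hypothetical name).
    private final boolean countConnectFailures;
    // Assumed model: quarantine once a bookie accumulates this many counted failures.
    private final int threshold;
    private final Map<String, Integer> failureCounts = new HashMap<>();

    QuarantineTracker(boolean countConnectFailures, int threshold) {
        this.countConnectFailures = countConnectFailures;
        this.threshold = threshold;
    }

    /** Records a failure and returns true if the bookie should now be quarantined. */
    boolean recordFailure(String bookie, FailureKind kind) {
        if (kind == FailureKind.CONNECT_FAILURE && !countConnectFailures) {
            // Previous behavior: connection failures never count toward quarantine.
            return false;
        }
        int count = failureCounts.merge(bookie, 1, Integer::sum);
        return count >= threshold;
    }

    public static void main(String[] args) {
        QuarantineTracker legacy = new QuarantineTracker(false, 2);
        QuarantineTracker withFlag = new QuarantineTracker(true, 2);
        for (int i = 0; i < 2; i++) {
            boolean a = legacy.recordFailure("remote-1", FailureKind.CONNECT_FAILURE);
            boolean b = withFlag.recordFailure("remote-1", FailureKind.CONNECT_FAILURE);
            System.out.println("legacy: " + a + ", withFlag: " + b);
        }
    }
}
```

Keeping the behavior behind a flag lets operators opt in, since quarantining on connection failures also reacts to transient network blips that the previous behavior ignored.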
(cherry picked from commit 497aa4e)

Parent commit: 1001f6a
2 files changed (+26, -0) under `bookkeeper-server/src/main/java/org/apache/bookkeeper`, in the `conf` and `proto` packages:

* One file: 23 additions, 0 deletions (1 line added at line 143; 22 lines added at lines 1483-1504).
* The other file: 3 additions, 0 deletions (lines 1821-1823).