Commit 5bae582
KAFKA-19905: Fix tight reconnection loop during shutdown
This patch fixes a tight broker-to-controller reconnection loop that can occur during shutdown. The sequence is:
1. Nodes 1 and 2 (brokers) request a controlled shutdown.
2. The controller grants the shutdown.
3. The controller itself shuts down (RaftManager shutdown).
4. Nodes 1 and 2 keep trying to heartbeat to the now-dead controller.
5. They get stuck in this reconnection loop because the NodeToControllerRequestThread is still running and has not been shut down properly.
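The steps above can be pictured with a toy model. This is only a sketch, not Kafka code: `simulate_shutdown`, `request_thread_stopped`, and the 50 ms retry interval are hypothetical names and values chosen for illustration. Only the 5-minute timeout comes from this commit message.

```python
SHUTDOWN_TIMEOUT_MS = 5 * 60 * 1000  # the 5-minute shutdown timeout from this commit message
RETRY_INTERVAL_MS = 50               # assumed retry interval, for illustration only


def simulate_shutdown(request_thread_stopped: bool) -> tuple[int, int]:
    """Toy model of a broker shutting down after the controller is gone.

    Returns (reconnect_attempts, elapsed_ms). Time is simulated, not real.
    """
    attempts = 0
    elapsed_ms = 0
    # The controller is already down, so every connection attempt fails.
    # The loop ends only when the request thread has been stopped or the
    # shutdown timeout expires.
    while not request_thread_stopped and elapsed_ms < SHUTDOWN_TIMEOUT_MS:
        attempts += 1
        elapsed_ms += RETRY_INTERVAL_MS
    return attempts, elapsed_ms
```

In this model, a broker whose request thread is never stopped keeps retrying until the full timeout has elapsed, while one whose request thread is stopped first exits without a single retry.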
The reconnection loop runs for the full 5 minutes, which is the shutdown timeout hard-coded in the KafkaBroker trait.
These are the timings from the logs of another test run, for one of the brokers:
SIGTERM received: 14:39:46,282
Actual shutdown completed: 14:44:46,385
Time elapsed: 5 minutes and 0.103 seconds (approximately 5 minutes)
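The elapsed time can be re-derived from the two timestamps. This is a small sketch: the helper name `elapsed` is hypothetical, and the format string assumes the `HH:MM:SS,mmm` layout shown in the log lines above.

```python
from datetime import datetime, timedelta

LOG_TIME_FORMAT = "%H:%M:%S,%f"  # assumed layout: hours:minutes:seconds,milliseconds


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two same-day log timestamps."""
    return datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(start, LOG_TIME_FORMAT)


# Timestamps copied from the log excerpt above.
shutdown_duration = elapsed("14:39:46,282", "14:44:46,385")
```

With those inputs, `shutdown_duration` equals `timedelta(minutes=5, milliseconds=103)`, matching the 5 minutes and 0.103 seconds reported above.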
I acknowledge that this is unlikely to happen with brokers running on different machines, but it is not so unlikely when running tests locally on a single physical machine.
Signed-off-by: Federico Valeri <[email protected]>
1 parent 2dffe32 commit 5bae582
1 file changed in core/src/main/scala/kafka/server: 1 addition & 1 deletion
Line 223 replaced (context lines 220-226); the text of the changed line is not included here.