Commit 5bae582

KAFKA-19905: Fix tight reconnection loop during shutdown
This patch fixes a tight broker-to-controller reconnection loop that may happen during shutdown:

1. Nodes 1 and 2 (brokers) request controlled shutdown.
2. The controller grants the shutdown.
3. The controller itself shuts down (RaftManager shutdown).
4. Nodes 1 and 2 continue trying to heartbeat to the now-dead controller.
5. They get stuck in this reconnection loop because the NodeToControllerRequestThread is still running and hasn't been shut down properly.

The reconnection loop goes on for exactly 5 minutes, which is the shutdown timeout hard-coded in the KafkaBroker trait. This is what I have from another test's logs for one of the brokers:

SIGTERM received: 14:39:46,282
Actual shutdown completed: 14:44:46,385
Time elapsed: 5 minutes and 0.103 seconds (approximately 5 minutes)

I acknowledge that this is unlikely to happen with brokers running on different machines, but it is not so unlikely when running tests locally on a single physical machine.

Signed-off-by: Federico Valeri <[email protected]>
1 parent 2dffe32 commit 5bae582

1 file changed, +1 -1 lines changed

core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala

Lines changed: 1 addition & 1 deletion
@@ -220,7 +220,7 @@ class NodeToControllerRequestThread(
   initialNetworkClient,
   Math.min(Int.MaxValue, Math.min(config.controllerSocketTimeoutMs, retryTimeoutMs)).toInt,
   time,
-  false
+  true
 ) with Logging {
 
   this.logIdent = logPrefix
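
The commit does not name the constructor argument that changes from false to true. One plausible reading, and it is only an assumption, is that it controls whether the request thread is interrupted when shutdown is initiated. The sketch below is a minimal, self-contained illustration of that idea and is not Kafka's code: SketchThread, ShutdownSketch, isInterruptible, and backoffMs are hypothetical names, and a plain Thread.sleep stands in for whatever blocking reconnect or backoff the real thread performs.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a shutdownable worker thread; NOT Kafka's actual class.
class SketchThread(name: String, isInterruptible: Boolean, backoffMs: Long) extends Thread(name) {
  private val running = new AtomicBoolean(true)
  private val done = new CountDownLatch(1)

  override def run(): Unit = {
    try {
      while (running.get()) {
        // Stands in for a blocking reconnect attempt or retry backoff.
        try Thread.sleep(backoffMs)
        catch { case _: InterruptedException => () } // fall through and re-check `running`
      }
    } finally done.countDown()
  }

  // Requests shutdown, waits for the worker to exit, and returns the elapsed time in ms.
  def shutdown(): Long = {
    val start = System.nanoTime()
    running.set(false)
    if (isInterruptible) interrupt() // wake the worker out of its blocking call
    done.await()
    (System.nanoTime() - start) / 1000000L
  }
}

object ShutdownSketch {
  // Starts a worker, lets it block, then measures how long shutdown takes.
  def measure(isInterruptible: Boolean): Long = {
    val t = new SketchThread("sketch", isInterruptible, backoffMs = 2000L)
    t.start()
    Thread.sleep(200) // give the worker time to enter its blocking call
    t.shutdown()
  }

  def main(args: Array[String]): Unit = {
    println(s"isInterruptible=true:  shutdown took ${measure(true)} ms")
    println(s"isInterruptible=false: shutdown took ${measure(false)} ms")
  }
}
```

Under these assumptions, the interruptible worker exits almost immediately, while the non-interruptible one only notices the shutdown request after its current blocking call returns. Whether this matches the exact mechanism behind the 5-minute delay described in the commit message is not confirmed by the source.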
