Skip to content

Commit f52f2f7

Browse files
authored
Fix graceful shutdown race condition (#362)
Currently, when `Topology` is shutting down, it calls `Terminator.trap_exit/1` to allow it to handle `terminate/2` callback and shutdown producers gracefully. However, `Terminator.trap_exit/1` is using `GenServer.cast/2` which makes it fully asynchronous, making it possible for `Topology`'s terminate callback to return before `Terminator` starts trapping exits, allowing `Terminator` to be shut down without putting producers into "draining" state. In my system when we're running ~50-100 Broadway instances we're seeing ~50% of them shutting down properly and the rest of them get stuck with producers going full blast unaware of the shutdown. This change simply changes `Terminator.trap_exit/1` to use `GenServer.call/2` instead of `GenServer.cast/2` to make it fully synchronous, which fixed the issue.
1 parent 136bea6 commit f52f2f7

File tree

1 file changed

+3
-3
lines changed

1 file changed

+3
-3
lines changed

lib/broadway/topology/terminator.ex

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ defmodule Broadway.Topology.Terminator do
99

1010
@spec trap_exit(GenServer.server()) :: :ok
1111
def trap_exit(terminator) do
12-
GenServer.cast(terminator, :trap_exit)
12+
GenServer.call(terminator, :trap_exit)
1313
end
1414

1515
@impl true
@@ -24,9 +24,9 @@ defmodule Broadway.Topology.Terminator do
2424
end
2525

2626
@impl true
27-
def handle_cast(:trap_exit, state) do
27+
def handle_call(:trap_exit, _from, state) do
2828
Process.flag(:trap_exit, true)
29-
{:noreply, state}
29+
{:reply, :ok, state}
3030
end
3131

3232
@impl true

0 commit comments

Comments
 (0)