Fix graceful shutdown race condition #362

rslota · 2025-07-30T10:55:30Z

Currently, when `Topology` is shutting down, it calls `Terminator.trap_exit/1` so that `Terminator` can run its `terminate/2` callback and shut down producers gracefully. However, `Terminator.trap_exit/1` uses `GenServer.cast/2`, which is fully asynchronous. This makes it possible for `Topology`'s terminate callback to return before `Terminator` starts trapping exits, so `Terminator` can be shut down without putting producers into the "draining" state.

In my system, where we run ~50-100 Broadway instances, we're seeing ~50% of them shut down properly, while the rest get stuck with producers going full blast, unaware of the shutdown.

This change switches `Terminator.trap_exit/1` to use `GenServer.call/2` instead of `GenServer.cast/2`, making it synchronous, which fixed the issue. Since I don't know Broadway internals at all, please let me know if there is a better way to fix this.
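
To illustrate the difference, here is a minimal sketch of the two styles. It is not Broadway's actual code: the module `TrapExitSketch` and the functions `trap_exit_async/1` and `trap_exit_sync/1` are hypothetical names used only for this example.

```elixir
defmodule TrapExitSketch do
  # Hypothetical stand-in for a terminator-like process; not Broadway's code.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Asynchronous: returns :ok as soon as the message is queued, so the
  # caller cannot know whether the flag has been set yet.
  def trap_exit_async(server), do: GenServer.cast(server, :trap_exit)

  # Synchronous: returns only after the server has handled the request,
  # so the flag is set by the time the caller continues.
  def trap_exit_sync(server), do: GenServer.call(server, :trap_exit)

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_cast(:trap_exit, state) do
    Process.flag(:trap_exit, true)
    {:noreply, state}
  end

  @impl true
  def handle_call(:trap_exit, _from, state) do
    Process.flag(:trap_exit, true)
    {:reply, :ok, state}
  end
end

{:ok, pid} = TrapExitSketch.start_link()
:ok = TrapExitSketch.trap_exit_sync(pid)

# Once the call has returned, the process is trapping exits.
{:trap_exit, true} = Process.info(pid, :trap_exit)
```

With the `cast` variant, the same `Process.info/2` check could run before the message is processed, which is the window the race described above falls into.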

josevalim · 2025-07-30T13:59:38Z

💚 💙 💜 💛 ❤️

coveralls · 2025-07-30T13:59:57Z

Pull Request Test Coverage Report for Build 0160e33666b8ae8fdfb245bb0ac3b8d5d0a49817-PR-362

Details

2 of 2 (100.0%) changed or added relevant lines in 1 file are covered.
2 unchanged lines in 2 files lost coverage.
Overall coverage decreased (-0.1%) to 92.982%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| lib/broadway/topology.ex | 1 | 97.01% |
| lib/broadway/topology/terminator.ex | 1 | 93.33% |

Totals
Change from base Build 136bea6786ae1526721a98a93ca9d752543c3a7d:	-0.1%
Covered Lines:	636
Relevant Lines:	684

💛 - Coveralls

josevalim merged commit f52f2f7 into dashbitco:main Jul 30, 2025
0 of 2 checks passed
