Skip to content

Conversation

@rslota
Copy link
Contributor

@rslota rslota commented Jul 30, 2025

Currently, when Topology is shutting down, it calls Terminator.trap_exit/1 to allow it to handle terminate/2 callback and shutdown producers gracefully. However, Terminator.trap_exit/1 is using GenServer.cast/2 which makes it fully asynchronous, making it possible for Topology's terminate callback to return before Terminator starts trapping exits, allowing Terminator to be shut down without putting producers into "draining" state.

In my system when we're running ~50-100 Broadway instances we're seeing ~50% of them shutting down properly and the rest of them get stuck with producers going full blast unaware of the shutdown.

This change simply changes Terminator.trap_exit/1 to use GenServer.call/2 instead of GenServer.cast/2 to make it fully synchronous, which fixed the issue. Since I don't know Broadway internals at all, please let me know if there is a better way to fix this.

Currently, when `Topology` is shutting down, it calls `Terminator.trap_exit/1` to allow it to handle `terminate/2` callback and shutdown producers gracefully.
However, `Terminator.trap_exit/1` is using `GenServer.cast/2` which makes it fully asynchronous, making it possible for `Topology`'s
terminate callback to return before `Terminator` starts trapping exits, allowing `Terminator` to be shut down without putting producers into "draining" state.

In my system when we're running ~50-100 Broadway instances we're seeing ~50% of them shutting down properly and the rest of them get stuck with producers
going full blast unaware of the shutdown.

This change simply changes `Terminator.trap_exit/1` to use `GenServer.call/2` instead of `GenServer.cast/2` to make it fully synchronous, which fixed the issue.
Since I don't know Broadway internals at all, please let me know if there is a better way to fix this.
@josevalim josevalim merged commit f52f2f7 into dashbitco:main Jul 30, 2025
0 of 2 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@coveralls
Copy link

Pull Request Test Coverage Report for Build 0160e33666b8ae8fdfb245bb0ac3b8d5d0a49817-PR-362

Details

  • 2 of 2 (100.0%) changed or added relevant lines in 1 file are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.1%) to 92.982%

Files with Coverage Reduction New Missed Lines %
lib/broadway/topology.ex 1 97.01%
lib/broadway/topology/terminator.ex 1 93.33%
Totals Coverage Status
Change from base Build 136bea6786ae1526721a98a93ca9d752543c3a7d: -0.1%
Covered Lines: 636
Relevant Lines: 684

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants