@garlick commented Dec 2, 2025

Problem: #7109 moved all event publishing from the broker to the overlay subsystem so that the peer multicast portion could eventually run in a separate thread. Unfortunately, moving all of the event publication machinery to overlay creates an unnecessary requirement that overlay be running on a size=1 instance, and it will complicate `flux module reload overlay`, should that be useful.

Leave the resource-intensive event mcast code in overlay but move the rest of it back into the broker.

TL;DR

Recall there are two ways that event messages are routed per RFC 3:

  1. A bare event message is forwarded upstream on the TBON and published when it arrives on rank 0.

  2. An event message is base64 encoded and encapsulated in a request message that is sent to rank 0, where it is published.

Publication occurs only on rank 0 and consists of assigning a monotonically increasing sequence number, distributing the message to local broker and module subscribers, and sending the event message to the overlay via the interthread message channel for further distribution.
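The publication step described above can be sketched roughly as follows. This is an illustrative model only, not the code in `src/broker/broker.c`: `struct publisher`, `struct event`, and `publish_event()` are hypothetical names, and the two counters merely stand in for delivery to local subscribers and for the send on the interthread channel.

```c
#include <stdint.h>

/* Hypothetical types for illustration; not flux-core definitions. */
struct event {
    uint32_t seq;          /* assigned at publication time */
    const char *topic;
};

struct publisher {
    uint32_t rank;         /* broker rank; only rank 0 publishes */
    uint32_t last_seq;     /* last sequence number handed out */
    int local_deliveries;  /* stand-in: local broker/module subscribers */
    int overlay_sends;     /* stand-in: interthread channel to overlay */
};

/* Publish an event: assign the next sequence number, deliver locally,
 * then hand the message to the overlay for further distribution.
 * Returns 0 on success, -1 if called on a rank other than 0.
 */
static int publish_event (struct publisher *p, struct event *ev)
{
    if (p->rank != 0)
        return -1;
    ev->seq = ++p->last_seq;   /* monotonically increasing */
    p->local_deliveries++;     /* distribute to local subscribers */
    p->overlay_sends++;        /* send to overlay for mcast */
    return 0;
}
```

Keeping the sequence counter in one place on rank 0 is what makes the numbering monotonic across the whole instance in this model.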

The overlay now routes events as follows:

  • If received from the local broker on the interthread channel: On rank 0, messages are mcast to all children. On rank > 0, messages are forwarded to the parent for publication.

  • If received from an overlay child: On rank 0, messages are forwarded to the local broker for publication. On rank > 0, messages are forwarded to the parent for publication.

  • If received from the overlay parent (rank > 0), messages are mcast to all children AND sent to the local broker on the interthread channel for distribution to local broker and module subscribers.
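The three routing rules above amount to a small decision table keyed on the message source and the local rank. A minimal sketch, assuming hypothetical names (`enum event_source`, the `ACT_*` flags, and `overlay_route_event()` are not identifiers from `src/broker/overlay.c`):

```c
#include <stdint.h>

/* Hypothetical names for illustration; not flux-core identifiers. */
enum event_source {
    SRC_LOCAL_BROKER,   /* arrived on the interthread channel */
    SRC_CHILD,          /* arrived from an overlay child */
    SRC_PARENT,         /* arrived from the overlay parent */
};

enum {
    ACT_MCAST_CHILDREN  = 1 << 0,   /* mcast to all children */
    ACT_TO_PARENT       = 1 << 1,   /* forward upstream for publication */
    ACT_TO_LOCAL_BROKER = 1 << 2,   /* hand to the local broker */
};

/* Return a bitmask of ACT_* actions for an event message, or -1 if the
 * combination is invalid (rank 0 has no parent to receive from).
 */
static int overlay_route_event (uint32_t rank, enum event_source src)
{
    switch (src) {
    case SRC_LOCAL_BROKER:
        return rank == 0 ? ACT_MCAST_CHILDREN : ACT_TO_PARENT;
    case SRC_CHILD:
        return rank == 0 ? ACT_TO_LOCAL_BROKER : ACT_TO_PARENT;
    case SRC_PARENT:
        if (rank == 0)
            return -1;
        return ACT_MCAST_CHILDREN | ACT_TO_LOCAL_BROKER;
    }
    return -1;
}
```

In this model, unpublished events only ever travel upstream and published events only ever travel downstream, which matches the description that publication happens solely on rank 0.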

Update the overlay unit tests that were exercising full event publication. Update some event sharness tests that used the other RPC name.

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 78.26087% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.66%. Comparing base (609722a) to head (aace27d).
⚠️ Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/broker/broker.c | 77.14% | 24 Missing ⚠️ |
| src/broker/overlay.c | 88.88% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7221      +/-   ##
==========================================
+ Coverage   83.64%   83.66%   +0.01%     
==========================================
  Files         554      554              
  Lines       92467    92488      +21     
==========================================
+ Hits        77345    77380      +35     
+ Misses      15122    15108      -14     

| Files with missing lines | Coverage Δ |
|---|---|
| src/broker/modhash.c | 76.78% <100.00%> (ø) |
| src/common/libflux/event.c | 77.66% <ø> (ø) |
| src/broker/overlay.c | 79.77% <88.88%> (+0.77%) ⬆️ |
| src/broker/broker.c | 76.01% <77.14%> (+0.53%) ⬆️ |

... and 9 files with indirect coverage changes
