Commit b578011

Merge pull request #312 from garlick/control_messages
rename keepalives to control messages
2 parents 63339c1 + 186ce15 commit b578011

4 files changed, +70 -42 lines changed

spec_3.rst

Lines changed: 19 additions & 12 deletions
@@ -30,7 +30,7 @@ Goals
 -----

 The Flux message protocol v1 provides a way for Flux utilities and services to
-communicate with one another within the context of a job. It has
+communicate with one another within the context of a flux instance. It has
 the following specific goals:

 - Endpoint-count scalability (e.g. to 100K nodes) through multi-hop
@@ -58,10 +58,10 @@ Background
 ``flux-broker`` is a message broker daemon for the Flux resource manager
 framework. A Flux *instance* is a set of interconnected ``flux-broker`` tasks
 that together provide a shared communications substrate for distributed
-resource manager services within a job. Services and utilities communicate
-by passing messages through the session brokers. There are four
-types of messages: events, requests, responses, and keepalives, which
-share a common structure described herein.
+resource manager services. Services and utilities communicate by passing
+messages through the session brokers. There are four types of messages:
+events, requests, responses, and control messages, which share a common
+structure described herein.

 Event messages are published such that they are available to subscribers
 throughout the instance. Events are published with a *topic string*
@@ -77,8 +77,9 @@ Responses are optional replies to requests. They follow the ZeroMQ
 ROUTER-DEALER message flow, which unwinds the source address route
 accumulated by the request, and uses them to select among peers at each hop.

-Keepalives are control messages used by one peer to indicate to another
-peer that it is still alive when it is not otherwise communicating.
+Control messages are used for connection management and status communication
+between brokers. Unlike the other message types, they are only used between
+directly connected peers, never routed.


 Implementation
@@ -242,13 +243,13 @@ ABNF grammar [#f2]_

 message = C:request *S:response
 / S:event
-/ C:keepalive
+/ C:control

 ; Multi-part ZeroMQ messages
 C:request = [routing] topic [payload] PROTO
 S:response = [routing] topic [payload] PROTO
 S:event = [routing] topic [payload] PROTO
-C:keepalive = PROTO
+C:control = PROTO

 ; Route frame stack, ZeroMQ DEALER-ROUTER format
 routing = *identity delimiter
@@ -262,12 +263,12 @@ ABNF grammar [#f2]_
 payload = *OCTET ; payload ZeroMQ frame

 ; Protocol frame
-PROTO = request / response / event / keepalive
+PROTO = request / response / event / control

 request = magic version %x01 flags userid rolemask nodeid matchtag
 response = magic version %x02 flags userid rolemask errnum matchtag
 event = magic version %x04 flags userid rolemask sequence unused
-keepalive = magic version %x08 flags userid rolemask errnum status
+control = magic version %x08 flags userid rolemask type status

 ; Constants
 magic = %x8E ; magic cookie
@@ -306,9 +307,15 @@ ABNF grammar [#f2]_
 ; Monotonic sequence number in network byte order
 sequence = 4OCTET

+; Control message type
+type = 4OCTET
+
+; Control message status
+status = 4OCTET
+
 ; unused 4-byte field
 unused = %x00.00.00.00

 .. [#f1] `RFC 7159: The JavaScript Object Notation (JSON) Data Interchange Format <https://www.rfc-editor.org/rfc/rfc7159.txt>`__, T. Bray, Google, Inc, March 2014.

-.. [#f2] For convenience: the ``C:request``, ``S:response``, ``S:event``, and ``C:keepalive`` ABNF non-terminals refer to ZeroMQ messages, sent by client or server, and built from ordered ZeroMQ message parts (frames). Other non-terminals are built from concatenated ABNF terminals per usual. Thus it is meaningful for ``delimiter``, a message frame, to have zero length, since a zero-length message frame is valid ZMTP.
+.. [#f2] For convenience: the ``C:request``, ``S:response``, ``S:event``, and ``C:control`` ABNF non-terminals refer to ZeroMQ messages, sent by client or server, and built from ordered ZeroMQ message parts (frames). Other non-terminals are built from concatenated ABNF terminals per usual. Thus it is meaningful for ``delimiter``, a message frame, to have zero length, since a zero-length message frame is valid ZMTP.
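
To illustrate the renamed ``control`` rule in the grammar, here is a minimal Python sketch of packing and unpacking a control PROTO frame. It is not the flux-core implementation. The magic value ``0x8E``, the message type ``0x08``, and the 4-octet ``type`` and ``status`` fields come from the grammar; the one-octet ``version`` (value ``0x01``) and ``flags`` fields, the 4-octet ``userid`` and ``rolemask`` fields, and network byte order for every 4-octet field are assumptions not shown in this diff. The helper names ``encode_control`` and ``decode_control`` are hypothetical.

```python
import struct

# Assumed layout: four single octets (magic, version, message type, flags)
# followed by four 4-octet fields, all in network byte order.
PROTO_FORMAT = "!BBBBIIII"
MAGIC = 0x8E            # magic cookie, from the grammar
VERSION = 0x01          # assumption: protocol version octet
MSGTYPE_CONTROL = 0x08  # control message type, from the grammar


def encode_control(ctrl_type, status, flags=0, userid=0, rolemask=0):
    """Pack a control PROTO frame (hypothetical helper)."""
    return struct.pack(PROTO_FORMAT, MAGIC, VERSION, MSGTYPE_CONTROL,
                       flags, userid, rolemask, ctrl_type, status)


def decode_control(frame):
    """Unpack a control PROTO frame, rejecting anything else."""
    (magic, version, msgtype, flags,
     userid, rolemask, ctrl_type, status) = struct.unpack(PROTO_FORMAT, frame)
    if magic != MAGIC or msgtype != MSGTYPE_CONTROL:
        raise ValueError("not a control PROTO frame")
    return {"version": version, "flags": flags, "userid": userid,
            "rolemask": rolemask, "type": ctrl_type, "status": status}
```

Because ``C:control = PROTO``, a control message consists of this single frame, with no routing, topic, or payload frames.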

spec_5.rst

Lines changed: 50 additions & 27 deletions
@@ -64,9 +64,9 @@ message handlers for its methods, then run the flux reactor. It should
 use event driven (reactive) programming techniques to remain responsive
 while juggling work from multiple clients.

-Keepalive messages are sent to the broker via pre-registered reactor
+Status messages are sent to the broker via pre-registered reactor
 watchers to indicate when the module is initializing, running, finalizing,
-or exited. At initialization, a module MAY also manually send a keepalive
+or exited. At initialization, a module MAY also manually send a status
 message to indicate to the broker when initialization is complete. This
 provides synchronization to the broker module loader as well as useful
 runtime debug information that can be reported by ``flux module list``.
@@ -91,35 +91,59 @@ A broker module SHALL export the following global symbols:
 type of error on failure.


-Keepalive Values
-~~~~~~~~~~~~~~~~
+Status Messages
+~~~~~~~~~~~~~~~

-A broker module SHALL send RFC 3 keepalive messages containing status
-integers to the broker over its broker handle. Status integers are
-enumerated as follows:
+A broker module SHALL be considered to be in one of the following states,
+represented by the integer values shown in parenthesis:

 - FLUX_MODSTATE_INIT (0) - initializing
-
 - FLUX_MODSTATE_RUNNING (1) - running
-
 - FLUX_MODSTATE_FINALIZING (2) - finalizing
-
 - FLUX_MODSTATE_EXITED (3) - ``mod_main()`` exited

-Modules SHALL send a keepalive message of ``FLUX_MODSTATE_RUNNING``
-after initialization to notify the broker that the module has started
-successfully. In order to ensure this happens for all modules, A keepalive
-message SHALL be sent via a pre-registered reactor watcher upon a module's
-first entry to the reactor if the module has not otherwise entered the
-RUNNING state. In addition, keepalive messages MAY be sent to the broker
-at regular intervals. The keepalive ``errnum`` field SHALL be zero except
-when ``mod_main()`` returns a value of -1 indicating failure and state
-transitions to FLUX_MODSTATE_EXITED. In this case ``errnum`` SHALL be
-set to the value of POSIX ``errno`` set by ``mod_main()`` before returning.
+Upon loading the module, the broker SHALL initialize the broker state
+to ``FLUX_MODSTATE_INIT``.
+
+After initialization is complete, a module SHALL send an RPC to the
+``broker.module-status`` service with the FLUX_RPC_NORESPONSE flag to
+notify the broker that the module has started successfully. In order to
+ensure this happens for all modules, the RPC SHALL be sent via a
+pre-registered reactor watcher upon a module's first entry to the reactor
+if the module has not already sent the message.
+
+Example payload:
+
+.. code:: json
+
+{
+"status":1
+}

-The broker MAY track the number of session heartbeats since a
-module last sent a message and report this as "idle time"
-for the module.
+After exiting the reactor and before exiting the module thread, the module
+SHALL send an RPC to ``broker.module-status`` indicating that it intends to
+exit. The module SHALL wait for a response to this message before exiting
+``mod_main()``.
+
+Example payload:
+
+.. code:: json
+
+{
+"status":2
+}
+
+Finally once ``mod_main()`` has exited, the module thread SHALL send an RPC
+to ```broker.module-status`` with the FLUX_RPC_NORESPONSE flag including
+the error status of the module: zero if ``mod_main()`` exited with a return
+code greater than or equal to zero, otherwise the value of ``errno``.
+
+.. code:: json
+
+{
+"status":2,
+"errnum":0
+}


 Load Sequence
@@ -137,10 +161,9 @@ Unload Sequence

 The broker module loader SHALL send a ``<service>.shutdown`` request to the
 module when the module loader receives a ``broker.rmmod`` request for the
-module. In response, the broker module SHALL exit ``mod_main()``, send a
-keepalive transition to FLUX_MODSTATE_EXITED state, and exit the
-module’s thread or process. This final state transition indicates to
-the broker that it MAY clean up the module thread.
+module. In response, the broker module SHALL exit ``mod_main()``, sending
+state transition messages as described above, and exit the module’s thread
+or process. The final state transition indicates to the broker that it MAY clean up the module thread.


 Built-in Request Handlers
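
The ``broker.module-status`` payloads introduced in the spec_5.rst changes can be sketched in Python as follows. This only builds the JSON payloads; sending the RPC and the FLUX_RPC_NORESPONSE flag are out of scope. The state values come from the enumerated list in the diff, while the names ``ModState``, ``status_payload``, and ``exit_errnum`` are hypothetical. One further assumption: the final message is shown here with ``FLUX_MODSTATE_EXITED`` (3), following the enumerated states, even though the last example payload in the diff shows ``"status":2``.

```python
import json
from enum import IntEnum


class ModState(IntEnum):
    """Module states, with values taken from the enumerated list."""
    INIT = 0
    RUNNING = 1
    FINALIZING = 2
    EXITED = 3


def status_payload(state, errnum=None):
    """Build a module-status JSON payload (hypothetical helper)."""
    payload = {"status": int(state)}
    if errnum is not None:
        payload["errnum"] = errnum
    return json.dumps(payload, separators=(",", ":"))


def exit_errnum(mod_main_rc, errno_value):
    """Zero if mod_main() returned >= 0, otherwise the saved errno."""
    return 0 if mod_main_rc >= 0 else errno_value


# The three messages a module sends over its lifetime, in order.
running = status_payload(ModState.RUNNING)
finalizing = status_payload(ModState.FINALIZING)
exited = status_payload(ModState.EXITED, exit_errnum(0, 0))
```

Per the text, only the finalizing message expects a response; the running and exited messages are sent with FLUX_RPC_NORESPONSE.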

spec_7.rst

Lines changed: 1 addition & 1 deletion
@@ -150,7 +150,7 @@ Examples:
 FLUX_MSGTYPE_REQUEST = 0x01,
 FLUX_MSGTYPE_RESPONSE = 0x02,
 FLUX_MSGTYPE_EVENT = 0x04,
-FLUX_MSGTYPE_KEEPALIVE = 0x08,
+FLUX_MSGTYPE_CONTROL = 0x08,
 FLUX_MSGTYPE_ANY = 0x0f,
 FLUX_MSGTYPE_MASK = 0x0f,
 };
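
Because the message type values in the enum above are distinct bits, they can be combined into a mask. The following Python sketch mirrors the constants from the diff and shows one way a type mask might be tested; the helper ``type_matches`` is hypothetical and is not part of the Flux API.

```python
# Constants copied from the enum in the diff.
FLUX_MSGTYPE_REQUEST = 0x01
FLUX_MSGTYPE_RESPONSE = 0x02
FLUX_MSGTYPE_EVENT = 0x04
FLUX_MSGTYPE_CONTROL = 0x08
FLUX_MSGTYPE_ANY = 0x0F


def type_matches(msgtype, typemask):
    """Return True if msgtype is one of the types selected by typemask."""
    return (msgtype & typemask) != 0


# A mask selecting only requests and events (illustrative).
request_or_event = FLUX_MSGTYPE_REQUEST | FLUX_MSGTYPE_EVENT
```

The rename leaves the numeric value ``0x08`` unchanged, so ``FLUX_MSGTYPE_ANY`` still covers the control type.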

spell.en.pws

Lines changed: 0 additions & 2 deletions
@@ -79,7 +79,6 @@ dlopen
 dlsym
 insmod
 json
-keepalive
 lsmod
 modstate
 POSIX
@@ -105,7 +104,6 @@ URI
 bitmask
 codec
 crypto
-keepalives
 MSGFLAG
 resize
 scalability
