
Commit f8955d4

rfc44: add RFC for FLUB bootstrap
Problem: there is no documentation for the FLUB protocol proposed in flux-framework/flux-core#5184.

Add a new RFC.
1 parent 040f492 commit f8955d4

4 files changed: +278 -0 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ Table of Contents
 - [41/Job Information Service](spec_41.rst)
 - [42/Subprocess Server Protocol](spec_42.rst)
 - [43/Job List Service](spec_43.rst)
+- [44/FLUB: FLUx Bootstrap Protocol](spec_44.rst)
 
 Build Instructions
 ------------------

index.rst

Lines changed: 7 additions & 0 deletions
@@ -283,6 +283,12 @@ standard I/O management of remote processes.
 
 The Flux Job List Service provides read-only summary information for jobs.
 
+:doc:`spec_44`
+~~~~~~~~~~~~~~
+
+The FLUx Bootstrap Protocol enables a Flux broker to join
+a running Flux instance.
+
 .. Each file must appear in a toctree
 .. toctree::
    :hidden:
@@ -328,3 +334,4 @@ The Flux Job List Service provides read-only summary information for jobs.
    spec_41
    spec_42
    spec_43
+   spec_44

spec_44.rst

Lines changed: 267 additions & 0 deletions
@@ -0,0 +1,267 @@
+.. github display
+   GitHub is NOT the preferred viewer for this file. Please visit
+   https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_44.html
+
+44/FLUx Bootstrap Protocol
+##########################
+
+The FLUx Bootstrap (FLUB) protocol enables a Flux broker to join
+a running Flux instance.
+
+.. list-table::
+  :widths: 25 75
+
+  * - **Name**
+    - github.com/flux-framework/rfc/spec_44.rst
+  * - **Editor**
+    - Jim Garlick <[email protected]>
+  * - **State**
+    - raw
+
+Language
+********
+
+.. include:: common/language.rst
+
+Related Standards
+*****************
+
+- :doc:`spec_3`
+- :doc:`spec_6`
+- :doc:`spec_13`
+
+Background
+**********
+
+Flux brokers use a bootstrap mechanism to obtain information needed to
+join a Flux instance, minimally:
+
+- The instance size
+
+- The overlay network topology
+
+- The new broker's rank within the instance
+
+- The overlay network address of the new broker's parent, whose rank
+  it calculates from topology
+
+- The public CURVE key of overlay network peers
+
+After bootstrap, the Flux broker joins the overlay network and begins
+running Flux services.
+
+Flux brokers typically bootstrap either from static configuration files
+or from a PMI service, described in :doc:`spec_13`.
+
+A system instance bootstraps from configuration files that are replicated
+on disk across a cluster. Flux is expected to begin operations with only
+the rank 0 broker online. Other brokers may be started at any point in
+time depending on system administration practices. Some fraction are
+anticipated to remain down due to problems. A broker may rejoin a system
+instance after a node crash. All bootstrap information must be known in
+advance to be represented in the configuration files.
+
+Other Flux instances, for example a Flux-launched batch job or Slurm-launched
+Flux instance, bootstrap from a PMI service. Brokers exchange ephemeral
+network addresses and CURVE public keys via the PMI protocol. All broker
+ranks must be online to participate in the exchange. After that, a
+PMI-bootstrapped instance may be set up to tolerate the loss of non-critical
+brokers, but since the PMI exchange is over, there is no mechanism for lost
+nodes to rejoin the instance.
+
+The following use cases arise that cannot be easily handled by configuration
+files or PMI:
+
+#. Grow a running Flux instance by adding nodes whose identities are
+   not known in advance.
+
+#. Replace a crashed node in a PMI-bootstrapped Flux instance.
+
+#. Start Flux in an environment such as the cloud where PMI is
+   unavailable and bootstrap information is not known a priori.
+
+FLUB is a bootstrap mechanism that solves those problems. It is intended
+to be used after part of the instance has already bootstrapped with one
+of the other methods.
+
+Other Considerations
+====================
+
+Unlike the brokers of a Flux system instance or an instance launched in
+parallel where the broker command lines are all the same, a broker added
+later may have a different command line. Therefore, select broker
+attributes that may have been set on the original command line must
+be shared with the new broker.
+
+Similarly, a new broker might not have access to the instance configuration
+files, so the instance's configuration object must be shared.
+
+The ``size`` broker attribute is a constant value per RFC 3, and this constancy
+is a deep assumption in the code base. However, it is already a de facto
+maximum rather than absolute size since not all brokers are required
+to be online for the duration of the instance. A change for FLUB is that
+the ``size`` now may be set to a value that exceeds the original bootstrap
+size to allow room for expansion. The additional ranks are eligible for
+FLUB bootstrap.
+
+To allow crashed nodes to be replaced with new ones, ranks that go offline
+and were bootstrapped with PMI or FLUB are also made eligible for FLUB
+replacement. Ranks that were bootstrapped from configuration are not
+eligible for replacement.
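
Reviewer note: the eligibility rules in the two paragraphs above can be written as a small predicate. This is only a sketch. The `Bootstrap` enum, the `flub_eligible` name, and the rule that an online rank is never handed out are illustrative assumptions and are not taken from the RFC or from flux-core.

```python
from enum import Enum


class Bootstrap(Enum):
    """How a rank was originally bootstrapped (hypothetical names)."""
    CONFIG = "config"   # static configuration files
    PMI = "pmi"         # PMI exchange
    FLUB = "flub"       # joined earlier via FLUB
    NONE = "none"       # expansion rank beyond the bootstrap size, never started


def flub_eligible(method: Bootstrap, online: bool) -> bool:
    """Return True if a rank may be assigned to a broker joining via FLUB."""
    if online:
        # Assumption: a rank currently in use is never reassigned.
        return False
    if method is Bootstrap.CONFIG:
        # Ranks bootstrapped from configuration are not eligible for replacement.
        return False
    # Unused expansion ranks, and offline ranks that came up via PMI or FLUB.
    return True
```

For example, `flub_eligible(Bootstrap.PMI, online=False)` is true, while `flub_eligible(Bootstrap.CONFIG, online=False)` is false.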
+
+Caveats
+=======
+
+The following areas are problematic and may require further design:
+
+The ``hostlist`` broker attribute is currently a constant value set following
+the initial bootstrap, which enables it to be cached after first access.
+Some of the code that uses it (such as log message generation) relies on the
+fact that fetching the attribute does not trigger a synchronous RPC. For now
+we add placeholder hostnames to the hostlist when the instance size is greater
+than the bootstrap size and leave the value constant so it can be cached.
+
+The ``broker.mapping`` broker attribute will only include the mapping of the
+initial set of broker ranks.
+
+Goals
+*****
+
+- New brokers MUST run as the instance owner.
+
+- New brokers MUST use a secure mechanism to connect to the Flux instance.
+
+- Select broker attributes set on the original command line SHOULD be shared
+  with the new broker.
+
+- The instance configuration object SHOULD be shared with the new broker.
+
+- The design SHOULD NOT impact the existing code base more than necessary.
+
+The following are purposefully left undefined by this specification:
+
+- How the new broker is launched.
+
+- How the new broker's resources are added to the instance resource inventory.
+
+- How the scheduler is notified of resource inventory changes.
+
+
+Implementation
+**************
+
+A Flux broker wishing to join a Flux instance MUST obtain a valid remote
+URI for any online rank. With this URI, the broker SHALL connect to the
+instance and make two RPCs in succession:
+
+getinfo
+=======
+
+.. object:: overlay.flub-getinfo request
+
+  The request SHALL be sent to rank 0.
+
+  Its payload SHALL contain an empty object.
+
+.. object:: overlay.flub-getinfo response
+
+  The response SHALL consist of a JSON object with the following keys:
+
+  .. object:: rank
+
+    (*integer*, REQUIRED) The rank that is assigned to the new broker.
+
+  .. object:: size
+
+    (*integer*, REQUIRED) The instance size.
+
+  .. object:: attrs
+
+    (*object*, REQUIRED) An object containing key-value pairs representing
+    select broker attributes (see below). All values SHALL have a string type.
+
+  .. object:: config
+
+    (*object*, REQUIRED) The entire configuration object.
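
Reviewer note: a joining broker would presumably want to check the getinfo response against the keys listed above before using it. The sketch below uses only the Python standard library. The function name `parse_getinfo_response` is hypothetical, and the `0 <= rank < size` check is an added sanity check based on ranks being numbered from zero, not something this RFC states.

```python
import json


def parse_getinfo_response(payload: str) -> dict:
    """Decode a flub-getinfo response and check the REQUIRED keys and types."""
    info = json.loads(payload)
    if not isinstance(info, dict):
        raise ValueError("response must be a JSON object")
    required = (("rank", int), ("size", int), ("attrs", dict), ("config", dict))
    for key, typ in required:
        if key not in info:
            raise ValueError(f"missing required key: {key}")
        value = info[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, typ):
            raise ValueError(f"{key} must be of type {typ.__name__}")
    # The RFC requires every attrs value to be a string.
    if not all(isinstance(v, str) for v in info["attrs"].values()):
        raise ValueError("all attrs values must be strings")
    # Assumed sanity check: the assigned rank lies within the instance size.
    if not 0 <= info["rank"] < info["size"]:
        raise ValueError("rank must be in the range [0, size)")
    return info
```

A response whose `attrs` contains a non-string value, such as `"instance-level": 1`, is rejected with `ValueError`.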
+
+
+key exchange
+============
+
+.. object:: overlay.flub-kex request
+
+  The request SHALL be sent to the overlay parent rank. The parent rank
+  SHALL be calculated using information received in the previous RPC.
+
+  The request SHALL consist of a JSON object with the following keys:
+
+  .. object:: name
+
+    (*string*, REQUIRED) The new broker CURVE certificate name.
+
+  .. object:: pubkey
+
+    (*string*, REQUIRED) The new broker CURVE certificate public key.
+
+.. object:: overlay.flub-kex response
+
+  The response SHALL consist of a JSON object with the following keys:
+
+  .. object:: name
+
+    (*string*, REQUIRED) The parent broker CURVE certificate name.
+
+  .. object:: pubkey
+
+    (*string*, REQUIRED) The parent broker CURVE certificate public key.
+
+  .. object:: uri
+
+    (*string*, REQUIRED) The parent broker ZeroMQ overlay endpoint.
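
Reviewer note: the key exchange step can be sketched as three small helpers. The RFC does not say how the topology is conveyed or what shape it takes, so the k-ary tree rooted at rank 0 and the fanout parameter `k` below are assumptions made purely for illustration. All function names are hypothetical.

```python
import json


def kary_parent(rank: int, k: int) -> int:
    """Parent of `rank` in a k-ary tree rooted at rank 0 (assumed topology)."""
    if rank <= 0:
        raise ValueError("rank 0 has no parent")
    if k < 1:
        raise ValueError("fanout must be at least 1")
    return (rank - 1) // k


def build_kex_request(name: str, pubkey: str) -> str:
    """Encode the flub-kex request payload with the new broker's certificate."""
    return json.dumps({"name": name, "pubkey": pubkey})


def parse_kex_response(payload: str) -> dict:
    """Decode a flub-kex response and check the three REQUIRED string keys."""
    resp = json.loads(payload)
    if not isinstance(resp, dict):
        raise ValueError("response must be a JSON object")
    for key in ("name", "pubkey", "uri"):
        if not isinstance(resp.get(key), str):
            raise ValueError(f"{key} must be present and a string")
    return resp
```

Under the assumed binary tree (`k=2`), a broker assigned rank 3 would address its request to rank `kary_parent(3, 2)`, which is rank 1.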
+
+Example
+=======
+
+getinfo request
+---------------
+
+.. code:: json
+
+    {}
+
+getinfo response
+----------------
+
+.. code:: json
+
+    {
+      "rank": 3,
+      "size": 16,
+      "attrs": {
+        "hostlist": "test[0-2],extra[3-15]",
+        "instance-level": "1"
+      },
+      "config": {}
+    }
+
+kex request
+-----------
+
+.. code:: json
+
+    {
+      "name": "test100",
+      "pubkey": "5fH%Tp1DJOO=HMWIx)V4@z%v]AWCoP(qj}Ybvoq1:"
+    }
+
+
+kex response
+------------
+
+.. code:: json
+
+    {
+      "name": "test1",
+      "pubkey": "dFYq0s2}JTE+xGf/UcC$).c!<A00le4)<pMok2t:",
+      "uri": "tcp://[::ffff:10.0.2.13]:34061"
+    }

spell.en.pws

Lines changed: 3 additions & 0 deletions
@@ -491,3 +491,6 @@ bitmasks
 DoS
 lookups
 chu
+priori
+kex
+FLUx
