|
| 1 | +.. github display |
| 2 | + GitHub is NOT the preferred viewer for this file. Please visit |
| 3 | + https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_44.html |
| 4 | +
|
| 5 | +44/FLUx Bootstrap Protocol |
| 6 | +########################## |
| 7 | + |
| 8 | +The FLUx Bootstrap (FLUB) protocol enables a Flux broker to join |
| 9 | +a running Flux instance. |
| 10 | + |
| 11 | +.. list-table:: |
| 12 | + :widths: 25 75 |
| 13 | + |
| 14 | + * - **Name** |
| 15 | + - github.com/flux-framework/rfc/spec_44.rst |
| 16 | + * - **Editor** |
| 17 | + |
| 18 | + * - **State** |
| 19 | + - raw |
| 20 | + |
| 21 | +Language |
| 22 | +******** |
| 23 | + |
| 24 | +.. include:: common/language.rst |
| 25 | + |
| 26 | +Related Standards |
| 27 | +***************** |
| 28 | + |
| 29 | +- :doc:`spec_3` |
| 30 | +- :doc:`spec_6` |
| 31 | +- :doc:`spec_13` |
| 32 | + |
| 33 | +Background |
| 34 | +********** |
| 35 | + |
| 36 | +Flux brokers use a bootstrap mechanism to obtain information needed to |
| 37 | +join a Flux instance, minimally: |
| 38 | + |
| 39 | +- The instance size |
| 40 | + |
| 41 | +- The overlay network topology |
| 42 | + |
| 43 | +- The new broker's rank within the instance |
| 44 | + |
| 45 | +- The overlay network address of the new broker's parent, whose rank |
| 46 | + it calculates from topology |
| 47 | + |
| 48 | +- The public CURVE key of overlay network peers |
| 49 | + |
| 50 | +After bootstrap, the Flux broker joins the overlay network and begins |
| 51 | +running Flux services. |
| 52 | + |
| 53 | +Flux brokers typically bootstrap either from static configuration files |
| 54 | +or from a PMI service, described in :doc:`spec_13`. |
| 55 | + |
| 56 | +A system instance bootstraps from configuration files that are replicated |
| 57 | +on disk across a cluster. Flux is expected to begin operations with only |
| 58 | +the rank 0 broker online. Other brokers may be started at any point in |
| 59 | +time depending on system administration practices. Some fraction are |
| 60 | +anticipated to remain down due to problems. A broker may rejoin a system |
| 61 | +instance after a node crash. All bootstrap information must be known in |
| 62 | +advance to be represented in the configuration files. |
| 63 | + |
| 64 | +Other Flux instances, for example a Flux-launched batch job or Slurm-launched |
| 65 | +Flux instance, bootstrap from a PMI service. Brokers exchange ephemeral |
| 66 | +network addresses and CURVE public keys via the PMI protocol. All broker |
| 67 | +ranks must be online to participate in the exchange. After that, a |
| 68 | +PMI-bootstrapped instance may be set up to tolerate the loss of non-critical |
| 69 | +brokers, but since the PMI exchange is over, there is no mechanism for lost |
| 70 | +nodes to rejoin the instance. |
| 71 | + |
| 72 | +The following use cases arise that cannot be easily handled by configuration |
| 73 | +files or PMI: |
| 74 | + |
| 75 | +#. Grow a running Flux instance by adding nodes whose identities are |
| 76 | + not known in advance. |
| 77 | + |
| 78 | +#. Replace a crashed node in a PMI-bootstrapped Flux instance. |
| 79 | + |
| 80 | +#. Start Flux in an environment such as the cloud where PMI is |
| 81 | + unavailable and bootstrap information is not known a priori. |
| 82 | + |
| 83 | +FLUB is a bootstrap mechanism that solves those problems. It is intended |
| 84 | +to be used after part of the instance has already bootstrapped with one |
| 85 | +of the other methods. |
| 86 | + |
| 87 | +Other Considerations |
| 88 | +==================== |
| 89 | + |
| 90 | +Unlike the brokers of a Flux system instance or an instance launched in |
| 91 | +parallel where the broker command lines are all the same, a broker added |
| 92 | +later may have a different command line. Therefore, select broker |
| 93 | +attributes that may have been set on the original command line must |
| 94 | +be shared with the new broker. |
| 95 | + |
| 96 | +Similarly, a new broker might not have access to the instance configuration |
| 97 | +files, so the instance's configuration object must be shared. |
| 98 | + |
| 99 | +The ``size`` broker attribute is a constant value per RFC 3, and this constancy |
| 100 | +is a deep assumption in the code base. However, it is already a de facto |
| 101 | +maximum rather than an absolute size, since not all brokers are required |
| 102 | +to be online for the duration of the instance. A change for FLUB is that |
| 103 | +the ``size`` now may be set to a value that exceeds the original bootstrap |
| 104 | +size to allow room for expansion. The additional ranks are eligible for |
| 105 | +FLUB bootstrap. |
| 106 | + |
| 107 | +To allow crashed nodes to be replaced with new ones, ranks that go offline |
| 108 | +and were bootstrapped with PMI or FLUB are also made eligible for FLUB |
| 109 | +replacement. Ranks that were bootstrapped from configuration are not |
| 110 | +eligible for replacement. |
| 111 | + |
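The eligibility rules above can be sketched as a small predicate. This is only an illustration under stated assumptions: the ``RankState`` record and the ``flub_eligible`` function are hypothetical names that do not appear in this specification, and a real broker may track this state quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RankState:
    """Hypothetical per-rank bookkeeping; not defined by this specification."""

    # How the rank was bootstrapped: "config", "pmi", "flub",
    # or None for an expansion rank that has never bootstrapped.
    method: Optional[str]
    online: bool


def flub_eligible(state: RankState) -> bool:
    """Return True if a rank may be handed out to a FLUB-bootstrapping broker."""
    # Ranks beyond the original bootstrap size have never been used.
    if state.method is None:
        return True
    # Ranks bootstrapped from configuration are not eligible for replacement.
    if state.method == "config":
        return False
    # PMI- or FLUB-bootstrapped ranks become eligible once they go offline.
    return not state.online
```

For example, ``flub_eligible(RankState(method="pmi", online=False))`` is true, while a rank bootstrapped from configuration is rejected whether or not it is online.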
| 112 | +Caveats |
| 113 | +======= |
| 114 | + |
| 115 | +The following areas are problematic and may require further design: |
| 116 | + |
| 117 | +The ``hostlist`` broker attribute is currently a constant value set following |
| 118 | +the initial bootstrap, which enables it to be cached after first access. |
| 119 | +Some of the code that uses it (such as log message generation) relies on the |
| 120 | +fact that fetching the attribute does not trigger a synchronous RPC. For now |
| 121 | +we add placeholder hostnames to the hostlist when the instance size is greater |
| 122 | +than the bootstrap size and leave the value constant so it can be cached. |
| 123 | + |
| 124 | +The ``broker.mapping`` broker attribute will only include the mapping of the |
| 125 | +initial set of broker ranks. |
| 126 | + |
| 127 | +Goals |
| 128 | +***** |
| 129 | + |
| 130 | +- New brokers MUST run as the instance owner. |
| 131 | + |
| 132 | +- New brokers MUST use a secure mechanism to connect to the Flux instance. |
| 133 | + |
| 134 | +- Select broker attributes set on the original command line SHOULD be shared |
| 135 | + with the new broker. |
| 136 | + |
| 137 | +- The instance configuration object SHOULD be shared with the new broker. |
| 138 | + |
| 139 | +- The design SHOULD NOT impact the existing code base more than necessary. |
| 140 | + |
| 141 | +The following are purposefully left undefined by this specification: |
| 142 | + |
| 143 | +- How the new broker is launched. |
| 144 | + |
| 145 | +- How the new broker's resources are added to the instance resource inventory. |
| 146 | + |
| 147 | +- How the scheduler is notified of resource inventory changes. |
| 148 | + |
| 149 | + |
| 150 | +Implementation |
| 151 | +************** |
| 152 | + |
| 153 | +A Flux broker wishing to join a Flux instance MUST obtain a valid remote |
| 154 | +URI for any online rank. With this URI, the broker SHALL connect to the |
| 155 | +instance and make two RPCs in succession: |
| 156 | + |
| 157 | +getinfo |
| 158 | +======= |
| 159 | + |
| 160 | +.. object:: overlay.flub-getinfo request |
| 161 | + |
| 162 | + The request SHALL be sent to rank 0. |
| 163 | + |
| 164 | + Its payload SHALL contain an empty object. |
| 165 | + |
| 166 | +.. object:: overlay.flub-getinfo response |
| 167 | + |
| 168 | + The response SHALL consist of a JSON object with the following keys: |
| 169 | + |
| 170 | + .. object:: rank |
| 171 | + |
| 172 | + (*integer*, REQUIRED) The rank that is assigned to the new broker. |
| 173 | + |
| 174 | + .. object:: size |
| 175 | + |
| 176 | + (*integer*, REQUIRED) The instance size. |
| 177 | + |
| 178 | + .. object:: attrs |
| 179 | + |
| 180 | + (*object*, REQUIRED) An object containing key-value pairs representing |
| 181 | + select broker attributes (see below). All values SHALL have a string type. |
| 182 | + |
| 183 | + .. object:: config |
| 184 | + |
| 185 | + (*object*, REQUIRED) The entire configuration object. |
| 186 | + |
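As a rough illustration of the response layout described above, the following sketch checks a decoded ``overlay.flub-getinfo`` response payload against the required keys and types. The function name ``parse_getinfo_response`` is hypothetical, and the sketch deliberately says nothing about how the RPC itself is sent or received.

```python
import json


def parse_getinfo_response(payload: str) -> dict:
    """Decode a getinfo response payload and check the REQUIRED keys."""
    info = json.loads(payload)
    if not isinstance(info, dict):
        raise ValueError("response payload must be a JSON object")

    required = {"rank": int, "size": int, "attrs": dict, "config": dict}
    for key, expected_type in required.items():
        if key not in info:
            raise ValueError(f"missing required key: {key}")
        value = info[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"key {key!r} has the wrong type")

    # All attribute values SHALL have a string type.
    for name, value in info["attrs"].items():
        if not isinstance(value, str):
            raise ValueError(f"attribute {name!r} is not a string")
    return info
```

A caller would typically read ``rank`` and ``size`` from the returned dictionary before working out which rank to contact for the key exchange.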
| 187 | + |
| 188 | +key exchange |
| 189 | +============ |
| 190 | + |
| 191 | +.. object:: overlay.flub-kex request |
| 192 | + |
| 193 | + The request SHALL be sent to the overlay parent rank. The parent rank |
| 194 | + SHALL be calculated using information received in the previous RPC. |
| 195 | + |
| 196 | + The request SHALL consist of a JSON object with the following keys: |
| 197 | + |
| 198 | + .. object:: name |
| 199 | + |
| 200 | + (*string*, REQUIRED) The new broker CURVE certificate name. |
| 201 | + |
| 202 | + .. object:: pubkey |
| 203 | + |
| 204 | + (*string*, REQUIRED) The new broker CURVE certificate public key. |
| 205 | + |
| 206 | +.. object:: overlay.flub-kex response |
| 207 | + |
| 208 | + The response SHALL consist of a JSON object with the following keys: |
| 209 | + |
| 210 | + .. object:: name |
| 211 | + |
| 212 | + (*string*, REQUIRED) The parent broker CURVE certificate name. |
| 213 | + |
| 214 | + .. object:: pubkey |
| 215 | + |
| 216 | + (*string*, REQUIRED) The parent broker CURVE certificate public key. |
| 217 | + |
| 218 | + .. object:: uri |
| 219 | + |
| 220 | + (*string*, REQUIRED) The parent broker ZeroMQ overlay endpoint. |
| 221 | + |
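The key exchange step can be sketched as follows. The parent calculation here assumes a k-ary tree topology rooted at rank 0 with a fanout ``k`` already known to the new broker; this specification does not fix the topology, so that choice is an assumption made purely for illustration. The helper names ``kary_parent``, ``make_kex_request`` and ``check_kex_response`` are hypothetical.

```python
def kary_parent(rank: int, k: int) -> int:
    """Parent of `rank` in a k-ary tree rooted at rank 0 (assumed topology)."""
    if k < 1:
        raise ValueError("fanout must be at least 1")
    if rank <= 0:
        raise ValueError("rank 0 is the root and has no parent")
    return (rank - 1) // k


def make_kex_request(name: str, pubkey: str) -> dict:
    """Build the request payload carrying the new broker's certificate info."""
    return {"name": name, "pubkey": pubkey}


def check_kex_response(response: dict) -> dict:
    """Check that the parent's reply carries the three REQUIRED string keys."""
    for key in ("name", "pubkey", "uri"):
        if not isinstance(response.get(key), str):
            raise ValueError(f"missing or non-string key: {key}")
    return response
```

Under this assumption, with a fanout of 2 a broker assigned rank 3 would send its request to rank 1, since ``kary_parent(3, 2)`` evaluates to ``(3 - 1) // 2``.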
| 222 | +Example |
| 223 | +======= |
| 224 | + |
| 225 | +getinfo request |
| 226 | +--------------- |
| 227 | + |
| 228 | +.. code:: json |
| 229 | +
|
| 230 | + {} |
| 231 | +
|
| 232 | +getinfo response |
| 233 | +---------------- |
| 234 | + |
| 235 | +.. code:: json |
| 236 | +
|
| 237 | + { |
| 238 | + "rank": 3, |
| 239 | + "size": 16, |
| 240 | + "attrs": { |
| 241 | + "hostlist": "test[0-2],extra[3-15]", |
| 242 | + "instance-level": "1" |
| 243 | + }, |
| 244 | + "config": {} |
| 245 | + } |
| 246 | +
|
| 247 | +kex request |
| 248 | +----------- |
| 249 | + |
| 250 | +.. code:: json |
| 251 | +
|
| 252 | + { |
| 253 | + "name": "test100", |
| 254 | + "pubkey": "5fH%Tp1DJOO=HMWIx)V4@z%v]AWCoP(qj}Ybvoq1:" |
| 255 | + } |
| 256 | +
|
| 257 | +
|
| 258 | +kex response |
| 259 | +------------ |
| 260 | + |
| 261 | +.. code:: json |
| 262 | +
|
| 263 | + { |
| 264 | + "name": "test1", |
| 265 | + "pubkey": "dFYq0s2}JTE+xGf/UcC$).c!<A00le4)<pMok2t:", |
| 266 | + "uri": "tcp://[::ffff:10.0.2.13]:34061" |
| 267 | + } |