|
| 1 | +.. github display |
| 2 | + GitHub is NOT the preferred viewer for this file. Please visit |
| 3 | + https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_28.html |
| 4 | +
|
| 5 | +28/Flux Resource Acquisition Protocol Version 1 |
| 6 | +=============================================== |
| 7 | + |
| 8 | +This specification describes the Flux service that schedulers use to |
| 9 | +acquire exclusive access to resources and monitor their ongoing |
| 10 | +availability. |
| 11 | + |
| 12 | +- Name: github.com/flux-framework/rfc/spec_28.rst |
| 13 | + |
| 14 | +- Editor: Jim Garlick < [email protected]> |
| 15 | + |
| 16 | +- State: raw |
| 17 | + |
| 18 | + |
| 19 | +Language |
| 20 | +-------- |
| 21 | + |
| 22 | +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", |
| 23 | +"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to |
| 24 | +be interpreted as described in `RFC 2119 <http://tools.ietf.org/html/rfc2119>`__. |
| 25 | + |
| 26 | + |
| 27 | +Related Standards |
| 28 | +----------------- |
| 29 | + |
| 30 | +- :doc:`20/Resource Set Specification Version 1 <spec_20>` |
| 31 | + |
| 32 | +- :doc:`22/Idset String Representation <spec_22>` |
| 33 | + |
| 34 | +- :doc:`27/Flux Resource Allocation Protocol Version 1 <spec_27>` |
| 35 | + |
| 36 | + |
| 37 | +Background |
| 38 | +---------- |
| 39 | + |
| 40 | +A Flux instance manages a set of resources. This resource set may be obtained |
| 41 | +from a configuration file, dynamically discovered, or assigned by the enclosing |
| 42 | +instance. Resources may be excluded from scheduling by configuration, made |
| 43 | +unavailable temporarily by administrative control, or fail unexpectedly. The |
| 44 | +resource acquisition protocol allows the scheduler to track the set of |
| 45 | +resources available for scheduling and monitor ongoing availability, without |
| 46 | +dealing directly with these details, which are managed by the flux-core |
| 47 | +*resource* module. |
| 48 | + |
| 49 | +Version 1 of this protocol maps chunks of resources to integer *execution |
| 50 | +targets*, and reports availability at the target level. All resources are |
| 51 | +mapped to targets, and all the resources associated with a given target are |
| 52 | +either up or down as an atomic unit. Execution targets map directly to |
| 53 | +the *rank* idset under *R_lite* in the RFC 20 resource object *execution* |
| 54 | +section. |
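As a rough illustration of the target mapping described above, the following Python sketch expands an ``R_lite`` array into a per-target table. The helper names ``parse_idset`` and ``targets_from_r_lite`` are hypothetical and not part of flux-core, and the idset parser handles only the simple comma-and-range form (for example ``0-2,5``), which is an assumption rather than a full RFC 22 decoder.

```python
def parse_idset(s):
    """Decode a simplified idset string such as "0-2,5" into a set of ints.

    Assumption: only plain ids and "lo-hi" ranges separated by commas,
    optionally wrapped in square brackets, are handled here.
    """
    s = s.strip()
    if s.startswith("[") and s.endswith("]"):
        s = s[1:-1]
    ids = set()
    for part in s.split(","):
        if not part:
            continue  # tolerate an empty idset
        lo, _, hi = part.partition("-")
        ids.update(range(int(lo), int(hi or lo) + 1))
    return ids


def targets_from_r_lite(r_lite):
    """Map each execution target (rank) to the children it provides."""
    targets = {}
    for entry in r_lite:
        for rank in parse_idset(entry["rank"]):
            targets[rank] = dict(entry["children"])
    return targets


# Hypothetical R_lite content: six targets with identical children.
r_lite = [{"rank": "0-5", "children": {"core": "0-5", "gpu": "0"}}]
targets = targets_from_r_lite(r_lite)
print(sorted(targets))
print(targets[3])
```

Keying the table by target matches the protocol's granularity: availability is reported per target, so all resources under one key go up or down together.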
| 55 | + |
| 56 | +A streaming ``resource.acquire`` RPC is offered by the flux-core resource |
| 57 | +module to the scheduler. The responses to this RPC define the resource |
| 58 | +set available for scheduling, and mark targets *up* or *down* as |
| 59 | +availability changes. |
| 60 | + |
| 61 | +Version 1 of this protocol supports a static resource set per Flux instance. |
| 62 | +Resource *grow* and *shrink* are to be handled by a future protocol revision. |
| 63 | + |
| 64 | + |
| 65 | +Design Criteria |
| 66 | +--------------- |
| 67 | + |
| 68 | +- Provide resource discovery service to scheduler implementations. |
| 69 | + |
| 70 | +- Allow the scheduler to determine satisfiability of resource requests |
| 71 | + independent of resource availability. |
| 72 | + |
| 73 | +- Support monitoring of available execution targets. |
| 74 | + |
| 75 | +- Support administrative drain of execution targets. |
| 76 | + |
| 77 | +- Support administrative exclusion of execution targets. |
| 78 | + |
| 79 | + |
| 80 | +Implementation |
| 81 | +-------------- |
| 82 | + |
| 83 | +The scheduler SHALL send a ``resource.acquire`` streaming RPC request at |
| 84 | +initialization to obtain resources to be used for scheduling and monitor |
| 85 | +changes in status. |
| 86 | + |
| 87 | + |
| 88 | +Acquire Request |
| 89 | +^^^^^^^^^^^^^^^ |
| 90 | + |
| 91 | +The ``resource.acquire`` request has no payload. |
| 92 | + |
| 93 | + |
| 94 | +Initial Acquire Response |
| 95 | +^^^^^^^^^^^^^^^^^^^^^^^^ |
| 96 | + |
| 97 | +The initial ``resource.acquire`` response SHALL include the following keys: |
| 98 | + |
| 99 | +resources |
| 100 | + (object) RFC 20 (R version 1) resource object that contains the full resource |
| 101 | + inventory, less execution targets excluded by configuration. The scheduler |
| 102 | + MAY use this set to determine the general satisfiability of job requests. |
| 103 | + |
| 104 | +up |
| 105 | + (string) RFC 22 idset of execution targets in ``resources`` that are |
| 106 | + initially available. The scheduler SHALL only allocate the resources |
| 107 | + associated with an execution target to jobs if the target is up. |
| 108 | + |
| 109 | +Example: |
| 110 | + |
| 111 | +.. code:: json |
| 112 | +
|
| 113 | + { |
| 114 | + "resources": { |
| 115 | + "version": 1, |
| 116 | + "execution": { |
| 117 | + "R_lite": [ |
| 118 | + { |
| 119 | + "rank": "0-5", |
| 120 | + "children": { |
| 121 | + "core": "0-5", |
| 122 | + "gpu": "0" |
| 123 | + } |
| 124 | + } |
| 125 | + ], |
| 126 | + "starttime": 0, |
| 127 | + "expiration": 0, |
| 128 | + "nodelist": [ |
| 129 | + "host[0-5]" |
| 130 | + ] |
| 131 | + } |
| 132 | + }, |
| 133 | + "up": "0-2" |
| 134 | + } |
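The sketch below shows one way a scheduler might consume an initial response like the example above, keeping the full inventory separate from the set of targets that are currently up. The function names are hypothetical, the idset parser is a simplified stand-in for a full RFC 22 decoder, and counting targets is only a placeholder for real matching logic.

```python
import json


def parse_idset(s):
    """Decode a simplified idset string such as "0-2,5" (assumed subset of RFC 22)."""
    s = s.strip()
    if s.startswith("[") and s.endswith("]"):
        s = s[1:-1]
    ids = set()
    for part in s.split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        ids.update(range(int(lo), int(hi or lo) + 1))
    return ids


def load_initial_response(payload):
    """Return (all_targets, up_targets) from an initial acquire response."""
    msg = json.loads(payload)
    all_targets = set()
    for entry in msg["resources"]["execution"]["R_lite"]:
        all_targets |= parse_idset(entry["rank"])
    return all_targets, parse_idset(msg["up"])


payload = json.dumps({
    "resources": {
        "version": 1,
        "execution": {
            "R_lite": [
                {"rank": "0-5", "children": {"core": "0-5", "gpu": "0"}}
            ],
            "starttime": 0,
            "expiration": 0,
            "nodelist": ["host[0-5]"],
        },
    },
    "up": "0-2",
})

all_targets, up = load_initial_response(payload)


def satisfiable(ntargets):
    """Could the request ever run, judged against the full inventory?"""
    return ntargets <= len(all_targets)


def placeable_now(ntargets):
    """Could the request run on targets that are up (ignoring existing allocations)?"""
    return ntargets <= len(up)


print(satisfiable(4), placeable_now(4))
```

Here a request for four targets is satisfiable in general but cannot be placed until more targets come up, which is the distinction the ``resources`` and ``up`` keys are meant to support.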
| 135 | +
|
| 136 | +
|
| 137 | +Additional Acquire Responses |
| 138 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 139 | + |
| 140 | +Subsequent ``resource.acquire`` responses SHALL include one or more |
| 141 | +of the following OPTIONAL keys: |
| 142 | + |
| 143 | +up |
| 144 | + (string) RFC 22 idset of execution targets that should be marked available |
| 145 | + for scheduling. The idset only contains targets that are transitioning, |
| 146 | + not the full set of available targets. |
| 147 | + |
| 148 | +down |
| 149 | + (string) RFC 22 idset of execution targets that should be marked unavailable |
| 150 | + for scheduling. The idset only contains targets that are transitioning, |
| 151 | + not the full set of unavailable targets. |
| 152 | + |
| 153 | + |
| 154 | +Example: |
| 155 | + |
| 156 | +.. code:: json |
| 157 | +
|
| 158 | + { |
| 159 | + "up": "3-5", |
| 160 | + "down": "2" |
| 161 | + } |
| 162 | +
|
| 163 | +If down resources are assigned to a job, the scheduler SHALL NOT raise an |
| 164 | +exception on the job. The execution system takes the active role in handling |
| 165 | +failures in this case. Eventually the scheduler will receive a ``sched.free`` |
| 166 | +request for the down resources. |
| 167 | + |
| 168 | +.. note:: |
| 169 | + *down* encompasses both crashed and drained execution targets. |
| 170 | + The scheduler handles both cases the same, so they are not differentiated |
| 171 | + in the protocol. |
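Because subsequent responses carry only the targets that are transitioning, the scheduler has to fold each one into the state it already holds. The following is a minimal sketch of that bookkeeping; ``apply_update`` is a hypothetical helper, the idset parser is a simplified stand-in for a full RFC 22 decoder, and applying ``up`` before ``down`` is an assumption, since the text does not say how a target listed in both keys should be treated.

```python
def parse_idset(s):
    """Decode a simplified idset string such as "0-2,5" (assumed subset of RFC 22)."""
    s = s.strip()
    if s.startswith("[") and s.endswith("]"):
        s = s[1:-1]
    ids = set()
    for part in s.split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        ids.update(range(int(lo), int(hi or lo) + 1))
    return ids


def apply_update(up, response):
    """Return a new up set after applying one incremental acquire response.

    Both keys are optional, so a missing key is treated as an empty idset.
    """
    new_up = set(up)
    new_up |= parse_idset(response.get("up", ""))
    new_up -= parse_idset(response.get("down", ""))
    return new_up


# Start from targets 0-2 up, then apply a hypothetical update.
up = {0, 1, 2}
up = apply_update(up, {"up": "3-5", "down": "2"})
print(sorted(up))
```

Returning a new set rather than mutating in place keeps each update easy to test and leaves the previous state intact for comparison.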
| 172 | + |
| 173 | +Error Response |
| 174 | +^^^^^^^^^^^^^^ |
| 175 | + |
| 176 | +If an error response is returned to ``resource.acquire``, the scheduler |
| 177 | +SHOULD log the error and exit the reactor, as failure indicates either a |
| 178 | +catastrophic error, a failure to acquire any resources, or a failure to |
| 179 | +conform to this protocol. |
| 180 | + |
| 181 | + |
| 182 | +Disconnect Request |
| 183 | +^^^^^^^^^^^^^^^^^^ |
| 184 | + |
| 185 | +If the scheduler is unloaded, a disconnect request is automatically sent to |
| 186 | +the flux-core resource module. This cancels the ``resource.acquire`` request |
| 187 | +and makes resources available for re-acquisition. |
| 188 | + |
| 189 | +Running jobs are unaffected. |
| 190 | + |
| 191 | +.. note:: |
| 192 | + This behavior on disconnect is intended to support reloading the |
| 193 | + scheduler on a live system without impacting the running workload. |
| 194 | + |
| 195 | + Since resources may remain allocated to jobs after a disconnect, it is |
| 196 | + presumed that re-acquisition of resources will be accompanied by a |
| 197 | + ``job-manager.hello`` request, as described in RFC 27, to rediscover |
| 198 | + these allocations. |