Commit fc33ec1

Merge pull request #308 from garlick/rfc28_v1
rfc28: add resource acquisition RFC
2 parents 67bfc3c + 8ba6fde commit fc33ec1

4 files changed

+210
-0
lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ Table of Contents
 - [25/Job Specification Version 1](spec_25.rst)
 - [26/Job Dependency Specification](spec_26.rst)
 - [27/Flux Resource Allocation Protocol Version 1](spec_27.rst)
+- [28/Flux Resource Acquisition Protocol Version 1](spec_28.rst)
 - [29/Hostlist Format](spec_29.rst)
 - [30/Job Urgency](spec_30.rst)

index.rst

Lines changed: 8 additions & 0 deletions
@@ -198,6 +198,13 @@ for execution.
 This specification describes Version 1 of the Flux Resource Allocation
 Protocol implemented by the job manager and a compliant Flux scheduler.

+:doc:`28/Flux Resource Acquisition Protocol Version 1 <spec_28>`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This specification describes the Flux service that schedulers use to
+acquire exclusive access to resources and monitor their ongoing
+availability.
+
 :doc:`29/Hostlist Format <spec_29>`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -239,5 +246,6 @@ This specification describes the Flux job urgency parameter.
 spec_25
 spec_26
 spec_27
+spec_28
 spec_29
 spec_30

spec_28.rst

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
+.. github display
+   GitHub is NOT the preferred viewer for this file. Please visit
+   https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_28.html
+
+28/Flux Resource Acquisition Protocol Version 1
+===============================================
+
+This specification describes the Flux service that schedulers use to
+acquire exclusive access to resources and monitor their ongoing
+availability.
+
+- Name: github.com/flux-framework/rfc/spec_28.rst
+
+- Editor: Jim Garlick <[email protected]>
+
+- State: raw
+
+
+Language
+--------
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to
+be interpreted as described in `RFC 2119 <http://tools.ietf.org/html/rfc2119>`__.
+
+
+Related Standards
+-----------------
+
+- :doc:`20/Resource Set Specification Version 1 <spec_20>`
+
+- :doc:`22/Idset String Representation <spec_22>`
+
+- :doc:`27/Flux Resource Allocation Protocol Version 1 <spec_27>`
+
+
+Background
+----------
+
+A Flux instance manages a set of resources. This resource set may be obtained
+from a configuration file, dynamically discovered, or assigned by the enclosing
+instance. Resources may be excluded from scheduling by configuration, made
+unavailable temporarily by administrative control, or fail unexpectedly. The
+resource acquisition protocol allows the scheduler to track the set of
+resources available for scheduling and monitor ongoing availability, without
+dealing directly with these details, which are managed by the flux-core
+*resource* module.
+
+Version 1 of this protocol maps chunks of resources to integer *execution
+targets*, and reports availability at the target level. All resources are
+mapped to targets, and all the resources associated with a given target are
+either up or down as an atomic unit. Execution targets map directly to
+the *rank* idset under *R_lite* in the RFC 20 resource object *execution*
+section.
+
+A streaming ``resource.acquire`` RPC is offered by the flux-core resource
+module to the scheduler. The responses to this RPC define the resource
+set available for scheduling, and mark targets *up* or *down* as
+availability changes.
+
+Version 1 of this protocol supports a static resource set per Flux instance.
+Resource *grow* and *shrink* are to be handled by a future protocol revision.
+
+
+Design Criteria
+---------------
+
+- Provide resource discovery service to scheduler implementations.
+
+- Allow the scheduler to determine satisfiability of resource requests
+  independent of resource availability.
+
+- Support monitoring of available execution targets.
+
+- Support administrative drain of execution targets.
+
+- Support administrative exclusion of execution targets.
+
+
+Implementation
+--------------
+
+The scheduler SHALL send a ``resource.acquire`` streaming RPC request at
+initialization to obtain resources to be used for scheduling and monitor
+changes in status.
+
+
+Acquire Request
+^^^^^^^^^^^^^^^
+
+The ``resource.acquire`` request has no payload.
+
+
+Initial Acquire Response
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The initial ``resource.acquire`` response SHALL include the following keys:
+
+resources
+  (object) RFC 20 (R version 1) resource object that contains the full resource
+  inventory, less execution targets excluded by configuration. The scheduler
+  MAY use this set to determine the general satisfiability of job requests.
+
+up
+  (string) RFC 22 idset of execution targets in ``resources`` that are
+  initially available. The scheduler SHALL only allocate the resources
+  associated with an execution target to jobs if the target is up.
+
+Example:
+
+.. code:: json
+
+   {
+     "resources": {
+       "version": 1,
+       "execution": {
+         "R_lite": [
+           {
+             "rank": "0-5",
+             "children": {
+               "core": "0-5",
+               "gpu": "0"
+             }
+           }
+         ],
+         "starttime": 0,
+         "expiration": 0,
+         "nodelist": [
+           "host[0-5]"
+         ]
+       }
+     },
+     "up": "0-2"
+   }
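A scheduler's handling of the initial response might be sketched as follows. This is a minimal illustration, not flux-core code: ``decode_idset``, ``initial_state``, ``satisfiable``, and ``allocatable_now`` are hypothetical helpers, the idset decoder only handles comma-separated ids and hyphenated ranges rather than the full RFC 22 syntax, and requests are simplified to a count of whole execution targets.

```python
import json


def decode_idset(s):
    """Decode a simple idset such as "0-2,5" into a set of ints.

    Simplified assumption: comma-separated ids and hyphenated ranges only.
    """
    ids = set()
    for part in s.split(","):
        part = part.strip()
        if not part:
            continue
        lo, _, hi = part.partition("-")
        ids.update(range(int(lo), int(hi or lo) + 1))
    return ids


def initial_state(response):
    """Return (all_targets, up_targets) from an initial acquire response."""
    all_targets = set()
    # Execution targets are the "rank" idsets under R_lite.
    for entry in response["resources"]["execution"]["R_lite"]:
        all_targets |= decode_idset(entry["rank"])
    up_targets = decode_idset(response["up"])
    return all_targets, up_targets


def satisfiable(ntargets, all_targets):
    """Could the request ever run, regardless of current availability?"""
    return ntargets <= len(all_targets)


def allocatable_now(ntargets, up_targets):
    """Only targets that are currently up may be allocated."""
    return ntargets <= len(up_targets)


# Hypothetical payload shaped like the example in the specification.
sample = json.loads("""
{
  "resources": {
    "version": 1,
    "execution": {
      "R_lite": [{"rank": "0-5", "children": {"core": "0-5", "gpu": "0"}}]
    }
  },
  "up": "0-2"
}
""")

all_targets, up_targets = initial_state(sample)
```

With this payload, a request for four targets is satisfiable against the full inventory but cannot be allocated until more targets come up, which is the distinction the ``resources`` and ``up`` keys are meant to support.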
+
+
+Additional Acquire Responses
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Subsequent ``resource.acquire`` responses SHALL include one or more
+of the following OPTIONAL keys:
+
+up
+  (string) RFC 22 idset of execution targets that should be marked available
+  for scheduling. The idset only contains targets that are transitioning,
+  not the full set of available targets.
+
+down
+  (string) RFC 22 idset of execution targets that should be marked unavailable
+  for scheduling. The idset only contains targets that are transitioning,
+  not the full set of unavailable targets.
+
+
+Example:
+
+.. code:: json
+
+   {
+     "up": "3-6",
+     "down": "2"
+   }
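Because subsequent responses carry only the targets that are transitioning, the scheduler has to keep its own running set of up targets. A minimal sketch of that bookkeeping, under the same assumptions as any illustration here: ``decode_idset`` and ``apply_update`` are hypothetical names, and the decoder handles only comma-separated ids and hyphenated ranges, not the full RFC 22 syntax.

```python
def decode_idset(s):
    """Decode a simple idset such as "0-2,5" into a set of ints.

    Simplified assumption: comma-separated ids and hyphenated ranges only.
    """
    ids = set()
    for part in s.split(","):
        part = part.strip()
        if not part:
            continue
        lo, _, hi = part.partition("-")
        ids.update(range(int(lo), int(hi or lo) + 1))
    return ids


def apply_update(up_targets, response):
    """Return the new set of up targets after one subsequent response.

    Both keys are optional; a missing key means no transition of that kind.
    The input set is left unmodified.
    """
    new_up = set(up_targets)
    new_up |= decode_idset(response.get("up", ""))
    new_up -= decode_idset(response.get("down", ""))
    return new_up


# Hypothetical sequence: targets 0-2 start up, then 3-5 come up and 2 goes down.
before = {0, 1, 2}
after = apply_update(before, {"up": "3-5", "down": "2"})
```

The sketch applies ``up`` before ``down``; the specification does not say how a target listed in both keys of one response should be treated, so that ordering is an arbitrary choice here.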
+
+If down resources are assigned to a job, the scheduler SHALL NOT raise an
+exception on the job. The execution system takes the active role in handling
+failures in this case. Eventually the scheduler will receive a ``sched.free``
+request for the offline resources.
+
+.. note::
+   *down* encompasses both crashed and drained execution targets.
+   The scheduler handles both cases the same, so they are not differentiated
+   in the protocol.
+
+Error Response
+^^^^^^^^^^^^^^
+
+If an error response is returned to ``resource.acquire``, the scheduler
+should log the error and exit the reactor, as failure indicates either a
+catastrophic error, a failure to acquire any resources, or a failure to
+conform to this protocol.
+
+
+Disconnect Request
+^^^^^^^^^^^^^^^^^^
+
+If the scheduler is unloaded, a disconnect request is automatically sent to
+the flux-core resource module. This cancels the ``resource.acquire`` request
+and makes resources available for re-acquisition.
+
+Running jobs are unaffected.
+
+.. note::
+   This behavior on disconnect is intended to support reloading the
+   scheduler on a live system without impacting the running workload.
+
+Since resources may remain allocated to jobs after a disconnect, it is
+presumed that re-acquisition of resources will be accompanied by a
+``job-manager.hello`` request, as described in RFC 27, to rediscover
+these allocations.

spell.en.pws

Lines changed: 3 additions & 0 deletions
@@ -450,3 +450,6 @@ Hostlists
 afterany
 afterok
 afternotok
+login
+satisfiability
+satisfiable
