keps/sig-scheduling/5007-device-attach-before-pod-scheduled/README.md
Lines changed: 38 additions & 35 deletions
@@ -231,25 +231,26 @@ and make progress.
231
231
232
232
The basic idea is the following:
233
233
234
-
1. **Adding Attributes to ResourceSlice**:
235
-
- Add an attribute to `ResourceSlice` to indicate fabric devices. This key is predefined as part of the attributes.
234
+
1. **Adding BindingConditions and BindingFailureConditions to ResourceSlice**:
235
+
- Add conditions to `ResourceSlice` to indicate that the device needs some preparation before the scheduler proceeds to the `Bind` phase.
236
+
- For example, in a composable system, it is necessary to attach devices to nodes.
237
+
- A DRA driver can set any condition in `BindingConditions` or `BindingFailureConditions`, depending on the characteristics of the device it manages.
236
238
237
239
2. **Waiting for Device Attachment in PreBind**:
238
-
- For fabric devices, the scheduler waits for the device attachment to complete during the `PreBind` phase.
240
+
- The scheduler waits until all conditions in `BindingConditions` are `True`.
241
+
- For fabric devices, this means that the scheduler waits for the device attachment to complete during the `PreBind` phase.
239
242
240
243
3. **PreBind Process**:
241
244
The overall flow of the `PreBind` process is as follows:
242
245
243
246
- **Updating ResourceClaim**:
244
-
- The scheduler DRA plugin updates the `ResourceClaim` to notify the Composable DRA Controllers that device attachment is needed.
245
-
This is the same as the existing `PreBind` process.
246
-
- In addition to the existing operations, the update to the `ResourceClaim` includes setting the necessary values in the `AllocatedDeviceStatus` conditions.
247
+
- The scheduler DRA plugin copies `BindingConditions` and `BindingFailureConditions` from `ResourceSlice.Device.Basic` to `AllocatedDeviceStatus.Conditions`.
247
248
248
249
- **Monitoring and Preparation by Composable DRA Controllers**:
249
250
- Composable DRA Controllers monitor the `ResourceClaim`. If a device that requires preparation is associated with the `ResourceClaim`, they perform the necessary preparations.
250
251
- Once the preparation is complete, they set the conditions to `true`.
251
252
- Please note that, in the case of a composable system, the scheduler needs to abandon binding after the attach is complete.
252
-
Therefore, Composable DRA Controller sets the condition in BindingFailureGates to true after the attach is complete.
253
+
Therefore, the Composable DRA Controller sets the condition in `BindingFailureConditions` to `true` after the attach is complete.
253
254
254
255
- **Completion of the PreBind Phase**:
255
256
- Once all conditions are met, the `PreBind` phase is completed, and the scheduler proceeds to the next step.
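The copy step in the flow above can be sketched as follows. This is a minimal illustration, not the scheduler's actual code: the two structs are simplified stand-ins for the KEP's types, the condition names under `example.com/` are hypothetical, and initializing each copied condition to `false` is an assumption made here so that a controller can later flip it to `true`.

```go
package main

import "fmt"

// BasicDevice is a simplified stand-in for the device entry in a ResourceSlice.
// Only the two fields discussed in this KEP are modeled.
type BasicDevice struct {
	BindingConditions        []string
	BindingFailureConditions []string
}

// AllocatedDeviceStatus is a simplified stand-in for the per-device status
// stored in a ResourceClaim.
type AllocatedDeviceStatus struct {
	BindingConditions        map[string]bool
	BindingFailureConditions map[string]bool
}

// copyConditions copies the condition names declared on the device into the
// claim status. Every condition starts as false (an assumption of this sketch).
func copyConditions(dev BasicDevice) AllocatedDeviceStatus {
	status := AllocatedDeviceStatus{
		BindingConditions:        make(map[string]bool, len(dev.BindingConditions)),
		BindingFailureConditions: make(map[string]bool, len(dev.BindingFailureConditions)),
	}
	for _, name := range dev.BindingConditions {
		status.BindingConditions[name] = false
	}
	for _, name := range dev.BindingFailureConditions {
		status.BindingFailureConditions[name] = false
	}
	return status
}

func main() {
	// Hypothetical condition names, chosen only for illustration.
	dev := BasicDevice{
		BindingConditions:        []string{"example.com/attached"},
		BindingFailureConditions: []string{"example.com/attach-failed"},
	}
	status := copyConditions(dev)
	fmt.Println(len(status.BindingConditions), len(status.BindingFailureConditions))
}
```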
@@ -366,51 +367,56 @@ type BasicDevice struct {
366
367
// +optional
367
368
Attributes map[QualifiedName]DeviceAttribute
368
369
369
-
// BindingGates defines the gates for binding.
370
+
// BindingConditions defines the conditions for binding.
370
371
//
371
372
// +optional
372
-
BindingGates []string
373
+
BindingConditions []string
373
374
374
-
// BindingFailureGates defines the gates for binding failure.
375
+
// BindingFailureConditions defines the conditions for binding failure.
375
376
//
376
377
// +optional
377
-
BindingFailureGates []string
378
+
BindingFailureConditions []string
378
379
379
380
// UsageRestrictedToNode indicates if the usage of an allocation involving this device
380
381
// has to be limited to exactly the node that was chosen when allocating the claim.
381
382
//
382
383
// +optional
383
384
UsageRestrictedToNode bool
384
385
385
-
// BindingTimeout indicates the prepare timeout period (minute).
386
+
// BindingTimeoutSeconds indicates the prepare timeout period.
386
387
// If the timeout period is exceeded, the scheduler clears the allocation in the ResourceClaim and reschedules the Pod.
The `BindingGates` and `BindingFailureGates` fields within `AllocatedDeviceStatus` are used to indicate the status of the device attachment.
401
+
The `BindingConditions` and `BindingFailureConditions` fields within `AllocatedDeviceStatus.Conditions` are used to indicate the status of the device attachment.
396
402
These fields will contain a list of conditions, each representing a specific state or event related to the device.
397
403
398
404
For this feature, the following fields are added:
399
405
400
406
```go
401
-
// AllocatedDeviceStatus contains the status of an allocated device, if the
407
+
// AllocatedDeviceStatus.Conditions contains the status of an allocated device, if the
402
408
// driver chooses to report it. This may include driver-specific information.
403
-
type AllocatedDeviceStatus struct {
409
+
type AllocatedDeviceStatus.Conditions struct {
404
410
...
405
-
// BindingGates defines the gates for binding.
411
+
// BindingConditions defines the conditions for binding.
406
412
//
407
413
// +optional
408
-
BindingGates map[string]bool
414
+
BindingConditions map[string]bool
409
415
410
-
// BindingFailureGates defines the gates for binding failure.
416
+
// BindingFailureConditions defines the conditions for binding failure.
411
417
//
412
418
// +optional
413
-
BindingFailureGates map[string]bool
419
+
BindingFailureConditions map[string]bool
414
420
}
415
421
```
416
422
@@ -420,28 +426,25 @@ When `UsageRestrictedToNode: true` is set, the scheduler DRA plugin will perform
420
426
421
427
1. **Set NodeSelector**: Before the `PreBind` phase, add the `NodeName` to the `ResourceClaim`'s `NodeSelector`.
422
428
423
-
If Gates are present, the scheduler DRA plugin will perform the following steps during the `PreBind` phase:
429
+
If Conditions are present, the scheduler DRA plugin will perform the following steps during the `PreBind` phase:
424
430
425
-
2. **Copy Gates**: Copy `BindingGates` and `BindingFailureGates` from `ResourceSlice.Device.Basic` to `AllocatedDeviceStatus`.
431
+
2. **Copy Conditions**: Copy `BindingConditions` and `BindingFailureConditions` from `ResourceSlice.Device.Basic` to `AllocatedDeviceStatus`.
426
432
3. **Wait for Conditions**: Wait for the following conditions:
427
-
- Wait until all conditions in the BindingGates are `True` before proceeding to Bind.
428
-
- If any one of the conditions in the BindingFailureGates becomes `True`, clear the allocation in the `ResourceClaim` and reschedule the Pod.
429
-
- If the preparation of a device takes longer than the `BindingTimeout` period, clear the allocation in the `ResourceClaim` and reschedule the Pod.
433
+
- Wait until all conditions in the BindingConditions are `True` before proceeding to Bind.
434
+
- If any one of the conditions in the BindingFailureConditions becomes `True`, clear the allocation in the `ResourceClaim` and reschedule the Pod.
435
+
- If the preparation of a device takes longer than the `BindingTimeoutSeconds` period, clear the allocation in the `ResourceClaim` and reschedule the Pod.
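The three rules above can be read as one decision function that the plugin evaluates each time the claim status changes. The sketch below is an illustration under stated assumptions, not the scheduler's implementation: `Decision`, `evaluate`, and the condition names are hypothetical, and checking failure conditions before success conditions is an ordering chosen here, since the text does not say which takes precedence.

```go
package main

import (
	"fmt"
	"time"
)

// Decision is a hypothetical result type for one evaluation of the claim status.
type Decision int

const (
	Wait       Decision = iota // keep waiting in PreBind
	Proceed                    // all binding conditions are true: go on to Bind
	Reschedule                 // clear the allocation and reschedule the Pod
)

// evaluate applies the three rules: any true failure condition or an exceeded
// timeout leads to Reschedule, all binding conditions true leads to Proceed,
// and anything else means the scheduler keeps waiting.
func evaluate(binding, failure map[string]bool, elapsed, timeout time.Duration) Decision {
	for _, failed := range failure {
		if failed {
			return Reschedule
		}
	}
	allTrue := true
	for _, ok := range binding {
		if !ok {
			allTrue = false
			break
		}
	}
	if allTrue {
		return Proceed
	}
	if elapsed > timeout {
		return Reschedule
	}
	return Wait
}

func main() {
	binding := map[string]bool{"example.com/attached": false}
	failure := map[string]bool{"example.com/attach-failed": false}

	// Attachment still in progress and within the timeout.
	fmt.Println(evaluate(binding, failure, 10*time.Second, time.Minute) == Wait)

	// The controller reports that the device is attached.
	binding["example.com/attached"] = true
	fmt.Println(evaluate(binding, failure, 20*time.Second, time.Minute) == Proceed)
}
```

A device with no binding conditions falls through to `Proceed` immediately, which matches the statement that these steps only apply when conditions are present.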
430
436
431
-
To support these steps, for example, a DRA driver can include the following definitions in BindingGates or BindingFailureGates within a ResourceSlice:
437
+
To support these steps, for example, a DRA driver can include the following definitions in BindingConditions or BindingFailureConditions within a ResourceSlice:
432
438
433
439
```go
434
440
const (
435
-
// NeedToPreparing indicates that this device needs some preparation.
@@ -464,10 +467,10 @@ During the scheduling cycle, the DRA plugin reserves a `ResourceSlice` for the `
464
467
In the binding cycle, the reserved `ResourceSlice` is assigned during `PreBind`.
465
468
466
469
If a fabric device is selected, the scheduler waits for the device attachment during `PreBind`.
467
-
The composable controller performs the attachment operation by checking the flag of the `ResourceClaim`.
470
+
The composable controller performs the attachment operation by checking the flag of BindingConditions in the `ResourceClaim`.
468
471
If the attachment fails, the following steps are taken:
469
472
470
-
1. **Update ResourceClaim**: The composable controller updates the `AllocatedDeviceStatus` to indicate the failure of the attachment by setting a condition with `Type: kubernetes.io/attach-failed` and `Status: True`.
473
+
1. **Update ResourceClaim**: The composable controller updates the `AllocatedDeviceStatus.Conditions` to indicate the failure of the attachment by setting a condition in BindingFailureConditions to `True`.
471
474
2. **Fail the Binding Cycle**: The scheduler detects the failed attachment condition and fails the binding cycle. This prevents the pod from proceeding with an unattached device.
472
475
3. **Unbind ResourceClaim and ResourceSlice**: The scheduler DRA plugin unbinds the `ResourceClaim` and `ResourceSlice` in `Unreserve`, clearing the allocation to prevent the fabric device from being used in the `ResourceClaim`.
473
476
4. **Retry Scheduling**: In the next scheduling cycle, the scheduler attempts to bind the `ResourceClaim` again.
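The failure path in steps 1 to 3 can be sketched end to end with a toy model. Everything here is a simplification for illustration: `ResourceClaim` is reduced to two fields, the helper names `markAttachFailed`, `bindingFailed`, and `unreserve` are hypothetical, and the condition name `example.com/attach-failed` is made up.

```go
package main

import "fmt"

// ResourceClaim is a toy model of a claim: the allocated device name
// (empty means no allocation) and the copied failure conditions.
type ResourceClaim struct {
	AllocatedDevice          string
	BindingFailureConditions map[string]bool
}

// markAttachFailed models step 1: the composable controller sets a
// condition in BindingFailureConditions to true.
func markAttachFailed(claim *ResourceClaim, condition string) {
	if claim.BindingFailureConditions == nil {
		claim.BindingFailureConditions = map[string]bool{}
	}
	claim.BindingFailureConditions[condition] = true
}

// bindingFailed models step 2: the scheduler detects a true failure condition.
func bindingFailed(claim *ResourceClaim) bool {
	for _, failed := range claim.BindingFailureConditions {
		if failed {
			return true
		}
	}
	return false
}

// unreserve models step 3: the allocation is cleared so the device is no
// longer used by this claim and the next scheduling cycle starts clean.
func unreserve(claim *ResourceClaim) {
	claim.AllocatedDevice = ""
	claim.BindingFailureConditions = nil
}

func main() {
	claim := &ResourceClaim{
		AllocatedDevice:          "gpu-0",
		BindingFailureConditions: map[string]bool{"example.com/attach-failed": false},
	}
	markAttachFailed(claim, "example.com/attach-failed")
	if bindingFailed(claim) {
		unreserve(claim)
	}
	fmt.Printf("allocated device after failure: %q\n", claim.AllocatedDevice)
}
```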
@@ -767,7 +770,7 @@ well as the [existing list] of feature gates.
767
770
-->
768
771
769
772
- [x] Feature gate (also fill in values in `kep.yaml`)
770
-
- Feature gate name: DRAPrebindingGates
773
+
- Feature gate name: DRAPrebindingConditions
771
774
- Components depending on the feature gate: kube-scheduler