Commit 4400fe0

KEP-5007: Update docs for DRADeviceBindingConditions in v1.36
Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
1 parent 236018e commit 4400fe0

2 files changed

+107
-103
lines changed

content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 103 additions & 103 deletions
Original file line numberDiff line numberDiff line change
@@ -493,6 +493,109 @@ create ResourceClaim or ResourceClaimTemplate objects in namespaces labeled with
493493
This ensures that non-admin users cannot misuse the feature.
494494
Starting with Kubernetes v1.34, this label has been updated to `resource.kubernetes.io/admin-access: "true"`.
495495
496+
### Device Binding Conditions {#device-binding-conditions}
497+
498+
{{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}
499+
500+
Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
501+
external resources, such as fabric-attached GPUs or reprogrammable FPGAs, are confirmed
502+
to be ready.
503+
504+
This waiting behavior is implemented in the
505+
[PreBind phase](/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind)
506+
of the scheduling framework.
507+
During this phase, the scheduler checks whether all required device conditions are
508+
satisfied before proceeding with binding.
509+
510+
This improves scheduling reliability by avoiding premature binding and enables coordination
511+
with external device controllers.
512+
513+
To use this feature, device drivers (typically managed by driver owners) must publish the
514+
following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
515+
must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
516+
gates for the scheduler to honor these fields.
517+
518+
- `bindingConditions`: A list of condition types that must be set to `True` in the
519+
`status.conditions` field of the associated ResourceClaim before the Pod can be bound.
520+
These typically represent readiness signals such as "DeviceAttached" or "DeviceInitialized".
521+
- `bindingFailureConditions`: A list of condition types that, if set to `True` in the
522+
`status.conditions` field of the associated ResourceClaim, indicate a failure state.
523+
If any of these conditions is `True`, the scheduler aborts binding and reschedules the Pod.
524+
- `bindsToNode`: If set to `true`, the scheduler records the selected node name in the
525+
`status.allocation.nodeSelector` field of the ResourceClaim.
526+
This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
527+
inside the ResourceClaim, which external controllers can use to perform node-specific
528+
operations such as device attachment or preparation.
529+
530+
All condition types listed in `bindingConditions` and `bindingFailureConditions` are evaluated
531+
from the `status.conditions` field of the ResourceClaim.
532+
External controllers are responsible for updating these conditions using standard Kubernetes
533+
condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
534+
535+
The scheduler waits up to **600 seconds** (default) for all `bindingConditions` to become `True`.
536+
If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
537+
clears the allocation and reschedules the Pod.
538+
This timeout duration is configurable by the user through `KubeSchedulerConfiguration`.
539+
```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: gpu-slice
spec:
  driver: dra.example.com
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: accelerator-type
        operator: In
        values:
        - "high-performance"
  pool:
    name: gpu-pool
    generation: 1
    resourceSliceCount: 1
  devices:
    - name: gpu-1
      attributes:
        vendor:
          string: "example"
        model:
          string: "example-gpu"
      bindsToNode: true
      bindingConditions:
        - dra.example.com/is-prepared
      bindingFailureConditions:
        - dra.example.com/preparing-failed
```
571+
This example ResourceSlice has the following properties:
572+
573+
- The ResourceSlice targets nodes labeled with `accelerator-type=high-performance`,
574+
so that the scheduler uses only a specific set of eligible nodes.
575+
- The scheduler selects one node from those eligible nodes (for example, `node-3`) and sets
576+
the `status.allocation.nodeSelector` field in the ResourceClaim to that node name.
577+
- The `dra.example.com/is-prepared` binding condition indicates that the device `gpu-1`
578+
must be prepared (the `is-prepared` condition has a status of `True`) before binding.
579+
- If the `gpu-1` device preparation fails (the `preparing-failed` condition has a status of `True`), the scheduler aborts binding.
580+
- The scheduler waits up to 600 seconds (default) for the device to become ready.
581+
- External controllers can use the node selector in the ResourceClaim to perform
582+
node-specific setup on the selected node.
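
As a rough illustration of what an external controller reports, the fragment below sketches the ResourceClaim status once `gpu-1` has been prepared on `node-3`. It is a sketch under stated assumptions, not output from a real cluster: the claim name `gpu-claim`, the request name `gpu`, the `reason` and `message` strings, and the timestamp are hypothetical. It also assumes that the conditions are reported per device under `status.devices[].conditions` (the device status list gated by `DRAResourceClaimDeviceStatus`) and that the node name is recorded as a `metadata.name` field match.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: gpu-claim                 # hypothetical claim name
status:
  allocation:
    devices:
      results:
      - request: gpu              # hypothetical request name
        driver: dra.example.com
        pool: gpu-pool
        device: gpu-1
        # Copied from the ResourceSlice device at allocation time.
        bindingConditions:
        - dra.example.com/is-prepared
        bindingFailureConditions:
        - dra.example.com/preparing-failed
    # Recorded by the scheduler because the device sets bindsToNode: true
    # (assumed shape: a field match on the node name).
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - node-3
  devices:
  - driver: dra.example.com
    pool: gpu-pool
    device: gpu-1
    conditions:
    # Set by the external controller once preparation succeeds.
    - type: dra.example.com/is-prepared
      status: "True"
      reason: DevicePrepared                      # hypothetical reason
      message: "gpu-1 is attached to node-3"      # hypothetical message
      lastTransitionTime: "2026-01-01T00:00:00Z"  # placeholder timestamp
```

With a status like this, the only listed binding condition is `True` and no failure condition is set, so the scheduler can proceed to bind the Pod. If the controller instead set `dra.example.com/preparing-failed` to `True`, the scheduler would abort binding as described above.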
583+
584+
An example of configuring this timeout in `KubeSchedulerConfiguration` is given below:
585+
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: DynamicResources
    args:
      apiVersion: kubescheduler.config.k8s.io/v1
      kind: DynamicResourcesArgs
      bindingTimeout: 60s
```
598+
496599
## DRA alpha features {#alpha-features}
497600

498601
The following sections describe DRA features that are available in the Alpha
@@ -865,109 +968,6 @@ actually triggering eviction:
865968
866969
- Edit the DeviceTaintRule and change the effect into `NoExecute`.
867970
868-
### Device Binding Conditions {#device-binding-conditions}
869-
870-
{{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}
871-
872-
Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
873-
external resources, such as fabric-attached GPUs or reprogrammable FPGAs, are confirmed
874-
to be ready.
875-
876-
This waiting behavior is implemented in the
877-
[PreBind phase](/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind)
878-
of the scheduling framework.
879-
During this phase, the scheduler checks whether all required device conditions are
880-
satisfied before proceeding with binding.
881-
882-
This improves scheduling reliability by avoiding premature binding and enables coordination
883-
with external device controllers.
884-
885-
To use this feature, device drivers (typically managed by driver owners) must publish the
886-
following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
887-
must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
888-
gates for the scheduler to honor these fields.
889-
890-
- `bindingConditions`: A list of condition types that must be set to True in the
891-
status.conditions field of the associated ResourceClaim before the Pod can be bound.
892-
These typically represent readiness signals such as "DeviceAttached" or "DeviceInitialized".
893-
- `bindingFailureConditions`: A list of condition types that, if set to True in
894-
status.conditions field of the associated ResourceClaim, indicate a failure state.
895-
If any of these conditions are True, the scheduler will abort binding and reschedule the Pod.
896-
- `bindsToNode`: if set to `true`, the scheduler records the selected node name in the
897-
`status.allocation.nodeSelector` field of the ResourceClaim.
898-
This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
899-
inside the ResourceClaim, which external controllers can use to perform node-specific
900-
operations such as device attachment or preparation.
901-
902-
All condition types listed in bindingConditions and bindingFailureConditions are evaluated
903-
from the `status.conditions` field of the ResourceClaim.
904-
External controllers are responsible for updating these conditions using standard Kubernetes
905-
condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
906-
907-
The scheduler waits up to **600 seconds** (default) for all `bindingConditions` to become `True`.
908-
If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
909-
clears the allocation and reschedules the Pod.
910-
This timeout duration is configurable by the user through `KubeSchedulerConfiguration`.
911-
```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: gpu-slice
spec:
  driver: dra.example.com
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: accelerator-type
        operator: In
        values:
        - "high-performance"
  pool:
    name: gpu-pool
    generation: 1
    resourceSliceCount: 1
  devices:
    - name: gpu-1
      attributes:
        vendor:
          string: "example"
        model:
          string: "example-gpu"
      bindsToNode: true
      bindingConditions:
        - dra.example.com/is-prepared
      bindingFailureConditions:
        - dra.example.com/preparing-failed
```
943-
This example ResourceSlice has the following properties:
944-
945-
- The ResourceSlice targets nodes labeled with `accelerator-type=high-performance`,
946-
so that the scheduler uses only a specific set of eligible nodes.
947-
- The scheduler selects one node from the selected group (for example, `node-3`) and sets
948-
the `status.allocation.nodeSelector` field in the ResourceClaim to that node name.
949-
- The `dra.example.com/is-prepared` binding condition indicates that the device `gpu-1`
950-
must be prepared (the `is-prepared` condition has a status of `True`) before binding.
951-
- If the `gpu-1` device preparation fails (the `preparing-failed` condition has a status of `True`), the scheduler aborts binding.
952-
- The scheduler waits up to 600 seconds (default) for the device to become ready.
953-
- External controllers can use the node selector in the ResourceClaim to perform
954-
node-specific setup on the selected node.
955-
956-
An example of configuring this timeout in `KubeSchedulerConfiguration` is given below:
957-
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: DynamicResources
    args:
      apiVersion: kubescheduler.config.k8s.io/v1
      kind: DynamicResourcesArgs
      bindingTimeout: 60s
```
970-
971971
## {{% heading "whatsnext" %}}
972972
973973
- [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/)

content/en/docs/reference/command-line-tools-reference/feature-gates/DRADeviceBindingConditions.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,10 @@ stages:
99
- stage: alpha
1010
defaultValue: false
1111
fromVersion: "1.34"
12+
toVersion: "1.35"
13+
- stage: beta
14+
defaultValue: true
15+
fromVersion: "1.36"
1216
---
1317
Enables support for DeviceBindingConditions in the DRA-related fields.
1418
This allows for thorough device readiness checks and attachment processes before the Bind phase.
