@@ -493,6 +493,109 @@ create ResourceClaim or ResourceClaimTemplate objects in namespaces labeled with
493493This ensures that non-admin users cannot misuse the feature.
494494Starting with Kubernetes v1.34, this label has been updated to `resource.kubernetes.io/admin-access: "true"`.
495495
496+ ### Device Binding Conditions {#device-binding-conditions}
497+
498+ {{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}
499+
500+ Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
501+ external resources, such as fabric-attached GPUs or reprogrammable FPGAs, are confirmed
502+ to be ready.
503+
504+ This waiting behavior is implemented in the
505+ [PreBind phase](/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind)
506+ of the scheduling framework.
507+ During this phase, the scheduler checks whether all required device conditions are
508+ satisfied before proceeding with binding.
509+
510+ This improves scheduling reliability by avoiding premature binding and enables coordination
511+ with external device controllers.
512+
513+ To use this feature, device drivers (typically managed by driver owners) must publish the
514+ following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
515+ must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
516+ gates for the scheduler to honor these fields.
517+
518+ - `bindingConditions`: A list of condition types that must be set to True in the
519+ status.conditions field of the associated ResourceClaim before the Pod can be bound.
520+ These typically represent readiness signals such as "DeviceAttached" or "DeviceInitialized".
521+ - `bindingFailureConditions`: A list of condition types that, if set to True in
522+ status.conditions field of the associated ResourceClaim, indicate a failure state.
523+ If any of these conditions are True, the scheduler will abort binding and reschedule the Pod.
524+ - `bindsToNode`: if set to `true`, the scheduler records the selected node name in the
525+ `status.allocation.nodeSelector` field of the ResourceClaim.
526+ This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
527+ inside the ResourceClaim, which external controllers can use to perform node-specific
528+ operations such as device attachment or preparation.
529+
530+ All condition types listed in bindingConditions and bindingFailureConditions are evaluated
531+ from the `status.conditions` field of the ResourceClaim.
532+ External controllers are responsible for updating these conditions using standard Kubernetes
533+ condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
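
A single entry in that condition list might look like the sketch below. The `reason` and `message` strings and the timestamp are illustrative assumptions, not values defined by Kubernetes or by any particular driver; only the condition type is taken from the example ResourceSlice later in this section.

```yaml
# Hypothetical condition entry written by an external controller once the
# device is ready. Field names follow the standard Kubernetes condition shape.
- type: dra.example.com/is-prepared   # must match an entry in bindingConditions
  status: "True"                      # the scheduler waits until this is "True"
  reason: DevicePrepared              # assumed driver-defined reason string
  message: Device gpu-1 is attached and initialized.
  lastTransitionTime: "2025-01-01T00:00:00Z"  # placeholder timestamp
```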
534+
535+ The scheduler waits up to **600 seconds** (default) for all `bindingConditions` to become `True`.
536+ If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
537+ clears the allocation and reschedules the Pod.
538+ This timeout duration is configurable by the user through `KubeSchedulerConfiguration`.
539+
540+ ```yaml
541+ apiVersion: resource.k8s.io/v1
542+ kind: ResourceSlice
543+ metadata:
544+   name: gpu-slice
545+ spec:
546+   driver: dra.example.com
547+   nodeSelector:
548+     nodeSelectorTerms:
549+     - matchExpressions:
550+       - key: accelerator-type
551+         operator: In
552+         values:
553+         - "high-performance"
554+   pool:
555+     name: gpu-pool
556+     generation: 1
557+     resourceSliceCount: 1
558+   devices:
559+     - name: gpu-1
560+       attributes:
561+         vendor:
562+           string: "example"
563+         model:
564+           string: "example-gpu"
565+       bindsToNode: true
566+       bindingConditions:
567+         - dra.example.com/is-prepared
568+       bindingFailureConditions:
569+         - dra.example.com/preparing-failed
570+ ```
571+ This example ResourceSlice has the following properties:
572+
573+ - The ResourceSlice targets nodes labeled with `accelerator-type=high-performance`,
574+ so that the scheduler uses only a specific set of eligible nodes.
575+ - The scheduler selects one node from the selected group (for example, `node-3`) and sets
576+ the `status.allocation.nodeSelector` field in the ResourceClaim to that node name.
577+ - The `dra.example.com/is-prepared` binding condition indicates that the device `gpu-1`
578+ must be prepared (the `is-prepared` condition has a status of `True`) before binding.
579+ - If the `gpu-1` device preparation fails (the `preparing-failed` condition has a status of `True`), the scheduler aborts binding.
580+ - The scheduler waits up to 600 seconds (default) for the device to become ready.
581+ - External controllers can use the node selector in the ResourceClaim to perform
582+ node-specific setup on the selected node.
583+
584+ An example of configuring this timeout in `KubeSchedulerConfiguration` is given below:
585+
586+ ```yaml
587+ apiVersion: kubescheduler.config.k8s.io/v1
588+ kind: KubeSchedulerConfiguration
589+ profiles:
590+ - schedulerName: default-scheduler
591+   pluginConfig:
592+   - name: DynamicResources
593+     args:
594+       apiVersion: kubescheduler.config.k8s.io/v1
595+       kind: DynamicResourcesArgs
596+       bindingTimeout: 60s
597+ ```
598+
496599## DRA alpha features {#alpha-features}
497600
498601The following sections describe DRA features that are available in the Alpha
@@ -865,109 +968,6 @@ actually triggering eviction:
865968
866969- Edit the DeviceTaintRule and change the effect into `NoExecute`.
867970
868- ### Device Binding Conditions {#device-binding-conditions}
869-
870- {{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}
871-
872- Device Binding Conditions allow the Kubernetes scheduler to delay Pod binding until
873- external resources, such as fabric-attached GPUs or reprogrammable FPGAs, are confirmed
874- to be ready.
875-
876- This waiting behavior is implemented in the
877- [PreBind phase](/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind)
878- of the scheduling framework.
879- During this phase, the scheduler checks whether all required device conditions are
880- satisfied before proceeding with binding.
881-
882- This improves scheduling reliability by avoiding premature binding and enables coordination
883- with external device controllers.
884-
885- To use this feature, device drivers (typically managed by driver owners) must publish the
886- following fields in the `Device` section of a `ResourceSlice`. Cluster administrators
887- must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
888- gates for the scheduler to honor these fields.
889-
890- - `bindingConditions`: A list of condition types that must be set to True in the
891- status.conditions field of the associated ResourceClaim before the Pod can be bound.
892- These typically represent readiness signals such as "DeviceAttached" or "DeviceInitialized".
893- - `bindingFailureConditions`: A list of condition types that, if set to True in
894- status.conditions field of the associated ResourceClaim, indicate a failure state.
895- If any of these conditions are True, the scheduler will abort binding and reschedule the Pod.
896- - `bindsToNode`: if set to `true`, the scheduler records the selected node name in the
897- `status.allocation.nodeSelector` field of the ResourceClaim.
898- This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
899- inside the ResourceClaim, which external controllers can use to perform node-specific
900- operations such as device attachment or preparation.
901-
902- All condition types listed in bindingConditions and bindingFailureConditions are evaluated
903- from the `status.conditions` field of the ResourceClaim.
904- External controllers are responsible for updating these conditions using standard Kubernetes
905- condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`).
906-
907- The scheduler waits up to **600 seconds** (default) for all `bindingConditions` to become `True`.
908- If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
909- clears the allocation and reschedules the Pod.
910- This timeout duration is configurable by the user through `KubeSchedulerConfiguration`.
911-
912- ```yaml
913- apiVersion: resource.k8s.io/v1
914- kind: ResourceSlice
915- metadata:
916-   name: gpu-slice
917- spec:
918-   driver: dra.example.com
919-   nodeSelector:
920-     nodeSelectorTerms:
921-     - matchExpressions:
922-       - key: accelerator-type
923-         operator: In
924-         values:
925-         - "high-performance"
926-   pool:
927-     name: gpu-pool
928-     generation: 1
929-     resourceSliceCount: 1
930-   devices:
931-     - name: gpu-1
932-       attributes:
933-         vendor:
934-           string: "example"
935-         model:
936-           string: "example-gpu"
937-       bindsToNode: true
938-       bindingConditions:
939-         - dra.example.com/is-prepared
940-       bindingFailureConditions:
941-         - dra.example.com/preparing-failed
942- ```
943- This example ResourceSlice has the following properties:
944-
945- - The ResourceSlice targets nodes labeled with `accelerator-type=high-performance`,
946- so that the scheduler uses only a specific set of eligible nodes.
947- - The scheduler selects one node from the selected group (for example, `node-3`) and sets
948- the `status.allocation.nodeSelector` field in the ResourceClaim to that node name.
949- - The `dra.example.com/is-prepared` binding condition indicates that the device `gpu-1`
950- must be prepared (the `is-prepared` condition has a status of `True`) before binding.
951- - If the `gpu-1` device preparation fails (the `preparing-failed` condition has a status of `True`), the scheduler aborts binding.
952- - The scheduler waits up to 600 seconds (default) for the device to become ready.
953- - External controllers can use the node selector in the ResourceClaim to perform
954- node-specific setup on the selected node.
955-
956- An example of configuring this timeout in `KubeSchedulerConfiguration` is given below:
957-
958- ```yaml
959- apiVersion: kubescheduler.config.k8s.io/v1
960- kind: KubeSchedulerConfiguration
961- profiles:
962- - schedulerName: default-scheduler
963-   pluginConfig:
964-   - name: DynamicResources
965-     args:
966-       apiVersion: kubescheduler.config.k8s.io/v1
967-       kind: DynamicResourcesArgs
968-       bindingTimeout: 60s
969- ```
970-
971971## {{% heading "whatsnext" %}}
972972
973973- [Set Up DRA in a Cluster](/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/)