Commit 2596f2d

KEP-5007: Update docs for DRADeviceBindingConditions in v1.36
Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
1 parent 85be459 commit 2596f2d

2 files changed: 36 additions & 25 deletions


content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md

Lines changed: 32 additions & 25 deletions
@@ -947,7 +947,7 @@ Resource pool status is an *alpha feature* and only enabled when the
 [`DRAResourcePoolStatus` feature gate](/docs/reference/command-line-tools-reference/feature-gates/#DRAResourcePoolStatus)
 is enabled in the kube-apiserver and kube-controller-manager.
 
-### Device Binding Conditions {#device-binding-conditions}
+### Device binding conditions
 
 {{< feature-state feature_gate_name="DRADeviceBindingConditions" >}}
 
@@ -969,13 +969,16 @@ following fields in the `Device` section of a `ResourceSlice`. Cluster administr
 must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature
 gates for the scheduler to honor these fields.
 
-- `bindingConditions`: A list of condition types that must be set to True in the
-  status.conditions field of the associated ResourceClaim before the Pod can be bound.
-  These typically represent readiness signals such as "DeviceAttached" or "DeviceInitialized".
-- `bindingFailureConditions`: A list of condition types that, if set to True in
+`bindingConditions`
+: A list of _condition types_ that must be set to True (in the `.status.conditions` field of the associated ResourceClaim) before the Pod can be bound. These conditions typically represent readiness signals, such as DeviceAttached or DeviceInitialized.
+
+`bindingFailureConditions`
+: A list of condition types that, if set to True in
   the `status.conditions` field of the associated ResourceClaim, indicate a failure state.
   If any of these conditions are True, the scheduler will abort binding and reschedule the Pod.
-- `bindsToNode`: if set to `true`, the scheduler records the selected node name in the
+
+`bindsToNode`
+: If set to `true`, the scheduler records the selected node name in the
   `status.allocation.nodeSelector` field of the ResourceClaim.
   This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector
   inside the ResourceClaim, which external controllers can use to perform node-specific
@@ -989,13 +992,32 @@ condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime`
 The scheduler waits up to **600 seconds** (default) for all `bindingConditions` to become `True`.
 If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler
 clears the allocation and reschedules the Pod.
-This timeout duration is configurable by the user through `KubeSchedulerConfiguration`.
+A cluster administrator can configure this timeout duration by editing the kube-scheduler configuration file.
+
+For example, the following `KubeSchedulerConfiguration` sets the binding timeout to 60 seconds:
+
+```yaml
+apiVersion: kubescheduler.config.k8s.io/v1
+kind: KubeSchedulerConfiguration
+profiles:
+- schedulerName: default-scheduler
+  pluginConfig:
+  - name: DynamicResources
+    args:
+      apiVersion: kubescheduler.config.k8s.io/v1
+      kind: DynamicResourcesArgs
+      bindingTimeout: 60s
+```
+
+#### Example {#device-binding-conditions-example}
+
+Here is an example of a ResourceSlice that you might see in a cluster where there's a DRA driver in use, and that driver supports binding conditions:
 
 ```yaml
 apiVersion: resource.k8s.io/v1
 kind: ResourceSlice
 metadata:
-  name: gpu-slice
+  name: gpu-slice-1
 spec:
   driver: dra.example.com
   nodeSelector:
@@ -1035,24 +1057,9 @@ must be prepared (the `is-prepared` condition has a status of `True`) before bin
 - External controllers can use the node selector in the ResourceClaim to perform
   node-specific setup on the selected node.
 
-An example of configuring this timeout in `KubeSchedulerConfiguration` is given below:
-
-```yaml
-apiVersion: kubescheduler.config.k8s.io/v1
-kind: KubeSchedulerConfiguration
-profiles:
-- schedulerName: default-scheduler
-  pluginConfig:
-  - name: DynamicResources
-    args:
-      apiVersion: kubescheduler.config.k8s.io/v1
-      kind: DynamicResourcesArgs
-      bindingTimeout: 60s
-```
-
-Device binding conditions is an *alpha feature* and only enabled when the
+Device binding conditions is a *beta feature* and is enabled by default with the
 [`DRADeviceBindingConditions` feature gate](/docs/reference/command-line-tools-reference/feature-gates/#DRADeviceBindingConditions)
-is enabled in the kube-apiserver and kube-scheduler.
+in the kube-apiserver and kube-scheduler.
 
 ## {{% heading "whatsnext" %}}
 
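The binding rules documented in this commit (wait up to a default of 600 seconds for every `bindingConditions` entry to become True; clear the allocation and reschedule if the timeout expires or any `bindingFailureConditions` entry is True) can be summarized as a small decision function. The Go program below is a minimal sketch under stated assumptions, not the kube-scheduler's implementation: the `Condition` type, the `decide` helper, and the `prepare-failed` condition type are hypothetical, and the order in which the checks run is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a hypothetical, simplified stand-in for a metav1.Condition in
// a ResourceClaim's status; Status == true stands for "True".
type Condition struct {
	Type   string
	Status bool
}

// Decision is the outcome of one evaluation of the binding rules.
type Decision int

const (
	Wait       Decision = iota // keep waiting for bindingConditions
	Bind                       // every bindingConditions entry is True
	Reschedule                 // failure condition or timeout: clear the allocation
)

func (d Decision) String() string {
	switch d {
	case Bind:
		return "bind"
	case Reschedule:
		return "reschedule"
	default:
		return "wait"
	}
}

// defaultBindingTimeout matches the documented 600-second default.
const defaultBindingTimeout = 600 * time.Second

// isTrue reports whether a condition of type condType is present with status True.
func isTrue(status []Condition, condType string) bool {
	for _, c := range status {
		if c.Type == condType && c.Status {
			return true
		}
	}
	return false
}

// decide is a hypothetical helper applying the documented rules. Checking
// failure conditions first, then bindingConditions, then the timeout is an
// assumption of this sketch.
func decide(bindingConditions, failureConditions []string, status []Condition,
	waited, timeout time.Duration) Decision {
	for _, t := range failureConditions {
		if isTrue(status, t) {
			return Reschedule
		}
	}
	allTrue := true
	for _, t := range bindingConditions {
		if !isTrue(status, t) {
			allTrue = false
			break
		}
	}
	if allTrue {
		return Bind
	}
	if waited >= timeout {
		return Reschedule
	}
	return Wait
}

func main() {
	// "is-prepared" appears in the documentation's example;
	// "prepare-failed" is a hypothetical failure condition type.
	binding := []string{"is-prepared"}
	failure := []string{"prepare-failed"}

	notReady := []Condition{{Type: "is-prepared", Status: false}}
	ready := []Condition{{Type: "is-prepared", Status: true}}
	failed := []Condition{{Type: "prepare-failed", Status: true}}

	fmt.Println(decide(binding, failure, notReady, 30*time.Second, defaultBindingTimeout))
	fmt.Println(decide(binding, failure, ready, 30*time.Second, defaultBindingTimeout))
	fmt.Println(decide(binding, failure, failed, 30*time.Second, defaultBindingTimeout))
	fmt.Println(decide(binding, failure, notReady, 601*time.Second, defaultBindingTimeout))
}
```

Running it prints `wait`, `bind`, `reschedule`, and `reschedule`, one per line. In a real cluster, the timeout value comes from the `bindingTimeout` field of `DynamicResourcesArgs` shown in the configuration example, not from a function argument.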
content/en/docs/reference/command-line-tools-reference/feature-gates/DRADeviceBindingConditions.md

Lines changed: 4 additions & 0 deletions
@@ -9,6 +9,10 @@ stages:
   - stage: alpha
     defaultValue: false
     fromVersion: "1.34"
+    toVersion: "1.35"
+  - stage: beta
+    defaultValue: true
+    fromVersion: "1.36"
 ---
 Enables support for DeviceBindingConditions in the DRA-related fields.
 This allows for thorough device readiness checks and attachment processes before the Bind phase.
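The new `stages` entries say that `DRADeviceBindingConditions` was alpha and off by default from 1.34 through 1.35, and is beta and on by default from 1.36. As a rough sketch of how `fromVersion`, `toVersion`, and `defaultValue` combine, the hypothetical Go program below maps a Kubernetes minor version to its stage. The `Stage` type and `stageFor` helper are illustrative assumptions, not part of the Kubernetes website tooling, and versions are simplified to minor numbers.

```go
package main

import "fmt"

// Stage is a hypothetical Go mirror of one entry in the `stages` front matter.
// Versions are simplified to the Kubernetes minor number (34 means "1.34");
// ToMinor == 0 means the stage has no upper bound (no toVersion).
type Stage struct {
	Name         string
	DefaultValue bool
	FromMinor    int
	ToMinor      int
}

// draDeviceBindingConditions transcribes the stages listed in this commit.
var draDeviceBindingConditions = []Stage{
	{Name: "alpha", DefaultValue: false, FromMinor: 34, ToMinor: 35},
	{Name: "beta", DefaultValue: true, FromMinor: 36},
}

// stageFor returns the stage covering Kubernetes 1.<minor>, if any.
func stageFor(stages []Stage, minor int) (Stage, bool) {
	for _, s := range stages {
		if minor >= s.FromMinor && (s.ToMinor == 0 || minor <= s.ToMinor) {
			return s, true
		}
	}
	return Stage{}, false
}

func main() {
	for _, minor := range []int{33, 34, 35, 36} {
		if s, ok := stageFor(draDeviceBindingConditions, minor); ok {
			fmt.Printf("1.%d: %s, enabled by default: %t\n", minor, s.Name, s.DefaultValue)
		} else {
			fmt.Printf("1.%d: not available\n", minor)
		}
	}
}
```

It reports `not available` for 1.33, the alpha stage with a default of `false` for 1.34 and 1.35, and the beta stage with a default of `true` for 1.36. The sketch only knows the stages listed here, so it would keep reporting beta for later releases until a new stage entry is added.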
