Document DRA Device Binding Conditions in v1.36 #54541
Merged
k8s-ci-robot
merged 1 commit into
kubernetes:dev-1.36
from
ttsuuubasa:dev-1.36-dra-device-binding-conditions
Apr 8, 2026
+36
−25
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -948,7 +948,7 @@ Resource pool status is an *alpha feature* and only enabled when the | |
| [`DRAResourcePoolStatus` feature gate](/docs/reference/command-line-tools-reference/feature-gates/#DRAResourcePoolStatus) | ||
| is enabled in the kube-apiserver and kube-controller-manager. | ||
|
|
||
| ### Device Binding Conditions {#device-binding-conditions} | ||
|
Contributor
👍 Someone at some point decided that we need manual anchors. I don't know why and agree that we should remove them.
||
| ### Device binding conditions | ||
|
|
||
| {{< feature-state feature_gate_name="DRADeviceBindingConditions" >}} | ||
|
|
||
|
|
@@ -970,13 +970,16 @@ following fields in the `Device` section of a `ResourceSlice`. Cluster administr | |
| must enable the `DRADeviceBindingConditions` and `DRAResourceClaimDeviceStatus` feature | ||
| gates for the scheduler to honor these fields. | ||
|
|
||
| - `bindingConditions`: A list of condition types that must be set to True in the | ||
| status.conditions field of the associated ResourceClaim before the Pod can be bound. | ||
| These typically represent readiness signals such as "DeviceAttached" or "DeviceInitialized". | ||
| - `bindingFailureConditions`: A list of condition types that, if set to True in | ||
| `bindingConditions` | ||
| : A list of _condition types_ that must be set to True (in the `.status.conditions` field of the associated ResourceClaim) before the Pod can be bound. These conditions typically represent readiness signals, such as DeviceAttached or DeviceInitialized. | ||
|
|
||
| `bindingFailureConditions` | ||
| : A list of condition types that, if set to True in | ||
| status.conditions field of the associated ResourceClaim, indicate a failure state. | ||
| If any of these conditions are True, the scheduler will abort binding and reschedule the Pod. | ||
| - `bindsToNode`: if set to `true`, the scheduler records the selected node name in the | ||
|
|
||
| `bindsToNode` | ||
| : If set to `true`, the scheduler records the selected node name in the | ||
| `status.allocation.nodeSelector` field of the ResourceClaim. | ||
| This does not affect the Pod's `spec.nodeSelector`. Instead, it sets a node selector | ||
| inside the ResourceClaim, which external controllers can use to perform node-specific | ||
|
|
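The interaction between `bindingConditions` and `bindingFailureConditions` described in this hunk can be sketched in a few lines of Python. This is an illustration only, not the scheduler's implementation: the function name `evaluate_binding`, the three return values, and the `AttachFailed` condition type used in the usage note are hypothetical, and the sketch assumes that a failure condition takes precedence when both kinds are `True`.

```python
from typing import Iterable, Mapping


def evaluate_binding(
    claim_conditions: Iterable[Mapping[str, str]],
    binding_conditions: Iterable[str],
    binding_failure_conditions: Iterable[str],
) -> str:
    """Return "failed", "ready", or "waiting" (hypothetical labels)."""
    # Collect the condition types whose status is currently "True".
    true_types = {
        c["type"] for c in claim_conditions if c.get("status") == "True"
    }
    # Assumption: any failure condition that is True aborts binding,
    # even if every binding condition is also True.
    if any(t in true_types for t in binding_failure_conditions):
        return "failed"
    # Binding may proceed only once every listed condition is True.
    if all(t in true_types for t in binding_conditions):
        return "ready"
    return "waiting"
```

For example, with `bindingConditions: [DeviceAttached]` and a hypothetical `bindingFailureConditions: [AttachFailed]`, a claim that reports `DeviceAttached` with status `False` is still "waiting", and it becomes "ready" once that condition turns `True`.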
@@ -990,13 +993,32 @@ condition semantics (`type`, `status`, `reason`, `message`, `lastTransitionTime` | |
| The scheduler waits up to **600 seconds** (default) for all `bindingConditions` to become `True`. | ||
| If the timeout is reached or any `bindingFailureConditions` are `True`, the scheduler | ||
| clears the allocation and reschedules the Pod. | ||
| This timeout duration is configurable by the user through `KubeSchedulerConfiguration`. | ||
| A cluster administrator can configure this timeout duration by editing the kube-scheduler configuration file. | ||
|
|
||
| An example of configuring this timeout in `KubeSchedulerConfiguration` is given below: | ||
|
|
||
| ```yaml | ||
| apiVersion: kubescheduler.config.k8s.io/v1 | ||
| kind: KubeSchedulerConfiguration | ||
| profiles: | ||
| - schedulerName: default-scheduler | ||
| pluginConfig: | ||
| - name: DynamicResources | ||
| args: | ||
| apiVersion: kubescheduler.config.k8s.io/v1 | ||
| kind: DynamicResourcesArgs | ||
| bindingTimeout: 60s | ||
| ``` | ||
|
|
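To make the timeout rule concrete, here is a minimal Python sketch of the check "has the binding wait exceeded the configured timeout?". It is not scheduler code: `parse_duration_seconds` and `timed_out` are hypothetical helpers, the parser accepts only a simple subset of duration strings (whole hours, minutes, and seconds such as `60s` or `1h30m`), and the 600-second default is taken from the text above.

```python
import re
from typing import Optional

# Default wait stated in the documentation text above.
DEFAULT_BINDING_TIMEOUT_SECONDS = 600

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration_seconds(text: str) -> int:
    """Parse a simple duration such as "60s", "10m", or "1h30m"."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything the simplified pattern does not fully cover.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def timed_out(elapsed_seconds: float,
              binding_timeout: Optional[str] = None) -> bool:
    """Return True once the wait has reached the timeout."""
    if binding_timeout is None:
        limit = DEFAULT_BINDING_TIMEOUT_SECONDS
    else:
        limit = parse_duration_seconds(binding_timeout)
    return elapsed_seconds >= limit
```

With `bindingTimeout: 60s` as in the configuration above, `timed_out(61, "60s")` is true, while the same 61 seconds is still within the default 600-second window.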
||
| #### Example {#device-binding-conditions-example} | ||
|
|
||
| Here is an example of a ResourceSlice that you might see in a cluster where there's a DRA driver in use, and that driver supports binding conditions: | ||
|
|
||
| ```yaml | ||
| apiVersion: resource.k8s.io/v1 | ||
| kind: ResourceSlice | ||
| metadata: | ||
| name: gpu-slice | ||
| name: gpu-slice-1 | ||
| spec: | ||
| driver: dra.example.com | ||
| nodeSelector: | ||
|
|
@@ -1036,24 +1058,9 @@ must be prepared (the `is-prepared` condition has a status of `True`) before bin | |
| - External controllers can use the node selector in the ResourceClaim to perform | ||
| node-specific setup on the selected node. | ||
|
|
||
| An example of configuring this timeout in `KubeSchedulerConfiguration` is given below: | ||
|
|
||
| ```yaml | ||
| apiVersion: kubescheduler.config.k8s.io/v1 | ||
| kind: KubeSchedulerConfiguration | ||
| profiles: | ||
| - schedulerName: default-scheduler | ||
| pluginConfig: | ||
| - name: DynamicResources | ||
| args: | ||
| apiVersion: kubescheduler.config.k8s.io/v1 | ||
| kind: DynamicResourcesArgs | ||
| bindingTimeout: 60s | ||
| ``` | ||
|
|
||
| Device binding conditions is an *alpha feature* and only enabled when the | ||
| Device binding conditions is a *beta feature* and is enabled by default, controlled by the | ||
| [`DRADeviceBindingConditions` feature gate](/docs/reference/command-line-tools-reference/feature-gates/#DRADeviceBindingConditions) | ||
| is enabled in the kube-apiserver and kube-scheduler. | ||
| in the kube-apiserver and kube-scheduler. | ||
|
|
||
| ### Node allocatable resources {#node-allocatable-resources} | ||
|
|
||
|
|
||
nit: when we document (sub)features for DRA, we should place them where they would belong if they were stable.
If we do that, then when features graduate, the docs remain easy to find and use.
If we were to pursue this, I feel that the current sections such as “DRA beta features” and “DRA alpha features” would no longer be appropriate, and that we would need to reconsider the overall structure of this chapter.
I agree with @lmktfy; we merged the removal of those artificial sections for exactly that reason in #54648.
re-organization of DRA (sub)features may need to be done in a follow-up to this PR
I'll create an issue