Commit b0eea65

committed
Addressed comments
1 parent e3a6289 commit b0eea65

1 file changed

+45
-7
lines changed

content/en/blog/_posts/2026/dra-136-update.md

Lines changed: 45 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
---
22
layout: blog
3-
title: "Kubernetes v1.36: DRA has graduated to GA"
3+
title: "Kubernetes v1.36: More Drivers, New Features, and the Next Era of DRA"
44
slug: dra-136-updates
55
draft: true
66
date: XXXX-XX-XX
@@ -14,6 +14,12 @@ continues to mature, bringing a wave of feature graduations, critical usability
1414
improvements, and new capabilities that extend the flexibility of DRA to native
1515
resources like memory and CPU, and support for ResourceClaims in PodGroups.
1616

17+
We have also seen significant momentum in driver availability. Both the
18+
[NVIDIA GPU](https://github.com/NVIDIA/k8s-dra-driver-gpu)
19+
and Google TPU DRA drivers are being transferred to the Kubernetes project, joining the
20+
[DRANET](https://github.com/kubernetes-sigs/dranet)
21+
driver that was added last year.
22+
1723
Whether you are managing massive fleets of GPUs, need better handling of failures,
1824
or are simply looking for better ways to define resource fallback options, the upgrades
1925
to DRA in 1.36 have something for you. Let's dive into the new features and graduations!
@@ -37,7 +43,7 @@ scheduling flexibility and cluster utilization.
3743

3844
As DRA becomes the standard for resource allocation, bridging the gap with legacy systems
3945
is crucial. The DRA
40-
[Extended Resource](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-taints-and-tolerations)
46+
[Extended Resource](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#extended-resource)
4147
feature allows users to request resources via traditional extended resources on a Pod.
4248
This allows for a gradual transition to DRA, meaning application developers and
4349
operators are not forced to immediately migrate their workloads to the ResourceClaim
@@ -71,6 +77,23 @@ devices or FPGAs—are fully prepared. By explicitly modeling resource readiness
7177
prevents premature assignments that can lead to Pod failures, ensuring a much more robust
7278
and predictable deployment process.
7379

80+
**Resource Health Status (Beta)**
81+
82+
Knowing when a device has failed or become unhealthy is critical for workloads running on
83+
specialized hardware. With
84+
[Resource Health Status](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-health-monitoring),
85+
Kubernetes now exposes device health information directly in the Pod Status through the
86+
`allocatedResourcesStatus` field. When a DRA driver detects that an allocated device
87+
has become unhealthy, it reports this back to the kubelet, which surfaces it in each
88+
container's status.
89+
90+
In 1.36, the feature graduates to beta (enabled by default) and adds an optional `message`
91+
field providing human-readable context about the health status, such as error details or
92+
failure reasons. DRA drivers can also configure per-device health check timeouts,
93+
allowing different hardware types to use appropriate timeout values based on their
94+
health reporting characteristics. This gives users and controllers crucial visibility
95+
to quickly identify and react to hardware failures.
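
As a rough illustration of how a controller or script might consume this information, the sketch below scans a Pod's status for devices that are not reported as healthy. It assumes the Pod has already been decoded from the API's JSON into a plain Python dict, and that the nested field names (`containerStatuses`, `allocatedResourcesStatus`, `resources`, `resourceID`, `health`, `message`) match the Pod API; check them against the API reference for your cluster version. The function name `devices_not_healthy` and the sample values are hypothetical.

```python
from typing import Any


def devices_not_healthy(pod: dict[str, Any]) -> list[dict[str, str]]:
    """Collect every allocated device whose reported health is not "Healthy".

    Assumption: a missing health value is treated as "Unknown" and is
    therefore included in the result.
    """
    findings = []
    status = pod.get("status", {})
    for container in status.get("containerStatuses", []):
        for resource in container.get("allocatedResourcesStatus", []):
            for device in resource.get("resources", []):
                health = device.get("health", "Unknown")
                if health == "Healthy":
                    continue
                findings.append({
                    "container": container.get("name", ""),
                    "resource": resource.get("name", ""),
                    "device": device.get("resourceID", ""),
                    "health": health,
                    # The optional message may be absent; fall back to "".
                    "message": device.get("message", ""),
                })
    return findings


# Made-up sample data, shaped like the assumed Pod status structure.
example_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "trainer",
                "allocatedResourcesStatus": [
                    {
                        "name": "claim:gpu-claim/gpu",
                        "resources": [
                            {"resourceID": "gpu-0", "health": "Healthy"},
                            {
                                "resourceID": "gpu-1",
                                "health": "Unhealthy",
                                "message": "ECC error",
                            },
                        ],
                    }
                ],
            }
        ]
    }
}

for finding in devices_not_healthy(example_pod):
    print(finding["container"], finding["device"], finding["health"], finding["message"])
```

Treating anything other than `Healthy` as worth reporting is a design choice for this sketch; a real controller might handle unknown health differently from confirmed failures.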
96+
7497
## New Features
7598

7699
Beyond stabilizing existing capabilities, v1.36 introduces foundational new features
@@ -100,10 +123,25 @@ performance tuning.
100123

101124
One of the most requested features from cluster administrators has been better visibility
102125
into hardware capacity. The new
103-
[Resource Availability Visibility](add_link_here)
104-
feature introduces robust mechanisms to query and expose the total capacity, allocated
105-
usage, and available pool of DRA resources across the cluster. This unlocks better
106-
integration with dashboards and capacity planning tools.
126+
[DRAResourcePoolStatus](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resource-pool-status)
127+
feature allows you to query the availability of devices in DRA resource pools. By creating a
128+
`ResourcePoolStatusRequest` object, you get a point-in-time snapshot of device counts
129+
— total, allocated, available, and unavailable — for each pool managed by a given
130+
driver. This enables better integration with dashboards and capacity planning tools.
131+
132+
**List Types for Attributes**
133+
134+
With
135+
[List Types for Attributes](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#list-type-attributes),
136+
DRA can represent device attributes as typed lists (int, bool, string, and version), not
137+
just scalar values. This helps model real hardware topology, such as devices that belong
138+
to multiple PCIe roots or NUMA domains.
139+
140+
This feature also extends `ResourceClaim` constraint behavior to work naturally
141+
with both scalar and list values: `matchAttribute` now checks for a non-empty
142+
intersection, and `distinctAttribute` checks for pairwise disjoint values.
143+
It also introduces the `includes()` function in DRA CEL, which lets device selectors keep working
144+
more easily when an attribute changes between scalar and list representations.
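
The constraint semantics described above can be modeled in a few lines of Python. This is only a sketch of the set logic, not the scheduler's implementation: it assumes a scalar attribute behaves like a one-element list, and that `matchAttribute` requires a value common to all devices in the constraint. The helper names `as_set`, `match_attribute`, and `distinct_attribute` are hypothetical.

```python
from itertools import combinations


def as_set(value):
    """Treat a scalar as a one-element set so scalars and lists compare uniformly."""
    if isinstance(value, (list, tuple, set, frozenset)):
        return set(value)
    return {value}


def match_attribute(values):
    """True when the attribute values of all devices share at least one element."""
    sets = [as_set(v) for v in values]
    if not sets:
        return True  # assumption: no devices means nothing to violate
    return bool(set.intersection(*sets))


def distinct_attribute(values):
    """True when no two devices share any attribute value (pairwise disjoint)."""
    sets = [as_set(v) for v in values]
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))


# One device reports two NUMA domains as a list, the other a single scalar.
print(match_attribute([["numa0", "numa1"], "numa1"]))
print(distinct_attribute([["numa0", "numa1"], "numa2"]))
```

With this model, a device that lists several PCIe roots or NUMA domains can still satisfy `matchAttribute` alongside a device that reports a single scalar value, as long as the two overlap.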
107145

108146
**Device Allocation Ordering through Lexicographical Ordering**
109147

@@ -130,5 +168,5 @@ A good starting point is joining the WG Device Management
130168
[meetings](https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?tab=t.0#heading=h.tgg8gganowxq),
131169
which happen at US/EU and EU/APAC friendly time slots.
132170

133-
Not all enhancement ideas are tracked as issues yet, so come talk to us if you wantto help or have some ideas yourself!
171+
Not all enhancement ideas are tracked as issues yet, so come talk to us if you want to help or have some ideas yourself!
134172
We have work to do at all levels, from difficult core changes to usability enhancements in kubectl, which could be picked up by newcomers.
