kubernetes · mortent · Mar 17, 2026 · Apr 1, 2026 · Apr 6, 2026 · Apr 11, 2026
diff --git a/content/en/blog/_posts/2026/dra-136-update.md b/content/en/blog/_posts/2026/dra-136-update.md
@@ -0,0 +1,177 @@
+---
+layout: blog
+title: "Kubernetes v1.36: More Drivers, New Features, and the Next Era of DRA"
+slug: dra-136-updates
+draft: true
+date: XXXX-XX-XX
+author: >
+  The DRA team
+---
+
+Dynamic Resource Allocation (DRA) has fundamentally changed how platform administrators can handle hardware
+accelerators and specialized resources in Kubernetes. In the v1.36 release, DRA
+continues to mature, bringing a wave of feature graduations, critical usability
+improvements, and new capabilities that extend the flexibility of DRA to native
+resources like memory and CPU, and support for ResourceClaims in PodGroups.
+
+We have also seen significant momentum in driver availability. Both the
+[NVIDIA GPU](https://github.com/NVIDIA/k8s-dra-driver-gpu)
+and Google TPU DRA drivers are being transferred to the Kubernetes project, joining the
+[DRANET](https://github.com/kubernetes-sigs/dranet)
+driver that was added last year.
+
+Whether you are managing massive fleets of GPUs, need better handling of failures,
+or are simply looking for better ways to define resource fallback options, the upgrades
+to DRA in 1.36 have something for you. Let's dive into the new features and graduations!
+
+## Feature graduations
+
+The community has been hard at work stabilizing core DRA concepts. In Kubernetes 1.36,
+several highly anticipated features have graduated to Beta or Stable.
+
+**Prioritized List (Stable)**
+
+Hardware heterogeneity is a reality in most clusters. With the
+[Prioritized List](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#prioritized-list)
+feature, you can confidently define fallback preferences when requesting
+devices. Instead of hardcoding a request for a specific device model, you can specify an
+ordered list of preferences (e.g., "Give me an H100, but if none are available, fall back
+to an A100"). The scheduler will evaluate these requests in order, drastically improving
+scheduling flexibility and cluster utilization.
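
As a rough illustration of the fallback behavior described above, the following Python sketch builds a ResourceClaim manifest that uses `firstAvailable`, then mimics in a much simplified way how an ordered list is resolved. The device class names (`h100.example.com`, `a100.example.com`), the claim name, and the `pick_first_available` helper are hypothetical, and the real scheduler considers far more than a free-device count.

```python
import json

# Hypothetical ResourceClaim using a prioritized list of subrequests.
# Device class names are placeholders; a real cluster defines its own.
claim = {
    "apiVersion": "resource.k8s.io/v1",
    "kind": "ResourceClaim",
    "metadata": {"name": "training-gpu"},
    "spec": {
        "devices": {
            "requests": [
                {
                    "name": "gpu",
                    "firstAvailable": [
                        {"name": "h100", "deviceClassName": "h100.example.com"},
                        {"name": "a100", "deviceClassName": "a100.example.com"},
                    ],
                }
            ]
        }
    },
}


def pick_first_available(request, free_devices):
    """Return the name of the first subrequest whose class has a free device.

    Simplified model: `free_devices` maps device class name -> free count.
    """
    for sub in request["firstAvailable"]:
        if free_devices.get(sub["deviceClassName"], 0) > 0:
            return sub["name"]
    return None  # nothing satisfiable; the Pod would stay pending


request = claim["spec"]["devices"]["requests"][0]
choice = pick_first_available(
    request, {"h100.example.com": 0, "a100.example.com": 2}
)
print(json.dumps(claim, indent=2))
print("selected subrequest:", choice)
```

Because JSON is valid YAML, the printed manifest could be saved to a file and inspected or adapted before applying it to a test cluster.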
+
+**Extended Resource Support (Beta)**
+
+As DRA becomes the standard for resource allocation, bridging the gap with legacy systems
+is crucial. The DRA
+[Extended Resource](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#extended-resource)
+feature lets users request DRA-managed devices through traditional extended resource requests on a Pod.
+This enables a gradual transition: cluster operators can migrate clusters
+to DRA while application developers adopt the ResourceClaim API on their own schedule.
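
To make the migration path more concrete, here is a minimal sketch in Python of the two objects involved: a DeviceClass that advertises an extended resource name, and a Pod that keeps requesting that name the traditional way. It assumes the DeviceClass field is called `extendedResourceName`; the driver name, resource name, image, and the `backing_device_classes` helper are hypothetical and exist only for illustration.

```python
import json

# Hypothetical DeviceClass mapping an extended resource name onto DRA devices.
device_class = {
    "apiVersion": "resource.k8s.io/v1",
    "kind": "DeviceClass",
    "metadata": {"name": "gpu.example.com"},
    "spec": {
        "selectors": [
            {"cel": {"expression": "device.driver == 'gpu.example.com'"}}
        ],
        # Assumed field name; check the feature documentation for your version.
        "extendedResourceName": "example.com/gpu",
    },
}

# The application Pod keeps its familiar extended resource request.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "legacy-app"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "registry.example.com/app:latest",
                "resources": {"limits": {"example.com/gpu": "1"}},
            }
        ]
    },
}


def backing_device_classes(pod_manifest, device_classes):
    """Return names of DeviceClasses whose extended resource the Pod requests."""
    requested = set()
    for container in pod_manifest["spec"]["containers"]:
        resources = container.get("resources", {})
        requested.update(resources.get("limits", {}))
        requested.update(resources.get("requests", {}))
    return [
        dc["metadata"]["name"]
        for dc in device_classes
        if dc["spec"].get("extendedResourceName") in requested
    ]


matches = backing_device_classes(pod, [device_class])
print(json.dumps(device_class, indent=2))
print("DeviceClasses backing this Pod:", matches)
```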
+
+**Partitionable Devices (Beta)**
+
+Hardware accelerators are powerful, and sometimes a single workload doesn't need an
+entire device. The
+[Partitionable Devices](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#partitionable-devices)
+feature provides native DRA support for dynamically carving physical hardware into smaller,
+logical instances (such as Multi-Instance GPUs) based on workload demands. This allows
+administrators to safely and efficiently share expensive accelerators across multiple Pods.
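
One way to picture partitioning is as a set of counters shared by every partition carved from the same physical device. The Python sketch below is a toy model of that idea, not the scheduler's implementation; the counter names and partition profiles are hypothetical, loosely inspired by Multi-Instance GPU layouts.

```python
from dataclasses import dataclass


@dataclass
class CounterSet:
    """Capacity shared by all partitions of one physical device."""

    remaining: dict  # counter name -> remaining units

    def can_fit(self, consumes):
        """True if every counter has enough capacity left."""
        return all(self.remaining.get(k, 0) >= v for k, v in consumes.items())

    def allocate(self, consumes):
        """Reserve capacity for a partition; return False if it does not fit."""
        if not self.can_fit(consumes):
            return False
        for name, amount in consumes.items():
            self.remaining[name] -= amount
        return True


# Hypothetical partition profiles and what each one consumes.
PROFILES = {
    "small": {"memory_gib": 5, "compute_slices": 1},
    "medium": {"memory_gib": 20, "compute_slices": 3},
    "full": {"memory_gib": 40, "compute_slices": 7},
}

gpu = CounterSet(remaining={"memory_gib": 40, "compute_slices": 7})
results = [
    gpu.allocate(PROFILES["medium"]),  # fits
    gpu.allocate(PROFILES["medium"]),  # fits, memory is now exhausted
    gpu.allocate(PROFILES["small"]),   # rejected: no memory left
]
print(results, gpu.remaining)
```

The point of the model is that partitions are not independent devices: allocating one changes which others can still be offered from the same hardware.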
+
+**Device Taints (Beta)**
+
+Just as you can taint a Kubernetes Node, you can now apply taints directly to specific DRA
+devices.
+[Device Taints and Tolerations](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-taints-and-tolerations)
+empower cluster administrators to manage hardware more effectively. You can taint faulty
+devices to prevent them from being allocated to standard claims, or reserve specific hardware
+for dedicated teams, specialized workloads, and experiments. Ultimately, only Pods with
+matching tolerations are permitted to claim these tainted devices.
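
The matching rule can be sketched in a few lines of Python. This is a simplified model based on the familiar node taint and toleration semantics, under the assumption that device taints behave analogously; the taint key and the helper names are hypothetical.

```python
def tolerates(toleration, taint):
    """Simplified check of whether one toleration matches one taint."""
    # An effect on the toleration must match; an empty effect matches any.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key combined with Exists matches every taint key.
        return toleration.get("key") in (None, "", taint["key"])
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def device_allowed(device_taints, tolerations):
    """A device is usable only if every one of its taints is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in device_taints
    )


faulty = {"key": "example.com/faulty", "value": "true", "effect": "NoSchedule"}
debug_toleration = {
    "key": "example.com/faulty",
    "operator": "Exists",
    "effect": "NoSchedule",
}

print(device_allowed([faulty], []))                  # standard claim
print(device_allowed([faulty], [debug_toleration]))  # claim that tolerates it
```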
+
+**Device Binding Conditions (Beta)**
+
+To improve scheduling reliability, the Kubernetes scheduler can now use the
+[Binding Conditions](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-binding-conditions)
+feature to delay committing a Pod to a Node until its required external resources (such as attachable
+devices or FPGAs) are fully prepared. By explicitly modeling resource readiness, this
+prevents premature assignments that can lead to Pod failures, ensuring a much more robust
+and predictable deployment process.
+
+**Resource Health Status (Beta)**
+
+Knowing when a device has failed or become unhealthy is critical for
+workloads running on specialized hardware. With
+[Resource Health Status](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-health-monitoring),
+Kubernetes can expose device health information directly in the Pod
+`.status`, through the `allocatedResourcesStatus` field of each container's status.
+
+When a compatible DRA driver detects that an allocated device has become
+unhealthy, it reports this back to the kubelet, which surfaces it in each
+container's status.
+
+In 1.36, the feature has graduated to beta (and is now enabled by default). There was
+a small change from alpha: an optional `message` field provides human-readable
+context about the health status, such as error details or failure reasons, making
+it easier to diagnose issues without diving into driver logs. DRA drivers
+can also configure per-device health check timeouts, allowing different
+hardware types to use appropriate timeout values based on their
+health reporting characteristics. This gives users and controllers
+crucial visibility to quickly identify and react to hardware failures.
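
A controller or script could consume this information roughly as follows. The Python sketch walks a Pod object (as a plain dictionary, for example parsed from `kubectl get pod -o json`) and collects devices that are not reported as healthy. The sample Pod, claim name, device IDs, and message text are made up, and the sketch assumes the `message` field sits next to `health` on each resource entry.

```python
def unhealthy_devices(pod):
    """Collect (container, resource, device, health, message) for bad devices."""
    findings = []
    for container in pod.get("status", {}).get("containerStatuses", []):
        for resource in container.get("allocatedResourcesStatus", []):
            for device in resource.get("resources", []):
                health = device.get("health", "Unknown")
                if health != "Healthy":
                    findings.append(
                        (
                            container["name"],
                            resource["name"],
                            device["resourceID"],
                            health,
                            device.get("message", ""),  # optional field
                        )
                    )
    return findings


# Hypothetical Pod status, trimmed to the relevant fields.
pod = {
    "metadata": {"name": "trainer-0"},
    "status": {
        "containerStatuses": [
            {
                "name": "trainer",
                "allocatedResourcesStatus": [
                    {
                        "name": "claim:gpu-claim",
                        "resources": [
                            {"resourceID": "gpu-0", "health": "Healthy"},
                            {
                                "resourceID": "gpu-1",
                                "health": "Unhealthy",
                                "message": "ECC error threshold exceeded",
                            },
                        ],
                    }
                ],
            }
        ]
    },
}

for finding in unhealthy_devices(pod):
    print(finding)
```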
+
+## New Features
+
+Beyond stabilizing existing capabilities, v1.36 introduces foundational new features
+that expand what DRA can do.
+
+**ResourceClaim Support for Workloads**
+
+To optimize large-scale AI/ML workloads that rely on strict topological scheduling, the 
+[ResourceClaim Support for Workloads](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#workload-resourceclaims)
+feature enables Kubernetes to seamlessly manage shared resources across massive sets
+of Pods. By associating ResourceClaims or ResourceClaimTemplates with PodGroups,
+this feature eliminates previous scaling bottlenecks, such as the limit on the
+number of pods that can share a claim, and removes the burden of manual claim
+management from specialized orchestrators.
+
+**DRA for Native Resources**
+
+Why should DRA only be for external accelerators? In v1.36, we are introducing the first
+iteration of using the DRA APIs to manage
+_node allocatable_ infrastructure resources (such as CPU and
+memory). By bringing CPU and memory allocation under the DRA umbrella with the DRA
+[Native Resources](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#node-allocatable-resources)
+feature, users can leverage DRA's advanced placement, NUMA-awareness, and prioritization
+semantics for standard compute resources, paving the way for incredibly fine-grained
+performance tuning.
+
+**DRA Resource Availability Visibility**
+
+One of the most requested features from cluster administrators has been better visibility
+into hardware capacity. The new
+[DRAResourcePoolStatus](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resource-pool-status)
+feature allows you to query the availability of devices in DRA resource pools. By creating a
+`ResourcePoolStatusRequest` object, you get a point-in-time snapshot of device counts
+— total, allocated, available, and unavailable — for each pool managed by a given
+driver. This enables better integration with dashboards and capacity planning tools.
+
+**List Types for Attributes**
+
+With
+[List Types for Attributes](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#list-type-attributes),
+DRA can represent device attributes as typed lists (`ints`, `bools`, `strings`, and `versions`), not
+just scalar values. This helps model real hardware topology, such as devices that belong
+to multiple PCIe roots or NUMA domains.
+
+This feature also extends ResourceClaim constraint behavior to work naturally
+with both scalar and list values: `matchAttribute` now checks for a non-empty
+intersection, and `distinctAttribute` checks for pairwise disjoint values.
+It also introduces the `includes()` function in DRA CEL, which lets device selectors keep working
+more easily when an attribute changes between scalar and list representations.
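
The constraint semantics described above can be modeled with plain set operations. The Python sketch below treats a scalar as a one-element list, which is an assumption made for illustration; the helper names are hypothetical stand-ins and do not reproduce the signature of the CEL `includes()` function.

```python
def as_set(value):
    """Treat a scalar attribute value as a one-element list."""
    if isinstance(value, (list, tuple, set)):
        return set(value)
    return {value}


def match_attribute(values):
    """True if all devices share at least one common value."""
    sets = [as_set(v) for v in values]
    return bool(sets) and bool(set.intersection(*sets))


def distinct_attribute(values):
    """True if no value appears on more than one device."""
    seen = set()
    for current in map(as_set, values):
        if seen & current:
            return False
        seen |= current
    return True


def includes(attribute, wanted):
    """Membership test that works for scalar and list attributes alike."""
    return wanted in as_set(attribute)


# One device reports a single PCIe root, the other reports two.
print(match_attribute(["pci0", ["pci0", "pci1"]]))   # shared root: pci0
print(match_attribute([["pci0"], ["pci1"]]))         # nothing in common
print(distinct_attribute([[0, 1], [2]]))             # disjoint NUMA domains
print(includes(["numa0", "numa1"], "numa1"))         # list representation
print(includes("numa0", "numa0"))                    # scalar representation
```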
+
+**Deterministic Device Selection via Lexicographical Sorting**
+
+The Kubernetes scheduler has been updated to evaluate devices using lexicographical
+ordering based on resource pool and ResourceSlice names. This change empowers drivers
+to proactively influence the scheduling process, leading to improved throughput and
+more optimal scheduling decisions. To support this capability, the ResourceSlice
+controller toolkit now automatically generates names that reflect the exact device
+ordering specified by the driver author.
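
The ordering can be approximated with an ordinary sort on a (pool name, slice name) key. The Python sketch below uses made-up pool and slice names, and assumes that devices within a slice keep the order in which the driver listed them. It also shows why zero-padded names matter when names are compared as strings.

```python
def evaluation_order(slices):
    """Return device names in lexicographical (pool, slice name) order."""
    ordered = sorted(slices, key=lambda s: (s["pool"], s["name"]))
    return [device for s in ordered for device in s["devices"]]


# Hypothetical slices, deliberately listed out of order.
slices = [
    {"pool": "pool-b", "name": "slice-0000", "devices": ["b0"]},
    {"pool": "pool-a", "name": "slice-0001", "devices": ["a2", "a3"]},
    {"pool": "pool-a", "name": "slice-0000", "devices": ["a0", "a1"]},
]
print(evaluation_order(slices))

# String comparison is character by character, so unpadded numbers misorder.
unpadded = sorted(["slice-2", "slice-10"])
padded = sorted(["slice-0002", "slice-0010"])
print(unpadded, padded)
```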
+
+**Discoverable Device Metadata in Containers**
+
+Workloads running with DRA devices often need to discover details about
+their allocated devices, such as PCI bus addresses or network
+interface configuration, without querying the Kubernetes API. With
+[DRA Device Metadata](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-metadata),
+Kubernetes defines a standard protocol for how DRA drivers expose device
+attributes to containers as versioned JSON files at well-known paths. Drivers
+built with the
+[DRA kubelet plugin library](https://pkg.go.dev/k8s.io/dynamic-resource-allocation/kubeletplugin)
+get this behavior transparently; they just provide the metadata and the
+library handles file layout, CDI bind-mounts, versioning, and lifecycle. This
+gives applications a consistent, driver-independent way to discover and
+consume device metadata, eliminating the need for custom controllers or
+looking up ResourceSlice objects to get metadata via attributes.
+
+## What’s next?
+
+This cycle introduced a wealth of new Dynamic Resource Allocation (DRA) features,
+and the momentum is only building. As we look ahead, our roadmap focuses on maturing
+existing features toward beta and stable releases while hardening DRA’s performance,
+scalability, and reliability. A key priority over the coming cycles will be deep
+integration with _workload-aware scheduling_ and with _topology-aware scheduling_.
+
+A big goal for us is to migrate the entire community to DRA, and we want
+you involved. Whether you are currently maintaining a driver or are just beginning
+to explore the possibilities, your input is vital. Partner with us to shape the next
+generation of resource management. Reach out today to collaborate on development,
+share feedback, or start building your first DRA driver.
+
+
+## Getting involved
+
+A good starting point is joining the WG Device Management 
+[Slack channel](https://kubernetes.slack.com/archives/C0409NGC1TK) and
+[meetings](https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?tab=t.0#heading=h.tgg8gganowxq),
+which happen at US/EU- and EU/APAC-friendly time slots.
+
+Not all enhancement ideas are tracked as issues yet, so come talk to us if you want to help or have some ideas yourself!
+We have work to do at all levels, from difficult core changes to usability enhancements in kubectl, which could be picked up by newcomers.