Blog post for DRA updates in 1.36 by mortent · Pull Request #54567 · kubernetes/website

mortent · 2026-02-20T21:37:24Z

Description

This is a PR for the blog post covering DRA updates for 1.36. We plan a single blog post covering all DRA updates rather than individual blog posts for each feature.

Issue

netlify · 2026-02-20T21:48:05Z

✅ Pull request preview available for checking

Built without sensitive environment variables

Name	Link
🔨 Latest commit	`a1ea462`
🔍 Latest deploy log	https://app.netlify.com/projects/kubernetes-io-main-staging/deploys/69d998c32d35160007ed2ef4
😎 Deploy Preview	https://deploy-preview-54567--kubernetes-io-main-staging.netlify.app

lmktfy · 2026-02-21T11:14:36Z

/area blog

lmktfy · 2026-02-21T11:16:11Z

This PR should target main (all PRs that add blog articles should target main)

nmn3m · 2026-02-25T00:38:48Z

/cc @nmn3m

harche · 2026-03-09T18:13:47Z

Hi @mortent, we're planning to fold our Resource Health Status feature (KEP-4680) into this umbrella blog post instead of maintaining a separate one (#54534).

KEP-4680 is reaching Beta in v1.36. It exposes device health information from Device Plugin and DRA in Pod Status. Let us know if you'd like us to contribute a section or provide any input for the post.

mortent · 2026-03-17T22:27:47Z

/wg device-management

SergeyKanzhelev · 2026-03-26T21:31:28Z

content/en/blog/_posts/2026/dra-136-update.md

+author: >
+  The DRA team
+---
+


It would be great to include some information on adoption and on the gaps that remain compared to Device Plugin, and maybe a few words on the available DRA drivers, so that end users can make sense of this blog post.

Added a little section about the availability of drivers. I'm a little worried that by mentioning some drivers here, we might be forgetting others that also should be included. But I can ask in the device management chat if someone knows about other drivers that should be included.

I need to think a bit more about the gaps vs Device Plugin.

I can't think of anything that can be done by a Device Plugin that cannot also be done with a DRA driver. Resource Health Status may have been the last remaining gap.

Is there auto-tainting of "broken" devices? Yes, it seems like health tracking is the last thing.

everpeace · 2026-03-26T21:45:46Z

content/en/blog/_posts/2026/dra-136-update.md

+more optimal scheduling decisions. To support this capability, the ResourceSlice
+controller toolkit now automatically generates names that reflect the exact device
+ordering specified by the driver author.
+


I want to include kubernetes/enhancements#5491 if it's worth putting in the feature blog.

ref: docs PR is #54561

Suggested change

**List Types for Attributes**

With
[List Types for Attributes](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#list-type-attributes),
DRA can represent device attributes as typed lists (int, bool, string, and
version), not just scalar values. This helps model real hardware topology, such
as devices that belong to multiple PCIe roots or NUMA domains.

This feature also extends `ResourceClaim` constraint behavior to work naturally
with both scalar and list values: `matchAttribute` now checks for a non-empty
intersection, and `distinctAttribute` checks for pairwise disjoint values.
It also introduces the `includes()` function in DRA CEL, which lets device selectors keep working
more easily when an attribute changes between scalar and list representations.

Sorry I forgot this one, it is definitely worth including. Added your suggestion.

Similar to my comment on #54567 (comment), do you think we could make it a bit more focused on just the benefits of the feature and leave some of the details to the DRA documentation? And see if we can keep it to a single paragraph?

@everpeace Could you take a look at updating the description to align a bit more with the other features, ref my previous comment?
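The constraint semantics described in the List Types for Attributes suggestion above can be illustrated with a small Python sketch. This is not the scheduler's implementation: the function names and the two-argument shape of `includes` are hypothetical, and it assumes that "non-empty intersection" means one value shared by all selected devices, with a scalar treated as a one-element list.

```python
from itertools import combinations


def as_set(value):
    """Treat a scalar attribute as a one-element set so that scalar and
    list attributes can be compared uniformly (assumption of this sketch)."""
    if isinstance(value, list):
        return set(value)
    return {value}


def match_attribute(values):
    """matchAttribute-style check: every device shares at least one common
    value, i.e. the intersection of all value sets is non-empty."""
    sets = [as_set(v) for v in values]
    if not sets:
        return True
    return bool(set.intersection(*sets))


def distinct_attribute(values):
    """distinctAttribute-style check: the value sets of any two devices
    are disjoint."""
    sets = [as_set(v) for v in values]
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))


def includes(value, item):
    """Hypothetical stand-in for the CEL includes() helper: works whether
    the attribute is published as a scalar or as a list."""
    return item in as_set(value)


# One device reports a single PCIe root, the other reports two roots.
print(match_attribute(["pci0", ["pci0", "pci1"]]))
print(distinct_attribute([["numa0"], ["numa1", "numa2"]]))
print(includes(["pci0", "pci1"], "pci1"))
```

Normalizing scalars to one-element sets is what lets the same check keep working when a driver switches an attribute from a scalar to a list.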

lmktfy · 2026-03-26T22:27:58Z

content/en/blog/_posts/2026/dra-136-update.md

@@ -0,0 +1,134 @@
+---
+layout: blog
+title: "Kubernetes v1.36: DRA has graduated to GA"


Isn't it already GA?

Yeah, just forgot to update this when I used the template from a previous post. I've updated the title now, but open to better alternatives.

harche · 2026-03-27T14:13:32Z

content/en/blog/_posts/2026/dra-136-update.md

+devices or FPGAs—are fully prepared. By explicitly modeling resource readiness, this
+prevents premature assignments that can lead to Pod failures, ensuring a much more robust
+and predictable deployment process.
+


I want to include kubernetes/enhancements#4680 in the feature blog.

ref: docs PR is #54420

Suggested change

**Resource Health Status (Beta)**

Knowing when a device has failed or become unhealthy is critical for
workloads running on specialized hardware. With
[Resource Health Status](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-health-monitoring),
Kubernetes now exposes device health information directly in the Pod
Status through the `allocatedResourcesStatus` field. When a DRA driver
detects that an allocated device has become unhealthy, it reports this
back to the kubelet, which surfaces it in each container's status.

In 1.36, the feature graduates to beta (enabled by default) and adds
an optional `message` field providing human readable context about the
health status, such as error details or failure reasons. DRA drivers
can also configure per device health check timeouts, allowing different
hardware types to use appropriate timeout values based on their
health reporting characteristics. This gives users and controllers
crucial visibility to quickly identify and react to hardware failures.

So I've added your proposal for now, but do you think we can shorten it a bit and make it just one paragraph? There is a large number of features and we don't want the blog post to be too long. Focus just on the benefits of this feature and what it enables and leave the details to the DRA docs which we link to. Also, including that it is graduating to beta in 1.36 is already given from the context.

Sorry I forgot to add this in the first draft, it is of course something we should include in the blog.

@harche Could you take a look at this? Currently the description of this feature gets into quite a bit more detail than the other descriptions and I think some of it can be left to the documentation.

Thanks @mortent, does this look better? #54567 (comment)
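To make the `allocatedResourcesStatus` description in this thread concrete, here is a minimal Python sketch of how a controller or script might pick out non-healthy devices from a Pod object that has already been decoded from JSON. The nested layout (`containerStatuses[].allocatedResourcesStatus[].resources[]` with `resourceID`, `health`, and the optional `message`) reflects my reading of the Pod API and should be checked against the API reference; the sample Pod, the claim name, and the function name are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def non_healthy_resources(pod: Dict[str, Any]) -> List[Tuple[str, str, str, str]]:
    """Return (container, resource name, resource ID, message) for every
    allocated resource whose reported health is anything but "Healthy"."""
    findings = []
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for container in statuses:
        for resource in container.get("allocatedResourcesStatus") or []:
            for entry in resource.get("resources") or []:
                if entry.get("health") != "Healthy":
                    findings.append((
                        container.get("name", ""),
                        resource.get("name", ""),
                        entry.get("resourceID", ""),
                        entry.get("message", ""),  # optional field
                    ))
    return findings


# Illustrative data only: names, IDs, and the message are invented.
SAMPLE_POD = {
    "status": {
        "containerStatuses": [{
            "name": "trainer",
            "allocatedResourcesStatus": [{
                "name": "claim:gpu-claim",
                "resources": [
                    {"resourceID": "gpu-0", "health": "Healthy"},
                    {"resourceID": "gpu-1", "health": "Unhealthy",
                     "message": "ECC error threshold exceeded"},
                ],
            }],
        }],
    },
}

for finding in non_healthy_resources(SAMPLE_POD):
    print(finding)
```

The sketch treats any value other than `Healthy` as worth reporting, so a device whose health is unknown is listed too; a stricter consumer could filter on `Unhealthy` only.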

lmktfy · 2026-03-31T18:15:10Z

/remove-area localization
/remove-language ja
/remove-language ko
/remove-language pl
/remove-language zh

pohly

Overall looks good to me, thanks for putting this together.

pohly · 2026-04-07T09:48:22Z

content/en/blog/_posts/2026/dra-136-update.md

+[NVIDIA GPU](https://github.com/NVIDIA/k8s-dra-driver-gpu)
+and Google TPU DRA drivers are being transferred to the Kubernetes project, joining the
+[DRANET](https://github.com/kubernetes-sigs/dranet)
+driver that was added last year.


Calling those out seems reasonable for a blog post because this is newsworthy.

We could link to https://github.com/kubernetes-sigs/wg-device-management/tree/main/device-ecosystem but I'll defer to SIG Docs about that.

pohly · 2026-04-07T09:51:12Z

content/en/blog/_posts/2026/dra-136-update.md

+Why should DRA only be for external accelerators? In v1.36, we are introducing the first
+iterations of using the DRA API to manage Kubernetes native resources (like CPU and
+memory). By bringing CPU and memory allocation under the DRA umbrella with the DRA
+[Native Resources](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#node-allocatable-resources)


Should this be called "Node Allocatable Resources" instead of "Native Resources"?

cc @pravk03

I think I used the name from the KEP, but I see that #54598 uses the term node allocatable resources. @pravk03 I assume node allocatable resources is the preferred name here? And do we spell that with a hyphen or not (the docs seem inconsistent on this)?

Yes, we have renamed this to node allocatable resources.

Following the convention in the Node Allocatable and Resource Management documentation, we can omit the hyphen. I’ll update the KEP and docs to keep it consistent.

SergeyKanzhelev · 2026-04-07T18:01:27Z

content/en/blog/_posts/2026/dra-136-update.md

+This allows for a gradual transition to DRA, meaning application developers and
+operators are not forced to immediately migrate their workloads to the ResourceClaim
+API.


Does "operators" here mean DevOps? I am wondering if we need to highlight that this is only a consumption API. All the management and monitoring must be done differently with DRA.

Section has been updated.

SergeyKanzhelev · 2026-04-07T18:03:04Z

content/en/blog/_posts/2026/dra-136-update.md

+[Extended Resource](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#extended-resource)
+feature allows users to request resources via traditional extended resources on a Pod.
+This allows for a gradual transition to DRA, meaning application developers and
+operators are not forced to immediately migrate their workloads to the ResourceClaim


slight rephrase of "not forced to" to something like "continue using familiar API while exploring all the benefits of a new API" would be better.

I've rewritten this section a bit. Let me know if you think it looks better now.

SergeyKanzhelev · 2026-04-07T18:08:08Z

content/en/blog/_posts/2026/dra-136-update.md

+[Partitionable Devices](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#partitionable-devices)
+feature, provides native DRA support for carving physical hardware into smaller,
+logical instances (such as Multi-Instance GPUs). This allows administrators to
+safely and efficiently share expensive accelerators across multiple Pods.


I think the key here is that it is dynamic, while safe and efficient. So the partitioning can dynamically change based on workload demands

Agree. I've reworked the section a little bit.

SergeyKanzhelev · 2026-04-07T18:13:06Z

content/en/blog/_posts/2026/dra-136-update.md

+controller toolkit now automatically generates names that reflect the exact device
+ordering specified by the driver author.
+
+## What’s next?


Should we say that the big priority is to migrate the community to DRA? And also make it a call to action?

Good point. I've added a small section for this.

alaypatel07 · 2026-04-10T13:35:49Z

content/en/blog/_posts/2026/dra-136-update.md

+health reporting characteristics. This gives users and controllers crucial visibility
+to quickly identify and react to hardware failures.
+
+## New Features


kubernetes/enhancements#5304

This enhancement was added in 1.36, so I wonder if this section should contain a sub-section for it.

cc @pohly

Let's add it. Can you suggest something?

Sure, added a suggestion here: https://github.com/kubernetes/website/pull/54567/changes#r3076822230

harche · 2026-04-13T18:26:53Z

content/en/blog/_posts/2026/dra-136-update.md

+**Resource Health Status (Beta)**
+
+Knowing when a device has failed or become unhealthy is critical for workloads running on
+specialized hardware. With
+[Resource Health Status](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-health-monitoring),
+Kubernetes now exposes device health information directly in the Pod Status through the
+`allocatedResourcesStatus` field. When a DRA driver detects that an allocated device
+has become unhealthy, it reports this back to the kubelet, which surfaces it in each
+container's status.
+
+In 1.36, the feature graduates to beta (enabled by default) and adds an optional `message`
+field providing human readable context about the health status, such as error details or
+failure reasons. DRA drivers can also configure per device health check timeouts,
+allowing different hardware types to use appropriate timeout values based on their
+health reporting characteristics. This gives users and controllers crucial visibility
+to quickly identify and react to hardware failures.


Suggested change

**Resource Health Status (Beta)**

Knowing when a device has failed or become unhealthy is critical for workloads running on
specialized hardware. With
[Resource Health Status](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-health-monitoring),
Kubernetes now exposes device health information directly in Pod Status, giving users and
controllers crucial visibility to quickly identify and react to hardware failures. In 1.36,
the feature graduates to beta (enabled by default) and adds support for human readable
health status messages, making it easier to diagnose issues without diving into driver logs.

alaypatel07 · 2026-04-14T02:32:49Z

content/en/blog/_posts/2026/dra-136-update.md

+controller toolkit now automatically generates names that reflect the exact device
+ordering specified by the driver author.
+
+## What’s next?


Suggested change

## What’s next?

**Discoverable Device Metadata in Containers**

Workloads running with DRA devices often need to discover details about
their allocated devices, such as PCI bus addresses or network
interface configuration, without querying the Kubernetes API. With
[DRA Device Metadata](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-metadata),
Kubernetes defines a standard protocol for how DRA drivers expose device
attributes to containers as versioned JSON files at well-known paths. Drivers
built with the
[DRA kubelet plugin library](https://pkg.go.dev/k8s.io/dynamic-resource-allocation/kubeletplugin)
get this behavior transparently; they just provide the metadata and the
library handles file layout, CDI bind-mounts, versioning, and lifecycle. This
gives applications a consistent, driver-independent way to discover and
consume device metadata, eliminating the need for custom controllers or
for looking up ResourceSlice objects to get metadata via attributes.

## What’s next?

k8s-ci-robot added this to the 1.36 milestone Feb 20, 2026

k8s-ci-robot added the do-not-merge/work-in-progress, size/XS, and cncf-cla: yes labels Feb 20, 2026

k8s-ci-robot added the area/blog label Feb 21, 2026

mortent mentioned this pull request Feb 23, 2026

DRA: Handle extended resource requests via DRA Driver kubernetes/enhancements#5004

Open

12 tasks

k8s-ci-robot requested a review from nmn3m February 25, 2026 00:38

nmn3m mentioned this pull request Mar 3, 2026

DRA: Resource Availability Visibility kubernetes/enhancements#5677

Open

6 tasks

This was referenced Mar 7, 2026

[WIP] Add HPA fallback external metrics blog #54649

Draft

Document DRA Device Binding Conditions in v1.36 #54541

Merged

harche mentioned this pull request Mar 9, 2026

Blog: KEP-4680 Resource Health Status reaches Beta in v1.36 #54534

Closed

mortent force-pushed the DRABlog136 branch from d44bbd2 to 9b73102 Compare March 17, 2026 22:26

mortent changed the base branch from dev-1.36 to main March 17, 2026 22:26

k8s-ci-robot added the size/XS label and removed the size/XXL label Mar 17, 2026

SergeyKanzhelev reviewed Mar 26, 2026

everpeace reviewed Mar 26, 2026

lmktfy reviewed Mar 26, 2026

harche reviewed Mar 27, 2026

nmn3m reviewed Mar 28, 2026

ttsuuubasa reviewed Mar 31, 2026

bart0sh reviewed Mar 31, 2026

Addressed comments

b0eea65

mortent force-pushed the DRABlog136 branch from 2c6595c to b0eea65 Compare April 1, 2026 00:39

nojnhuh reviewed Apr 1, 2026

troychiu reviewed Apr 1, 2026

Addressed comments

b89223c

pohly moved this from 🏗 In progress to 👀 In review in Dynamic Resource Allocation Apr 7, 2026

pohly suggested changes Apr 7, 2026

nojnhuh reviewed Apr 7, 2026

SergeyKanzhelev reviewed Apr 7, 2026

SergeyKanzhelev reviewed Apr 7, 2026

alaypatel07 reviewed Apr 10, 2026

Addressed comments

a1ea462

harche reviewed Apr 13, 2026

alaypatel07 reviewed Apr 14, 2026
