@@ -530,7 +530,7 @@ spec:
530530Provided that the device class gpu.example.com is mapped to the extended
531531resource example.com/gpu.
532532```yaml
533- apiVersion: resource.k8s.io/v1beta1
533+ apiVersion: resource.k8s.io/v1
534534kind: DeviceClass
535535metadata:
536536 name: gpu.example.com
@@ -1099,7 +1099,11 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
10991099 - Impact of its outage on the feature:
11001100 - Impact of its degraded performance or high-error rates on the feature:
11011101-->
1102- No.
1102+ The container runtime must support CDI.
1103+
1104+ A third-party DRA driver is required for publishing resource information and preparing resources on a node.
1105+
1106+ These are not new requirements introduced by this feature; they are already required by DRA structured parameters.
11031107
11041108### Scalability
11051109
@@ -1146,10 +1150,14 @@ The Troubleshooting section currently serves the `Playbook` role. We may conside
11461150splitting it into a dedicated `Playbook` document (potentially with some monitoring
11471151details). For now, we leave it here.
11481152-->
1153+ The troubleshooting section in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters#troubleshooting
1154+ still applies.
11491155
11501156###### How does this feature react if the API server and/or etcd is unavailable?
11511157
1152- Will be considered for beta.
1158+ The Kubernetes control plane will be down, so no new Pods get scheduled. The kubelet may
1159+ still be able to start or restart containers if it has already received all the relevant
1160+ updates (Pod, ResourceClaim, etc.).
11531161
11541162###### What are other known failure modes?
11551163
@@ -1169,15 +1177,14 @@ For each of them, fill in the following information by copying the below templat
11691177 - Detection: inspect pod status 'Pending'
11701178 - Mitigations: reduce the number of devices requested in one extended resource backed by DRA requests
11711179 - Diagnostics: scheduler logs at level 5 show the reason for the scheduling failure.
1172- - Testing: Will be considered for beta.
1180+ - Testing: this is a known, deterministic failure mode caused by a defined system limit, i.e., DRA requests must be for no more than 128 devices.
11731181
11741182###### What steps should be taken if SLOs are not being met to determine the problem?
11751183
1176- Will be considered for beta.
1177-
11781184## Implementation History
11791185
11801186- Kubernetes 1.34: KEP accepted.
1187+ - Kubernetes 1.35: promotion to beta.
11811188
11821189## Drawbacks
11831190