@@ -901,15 +901,37 @@ feature flags will be enabled on some API servers and not others during the
901901rollout. Similarly, consider large clusters and how enablement/disablement
902902will rollout across nodes.
903903-->
904- Will be considered for beta.
904+ Workloads that do not use the DRA Extended Resource feature should not be impacted,
905+ since the functionality is unchanged.
906+
907+ If pods use the feature before support for it has been fully rolled out across the
908+ cluster (API server and scheduler in the control plane, kubelet on the nodes), pods
909+ can fail to be scheduled or fail to run on the nodes.
910+ This will not affect already running workloads unless they have to be restarted.
911+
912+ Device plugin drivers can be replaced with DRA drivers for the same devices on a
913+ per-node basis, one node at a time.
905914
906915# ##### What specific metrics should inform a rollback?
907916
908917<!--
909918What signals should users be paying attention to when the feature is young
910919that might indicate a serious problem?
911920-->
912- Will be considered for beta.
921+ One indicator is unexpected restarts of the cluster control plane components
922+ (kube-scheduler, kube-apiserver) or kubelet.
923+
924+ If the scheduler_pending_pods metric in the kube-scheduler suddenly increases, it can
925+ suggest that pods are no longer getting scheduled, which might be due to a problem with
926+ the DRA scheduler plugin. Another indicator is an increase in the number of pods that fail to start,
927+ as indicated by the kubelet_started_containers_errors_total metric.
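
As a rough illustration of what a "sudden increase" in a gauge such as scheduler_pending_pods could mean in practice, the sketch below compares the latest reading against a short rolling baseline. The function name and the `baseline_window`, `factor`, and `min_delta` thresholds are hypothetical and not part of the KEP or of any Kubernetes tooling. They would need tuning per cluster.

```python
from statistics import mean


def sudden_increase(samples, baseline_window=5, factor=2.0, min_delta=10):
    """Return True if the latest sample jumps well above the recent baseline.

    samples: chronological gauge readings, e.g. scraped scheduler_pending_pods.
    baseline_window, factor, min_delta: hypothetical tuning knobs (assumptions).
    """
    if len(samples) < baseline_window + 1:
        return False  # not enough history to judge

    # Baseline is the mean of the readings just before the latest one.
    baseline = mean(samples[-(baseline_window + 1):-1])
    latest = samples[-1]

    # Require both an absolute and a relative jump to reduce noise.
    return latest - baseline >= min_delta and latest >= factor * baseline
```

A positive result is only a hint to look closer. It does not by itself show that the DRA scheduler plugin is at fault.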
928+
929+ If the node.status.Capacity for the extended resources for the devices does not decrease to zero,
930+ or a pod fails to be scheduled or to run on the node, it may indicate that the device plugin driver
931+ on the node for the devices has not been properly replaced by the DRA driver.
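
The capacity check described above can be sketched as a small helper that inspects a node's capacity map after the switch to the DRA driver. This is a minimal sketch: the function name and the resource name `example.com/gpu` are hypothetical, and it assumes the extended resource quantities are plain integer strings.

```python
def lingering_extended_resources(capacity, extended_resources):
    """Return the extended resources still advertised with non-zero capacity.

    capacity: the node.status.capacity map (resource name -> quantity string).
    extended_resources: names expected to drop to zero once the device
    plugin driver has been replaced (hypothetical input, chosen by the operator).
    """
    lingering = []
    for name in extended_resources:
        # A missing entry is treated the same as a capacity of zero.
        quantity = capacity.get(name, "0")
        if int(quantity) != 0:
            lingering.append(name)
    return lingering


# Example: the device plugin still advertises 4 devices on this node.
node_capacity = {"cpu": "8", "example.com/gpu": "4"}
print(lingering_extended_resources(node_capacity, ["example.com/gpu"]))
```

A non-empty result suggests the device plugin driver is still advertising devices on that node. Logs and pod events are still needed to confirm the cause.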
932+
933+ In all cases further analysis of logs and pod events is needed to determine whether
934+ errors are related to this feature.
913935
914936# ##### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
915937
@@ -918,14 +940,17 @@ Describe manual testing that was done and the outcomes.
918940Longer term, we may want to require automated upgrade/rollback tests, but we
919941are missing a bunch of machinery and tooling and can't do that now.
920942-->
921- Will be considered for beta.
943+ This will be covered by automated tests before the transition to beta, by bringing up a KinD cluster and
944+ changing the feature gate for individual components.
945+
946+ Roundtripping of API types is covered by unit tests.
922947
923948# ##### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
924949
925950<!--
926951Even if applying deprecation policies, they may still surprise some users.
927952-->
928- Will be considered for beta.
953+ No
929954
930955# ## Monitoring Requirements
931956