@@ -599,7 +599,7 @@ extended resource backed by DRA requests.
 This registers all cluster events that might make an unschedulable pod schedulable,
 like finishing the allocation of a claim, or resource slice updates.

-The existing dynamicresource plugin has registered almost all the events needed or
+The existing dynamicresource plugin has registered almost all the events needed for
 extended resource backed by DRA, with one addition `framework.UpdateNodeAllocatable`
 for node action.

@@ -969,7 +969,13 @@ Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
 checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
-Will be considered for beta.
+`kube_pod_resource_limit` and `kube_pod_resource_request`
+(labels: `namespace`, `pod`, `node`, `scheduler`, `priority`, **`resource`**, `unit`)
+can be used to determine whether workloads are using the feature, though they do not differentiate
+between extended resources backed by DRA and those backed by a device plugin.
+
+`resourceclaim_controller_resource_claims` (labels: `admin_access`, `allocated`, `source`)
+should be a good metric for determining whether a resource claim was created for an extended resource backed by DRA.

 ###### How can someone using this feature know that it is working for their instance?

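As a rough illustration of the usage check described above, the Go sketch below scans scraped `kube_pod_resource_request` samples in the Prometheus text exposition format and lists the pods that request a given extended resource. The sample data, the resource name `example.com/gpu`, and the helper `podsRequesting` are hypothetical, and the regex parser is deliberately minimal (no label-value escaping, no timestamps). As noted above, this check cannot tell DRA-backed extended resources apart from device-plugin ones.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// sampleRe matches one sample line of the form name{labels} value.
// Hypothetical minimal parser: no escaped quotes and no timestamps.
var sampleRe = regexp.MustCompile(`^(\w+)\{([^}]*)\}\s+(\S+)$`)

// labelRe matches a single key="value" pair inside a label set.
var labelRe = regexp.MustCompile(`(\w+)="([^"]*)"`)

// podsRequesting returns "namespace/pod" for every kube_pod_resource_request
// sample whose resource label equals resourceName.
func podsRequesting(exposition, resourceName string) []string {
	var pods []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and # HELP / # TYPE comments.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		m := sampleRe.FindStringSubmatch(line)
		if m == nil || m[1] != "kube_pod_resource_request" {
			continue
		}
		labels := map[string]string{}
		for _, kv := range labelRe.FindAllStringSubmatch(m[2], -1) {
			labels[kv[1]] = kv[2]
		}
		if labels["resource"] == resourceName {
			pods = append(pods, labels["namespace"]+"/"+labels["pod"])
		}
	}
	return pods
}

func main() {
	// Hypothetical scrape output; real label values depend on the cluster.
	exposition := `# TYPE kube_pod_resource_request gauge
kube_pod_resource_request{namespace="ml",node="n1",pod="train-0",priority="0",resource="example.com/gpu",scheduler="default-scheduler",unit=""} 2
kube_pod_resource_request{namespace="ml",node="n1",pod="train-0",priority="0",resource="cpu",scheduler="default-scheduler",unit="cores"} 4
kube_pod_resource_request{namespace="web",node="n2",pod="api-1",priority="0",resource="memory",scheduler="default-scheduler",unit="bytes"} 1e+09`
	fmt.Println(podsRequesting(exposition, "example.com/gpu"))
}
```

In practice the same filter would more likely be written as a PromQL label matcher on `resource`. The Go version only makes the matching logic explicit.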
@@ -989,7 +995,9 @@ Recall that end users cannot usually observe component logs or access metrics.
 - [ ] Other (treat as last resort)
   - Details:
 -->
-Will be considered for beta.
+- [x] API .status
+  - Other field: `.status.extendedResourceClaimStatus` will have a list of resource claims that are created for
+    DRA extended resources.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

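To make the `.status` check above concrete, here is a minimal Go sketch that decodes a pod's JSON (for example, the output of `kubectl get pod <name> -o json`) and reports whether `.status.extendedResourceClaimStatus` is populated. It keeps the field as opaque raw JSON so it does not depend on the field's exact schema. The helper name `usesDRAExtendedResources` and the nested contents of the sample pods are hypothetical placeholders, not the real API shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podStatusView decodes only the part of a Pod needed here. The field is kept
// as raw JSON so this sketch does not assume its exact schema.
type podStatusView struct {
	Status struct {
		ExtendedResourceClaimStatus json.RawMessage `json:"extendedResourceClaimStatus"`
	} `json:"status"`
}

// usesDRAExtendedResources reports whether the pod JSON carries a non-null
// .status.extendedResourceClaimStatus field.
func usesDRAExtendedResources(podJSON []byte) (bool, error) {
	var p podStatusView
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return false, err
	}
	raw := p.Status.ExtendedResourceClaimStatus
	// An absent field leaves raw empty; an explicit null is kept as "null".
	return len(raw) > 0 && string(raw) != "null", nil
}

func main() {
	// Made-up pods; the nested field contents are placeholders only.
	withClaim := []byte(`{"metadata":{"name":"train-0"},"status":{"phase":"Running","extendedResourceClaimStatus":{"placeholder":true}}}`)
	withoutClaim := []byte(`{"metadata":{"name":"web-1"},"status":{"phase":"Running"}}`)
	for _, pod := range [][]byte{withClaim, withoutClaim} {
		ok, err := usesDRAExtendedResources(pod)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println(ok)
	}
}
```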
@@ -1007,7 +1015,8 @@ high level (needs more precise definitions) those may be things like:
 These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->
-Will be considered for beta.
+Existing DRA and related SLOs continue to apply.
+Pod scheduling duration with this feature should be on par with that of pods using existing DRA resource claims.

 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

@@ -1021,15 +1030,22 @@ Pick one more of these and delete the rest.
 - [ ] Other (treat as last resort)
   - Details:
 -->
-Will be considered for beta.
+These are the same as for the main DRA feature:
+
+- [x] Metrics
+  - Metric name: `resourceclaim_controller_creates_total`
+  - Metric name: `resourceclaim_controller_resource_claims`
+  - Metric name: `workqueue` with `name="resource_claim"`
+  - Metric name: `scheduler_pending_pods`
+  - Metric name: `scheduler_plugin_execution_duration_seconds`

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

 <!--
 Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
 implementation difficulties, etc.).
 -->
-Will be considered for beta.
+No.

 ### Dependencies
