keps/sig-node/3695-pod-resources-for-dra/README.md
24 additions & 18 deletions
@@ -102,7 +102,7 @@ Additionally, we propose adding a `Get()` method to the existing gRPC service
 to allow querying specific pods for their allocated resources.

 **Note:** The new `Get()` call is a strict subset of the `List()` call (which
-returns the list of PodResources for *all* pods acrosss *all* namespaces in the
+returns the list of PodResources for *all* pods across *all* namespaces in the
 cluster). That is, it allows one to specify a specific pod and namespace to
 retrieve PodResources from, rather than having to query all of them all at
 once.
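The subset relationship described in the note can be sketched in Go. This is a minimal illustration only: `PodResources`, `list`, and `get` are hypothetical stand-ins, not the kubelet's gRPC types or handlers, and the real message carries more fields.

```go
package main

import "fmt"

// PodResources is a simplified, hypothetical stand-in for the per-pod entry
// returned by the PodResources API; the real message has more fields.
type PodResources struct {
	Name      string
	Namespace string
	Devices   []string
}

// list models List(): it returns the entries for every pod in every namespace.
func list(all []PodResources) []PodResources {
	return all
}

// get models Get(): it returns the single entry matching name and namespace,
// which is the same entry List() would have returned for that pod.
func get(all []PodResources, name, namespace string) (PodResources, bool) {
	for _, p := range list(all) {
		if p.Name == name && p.Namespace == namespace {
			return p, true
		}
	}
	return PodResources{}, false
}

func main() {
	// Hypothetical sample data for illustration.
	all := []PodResources{
		{Name: "trainer", Namespace: "ml", Devices: []string{"gpu-0"}},
		{Name: "web", Namespace: "default"},
	}
	if p, ok := get(all, "trainer", "ml"); ok {
		fmt.Println(p.Namespace+"/"+p.Name, p.Devices)
	}
}
```

A caller that cares about one pod can use the `get` shape instead of fetching and filtering the full list itself.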
@@ -286,10 +286,22 @@ Kubelet will always be backwards compatible, so going forward existing plugins a
 ###### How can this feature be enabled / disabled in a live cluster?

 - [x] Feature gate (also fill in values in `kep.yaml`)
-- Feature gate name for retrieving resources allocated by DRA: `DynamicResourceAllocation` and `PodResourcesDynamicResources`.
-- Feature gate name for Get method: `PodResourcesGet`. In case `DynamicResourceAllocation` or the `PodResourcesDynamicResources`
-are disabled and `PodResourcesGet` is enabled, the Get method will retrieve resources allocated by device plugins, memory and cpus (but omit those allocated by DRA resource drivers).
-In case `PodResourcesGet`, `DynamicResourceAllocation` and `PodResourcesDynamicResources` are all enabled, the Get method will also retrieve the resources allocated via DRA.
+- Feature gate name: `DynamicResourceAllocation` is existing feature gate to
+enable / disable DRA feature.
+- Components depending on the feature gate: kube-apiserver, kube-controller-manager,
+kube-scheduler, kubelet
+- Feature gate name: `PodResourcesDynamicResources` new feature gate to
+enable / disable PodResources API List method to populate `DynamicResource`
+information from the `DRAManager`.
+`DynamicResourceAllocation` feature gate has to be enabled as well.
+- Components depending on the feature gate: kubelet, 3rd party consumers.
+- Feature gate name: `PodResourcesGet` new feature gate to enable / disable
+PodResources API Get method. In case `DynamicResourceAllocation` or
+the `PodResourcesDynamicResources` are disabled and `PodResourcesGet`
+is enabled, the Get method will retrieve resources allocated by device plugins,
+memory and cpus (but omit those allocated by DRA resource drivers).
+In case `PodResourcesGet`, `DynamicResourceAllocation` and `PodResourcesDynamicResources`
+are all enabled, the `Get()` method will also retrieve the resources allocated via DRA.
 - Components depending on the feature gate: kubelet, 3rd party consumers.
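The interaction between the three gates can be summarized as a small decision function. The Go sketch below only models the rules stated in the list above. `Gates`, `getAvailable`, `includesDRA`, and `getResourceKinds` are hypothetical names for illustration and do not reflect how the kubelet actually wires its feature gates.

```go
package main

import "fmt"

// Gates holds the three feature gates named above. The struct is a
// hypothetical illustration, not the kubelet's feature-gate machinery.
type Gates struct {
	DynamicResourceAllocation    bool
	PodResourcesDynamicResources bool
	PodResourcesGet              bool
}

// getAvailable reports whether the Get() method is enabled at all.
func getAvailable(g Gates) bool {
	return g.PodResourcesGet
}

// includesDRA reports whether responses carry DRA-allocated resources;
// both DRA-related gates must be enabled for that.
func includesDRA(g Gates) bool {
	return g.DynamicResourceAllocation && g.PodResourcesDynamicResources
}

// getResourceKinds lists what Get() would report under the given gates,
// or nil when the Get() method itself is disabled.
func getResourceKinds(g Gates) []string {
	if !getAvailable(g) {
		return nil
	}
	kinds := []string{"device plugin devices", "memory", "cpus"}
	if includesDRA(g) {
		kinds = append(kinds, "DRA resources")
	}
	return kinds
}

func main() {
	// PodResourcesGet enabled, one DRA gate disabled: DRA resources are omitted.
	fmt.Println(getResourceKinds(Gates{PodResourcesGet: true, DynamicResourceAllocation: true}))
	// All three gates enabled: DRA resources are included.
	fmt.Println(getResourceKinds(Gates{true, true, true}))
}
```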

 ###### Does enabling the feature change any default behavior?
@@ -333,31 +345,23 @@ No.
 Look at the `pod_resources_endpoint_requests_list` and `pod_resources_endpoint_requests_get` metric exposed by the kubelet.

 ###### How can someone using this feature know that it is working for their instance?
-
-<!--
-For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
-for each individual pod.
-Pick one more of these and delete the rest.
-Please describe all items visible to end users below with sufficient detail so that they can verify correct enablement
-and operation of this feature.
-Recall that end users cannot usually observe component logs or access metrics.
--->
+Call the PodResources API and see the result.

 - [ ] Events
 - Event Reason:
 - [ ] API .status
 - Condition name:
 - Other field:
-- [X] Other (treat as last resort)
-- Details: check the kubelet metric `pod_resources_endpoint_requests_total`, `pod_resources_endpoint_requests_list` and `pod_resources_endpoint_requests_get`.
+- [ ] Other (treat as last resort)
+- Details:

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

 N/A.

 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

-- [X] Metrics
+- [X] Metrics
 - Metric name: `pod_resources_endpoint_requests_total`, `pod_resources_endpoint_requests_list` and `pod_resources_endpoint_requests_get`.
 - Components exposing the metric: kubelet

@@ -367,9 +371,11 @@ As part of this feature enhancement, per-API-endpoint resources metrics are bein

 ### Dependencies

+The container runtime must support CDI.
+
 ###### Does this feature depend on any specific services running in the cluster?

-No.
+A third-party resource driver is required for allocating resources.