Commit eaa6675

address PPR questions
Signed-off-by: Moshe Levi <[email protected]>
1 parent 1191ad8 commit eaa6675

1 file changed: 24 additions, 18 deletions
  • keps/sig-node/3695-pod-resources-for-dra

keps/sig-node/3695-pod-resources-for-dra/README.md

@@ -102,7 +102,7 @@ Additionally, we propose adding a `Get()` method to the existing gRPC service
 to allow querying specific pods for their allocated resources.
 
 **Note:** The new `Get()` call is a strict subset of the `List()` call (which
-returns the list of PodResources for *all* pods acrosss *all* namespaces in the
+returns the list of PodResources for *all* pods across *all* namespaces in the
 cluster). That is, it allows one to specify a specific pod and namespace to
 retrieve PodResources from, rather than having to query all of them all at
 once.
@@ -286,10 +286,22 @@ Kubelet will always be backwards compatible, so going forward existing plugins a
 ###### How can this feature be enabled / disabled in a live cluster?
 
 - [x] Feature gate (also fill in values in `kep.yaml`)
-- Feature gate name for retrieving resources allocated by DRA: `DynamicResourceAllocation` and `PodResourcesDynamicResources`.
-- Feature gate name for Get method: `PodResourcesGet`. In case `DynamicResourceAllocation` or the `PodResourcesDynamicResources`
-are disabled and `PodResourcesGet` is enabled, the Get method will retrieve resources allocated by device plugins, memory and cpus (but omit those allocated by DRA resource drivers).
-In case `PodResourcesGet`, `DynamicResourceAllocation` and `PodResourcesDynamicResources` are all enabled, the Get method will also retrieve the resources allocated via DRA.
+- Feature gate name: `DynamicResourceAllocation` is existing feature gate to
+enable / disable DRA feature.
+- Components depending on the feature gate: kube-apiserver, kube-controller-manager,
+kube-scheduler, kubelet
+- Feature gate name: `PodResourcesDynamicResources` new feature gate to
+enable / disable PodResources API List method to populate `DynamicResource`
+information from the `DRAManager`.
+`DynamicResourceAllocation` feature gate has to be enabled as well.
+- Components depending on the feature gate: kubelet, 3rd party consumers.
+- Feature gate name: `PodResourcesGet` new feature gate to enable / disable
+PodResources API Get method. In case `DynamicResourceAllocation` or
+the `PodResourcesDynamicResources` are disabled and `PodResourcesGet`
+is enabled, the Get method will retrieve resources allocated by device plugins,
+memory and cpus (but omit those allocated by DRA resource drivers).
+In case `PodResourcesGet`, `DynamicResourceAllocation` and `PodResourcesDynamicResources`
+are all enabled, the `Get()` method will also retrieve the resources allocated via DRA.
 - Components depending on the feature gate: kubelet, 3rd party consumers.
 
 ###### Does enabling the feature change any default behavior?
@@ -333,31 +345,23 @@ No.
 Look at the `pod_resources_endpoint_requests_list` and `pod_resources_endpoint_requests_get` metric exposed by the kubelet.
 
 ###### How can someone using this feature know that it is working for their instance?
-
-<!--
-For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
-for each individual pod.
-Pick one more of these and delete the rest.
-Please describe all items visible to end users below with sufficient detail so that they can verify correct enablement
-and operation of this feature.
-Recall that end users cannot usually observe component logs or access metrics.
--->
+Call the PodResources API and see the result.
 
 - [ ] Events
 - Event Reason:
 - [ ] API .status
 - Condition name:
 - Other field:
-- [X] Other (treat as last resort)
-- Details: check the kubelet metric `pod_resources_endpoint_requests_total`, `pod_resources_endpoint_requests_list` and `pod_resources_endpoint_requests_get`.
+- [ ] Other (treat as last resort)
+- Details:
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
 N/A.
 
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
-- [X ] Metrics
+- [X] Metrics
 - Metric name: `pod_resources_endpoint_requests_total`, `pod_resources_endpoint_requests_list` and `pod_resources_endpoint_requests_get`.
 - Components exposing the metric: kubelet
 
@@ -367,9 +371,11 @@ As part of this feature enhancement, per-API-endpoint resources metrics are bein
 
 ### Dependencies
 
+The container runtime must support CDI.
+
 ###### Does this feature depend on any specific services running in the cluster?
 
-No.
+A third-party resource driver is required for allocating resources.
 
 ### Scalability
 
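The interaction between the three feature gates described in the feature-gate hunk of this commit can be sketched in Go as follows. This is a minimal illustration only, not the kubelet's implementation: the `Gates` struct, the `GetSources` function, and the error returned when `PodResourcesGet` is disabled are hypothetical names and behavior introduced here. Only the gate names and the described outcomes come from the KEP text.

```go
package main

import "fmt"

// Gates is a hypothetical stand-in for the kubelet's feature gate lookup.
// The field names mirror the gate names used in the KEP.
type Gates struct {
	DynamicResourceAllocation    bool
	PodResourcesDynamicResources bool
	PodResourcesGet              bool
}

// IncludesDRA reports whether DRA-allocated resources would be populated.
// Per the KEP text, both DRA-related gates must be enabled.
func (g Gates) IncludesDRA() bool {
	return g.DynamicResourceAllocation && g.PodResourcesDynamicResources
}

// GetSources lists the resource sources a Get() response would draw on.
// Returning an error when PodResourcesGet is off is an assumption made
// for this sketch; the KEP text does not specify the failure mode.
func GetSources(g Gates) ([]string, error) {
	if !g.PodResourcesGet {
		return nil, fmt.Errorf("PodResourcesGet feature gate is disabled")
	}
	// Always reported when Get() is enabled.
	sources := []string{"device plugins", "memory", "cpus"}
	if g.IncludesDRA() {
		// Added only when all three gates are enabled.
		sources = append(sources, "DRA")
	}
	return sources, nil
}

func main() {
	cases := []Gates{
		{PodResourcesGet: true},
		{DynamicResourceAllocation: true, PodResourcesGet: true},
		{DynamicResourceAllocation: true, PodResourcesDynamicResources: true, PodResourcesGet: true},
		{DynamicResourceAllocation: true, PodResourcesDynamicResources: true},
	}
	for _, g := range cases {
		sources, err := GetSources(g)
		fmt.Printf("%+v -> %v (err: %v)\n", g, sources, err)
	}
}
```

The sketch keeps the two conditions separate on purpose: `PodResourcesGet` decides whether `Get()` is served at all, while the pair of DRA gates only decides whether DRA data is added to the response.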