Commit 0390966

committed
updates based on discussion with sig instrumentation
1 parent 24831fa commit 0390966

1 file changed: +24 -10 lines changed
  • keps/sig-node/1287-in-place-update-pod-resources

keps/sig-node/1287-in-place-update-pod-resources/README.md

Lines changed: 24 additions & 10 deletions
@@ -40,7 +40,8 @@
 - [Instrumentation](#instrumentation)
 - [<code>kubelet_container_resize_requests_total</code>](#kubelet_container_resize_requests_total)
 - [<code>kubelet_pod_resize_sli_duration_seconds</code>](#kubelet_pod_resize_sli_duration_seconds)
-- [<code>kubelet_pod_infeasible_resize_total</code>](#kubelet_pod_infeasible_resize_total)
+- [<code>kubelet_pod_pending_resize_total</code>](#kubelet_pod_pending_resize_total)
+- [<code>kubelet_pod_in_progress_resize_total</code>](#kubelet_pod_in_progress_resize_total)
 - [<code>kubelet_pod_deferred_resize_accepted_total</code>](#kubelet_pod_deferred_resize_accepted_total)
 - [Static CPU & Memory Policy](#static-cpu--memory-policy)
 - [Future Enhancements](#future-enhancements)
@@ -897,37 +898,49 @@ A single pod update changing multiple containers will be considered separate res

 Labels:
 - `resource_type` - what type of resource is being resized. Possible values: `cpu_limits`, `cpu_requests` `memory_limits`, or `memory_requests`. If more than one of these resource types is changing in the resize request,
-we increment the counter multiple times, once for each. This means that a single pod update changing multiple
+we increment the counter multiple times, once for each. This means that a single container update changing multiple
 resource types will be considered multiple requests for this metric.
 - `operation_type` - whether the resize is an increase or a decrease. Possible values: `increase`, `decrease`, `add`, or `remove`.
+- `namespace` - the namespace of the pod.

 This metric is recorded as a counter.

-#### `kubelet_pod_resize_sli_duration_seconds`
+#### `kubelet_pod_resize_sli_duration_seconds`
+This metric tracks the latency between when the kubelet accepts a resize request and when it finshes actuating the request. More precisely, this metric tracks the total amount of time that the PodResizeInProgress condition is present on a pod.

-This metric tracks the latency between when the kubelet accepts a resize request and when it finshes actuating
-the request. More precisely, this metric tracks the total amount of time that the `PodResizeInProgress` condition
-is present on a pod.
+Labels:
+- `namespace` - the namespace of the pod.

 This metric is recorded as a gauge.

-#### `kubelet_pod_infeasible_resize_total`
+#### `kubelet_pod_pending_resize_total`

-This metric tracks the total count of resize requests that the kubelet marks as infeasible. This will make it
+This metric tracks the total count of pods that the kubelet marks as pending. This will make it
 easier for us to see which of the current limitations users are running into the most.

 Labels:
-- `reason` - why the resize is infeasible. Although a more detailed "reason" will be provided in the `PodResizePending`
+- `reason` - why the resize is pending. Possible values: `infeasible` or `deferred`.
+- `message` - more details about why the resize is pending. Although a more detailed "message" will be provided in the `PodResizePending`
 condition in the pod, we limit this label to only the following possible values to keep cardinality low:
 - `guaranteed_pod_cpu_manager_static_policy` - In-place resize is not supported for Guaranteed Pods alongside CPU Manager static policy.
 - `guaranteed_pod_memory_manager_static_policy` - In-place resize is not supported for Guaranteed Pods alongside Memory Manager static policy.
 - `static_pod` - In-place resize is not supported for static pods.
 - `swap_limitation` - In-place resize is not supported for containers with swap.
 - `node_capacity` - The node doesn't have enough capacity for this resize request.
+- `namespace` - the namespace of the pod.

 This list of possible reasons may shrink or grow depending on limitations that are added or removed in the future.

-This metric is recorded as a counter.
+This metric is recorded as a gauge.
+
+#### `kubelet_pod_in_progress_resize_total`
+
+This metric tracks the total count of resize requests that the kubelet marks as in progress.
+
+Labels:
+- `namespace` - the namespace of the pod.
+
+This metric is recorded as a gauge.

 #### `kubelet_pod_deferred_resize_accepted_total`

@@ -937,6 +950,7 @@ opposed to being triggered by an event such as another pod being deleted or size

 Labels:
 - `accepted_reason` - whether the resize was accepted through the timed retry or due to another pod event. Possible values: `periodic_retry`, `event_based`.
+- `namespace` - the namespace of the pod.

 This metric is recorded as a counter.
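
The counting rule for `kubelet_container_resize_requests_total` in the diff (one increment per changed resource type, per container) may be easier to follow as a small sketch. This is Python for brevity, not kubelet code. The function names, the dict shape, and the integer units are assumptions made for illustration, and treating an unset value as `add` or `remove` is one plausible reading of the `operation_type` values.

```python
from collections import Counter

# The four values of the `resource_type` label listed in the KEP text.
RESOURCE_TYPES = ("cpu_limits", "cpu_requests", "memory_limits", "memory_requests")


def classify(old, new):
    """Return the operation_type for one resource, or None if it is unchanged.

    Assumption: None stands for "not set", so None -> value is `add`
    and value -> None is `remove`.
    """
    if old == new:
        return None
    if old is None:
        return "add"
    if new is None:
        return "remove"
    return "increase" if new > old else "decrease"


def record_container_resize(counter, namespace, old, new):
    """Increment the counter once per changed resource type for one container."""
    for resource_type in RESOURCE_TYPES:
        operation_type = classify(old.get(resource_type), new.get(resource_type))
        if operation_type is not None:
            counter[(resource_type, operation_type, namespace)] += 1


# Hypothetical single-container update: CPU in millicores, memory in bytes.
requests_total = Counter()
record_container_resize(
    requests_total,
    "default",
    old={"cpu_requests": 500, "cpu_limits": 1000, "memory_limits": 256 * 2**20},
    new={"cpu_requests": 750, "cpu_limits": 1000, "memory_requests": 128 * 2**20},
)
```

In this sample, `cpu_limits` is unchanged, so the single container update produces three increments: one each for `cpu_requests`, `memory_limits`, and `memory_requests`.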
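
Because this commit records `kubelet_pod_pending_resize_total` as a gauge rather than a counter, its value reflects the pods that are pending right now and can go down again. The following is a minimal Python sketch of that behavior, not the kubelet implementation. The helper name and the pod record shape are hypothetical, and the pairing of `reason` with `message` values in the sample data is illustrative only.

```python
from collections import Counter

# Label values taken from the KEP text in the diff.
ALLOWED_REASONS = {"infeasible", "deferred"}
ALLOWED_MESSAGES = {
    "guaranteed_pod_cpu_manager_static_policy",
    "guaranteed_pod_memory_manager_static_policy",
    "static_pod",
    "swap_limitation",
    "node_capacity",
}


def pending_resize_gauge(pods):
    """Recompute the gauge from the current pod state.

    Each pod is a dict with a "namespace" and a "pending" entry that is
    either None or a (reason, message) tuple (a hypothetical shape).
    """
    gauge = Counter()
    for pod in pods:
        pending = pod.get("pending")
        if pending is None:
            continue  # Pod has no pending resize, so it is not counted.
        reason, message = pending
        # Rejecting unknown values keeps label cardinality bounded.
        if reason not in ALLOWED_REASONS or message not in ALLOWED_MESSAGES:
            raise ValueError(f"unexpected label values: {reason!r}, {message!r}")
        gauge[(reason, message, pod["namespace"])] += 1
    return gauge


pods = [
    {"namespace": "team-a", "pending": ("deferred", "node_capacity")},
    {"namespace": "team-a", "pending": ("deferred", "node_capacity")},
    {"namespace": "team-b", "pending": ("infeasible", "static_pod")},
    {"namespace": "team-b", "pending": None},
]

before = pending_resize_gauge(pods)

# One deferred resize is later accepted, so that pod stops being pending.
pods[0]["pending"] = None
after = pending_resize_gauge(pods)
```

Here the `team-a` series drops from 2 to 1 once a pod leaves the pending state, which a monotonically increasing counter could not express.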
