@@ -884,33 +884,28 @@ _This section must be completed when targeting beta graduation to a release._
884
884
885
885
* ** How can an operator determine if the feature is in use by workloads?**
886
886
887
- We will create a new gauge metric that's updated during kubelet's reconcile
888
- of ` v1.Pod ` to track the number containers scheduled to this node in the API.
889
- This will be slightly different than the existing
890
- ` kubelet_running_containers ` , which describes the kubelet's representation of
891
- containers, and will be able to label the metrics with fields that are only
892
- available in the API object, such as type of container.
893
-
894
- Note that these kubelet metrics are still in alpha.
895
-
896
- This is tracked in [ #97974 ] ( https://issues.k8s.io/97974 ) .
887
+ This information is available by examining pod objects in the API server
888
+ for the field ` pod.spec.ephemeralContainers ` . Additionally, the kubelet surfaces
889
+ the following metrics, added in [ #99000 ] ( https://issues.k8s.io/99000 ) :
890
+
891
+ - ` kubelet_managed_ephemeral_containers ` : The number of ephemeral containers
892
+ in pods managed by this kubelet.
893
+ - ` kubelet_started_containers_total ` : Counter of all containers started by
894
+ this kubelet, indexed by ` container_type ` . Ephemeral containers have a
895
+ ` container_type ` of ` ephemeral_container ` .
896
+ - ` kubelet_started_containers_errors_total ` : Counter of errors encountered
897
+ when this kubelet starts containers, indexed by ` container_type ` .
898
+ Ephemeral containers have a ` container_type ` of ` ephemeral_container ` .
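As a rough illustration of how an operator might read these metrics, the sketch below parses kubelet metrics in the Prometheus text exposition format and computes the share of ephemeral container starts that failed. The sample payload and its values are invented for the example, and the helper names `sum_metric` and `ephemeral_start_error_ratio` are hypothetical; only the metric names and the `container_type` label come from the text above.

```python
import re

# One sample line of the Prometheus text format: name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def sum_metric(text, name, **labels):
    """Sum every sample of `name` whose labels include all of `labels`."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != name:
            continue
        found = dict(LABEL_RE.findall(match.group("labels") or ""))
        if all(found.get(key) == value for key, value in labels.items()):
            total += float(match.group("value"))
    return total


def ephemeral_start_error_ratio(text):
    """Errors divided by starts for ephemeral containers (0.0 if none started)."""
    label = {"container_type": "ephemeral_container"}
    started = sum_metric(text, "kubelet_started_containers_total", **label)
    errors = sum_metric(text, "kubelet_started_containers_errors_total", **label)
    return errors / started if started else 0.0


# Hypothetical scrape output; the numbers are made up for illustration.
SAMPLE = """\
# TYPE kubelet_managed_ephemeral_containers gauge
kubelet_managed_ephemeral_containers 2
# TYPE kubelet_started_containers_total counter
kubelet_started_containers_total{container_type="container"} 40
kubelet_started_containers_total{container_type="ephemeral_container"} 8
# TYPE kubelet_started_containers_errors_total counter
kubelet_started_containers_errors_total{container_type="container"} 1
kubelet_started_containers_errors_total{container_type="ephemeral_container"} 2
"""

if __name__ == "__main__":
    print(sum_metric(SAMPLE, "kubelet_managed_ephemeral_containers"))
    print(ephemeral_start_error_ratio(SAMPLE))
```

In practice the same ratio would more likely be expressed as a query in the monitoring system; the parser here only makes the label filtering explicit.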
897
899
898
900
* ** What are the SLIs (Service Level Indicators) an operator can use to determine
899
901
the health of the service?**
900
902
- [x] Metrics
901
- - Metric name: ` apiserver_request_total{component="apiserver",resource="pods",subresource="ephemeralcontainers"} ` (apiserver), ` kubelet_container_errors_total{type="Ephemeral"} ` (kubelet, Proposed)
903
+ - Metric name: ` apiserver_request_total{component="apiserver",resource="pods",subresource="ephemeralcontainers"} ` (apiserver), ` kubelet_started_containers_errors_total{container_type="ephemeral_container"} `
902
904
- [ Optional] Aggregation method: Aggregate by container type
903
- - Components exposing the metric: kubelet
905
+ - Components exposing the metric: apiserver, kubelet
904
906
- [ ] Other (treat as last resort)
905
907
- Details:
906
908
907
- Note that the kubelet SLI for this feature is a counter that increments upon
908
- failure to create an ephemeral container. Right now the kubelet only surfaces
909
- runtime-level errors, so I'll propose adding a higher level counter to
910
- encapsulate the entire container creation request, including container type.
911
-
912
- This is tracked in [ #97974 ] ( https://issues.k8s.io/97974 ) .
913
-
914
909
* ** What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
915
910
At a high level, this usually will be in the form of "high percentile of SLI
916
911
per day <= X". It's impossible to provide comprehensive guidance, but at the very
@@ -962,11 +957,13 @@ previous answers based on experience in the field._
962
957
963
958
* ** Will enabling / using this feature result in introducing new API types?**
964
959
965
- There an no new Kinds for storage, but new types are used in API interactions
966
- and in ` v1.Pod ` .
960
+ There are no new Kinds for storage, but new types are used in ` v1.Pod ` .
961
+ Ephemeral containers are added by writing a ` v1.Pod ` containing
962
+ ` pod.spec.ephemeralContainers ` to the pod's ` /ephemeralcontainers `
963
+ subresource, similar to how the kubelet updates ` pod.status ` .
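The interaction described above can be sketched as follows. This is a minimal illustration, not the actual client code: it represents the pod as a plain dictionary, builds the subresource path, and returns a copy of the pod with one more entry in `spec.ephemeralContainers`. The helper names and the sample pod are hypothetical, and the HTTP request itself (verb, authentication, `resourceVersion` handling) is deliberately left out.

```python
import copy


def ephemeral_containers_path(pod):
    """Path of the pod's /ephemeralcontainers subresource on the API server."""
    meta = pod["metadata"]
    return (
        f"/api/v1/namespaces/{meta['namespace']}"
        f"/pods/{meta['name']}/ephemeralcontainers"
    )


def with_ephemeral_container(pod, container):
    """Return a copy of `pod` with `container` appended to spec.ephemeralContainers."""
    updated = copy.deepcopy(pod)  # leave the caller's object untouched
    updated["spec"].setdefault("ephemeralContainers", []).append(container)
    return updated


# Hypothetical pod and debug container, for illustration only.
POD = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web-0", "namespace": "default"},
    "spec": {"containers": [{"name": "web", "image": "nginx"}]},
}
DEBUGGER = {
    "name": "debugger",
    "image": "busybox",
    "targetContainerName": "web",
    "stdin": True,
    "tty": True,
}

if __name__ == "__main__":
    print(ephemeral_containers_path(POD))
    print(with_ephemeral_container(POD, DEBUGGER)["spec"]["ephemeralContainers"])
```

The body written to the subresource is the whole `v1.Pod`, which is why the sketch returns a full pod object and not a bare list of containers.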
967
964
968
965
- API type:
969
- - v1.EphemeralContainers (used for ` /ephemeralcontainers ` subresource)
966
+ - v1.Pod (with ` /ephemeralcontainers ` subresource)
970
967
- Supported number of objects per cluster: same as Pods
971
968
- Supported number of objects per namespace: same as Pods
972
969
@@ -980,21 +977,22 @@ the existing API objects?**
980
977
981
978
- API type(s): v1.Pod
982
979
- Estimated increase in size: Additional ` Container ` for each Ephemeral
983
- container. This is expected to be negligible since these are created by
980
+ container. This is expected to be negligible since these are created
984
981
manually by humans.
985
982
- Estimated amount of new objects: N/A
986
983
987
984
* ** Will enabling / using this feature result in increasing time taken by any
988
985
operations covered by [ existing SLIs/SLOs] ?**
989
986
990
- When people add additional containers to a Pod, the pod will have additional
987
+ When users add additional containers to a Pod, the pod will have additional
991
988
containers to shut down and garbage collect when the Pod exits.
992
989
993
990
* ** Will enabling / using this feature result in non-negligible increase of
994
991
resource usage (CPU, RAM, disk, IO, ...) in any components?**
995
992
996
993
Not automatically. Use of this feature will result in additional containers
997
- running on kubelets.
994
+ running on kubelets, but it does not change the amount of resources allocated
995
+ to pods.
998
996
999
997
### Troubleshooting
1000
998
@@ -1030,6 +1028,11 @@ _This section must be completed when targeting beta graduation to a release._
1030
1028
- Testing: No, testing for cluster misconfiguration at dev time doesn't
1031
1029
prevent cluster misconfiguration at run time.
1032
1030
1031
+ One may completely disable the feature using the ` EphemeralContainers ` feature
1032
+ gate, but it's also possible to prevent the creation of new ephemeral containers
1033
+ without a restart by removing authorization to the ` ephemeralcontainers ` subresource
1034
+ via [ RBAC] ( https://kubernetes.io/docs/reference/access-authn-authz/rbac/ ) .
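To make the RBAC approach concrete, the sketch below takes a Role represented as a dictionary and strips any rule entry that grants access to `pods/ephemeralcontainers`, which is how a pod subresource is named in RBAC rules. The role name, namespace, and verbs are assumptions made up for the example, and the function `revoke_ephemeral_containers` is hypothetical. Wildcard grants such as `*` are not handled here and would need separate review.

```python
SUBRESOURCE = "pods/ephemeralcontainers"


def revoke_ephemeral_containers(role):
    """Return a copy of `role` without explicit grants on the subresource.

    Rules that only covered the subresource are dropped entirely.
    Wildcard resources ("*") are left as they are.
    """
    rules = []
    for rule in role.get("rules", []):
        remaining = [r for r in rule.get("resources", []) if r != SUBRESOURCE]
        if remaining:
            rules.append({**rule, "resources": remaining})
    return {**role, "rules": rules}


# Hypothetical Role, for illustration only.
ROLE = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "debugger", "namespace": "default"},
    "rules": [
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
        {"apiGroups": [""], "resources": [SUBRESOURCE], "verbs": ["patch"]},
    ],
}

if __name__ == "__main__":
    print(revoke_ephemeral_containers(ROLE)["rules"])
```

Applying the resulting Role takes effect without restarting any component, which is the advantage over disabling the feature outright.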
1035
+
1033
1036
* ** What steps should be taken if SLOs are not being met to determine the problem?**
1034
1037
1035
1038
Troubleshoot using apiserver and kubelet error logs.
@@ -1050,6 +1053,9 @@ _This section must be completed when targeting beta graduation to a release._
1050
1053
- * 2020-09-29* : Ported KEP to directory-based template.
1051
1054
- * 2021-01-07* : Updated KEP for beta release in 1.21 and completed PRR section.
1052
1055
- * 2021-04-12* : Switched ` /ephemeralcontainers ` API to use ` Pod ` .
1056
+ - * 2021-05-14* : Added additional graduation criteria.
1057
+ - * 2021-07-09* : Reverted KEP to alpha because of the new API introduced in 1.22.
1058
+ - * 2021-08-23* : Updated KEP for beta release in 1.23.
1053
1059
1054
1060
## Drawbacks
1055
1061