Merge pull request kubernetes#2708 from cynepco3hahue/update_memory_manager_kep

k8s-ci-robot · web-flow · commit ea90443458c3 · 2021-05-13T08:04:26.000-07:00
Graduate memory manager to beta
diff --git a/keps/prod-readiness/sig-node/1769.yaml b/keps/prod-readiness/sig-node/1769.yaml
@@ -1,3 +1,5 @@
 kep-number: 1769
 alpha:
   approver: "@deads2k"
+beta:
+  approver: "@deads2k"
diff --git a/keps/sig-node/1769-memory-manager/README.md b/keps/sig-node/1769-memory-manager/README.md
@@ -429,6 +429,7 @@ Memory pinning will be validated for Topology Manager `single-numa-node` and `re
 
 #### Phase 2: Beta (target v1.22)
 - Extend E2E test coverage.
+- Provide memory manager metrics under pod resources API.
 - Feature gate is enabled by default.
 - Provide beta-level documentation.
 
@@ -531,7 +532,7 @@ Yes, the admission flow changes for a pod in Guaranteed QoS class. With the Memo
   feature, can it break the existing applications?). -->
 
 * **What happens if we reenable the feature if it was previously rolled back?**
-The Memory Manager utlizes State file to track memory assignments. If State file is not valid, it must be removed and kubelet restarted. E.g., State file might become invalid when kube/system reserved have changed (increased), which may lead to a situation when some containers cannot be started.
+The Memory Manager utilizes the state file to track memory assignments. If the state file is not valid, it must be removed and the kubelet restarted. For example, the state file might become invalid when the kube/system reserved memory has changed (increased), which may lead to a situation where some containers cannot be started.
 
 * **Are there any tests for feature enablement/disablement?**
 Yes, there is a number of Unit Tests designated for State file validation.
@@ -541,69 +542,65 @@ Yes, there is a number of Unit Tests designated for State file validation.
 _This section must be completed when targeting beta graduation to a release._
 
 * **How can a rollout fail? Can it impact already running workloads?**
-  Try to be as paranoid as possible - e.g., what if some components will restart
-   mid-rollout?
+  The state file may contain inconsistent data during the rollout because of the kubelet restart, but
+  you can easily fix this by removing the memory manager state file and restarting the kubelet. It should not affect any running
+  workloads.
+  
 
 * **What specific metrics should inform a rollback?**
+  A pod may fail with an admission error because the kubelet cannot provide all resources aligned to the same NUMA node.
+  You can see the error message under the pod events.
 
 * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
-  Describe manual testing that was done and the outcomes.
-  Longer term, we may want to require automated upgrade/rollback tests, but we
-  are missing a bunch of machinery and tooling and can't do that now.
+  Tested manually by replacing the kubelet binary on a node with the `Static` memory manager policy, but I could not
+  find a correct procedure for testing an upgrade from 1.21 to my custom build with the updated kubelet binary.
 
 * **Is the rollout accompanied by any deprecations and/or removals of features, APIs, 
 fields of API types, flags, etc.?**
-  Even if applying deprecation policies, they may still surprise some users.
+  No.
 
 ### Monitoring Requirements
 
 _This section must be completed when targeting beta graduation to a release._
 
 * **How can an operator determine if the feature is in use by workloads?**
-  Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
-  checking if there are objects with field X set) may be a last resort. Avoid
-  logs or events for this purpose.
+  The memory manager data will be available under the pod resources API. When the memory manager is configured with the static policy,
+  you will see memory-related data under each container when calling the pod resources API List method.
 
 * **What are the SLIs (Service Level Indicators) an operator can use to determine 
 the health of the service?**
-  - [ ] Metrics
-    - Metric name:
-    - [Optional] Aggregation method:
-    - Components exposing the metric:
-  - [ ] Other (treat as last resort)
-    - Details:
+  
+  *For cluster admins:*
+  
+  The feature is assumed to work once the memory manager static policy and reserved memory flags are configured for the kubelet
+  and the kubelet has restarted successfully.
+  
+  *For a pod author:*
+  
+  * Pod started successfully. You have two options to verify whether containers are pinned to the NUMA node:
+    - Via the pod resources API: connect to the gRPC socket and get the information from it; see the [pod resources API doc page](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources) for more information.
+    - By checking the relevant container cgroup on the node.
+
+  * Pod failed to start because of an admission error.
+    
+    To understand the reason, use the pod resources API to check
+    the amount of allocatable memory and the memory reserved by containers.
 
 * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
-  At a high level, this usually will be in the form of "high percentile of SLI
-  per day <= X". It's impossible to provide comprehensive guidance, but at the very
-  high level (needs more precise definitions) those may be things like:
-  - per-day percentage of API calls finishing with 5XX errors <= 1%
-  - 99% percentile over day of absolute value from (job creation time minus expected
-    job creation time) for cron job <= 10%
-  - 99,9% of /health requests per day finish with 200 code
+  This does not seem relevant to this feature.
 
 * **Are there any missing metrics that would be useful to have to improve observability 
 of this feature?**
   Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
   implementation difficulties, etc.).
+  Currently, a pod author cannot determine a container's NUMA pinning without access to the node.
 
 ### Dependencies
 
 _This section must be completed when targeting beta graduation to a release._
 
 * **Does this feature depend on any specific services running in the cluster?**
-  Think about both cluster-level services (e.g. metrics-server) as well
-  as node-level agents (e.g. specific version of CRI). Focus on external or
-  optional services that are needed. For example, if this feature depends on
-  a cloud provider API, or upon an external software-defined storage or network
-  control plane.
-
-  For each of these, fill in the following—thinking about running existing user workloads
-  and creating new ones, as well as about cluster-level services (e.g. DNS):
-  - [Dependency name]
-    - Usage description:
-      - Impact of its outage on the feature:
-      - Impact of its degraded performance or high-error rates on the feature:
+  No.
 
 
 ### Scalability
@@ -617,45 +614,26 @@ _For GA, this section is required: approvers should be able to confirm the
 previous answers based on experience in the field._
 
 * **Will enabling / using this feature result in any new API calls?**
-  Describe them, providing:
-  - API call type (e.g. PATCH pods)
-  - estimated throughput
-  - originating component(s) (e.g. Kubelet, Feature-X-controller)
-  focusing mostly on:
-  - components listing and/or watching resources they didn't before
-  - API calls that may be triggered by changes of some Kubernetes resources
-    (e.g. update of object X triggers new updates of object Y)
-  - periodic API calls to reconcile state (e.g. periodic fetching state,
-    heartbeats, leader election, etc.)
+  No.
 
 * **Will enabling / using this feature result in introducing new API types?**
-  Describe them, providing:
-  - API type
-  - Supported number of objects per cluster
-  - Supported number of objects per namespace (for namespace-scoped objects)
+  No.
 
 * **Will enabling / using this feature result in any new calls to the cloud 
 provider?**
+  No.
 
 * **Will enabling / using this feature result in increasing size or count of 
 the existing API objects?**
-  Describe them, providing:
-  - API type(s):
-  - Estimated increase in size: (e.g., new annotation of size 32B)
-  - Estimated amount of new objects: (e.g., new Object X for every existing Pod)
+  No.
 
 * **Will enabling / using this feature result in increasing time taken by any 
 operations covered by [existing SLIs/SLOs]?**
-  Think about adding additional work or introducing new steps in between
-  (e.g. need to do X to start a container), etc. Please describe the details.
+  No.
 
 * **Will enabling / using this feature result in non-negligible increase of 
 resource usage (CPU, RAM, disk, IO, ...) in any components?**
-  Things to keep in mind include: additional in-memory state, additional
-  non-trivial computations, excessive access to disks (including increased log
-  volume), significant amount of data sent and/or received over network, etc.
-  This through this both in small and large cases, again with respect to the
-  [supported limits].
+  No.
 
 ### Troubleshooting
 
@@ -666,20 +644,15 @@ details). For now, we leave it here.
 _This section must be completed when targeting beta graduation to a release._
 
 * **How does this feature react if the API server and/or etcd is unavailable?**
+  No impact.
 
 * **What are other known failure modes?**
-  For each of them, fill in the following information by copying the below template:
-  - [Failure mode brief description]
-    - Detection: How can it be detected via metrics? Stated another way:
-      how can an operator troubleshoot without logging into a master or worker node?
-    - Mitigations: What can be done to stop the bleeding, especially for already
-      running user workloads?
-    - Diagnostics: What are the useful log messages and their required logging
-      levels that could help debug the issue?
-      Not required until feature graduated to beta.
-    - Testing: Are there any tests for failure mode? If not, describe why.
+  When enabling or disabling the memory manager (that is, changing the memory manager policy), you must remove the memory
+  manager state file (`/var/lib/kubelet/memory_manager_state`); otherwise the kubelet will fail to start.
+  You can identify the issue by checking the kubelet log.
 
 * **What steps should be taken if SLOs are not being met to determine the problem?**
+  Not applicable.
 
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
diff --git a/keps/sig-node/1769-memory-manager/kep.yaml b/keps/sig-node/1769-memory-manager/kep.yaml
@@ -8,7 +8,7 @@ owning-sig: sig-node
 participating-sigs:
 status:  implementable
 creation-date: 2020-02-03
-last-updated: 2021-02-08
+last-updated: 2021-05-11
 reviewers:
   - "@klueska"
   - "@derekwaynecarr"
@@ -25,7 +25,7 @@ stage: alpha
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.21"
+latest-milestone: "v1.22"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
@@ -42,4 +42,4 @@ feature-gates:
 disable-supported: true
 
 # The following PRR answers are required at beta release
-metrics:
+metrics: