Currently, co-hosting low-latency workloads on the same nodes in Windows Server creates noisy-neighbor behavior
@@ -165,7 +171,7 @@ One difference between the Windows API and Linux is the concept of [Processor gr
On Windows systems with more than 64 cores the CPUs will be split into groups;
each processor is identified by its group number and its group-relative processor number.
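
To make the addressing scheme concrete, here is a minimal sketch of how a set of processors, each identified by a (group number, group-relative number) pair, could be folded into one 64-bit bitmask per group. The function name `group_affinity_masks` is hypothetical, and the mask-per-group layout is an assumption modeled on the general Windows group-affinity idea; it is not taken from the KEP's protobuf definition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumption: a processor group holds at most 64 logical processors,
# so a group-relative processor number is in the range 0..63.
MAX_CPUS_PER_GROUP = 64


def group_affinity_masks(cpus: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Fold (group, group-relative number) pairs into one bitmask per group.

    Bit N of a group's mask is set when processor N of that group is selected.
    """
    masks: Dict[int, int] = defaultdict(int)
    for group, number in cpus:
        if not 0 <= number < MAX_CPUS_PER_GROUP:
            raise ValueError(f"processor number {number} is outside 0..63")
        masks[group] |= 1 << number
    return dict(masks)


if __name__ == "__main__":
    # Processors 0 and 1 of group 0, plus processor 3 of group 1.
    selected = [(0, 0), (0, 1), (1, 3)]
    for group, mask in sorted(group_affinity_masks(selected).items()):
        print(f"group {group}: mask {mask:#x}")
```

Keeping one mask per group, rather than a single flat mask, is what lets the same representation work on machines with more than 64 logical processors.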

-In Cri we will add the following structure to the `WindowsContainerResources` in CRI:
+In CRI we will add the following structure to the `WindowsContainerResources` in CRI:

```protobuf
message WindowsCpuGroupAffinity {
@@ -268,7 +274,7 @@ Integration tests do not run on Windows. Functionality will be covered by unit a

##### e2e tests

-- e2e_node will need to be enabled for windows to add coverage
+- e2e_node will need to be enabled for Windows to add coverage. We plan to enable just e2e tests that relate to memory/cpu/topology manager, not the full suite.

### Graduation Criteria

@@ -305,10 +311,19 @@ N/A

### Version Skew Strategy

-N/A
+This feature is kubelet-specific, so the version skew strategy is N/A.

## Production Readiness Review Questionnaire

+This KEP discusses the changes required to enable the various managers for Windows.
+This means many of the PRR questions for these features have already been covered and implemented
+as part of those KEPs. We try to give details relevant to Windows but do not plan to change any of the
+details of the feature enablement in the KEP unless it is required because of a difference in Windows.
Yes, it uses a feature gate. The Memory and CPU managers have a state file that requires cleanup. After changing the CPU manager policy from `none` to `static` or the other way around, you must remove the CPU manager state file (`/var/lib/kubelet/cpu_manager_state`) before starting the kubelet again; otherwise the kubelet will fail to start. Startup failures for this reason will be logged in the kubelet log.
+
+The steps to reset a state file are described in https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#changing-the-cpu-manager-policy. The Memory Manager uses the same reset steps.

###### Does enabling the feature change any default behavior?

-No, Additional settings are required to enable the features. The default policies for CPU/Memory manager will be `None`, meaning that they will not interact with running of pods.
+No. Additional settings are required to enable the features. The default policies for the CPU/Memory managers will be `None`, meaning that they will not affect how pods run. The cluster administrator will need to set specific CPU/Memory/Topology manager policies
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

@@ -356,12 +383,18 @@ feature.
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
-->

-Yes. Restarting of the pods will be required to remove the CPU/Memory affinity.
+Yes. A rolling restart (delete or delete and redeploy) of the pods will be required to remove the CPU/Memory affinity
+from running pods. Restarting the kubelet after changing the feature will not affect any running pods, but newly created pods will be
+affected by the changes.

###### What happens if we reenable the feature if it was previously rolled back?

+The Memory Manager and CPU Manager use a state file to track assignments. If the state file is not valid, it must be removed and the kubelet restarted. For example, the state file might become invalid when the kube/system reserved values have changed (increased), which may lead to a situation where some containers cannot be started.
+
###### Are there any tests for feature enablement/disablement?

+Yes, there are a number of unit tests designated for state file validation.
+
<!--
The e2e framework does not currently support enabling or disabling feature
gates. However, unit tests in each component dealing with managing data, created
@@ -466,13 +499,9 @@ The memory/cpu manager will be under the pod resources API. And there are propos

###### How can someone using this feature know that it is working for their instance?

--[x] Events
-- Event Reason:
--[ ] API .status
-- Condition name:
-- Other field:
--[ ] Other (treat as last resort)
-- Details:
+-[X] Other (treat as last resort)
+- Details: check the kubelet metric `cpu_manager_pinning_requests_total`
+- check the kubelet metric `memory_manager_pinning_requests_total`

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -592,3 +621,5 @@ Use this section if you need things from the project/SIG. Examples include a
new subproject, repos requested, or GitHub details. Listing these here allows a
SIG to get the process for these resources started right away.
-->
+
+N/A. Windows will use the existing testing infrastructure.