Commit 62d63df

Add Windows only policy definition
Signed-off-by: James Sturtevant <[email protected]>
1 parent e351e9c commit 62d63df

File tree

  • keps/sig-windows/4885-windows-cpu-and-memory-affinity

1 file changed: +11 −1 lines changed

keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md

Lines changed: 11 additions & 1 deletion
@@ -223,7 +223,17 @@ above would give you:
 It is possible to indicate to a process which NUMA node is preferred, but a limitation of the Windows APIs is that [PROC_THREAD_ATTRIBUTE_PREFERRED_NODE](https://learn.microsoft.com/windows/win32/api/processthreadsapi/nf-processthreadsapi-updateprocthreadattribute)
 does not support setting multiple NUMA nodes for a single Job object (i.e. Container), so it is not usable in the context of Windows containers, which have multiple processes.
 
-To work around these limitations, the kubelet will query the OS to get the affinity masks associated with each of the Numa nodes selected by the memory manager and update the CPU Group affinity accordingly in the CRI field. This will result in the memory from the Numa node being used. There are a couple scenarios that need to be considered:
+Since the existing Memory Manager policy `Static` on Linux has a semantic meaning that ensures that only the memory from a
+selected NUMA node is used, we cannot reuse this policy on Windows, given that there is no way to ensure that only the memory
+on the NUMA node the memory manager selects is used. For this reason, if the `Static` policy is chosen on Windows, kubelet will fail
+to start with an error message stating that it cannot use the `Static` policy. Instead we will create a new Windows-only policy called
+`BestEffort`, which will initially be implemented only on Windows; kubelet on Linux will fail to start if this policy is set.
+We do not have any use cases for this policy to be implemented on Linux at this time, and so we will avoid adding a feature that isn't
+applicable to that platform.
+
+The main purpose of the `BestEffort` policy on Windows will be to ensure that, at the time of pod start-up, there is enough memory on a given NUMA node to meet the memory requests of the pod. The intent here is to make sure that if CPUs are selected there is also enough memory to support the request, to avoid cross-CPU/NUMA-node processing. On Windows, even though we cannot guarantee NUMA node selection, the Windows scheduler will do the right thing in most cases. By using kubelet's existing [Memory Mapping strategy](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1769-memory-manager#memory-map) we can ensure NUMA nodes have enough memory at the time of scheduling. It is important to note that this does not mean that placement is guaranteed (hence the policy name change).
+
+Since Windows does not have an API to directly assign NUMA nodes, the kubelet will query the OS to get the affinity masks associated with each of the NUMA nodes selected by the memory manager and update the CPU group affinity accordingly in the CRI field. This will result in the memory from the NUMA node being used. There are a couple of scenarios that need to be considered:
 
 - Memory manager is enabled, CPU manager is not: kubelet will look up all the CPUs associated with the selected NUMA nodes and assign the CPU group affinity. For example, if NUMA node 0 is selected by the memory manager, and NUMA node 0 has the first four CPUs in Windows CPU group 0, the result would be `cpu affinity: 0000001111, group 0`.
 - Memory manager is enabled, CPU manager is enabled
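
To make the platform gating described in the new paragraphs concrete, here is a minimal Go sketch of the startup check: `Static` is rejected on Windows and `BestEffort` is rejected everywhere else. The function and constant names here are illustrative assumptions, not kubelet's actual API.

```go
// Hypothetical sketch of the platform check described in the diff: kubelet
// would refuse to start when the policy/platform combination is unsupported.
// policyStatic/policyBestEffort are illustrative names, not the KEP's API.
package memorymanager

import (
	"fmt"
	"runtime"
)

const (
	policyStatic     = "Static"
	policyBestEffort = "BestEffort"
)

// validatePolicyForPlatform returns an error that should abort kubelet
// startup when the selected memory manager policy is unsupported on this OS.
func validatePolicyForPlatform(policy string) error {
	switch {
	case policy == policyStatic && runtime.GOOS == "windows":
		return fmt.Errorf("memory manager policy %q cannot be used on Windows; use %q instead", policyStatic, policyBestEffort)
	case policy == policyBestEffort && runtime.GOOS != "windows":
		return fmt.Errorf("memory manager policy %q is only implemented on Windows", policyBestEffort)
	}
	return nil
}
```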
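The `BestEffort` admission check itself reduces to "does some NUMA node still have enough free memory for the pod's request". A hedged sketch, assuming a plain map stands in for kubelet's memory-map bookkeeping (the real structure is the one described in the Memory Manager KEP linked above):

```go
package memorymanager

import "sort"

// bestEffortFit returns the lowest-numbered NUMA node that still has enough
// free memory for the request, or ok=false when no single node can satisfy
// it. nodeFreeBytes is a hypothetical stand-in for kubelet's memory map.
func bestEffortFit(nodeFreeBytes map[int]uint64, requestBytes uint64) (node int, ok bool) {
	nodes := make([]int, 0, len(nodeFreeBytes))
	for n := range nodeFreeBytes {
		nodes = append(nodes, n)
	}
	sort.Ints(nodes) // deterministic: prefer the lowest-numbered node
	for _, n := range nodes {
		if nodeFreeBytes[n] >= requestBytes {
			return n, true
		}
	}
	return 0, false
}
```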
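For the final paragraph of the diff, the OS query kubelet can rely on is the Win32 `GetNumaNodeProcessorMaskEx` call, which returns the processor group and mask owned by a NUMA node. A sketch of that lookup under those assumptions (the CRI wiring and error handling are simplified; the struct layout follows the documented `GROUP_AFFINITY`):

```go
//go:build windows

// Hedged sketch of translating the memory manager's NUMA node selection into
// a CPU group affinity for the CRI field via GetNumaNodeProcessorMaskEx.
package main

import (
	"fmt"
	"unsafe"

	"golang.org/x/sys/windows"
)

// groupAffinity mirrors the Win32 GROUP_AFFINITY struct.
type groupAffinity struct {
	Mask     uintptr // bitmask of processors within the group
	Group    uint16  // Windows processor group number
	Reserved [3]uint16
}

var (
	kernel32                       = windows.NewLazySystemDLL("kernel32.dll")
	procGetNumaNodeProcessorMaskEx = kernel32.NewProc("GetNumaNodeProcessorMaskEx")
)

// affinityForNumaNode asks the OS which processors (mask + group) belong to
// the NUMA node selected by the memory manager.
func affinityForNumaNode(node uint16) (groupAffinity, error) {
	var ga groupAffinity
	ret, _, err := procGetNumaNodeProcessorMaskEx.Call(
		uintptr(node),
		uintptr(unsafe.Pointer(&ga)),
	)
	if ret == 0 {
		return ga, fmt.Errorf("GetNumaNodeProcessorMaskEx(%d): %w", node, err)
	}
	return ga, nil
}

func main() {
	// For the first bullet's example: if NUMA node 0 owns the first four CPUs
	// of group 0, the mask prints as 1111 in group 0, matching the diff's
	// `cpu affinity: 0000001111, group 0`.
	ga, err := affinityForNumaNode(0)
	if err != nil {
		panic(err)
	}
	fmt.Printf("cpu affinity: %b, group %d\n", ga.Mask, ga.Group)
}
```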
