Commit a613b47 (parent c2fa641)

yevgeny-shnaidman authored and k8s-ci-robot committed

Enhancement for redefining communication between Module-NMC and NMC controllers

1 file changed, 111 additions, 0 deletions


# Worker pods for KMM

Authors: @yevgeny-shnaidman, @ybettan

## Introduction

This enhancement aims at redefining the areas of responsibility of the Module-NMC and NMC controllers.
This will allow for more clear-cut code and eliminate the various race conditions that we are seeing (or will see) in the current situation.

### Current situation

Currently, both the Module-NMC and the NMC controllers make decisions regarding kernel module deployment based on the node's status:
- The Module-NMC controller checks the schedulability of the node in order to decide whether the kernel module should be deployed on or removed from the node (adding/updating the NMC spec, or removing it).
- The NMC controller checks the node's schedulability to decide whether to start creating a loading/unloading pod on the node. In addition, it checks whether the node has been recently rebooted, in order to create a loading pod even if the status and spec of the NMC are equal.

This creates a situation where two entities decide whether kernel modules should be loaded or not based on the node's status.

## Goals

1. Create a clear-cut distinction between the responsibilities of the two controllers
2. Eliminate the race conditions that result from the current situation

## Non-Goals

Do not change any other functionality of the two controllers besides the decision-making described above.

## Design

### Module-NMC controller decision-making flow

The flow covers both a Module with the Version field defined (ordered upgrade) and a Module without it (unordered upgrade).
Module-NMC does not take into account the current state of the Node (Ready/NotReady/Schedulable/etc.). It only defines whether the kernel module should
be loaded on the node, based on whether there is a KernelMapping for the node's current kernel and on the node's labels. All other decisions
are made by the NMC reconciler, which has a much better view of the Node's current state and of the kernel module's current state.
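
As a rough sketch of the two inputs Module-NMC relies on, the node's labels (matched against the Module's node selector) and the node's kernel (matched against a KernelMapping) can be checked without ever looking at readiness or schedulability. The types and function names below are hypothetical stand-ins, not the KMM API; the `Literal`/`Regexp` fields are only loosely modeled on how KMM kernel mappings are usually written.

```go
package main

import (
	"fmt"
	"regexp"
)

// KernelMapping is a hypothetical, simplified kernel mapping: it matches either
// an exact kernel version (Literal) or a regular expression (Regexp).
type KernelMapping struct {
	Literal string
	Regexp  string
}

// Node is a hypothetical view of a node holding only what Module-NMC needs.
// There is deliberately no Ready/Schedulable field.
type Node struct {
	Name          string
	Labels        map[string]string
	KernelVersion string
}

// matchesSelector reports whether every key/value pair of the Module's node
// selector is present in the node's labels.
func matchesSelector(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// findKernelMapping returns the first mapping that fits the kernel version.
// The bool is false when the Module does not support the node's kernel.
func findKernelMapping(mappings []KernelMapping, kernel string) (KernelMapping, bool, error) {
	for _, m := range mappings {
		if m.Literal != "" && m.Literal == kernel {
			return m, true, nil
		}
		if m.Regexp != "" {
			matched, err := regexp.MatchString(m.Regexp, kernel)
			if err != nil {
				return KernelMapping{}, false, fmt.Errorf("invalid regexp %q: %w", m.Regexp, err)
			}
			if matched {
				return m, true, nil
			}
		}
	}
	return KernelMapping{}, false, nil
}

func main() {
	selector := map[string]string{"feature.example.com/gpu": "true"}
	mappings := []KernelMapping{{Regexp: `^5\.14\..*\.el9\.x86_64$`}}
	node := Node{
		Name:          "worker-0",
		Labels:        map[string]string{"feature.example.com/gpu": "true"},
		KernelVersion: "5.14.0-284.el9.x86_64",
	}

	targeted := matchesSelector(selector, node.Labels)
	_, found, err := findKernelMapping(mappings, node.KernelVersion)
	fmt.Printf("%s: targeted=%v kernelMapping=%v err=%v\n", node.Name, targeted, found, err)
}
```

Because neither function takes the node's status as input, a node that is NotReady or cordoned is still targeted; deciding what to do about that is left to the NMC controller.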

1. Find all the nodes targeted by the Module, based on the Module's node selector field, regardless of the node's status
2. If there is no suitable KernelMapping for the node's kernel - do nothing
3. If there is a suitable KernelMapping and the Version field is missing in the Module (not an ordered upgrade) - update the NMC spec
4. If there is a suitable KernelMapping, the Version field is present in the Module, and the module loader version label is on the node with a value equal to the Module's Version - update the NMC spec
5. If there is a suitable KernelMapping, the Version field is present in the Module, and the module loader version label is on the node with a value different from the Module's Version (meaning an old version) - do nothing
6. If there is a suitable KernelMapping, the Version field is present in the Module, and the module loader version label is missing on the node (meaning the kernel module should not be running on the node) - delete the NMC spec

In this design, Module-NMC deletes the NMC spec only in the following two cases:
1. During an ordered upgrade (see point 6 above)
2. When the Module is deleted, so the kernel module should be unloaded

```mermaid
flowchart TD
Module[KMM Module]-->|Reconcile| MNC[Module-NMC controller]
MNC-->|get nodes based on node selector| J1((.))
J1-->|no KernelMapping for node's kernel| Done[Done]
J1-->|found KernelMapping for node's kernel| J2((.))
J2-->|Version missing in Module| US[Update NMC Spec]
J2-->|Version present in Module| J3((.))
J3-->|module loader version label equals Version| US
J3-->|module loader version label not equal to Version| Done
J3-->|module loader version label missing| DS[Delete NMC Spec]
```
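
The decision steps above can be condensed into one small, pure function. This is a hedged sketch: the `Action` type, the `decideModuleNMCAction` name and the way the version label is passed in are assumptions made for illustration, not KMM's actual code. Node readiness is intentionally not an input.

```go
package main

import "fmt"

// Action is what Module-NMC does to the NMC spec entry of one Module on one node.
type Action string

const (
	DoNothing  Action = "do nothing"
	UpdateSpec Action = "update NMC spec"
	DeleteSpec Action = "delete NMC spec"
)

// decideModuleNMCAction follows the Module-NMC flow for a single targeted node.
//   - hasKernelMapping: a KernelMapping exists for the node's kernel
//   - moduleVersion: the Module's Version field ("" when not set)
//   - versionLabel, hasVersionLabel: the node's module loader version label
func decideModuleNMCAction(hasKernelMapping bool, moduleVersion, versionLabel string, hasVersionLabel bool) Action {
	switch {
	case !hasKernelMapping:
		return DoNothing // step 2
	case moduleVersion == "":
		return UpdateSpec // step 3: not an ordered upgrade
	case !hasVersionLabel:
		return DeleteSpec // step 6: the kernel module should not run on the node
	case versionLabel == moduleVersion:
		return UpdateSpec // step 4
	default:
		return DoNothing // step 5: the label still points at an old version
	}
}

func main() {
	fmt.Println(decideModuleNMCAction(true, "v2", "v1", true))
	fmt.Println(decideModuleNMCAction(true, "v2", "", false))
}
```

Checking the missing-label case before the equality case is safe because steps 4, 5 and 6 are mutually exclusive.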

### NMC controller decision-making flow

The NMC controller takes into account the NMC spec, the NMC status, the node's status and the node's Ready timestamp to decide whether to run worker pods, and whether to run an unload or a load worker pod.

1. If the Node is not Ready/Schedulable - do nothing
2. If the NMC's status is missing and the Node's kernel version equals the NMC spec's kernel version - run a worker load pod
3. If the NMC's spec is missing, the NMC's status is present and the NMC status's kernel version equals the Node's kernel version - run a worker unload pod
4. If the NMC's spec and status are both present and the spec differs from the status:
   - if the status's kernel version equals the node's kernel version - run a worker unload pod
   - otherwise, if the spec's kernel version equals the node's kernel version - run a worker load pod
5. If the NMC's spec and status are both present, the spec equals the status, and the status timestamp is older than the node's Ready timestamp (the node became Ready again, e.g. after a reboot, after the status was last updated) - run a worker load pod, provided the spec's kernel version equals the node's kernel version

```mermaid
flowchart TD
NMC[NodeModuleConfig]-->|Reconcile| NMCC[NMC controller]
NMCC-->|get all completed Pods and update NMC statuses| J1((.))
J1-->|get NMC's node| J2((.))
J2-->|node is not Ready/Schedulable| Done[Done]
J2-->|node is Ready/Schedulable| J3((.))
J3-->|status missing| J4((.))
J4-->|node's kernel equals spec's kernel| WLP[Create Worker Load Pod]-->Done
J4-->|node's kernel differs from spec's kernel| Done
J3-->|spec missing| J5((.))
J5-->|node's kernel equals status's kernel| WUP[Create Worker Unload Pod]-->Done
J5-->|node's kernel differs from status's kernel| Done
J3-->|spec and status differ| J6((.))
J6-->|status's kernel equals node's kernel| WUP
J6-->|status's kernel differs from node's kernel| J7((.))
J7-->|spec's kernel equals node's kernel| WLP
J7-->|spec's kernel differs from node's kernel| Done
J3-->|spec and status equal| J8((.))
J8-->|node's Ready timestamp older than status's timestamp| Done
J8-->|node's Ready timestamp newer than status's timestamp| J9((.))
J9-->|spec's kernel equals node's kernel| WLP
J9-->|spec's kernel differs from node's kernel| Done
```

## Addressing the goals

* **Clear-cut distinction between the responsibilities of the two controllers**
  Module-NMC now specifies what it wants to run on the node, and NMC takes care of when and how to run it.

* **Eliminating race conditions**
  The race conditions were caused by both controllers looking at the same data (the node's status and kernel) and making decisions based on it.
  Now Module-NMC does not look at the node's status at all, and the NMC controller is the only one that uses the node's status, together with its current kernel, to decide when to run worker pods.
