This tests the idea that a compatibility spec can direct the controller to use (ask) an ML model. It is fully working, but now needs testing in a cloud, since kind (locally) does not have the instance type label.
Signed-off-by: vsoch <[email protected]>
- Select a container for a pod based on a compatibility artifact.
- Select an instance type for a pod based on a machine learning model.
For the latter, we serve a sidecar to the controller that provides models. The node features are sent to the sidecar, along with a request to use a specific model, to determine the optimal instance type.
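As a rough sketch only, a request to that sidecar could carry the node features plus the name of the model to ask. The endpoint, field names, node names, and label values below are hypothetical placeholders, not the sidecar's actual interface:

```yaml
# Hypothetical request body for the ML sidecar (illustrative only)
model: instance-type-selector          # which served model to ask
nodes:
  - name: node-1                       # assumed node name
    labels:
      node.kubernetes.io/instance-type: m5.large
      feature.node.kubernetes.io/cpu-model.family: "6"
  - name: node-2
    labels:
      node.kubernetes.io/instance-type: c5.xlarge
      feature.node.kubernetes.io/cpu-model.family: "6"
```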
## Details
This is a Kubernetes controller that does the following:
* Start running in a cluster with NFD, retrieve metadata about nodes in the cluster, and stay updated as nodes are added and removed.
* Receive pods and check if they are flagged for image selection.
* Being flagged means having the label "oci.image.compatibilities.selection/enabled" and (optionally) a node selector
* If the cluster is not homogeneous, a node selector is required, and should be the instance type that the pod is intended for.
When a pod (or abstraction that creates them) is created:
* If enabled, a URI is provided that points to a compatibility artifact (a sketch of such a pod is shown after this list)
* The artifact describes several images (and criteria for checking) that can be used for the Pod
* The controller checks known nodes for the instance type against the spec.
* If an ML server model is specified in the artifact, the entire set of node metadata is sent to it.
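Putting the flagging pieces above together, a flagged pod could look roughly like this. The enablement label and the idea of a node selector and artifact URI come from the list above; the annotation key, image references, and instance type are hypothetical placeholders rather than the controller's actual interface:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: compat-demo
  labels:
    # Label that flags the pod for image selection
    oci.image.compatibilities.selection/enabled: "true"
  annotations:
    # Hypothetical annotation for the compatibility artifact URI;
    # the real mechanism for providing the URI may differ
    oci.image.compatibilities.selection/artifact: "ghcr.io/example/compatibility-artifact:latest"
spec:
  # Required when the cluster is not homogeneous; set to the instance
  # type the pod is intended for (example value)
  nodeSelector:
    node.kubernetes.io/instance-type: m5.large
  containers:
    - name: app
      # Placeholder image; the controller selects the image to use
      # from the artifact
      image: registry.example.com/app:placeholder
```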
Since we want to test that NFD is working, we are going to add custom labels. We just want to test and don't need the labels to persist across node recreations, so we can just use `kubectl label`. However, if we did it the right (persistent) way, we would write a configuration file to `/etc/kubernetes/node-feature-discovery/features.d` on the node.
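For reference (not needed for this quick test), NFD's local feature source reads simple files in that directory with one `name=value` entry per line, so a file containing something like `flavor=vanilla` would be picked up by NFD and exposed as a node label (the exact label prefix depends on the NFD configuration).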
In our real world use case we would select based on operating system and kernel version. For our test case, we will just use a script that programmatically updates worker nodes. In this example,
we are just going to add the same label to all nodes and then check our controller based on the image selected. Let's first add "vanilla":
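That step amounts to running something like `kubectl label node kind-worker feature.node.ocifit-k8s.flavor=vanilla` against each worker node, where `kind-worker` is just an assumed node name for a local kind cluster and the label key is the custom one referenced in the spec logic below.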
At this point, we want to test compatibility. This step is already done, but I'll show you how I designed the compatibility spec. The logic for this dummy case is the following:
1. If our custom label "feature.node.ocifit-k8s.flavor" is vanilla, we want to choose a debian container.
We aren't going to be using any referrers API or linking this to an image. The target images are in the artifact, and we get there directly from the associated manifest.
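Purely as an illustration of that idea (this is not the project's actual artifact schema, and the field names and second image are hypothetical), a fragment of such an artifact might pair each candidate image with the node feature it requires:

```yaml
# Hypothetical compatibility artifact fragment (illustrative only)
images:
  - image: docker.io/library/debian:bookworm
    # Chosen when our custom flavor label is vanilla
    requires:
      feature.node.ocifit-k8s.flavor: vanilla
  - image: registry.example.com/other:tag
    # Another candidate with its own criteria
    requires:
      feature.node.ocifit-k8s.flavor: some-other-flavor
```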
Boum! Conceptually, we are selecting a different image depending on the rules in the compatibility spec. Our node features were dummy, but they could be real attributes related to kernel, networking, etc.
#### ML Server Decision
## License
HPCIC DevTools is distributed under the terms of the MIT license.