`content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/0-spin_up_gke_cluster.md`
## Project overview
Arm CPUs are widely used in AI/ML use cases. In this Learning Path, you will learn how to run [Ollama](https://ollama.com/) on Arm-based CPUs in a hybrid architecture (amd64 and arm64) K8s cluster.
To demonstrate this, you can bring up an initial Kubernetes cluster (depicted as "*1. Initial Cluster (amd64)*" in the image below) with an amd64 node running an Ollama Deployment and Service.
Next, as depicted by "*2. Hybrid Cluster amd64/arm64*", you'll add the arm64 node, and apply an arm64 deployment and service to it, so that you can now test both architectures together, and separately, to investigate performance.
When you are satisfied with the arm64 performance compared with amd64, it's easy to delete the amd64-specific node, deployment, and service to complete the migration, as depicted in "*3. Migrated Cluster (arm64)*".
10. For *Machine Type*, select *c4-standard-4*
{{% notice Note %}}
The chosen node types support only one pod per node. If you wish to run multiple pods per node, each node should provide about 10 GB of memory per pod.
{{% /notice %}}

11. *Click* the *Create* button at the bottom of the screen.
It will take a few moments, but when the green checkmark is showing next to the `ollama-on-multiarch` cluster, you're ready to continue to test your connection to the cluster.
You might see an error similar to:

```output
CRITICAL: ACTION REQUIRED: gke-gcloud-auth-plugin, which is needed for continued use of kubectl, was not found or is not executable. Install gke-gcloud-auth-plugin for use with kubectl by following https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#install_plugin
```
This command should help resolve it:
```bash
gcloud components install gke-gcloud-auth-plugin
```
Finally, test the connection to the cluster with this command:
```bash
kubectl cluster-info
```
If you receive a non-error response, you're successfully connected to the K8s cluster.
`content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/1-deploy-amd64.md`
When the above is applied:
* A new deployment called `ollama-amd64-deployment` is created. This deployment pulls a multi-architecture [Ollama image](https://hub.docker.com/layers/ollama/ollama/0.6.1/images/sha256-28b909914d4e77c96b1c57dea199c60ec12c5050d08ed764d9c234ba2944be63) from DockerHub.
Of particular interest is the `nodeSelector` `kubernetes.io/arch`, with the value of `amd64`. This ensures that the deployment only runs on amd64 nodes, utilizing the amd64 version of the Ollama container image.
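The architecture-pinning parts of such a deployment might be sketched as follows. This is an illustrative fragment, not the full manifest from this Learning Path; the names and labels follow the description above, while the remaining fields are assumptions:

```yaml
# Sketch: the architecture-pinning parts of the amd64 deployment.
# Names and labels follow the description above; other fields are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-amd64-deployment
spec:
  selector:
    matchLabels:
      arch: amd64
  template:
    metadata:
      labels:
        arch: amd64
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64   # schedule only on amd64 nodes
      containers:
      - name: ollama
        image: ollama/ollama:0.6.1  # multi-architecture image
```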
* A new load balancer service `ollama-amd64-svc` is created, which targets all pods with the `arch: amd64` label (the amd64 deployment creates these pods).
A `sessionAffinity` tag is added to this service to remove sticky connections to the target pods. This removes persistent connections to the same pod on each request.
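A service matching this description might be sketched as follows. The port values are assumptions (11434 is Ollama's default listen port); `sessionAffinity: None` is what disables sticky connections:

```yaml
# Sketch: a LoadBalancer service targeting only amd64-labeled pods.
# sessionAffinity: None avoids persistent (sticky) connections to one pod.
apiVersion: v1
kind: Service
metadata:
  name: ollama-amd64-svc
spec:
  type: LoadBalancer
  selector:
    arch: amd64
  sessionAffinity: None
  ports:
  - port: 80
    targetPort: 11434   # Ollama's default listen port (illustrative mapping)
```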
### Apply the amd64 deployment and service
{{% notice Note %}}
The following utility `model_util.sh` is provided for convenience.
It's a wrapper for kubectl, utilizing [curl](https://curl.se/), [jq](https://jqlang.org/), [bc](https://www.gnu.org/software/bc/), and [stdbuf](https://www.gnu.org/software/coreutils/manual/html_node/stdbuf-invocation.html).
Make sure you have these shell utilities installed before running.
{{% /notice %}}
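As a rough illustration of the kind of arithmetic such a wrapper can perform, the snippet below computes tokens per second from the `eval_count` and `eval_duration` fields that Ollama's generate API reports (`eval_duration` is in nanoseconds). The sample values are made up, and `awk` stands in here for the `jq`/`bc` pipeline the script actually uses:

```shell
# Illustrative only: compute generation throughput from sample Ollama metrics.
eval_count=128            # tokens generated (sample value)
eval_duration=4000000000  # time spent generating, in nanoseconds (sample value)

# tokens/sec = eval_count / (eval_duration in seconds)
awk -v c="$eval_count" -v d="$eval_duration" \
  'BEGIN { printf "tokens/sec: %.2f\n", c / (d / 1e9) }'
```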
The script conveniently bundles many test and logging commands into a single place. For example:

```bash
./model_util.sh amd64 hello
```
You get back the HTTP response, as well as the log line from the pod that served it:
If you see the output `Ollama is running`, you have successfully bootstrapped your GKE cluster with an amd64 node, running a deployment with the Ollama multi-architecture container instance.
Continue to the next section to do the same thing, but with an Arm node.
`content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/2-deploy-arm64.md`
## Overview
You have reached the point from which most projects start investigating migration to Arm. You have a workload running on an amd64 cluster and you want to evaluate the benefits of Arm.
In this section, you will add an Arm-based node pool to the cluster, and apply an Ollama Arm deployment and service to mimic what you did in the previous section.
### Adding the arm64-pool node pool
To add Arm nodes to the cluster:
7. Select *C4A*: *c4a-standard-4* for *Machine Configuration/Type*.
{{% notice Note %}}
To compare amd64 and arm64 performance, the c4a-standard-4 is used as the arm64 equivalent of the previously deployed c4-standard-4 in the amd64 node pool.
{{% /notice %}}

8. Select *Create*.
9. After provisioning completes, select the newly created *arm64-pool* from the *Clusters* screen to take you to the *Node pool details* page.
Notice the taint below that GKE applies by default to the Arm node of `NoSchedule` if `arch=arm64`:

Without a toleration for this taint, you won't be able to schedule any workloads on it. The nodeSelector in the amd64 (and as you will shortly see, the arm64) deployment YAMLs not only defines which architecture to target, [but in the arm64 use case](https://cloud.google.com/kubernetes-engine/docs/how-to/prepare-arm-workloads-for-deployment#schedule-with-node-selector-arm), it also adds the required toleration automatically.
```yaml
nodeSelector:
  kubernetes.io/arch: arm64
```
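According to the GKE documentation linked above, the toleration that GKE adds automatically for this selector is equivalent to specifying something like the following in the pod spec (shown here only for illustration; you do not need to add it yourself):

```yaml
# Toleration matching GKE's default arm64 taint; GKE adds an equivalent
# toleration automatically when the arm64 nodeSelector is present.
tolerations:
- key: kubernetes.io/arch
  operator: Equal
  value: arm64
  effect: NoSchedule
```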
### Deployment and service
You can now apply the arm64-based deployment.
1. Use a text editor to copy the following YAML, and save it to a file called `arm64_ollama.yaml`:
```yaml
apiVersion: apps/v1
```
When the above is applied:
* A new deployment called `ollama-arm64-deployment` is created. Like the amd64 deployment, it pulls the same multi-architecture image from DockerHub.
Of particular interest is the `nodeSelector` `kubernetes.io/arch`, with the value of `arm64`. This ensures that the deployment runs on arm64-based nodes, utilizing the arm64 layer of the Ollama multi-architecture container image. The `nodeSelector` triggers the automatic creation of the toleration for the arm64 nodes.
* Two new load balancer services are created. The first, `ollama-arm64-svc`, is analogous to the existing service and targets all pods with the `arch: arm64` label (the arm64 deployment creates these pods). The second service, `ollama-multiarch-svc`, targets all pods, regardless of architecture. This service shows how you can mix and match pods in production to serve the same application regardless of node/pod architecture.
A `sessionAffinity` tag is added to this service to remove sticky connections to the target pods. This removes persistent connections to the same pod on each request.
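One way a multiarch service can span both architectures is to select on a label the two deployments share while omitting the `arch` label entirely. The sketch below illustrates this; the `app: ollama` label is an assumption for illustration, not taken from the manifests in this Learning Path:

```yaml
# Sketch: a service spanning both architectures by omitting the arch label.
# The shared app label is hypothetical; adapt it to the labels your pods use.
apiVersion: v1
kind: Service
metadata:
  name: ollama-multiarch-svc
spec:
  type: LoadBalancer
  selector:
    app: ollama        # hypothetical label present on both amd64 and arm64 pods
  sessionAffinity: None
  ports:
  - port: 80
    targetPort: 11434  # Ollama's default listen port (illustrative mapping)
```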
### Apply the arm64 deployment and service
1. Run the following command to apply the arm64 deployment and service definitions:
```bash
kubectl apply -f arm64_ollama.yaml
```
You see the following responses:
```output
deployment.apps/ollama-arm64-deployment created
service/ollama-arm64-svc created
service/ollama-multiarch-svc created
```
2. Get the status of the pods and the services by running the following:
```bash
kubectl get nodes,pods,svc -nollama
```
Your output is similar to the following, showing two nodes, two pods, and three services:
Once again, if you see `Ollama is running` then you have successfully set up your GKE cluster with both amd64 and arm64 nodes and pods running a deployment with the Ollama multi-architecture container.
Continue to the next section to analyze the performance.