Commit 02ffc00

Merge pull request #1752 from jasonrandrews/review
Reviewing Ollama on GKE
2 parents e88ae75 + c93dd28 commit 02ffc00

File tree

5 files changed

+92
-83
lines changed


content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/0-spin_up_gke_cluster.md

Lines changed: 10 additions & 6 deletions
@@ -8,11 +8,11 @@ layout: learningpathall

## Project overview

-Arm CPUs are widely used in traditional AI/ML use cases. In this Learning Path, you learn how to run [Ollama](https://ollama.com/) on Arm-based CPUs in a hybrid architecture (amd64 and arm64) K8s cluster.
+Arm CPUs are widely used in AI/ML use cases. In this Learning Path, you will learn how to run [Ollama](https://ollama.com/) on Arm-based CPUs in a hybrid architecture (amd64 and arm64) K8s cluster.

To demonstrate this, you can bring up an initial Kubernetes cluster (depicted as "*1. Initial Cluster (amd64)*" in the image below) with an amd64 node running an Ollama Deployment and Service.

-Next, as depicted by "*2. Hybrid Cluster amd64/arm64*", you'll add the arm64 node, and apply an arm64 Deployment and Service to it, so that you can now test both architectures together, and separately, to investigate performance.
+Next, as depicted by "*2. Hybrid Cluster amd64/arm64*", you'll add the arm64 node, and apply an arm64 deployment and service to it, so that you can now test both architectures together, and separately, to investigate performance.

When you are satisfied with the arm64 performance over amd64, it's easy to delete the amd64-specific node, deployment, and service, to complete the migration, as depicted in "*3. Migrated Cluster (arm64)*".

@@ -52,14 +52,14 @@ Although this will work in all regions and zones where C4 and C4a instance types

10. For *Machine Type*, select *c4-standard-4*

{{% notice Note %}}
-The chosen node types support only one pod per node. If you wish to run multiple pods per node, assume each node should provide about 10GB memory per pod.
+The chosen node types support only one pod per node. If you wish to run multiple pods per node, each node should provide about 10GB memory per pod.
{{% /notice %}}
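The sizing rule in the note above can be sketched as a quick back-of-the-envelope calculation. This is a hypothetical helper, not part of the commit; the 10GB-per-pod budget comes from the note, and the memory values in the example are placeholders, so check the actual memory of your chosen machine type:

```python
# Hypothetical sizing helper: the note above budgets roughly
# 10 GB of node memory per Ollama pod.
def max_pods_per_node(node_memory_gb: float, gb_per_pod: float = 10.0) -> int:
    """How many pods fit on a node under the per-pod memory budget."""
    return max(1, int(node_memory_gb // gb_per_pod))

# Example values only -- not the real memory of c4-standard-4.
print(max_pods_per_node(16))  # -> 1
print(max_pods_per_node(32))  # -> 3
```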

![Configure amd64 node type](images/configure-x86-note-type.png)

11. *Click* the *Create* button at the bottom of the screen.

-It will take a few moments, but when the green checkmark is showing next to the *ollama-on-multiarch* cluster, you're ready to continue to test your connection to the cluster.
+It will take a few moments, but when the green checkmark is showing next to the `ollama-on-multiarch` cluster, you're ready to continue to test your connection to the cluster.

### Connect to the cluster

@@ -75,19 +75,23 @@ export CLUSTER_NAME=ollama-on-multiarch
export PROJECT_ID=YOUR_PROJECT_ID
gcloud container clusters get-credentials $CLUSTER_NAME --zone $ZONE --project $PROJECT_ID
```
+
If you get the message:

-```commandline
+```output
CRITICAL: ACTION REQUIRED: gke-gcloud-auth-plugin, which is needed for continued use of kubectl, was not found or is not executable. Install gke-gcloud-auth-plugin for use with kubectl by following https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#install_plugin
```
+
This command should help resolve it:

```bash
gcloud components install gke-gcloud-auth-plugin
```
+
Finally, test the connection to the cluster with this command:

```commandline
kubectl cluster-info
```
-If you receive a non-error response, you're successfully connected to the k8s cluster!
+
+If you receive a non-error response, you're successfully connected to the K8s cluster.
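The connection test can also be scripted. The sketch below is hypothetical and not part of the commit; it shows the kind of check a wrapper might apply to captured `kubectl cluster-info` output, and the sample strings are illustrative:

```python
# Hypothetical success check for `kubectl cluster-info` output:
# a reachable cluster prints "... is running at <URL>" lines,
# while a failure prints a connection error instead.
def cluster_reachable(cluster_info_output: str) -> bool:
    lowered = cluster_info_output.lower()
    return "is running at" in lowered and "unable to connect" not in lowered

ok = "Kubernetes control plane is running at https://34.0.0.1"
bad = "Unable to connect to the server: dial tcp: no such host"
print(cluster_reachable(ok))   # -> True
print(cluster_reachable(bad))  # -> False
```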

content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/1-deploy-amd64.md

Lines changed: 6 additions & 6 deletions
@@ -77,13 +77,13 @@ spec:

When the above is applied:

-* A new Deployment called `ollama-amd64-deployment` is created. This deployment pulls a multi-architecture [Ollama image](https://hub.docker.com/layers/ollama/ollama/0.6.1/images/sha256-28b909914d4e77c96b1c57dea199c60ec12c5050d08ed764d9c234ba2944be63) from DockerHub.
+* A new deployment called `ollama-amd64-deployment` is created. This deployment pulls a multi-architecture [Ollama image](https://hub.docker.com/layers/ollama/ollama/0.6.1/images/sha256-28b909914d4e77c96b1c57dea199c60ec12c5050d08ed764d9c234ba2944be63) from DockerHub.

Of particular interest is the `nodeSelector` `kubernetes.io/arch`, with the value of `amd64`. This ensures that the deployment only runs on amd64 nodes, utilizing the amd64 version of the Ollama container image.

-* A new load balancer Service `ollama-amd64-svc` is created, which targets all pods with the `arch: amd64` label (the amd64 deployment creates these pods).
+* A new load balancer service `ollama-amd64-svc` is created, which targets all pods with the `arch: amd64` label (the amd64 deployment creates these pods).

-A `sessionAffinity` tag is added to this Service to remove sticky connections to the target pods. This removes persistent connections to the same pod on each request.
+A `sessionAffinity` tag is added to this service to remove sticky connections to the target pods. This removes persistent connections to the same pod on each request.

### Apply the amd64 deployment and service

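The effect of the `sessionAffinity` setting described above can be pictured with a toy routing model. This is illustrative only and not part of the commit; real Service load balancing is more involved than round-robin:

```python
import itertools

# Toy model of Service routing: with session affinity, repeat requests
# from one client stay pinned to a single pod; without it, requests
# spread across pods (modeled here as simple round-robin).
def route_requests(pods, n_requests, sticky):
    if sticky:
        return [pods[0]] * n_requests
    cycle = itertools.cycle(pods)
    return [next(cycle) for _ in range(n_requests)]

pods = ["ollama-amd64-pod-a", "ollama-amd64-pod-b"]
print(route_requests(pods, 4, sticky=True))   # every request hits pod-a
print(route_requests(pods, 4, sticky=False))  # requests alternate pods
```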
@@ -134,7 +134,7 @@ When the pods show `Running` and the service shows a valid `External IP`, you ar

{{% notice Note %}}
The following utility `model_util.sh` is provided for convenience.

-It's a wrapper for kubectl, utilizing the utilities [curl](https://curl.se/), [jq](https://jqlang.org/), [bc](https://www.gnu.org/software/bc/), and [stdbuf](https://www.gnu.org/software/coreutils/manual/html_node/stdbuf-invocation.html).
+It's a wrapper for kubectl, utilizing [curl](https://curl.se/), [jq](https://jqlang.org/), [bc](https://www.gnu.org/software/bc/), and [stdbuf](https://www.gnu.org/software/coreutils/manual/html_node/stdbuf-invocation.html).

Make sure you have these shell utilities installed before running.
{{% /notice %}}
@@ -248,7 +248,7 @@ The script conveniently bundles many test and logging commands into a single pla
./model_util.sh amd64 hello
```

-You get back the HTTP response, as well as the logline from the pod that served it:
+You get back the HTTP response, as well as the log line from the pod that served it:

```output
Server response:
@@ -260,6 +260,6 @@ Pod log output:

[pod/ollama-amd64-deployment-cbfc4b865-msftf/ollama-multiarch] 2025-03-25T21:13:49.022522588Z
```

-If you see the output `Ollama is running` you have successfully bootstrapped your GKE cluster with an amd64 node, running a deployment with the Ollama multi-architecture container instance!
+If you see the output `Ollama is running`, you have successfully bootstrapped your GKE cluster with an amd64 node, running a deployment with the Ollama multi-architecture container instance.

Continue to the next section to do the same thing, but with an Arm node.

content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/2-deploy-arm64.md

Lines changed: 32 additions & 29 deletions
@@ -6,10 +6,9 @@ weight: 4
layout: learningpathall
---

-## Overview
-At this point you have a what many people in their K8s Arm journey start with -- a workload running on an amd64 cluster. As mentioned earlier, the easiest way to experiment with Arm in your K8s cluster is to run both architectures simultaneously, not just for the sake of learning how to do it, but also to see first-hand the price/performance advantages of running Arm-based nodes.
+You have reached the point from which most projects start investigating migration to Arm. You have a workload running on an amd64 cluster and you want to evaluate the benefits of Arm.

-Next, you'll add an Arm-based node pool to the cluster, and from there, apply an ollama Arm deployment and service to mimic what we did in the last chapter.
+In this section, you will add an Arm-based node pool to the cluster, and apply an Ollama Arm deployment and service to mimic what you did in the previous section.

### Adding the arm64-pool node pool

@@ -27,29 +26,30 @@ To add Arm nodes to the cluster:
7. Select *C4A* : *c4a-standard-4* for Machine *Configuration/Type*.

{{% notice Note %}}
-To make an apples-to-apples comparison of amd64 and arm64 performance, the c4a-standard-4 is spun up as the arm64 "equivalent" of the previously deployed c4-standard-4 in the amd64 node pool.
+To compare amd64 and arm64 performance, the c4a-standard-4 is used as the arm64 equivalent of the previously deployed c4-standard-4 in the amd64 node pool.
{{% /notice %}}

![YAML Overview](images/arm_node_config-2.png)

8. Select *Create*
9. After provisioning completes, select the newly created *arm64-pool* from the *Clusters* screen to take you to the *Node pool details* page.

-Note the taint GKE applies by default to the Arm Node of *NoSchedule* if arch=arm64:
+Notice the taint below that GKE applies by default to the Arm node of `NoSchedule` if `arch=arm64`:

![arm node taint](images/taint_on_arm_node.png)

-Without a toleration for this taint, we won't be able to schedule any workloads on it! But do not fear, as the nodeSelector in the amd64 (and as you will shortly see, the arm64) Deployment YAMLs not only defines which architecture to target, [but in the arm64 use case](https://cloud.google.com/kubernetes-engine/docs/how-to/prepare-arm-workloads-for-deployment#schedule-with-node-selector-arm), it also adds the required toleration automatically.
+Without a toleration for this taint, you won't be able to schedule any workloads on it. The nodeSelector in the amd64 (and as you will shortly see, the arm64) deployment YAMLs not only defines which architecture to target, [but in the arm64 use case](https://cloud.google.com/kubernetes-engine/docs/how-to/prepare-arm-workloads-for-deployment#schedule-with-node-selector-arm), it also adds the required toleration automatically.

```yaml
nodeSelector:
-  kubernetes.io/arch: arm64 # or amd64
+  kubernetes.io/arch: arm64
```
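The taint-and-toleration mechanics described above can be sketched as a toy scheduling check. This is illustrative only and not part of the commit; the real Kubernetes scheduler evaluates far more (resources, affinity rules, taint effects), but the core match is label selection plus taint toleration:

```python
# Toy scheduling check: a pod fits a node when its nodeSelector matches
# the node's labels and it tolerates every taint on the node.
def can_schedule(pod: dict, node: dict) -> bool:
    selector_ok = all(node["labels"].get(k) == v
                      for k, v in pod["nodeSelector"].items())
    taints_ok = all(t in pod["tolerations"] for t in node["taints"])
    return selector_ok and taints_ok

arm_node = {
    "labels": {"kubernetes.io/arch": "arm64"},
    "taints": [("kubernetes.io/arch", "arm64", "NoSchedule")],
}

# With the toleration (which GKE adds automatically for an arm64
# nodeSelector), the pod schedules; without it, the taint blocks it.
tolerant = {"nodeSelector": {"kubernetes.io/arch": "arm64"},
            "tolerations": [("kubernetes.io/arch", "arm64", "NoSchedule")]}
intolerant = {"nodeSelector": {"kubernetes.io/arch": "arm64"},
              "tolerations": []}

print(can_schedule(tolerant, arm_node))    # -> True
print(can_schedule(intolerant, arm_node))  # -> False
```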

-### Deployment and Service
-We can now apply the arm64-based deployment.
+### Deployment and service

-1. Copy the following YAML, and save it to a file called arm64_ollama.yaml:
+You can now apply the arm64-based deployment.
+
+1. Use a text editor to copy the following YAML, and save it to a file called `arm64_ollama.yaml`:

```yaml
apiVersion: apps/v1
@@ -121,40 +121,40 @@ spec:

When the above is applied:

-* A new Deployment called *ollama-arm64-deployment* is created. Like the amd64 deployment, it pulls the same multi-architectural (both amd64 and arm64) image from Dockerhub [ollama image from Dockerhub](https://hub.docker.com/layers/ollama/ollama/0.6.1/images/sha256-28b909914d4e77c96b1c57dea199c60ec12c5050d08ed764d9c234ba2944be63).
+* A new deployment called `ollama-arm64-deployment` is created. Like the amd64 deployment, it pulls the same multi-architecture image from DockerHub.

-Of particular interest is the *nodeSelector* *kubernetes.io/arch*, with the value of *arm64*. This will ensure that this deployment only runs on arm64-based nodes, utilizing the arm64 layer of the ollama multi-architecture container image. As mentioned earlier, this *nodeSelector* triggers the automatic creation of the toleration for the arm64 nodes.
+Of particular interest is the `nodeSelector` `kubernetes.io/arch`, with the value of `arm64`. This ensures that the deployment runs on arm64-based nodes, utilizing the arm64 layer of the Ollama multi-architecture container image. The `nodeSelector` triggers the automatic creation of the toleration for the arm64 nodes.

-* Two new load balancer Services are created. The first, *ollama-arm64-svc* is created, analogous to the existing service, targets all pods with the *arch: arm64* label (our arm64 deployment creates these pods.) The second service, *ollama-multiarch-svc*, target ALL Pods, regardless of the architecture they are running. This service will show us how we can mix and match pods in production to serve the same app regardless of node/pod architecture.
+* Two new load balancer services are created. The first, `ollama-arm64-svc`, is analogous to the existing service and targets all pods with the `arch: arm64` label (the arm64 deployment creates these pods). The second service, `ollama-multiarch-svc`, targets all pods, regardless of the architecture. This service shows how you can mix and match pods in production to serve the same application regardless of node/pod architecture.

-You may also notice that a *sessionAffinity* tag was added to this Service to remove sticky connections to the target pods; this removes persistent connections to the same pod on each request.
+A `sessionAffinity` tag is added to this service to remove sticky connections to the target pods. This removes persistent connections to the same pod on each request.

133133
### Apply the arm64 Deployment and Service
134134

135-
1. Run the following command to apply the arm64 deployment, and service definitions:
135+
1. Run the following command to apply the arm64 deployment and service definitions:
136136

137137
```bash
138138
kubectl apply -f arm64_ollama.yaml
139139
```
140140

141-
You should get the following responses back:
141+
You see the following responses:
142142

143-
```bash
143+
```output
144144
deployment.apps/ollama-arm64-deployment created
145145
service/ollama-arm64-svc created
146146
service/ollama-multiarch-svc created
147147
```
148148

149-
2. Get the status of the pods, and the services, by running the following:
149+
2. Get the status of the pods and the services by running the following:
150150

151-
```commandline
151+
```bash
152152
kubectl get nodes,pods,svc -nollama
153153
```
154154

155-
Your output should be similar to the following, showing two nodes, two pods, and three services:
155+
Your output is similar to the following, showing two nodes, two pods, and three services:
156156

157-
```commandline
157+
```output
158158
NAME STATUS ROLES AGE VERSION
159159
node/gke-ollama-on-arm-amd64-pool-62c0835c-93ht Ready <none> 91m v1.31.6-gke.1020000
160160
node/gke-ollama-on-arm-arm64-pool-2ae0d1f0-pqrf Ready <none> 4m11s v1.31.6-gke.1020000
@@ -169,21 +169,23 @@ service/ollama-arm64-svc LoadBalancer 1.2.3.4 1.2.3.4
service/ollama-multiarch-svc LoadBalancer 1.2.3.4 1.2.3.4 80:30667/TCP 2m52s
```

-When the pods show *Running* and the service shows a valid *External IP*, we're ready to test the ollama arm64 service!
+When the pods show `Running` and the service shows a valid `External IP`, you are ready to test the Ollama arm64 service.

-### Test the ollama on arm web service
+### Test the Ollama web service on arm64

-To test the service, use the previously created model_util.sh from the last section; instead of the *amd64* parameter, replace it with *arm64*:
+To test the service, use the previously created `model_util.sh` from the previous section.
+
+Instead of the `amd64` parameter, replace it with `arm64`:

3. Run the following to make an HTTP request to the arm64 Ollama service on port 80:

-```commandline
+```bash
./model_util.sh arm64 hello
```

-You should get back the HTTP response, as well as the logline from the pod that served it:
+You get back the HTTP response, as well as the log line from the pod that served it:

-```commandline
+```output
Server response:
Using service endpoint 34.44.135.90 for hello on arm64
Ollama is running
@@ -192,6 +194,7 @@ Pod log output:

[pod/ollama-arm64-deployment-678dc8556f-956d6/ollama-multiarch] 2025-03-25T21:25:21.547384356Z
```

-Once again, we're looking for "Ollama is running". If you see that, congrats, you've successfully setup your GKE cluster with both amd64 and arm64 nodes and pods running a Deployment with the ollama multi-architecture container!
-Next, let's do some simple analysis of the cluster's performance.
+Once again, if you see "Ollama is running" then you have successfully set up your GKE cluster with both amd64 and arm64 nodes and pods running a deployment with the Ollama multi-architecture container.
+
+Continue to the next section to analyze the performance.
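As a preview of that analysis, comparing the two architectures often reduces to summarizing per-service request latencies. A minimal sketch follows; the service names match this Learning Path, but the numbers are placeholders (not measurements) and the aggregation is a hypothetical example:

```python
from statistics import mean

# Placeholder latency samples in seconds -- not real measurements.
latencies = {
    "ollama-amd64-svc": [0.92, 0.88, 0.95],
    "ollama-arm64-svc": [0.90, 0.86, 0.93],
}

# Summarize the mean latency per service for a side-by-side view.
for svc, samples in latencies.items():
    print(f"{svc}: mean {mean(samples):.2f}s over {len(samples)} requests")
```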
