
Commit f23d616

Merge pull request #1753 from geremyCohen/ollama_on_gke
Applied suggested modifications
2 parents 02ffc00 + 3949f0f commit f23d616

File tree: 8 files changed (+42 / -21 lines)


content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/0-spin_up_gke_cluster.md

Lines changed: 29 additions & 12 deletions
@@ -8,7 +8,7 @@ layout: learningpathall

 ## Project overview

-Arm CPUs are widely used in AI/ML use cases. In this Learning Path, you will learn how to run [Ollama](https://ollama.com/) on Arm-based CPUs in a hybrid architecture (amd64 and arm64) K8s cluster.
+Arm CPUs are widely used in Kubernetes AI/ML use cases. In this Learning Path, you learn how to run [Ollama](https://ollama.com/) on Arm-based CPUs in a hybrid architecture (amd64 and arm64) K8s cluster.

 To demonstrate this, you can bring up an initial Kubernetes cluster (depicted as "*1. Initial Cluster (amd64)*" in the image below) with an amd64 node running an Ollama Deployment and Service.

@@ -49,11 +49,7 @@ Although this will work in all regions and zones where C4 and C4a instance types

 8. Click on *NODE POOLS*->*Nodes*
 9. For *Series*, select *C4*
-10. For *Machine Type*, select *c4-standard-4*
-
-{{% notice Note %}}
-The chosen node types support only one pod per node. If you wish to run multiple pods per node, each node should provide about 10GB memory per pod.
-{{% /notice %}}
+10. For *Machine Type*, select *c4-standard-8*

 ![Configure amd64 node type](images/configure-x86-note-type.png)

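For readers provisioning from the CLI instead of the console, a roughly equivalent cluster-creation command could look like the sketch below; the cluster name and zone are illustrative placeholders, not values taken from the Learning Path:

```bash
# Hypothetical CLI equivalent of the console steps above; name and zone are placeholders.
gcloud container clusters create ollama-demo \
  --zone us-central1-a \
  --machine-type c4-standard-8 \
  --num-nodes 1
```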
@@ -63,11 +59,33 @@ It will take a few moments, but when the green checkmark is showing next to the

 ### Connect to the cluster

-{{% notice Note %}}
-The following assumes you have gcloud and kubectl already installed. If not, please follow the instructions on the first page under "Prerequisites".
-{{% /notice %}}
+Before continuing, make sure you have *kubectl* and *gcloud* installed. You can verify this by running each command on its own. For example, entering *gcloud*:

-You'll first setup your newly created K8s cluster credentials using the gcloud utility. Enter the following in your command prompt (or cloud shell), and make sure to replace "YOUR_PROJECT_ID" with the ID of your GCP project:
+```bash
+gcloud
+```
+should return
+```output
+ERROR: (gcloud) Command name argument expected.
+...
+```
+and entering *kubectl* should return:
+
+```output
+kubectl controls the Kubernetes cluster manager.
+
+ Find more information at: https://kubernetes.io/docs/reference/kubectl/
+...
+```
+If you get something similar to:
+
+```output
+command not found
+```
+
+Please follow the prerequisite instructions on the first page to install the missing utilities.
+
+With the prerequisites out of the way, you will next set up your newly created K8s cluster credentials using the gcloud utility. Enter the following in your command prompt (or cloud shell), and make sure to replace "YOUR_PROJECT_ID" with the ID of your GCP project:

 ```bash
 export ZONE=us-central1
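The remainder of this code block falls outside the hunk shown here. For orientation only, the credentials step typically ends with a command along these lines, where the cluster name is a placeholder rather than a value from the Learning Path:

```bash
# Sketch only: "ollama-cluster" is a placeholder cluster name, not from the original page.
gcloud container clusters get-credentials ollama-cluster \
  --zone $ZONE \
  --project YOUR_PROJECT_ID
```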
@@ -81,8 +99,7 @@ If you get the message:
 ```output
 CRITICAL: ACTION REQUIRED: gke-gcloud-auth-plugin, which is needed for continued use of kubectl, was not found or is not executable. Install gke-gcloud-auth-plugin for use with kubectl by following https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#install_plugin
 ```
-
-This command should help resolve it:
+This command will resolve it:

 ```bash
 gcloud components install gke-gcloud-auth-plugin

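As an optional sanity check (not part of the committed page), you can confirm the plugin is installed and on your PATH before retrying the credentials step:

```bash
# Prints the plugin version if the install succeeded.
gke-gcloud-auth-plugin --version
```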
content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/2-deploy-arm64.md

Lines changed: 2 additions & 2 deletions
@@ -26,7 +26,7 @@ To add Arm nodes to the cluster:
 7. Select *C4A* : *c4a-standard-4* for Machine *Configuration/Type*.

 {{% notice Note %}}
-To compare amd64 and arm64 performance, the c4a-standard-4 is used as the arm64 equivalent of the previously deployed c4-standard-4 in the amd64 node pool.
+To compare amd64 and arm64 performance, the c4a-standard-4 is used as the arm64 equivalent of the previously deployed c4-standard-8 in the amd64 node pool.
 {{% /notice %}}

 ![YAML Overview](images/arm_node_config-2.png)
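For completeness, adding the Arm node pool from the CLI instead of the console could look roughly like the sketch below; the pool name, cluster name, and zone are illustrative placeholders:

```bash
# Hypothetical CLI equivalent of the console steps above; names and zone are placeholders.
gcloud container node-pools create arm-pool \
  --cluster ollama-demo \
  --zone us-central1-a \
  --machine-type c4a-standard-4 \
  --num-nodes 1
```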
@@ -146,7 +146,7 @@ service/ollama-arm64-svc created
 service/ollama-multiarch-svc created
 ```

-2. Get the status of the pods and the services by running the following:
+2. Get the status of the nodes, pods, and services by running the following:

 ```bash
 kubectl get nodes,pods,svc -nollama
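As an optional extra check (not part of the committed page), you can confirm which architecture each node reports by adding a label column to the node listing:

```bash
# Shows an ARCH column (amd64 or arm64) next to each node name.
kubectl get nodes -L kubernetes.io/arch
```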

content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/3-perf-tests.md

Lines changed: 10 additions & 6 deletions
@@ -72,14 +72,15 @@ The output is similar to:

 ```output
 ...
-1023,13],"total_duration":15341522988,"load_duration":16209080,"prompt_eval_count":32,"prompt_eval_duration":164000000,"eval_count":93,"eval_duration":15159000000}
-Tokens per second: 6.13
+"prompt_eval_duration":79000000,"eval_count":72,"eval_duration":5484000000}
+Tokens per second: 13.12

 Pod log output:
-[pod/ollama-arm64-deployment-678dc8556f-mj7gm/ollama-multiarch] 06:29:14
+
+[pod/ollama-amd64-deployment-cbfc4b865-k2gc4/ollama-multiarch] 2025-03-27T00:25:21
 ```

-You can see tokens per second rate measured at 6.13 (from the log output example, your actual value may vary a bit).
+You can see tokens per second rate measured at 13.12 (from the log output example, your actual value may vary a bit).

 Next, run the same inference on the arm64 node with the following command:

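The tokens-per-second figure appears to be derived from the eval_count and eval_duration fields of the Ollama response, with durations reported in nanoseconds. A quick back-of-the-envelope check against the sample outputs above (93 tokens over ~15.159 s, and 72 tokens over ~5.484 s) reproduces the reported values to within rounding:

```bash
# eval_count / (eval_duration in seconds): expect roughly 6.13 and 13.13.
awk 'BEGIN { printf "%.2f %.2f\n", 93 / 15.159, 72 / 5.484 }'
```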
@@ -94,10 +95,13 @@ Visually, you see the output streaming out faster on arm64 than on amd64. Look a
 Tokens per second: 14.47

 Pod log output:
-[pod/ollama-arm64-deployment-678dc8556f-mj7gm/ollama-multiarch] 06:46:35
+
+[pod/ollama-arm64-deployment-678dc8556f-md222/ollama-multiarch] 2025-03-27T00:26:30
 ```

-The output shows a more than a 2X performance increase between arm64 and amd64.
+The output shows more than a 15% performance increase of arm64 over amd64.
+
+### Notes on Evaluating Price/Performance

 ### Price performance notes

content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/_index.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ---
-title: Use GKE to run Ollama on arm64 and amd64 nodes using a multi-architecture container image
+title: Run Ollama's multi-arch container image on GKE with arm64 and amd64 nodes.

 minutes_to_complete: 30

Four image files changed (binary diffs not shown): -60.3 KB, -11.2 KB, 390 KB, -14.9 KB.
