
Commit d75e81f

Merge pull request #1774 from madeline-underwood/Ollama
Ollama_JA to review
2 parents 9cd41a5 + 79ee095 commit d75e81f

5 files changed: +86 −87 lines changed

content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/0-spin_up_gke_cluster.md

Lines changed: 32 additions & 28 deletions
@@ -1,91 +1,95 @@
 ---
-title: Spin up the GKE Cluster
+title: Create the GKE Cluster
 weight: 2

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---

-## Project overview
+## Project Overview

-Arm CPUs are widely used in Kubernetes AI/ML use cases. In this Learning Path, you learn how to run [Ollama](https://ollama.com/) on Arm-based CPUs in a hybrid architecture (amd64 and arm64) K8s cluster.
+Arm CPUs are widely used in AI/ML workloads on Kubernetes. In this Learning Path, you'll learn how to deploy [Ollama](https://ollama.com/) on Arm-based CPUs within a hybrid architecture (amd64 and arm64) K8s cluster.

-To demonstrate this, you can bring up an initial Kubernetes cluster (depicted as "*1. Initial Cluster (amd64)*" in the image below) with an amd64 node running an Ollama Deployment and Service.
+First, you'll bring up an initial Kubernetes cluster with an amd64 node running an Ollama Deployment and Service (see **1: Initial Cluster (amd64)** in the image below).

-Next, as depicted by "*2. Hybrid Cluster amd64/arm64*", you'll add the arm64 node, and apply an arm64 deployment and service to it, so that you can now test both architectures together, and separately, to investigate performance.
+Next, you'll expand the cluster by adding an arm64 deployment and service to it, forming a hybrid cluster (**2: Hybrid Cluster amd64/arm64**). This allows you to test both architectures together, and separately, to investigate performance.

-When you are satisfied with the arm64 performance over amd64, its easy to delete the amd64-specific node, deployment, and service, to complete the migration, as depicted in "*3. Migrated Cluster (arm64)*".
+Once satisfied with arm64 performance, you can remove the amd64-specific node, deployment, and service, which completes your migration to an arm64-only cluster (**3: Migrated Cluster (arm64)**).

 ![Project Overview](images/general_flow.png)

-Once you've seen how easy it is to add arm64 nodes to an existing cluster, you can apply the knowledge to experiment with arm64 nodes on other workloads in your environment.
+Once you've seen how easy it is to add arm64 nodes to an existing cluster, you will be ready to explore arm64 nodes for other workloads in your environment.

 ### Create the cluster

-1. From within the GCP Console, navigate to [Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list/overview) and click *Create*.
+* In the GCP Console, navigate to [Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list/overview), then select **Create**.

-2. Select *Standard*->*Configure*
+* Select **Standard: You manage your cluster**, then **Configure**.

 ![Select and Configure Cluster Type](images/select_standard.png)

-The *Cluster basics* tab appears.
+On the **Cluster basics** tab:

-3. For *Name*, enter *ollama-on-multiarch*
-4. For *Region*, enter *us-central1*.
+* For **Name**, enter `ollama-on-multiarch` (see **1**).
+* For **Region**, enter `us-central1` (see **2**).

 ![Select and Configure Cluster Type](images/cluster_basics.png)

 {{% notice Note %}}
-Although this will work in all regions and zones where C4 and C4a instance types are supported, the `us-central1` and `us-central1-1a` regions and zones are used. For simplicity and cost savings, only one node per architecture is used.
+Whilst this procedure works in all regions and zones supporting C4 and C4a instance types, this example uses the `us-central1` region and `us-central1-a` zone. For simplicity and cost savings, only one node per architecture is used.
 {{% /notice %}}

-5. Click on *NODE POOLS*->*default-pool*
-6. For *Name*, enter *amd64-pool*
-7. For size, enter *1*
-8. Select *Specify node locations*, and select *us-central1-a*
+* Under **NODE POOLS**, select **default-pool**.
+* For **Name**, enter `amd64-pool` (see **1** below).
+* For **Size**, enter **1** (see **2** below).
+* Select **Specify node locations** (**3**), and select **us-central1-a** (**4**).

 ![Configure amd64 Node pool](images/x86-node-pool.png)


-8. Click on *NODE POOLS*->*Nodes*
-9. For *Series*, select *C4*
-10. For *Machine Type*, select *c4-standard-8*
+* Select **NODE POOLS**->**Nodes**.
+* For **Series**, select **C4** (see **1** below).
+* For **Machine Type**, select **c4-standard-8** (see **2**).

 ![Configure amd64 node type](images/configure-x86-note-type.png)

-11. *Click* the *Create* button at the bottom of the screen.
+* Click the **Create** button at the bottom of the screen.

-It will take a few moments, but when the green checkmark is showing next to the `ollama-on-multiarch` cluster, you're ready to continue to test your connection to the cluster.
+Wait until a green checkmark appears next to the `ollama-on-multiarch` cluster; you're then ready to test your connection to the cluster.
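If you prefer the CLI, a roughly equivalent cluster-creation command is sketched below. The flags are standard `gcloud`, but treat this as illustrative only: unlike the console steps above, it leaves the initial node pool named `default-pool` rather than `amd64-pool`.

```bash
# Illustrative sketch only -- not part of the Learning Path's console flow.
gcloud container clusters create ollama-on-multiarch \
  --region us-central1 \
  --node-locations us-central1-a \
  --machine-type c4-standard-8 \
  --num-nodes 1
```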

 ### Connect to the cluster

-Before continuing, make sure you have *kubectl* and *gcloud* installed. You can verify by running each command, for example, entering *gcloud* and enter:
+Ensure you have `kubectl` and `gcloud` installed.
+
+You can verify each is installed by running it with no arguments. For example, run:

 ```bash
 gcloud
 ```
-should return
+This should return:
 ```output
 ERROR: (gcloud) Command name argument expected.
 ...
 ```
-and entering *kubectl* and enter should return:
+Then run `kubectl`, which should return:

 ```output
 kubectl controls the Kubernetes cluster manager.

 Find more information at: https://kubernetes.io/docs/reference/kubectl/
 ...
 ```
-If you get something similar to:
+Otherwise, it might return a message like this:

 ```output
 command not found
 ```

-Please follow prerequisite instructions on the first page to install the missing utilities.
+If you see this, follow the prerequisite instructions on the first page to install the missing utilities.
+
+Now you can set up your newly-created K8s cluster credentials using the gcloud utility.

-With prerequisites out of the way, you will next setup your newly created K8s cluster credentials using the gcloud utility. Enter the following in your command prompt (or cloud shell), and make sure to replace "YOUR_PROJECT_ID" with the ID of your GCP project:
+Enter the following in your command prompt (or cloud shell), and make sure to replace `YOUR_PROJECT_ID` with the ID of your GCP project:

 ```bash
 export ZONE=us-central1
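The hunk is truncated after the first `export` line. For reference, a typical completion of this credential setup might look like the following sketch; everything beyond the visible `export ZONE` line is an assumption, not the file's actual contents:

```bash
export ZONE=us-central1
export PROJECT_ID=YOUR_PROJECT_ID   # replace with your GCP project ID

# Fetch cluster credentials and point kubectl's current context at the
# cluster (standard gcloud command; the exact flags used in the original
# file are not visible here).
gcloud container clusters get-credentials ollama-on-multiarch \
  --region "$ZONE" --project "$PROJECT_ID"
```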

content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/1-deploy-amd64.md

Lines changed: 9 additions & 9 deletions
@@ -6,9 +6,9 @@ weight: 3
 layout: learningpathall
 ---

-In this section, you'll bootstrap the cluster with Ollama on amd64, to simulate an "existing" K8s cluster running Ollama. In the next section you will add arm64 nodes alongside the amd64 nodes so you can compare them.
+## Deployment and service

-### Deployment and service
+In this section, you'll bootstrap the cluster with Ollama on amd64, simulating an existing Kubernetes (K8s) cluster running Ollama. In the next section, you'll add arm64 nodes alongside the amd64 nodes for performance comparison.

 1. Use a text editor to copy the following YAML and save it to a file called `namespace.yaml`:

@@ -19,7 +19,7 @@ metadata:
   name: ollama
 ```

-When the above is applied, a new K8s namespace named `ollama` is created. This is where all the K8s objects will live.
+Applying this YAML creates a new namespace called `ollama`, which contains all subsequent K8s objects.
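For reference, the complete `namespace.yaml` is likely just the standard Namespace boilerplate around the fragment visible above; the `apiVersion` and `kind` lines here are assumed:

```bash
# Write the (assumed) full manifest; only the last lines appear in the hunk.
cat <<'EOF' > namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
EOF
```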

 2. Use a text editor to copy the following YAML and save it to a file called `amd64_ollama.yaml`:

@@ -81,7 +81,7 @@ When the above is applied:

 Of particular interest is the `nodeSelector` `kubernetes.io/arch`, with the value of `amd64`. This ensures that the deployment only runs on amd64 nodes, utilizing the amd64 version of the Ollama container image.

-* A new load balancer service `ollama-amd64-svc` is created, which targets all pods with the `arch: amd64` label (the amd64 deployment creates these pods).
+* A new load balancer service `ollama-amd64-svc` is created, targeting all pods with the `arch: amd64` label (the amd64 deployment creates these pods).

 A `sessionAffinity` setting is added to this service so that connections are not sticky: repeated requests are not pinned to the same pod.
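A minimal sketch of how these pieces might fit together in `amd64_ollama.yaml` is shown below. Only `ollama-amd64-svc`, the `arch: amd64` label, the `nodeSelector`, and `sessionAffinity` are confirmed by the text above; the `app` label, container image, and Ollama's port 11434 are assumptions:

```bash
# Illustrative sketch only -- not the Learning Path's exact manifest.
cat <<'EOF' > amd64_ollama_sketch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-amd64-deployment
  namespace: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      arch: amd64
  template:
    metadata:
      labels:
        app: ollama
        arch: amd64
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64   # schedule only onto amd64 nodes
      containers:
        - name: ollama-multiarch
          image: ollama/ollama      # multi-arch image (tag assumed)
          ports:
            - containerPort: 11434  # Ollama's default port (assumed)
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-amd64-svc
  namespace: ollama
spec:
  type: LoadBalancer
  selector:
    arch: amd64             # targets the deployment's pods
  sessionAffinity: None     # no sticky connections to a single pod
  ports:
    - port: 80              # external port, per the svc output below
      targetPort: 11434
EOF
```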

@@ -102,19 +102,19 @@ deployment.apps/ollama-amd64-deployment created
 service/ollama-amd64-svc created
 ```

-2. Optionally, set the `default Namespace` to `ollama` so you don't need to specify the namespace each time, by entering the following:
+2. Optionally, set the default namespace to `ollama` to simplify future commands:

 ```bash
 kubectl config set-context --current --namespace=ollama
 ```

-3. Get the status of the pods and the services by running the following:
+3. Get the status of nodes, pods, and services by running:

 ```bash
 kubectl get nodes,pods,svc -nollama
 ```

-Your output is similar to the following, showing one node, one pod, and one service:
+Your output should be similar to the following, showing one node, one pod, and one service:

 ```output
 NAME STATUS ROLES AGE VERSION
@@ -127,12 +127,12 @@ NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
 service/ollama-amd64-svc LoadBalancer 1.2.2.3 1.2.3.4 80:30668/TCP 16m
 ```

-When the pods show `Running` and the service shows a valid `External IP`, you are ready to test the Ollama amd64 service!
+When the pods show `Running` and the service shows a valid `External IP`, you're ready to test the Ollama amd64 service.

 ### Test the Ollama web service on amd64

 {{% notice Note %}}
-The following utility `modelUtil.sh` is provided for convenience.
+The following utility `model_util.sh` is provided for convenience.

 It's a wrapper for kubectl, utilizing [curl](https://curl.se/), [jq](https://jqlang.org/), [bc](https://www.gnu.org/software/bc/), and [stdbuf](https://www.gnu.org/software/coreutils/manual/html_node/stdbuf-invocation.html).
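The script itself is not shown in this diff. A hypothetical skeleton of such a wrapper, built on Ollama's standard REST endpoints, might look like this; every detail below is an assumption rather than the actual `model_util.sh`:

```bash
#!/usr/bin/env bash
# Hypothetical skeleton of a model_util.sh-style wrapper -- the real
# script in this Learning Path may differ.
# Usage: ./model_util.sh <arch> <action>
set -euo pipefail

ARCH="$1"    # amd64 | arm64 | multiarch
ACTION="$2"  # hello | pull | infer
NS="ollama"

# Resolve the external IP of the matching LoadBalancer service.
IP=$(kubectl get svc "ollama-${ARCH}-svc" -n "$NS" \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

case "$ACTION" in
  hello)  # GET / returns "Ollama is running"
    curl -s "http://${IP}/"
    ;;
  pull)   # load the llama3.2 model via Ollama's REST API
    curl -s "http://${IP}/api/pull" -d '{"model": "llama3.2"}'
    ;;
  infer)  # run the hardcoded prompt against the loaded model
    curl -s "http://${IP}/api/generate" \
      -d '{"model": "llama3.2", "prompt": "Create a sentence that makes sense in the English language, with as many palindromes in it as possible"}'
    ;;
esac

# Echo the most recent log line from the matching pods, as in the sample output.
kubectl logs -n "$NS" -l arch="$ARCH" --tail=1 --prefix
```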

content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/2-deploy-arm64.md

Lines changed: 10 additions & 11 deletions
@@ -5,20 +5,19 @@ weight: 4
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
+## Adding the arm64-pool node pool

 You have reached the point from which most projects start investigating migration to Arm. You have a workload running on an amd64 cluster and you want to evaluate the benefits of Arm.

 In this section, you will add an Arm-based node pool to the cluster, and apply an Ollama Arm deployment and service to mimic what you did in the previous section.

-### Adding the arm64-pool node pool
-
 To add Arm nodes to the cluster:

-1. From the Clusters menu, select *ollama-on-multiarch*
-2. Select *Add node pool*
-3. For *Name*, enter *arm64-pool*
-4. For *Size*, enter *1*
-5. Check *Specify node locations* and select *us-central1-a*
+1. From the Clusters menu, select **ollama-on-multiarch**.
+2. Select **Add node pool**.
+3. For **Name**, enter `arm64-pool`.
+4. For **Size**, enter `1`.
+5. Check **Specify node locations** and select **us-central1-a**.

 ![YAML Overview](images/arm_node_config-1.png)
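As an aside, the same pool can be created from the CLI with standard `gcloud` flags; the machine type below comes from the steps that follow:

```bash
# Illustrative CLI equivalent of the console steps (a sketch).
gcloud container node-pools create arm64-pool \
  --cluster ollama-on-multiarch \
  --region us-central1 \
  --node-locations us-central1-a \
  --machine-type c4a-standard-4 \
  --num-nodes 1
```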

@@ -34,7 +33,7 @@ To compare amd64 and arm64 performance, the c4a-standard-4 is used as the arm64
 8. Select **Create**.
 9. After provisioning completes, select the newly created **arm64-pool** from the **Clusters** screen to take you to the **Node pool details** page.

-Notice the taint below that GKE applies by default to the Arm node of `NoSchedule` if `arch=arm64`:
+Notice the default `NoSchedule` taint applied by GKE to Arm nodes with `arch=arm64`:

 ![arm node taint](images/taint_on_arm_node.png)
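This taint means pods must carry a matching toleration before the scheduler will place them on the new pool. A minimal sketch of that toleration, assuming the taint key is GKE's default `kubernetes.io/arch`:

```bash
# Confirm the taint from the CLI:
kubectl describe nodes | grep -i taints

# Fragment (not a standalone manifest) of the toleration the arm64
# deployment's pod spec would need to schedule onto the tainted node.
cat <<'EOF' > arm64_toleration_fragment.yaml
tolerations:
  - key: kubernetes.io/arch
    operator: Equal
    value: arm64
    effect: NoSchedule
EOF
```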

@@ -175,9 +174,9 @@ When the pods show `Running` and the service shows a valid `External IP`, you ar

 To test the service, use the `model_util.sh` script from the previous section.

-Instead of the `amd64` parameter, replace it with `arm64`:
+Replace the `amd64` parameter with `arm64`:

-3. Run the following to make an HTTP request to the amd64 ollama service on port 80:
+3. Run the following to make an HTTP request to the arm64 ollama service on port 80:

 ```bash
 ./model_util.sh arm64 hello
@@ -195,6 +194,6 @@ Pod log output:
 [pod/ollama-arm64-deployment-678dc8556f-956d6/ollama-multiarch] 2025-03-25T21:25:21.547384356Z
 ```

-Once again, if you see "Ollama is running" then you have successfully setup your GKE cluster with both amd64 and arm64 nodes and pods running a deployment with the Ollama multi-architecture container.
+If you see the message "Ollama is running," you have successfully set up your GKE cluster with both amd64 and arm64 nodes, each running a deployment using the Ollama multi-architecture container.

 Continue to the next section to analyze the performance.
Lines changed: 28 additions & 32 deletions
@@ -1,24 +1,22 @@
 ---
-title: Testing functionality and performance
+title: Test functionality and performance
 weight: 5

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
+## Use the multiarch service

-Now that you have a hybrid cluster running Ollama, you can investigate the advantages of running on Arm.
-
-### Use the multiarch service to run the application on any platform
+With your hybrid cluster running Ollama, you can now explore the advantages of running on Arm.

 You may wish to access Ollama without regard to architecture.

-To send a request to either based on availability, run the following command:
+To send a request to either service based on availability, run:

 ```bash
 ./model_util.sh multiarch hello
 ```
-
-You see a server response, and the pod that handled the request prefixed with the deployment, node, pod, and timestamp:
+The response indicates which pod handled the request, along with its deployment, node, and timestamp:

 ```commandline
 Server response:
@@ -28,47 +26,47 @@ Pod log output:
 [pod/ollama-amd64-deployment-cbfc4b865-rf4p9/ollama-multiarch] 06:25:48
 ```

-Use the up arrow (command recall) and run the command multiple times in a row.
-
-You see which exact pod was hit, amd64 or arm64, in the pod log output:
+Use command recall (the up arrow) to repeat the command and observe responses from both amd64 and arm64 pods:

 ```output
 [pod/ollama-amd64-... # amd64 pod was hit
 [pod/ollama-arm64-... # arm64 pod was hit
 ```
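One plausible way a multiarch service reaches either pod is to select only on a label both deployments share, omitting the `arch` key entirely. This is a sketch; the service name and `app` label are assumptions:

```bash
# Illustrative sketch of an architecture-agnostic service.
cat <<'EOF' > multiarch_svc_sketch.yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama-multiarch-svc
  namespace: ollama
spec:
  type: LoadBalancer
  selector:
    app: ollama          # shared label only; no arch key, so both pods match
  sessionAffinity: None  # spread requests across amd64 and arm64 pods
  ports:
    - port: 80
      targetPort: 11434
EOF
```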

-You see both architectures responding to a "hello world" ping. Next try to load an LLM and investigate the performance of the Ollama pods.
+With both architectures responding, you can now load an LLM to compare performance.

 ### Load the llama3.2 model into pods

 {{% notice Note %}}
-The llama3.2 model is used in this demonstration. Because [Ollama supports many different models](https://ollama-operator.ayaka.io/pages/en/guide/supported-models) you can modify the `model_util.sh` script to replace llama3.2 with other models.
+The llama3.2 model is used in this demonstration. [Ollama supports many different models](https://ollama-operator.ayaka.io/pages/en/guide/supported-models); you can modify the `model_util.sh` script to test others.
 {{% /notice %}}

-Ollama will host and run models, but you need to first load the model before performing inference.
+Ollama hosts and runs models, but you first need to load the model before performing inference.

-To do this, run the commands below:
+To do this, run:

 ```bash
 ./model_util.sh amd64 pull
 ./model_util.sh arm64 pull
 ```

-If the output ends with ```{"status":"success"}``` for each command, the model was pulled successfully.
+If each command's output ends with `{"status":"success"}`, the models loaded successfully.

 ### Perform inference

-Once the models are loaded into both pods, you can perform inference regardless of node architecture or individually by architecture type (amd64 or arm64).
+Once the models are loaded into both pods, you can perform inference either across both architectures or individually, by architecture type (amd64 or arm64).

-By default, the prompt hardcoded into the `model_util.sh` script is `Create a sentence that makes sense in the English language, with as many palindromes in it as possible`, but you can change it to anything you want to try.
+By default, the prompt hardcoded into the `model_util.sh` script is `Create a sentence that makes sense in the English language, with as many palindromes in it as possible`.
+
+You can modify the prompt as desired.
+
+Test inference on the amd64 pod:

 ```bash
 ./model_util.sh amd64 infer
 ```

-The output is similar to:
+Example output:

 ```output
 ...
@@ -80,15 +78,15 @@ Pod log output:
 [pod/ollama-amd64-deployment-cbfc4b865-k2gc4/ollama-multiarch] 2025-03-27T00:25:21
 ```

-You can see tokens per second rate measured at 13.12 (from the log output example, your actual value may vary a bit).
+You can see the tokens-per-second rate measured at 13.12 in the log output example; your actual value might vary.

 Next, run the same inference on the arm64 node with the following command:

 ```bash
 ./model_util.sh arm64 infer
 ```

-Visually, you see the output streaming out faster on arm64 than on amd64. Look at the output to verify it is indeed faster.
+You will notice the output streams faster on arm64 compared to amd64. Review the tokens-per-second metric (`eval_count` divided by `eval_duration` in the JSON below) to verify the performance difference.

 ```output
 4202,72,426,13],"total_duration":13259950101,"load_duration":1257990283,"prompt_eval_count":32,"prompt_eval_duration":1431000000,"eval_count":153,"eval_duration":10570000000}
@@ -99,23 +97,21 @@ Pod log output:
 [pod/ollama-arm64-deployment-678dc8556f-md222/ollama-multiarch] 2025-03-27T00:26:30
 ```

-The output shows more than a 15% performance increase of arm64 over amd64.
-
-### Notes on Evaluating Price/Performance
+In this example, the output shows more than a 15% performance increase of arm64 over amd64.

-### Price performance notes
+## Evaluating Price and Performance

-We chose GKE amd64-based c4 and arm64-based c4a instances to compare similar virtual machines. Advertised similarly for memory and vCPU performance, pricing for arm64 vs other architectures is generally less expensive. If you're interested in learning more, browse your cloud providers' virtual machine pricing to see price/performance benefits of Arm processors for your workloads.
+This Learning Path compared GKE amd64-based C4 against arm64-based C4A instances, both similarly specified for vCPU and memory. Typically, arm64 instances provide better cost efficiency. Check your cloud provider's pricing to confirm potential cost-performance advantages for your workloads.

-### Summary
+## Summary

 In this Learning Path, you learned how to:

-1. Bring up a GKE cluster with amd64 and arm64 nodes.
-2. Use the same multi-architecture container image for both amd64 and arm64 Ollama deployments.
-3. Compare inference performance on arm64 and amd64.
+1. Create a GKE cluster with amd64 and arm64 nodes.
+2. Deploy a multi-architecture container image for both amd64 and arm64 Ollama deployments.
+3. Compare inference performance between arm64 and amd64.

-You can adopt this methodology on your own workloads to see if Arm provides a price performance advantage.
+You can use these insights to evaluate Arm's potential advantages for your workloads.

-Make sure to shut down the test cluster and delete the resources you used.
+Make sure to shut down the test cluster and delete all resources after use.
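Deleting the cluster removes its node pools, deployments, and services along with it; using the name and region from earlier sections:

```bash
# Tear down everything created in this Learning Path.
gcloud container clusters delete ollama-on-multiarch --region us-central1
```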
