content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/0-spin_up_gke_cluster.md (32 additions, 28 deletions)
@@ -1,91 +1,95 @@
  ---
- title: Spin up the GKE Cluster
+ title: Create the GKE Cluster
  weight: 2

  ### FIXED, DO NOT MODIFY
  layout: learningpathall
  ---

- ## Project overview
+ ## Project Overview

- Arm CPUs are widely used in Kubernetes AI/ML use cases. In this Learning Path, you learn how to run [Ollama](https://ollama.com/) on Arm-based CPUs in a hybrid architecture (amd64 and arm64) K8s cluster.
+ Arm CPUs are widely used in AI/ML workloads on Kubernetes. In this Learning Path, you'll learn how to deploy [Ollama](https://ollama.com/) on Arm-based CPUs within a hybrid architecture (amd64 and arm64) K8s cluster.

- To demonstrate this, you can bring up an initial Kubernetes cluster (depicted as "*1. Initial Cluster (amd64)*" in the image below) with an amd64 node running an Ollama Deployment and Service.
+ First, you'll bring up an initial Kubernetes cluster with an amd64 node running an Ollama Deployment and Service (see **1: Initial Cluster (amd64)** in the image below).

- Next, as depicted by "*2. Hybrid Cluster amd64/arm64*", you'll add the arm64 node, and apply an arm64 deployment and service to it, so that you can now test both architectures together, and separately, to investigate performance.
+ Next, you'll expand the cluster by adding an arm64 deployment and service to it, forming a hybrid cluster (**2: Hybrid Cluster amd64/arm64**). This allows you to test both architectures together, and separately, to investigate performance.

- When you are satisfied with the arm64 performance over amd64, its easy to delete the amd64-specific node, deployment, and service, to complete the migration, as depicted in "*3. Migrated Cluster (arm64)*".
+ Once satisfied with arm64 performance, you can remove the amd64-specific node, deployment, and service, which then completes your migration to an arm64-only cluster (**3: Migrated Cluster (arm64)**).

  

- Once you've seen how easy it is to add arm64 nodes to an existing cluster, you can apply the knowledge to experiment with arm64 nodes on other workloads in your environment.
+ Once you've seen how easy it is to add arm64 nodes to an existing cluster, you will be ready to explore arm64 nodes for other workloads in your environment.

  ### Create the cluster

- 1. From within the GCP Console, navigate to [Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list/overview) and click *Create*.
+ * In the GCP Console, navigate to [Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list/overview), then select **Create**.

- 2. Select *Standard*->*Configure*
+ * Select **Standard: You manage your cluster**, then **Configure**.

  

- The *Cluster basics* tab appears.
+ On the **Cluster basics** tab:

- 3. For *Name*, enter *ollama-on-multiarch*
- 4. For *Region*, enter *us-central1*.
+ * For **Name**, enter `ollama-on-arm` (see **1**).
+ * For **Region**, enter `us-central1` (see **2**).

  

  {{% notice Note %}}
- Although this will work in all regions and zones where C4 and C4a instance types are supported, the `us-central1` and `us-central1-1a` regions and zones are used. For simplicity and cost savings, only one node per architecture is used.
+ While this procedure works in all regions and zones supporting C4 and C4a instance types, this example uses the `us-central1` region and the `us-central1-a` zone. For simplicity and cost savings, only one node per architecture is used.
  {{% /notice %}}

- 5. Click on *NODE POOLS*->*default-pool*
- 6. For *Name*, enter *amd64-pool*
- 7. For size, enter *1*
- 8. Select *Specify node locations*, and select *us-central1-a*
+ * Under **NODE POOLS**, select **default-pool**.
+ * For **Name**, enter `amd64-pool` (see **1** below).
+ * For **Size**, enter **1** (see **2** below).
+ * Select **Specify node locations** (**3**), and select **us-central1-a** (**4**).

- 11. *Click* the *Create* button at the bottom of the screen.
+ * Click the **Create** button at the bottom of the screen.

- It will take a few moments, but when the green checkmark is showing next to the `ollama-on-multiarch` cluster, you're ready to continue to test your connection to the cluster.
+ Wait until a green checkmark appears next to the `ollama-on-multiarch` cluster; you're then ready to test your connection to the cluster.
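For readers who prefer the CLI, the console flow above maps onto a single `gcloud` command. This is an illustrative sketch, not part of the Learning Path itself; the machine type is an assumption, since the text only names the C4 family for the amd64 pool:

```bash
# Hypothetical CLI equivalent of the console steps above.
# c4-standard-4 is an assumed machine type (the text only names the C4 family).
gcloud container clusters create ollama-on-multiarch \
  --region us-central1 \
  --node-locations us-central1-a \
  --machine-type c4-standard-4 \
  --num-nodes 1
```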
  ### Connect to the cluster

- Before continuing, make sure you have *kubectl* and *gcloud* installed. You can verify by running each command, for example, entering *gcloud* and enter:
+ Ensure you have `kubectl` and `gcloud` installed.
+
+ You can verify by running each command. For example, run `gcloud`:

  ```bash
  gcloud
  ```

- should return
+ This should return:

  ```output
  ERROR: (gcloud) Command name argument expected.
  ...
  ```

- and entering *kubectl* and enter should return:
+ Then run `kubectl`, which should return:

  ```output
  kubectl controls the Kubernetes cluster manager.

  Find more information at: https://kubernetes.io/docs/reference/kubectl/
  ...
  ```

- If you get something similar to:
+ Otherwise, it might return a message like this:

  ```output
  command not found
  ```

- Please follow prerequisite instructions on the first page to install the missing utilities.
+ If you see this, follow the prerequisite instructions on the first page to install the missing utilities.
+
+ Now you can set up your newly created K8s cluster credentials using the gcloud utility.

- With prerequisites out of the way, you will next setup your newly created K8s cluster credentials using the gcloud utility. Enter the following in your command prompt (or cloud shell), and make sure to replace "YOUR_PROJECT_ID" with the ID of your GCP project:
+ Enter the following in your command prompt (or cloud shell), and make sure to replace `YOUR_PROJECT_ID` with the ID of your GCP project:
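The command itself falls outside this diff hunk; it is presumably the standard credentials fetch, along these lines:

```bash
# Fetch kubeconfig credentials for the new cluster (assumed command;
# replace YOUR_PROJECT_ID with the ID of your GCP project).
gcloud container clusters get-credentials ollama-on-multiarch \
  --region us-central1 \
  --project YOUR_PROJECT_ID
```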
content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/1-deploy-amd64.md (9 additions, 9 deletions)
@@ -6,9 +6,9 @@ weight: 3
  layout: learningpathall
  ---

- In this section, you'll bootstrap the cluster with Ollama on amd64, to simulate an "existing" K8s cluster running Ollama. In the next section you will add arm64 nodes alongside the amd64 nodes so you can compare them.
+ ## Deployment and service

- ### Deployment and service
+ In this section, you'll bootstrap the cluster with Ollama on amd64, simulating an existing Kubernetes (K8s) cluster running Ollama. In the next section, you'll add arm64 nodes alongside the amd64 nodes for performance comparison.

  1. Use a text editor to copy the following YAML and save it to a file called `namespace.yaml`:

@@ -19,7 +19,7 @@ metadata:
    name: ollama
  ```

- When the above is applied, a new K8s namespace named `ollama` is created. This is where all the K8s objects will live.
+ Applying this YAML creates a new namespace called `ollama`, which contains all subsequent K8s objects.

  2. Use a text editor to copy the following YAML and save it to a file called `amd64_ollama.yaml`:
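Only the tail of `namespace.yaml` is visible in the hunk above; the complete manifest is presumably the minimal Namespace object below, shown here as a sketch together with the apply step:

```bash
# Recreate and apply the namespace manifest (reconstructed; only its
# last lines appear in the diff context above).
cat <<'EOF' > namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
EOF
kubectl apply -f namespace.yaml
```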
@@ -81,7 +81,7 @@ When the above is applied:

  Of particular interest is the `nodeSelector` `kubernetes.io/arch`, with the value of `amd64`. This ensures that the deployment only runs on amd64 nodes, using the amd64 version of the Ollama container image.

- * A new load balancer service `ollama-amd64-svc` is created, which targets all pods with the `arch: amd64` label (the amd64 deployment creates these pods).
+ * A new load balancer service `ollama-amd64-svc` is created, targeting all pods with the `arch: amd64` label (the amd64 deployment creates these pods).

  A `sessionAffinity` tag is added to this service to remove sticky connections to the target pods. This removes persistent connections to the same pod on each request.
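After applying the manifest, a quick way to confirm both settings is to query them directly. A sketch, assuming the deployment and service names used in this section:

```bash
# Show the nodeSelector that pins pods to amd64 nodes.
kubectl -n ollama get deployment ollama-amd64-deployment \
  -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}'

# Show the service's session affinity (None means no sticky sessions).
kubectl -n ollama get svc ollama-amd64-svc \
  -o jsonpath='{.spec.sessionAffinity}{"\n"}'
```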
@@ -102,19 +102,19 @@ deployment.apps/ollama-amd64-deployment created
  service/ollama-amd64-svc created
  ```

- 2. Optionally, set the default namespace to `ollama` so you don't need to specify the namespace each time, by entering the following:
+ 2. Optionally, set the default namespace to `ollama` to simplify future commands:

  ```bash
  kubectl config set-context --current --namespace=ollama
  ```

- 3. Get the status of the pods and the services by running the following:
+ 3. Get the status of nodes, pods, and services by running:

  ```bash
  kubectl get nodes,pods,svc -n ollama
  ```

- Your output is similar to the following, showing one node, one pod, and one service:
+ Your output should be similar to the following, showing one node, one pod, and one service:

  ```output
  NAME STATUS ROLES AGE VERSION

@@ -127,12 +127,12 @@ NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)

- When the pods show `Running` and the service shows a valid `External IP`, you are ready to test the Ollama amd64 service!
+ When the pods show `Running` and the service shows a valid `External IP`, you're ready to test the Ollama amd64 service.

  ### Test the Ollama web service on amd64

  {{% notice Note %}}
- The following utility `modelUtil.sh` is provided for convenience.
+ The following utility `model_util.sh` is provided for convenience.

  It's a wrapper for kubectl, utilizing [curl](https://curl.se/), [jq](https://jqlang.org/), [bc](https://www.gnu.org/software/bc/), and [stdbuf](https://www.gnu.org/software/coreutils/manual/html_node/stdbuf-invocation.html).
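The script itself is not shown in this diff, but its basic reachability check can be approximated with plain `kubectl` and `curl`. A sketch, assuming the `ollama-amd64-svc` load balancer created earlier:

```bash
# Look up the service's external IP, then hit Ollama's root endpoint.
IP=$(kubectl -n ollama get svc ollama-amd64-svc \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl -s "http://${IP}:80/"
# Expected response: "Ollama is running"
```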
content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/2-deploy-arm64.md (10 additions, 11 deletions)
@@ -5,20 +5,19 @@ weight: 4
  ### FIXED, DO NOT MODIFY
  layout: learningpathall
  ---
+ ## Adding the arm64-pool node pool

  You have reached the point from which most projects start investigating migration to Arm. You have a workload running on an amd64 cluster and you want to evaluate the benefits of Arm.

  In this section, you will add an Arm-based node pool to the cluster, and apply an Ollama Arm deployment and service to mimic what you did in the previous section.

- ### Adding the arm64-pool node pool
-
  To add Arm nodes to the cluster:

- 1. From the Clusters menu, select *ollama-on-multiarch*
- 2. Select *Add node pool*
- 3. For *Name*, enter *arm64-pool*
- 4. For *Size*, enter *1*
- 5. Check *Specify node locations* and select *us-central1-a*
+ 1. From the Clusters menu, select **ollama-on-multiarch**.
+ 2. Select **Add node pool**.
+ 3. For **Name**, enter `arm64-pool`.
+ 4. For **Size**, enter `1`.
+ 5. Check **Specify node locations** and select **us-central1-a**.

  

@@ -34,7 +33,7 @@ To compare amd64 and arm64 performance, the c4a-standard-4 is used as the arm64
  8. Select *Create*
  9. After provisioning completes, select the newly created *arm64-pool* from the *Clusters* screen to take you to the *Node pool details* page.
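As with cluster creation, these console steps map onto a CLI sketch. The `c4a-standard-4` machine type comes from the hunk context above; the remaining flags are assumptions:

```bash
# Hypothetical CLI equivalent of adding the arm64 node pool.
gcloud container node-pools create arm64-pool \
  --cluster ollama-on-multiarch \
  --region us-central1 \
  --node-locations us-central1-a \
  --machine-type c4a-standard-4 \
  --num-nodes 1
```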
- Notice the taint below that GKE applies by default to the Arm node of `NoSchedule` if `arch=arm64`:
+ Notice the default `NoSchedule` taint applied by GKE to Arm nodes with `arch=arm64`:

  
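You can also read the taint from the command line. A sketch; note that an arm64 deployment needs a matching toleration for its pods to schedule onto these nodes:

```bash
# List each arm64 node along with the taints GKE applied to it.
kubectl get nodes -l kubernetes.io/arch=arm64 \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
```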
@@ -175,9 +174,9 @@ When the pods show `Running` and the service shows a valid `External IP`, you are
  To test the service, use the `model_util.sh` script created in the previous section.

- Instead of the `amd64` parameter, replace it with `arm64`:
+ Replace the `amd64` parameter with `arm64`:

- 3. Run the following to make an HTTP request to the amd64 ollama service on port 80:
+ 3. Run the following to make an HTTP request to the arm64 ollama service on port 80:

- Once again, if you see "Ollama is running" then you have successfully setup your GKE cluster with both amd64 and arm64 nodes and pods running a deployment with the Ollama multi-architecture container.
+ If you see the message "Ollama is running," you have successfully set up your GKE cluster with both amd64 and arm64 nodes, each running a deployment using the Ollama multi-architecture container.

  Continue to the next section to analyze the performance.
- Use the up arrow (command recall) and run the command multiple times in a row.
-
- You see which exact pod was hit, amd64 or arm64, in the pod log output:
+ Use command recall (up arrow) to repeat the command, and observe responses from both the amd64 and arm64 pods:

  ```output
  [pod/ollama-amd64-... # amd64 pod was hit
  [pod/ollama-arm64-... # arm64 pod was hit
  ```
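One way to produce interleaved output like the above is to stream logs from all Ollama pods with a shared label selector; `--prefix` prepends the `[pod/...]` marker seen here. This is a sketch: the `app: ollama` label is an assumption, so adjust it to whatever common label your deployments actually set:

```bash
# Stream logs from all Ollama pods; --prefix shows which pod served each hit.
kubectl -n ollama logs -f --prefix -l app=ollama
```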
- You see both architectures responding to a "hello world" ping. Next try to load an LLM and investigate the performance of the Ollama pods.
+ With both architectures responding, you can now load an LLM to compare performance.
  ### Load the llama3.2 model into pods

  {{% notice Note %}}
- The llama3.2 model is used in this demonstration. Because [Ollama supports many different models](https://ollama-operator.ayaka.io/pages/en/guide/supported-models) you can modify the `model_util.sh` script to replace llama3.2 with other models.
+ The llama3.2 model is used in this demonstration. [Ollama supports many different models](https://ollama-operator.ayaka.io/pages/en/guide/supported-models); you can modify the `model_util.sh` script to test others.
  {{% /notice %}}

- Ollama will host and run models, but you need to first load the model before performing inference.
+ Ollama hosts and runs models, but you first need to load the model before performing inference.

- To do this, run the commands below:
+ To do this, run:

  ```bash
  ./model_util.sh amd64 pull
  ./model_util.sh arm64 pull
  ```

- If the output ends with ```{"status":"success"}``` for each command, the model was pulled successfully.
+ If the output of each command ends with `{"status":"success"}`, the models loaded successfully.
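The `pull` verb presumably wraps Ollama's model-pull REST endpoint; a hand-rolled equivalent for the amd64 service looks roughly like this (a sketch; the script itself is not in this diff):

```bash
# Pull llama3.2 through the Ollama REST API directly.
IP=$(kubectl -n ollama get svc ollama-amd64-svc \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl -s "http://${IP}/api/pull" -d '{"model": "llama3.2"}'
# The streamed status lines should end with {"status":"success"}.
```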
  ### Perform inference

- Once the models are loaded into both pods, you can perform inference regardless of node architecture or individually by architecture type (amd64 or arm64).
+ Once the models are loaded into both pods, you can perform inference across both architectures at once, or individually by architecture type (amd64 or arm64).

- By default, the prompt hardcoded into the `model_util.sh` script is `Create a sentence that makes sense in the English language, with as many palindromes in it as possible`, but you can change it to anything you want to try.
+ By default, the prompt hardcoded into the `model_util.sh` script is `Create a sentence that makes sense in the English language, with as many palindromes in it as possible`.

- To see the inference performance on the amd64 pod:
- The output shows more than a 15% performance increase of arm64 over amd64.
-
- ### Notes on Evaluating Price/Performance
+ In this example, the output shows more than a 15% performance increase of arm64 over amd64.
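For context on where such numbers can come from: Ollama's non-streaming generate response includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds), from which an eval rate can be computed with the same `jq` and `bc` tools the script wraps. A sketch, assuming an `ollama-arm64-svc` load balancer parallel to the amd64 one:

```bash
# Compute tokens/second from Ollama's non-streaming generate response.
IP=$(kubectl -n ollama get svc ollama-arm64-svc \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
RESP=$(curl -s "http://${IP}/api/generate" \
  -d '{"model":"llama3.2","prompt":"Why is the sky blue?","stream":false}')
TOKENS=$(echo "$RESP" | jq .eval_count)      # tokens generated
NANOS=$(echo "$RESP" | jq .eval_duration)    # generation time in ns
echo "scale=2; $TOKENS * 1000000000 / $NANOS" | bc
```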
- ### Price performance notes
+ ## Evaluating Price and Performance

- We chose GKE amd64-based c4 and arm64-based c4a instances to compare similar virtual machines. Advertised similarly for memory and vCPU performance, pricing for arm64 vs other architectures is generally less expensive. If you're interested in learning more, browse your cloud providers' virtual machine pricing to see price/performance benefits of Arm processors for your workloads.
+ This Learning Path compared GKE amd64-based c4 instances against arm64-based c4a instances, both similarly specified for vCPU and memory. Typically, arm64 instances provide better cost efficiency. Check your cloud provider's pricing to confirm potential cost-performance advantages for your workloads.

- ### Summary
+ ## Summary

  In this Learning Path, you learned how to:

- 1. Bring up a GKE cluster with amd64 and arm64 nodes.
- 2. Use the same multi-architecture container image for both amd64 and arm64 Ollama deployments.
- 3. Compare inference performance on arm64 and amd64.
+ 1. Create a GKE cluster with amd64 and arm64 nodes.
+ 2. Deploy a multi-architecture container image for both amd64 and arm64 Ollama deployments.
+ 3. Compare inference performance between arm64 and amd64.

- You can adopt this methodology on your own workloads to see if Arm provides a price performance advantage.
+ You can use these insights to evaluate Arm's potential advantages for your workloads.

- Make sure to shut down the test cluster and delete the resources you used.
+ Make sure to shut down the test cluster and delete all resources after use.