
Commit d75e81f

Merge pull request #1774 from madeline-underwood/Ollama
Ollama_JA to review
2 parents 9cd41a5 + 79ee095 commit d75e81f

5 files changed: +86 −87 lines changed

content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/0-spin_up_gke_cluster.md

Lines changed: 32 additions & 28 deletions
@@ -1,91 +1,95 @@
 ---
-title: Spin up the GKE Cluster
+title: Create the GKE Cluster
 weight: 2

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---

-## Project overview
+## Project Overview

-Arm CPUs are widely used in Kubernetes AI/ML use cases. In this Learning Path, you learn how to run [Ollama](https://ollama.com/) on Arm-based CPUs in a hybrid architecture (amd64 and arm64) K8s cluster.
+Arm CPUs are widely used in AI/ML workloads on Kubernetes. In this Learning Path, you'll learn how to deploy [Ollama](https://ollama.com/) on Arm-based CPUs within a hybrid architecture (amd64 and arm64) K8s cluster.

-To demonstrate this, you can bring up an initial Kubernetes cluster (depicted as "*1. Initial Cluster (amd64)*" in the image below) with an amd64 node running an Ollama Deployment and Service.
+First, you'll bring up an initial Kubernetes cluster with an amd64 node running an Ollama Deployment and Service (see **1: Initial Cluster (amd64)** in the image below).

-Next, as depicted by "*2. Hybrid Cluster amd64/arm64*", you'll add the arm64 node, and apply an arm64 deployment and service to it, so that you can now test both architectures together, and separately, to investigate performance.
+Next, you'll expand the cluster by adding an arm64 deployment and service to it, forming a hybrid cluster (**2: Hybrid Cluster amd64/arm64**). This allows you to test both architectures together, and separately, to investigate performance.

-When you are satisfied with the arm64 performance over amd64, its easy to delete the amd64-specific node, deployment, and service, to complete the migration, as depicted in "*3. Migrated Cluster (arm64)*".
+Once satisfied with arm64 performance, you can remove the amd64-specific node, deployment, and service, which completes your migration to an arm64-only cluster (**3: Migrated Cluster (arm64)**).

 ![Project Overview](images/general_flow.png)

-Once you've seen how easy it is to add arm64 nodes to an existing cluster, you can apply the knowledge to experiment with arm64 nodes on other workloads in your environment.
+Once you've seen how easy it is to add arm64 nodes to an existing cluster, you will be ready to explore arm64 nodes for other workloads in your environment.

 ### Create the cluster

-1. From within the GCP Console, navigate to [Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list/overview) and click *Create*.
+* In the GCP Console, navigate to [Google Kubernetes Engine](https://console.cloud.google.com/kubernetes/list/overview), then select **Create**.

-2. Select *Standard*->*Configure*
+* Select **Standard: You manage your cluster**, then **Configure**.

 ![Select and Configure Cluster Type](images/select_standard.png)

-The *Cluster basics* tab appears.
+On the **Cluster basics** tab:

-3. For *Name*, enter *ollama-on-multiarch*
-4. For *Region*, enter *us-central1*.
+* For **Name**, enter `ollama-on-multiarch` (see **1**).
+* For **Region**, enter `us-central1` (see **2**).

 ![Select and Configure Cluster Type](images/cluster_basics.png)

 {{% notice Note %}}
-Although this will work in all regions and zones where C4 and C4a instance types are supported, the `us-central1` and `us-central1-1a` regions and zones are used. For simplicity and cost savings, only one node per architecture is used.
+Whilst this procedure works in all regions and zones supporting C4 and C4a instance types, this example uses the `us-central1` region and `us-central1-a` zone. For simplicity and cost savings, only one node per architecture is used.
 {{% /notice %}}

-5. Click on *NODE POOLS*->*default-pool*
-6. For *Name*, enter *amd64-pool*
-7. For size, enter *1*
-8. Select *Specify node locations*, and select *us-central1-a*
+* Under **NODE POOLS**, select **default-pool**.
+* For **Name**, enter `amd64-pool` (see **1** below).
+* For **Size**, enter **1** (see **2** below).
+* Select **Specify node locations** (**3**), and select **us-central1-a** (**4**).

 ![Configure amd64 Node pool](images/x86-node-pool.png)


-8. Click on *NODE POOLS*->*Nodes*
-9. For *Series*, select *C4*
-10. For *Machine Type*, select *c4-standard-8*
+* Select **NODE POOLS**->**Nodes**.
+* For **Series**, select **C4** (see **1** below).
+* For **Machine Type**, select **c4-standard-8** (see **2**).

 ![Configure amd64 node type](images/configure-x86-note-type.png)

-11. *Click* the *Create* button at the bottom of the screen.
+* Click the **Create** button at the bottom of the screen.

-It will take a few moments, but when the green checkmark is showing next to the `ollama-on-multiarch` cluster, you're ready to continue to test your connection to the cluster.
+Wait until a green checkmark appears next to the `ollama-on-multiarch` cluster; you're then ready to test your connection to the cluster.
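If you prefer the CLI, a roughly equivalent cluster-creation command is sketched below. The flags are standard `gcloud`, but treat this as illustrative only: unlike the console steps above, it leaves the initial node pool named `default-pool` rather than `amd64-pool`.

```bash
# Illustrative sketch only -- not part of the Learning Path's console flow.
gcloud container clusters create ollama-on-multiarch \
  --region us-central1 \
  --node-locations us-central1-a \
  --machine-type c4-standard-8 \
  --num-nodes 1
```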

 ### Connect to the cluster

-Before continuing, make sure you have *kubectl* and *gcloud* installed. You can verify by running each command, for example, entering *gcloud* and enter:
+Ensure you have `kubectl` and `gcloud` installed.
+
+You can verify each is installed by running it with no arguments. For example, run:

 ```bash
 gcloud
 ```
-should return
+This should return:
 ```output
 ERROR: (gcloud) Command name argument expected.
 ...
 ```
-and entering *kubectl* and enter should return:
+Then run `kubectl`, which should return:

 ```output
 kubectl controls the Kubernetes cluster manager.

 Find more information at: https://kubernetes.io/docs/reference/kubectl/
 ...
 ```
-If you get something similar to:
+Otherwise, it might return a message like this:

 ```output
 command not found
 ```

-Please follow prerequisite instructions on the first page to install the missing utilities.
+If you see this, follow the prerequisite instructions on the first page to install the missing utilities.
+
+Now you can set up your newly-created K8s cluster credentials using the gcloud utility.

-With prerequisites out of the way, you will next setup your newly created K8s cluster credentials using the gcloud utility. Enter the following in your command prompt (or cloud shell), and make sure to replace "YOUR_PROJECT_ID" with the ID of your GCP project:
+Enter the following in your command prompt (or cloud shell), and make sure to replace `YOUR_PROJECT_ID` with the ID of your GCP project:

 ```bash
 export ZONE=us-central1
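The hunk is truncated after the first `export` line. For reference, a typical completion of this credential setup might look like the following sketch; everything beyond the visible `export ZONE` line is an assumption, not the file's actual contents:

```bash
export ZONE=us-central1
export PROJECT_ID=YOUR_PROJECT_ID   # replace with your GCP project ID

# Fetch cluster credentials and point kubectl's current context at the
# cluster (standard gcloud command; the exact flags used in the original
# file are not visible here).
gcloud container clusters get-credentials ollama-on-multiarch \
  --region "$ZONE" --project "$PROJECT_ID"
```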

content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/1-deploy-amd64.md

Lines changed: 9 additions & 9 deletions
@@ -6,9 +6,9 @@ weight: 3
 layout: learningpathall
 ---

-In this section, you'll bootstrap the cluster with Ollama on amd64, to simulate an "existing" K8s cluster running Ollama. In the next section you will add arm64 nodes alongside the amd64 nodes so you can compare them.
+## Deployment and service

-### Deployment and service
+In this section, you'll bootstrap the cluster with Ollama on amd64, simulating an existing Kubernetes (K8s) cluster running Ollama. In the next section, you'll add arm64 nodes alongside the amd64 nodes for performance comparison.

 1. Use a text editor to copy the following YAML and save it to a file called `namespace.yaml`:

@@ -19,7 +19,7 @@ metadata:
   name: ollama
 ```

-When the above is applied, a new K8s namespace named `ollama` is created. This is where all the K8s objects will live.
+Applying this YAML creates a new namespace called `ollama`, which contains all subsequent K8s objects.
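For reference, the complete `namespace.yaml` is likely just the standard Namespace boilerplate around the fragment visible above; the `apiVersion` and `kind` lines here are assumed:

```bash
# Write the (assumed) full manifest; only the last lines appear in the hunk.
cat <<'EOF' > namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
EOF
```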

 2. Use a text editor to copy the following YAML and save it to a file called `amd64_ollama.yaml`:

@@ -81,7 +81,7 @@ When the above is applied:

 Of particular interest is the `nodeSelector` `kubernetes.io/arch`, with the value of `amd64`. This ensures that the deployment only runs on amd64 nodes, utilizing the amd64 version of the Ollama container image.

-* A new load balancer service `ollama-amd64-svc` is created, which targets all pods with the `arch: amd64` label (the amd64 deployment creates these pods).
+* A new load balancer service `ollama-amd64-svc` is created, targeting all pods with the `arch: amd64` label (the amd64 deployment creates these pods).

 A `sessionAffinity` setting is added to this service so that connections are not sticky: repeated requests are not pinned to the same pod.
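A minimal sketch of how these pieces might fit together in `amd64_ollama.yaml` is shown below. Only `ollama-amd64-svc`, the `arch: amd64` label, the `nodeSelector`, and `sessionAffinity` are confirmed by the text above; the `app` label, container image, and Ollama's port 11434 are assumptions:

```bash
# Illustrative sketch only -- not the Learning Path's exact manifest.
cat <<'EOF' > amd64_ollama_sketch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-amd64-deployment
  namespace: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      arch: amd64
  template:
    metadata:
      labels:
        app: ollama
        arch: amd64
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64   # schedule only onto amd64 nodes
      containers:
        - name: ollama-multiarch
          image: ollama/ollama      # multi-arch image (tag assumed)
          ports:
            - containerPort: 11434  # Ollama's default port (assumed)
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-amd64-svc
  namespace: ollama
spec:
  type: LoadBalancer
  selector:
    arch: amd64             # targets the deployment's pods
  sessionAffinity: None     # no sticky connections to a single pod
  ports:
    - port: 80              # external port, per the svc output below
      targetPort: 11434
EOF
```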

@@ -102,19 +102,19 @@ deployment.apps/ollama-amd64-deployment created
 service/ollama-amd64-svc created
 ```

-2. Optionally, set the `default Namespace` to `ollama` so you don't need to specify the namespace each time, by entering the following:
+2. Optionally, set the default namespace to `ollama` to simplify future commands:

 ```bash
 kubectl config set-context --current --namespace=ollama
 ```

-3. Get the status of the pods and the services by running the following:
+3. Get the status of nodes, pods, and services by running:

 ```bash
 kubectl get nodes,pods,svc -nollama
 ```

-Your output is similar to the following, showing one node, one pod, and one service:
+Your output should be similar to the following, showing one node, one pod, and one service:

 ```output
 NAME STATUS ROLES AGE VERSION
@@ -127,12 +127,12 @@ NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
 service/ollama-amd64-svc LoadBalancer 1.2.2.3 1.2.3.4 80:30668/TCP 16m
 ```

-When the pods show `Running` and the service shows a valid `External IP`, you are ready to test the Ollama amd64 service!
+When the pods show `Running` and the service shows a valid `External IP`, you're ready to test the Ollama amd64 service.

 ### Test the Ollama web service on amd64

 {{% notice Note %}}
-The following utility `modelUtil.sh` is provided for convenience.
+The following utility `model_util.sh` is provided for convenience.

 It's a wrapper for kubectl, utilizing [curl](https://curl.se/), [jq](https://jqlang.org/), [bc](https://www.gnu.org/software/bc/), and [stdbuf](https://www.gnu.org/software/coreutils/manual/html_node/stdbuf-invocation.html).
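The script itself is not shown in this diff. A hypothetical skeleton of such a wrapper, built on Ollama's standard REST endpoints, might look like this; every detail below is an assumption rather than the actual `model_util.sh`:

```bash
#!/usr/bin/env bash
# Hypothetical skeleton of a model_util.sh-style wrapper -- the real
# script in this Learning Path may differ.
# Usage: ./model_util.sh <arch> <action>
set -euo pipefail

ARCH="$1"    # amd64 | arm64 | multiarch
ACTION="$2"  # hello | pull | infer
NS="ollama"

# Resolve the external IP of the matching LoadBalancer service.
IP=$(kubectl get svc "ollama-${ARCH}-svc" -n "$NS" \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

case "$ACTION" in
  hello)  # GET / returns "Ollama is running"
    curl -s "http://${IP}/"
    ;;
  pull)   # load the llama3.2 model via Ollama's REST API
    curl -s "http://${IP}/api/pull" -d '{"model": "llama3.2"}'
    ;;
  infer)  # run the hardcoded prompt against the loaded model
    curl -s "http://${IP}/api/generate" \
      -d '{"model": "llama3.2", "prompt": "Create a sentence that makes sense in the English language, with as many palindromes in it as possible"}'
    ;;
esac

# Echo the most recent log line from the matching pods, as in the sample output.
kubectl logs -n "$NS" -l arch="$ARCH" --tail=1 --prefix
```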

content/learning-paths/servers-and-cloud-computing/multiarch_ollama_on_gke/2-deploy-arm64.md

Lines changed: 10 additions & 11 deletions
@@ -5,20 +5,19 @@ weight: 4
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
+## Adding the arm64-pool node pool

 You have reached the point from which most projects start investigating migration to Arm. You have a workload running on an amd64 cluster and you want to evaluate the benefits of Arm.

 In this section, you will add an Arm-based node pool to the cluster, and apply an Ollama Arm deployment and service to mimic what you did in the previous section.

-### Adding the arm64-pool node pool
-
 To add Arm nodes to the cluster:

-1. From the Clusters menu, select *ollama-on-multiarch*
-2. Select *Add node pool*
-3. For *Name*, enter *arm64-pool*
-4. For *Size*, enter *1*
-5. Check *Specify node locations* and select *us-central1-a*
+1. From the Clusters menu, select **ollama-on-multiarch**.
+2. Select **Add node pool**.
+3. For **Name**, enter `arm64-pool`.
+4. For **Size**, enter `1`.
+5. Check **Specify node locations** and select **us-central1-a**.

 ![YAML Overview](images/arm_node_config-1.png)
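As an aside, the same pool can be created from the CLI with standard `gcloud` flags; the machine type below comes from the steps that follow:

```bash
# Illustrative CLI equivalent of the console steps (a sketch).
gcloud container node-pools create arm64-pool \
  --cluster ollama-on-multiarch \
  --region us-central1 \
  --node-locations us-central1-a \
  --machine-type c4a-standard-4 \
  --num-nodes 1
```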

@@ -34,7 +33,7 @@ To compare amd64 and arm64 performance, the c4a-standard-4 is used as the arm64
 8. Select **Create**.
 9. After provisioning completes, select the newly created **arm64-pool** from the **Clusters** screen to take you to the **Node pool details** page.

-Notice the taint below that GKE applies by default to the Arm node of `NoSchedule` if `arch=arm64`:
+Notice the default `NoSchedule` taint applied by GKE to Arm nodes with `arch=arm64`:

 ![arm node taint](images/taint_on_arm_node.png)
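This taint means pods must carry a matching toleration before the scheduler will place them on the new pool. A minimal sketch of that toleration, assuming the taint key is GKE's default `kubernetes.io/arch`:

```bash
# Confirm the taint from the CLI:
kubectl describe nodes | grep -i taints

# Fragment (not a standalone manifest) of the toleration the arm64
# deployment's pod spec would need to schedule onto the tainted node.
cat <<'EOF' > arm64_toleration_fragment.yaml
tolerations:
  - key: kubernetes.io/arch
    operator: Equal
    value: arm64
    effect: NoSchedule
EOF
```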

@@ -175,9 +174,9 @@ When the pods show `Running` and the service shows a valid `External IP`, you ar

 To test the service, use the `model_util.sh` script from the previous section.

-Instead of the `amd64` parameter, replace it with `arm64`:
+Replace the `amd64` parameter with `arm64`:

-3. Run the following to make an HTTP request to the amd64 ollama service on port 80:
+3. Run the following to make an HTTP request to the arm64 ollama service on port 80:

 ```bash
 ./model_util.sh arm64 hello
@@ -195,6 +194,6 @@ Pod log output:
 [pod/ollama-arm64-deployment-678dc8556f-956d6/ollama-multiarch] 2025-03-25T21:25:21.547384356Z
 ```

-Once again, if you see "Ollama is running" then you have successfully setup your GKE cluster with both amd64 and arm64 nodes and pods running a deployment with the Ollama multi-architecture container.
+If you see the message "Ollama is running," you have successfully set up your GKE cluster with both amd64 and arm64 nodes, each running a deployment using the Ollama multi-architecture container.

 Continue to the next section to analyze the performance.
Lines changed: 28 additions & 32 deletions
@@ -1,24 +1,22 @@
 ---
-title: Testing functionality and performance
+title: Test functionality and performance
 weight: 5

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
+## Use the multiarch service

-Now that you have a hybrid cluster running Ollama, you can investigate the advantages of running on Arm.
-
-### Use the multiarch service to run the application on any platform
+With your hybrid cluster running Ollama, you can now explore the advantages of running on Arm.

 You may wish to access Ollama without regard to architecture.

-To send a request to either based on availability, run the following command:
+To send a request to either service based on availability, run:

 ```bash
 ./model_util.sh multiarch hello
 ```
-
-You see a server response, and the pod that handled the request prefixed with the deployment, node, pod, and timestamp:
+The response indicates which pod handled the request, along with its deployment, node, and timestamp:

 ```commandline
 Server response:
@@ -28,47 +26,47 @@ Pod log output:
 [pod/ollama-amd64-deployment-cbfc4b865-rf4p9/ollama-multiarch] 06:25:48
 ```

-Use the up arrow (command recall) and run the command multiple times in a row.
-
-You see which exact pod was hit, amd64 or arm64, in the pod log output:
+Use command recall (the up arrow) to repeat the command and observe responses from both amd64 and arm64 pods:

 ```output
 [pod/ollama-amd64-... # amd64 pod was hit
 [pod/ollama-arm64-... # arm64 pod was hit
 ```
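One plausible way a multiarch service reaches either pod is to select only on a label both deployments share, omitting the `arch` key entirely. This is a sketch; the service name and `app` label are assumptions:

```bash
# Illustrative sketch of an architecture-agnostic service.
cat <<'EOF' > multiarch_svc_sketch.yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama-multiarch-svc
  namespace: ollama
spec:
  type: LoadBalancer
  selector:
    app: ollama          # shared label only; no arch key, so both pods match
  sessionAffinity: None  # spread requests across amd64 and arm64 pods
  ports:
    - port: 80
      targetPort: 11434
EOF
```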

-You see both architectures responding to a "hello world" ping. Next try to load an LLM and investigate the performance of the Ollama pods.
+With both architectures responding, you can now load an LLM to compare performance.

 ### Load the llama3.2 model into pods

 {{% notice Note %}}
-The llama3.2 model is used in this demonstration. Because [Ollama supports many different models](https://ollama-operator.ayaka.io/pages/en/guide/supported-models) you can modify the `model_util.sh` script to replace llama3.2 with other models.
+The llama3.2 model is used in this demonstration. [Ollama supports many different models](https://ollama-operator.ayaka.io/pages/en/guide/supported-models); you can modify the `model_util.sh` script to test others.
 {{% /notice %}}

-Ollama will host and run models, but you need to first load the model before performing inference.
+Ollama hosts and runs models, but you first need to load the model before performing inference.

-To do this, run the commands below:
+To do this, run:

 ```bash
 ./model_util.sh amd64 pull
 ./model_util.sh arm64 pull
 ```

-If the output ends with ```{"status":"success"}``` for each command, the model was pulled successfully.
+If each command's output ends with `{"status":"success"}`, the models loaded successfully.

 ### Perform inference

-Once the models are loaded into both pods, you can perform inference regardless of node architecture or individually by architecture type (amd64 or arm64).
+Once the models are loaded into both pods, you can perform inference either across both architectures or individually, by architecture type (amd64 or arm64).

-By default, the prompt hardcoded into the `model_util.sh` script is `Create a sentence that makes sense in the English language, with as many palindromes in it as possible`, but you can change it to anything you want to try.
+By default, the prompt hardcoded into the `model_util.sh` script is `Create a sentence that makes sense in the English language, with as many palindromes in it as possible`.
+
+You can modify the prompt as desired.
+
+Test inference on the amd64 pod:

 ```bash
 ./model_util.sh amd64 infer
 ```

-The output is similar to:
+Example output:

 ```output
 ...
@@ -80,15 +78,15 @@ Pod log output:
 [pod/ollama-amd64-deployment-cbfc4b865-k2gc4/ollama-multiarch] 2025-03-27T00:25:21
 ```

-You can see tokens per second rate measured at 13.12 (from the log output example, your actual value may vary a bit).
+You can see the tokens-per-second rate measured at 13.12 in the log output example; your actual value might vary.

 Next, run the same inference on the arm64 node with the following command:

 ```bash
 ./model_util.sh arm64 infer
 ```

-Visually, you see the output streaming out faster on arm64 than on amd64. Look at the output to verify it is indeed faster.
+You will notice the output streams faster on arm64 compared to amd64. Review the tokens-per-second metric (`eval_count` divided by `eval_duration` in the JSON below) to verify the performance difference.

 ```output
 4202,72,426,13],"total_duration":13259950101,"load_duration":1257990283,"prompt_eval_count":32,"prompt_eval_duration":1431000000,"eval_count":153,"eval_duration":10570000000}
@@ -99,23 +97,21 @@ Pod log output:
 [pod/ollama-arm64-deployment-678dc8556f-md222/ollama-multiarch] 2025-03-27T00:26:30
 ```

-The output shows more than a 15% performance increase of arm64 over amd64.
-
-### Notes on Evaluating Price/Performance
+In this example, the output shows more than a 15% performance increase of arm64 over amd64.

-### Price performance notes
+## Evaluating Price and Performance

-We chose GKE amd64-based c4 and arm64-based c4a instances to compare similar virtual machines. Advertised similarly for memory and vCPU performance, pricing for arm64 vs other architectures is generally less expensive. If you're interested in learning more, browse your cloud providers' virtual machine pricing to see price/performance benefits of Arm processors for your workloads.
+This Learning Path compared GKE amd64-based C4 against arm64-based C4A instances, both similarly specified for vCPU and memory. Typically, arm64 instances provide better cost efficiency. Check your cloud provider's pricing to confirm potential cost-performance advantages for your workloads.

-### Summary
+## Summary

 In this Learning Path, you learned how to:

-1. Bring up a GKE cluster with amd64 and arm64 nodes.
-2. Use the same multi-architecture container image for both amd64 and arm64 Ollama deployments.
-3. Compare inference performance on arm64 and amd64.
+1. Create a GKE cluster with amd64 and arm64 nodes.
+2. Deploy a multi-architecture container image for both amd64 and arm64 Ollama deployments.
+3. Compare inference performance between arm64 and amd64.

-You can adopt this methodology on your own workloads to see if Arm provides a price performance advantage.
+You can use these insights to evaluate Arm's potential advantages for your workloads.

-Make sure to shut down the test cluster and delete the resources you used.
+Make sure to shut down the test cluster and delete all resources after use.
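Deleting the cluster removes its node pools, deployments, and services along with it; using the name and region from earlier sections:

```bash
# Tear down everything created in this Learning Path.
gcloud container clusters delete ollama-on-multiarch --region us-central1
```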
