# Deploying Ollama and Open WebUI on OKE

In this tutorial, we explain how to use the well-known Mistral models in a browser through Open WebUI. The LLM is served by Ollama, and the overall infrastructure relies on an Oracle Kubernetes Engine (OKE) cluster with an NVIDIA A10 GPU node pool.

## Prerequisites

To run this tutorial, you will need:
* An OCI tenancy with limits set for A10-based instances

## Deploying the infrastructure

### Creating the OKE cluster with a CPU node pool

The first step is to create a Kubernetes cluster. Initially, the cluster is configured with a CPU node pool only; a GPU node pool will be added afterwards.

The easiest way is to use the Quick Create cluster assistant with the following options:
* Public Endpoint,
* Self-Managed nodes,
* Private workers,
* VM.Standard.E5.Flex compute shapes,
* Oracle Linux OKE-specific image.

### Accessing the cluster

Click `Access Cluster`, choose `Cloud Shell Access` or `Local Access`, and follow the instructions. If you select `Local Access`, you must first install and configure the OCI CLI. Check that the nodes are listed:
```
kubectl get nodes
```

### Adding a GPU node pool

Once the cluster is available, we can add a GPU node pool. Go to `Node pools` in the left panel, click `Add node pool`, and use the following options:
* Public endpoint,
* Self-Managed Nodes,
* VM.GPU.A10.1 nodes,
* Oracle Linux GPU OKE image,
* a custom boot volume size of 250 GB, together with the following initialization script (under `Advanced options`) so that the larger boot volume is actually used:
```
#!/bin/bash
# Download the standard OKE worker node initialization script from the instance metadata service
curl --fail -H "Authorization: Bearer Oracle" -L0 http://169.254.169.254/opc/v2/instance/metadata/oke_init_script | base64 --decode >/var/run/oke-init.sh
# Expand the root filesystem to the full size of the custom boot volume
bash /usr/libexec/oci-growfs -y
# Run the OKE node initialization script to join the cluster
bash /var/run/oke-init.sh
```
Click `Create` to add the GPU instances and wait for the node pool to be `Active` and the nodes to be in the `Ready` state. Check the available nodes again:
```
kubectl get nodes
```
Check device visibility on the GPU node whose name is `xxx.xxx.xxx.xxx`:
```
kubectl describe nodes xxx.xxx.xxx.xxx | grep gpu
```
You will get an output similar to the following:
```
 nvidia.com/gpu=true
Taints: nvidia.com/gpu=present:NoSchedule
 nvidia.com/gpu: 1
 nvidia.com/gpu: 1
 kube-system nvidia-gpu-device-plugin-8ktcj 50m (0%) 50m (0%) 200Mi (0%) 200Mi (0%) 4m48s
 nvidia.com/gpu 0 0
```
Note that the Kubernetes manifests used later in this tutorial are not tied to any specific GPU type.

### Installing the NVIDIA GPU Operator

You can access the cluster either from Cloud Shell or from a standalone instance. The NVIDIA GPU Operator improves the visibility of GPU features in Kubernetes. The easiest way to install it is with `helm`:
```
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator --namespace gpu-operator --create-namespace
```
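Optionally, you can watch the operator pods come up before going further:
```
kubectl get pods -n gpu-operator
```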
Check device visibility on the GPU node again:
```
kubectl describe nodes xxx.xxx.xxx.xxx | grep gpu
```
You will get an output similar to the following:
```
 nvidia.com/gpu=true
 nvidia.com/gpu-driver-upgrade-state=upgrade-done
 nvidia.com/gpu.compute.major=8
 nvidia.com/gpu.compute.minor=6
 nvidia.com/gpu.count=1
 nvidia.com/gpu.deploy.container-toolkit=true
 nvidia.com/gpu.deploy.dcgm=true
 nvidia.com/gpu.deploy.dcgm-exporter=true
 nvidia.com/gpu.deploy.device-plugin=true
 nvidia.com/gpu.deploy.driver=pre-installed
 nvidia.com/gpu.deploy.gpu-feature-discovery=true
 nvidia.com/gpu.deploy.node-status-exporter=true
 nvidia.com/gpu.deploy.operator-validator=true
 nvidia.com/gpu.family=ampere
 nvidia.com/gpu.machine=Standard-PC-i440FX-PIIX-1996
 nvidia.com/gpu.memory=23028
 nvidia.com/gpu.mode=compute
 nvidia.com/gpu.present=true
 nvidia.com/gpu.product=NVIDIA-A10
 nvidia.com/gpu.replicas=1
 nvidia.com/gpu.sharing-strategy=none
 nvidia.com/vgpu.present=false
 nvidia.com/gpu-driver-upgrade-enabled: true
Taints: nvidia.com/gpu=present:NoSchedule
 nvidia.com/gpu: 1
 nvidia.com/gpu: 1
 gpu-operator gpu-feature-discovery-9jmph 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3m1s
 gpu-operator gpu-operator-node-feature-discovery-worker-t6b75 5m (0%) 0 (0%) 64Mi (0%) 512Mi (0%) 3m16s
 gpu-operator nvidia-container-toolkit-daemonset-t5tpc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3m3s
 gpu-operator nvidia-dcgm-exporter-2jvhz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3m2s
 gpu-operator nvidia-device-plugin-daemonset-zbk2b 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3m2s
 gpu-operator nvidia-operator-validator-wpkxt 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3m3s
 kube-system nvidia-gpu-device-plugin-8ktcj 50m (0%) 50m (0%) 200Mi (0%) 200Mi (0%) 12m
 nvidia.com/gpu 0 0
 Normal GPUDriverUpgrade 2m52s nvidia-gpu-operator Successfully updated node state label to upgrade-done
```

## Deploying Ollama

### Creating the Ollama deployment

To deploy Ollama, simply apply the `ollama-deployment.yaml` manifest:
```
kubectl apply -f ollama-deployment.yaml
```
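For reference, here is a minimal sketch of what `ollama-deployment.yaml` might contain; the names, labels, image, and resource limits are illustrative and should be adapted to the actual manifest. The toleration matches the `nvidia.com/gpu=present:NoSchedule` taint reported earlier, and the `nvidia.com/gpu` limit pins the pod to the GPU node.
```
# Illustrative sketch only -- adapt to the actual ollama-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama
          ports:
            - containerPort: 11434      # default Ollama API port
          resources:
            limits:
              nvidia.com/gpu: 1         # schedule the pod on the A10 GPU
      tolerations:
        - key: nvidia.com/gpu           # GPU nodes carry the nvidia.com/gpu=present:NoSchedule taint
          operator: Exists
          effect: NoSchedule
```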
Check that the deployment is ready:
```
kubectl get all
```

### Pulling the model from the pod

Enter the container:
```
kubectl exec -ti ollama-deployment-pod -- /bin/bash
```
where `ollama-deployment-pod` is the name of the pod displayed by the `kubectl get pods` command.
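If you prefer not to copy the pod name manually, and assuming the pods carry the `app: ollama` label used in the sketch above, the name can be resolved in one line:
```
OLLAMA_POD=$(kubectl get pods -l app=ollama -o jsonpath='{.items[0].metadata.name}')
kubectl exec -ti $OLLAMA_POD -- /bin/bash
```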
Check the Ollama installation and pull the desired model(s); here we use Mistral 7B version 0.3 from Mistral AI, referred to simply as `mistral`:
```
ollama --version   # optional
ollama pull mistral
```
The full list of supported models can be found [here](https://ollama.com/search).

Optionally, the model can be tested from the container:
```
ollama run mistral
>>> Tell me about Mistral AI.
 Mistral AI is a cutting-edge company based in Paris, France, developing large language models. Founded by CTO Edouard Dumoulin and CEO Thibault Favodi in 2021, Mistral AI aims to create advanced artificial intelligence technologies that can understand, learn, and generate human-like text with a focus on French and European languages.

Mistral AI is backed by prominent European investors, including Daphni, Founders Future, and Iris Capital, among others, and has received significant financial support from the French government to further its research and development in large language models. The company's ultimate goal is to contribute to France's technological sovereignty and help shape the future of artificial intelligence on the European continent.

One of Mistral AI's most notable projects is "La Mesure," a large-scale French language model that has achieved impressive results in various natural language processing tasks, such as text generation and understanding. The company is dedicated to pushing the boundaries of what AI can do and applying its technology to real-world applications like education, entertainment, and more.

>>> /bye
```
Exit the container by simply typing `exit`.

### Creating the Ollama service

The Ollama service can be created using the `ollama-service.yaml` manifest:
```
kubectl apply -f ollama-service.yaml
```
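For reference, a minimal sketch of `ollama-service.yaml`, assuming the `app: ollama` label used above and a public load balancer exposing the default Ollama port, might look like this:
```
# Illustrative sketch only -- adapt to the actual ollama-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  type: LoadBalancer        # exposes Ollama through an external IP, used later by Open WebUI
  selector:
    app: ollama
  ports:
    - port: 11434           # default Ollama API port
      targetPort: 11434
```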

## Deploying Open WebUI

### Creating the Open WebUI deployment

Open WebUI can be deployed using the `openwebui-deployment.yaml` manifest:
```
kubectl apply -f openwebui-deployment.yaml
```
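For reference, a minimal sketch of `openwebui-deployment.yaml`; the image tag and the `OLLAMA_BASE_URL` value, which points the UI at the Ollama service created above, are assumptions to adapt to your environment:
```
# Illustrative sketch only -- adapt to the actual openwebui-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openwebui-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openwebui
  template:
    metadata:
      labels:
        app: openwebui
    spec:
      containers:
        - name: openwebui
          image: ghcr.io/open-webui/open-webui:main
          ports:
            - containerPort: 8080                 # Open WebUI listens on port 8080
          env:
            - name: OLLAMA_BASE_URL               # where Open WebUI reaches the Ollama API
              value: "http://ollama-service:11434"
```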

### Creating the Open WebUI service

The Open WebUI service can be created using the `openwebui-service.yaml` manifest:
```
kubectl apply -f openwebui-service.yaml
```
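For reference, a minimal sketch of `openwebui-service.yaml`, assuming the `app: openwebui` label above and the external port 81 used in the next section:
```
# Illustrative sketch only -- adapt to the actual openwebui-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: openwebui-service
spec:
  type: LoadBalancer
  selector:
    app: openwebui
  ports:
    - port: 81              # external port used in the browser
      targetPort: 8080      # Open WebUI container port
```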

## Testing the platform

Check that everything is running:
```
kubectl get all
```
Go to `http://xxx.xxx.xxx.xxx:81`, where `xxx.xxx.xxx.xxx` is the external IP address of the Open WebUI load balancer, click `Get started`, and create a local admin account.
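The external IP address can be retrieved directly, assuming the service is named `openwebui-service` as in the sketch above:
```
kubectl get service openwebui-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```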

If no model can be found, go to `Profile > Settings > Admin Settings > Connections > Manage Ollama API Connections`. Verify that the Ollama address matches the external IP address of the Ollama service load balancer, then check the connection by clicking the `Configure` icon and `Verify Connection`.

You can now start chatting with the model.

## External links

* [Mistral AI official website](https://mistral.ai/)
* [Ollama official website](https://ollama.com/)
* [Open WebUI official website](https://openwebui.com/)

## License

Copyright (c) 2025 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.