*Last Update: 23 November 2024*

<br><h1 align="center">Local LLM Inferencing and Interaction<br>Using the Ollama Open Source Tool</h1>
<p align="center"><img align="center" src="./images/ollama-logo.png" width="10%" style="float:right"/></p>



<a id="toc"></a>
## Table of Contents
1. [Introduction](#intro)
2. [Preparation](#prep)
3. [Security](#security)
4. [Ollama Installation](#install)
5. [Testing](#test)
6. [Useful References](#ref)

<a id="intro"></a>
## Introduction

[Ollama](https://github.com/ollama/ollama) is an open-source tool that runs large language models (LLMs) directly on a local machine. This makes it particularly appealing to AI developers, researchers, and businesses concerned with data control and privacy. It enables the loading and deployment of selected LLMs and provides access to them through APIs and freely available chatbots. Ollama also includes a text-based, server-side chatbot.

By running models locally, you maintain full data ownership and avoid the potential security risks associated with cloud storage. Offline AI tools like Ollama also help reduce latency and reliance on external services, making them faster and more reliable.

This article demonstrates how to install and configure an Ollama LLM processing facility. Although Ollama can run on personal servers and laptops, this installation targets the Oracle Compute Cloud@Customer (C3) and Private Cloud Appliance (PCA) to take advantage of more readily available resources, which increases performance and processing efficiency, especially when large models are used.

Considerations:
* A firm grasp of C3/PCA/OCI concepts and administration is assumed.
* The creation and integration of a development environment is outside the scope of this document.
* Oracle Linux 8 and macOS Sonoma 14.7.1 clients were used for testing; Windows is also widely supported.

[Back to top](#toc)<br>
<br>

<a id="prep"></a>
## Preparation

### System Requirements

| Requirement | Specification |
|----------|----------|
| Operating system | Oracle Linux 8 or later<br>Ubuntu 22.04 or later<br>Windows |
| RAM | 16 GB for running models up to 7B. A rule of thumb is to provision at least twice the memory of the model size, and to allow for any models that will be loaded into memory simultaneously. |
| Disk space | 12 GB for installing Ollama and the basic models. Additional space is required for storing model data, depending on the models used. Model sizes can be obtained from the "Pre-trained Ollama models" link in the References section. For example, the Llama 3.1 model with 405B parameters occupies 229 GB of disk space. |
| Processor | A modern CPU with at least 4 cores is recommended. For running models of approximately 15B parameters, 8 cores (OCPUs) are recommended. Allocate accordingly. |
| Graphics Processing Unit<br>(optional) | A GPU is not required for running Ollama, but it can improve performance, especially when working with large models. If you have a GPU, you can use it to accelerate the training of custom models. |

>[!NOTE]
>The GPU options in the Compute Cloud@Customer will be available soon.
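
Once the VM described in the next section is running, the allocated resources can be checked against the guidance above from the guest OS; a minimal sketch using standard Linux tools:

```
# Compare against the sizing table: CPU cores, memory, and available disk space
nproc
free -h
df -h
```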
|
### Create a Virtual Machine Instance

[C3: Creating an Instance](https://docs.oracle.com/en-us/iaas/compute-cloud-at-customer/topics/compute/creating-an-instance.htm#creating-an-instance)<br>
[PCA 3.0: Working with Instances](https://docs.oracle.com/en/engineered-systems/private-cloud-appliance/3.0-latest/user/user-usr-instance-lifecycle.html)

Create a VM in a public subnet following these guidelines:

1. Hostname `llm-host`
2. Select the Oracle Linux 8 image
3. Start with 6 OCPUs and 96 GB of RAM using an available "Flex" shape (resources can be adjusted later depending on workload)
4. Select the default boot volume size
5. Select a public subnet and allow a public IP address to be assigned
6. Configure the public key information
7. Select "Restore instance lifecycle state after infrastructure maintenance"
8. Apply appropriate tagging if required
9. Ensure that the VM is accessible via `ssh`
10. Configure the proxy setup if required (described below)
11. Update your local host's `/etc/hosts` file to map the public IP address to `llm-host` (an example entry is sketched below)
12. Should you be on a proxied network, follow the instructions in the "Proxy Settings" section below before performing the next step
13. Perform an OS update on `llm-host` before proceeding:

```
sudo dnf update
```
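
Referring to steps 9 and 11 above, a minimal sketch of the client-side checks, assuming a placeholder public IP of `203.0.113.10` and the default `opc` user:

```
# On the local client: map the instance's public IP (placeholder) to llm-host
echo "203.0.113.10  llm-host" | sudo tee -a /etc/hosts

# Verify SSH access to the instance
ssh opc@llm-host hostname
```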

### Create a Block Storage Device for LLMs

[C3: Creating and Attaching Block Volumes](https://docs.oracle.com/en-us/iaas/compute-cloud-at-customer/topics/block/creating-and-attaching-block-volumes.htm)<br>
[PCA 3.0: Creating and Attaching Block Volumes](https://docs.oracle.com/en/engineered-systems/private-cloud-appliance/3.0-latest/user/user-usr-blk-volume-create-attach.html)

1. Create and attach a block volume to the VM
2. Volume name `llm-repo`
3. A block volume of at least 150 GB is recommended (research the model sizes!). Should multiple standard LLMs be hosted, or if advanced workloads are foreseen (e.g. copies of LLMs for development, collection and loading of RAG material), the recommendation is 1 TB.
4. Select High Performance if available
5. Select your appropriate backup policy
6. Apply appropriate tagging if required
7. It is recommended to format the block volume with the `xfs` filesystem (a formatting and mounting sketch follows below)
8. Configure a persistent mount point (to survive reboots). The entry in the `/etc/fstab` file will typically resemble the following:
`/dev/disk/by-id/scsi-3600144f096933b92000061b1129e0037 /mnt/llm-repo xfs _netdev,nofail 0 0`
9. To grant initial unrestricted access to the mounted filesystem, run the following command on the mount point:
```
sudo chmod 777 /mnt/llm-repo
```
>[!IMPORTANT]
>Note the mount options in the `/etc/fstab` file
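
A minimal sketch of the formatting and mounting steps above, assuming the volume appears under the device ID shown in the example `/etc/fstab` entry (substitute your own, as reported by `lsblk` or `ls /dev/disk/by-id`):

```
# Format the attached block volume with xfs (destructive: verify the device first)
sudo mkfs.xfs /dev/disk/by-id/scsi-3600144f096933b92000061b1129e0037

# Create the mount point, add the /etc/fstab entry shown above, then mount and verify
sudo mkdir -p /mnt/llm-repo
sudo mount /mnt/llm-repo
df -h /mnt/llm-repo
```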
|
### Proxy Settings

In the event of a proxied network, add the following to the `/etc/profile.d/proxy.sh` file to set the proxy environment variables system-wide:

```
http_proxy=http://<proxy_server>:80
https_proxy=http://<proxy_server>:80
no_proxy="127.0.0.1,localhost"
export http_proxy
export https_proxy
export no_proxy
```

>[!TIP]
>The `no_proxy` environment variable can be expanded to include your internal domains. It is not required to list the IP addresses of internal C3/PCA subnets.

Edit the `/etc/yum.conf` file to include the following line:
```
proxy=http://<proxy_server>:80
```
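
To confirm that the proxy settings are picked up before running the OS update, a quick check (a minimal sketch):

```
# Load the proxy variables into the current shell and test outbound HTTPS via the proxy
source /etc/profile.d/proxy.sh
curl -sI https://ollama.com | head -1
```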

[Back to top](#toc)<br>
<br>

<a id="security"></a>
## Security

### General

Personal computers often impose resource constraints, and more compute, memory, and disk resources are needed to run LLM operations efficiently. Hence Ollama is increasingly deployed on cloud or corporate hardware in a two-tier client-server architecture. In this deployment architecture, Ollama's security model exposes the installation and its data to a number of vulnerabilities.

Several API endpoint vulnerabilities have been identified in the client-server deployment model of Ollama, and some have already been addressed by security patches. Collectively, the vulnerabilities could allow an attacker to carry out a wide range of malicious actions with a single HTTP request, including denial-of-service (DoS) attacks, model poisoning, model theft, and more.

*A future article will describe a secure client-server deployment architecture (using reverse proxying and TLS) that is suitable for corporate use and also ensures data usage privacy.*

>[!NOTE]
>Refer to the article [Why You Should Trust Meta AI's Ollama for Data Security](https://myscale.com/blog/trust-meta-ai-ollama-data-security) for further information on the benefits of running LLMs locally.
|
### Open the Firewall for the Ollama Listening Port

```
sudo firewall-cmd --set-default-zone=public
```
```
sudo firewall-cmd --add-port=11434/tcp --add-service=http --zone=public
```
```
sudo firewall-cmd --runtime-to-permanent
```
```
sudo firewall-cmd --reload
```
```
sudo firewall-cmd --info-zone=public
```

<p><img src="./images/firewall-info.png" title="Firewall service listing" width="75%" style="float:right"/></p>

### Grant VCN Access through Security List

Edit the VCN default security list to reflect the following:

<p><img src="./images/security-list.png" title="Ollama server port access" width="100%" style="float:right"/></p>

Should you want to limit access to a specific IP address, the source should be:

<p><img src="./images/security-list-individual.png" title="Access limited to a single IP address" width="100%" style="float:right"/></p>

>[!TIP]
>To avoid continuous changes to the security list, obtain a reserved IP address for your client machine from the network administrator.

[Back to top](#toc)<br>
<br>

<a id="install"></a>
## Ollama Installation

### General

The installation comprises the following components:

| Server | Client |
|----------|----------|
| Ollama | GUI tools<sup><sub>1</sub></sup><br>Character-based tools<br>API development kits<sup><sub>2</sub></sup> |

<sup><sub>1</sub></sup> Examples of GUIs: [Msty](https://msty.app/), [OpenWebUI](https://openwebui.com/), [ollama-chats](https://github.com/drazdra/ollama-chats)<br>
<sup><sub>2</sub></sup> See the [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs)

>[!IMPORTANT]
>When GPUs become available, the NVIDIA and CUDA drivers should be installed. This configuration will also be tested on the Roving Edge Device GPU model.

### Installation

```
# Download and unpack the Ollama binary distribution
cd /tmp
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
sudo chmod +x /usr/bin/ollama

# Create a dedicated system user for the Ollama service
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
```
```
sudo tee /usr/lib/systemd/system/ollama.service > /dev/null <<EOF
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="HTTPS_PROXY=http://<IP_address>:<port>"
Environment="OLLAMA_MODELS=/mnt/llm-repo"
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"

[Install]
WantedBy=default.target
EOF
```

The `Environment="HTTPS_PROXY=http://<IP_address>:<port>"` line should be omitted if a proxy is not applicable.<br>
For NVIDIA GPUs, add `Environment="OLLAMA_FLASH_ATTENTION=1"` to improve token generation speed.

Enable and start Ollama:
```
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```

Ollama will be accessible at `http://127.0.0.1:11434` or `http://<your_server_IP>:11434`.
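
To confirm locally on `llm-host` that the service came up cleanly, a quick check (a minimal sketch); the root endpoint responds with a short "Ollama is running" message:

```
# Check the service status and recent logs, then probe the local API endpoint
sudo systemctl status ollama --no-pager
sudo journalctl -u ollama -n 20 --no-pager
curl http://127.0.0.1:11434
```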

Assign ownership of the model repository to the `ollama` service user and tighten the permissions that were opened earlier:
```
sudo chown ollama:ollama /mnt/llm-repo
sudo chmod 755 /mnt/llm-repo
```
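
As a quick sanity check (a sketch), confirm that the `ollama` user can write to the repository:

```
# The ollama user must be able to create model blobs under OLLAMA_MODELS
sudo -u ollama touch /mnt/llm-repo/.write-test && sudo -u ollama rm /mnt/llm-repo/.write-test && echo "write OK"
```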

[Back to top](#toc)<br>
<br>


<a id="test"></a>
## Testing

From the local host, test the accessibility of the port and the availability of the Ollama server:

```
nc -zv llm-host 11434
curl http://llm-host:11434
curl -I http://llm-host:11434
```
<p><img src="./images/ollama-remote-test.png" title="Ollama remote test results" width="75%" style="float:right"/></p>

Log in to `llm-host` and note the command-line options that are available:

```
ollama
```

<p><img src="./images/ollama-syntax.png" title="Ollama syntax" width="75%" style="float:right"/></p>

Also note the environment variable options that are available:

```
ollama help serve
```

<p><img src="./images/ollama-env-var.png" title="Ollama environment variables" width="75%" style="float:right"/></p>

Download and test your first LLM; you will notice that `/mnt/llm-repo` is populated with model data (check with `ls -lR /mnt/llm-repo`):

<p><img src="./images/ollama-pull-and-test.png" title="Ollama pull/test Llama3.2" width="75%" style="float:right"/></p>
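
A sketch of the typical pull-and-test flow on `llm-host`, assuming the `llama3.2` model (any model from the library works):

```
# Pull a model from the Ollama library, send it a prompt, then inspect the model repository
ollama pull llama3.2
ollama run llama3.2 "Hello Llama 3.2!"
ls -lR /mnt/llm-repo
```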

Run a few more tests from your client to exercise the APIs:

```
curl http://llm-host:11434/api/tags
curl http://llm-host:11434/api/ps
curl -X POST http://llm-host:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello Llama3.2!",
  "stream": false
  }'
```

<p><img src="./images/ollama-curl-statements.png" title="Ollama additional tests" width="75%" style="float:right"/></p>

1. `curl http://llm-host:11434/api/tags` returns a list of the installed LLMs
2. `curl http://llm-host:11434/api/ps` returns a list of the LLMs currently loaded into memory

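Beyond `/api/generate`, the chat-style endpoint can be exercised in the same way (a sketch; see the Ollama documentation for the full API):

```
curl -X POST http://llm-host:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Hello Llama3.2!"}],
  "stream": false
  }'
```
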
>[!TIP]
>How long an LLM stays loaded in memory can be adjusted with the `OLLAMA_KEEP_ALIVE` environment variable (default: 5 minutes). One way to change it is sketched below.
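
A minimal sketch of changing the keep-alive value (a hypothetical 30 minutes) via a systemd drop-in for the service:

```
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/keepalive.conf > /dev/null <<EOF
[Service]
Environment="OLLAMA_KEEP_ALIVE=30m"
EOF
sudo systemctl daemon-reload && sudo systemctl restart ollama
```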
|
Install any of the GUI clients mentioned previously and test connectivity and accessibility. To use [the Msty app](https://msty.app/):

1. Create a Remote Models Provider
2. Name it appropriately
3. Set the Service Endpoint to `http://llm-host:11434`
4. Use "Fetch Models" to list the models that are already installed (in this case `llama3.2`)
5. Repeat this step as new models are added

Example output:

<p><img src="./images/msty-example.png" title="Msty example" width="75%" style="float:right"/></p>

[Back to top](#toc)<br>
<br>

<a id="ref"></a>
## Useful References

* [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs)
* [Pre-trained Ollama models](https://ollama.com/library)
* [Msty GUI client](https://msty.app/)
* [OpenWebUI](https://github.com/open-webui/open-webui)
* [Ollama-chats](https://github.com/drazdra/ollama-chats)
* [Ollama Python library](https://github.com/ollama/ollama-python)
* [Getting started with Ollama for Python](https://github.com/RamiKrispin/ollama-poc)
* [Ollama and Oracle Database 23ai vector search](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/generate-summary-using-ollama.html)

[Back to top](#toc)<br>
<br>

## License
Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](LICENSE) for more details.