*Last Update: 23 November 2024*

<br><h1 align="center">Local LLM Inferencing and Interaction<br>Using the Ollama Open Source Tool</h1>
<p align="center"><img align="center" src="./images/ollama-logo.png" width="10%" style="float:right"/></p>

<a id="toc"></a>
## Table of Contents
1. [Introduction](#intro)
2. [Preparation](#prep)
3. [Security](#security)
4. [Ollama Installation](#install)
5. [Testing](#test)
6. [Useful References](#ref)

<a id="intro"></a>
## Introduction

[Ollama](https://github.com/ollama/ollama) is an open-source tool that runs large language models (LLMs) directly on a local machine. This makes it particularly appealing to AI developers, researchers, and businesses concerned with data control and privacy. It enables the loading and deployment of selected LLMs and provides access to them through APIs and freely obtainable chatbots. Ollama also contains a text-based server-side chatbot.

By running models locally, you maintain full data ownership and avoid the potential security risks associated with cloud storage. Offline AI tools like Ollama also help reduce latency and reliance on external facilities, making them faster and more reliable.

This article demonstrates how to install and configure an Ollama LLM processing facility. Although Ollama can run on personal servers and laptops, this installation targets the Oracle Compute Cloud@Customer (C3) and Private Cloud Appliance (PCA) to capitalize on more readily available resources and increase performance and processing efficiency, especially when large models are used.

Considerations:
* A firm grasp of C3/PCA/OCI concepts and administration is assumed.
* The creation and integration of a development environment is outside the scope of this document.
* Oracle Linux 8 and macOS Sonoma 14.7.1 clients were used for testing, but Windows is also widely supported.

[Back to top](#toc)<br>
<br>

<a id="prep"></a>
## Preparation

### System Requirements

| Requirement | Specification |
|----------|----------|
| Operating system | Oracle Linux 8 or later<br>Ubuntu 22.04 or later<br>Windows |
| RAM | 16 GB for running models up to 7B. The rule of thumb is to have at least twice the memory of the LLM size, also allowing for any LLMs that will be loaded into memory simultaneously. |
| Disk space | 12 GB for installing Ollama and the basic models. Additional space is required for storing model data, depending on the models used. The LLM sizes can be obtained from the "Pre-trained Ollama models" link in the References section. For example, the Llama 3.1 LLM with 405B parameters occupies 229 GB of disk space. |
| Processor | A modern CPU with at least 4 cores is recommended. For running models of approximately 15B, 8 cores (OCPUs) are recommended. Allocate accordingly. |
| Graphics Processing Unit<br>(optional) | A GPU is not required for running Ollama, but it can improve performance, especially when working with large models. If you have a GPU, you can use it to accelerate the training of custom models. |

>[!NOTE]
>The GPU options in the Compute Cloud@Customer will be available soon.

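Applied to the hypothetical example of a quantized ~7B model of roughly 5 GB, the rule of thumb above works out as follows (a sketch only; check the published size of the model you intend to pull and compare it with the memory reported on `llm-host`):

```
# memory available on the VM
free -h

# illustrative arithmetic: ~5 GB model x 2 = ~10 GB of RAM needed,
# so a 16 GB instance leaves headroom for the OS and a second small model
awk 'BEGIN { printf "%.1f GB required\n", 5 * 2 }'
```
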
### Create a Virtual Machine Instance

[C3: Creating an Instance](https://docs.oracle.com/en-us/iaas/compute-cloud-at-customer/topics/compute/creating-an-instance.htm#creating-an-instance)<br>
[PCA 3.0: Working with Instances](https://docs.oracle.com/en/engineered-systems/private-cloud-appliance/3.0-latest/user/user-usr-instance-lifecycle.html)

Create a VM in a public subnet following these guidelines:

1. Hostname `llm-host`
2. Select the Oracle Linux 8 image
3. Start with 6x OCPUs and 96 GB of RAM using an available "Flex" shape (resources can be adjusted later depending on workload)
4. Select the default boot volume size
5. Select a public subnet and allow a public IP address to be assigned
6. Configure the public key information
7. Select "Restore instance lifecycle state after infrastructure maintenance"
8. Apply appropriate tagging if required
9. Ensure that the VM is accessible via `ssh`
10. Configure the proxy setup if required (described below)
11. Update your local host's `/etc/hosts` file to reflect your public IP address for `llm-host` (see the sketch below)
12. If your network uses a proxy, follow the instructions in the "Proxy Settings" section below before performing the next step
13. Perform an OS update in `llm-host` before proceeding:

```
sudo dnf update
```
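
A minimal sketch of steps 9 and 11 above, run from the local client, assuming a hypothetical public IP address of 203.0.113.10 and the default `opc` user of Oracle Linux images:

```
# map the assigned public IP address to llm-host (203.0.113.10 is a placeholder)
echo "203.0.113.10   llm-host" | sudo tee -a /etc/hosts

# confirm ssh access with the key configured at instance creation
ssh opc@llm-host hostname
```
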

### Create a Block Storage Device for LLMs

[C3: Creating and Attaching Block Volumes](https://docs.oracle.com/en-us/iaas/compute-cloud-at-customer/topics/block/creating-and-attaching-block-volumes.htm)<br>
[PCA 3.0: Creating and Attaching Block Volumes](https://docs.oracle.com/en/engineered-systems/private-cloud-appliance/3.0-latest/user/user-usr-blk-volume-create-attach.html)

1. Create and attach a block volume to the VM
2. Volume name `llm-repo`
3. A block volume of at least 150 GB is recommended (research the model sizes!). Should multiple standard LLMs be hosted, or advanced workloads be foreseen (e.g. copies of LLMs for development, collection and loading of RAG material), the recommendation is 1 TB
4. Select High Performance if available
5. Select your appropriate backup policy
6. Apply appropriate tagging if required
7. It is recommended to format the block volume with the `xfs` filesystem (see the sketch below)
8. Configure a persistent mount point (to survive reboots). The entry in the `/etc/fstab` file will typically resemble the following:
`/dev/disk/by-id/scsi-3600144f096933b92000061b1129e0037 /mnt/llm-repo xfs _netdev,nofail 0 0`
9. To grant initial unrestricted access to the mounted filesystem, run the following command on the mount point:
```
sudo chmod 777 /mnt/llm-repo
```
>[!IMPORTANT]
>Note the mount options in the `/etc/fstab` file

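A minimal sketch of steps 7 and 8 above, assuming the attached volume shows up as the hypothetical device `/dev/sdb` (identify your own device with `lsblk` and use its `/dev/disk/by-id/` path in `/etc/fstab`):

```
# identify the newly attached block volume
lsblk

# format it with xfs and create the mount point (device name is an assumption)
sudo mkfs.xfs /dev/sdb
sudo mkdir -p /mnt/llm-repo

# find the persistent by-id path for the /etc/fstab entry shown above
ls -l /dev/disk/by-id/ | grep sdb

# after adding the /etc/fstab entry, mount and verify
sudo mount -a
df -h /mnt/llm-repo
```
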
### Proxy Settings

If your network uses a proxy, add the following to the `/etc/profile.d/proxy.sh` file to set the proxy environment variables system-wide:

```
http_proxy=http://<proxy_server>:80
https_proxy=http://<proxy_server>:80
no_proxy="127.0.0.1, localhost"
export http_proxy
export https_proxy
export no_proxy
```

>[!TIP]
>The `no_proxy` environment variable can be expanded to include your internal domains. It is not required to list IP addresses in internal subnets of the C3/PCA.

Edit the `/etc/yum.conf` file to include the following line:
```
proxy=http://<proxy_server>:80
```

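Before moving on, it is worth confirming that the proxy settings are picked up; a possible check (in a new shell, or after sourcing the profile script):

```
# load the new proxy variables into the current shell
source /etc/profile.d/proxy.sh

# both should succeed through the proxy if it is configured correctly
curl -sI https://ollama.com | head -1
sudo dnf -q repolist
```
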
[Back to top](#toc)<br>
<br>

<a id="security"></a>
## Security

### General

Resource constraints are common on personal computers, and more compute, memory, and disk resources are required to run LLM operations efficiently. Hence the increasing deployment of Ollama in cloud or corporate hardware environments in a two-tier client-server architecture. In this deployment architecture, Ollama's security model exposes the installation and its data to a number of vulnerabilities.

Several API endpoint vulnerabilities have been identified in the client-server deployment model of Ollama, and some have been addressed successfully by means of security patching. Collectively, the vulnerabilities could allow an attacker to carry out a wide range of malicious actions with a single HTTP request, including denial-of-service (DoS) attacks, model poisoning, model theft, and more.

*A future article will describe a secure client-server deployment architecture (using reverse proxying and TLS) that is suitable for corporate use and also ensures data usage privacy.*

>[!NOTE]
>Refer to the article [Why You Should Trust Meta AI's Ollama for Data Security](https://myscale.com/blog/trust-meta-ai-ollama-data-security) for further information on the benefits of running LLMs locally.

### Open the Firewall for the Ollama Listening Port

```
sudo firewall-cmd --set-default-zone=public
```
```
sudo firewall-cmd --add-port=11434/tcp --add-service=http --zone=public
```
```
sudo firewall-cmd --runtime-to-permanent
```
```
sudo firewall-cmd --reload
```
```
sudo firewall-cmd --info-zone=public
```

<p><img src="./images/firewall-info.png" title="Firewall service listing" width="75%" style="float:right"/></p>

### Grant VCN Access through Security List

Edit the VCN default security list to reflect the following:

<p><img src="./images/security-list.png" title="Ollama server port access" width="100%" style="float:right"/></p>

Should you want to limit access to a specific IP address, the source should be:

<p><img src="./images/security-list-individual.png" title="Access limited to a single IP address" width="100%" style="float:right"/></p>

>[!TIP]
>To avoid continuous changes to the security list, obtain a reserved IP address for your client machine from the network administrator.

[Back to top](#toc)<br>
<br>

<a id="install"></a>
## Ollama Installation

### General

The installation comprises the following components:

| Server | Client |
|----------|----------|
| Ollama | GUI tools<sup><sub>1</sup></sub><br>Character-based tools<br>API development kits<sup><sub>2</sup></sub> |

<sup><sub>1</sup></sub> Examples of GUIs: [Msty](https://msty.app/), [OpenWebUI](https://openwebui.com/), [ollama-chats](https://github.com/drazdra/ollama-chats)<br>
<sup><sub>2</sup></sub> See the [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs)

>[!IMPORTANT]
>When GPUs become available, the NVIDIA and CUDA drivers should be installed. This configuration will also be tested on the Roving Edge Device GPU model.

### Installation

```
cd /tmp
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
sudo chmod +x /usr/bin/ollama
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
```
```
sudo tee /usr/lib/systemd/system/ollama.service > /dev/null <<EOF
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="HTTPS_PROXY=http://<IP_address>:<port>"
Environment="OLLAMA_MODELS=/mnt/llm-repo"
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"

[Install]
WantedBy=default.target
EOF
```

The `Environment="HTTPS_PROXY=http://<IP_address>:<port>"` line should be omitted if a proxy is not applicable.<br>
For NVIDIA GPUs, add `Environment="OLLAMA_FLASH_ATTENTION=1"` to improve token generation speed.

Enable and start Ollama:
```
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```

Ollama will be accessible at `http://127.0.0.1:11434` or `http://<your_server_IP>:11434`.

Set the ownership and permissions of the model repository:
```
sudo chown ollama:ollama /mnt/llm-repo
sudo chmod 755 /mnt/llm-repo
```
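
To confirm locally on `llm-host` that the service is up and answering on the configured port, a quick check (the root endpoint of a healthy Ollama server returns a short status message):

```
systemctl status ollama --no-pager
curl http://127.0.0.1:11434
```
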

[Back to top](#toc)<br>
<br>

<a id="test"></a>
## Testing

From the local host, test the accessibility of the port and the availability of the Ollama server:

```
nc -zv llm-host 11434
curl http://llm-host:11434
curl -I http://llm-host:11434
```
<p><img src="./images/ollama-remote-test.png" title="Ollama remote test results" width="75%" style="float:right"/></p>

Log in to `llm-host` and note the command line options that are available:

```
ollama
```

<p><img src="./images/ollama-syntax.png" title="Ollama syntax" width="75%" style="float:right"/></p>

Also note the environment variable options that are available:

```
ollama help serve
```

<p><img src="./images/ollama-env-var.png" title="Ollama environment variables" width="75%" style="float:right"/></p>

Download and test your first LLM; you will notice that `/mnt/llm-repo` is populated with data, which you can verify by running `ls -lR /mnt/llm-repo`:

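A minimal sketch of this first pull and interactive test, using the `llama3.2` model referenced throughout this article:

```
# pull the model into /mnt/llm-repo and list the installed models
ollama pull llama3.2
ollama list

# ask a one-off question, then inspect the model repository
ollama run llama3.2 "Hello Llama3.2!"
ls -lR /mnt/llm-repo
```
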
<p><img src="./images/ollama-pull-and-test.png" title="Ollama pull/test Llama3.2" width="75%" style="float:right"/></p>
271+
272+
Run some more tests from your client to test the APIs:
273+
274+
```
275+
$ curl http://llm-host:11434/api/tags
276+
$ curl http://llm-host:11434/api/ps
277+
$ curl -X POST http://llm-host:11434/api/generate -d '{
278+
"model": "llama3.2",
279+
"prompt":"Hello Llama3.2!",
280+
"stream": false
281+
}'
282+
```
283+
284+
<p><img src="./images/ollama-curl-statements.png" title="Ollama additional tests" width="75%" style="float:right"/></p>
285+
286+
1. `curl http://llm-host:11434/api/tags` returns a list of installed LLMs
287+
2. `curl http://llm-host:11434/api/ps` returns a list of LLMs already loaded into memory
288+
289+
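The chat-style endpoint can be exercised in the same way; a minimal sketch, again assuming the `llama3.2` model pulled above:

```
curl -X POST http://llm-host:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "Give a one-line summary of Ollama." }
  ],
  "stream": false
}'
```
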
>[!TIP]
>The duration for which an LLM stays loaded in memory can be adjusted by changing the `OLLAMA_KEEP_ALIVE` environment variable (default = 5 minutes).

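For example, to keep models resident for 30 minutes (a sketch; adjust the value to your own workload), add the variable to the systemd unit created earlier and restart the service:

```
# add to the [Service] section of /usr/lib/systemd/system/ollama.service:
#   Environment="OLLAMA_KEEP_ALIVE=30m"

sudo systemctl daemon-reload
sudo systemctl restart ollama
```
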
Install any of the GUI clients mentioned previously and test connectivity and accessibility. To use [the Msty app](https://msty.app/) you need to:

1. Create a Remote Models Provider
2. Name it appropriately
3. Set the Service Endpoint to `http://llm-host:11434`
4. "Fetch Models" (those already installed, in this case `llama3.2`)
5. Repeat this step as new models are added

Example output:

<p><img src="./images/msty-example.png" title="Msty example" width="75%" style="float:right"/></p>

[Back to top](#toc)<br>
<br>

<a id="ref"></a>
## Useful References

* [Ollama documentation](https://github.com/ollama/ollama/tree/main/docs)
* [Pre-trained Ollama models](https://ollama.com/library)
* [Msty GUI client](https://msty.app/)
* [OpenWebUI](https://github.com/open-webui/open-webui)
* [Ollama-chats](https://github.com/drazdra/ollama-chats)
* [Ollama Python library](https://github.com/ollama/ollama-python)
* [Getting started with Ollama for Python](https://github.com/RamiKrispin/ollama-poc)
* [Ollama and Oracle Database 23ai vector search](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/generate-summary-using-ollama.html)

[Back to top](#toc)<br>
<br>

## License

Copyright (c) 2024 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](LICENSE) for more details.