The rise of hosted LLMs has been meteoric, but many Non-Disclosure Agreements (NDAs) would prevent you from using them. We explore how a self-hosted solution protects your data.

This post at a glance:

* Pros and cons of hosted vs. self-hosted LLMs
* Bill of materials for a PC with cost-effective Nvidia GPUs
* Configuration for OpenFaaS Edge with Ollama
* Sample function and test data for categorizing cold outreach emails
* Past posts on AI and LLMs from OpenFaaS and sister brands

## Why Self-Hosted LLMs?
Self-hosted models are great for experimentation and exploring what is possible, without having to worry about how much your API calls are costing you ($$$). Practically speaking, they are the only option if you are dealing with Confidential Information covered by an NDA.

Pros for self-hosted models:

* Tools such as [Ollama](https://ollama.com), [llama.cpp](https://github.com/ggml-org/llama.cpp), [LM Studio](https://lmstudio.ai) and [vLLM](https://github.com/vllm-project/vllm) make it trivial to run LLMs locally (see the sketch after this list)
* A modest investment in 1 or 2 Nvidia GPUs such as the 3060 or 3090 can give you access to a wide range of models
* Running on your own hardware means there are no API costs - all you can eat
* You have full control over the model, and can choose to use open source models or your own fine-tuned models
* You have full control over the data, and can choose to keep it on-premises or in a private cloud
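
To illustrate how low the barrier is, here is a minimal sketch of running a model with Ollama. The model name is only an example, and any model from the Ollama library can be substituted:

```bash
# Download a model from the Ollama library (model name is an example)
ollama pull llama3.2

# Ask a one-off question from the CLI
ollama run llama3.2 "Why do self-hosted LLMs help with NDAs?"

# Or call the local REST API, which listens on port 11434 by default
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Hello", "stream": false}'
```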

## Bill of materials for a PC

For our sister brand [actuated.com](https://actuated.com), we built a custom PC to show [how to leverage GPUs and LLMs during CI/CD with GitHub Actions and GitLab CI](https://actuated.com/blog/ollama-in-github-actions).

The build uses an AMD Ryzen 9 5950X 16-Core CPU with 2x 3060 GPUs, 128GB of RAM, 1TB of NVMe storage, and a 1000W power supply.

Around 9 months later, we swapped the 2x 3060 GPUs for 2x 3090s, taking the total VRAM to 48GB.

For this post, we allocated one of the two 3090 cards to a microVM, then we installed OpenFaaS Edge.

At the time of writing, a brand-new Nvidia 3060 card with 12GB of VRAM is available for around [250 GBP as a one-off cost from Amazon.co.uk](https://amzn.to/42tE1Xp). If you use it heavily, it will pay for itself in a short period of time compared to the cost of API credits.

## How to get started with OpenFaaS Edge

Use the [official instructions to install OpenFaaS Edge](https://docs.openfaas.com/deployment/edge/).

Activate your license using your license key or GitHub Sponsorship.

### Install the Nvidia Container Toolkit

Follow the instructions for your platform to install the Nvidia Container Toolkit. This will allow you to run GPU workloads in Docker containers.

[Installing the Nvidia Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

You should be able to run `nvidia-smi` and see your GPUs detected.
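
On a typical Linux host, the post-install steps look roughly like the following. This is a sketch of the standard toolkit workflow rather than the exact commands from this post, and the CUDA image tag is only an example:

```bash
# Register the Nvidia runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# The host driver should list your GPUs
nvidia-smi

# A container should be able to see them too (image tag is an example)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```
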

### Conclusion

The latest release of [OpenFaaS Edge](https://docs.openfaas.com/deployment/edge/) adds support for Nvidia GPUs for core services defined in the `docker-compose.yaml` file. This makes it easy to run local LLMs using a tool like Ollama, then to call them for a wide range of tasks and workflows, whilst retaining data privacy and complete confidentiality.
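
As an illustration, a GPU-enabled Ollama service entry in `docker-compose.yaml` could look something like the sketch below. It uses the standard Compose device-reservation syntax; the image tag, port binding and volume path are assumptions rather than the exact values from this post:

```yaml
services:
  ollama:
    image: docker.io/ollama/ollama:latest
    ports:
      - "127.0.0.1:11434:11434"  # Ollama's default API port
    volumes:
      - ./ollama:/root/.ollama   # persist downloaded models between restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia     # hand one GPU to the container
              count: 1
              capabilities: [gpu]
```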

Functions can be written in any language, and invoked either synchronously or asynchronously for durability and scaling out.
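
For example, assuming a function named `classify-email` had been deployed (the name is illustrative), both invocation modes are available through the gateway:

```bash
# Synchronous: blocks until the function returns a response
curl -i http://127.0.0.1:8080/function/classify-email \
  -d '{"subject": "Grow your pipeline 10x", "from": "sales@example.com"}'

# Asynchronous: returns 202 Accepted immediately and queues the work
curl -i http://127.0.0.1:8080/async-function/classify-email \
  -d '{"subject": "Grow your pipeline 10x", "from": "sales@example.com"}'
```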

We've covered various AI/LLM related topics across our blog in the past:

* [How to check for price drops with Functions, Cron & LLMs](https://www.openfaas.com/blog/checking-stock-price-drops/)
* [How to transcribe audio with OpenAI Whisper and OpenFaaS](https://www.openfaas.com/blog/transcribe-audio-with-openai-whisper/)

From our sister brands:

* Inlets - [Access local Ollama models from a cloud Kubernetes Cluster](https://inlets.dev/blog/2024/08/09/local-ollama-tunnel-k3s.html)
* Actuated - [Run AI models with ollama in CI with GitHub Actions](https://actuated.com/blog/ollama-in-github-actions)