
Commit ee2511c

Add links for LLMs
Signed-off-by: Alex Ellis (OpenFaaS Ltd) <[email protected]>
1 parent bfb5719 commit ee2511c

File tree

1 file changed: +2 -2 lines changed


_posts/2024-09-04-checking-stock-price-drops.md

Lines changed: 2 additions & 2 deletions
@@ -656,7 +656,7 @@ Here's the Discord alert I got when I checked my phone:

![Discord alert](/images/2024-09-stockcheck/discord.png)

-Whilst it's overkill for this task because standard HTML scraping and parsing techniques worked perfectly well, I decided to try out running a local Llama3.1 8B model on my Nvidia RTX 3090 GPU to see if it was up to the task.
+Whilst it's overkill for this task because standard HTML scraping and parsing techniques worked perfectly well, I decided to try out running a local [Llama3.1](https://ai.meta.com/blog/meta-llama-3-1/) 8B model on my Nvidia RTX 3090 GPU to see if it was up to the task.

It wasn't until I ran `ollama run llama3.1:8b-instruct-q8_0` and pasted in my prompt that I realised just how long that HTML was. It was huge: over 681KB of text, which is generally considered a very large amount for a Large Language Model's context window.

@@ -698,7 +698,7 @@ Provide the result in the following format, but avoid using the example in your

If a local LLM wasn't up to the task, then we could have also used a cloud-hosted service like the OpenAI API, or one of the many other options that charge per request.

-And since the local LLM aced the task, we could also try scaling down to something that can run better on a CPU, or that doesn't require so many resources. I tried out the phi3 model from Microsoft, which was designed with this in mind. After setting the system prompt, to my surprise it performed the task just as well and returned the same JSON for me.
+And since the local LLM aced the task, we could also try scaling down to something that can run better on a CPU, or that doesn't require so many resources. I tried out [Microsoft's phi3 model](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/), which was designed with limited resources in mind. After setting the system prompt, to my surprise it performed the task just as well and returned the same JSON for me.

To integrate a local LLM with your function, you can package [Ollama](https://ollama.com/) as a container image using the instructions on our sister site, inlets.dev: [Access local Ollama models from a cloud Kubernetes Cluster](https://inlets.dev/blog/2024/08/09/local-ollama-tunnel-k3s.html). There are a few options here, including deploying the LLM as a function or deploying it as a regular Kubernetes Deployment. Either will work, but the Deployment allows for easier Pod spec customisation if you're using a more complex GPU sharing technology like [NVidia Time Slicing](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html#time-slicing-gpus-in-kubernetes).

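As an aside, the first changed paragraph above describes pasting a very long prompt into `ollama run llama3.1:8b-instruct-q8_0`. The same model can also be called programmatically over Ollama's local HTTP API. The sketch below is illustrative only and is not part of the post's code: it assumes Ollama is listening on its default port (11434), that the quantised model has already been pulled, and the prompt wording and `page.html` filename are placeholders.

```python
# Sketch: send scraped HTML plus an extraction prompt to a local Ollama
# server running llama3.1:8b-instruct-q8_0 (assumed already pulled).
# The prompt text and page.html path are placeholders, not from the post.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def ask_llm(prompt: str, model: str = "llama3.1:8b-instruct-q8_0") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return a single JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # page.html stands in for the scraped page mentioned in the post
    html = open("page.html", encoding="utf-8").read()
    prompt = (
        "Extract the current price and what it refers to from the HTML below, "
        "and reply with JSON only.\n\n" + html
    )
    print(ask_llm(prompt))
```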
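The second changed paragraph mentions setting a system prompt for phi3. That experiment could be scripted in the same way via Ollama's `/api/chat` endpoint, which accepts an explicit system message. Again, this is a minimal sketch under assumptions: the system prompt, JSON field names, and HTML snippet are invented for illustration and are not the ones used in the post.

```python
# Sketch: run the extraction against Microsoft's smaller phi3 model with a
# system prompt, using Ollama's /api/chat endpoint, then parse the JSON reply.
# System prompt, field names, and the HTML snippet are illustrative only.
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

SYSTEM_PROMPT = (
    "You extract structured data from HTML. "
    "Reply with a single JSON object and nothing else."
)


def chat(user_prompt: str, model: str = "phi3") -> dict:
    payload = json.dumps({
        "model": model,
        "stream": False,
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_CHAT_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["message"]["content"]
    return json.loads(reply)


if __name__ == "__main__":
    snippet = "<span class='name'>Example item</span> <span class='price'>9.99</span>"
    result = chat(f"Extract the name and price from this HTML: {snippet}")
    print(result)  # e.g. {"name": "...", "price": "..."} (fields are illustrative)
```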

0 commit comments
