
Commit 93cd3b9

fbnavmreso authored and committed
Updating links to running llama3 locally
1 parent 4d69436 commit 93cd3b9

File tree

2 files changed: +2 -2 lines changed


recipes/inference/model_servers/llama-on-prem.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Llama 3 On-Prem Inference Using vLLM and TGI
 
-Enterprise customers may prefer to deploy Llama 3 on-prem and run Llama in their own servers. This tutorial shows how to use Llama 3 with [vLLM](https://github.com/vllm-project/vllm) and Hugging Face [TGI](https://github.com/huggingface/text-generation-inference), two leading open-source tools to deploy and serve LLMs, and how to create vLLM and TGI hosted Llama 3 instances with [LangChain](https://www.langchain.com/), an open-source LLM app development framework which we used for our other demo apps: [Getting to Know Llama](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb), Running Llama 3 <!-- markdown-link-check-disable -->[locally](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb) <!-- markdown-link-check-disable --> and [in the cloud](https://github.com/meta-llama/llama-recipes/blob/main/recipes/use_cases/RAG/HelloLlamaCloud.ipynb). See [here](https://medium.com/@rohit.k/tgi-vs-vllm-making-informed-choices-for-llm-deployment-37c56d7ff705) for a detailed comparison of vLLM and TGI.
+Enterprise customers may prefer to deploy Llama 3 on-prem and run Llama in their own servers. This tutorial shows how to use Llama 3 with [vLLM](https://github.com/vllm-project/vllm) and Hugging Face [TGI](https://github.com/huggingface/text-generation-inference), two leading open-source tools to deploy and serve LLMs, and how to create vLLM and TGI hosted Llama 3 instances with [LangChain](https://www.langchain.com/), an open-source LLM app development framework which we used for our other demo apps: [Getting to Know Llama](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb), Running Llama 3 <!-- markdown-link-check-disable -->[locally](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama3_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb) <!-- markdown-link-check-disable --> and [in the cloud](https://github.com/meta-llama/llama-recipes/blob/main/recipes/use_cases/RAG/HelloLlamaCloud.ipynb). See [here](https://medium.com/@rohit.k/tgi-vs-vllm-making-informed-choices-for-llm-deployment-37c56d7ff705) for a detailed comparison of vLLM and TGI.
 
 For [Ollama](https://ollama.com) based on-prem inference with Llama 3, see the Running Llama 3 locally notebook above.
 

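The paragraph updated above points to a tutorial on serving Llama 3 on-prem with vLLM or TGI and calling the hosted instance from LangChain. As a minimal sketch of that setup (not part of this commit): assuming a vLLM OpenAI-compatible server has already been started locally, e.g. with `python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-8B-Instruct --port 8000`, the hosted model can be queried as below. The model name, host, and port are assumptions, and LangChain's OpenAI-compatible wrappers can be pointed at the same URL.

```python
# Minimal sketch, not from the tutorial: query a locally hosted Llama 3 instance
# through vLLM's OpenAI-compatible endpoint. Host, port, and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize why on-prem LLM serving matters."}],
)
print(response.choices[0].message.content)
```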
recipes/use_cases/LiveData.ipynb

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@
 "source": [
 "You will be using [Replicate](https://replicate.com/meta/meta-llama-3-8b-instruct) to run the examples here. You will need to first sign in with Replicate with your github account, then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. You can also use other Llama 3 cloud providers such as [Groq](https://console.groq.com/), [Together](https://api.together.xyz/playground/language/meta-llama/Llama-3-8b-hf), or [Anyscale](https://app.endpoints.anyscale.com/playground) - see Section 2 of the Getting to Know Llama [notebook](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb) for more information.\n",
 "\n",
-"If you'd like to run Llama 3 locally for the benefits of privacy, no cost or no rate limit (some Llama 3 hosting providers set limits for free plan of queries or tokens per second or minute), see [Running Llama Locally](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb)."
+"If you'd like to run Llama 3 locally for the benefits of privacy, no cost or no rate limit (some Llama 3 hosting providers set limits for free plan of queries or tokens per second or minute), see [Running Llama Locally](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama3_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb)."
 ]
 },
 {
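
The notebook text in the hunk above tells readers to create a Replicate API token and run Llama 3 through the meta-llama-3-8b-instruct endpoint. A minimal sketch of that call follows; the model slug comes from the Replicate page linked in the diff, while the prompt and token placeholder are illustrative assumptions.

```python
# Minimal sketch, assuming the replicate package is installed and a token from
# replicate.com/account/api-tokens is available; the prompt is illustrative only.
import os
import replicate

os.environ["REPLICATE_API_TOKEN"] = "r8_..."  # paste your token here

output = replicate.run(
    "meta/meta-llama-3-8b-instruct",
    input={"prompt": "What is live data retrieval and why pair it with an LLM?"},
)
print("".join(output))  # this model streams back chunks of text
```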
