8 | 8 | "<a href=\"https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/use_cases/RAG/HelloLlamaCloud.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
9 | 9 | "\n",
10 | 10 | "## This demo app shows:\n",
11 | | - "* How to run Llama 3 in the cloud hosted on Replicate\n",
| 11 | + "* How to run Llama 3.1 in the cloud hosted on Replicate\n",
12 | 12 | "* How to use LangChain to ask Llama general questions and follow up questions\n",
13 | | - "* How to use LangChain to load a recent web page - Hugging Face's [blog post on Llama 3](https://huggingface.co/blog/llama3) - and chat about it. This is the well known RAG (Retrieval Augmented Generation) method to let LLM such as Llama 3 be able to answer questions about the data not publicly available when Llama 3 was trained, or about your own data. RAG is one way to prevent LLM's hallucination\n",
| 13 | + "* How to use LangChain to load a recent web page - Hugging Face's [blog post on Llama 3.1](https://huggingface.co/blog/llama31) - and chat about it. This is the well-known RAG (Retrieval Augmented Generation) method that lets an LLM such as Llama 3.1 answer questions about data that was not publicly available when it was trained, or about your own data. RAG is one way to reduce LLM hallucination\n",
14 | 14 | "\n",
15 | | - "**Note** We will be using [Replicate](https://replicate.com/meta/meta-llama-3-8b-instruct) to run the examples here. You will need to first sign in with Replicate with your github account, then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. You can also use other Llama 3 cloud providers such as [Groq](https://console.groq.com/), [Together](https://api.together.xyz/playground/language/meta-llama/Llama-3-8b-hf), or [Anyscale](https://app.endpoints.anyscale.com/playground) - see Section 2 of the Getting to Know Llama [notebook](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb) for more information."
| 15 | + "**Note** We will be using [Replicate](https://replicate.com/meta/meta-llama-3.1-405b-instruct) to run the examples here. You will need to first sign in to Replicate with your GitHub account, then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. You can also use other Llama 3.1 cloud providers such as [Groq](https://console.groq.com/), [Together](https://api.together.xyz/playground/language/meta-llama/Llama-3-8b-hf), or [Anyscale](https://app.endpoints.anyscale.com/playground) - see Section 2 of the Getting to Know Llama [notebook](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb) for more information."
16 | 16 | ]
17 | 17 | },
18 | 18 | {
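The note above asks you to create a Replicate API token. A minimal sketch of wiring it up before instantiating the model, assuming a notebook environment (the `getpass` prompt is just one way to avoid hard-coding the token):

```python
import os
from getpass import getpass

# Paste the token created at https://replicate.com/account/api-tokens;
# the Replicate client reads it from this environment variable.
os.environ["REPLICATE_API_TOKEN"] = getpass("REPLICATE_API_TOKEN: ")
```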
59 | 59 | "id": "3e8870c1",
60 | 60 | "metadata": {},
61 | 61 | "source": [
62 | | - "Next we call the Llama 3 8b chat model from Replicate. You can also use Llama 3 70b model by replacing the `model` name with \"meta/meta-llama-3-70b-instruct\"."
| 62 | + "Next we call the Llama 3.1 405b chat model from Replicate. You can also use the Llama 3 8B or 70B models by replacing the `model` name with \"meta/meta-llama-3-8b-instruct\" or \"meta/meta-llama-3-70b-instruct\"."
63 | 63 | ]
64 | 64 | },
65 | 65 | {
71 | 71 | "source": [
72 | 72 | "from langchain_community.llms import Replicate\n",
73 | 73 | "llm = Replicate(\n",
74 | | - " model=\"meta/meta-llama-3-8b-instruct\",\n",
| 74 | + " model=\"meta/meta-llama-3.1-405b-instruct\",\n",
75 | 75 | " model_kwargs={\"temperature\": 0.0, \"top_p\": 1, \"max_new_tokens\":500}\n",
76 | 76 | ")"
77 | 77 | ]
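Once the `llm` object above is created, it can be called through LangChain's standard Runnable interface. A minimal usage sketch (the question is an arbitrary example, not taken from the notebook):

```python
# Send a single-turn prompt to the hosted Llama model on Replicate
answer = llm.invoke("Who wrote the book Innovator's Dilemma?")
print(answer)
```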
189 | 189 | "id": "fc436163",
190 | 190 | "metadata": {},
191 | 191 | "source": [
192 | | - "Next, let's explore using Llama 3 to answer questions using documents for context. \n",
193 | | - "This gives us the ability to update Llama 3's knowledge thus giving it better context without needing to finetune. "
| 192 | + "Next, let's explore using Llama 3.1 to answer questions using documents for context. \n",
| 193 | + "This gives us the ability to update Llama 3.1's knowledge, giving it better context without needing to fine-tune. "
194 | 194 | ]
195 | 195 | },
196 | 196 | {
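As a sketch of the document-loading step this section describes, assuming the Hugging Face blog URL from the introduction (`WebBaseLoader` needs the `beautifulsoup4` package installed):

```python
from langchain_community.document_loaders import WebBaseLoader

# Fetch the Llama 3.1 blog post and wrap it as LangChain Document objects
loader = WebBaseLoader("https://huggingface.co/blog/llama31")
docs = loader.load()
print(f"Loaded {len(docs)} document(s)")
```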
246 | 246 | "\n",
247 | 247 | "In general, you should use larger chunk sizes for highly structured text such as code and smaller sizes for less structured text. You may need to experiment with different chunk sizes and overlap values to find the best numbers.\n",
248 | 248 | "\n",
249 | | - "We then use `RetrievalQA` to retrieve the documents from the vector database and give the model more context on Llama 3, thereby increasing its knowledge.\n",
| 249 | + "We then use `RetrievalQA` to retrieve the documents from the vector database and give the model more context on Llama 3.1, thereby increasing its knowledge. Llama 3.1 also really shines here thanks to its new 128k-token context window.\n",
250 | 250 | "\n",
251 | 251 | "For each question, LangChain performs a semantic similarity search for it in the vector db, then passes the search results as the context to Llama to answer the question."
252 | 252 | ]
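A minimal end-to-end sketch of the chunking, indexing, and retrieval flow described above. FAISS and the default `HuggingFaceEmbeddings` model are stand-in choices (the notebook's actual vector store and embedding model may differ), and `docs`/`llm` come from the earlier sketches:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA

# Split the loaded documents into overlapping chunks for embedding
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
splits = splitter.split_documents(docs)

# Embed each chunk and index it in an in-memory FAISS vector store
vectordb = FAISS.from_documents(splits, HuggingFaceEmbeddings())

# RetrievalQA retrieves chunks similar to the query and passes them to Llama as context
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectordb.as_retriever())
result = qa_chain.invoke({"query": "What's new with Llama 3.1?"})
print(result["result"])
```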