|
7 | 7 | "source": [
|
8 | 8 | "## This demo app shows:\n",
|
9 | 9 | "* How to use LlamaIndex, an open source library to help you build custom data augmented LLM applications\n",
|
10 |
| - "* How to ask Llama questions about recent live data via the You.com live search API and LlamaIndex\n", |
11 |
| - "\n", |
12 |
| - "The LangChain package is used to facilitate the call to Llama2 hosted on Replicate\n", |
13 |
| - "\n", |
14 |
| - "**Note** We will be using Replicate to run the examples here. You will need to first sign in with Replicate with your github account, then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. \n", |
15 |
| - "After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on Replicate." |
16 |
| - ] |
17 |
| - }, |
18 |
| - { |
19 |
| - "cell_type": "markdown", |
20 |
| - "id": "68cf076e", |
21 |
| - "metadata": {}, |
22 |
| - "source": [ |
23 |
| - "We start by installing the necessary packages:\n", |
24 |
| - "- [langchain](https://python.langchain.com/docs/get_started/introduction) which provides RAG capabilities\n", |
25 |
| - "- [llama-index](https://docs.llamaindex.ai/en/stable/) for data augmentation." |
| 10 | + "* How to ask Llama 3 questions about recent live data via the [Trvily](https://tavily.com) live search API" |
26 | 11 | ]
|
27 | 12 | },
|
28 | 13 | {
|
|
32 | 17 | "metadata": {},
|
33 | 18 | "outputs": [],
|
34 | 19 | "source": [
|
35 |
| - "!pip install llama-index langchain" |
36 |
| - ] |
37 |
| - }, |
38 |
| - { |
39 |
| - "cell_type": "code", |
40 |
| - "execution_count": null, |
41 |
| - "id": "21fe3849", |
42 |
| - "metadata": {}, |
43 |
| - "outputs": [], |
44 |
| - "source": [ |
45 |
| - "# use ServiceContext to configure the LLM used and the custom embeddings \n", |
46 |
| - "from llama_index import ServiceContext\n", |
47 |
| - "\n", |
48 |
| - "# VectorStoreIndex is used to index custom data \n", |
49 |
| - "from llama_index import VectorStoreIndex\n", |
50 |
| - "\n", |
51 |
| - "from langchain.llms import Replicate" |
| 20 | + "!pip install llama-index \n", |
| 21 | + "!pip install llama-index-core\n", |
| 22 | + "!pip install llama-index-llms-replicate\n", |
| 23 | + "!pip install llama-index-embeddings-huggingface\n", |
| 24 | + "!pip install tavily-python" |
52 | 25 | ]
|
53 | 26 | },
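A quick optional sanity check — a sketch that is not part of the original notebook — confirms the freshly installed packages import cleanly before moving on:

```python
# Optional sanity check (not in the original notebook):
# verify the freshly installed packages import cleanly.
import llama_index.core
import tavily

print("llama-index-core version:", llama_index.core.__version__)
```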
|
54 | 27 | {
|
55 | 28 | "cell_type": "markdown",
|
56 |
| - "id": "73e8e661", |
| 29 | + "id": "83639e83-2baa-4156-93a2-b9b6d4baf7d6", |
57 | 30 | "metadata": {},
|
58 | 31 | "source": [
|
59 |
| - "Next we set up the Replicate token." |
| 32 | + "You will be using [Replicate](https://replicate.com/meta/meta-llama-3-8b-instruct) to run the examples here. You will need to first sign in with Replicate with your github account, then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. You can also use other Llama 3 cloud providers such as [Groq](https://console.groq.com/), [Together](https://api.together.xyz/playground/language/meta-llama/Llama-3-8b-hf), or [Anyscale](https://app.endpoints.anyscale.com/playground) - see Section 2 of the Getting to Know Llama [notebook](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb) for more information.\n", |
| 33 | + "\n", |
| 34 | + "If you'd like to run Llama 3 locally for the benefits of privacy, no cost or no rate limit (some Llama 3 hosting providers set limits for free plan of queries or tokens per second or minute), see [Running Llama Locally](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb)." |
60 | 35 | ]
|
61 | 36 | },
|
62 | 37 | {
|
63 | 38 | "cell_type": "code",
|
64 | 39 | "execution_count": null,
|
65 |
| - "id": "d9d76e33", |
| 40 | + "id": "e6affd70-c909-4340-924f-f282912765d5", |
66 | 41 | "metadata": {},
|
67 | 42 | "outputs": [],
|
68 | 43 | "source": [
|
|
75 | 50 | },
|
76 | 51 | {
|
77 | 52 | "cell_type": "markdown",
|
78 |
| - "id": "f8ff812b", |
| 53 | + "id": "18582e1f-30b1-4dc5-918a-de2995eb5b46", |
79 | 54 | "metadata": {},
|
80 | 55 | "source": [
|
81 |
| - "In this example we will use the [YOU.com](https://you.com/) search engine to augment the LLM's responses.\n", |
82 |
| - "To use the You.com Search API, you can email [email protected] to request an API key. " |
| 56 | + "You'll set up the Llama 3 8b chat model from Replicate. You can also use Llama 3 70b model by replacing the `model` name with \"meta/meta-llama-3-70b-instruct\"." |
83 | 57 | ]
|
84 | 58 | },
|
85 | 59 | {
|
86 | 60 | "cell_type": "code",
|
87 | 61 | "execution_count": null,
|
88 |
| - "id": "75275628-5235-4b55-8033-601c76107528", |
| 62 | + "id": "21fe3849", |
89 | 63 | "metadata": {},
|
90 | 64 | "outputs": [],
|
91 | 65 | "source": [
|
| 66 | + "from llama_index.core import Settings, VectorStoreIndex\n", |
| 67 | + "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n", |
| 68 | + "from llama_index.llms.replicate import Replicate\n", |
| 69 | + "\n", |
| 70 | + "Settings.llm = Replicate(\n", |
| 71 | + " model=\"meta/meta-llama-3-8b-instruct\",\n", |
| 72 | + " temperature=0.0,\n", |
| 73 | + " additional_kwargs={\"top_p\": 1, \"max_new_tokens\": 500},\n", |
| 74 | + ")\n", |
92 | 75 | "\n",
|
93 |
| - "YOUCOM_API_KEY = getpass()\n", |
94 |
| - "os.environ[\"YOUCOM_API_KEY\"] = YOUCOM_API_KEY" |
| 76 | + "Settings.embed_model = HuggingFaceEmbedding(\n", |
| 77 | + " model_name=\"BAAI/bge-small-en-v1.5\"\n", |
| 78 | + ")" |
95 | 79 | ]
|
96 | 80 | },
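Before building any index, a minimal smoke test — a sketch, not part of the original notebook, assuming `REPLICATE_API_TOKEN` was set in the earlier cell — confirms the Replicate-hosted model responds:

```python
# Minimal smoke test of the configured LLM (sketch; assumes
# REPLICATE_API_TOKEN was already set via getpass above).
completion = Settings.llm.complete("In one sentence, what is retrieval augmented generation?")
print(completion.text)
```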
|
97 | 81 | {
|
98 | 82 | "cell_type": "markdown",
|
99 |
| - "id": "cb210c7c", |
| 83 | + "id": "f8ff812b", |
100 | 84 | "metadata": {},
|
101 | 85 | "source": [
|
102 |
| - "We then call the Llama 2 model from replicate. \n", |
103 |
| - "\n", |
104 |
| - "We will use the llama 2 13b chat model. You can find more Llama 2 models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).\n", |
105 |
| - "You can add them here in the format: model_name/version" |
| 86 | + "Next you will use the [Trvily](https://tavily.com/) search engine to augment the Llama 3's responses. To create a free trial Trvily Search API, sign in with your Google or Github account [here](https://app.tavily.com/sign-in)." |
106 | 87 | ]
|
107 | 88 | },
|
108 | 89 | {
|
109 | 90 | "cell_type": "code",
|
110 | 91 | "execution_count": null,
|
111 |
| - "id": "c12fc2cb", |
| 92 | + "id": "75275628-5235-4b55-8033-601c76107528", |
112 | 93 | "metadata": {},
|
113 | 94 | "outputs": [],
|
114 | 95 | "source": [
|
115 |
| - "# set llm to be using Llama2 hosted on Replicate\n", |
116 |
| - "llama2_13b_chat = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\"\n", |
| 96 | + "from tavily import TavilyClient\n", |
117 | 97 | "\n",
|
118 |
| - "llm = Replicate(\n", |
119 |
| - " model=llama2_13b_chat,\n", |
120 |
| - " model_kwargs={\"temperature\": 0.01, \"top_p\": 1, \"max_new_tokens\":500}\n", |
121 |
| - ")" |
| 98 | + "TAVILY_API_KEY = getpass()\n", |
| 99 | + "tavily = TavilyClient(api_key=TAVILY_API_KEY)" |
122 | 100 | ]
|
123 | 101 | },
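If the default results turn out too shallow, `tavily-python`'s `search` also accepts optional parameters; the following is a hedged sketch (the parameter choices are illustrative, not from the original notebook):

```python
# Optional (sketch): a deeper Tavily search with a capped result count.
# "advanced" search_depth trades extra latency for more thorough results.
deeper = tavily.search(
    query="Llama 3 fine-tuning",
    search_depth="advanced",
    max_results=5,
)
```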
|
124 | 102 | {
|
125 | 103 | "cell_type": "markdown",
|
126 | 104 | "id": "476d72da",
|
127 | 105 | "metadata": {},
|
128 | 106 | "source": [
|
129 |
| - "Using our api key we set up earlier, we make a request from YOU.com for live data on a particular topic." |
| 107 | + "Do a live web search on \"Llama 3 fine-tuning\"." |
130 | 108 | ]
|
131 | 109 | },
|
132 | 110 | {
|
|
136 | 114 | "metadata": {},
|
137 | 115 | "outputs": [],
|
138 | 116 | "source": [
|
139 |
| - "\n", |
140 |
| - "import requests\n", |
141 |
| - "\n", |
142 |
| - "query = \"Meta Connect\" # you can try other live data query about sports score, stock market and weather info \n", |
143 |
| - "headers = {\"X-API-Key\": os.environ[\"YOUCOM_API_KEY\"]}\n", |
144 |
| - "data = requests.get(\n", |
145 |
| - " f\"https://api.ydc-index.io/search?query={query}\",\n", |
146 |
| - " headers=headers,\n", |
147 |
| - ").json()" |
| 117 | + "response = tavily.search(query=\"Llama 3 fine-tuning\")\n", |
| 118 | + "context = [{\"url\": obj[\"url\"], \"content\": obj[\"content\"]} for obj in response['results']]" |
148 | 119 | ]
|
149 | 120 | },
|
150 | 121 | {
|
|
154 | 125 | "metadata": {},
|
155 | 126 | "outputs": [],
|
156 | 127 | "source": [
|
157 |
| - "# check the query result in JSON\n", |
158 |
| - "import json\n", |
159 |
| - "\n", |
160 |
| - "print(json.dumps(data, indent=2))" |
161 |
| - ] |
162 |
| - }, |
163 |
| - { |
164 |
| - "cell_type": "markdown", |
165 |
| - "id": "b196e697", |
166 |
| - "metadata": {}, |
167 |
| - "source": [ |
168 |
| - "We then use the [`JSONLoader`](https://llamahub.ai/l/file-json) to extract the text from the returned data. The `JSONLoader` gives us the ability to load the data into LamaIndex.\n", |
169 |
| - "In the next cell we show how to load the JSON result with key info stored as \"snippets\".\n", |
170 |
| - "\n", |
171 |
| - "However, you can also add the snippets in the query result to documents like below:\n", |
172 |
| - "```python \n", |
173 |
| - "from llama_index import Document\n", |
174 |
| - "snippets = [snippet for hit in data[\"hits\"] for snippet in hit[\"snippets\"]]\n", |
175 |
| - "documents = [Document(text=s) for s in snippets]\n", |
176 |
| - "```\n", |
177 |
| - "This can be handy if you just need to add a list of text strings to doc" |
178 |
| - ] |
179 |
| - }, |
180 |
| - { |
181 |
| - "cell_type": "code", |
182 |
| - "execution_count": null, |
183 |
| - "id": "7c40e73f-ca13-4f4a-a753-e613df3d389e", |
184 |
| - "metadata": {}, |
185 |
| - "outputs": [], |
186 |
| - "source": [ |
187 |
| - "# one way to load the JSON result with key info stored as \"snippets\"\n", |
188 |
| - "from llama_index import download_loader\n", |
189 |
| - "\n", |
190 |
| - "JsonDataReader = download_loader(\"JsonDataReader\")\n", |
191 |
| - "loader = JsonDataReader()\n", |
192 |
| - "documents = loader.load_data([hit[\"snippets\"] for hit in data[\"hits\"]])\n" |
| 128 | + "context" |
193 | 129 | ]
|
194 | 130 | },
|
195 | 131 | {
|
196 | 132 | "cell_type": "markdown",
|
197 | 133 | "id": "8e5e3b4e",
|
198 | 134 | "metadata": {},
|
199 | 135 | "source": [
|
200 |
| - "With the data set up, we create a vector store for the data and a query engine for it.\n", |
201 |
| - "\n", |
202 |
| - "For our embeddings we will use `HuggingFaceEmbeddings` whose default embedding model is sentence-transformers/all-mpnet-base-v2. This model provides a good balance between speed and performance.\n", |
203 |
| - "To change the default model, call `HuggingFaceEmbeddings(model_name=<another_embedding_model>)`. \n", |
204 |
| - "\n", |
205 |
| - "For more info see https://huggingface.co/blog/mteb. " |
| 136 | + "Create documents based on the search results, index and save them to a vector store, then create a query engine." |
206 | 137 | ]
|
207 | 138 | },
|
208 | 139 | {
|
|
212 | 143 | "metadata": {},
|
213 | 144 | "outputs": [],
|
214 | 145 | "source": [
|
215 |
| - "# use HuggingFace embeddings \n", |
216 |
| - "from langchain.embeddings.huggingface import HuggingFaceEmbeddings\n", |
217 |
| - "from llama_index import LangchainEmbedding\n", |
| 146 | + "from llama_index.core import Document\n", |
218 | 147 | "\n",
|
| 148 | + "documents = [Document(text=ct['content']) for ct in context]\n", |
| 149 | + "index = VectorStoreIndex.from_documents(documents)\n", |
219 | 150 | "\n",
|
220 |
| - "embeddings = LangchainEmbedding(HuggingFaceEmbeddings())\n", |
221 |
| - "print(embeddings)\n", |
222 |
| - "\n", |
223 |
| - "# create a ServiceContext instance to use Llama2 and custom embeddings\n", |
224 |
| - "service_context = ServiceContext.from_defaults(llm=llm, chunk_size=800, chunk_overlap=20, embed_model=embeddings)\n", |
225 |
| - "\n", |
226 |
| - "# create vector store index from the documents created above\n", |
227 |
| - "index = VectorStoreIndex.from_documents(documents, service_context=service_context)\n", |
228 |
| - "\n", |
229 |
| - "# create query engine from the index\n", |
230 | 151 | "query_engine = index.as_query_engine(streaming=True)"
|
231 | 152 | ]
|
232 | 153 | },
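Note that the removed `ServiceContext` configured chunking explicitly (`chunk_size=800`, `chunk_overlap=20`). If you want the same control with the new global `Settings` API, a minimal sketch — run before `VectorStoreIndex.from_documents` — would be:

```python
# Optional (sketch): reproduce the old ServiceContext chunking values
# with the global Settings API; Settings was imported earlier.
Settings.chunk_size = 800
Settings.chunk_overlap = 20
```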
|
|
235 | 156 | "id": "2c4ea012",
|
236 | 157 | "metadata": {},
|
237 | 158 | "source": [
|
238 |
| - "We are now ready to ask Llama 2 a question about the live data using our query engine." |
| 159 | + "You are now ready to ask Llama 3 questions about the live data using the query engine." |
239 | 160 | ]
|
240 | 161 | },
|
241 | 162 | {
|
|
245 | 166 | "metadata": {},
|
246 | 167 | "outputs": [],
|
247 | 168 | "source": [
|
248 |
| - "# ask Llama2 a summary question about the search result\n", |
249 | 169 | "response = query_engine.query(\"give me a summary\")\n",
|
250 | 170 | "response.print_response_stream()"
|
251 | 171 | ]
|
|
257 | 177 | "metadata": {},
|
258 | 178 | "outputs": [],
|
259 | 179 | "source": [
|
260 |
| - "# more questions\n", |
261 |
| - "query_engine.query(\"what products were announced\").print_response_stream()" |
| 180 | + "query_engine.query(\"what's the latest about Llama 3 fine-tuning?\").print_response_stream()" |
262 | 181 | ]
|
263 | 182 | },
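To see which retrieved chunks grounded an answer, you can inspect the response's `source_nodes` — a sketch assuming the streaming query engine above, not part of the original notebook:

```python
# Optional (sketch): print the retrieval provenance behind an answer.
response = query_engine.query("give me a summary")
response.print_response_stream()
for node_with_score in response.source_nodes:
    # each entry pairs a retrieved chunk with its similarity score
    print(node_with_score.score, node_with_score.node.get_content()[:100])
```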
|
264 | 183 | {
|
|
268 | 187 | "metadata": {},
|
269 | 188 | "outputs": [],
|
270 | 189 | "source": [
|
271 |
| - "query_engine.query(\"tell me more about Meta AI assistant\").print_response_stream()" |
272 |
| - ] |
273 |
| - }, |
274 |
| - { |
275 |
| - "cell_type": "code", |
276 |
| - "execution_count": null, |
277 |
| - "id": "16a56542", |
278 |
| - "metadata": {}, |
279 |
| - "outputs": [], |
280 |
| - "source": [ |
281 |
| - "query_engine.query(\"what are Generative AI stickers\").print_response_stream()" |
| 190 | + "query_engine.query(\"tell me more about Llama 3 fine-tuning\").print_response_stream()" |
282 | 191 | ]
|
283 | 192 | }
|
284 | 193 | ],
|
|
298 | 207 | "name": "python",
|
299 | 208 | "nbconvert_exporter": "python",
|
300 | 209 | "pygments_lexer": "ipython3",
|
301 |
| - "version": "3.8.18" |
| 210 | + "version": "3.11.9" |
302 | 211 | }
|
303 | 212 | },
|
304 | 213 | "nbformat": 4,
|
|