|
7 | 7 | "source": [
|
8 | 8 | "## This demo app shows:\n",
|
9 | 9 | "* How to use LlamaIndex, an open source library to help you build custom data augmented LLM applications\n",
|
10 |
| - "* How to ask Llama questions about recent live data via the You.com live search API and LlamaIndex\n", |
11 |
| - "\n", |
12 |
| - "The LangChain package is used to facilitate the call to Llama2 hosted on Replicate\n", |
13 |
| - "\n", |
14 |
| - "**Note** We will be using Replicate to run the examples here. You will need to first sign in with Replicate with your github account, then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. \n", |
15 |
| - "After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on Replicate." |
16 |
| - ] |
17 |
| - }, |
18 |
| - { |
19 |
| - "cell_type": "markdown", |
20 |
| - "id": "68cf076e", |
21 |
| - "metadata": {}, |
22 |
| - "source": [ |
23 |
| - "We start by installing the necessary packages:\n", |
24 |
| - "- [langchain](https://python.langchain.com/docs/get_started/introduction) which provides RAG capabilities\n", |
25 |
| - "- [llama-index](https://docs.llamaindex.ai/en/stable/) for data augmentation." |
| 10 | + "* How to ask Llama 3 questions about recent live data via the [Trvily](https://tavily.com) live search API" |
26 | 11 | ]
|
27 | 12 | },
|
28 | 13 | {
|
|
32 | 17 | "metadata": {},
|
33 | 18 | "outputs": [],
|
34 | 19 | "source": [
|
35 |
| - "!pip install llama-index langchain" |
36 |
| - ] |
37 |
| - }, |
38 |
| - { |
39 |
| - "cell_type": "code", |
40 |
| - "execution_count": null, |
41 |
| - "id": "21fe3849", |
42 |
| - "metadata": {}, |
43 |
| - "outputs": [], |
44 |
| - "source": [ |
45 |
| - "# use ServiceContext to configure the LLM used and the custom embeddings \n", |
46 |
| - "from llama_index import ServiceContext\n", |
47 |
| - "\n", |
48 |
| - "# VectorStoreIndex is used to index custom data \n", |
49 |
| - "from llama_index import VectorStoreIndex\n", |
50 |
| - "\n", |
51 |
| - "from langchain.llms import Replicate" |
| 20 | + "!pip install llama-index \n", |
| 21 | + "!pip install llama-index-core\n", |
| 22 | + "!pip install llama-index-llms-replicate\n", |
| 23 | + "!pip install llama-index-embeddings-huggingface\n", |
| 24 | + "!pip install tavily-python" |
52 | 25 | ]
|
53 | 26 | },
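A quick optional sanity check — a sketch that is not part of the original notebook — confirms the freshly installed packages import cleanly before moving on:

```python
# Optional sanity check (not in the original notebook):
# verify the freshly installed packages import cleanly.
import llama_index.core
import tavily

print("llama-index-core version:", llama_index.core.__version__)
```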
|
54 | 27 | {
|
55 | 28 | "cell_type": "markdown",
|
56 |
| - "id": "73e8e661", |
| 29 | + "id": "83639e83-2baa-4156-93a2-b9b6d4baf7d6", |
57 | 30 | "metadata": {},
|
58 | 31 | "source": [
|
59 |
| - "Next we set up the Replicate token." |
| 32 | + "You will be using [Replicate](https://replicate.com/meta/meta-llama-3-8b-instruct) to run the examples here. You will need to first sign in with Replicate with your github account, then create a free API token [here](https://replicate.com/account/api-tokens) that you can use for a while. You can also use other Llama 3 cloud providers such as [Groq](https://console.groq.com/), [Together](https://api.together.xyz/playground/language/meta-llama/Llama-3-8b-hf), or [Anyscale](https://app.endpoints.anyscale.com/playground) - see Section 2 of the Getting to Know Llama [notebook](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb) for more information.\n", |
| 33 | + "\n", |
| 34 | + "If you'd like to run Llama 3 locally for the benefits of privacy, no cost or no rate limit (some Llama 3 hosting providers set limits for free plan of queries or tokens per second or minute), see [Running Llama Locally](https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_Mac_Windows_Linux.ipynb)." |
60 | 35 | ]
|
61 | 36 | },
|
62 | 37 | {
|
63 | 38 | "cell_type": "code",
|
64 | 39 | "execution_count": null,
|
65 |
| - "id": "d9d76e33", |
| 40 | + "id": "e6affd70-c909-4340-924f-f282912765d5", |
66 | 41 | "metadata": {},
|
67 | 42 | "outputs": [],
|
68 | 43 | "source": [
|
|
75 | 50 | },
|
76 | 51 | {
|
77 | 52 | "cell_type": "markdown",
|
78 |
| - "id": "f8ff812b", |
| 53 | + "id": "18582e1f-30b1-4dc5-918a-de2995eb5b46", |
79 | 54 | "metadata": {},
|
80 | 55 | "source": [
|
81 |
| - "In this example we will use the [YOU.com](https://you.com/) search engine to augment the LLM's responses.\n", |
82 |
| - "To use the You.com Search API, you can email [email protected] to request an API key. " |
| 56 | + "You'll set up the Llama 3 8b chat model from Replicate. You can also use Llama 3 70b model by replacing the `model` name with \"meta/meta-llama-3-70b-instruct\"." |
83 | 57 | ]
|
84 | 58 | },
|
85 | 59 | {
|
86 | 60 | "cell_type": "code",
|
87 | 61 | "execution_count": null,
|
88 |
| - "id": "75275628-5235-4b55-8033-601c76107528", |
| 62 | + "id": "21fe3849", |
89 | 63 | "metadata": {},
|
90 | 64 | "outputs": [],
|
91 | 65 | "source": [
|
| 66 | + "from llama_index.core import Settings, VectorStoreIndex\n", |
| 67 | + "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n", |
| 68 | + "from llama_index.llms.replicate import Replicate\n", |
| 69 | + "\n", |
| 70 | + "Settings.llm = Replicate(\n", |
| 71 | + " model=\"meta/meta-llama-3-8b-instruct\",\n", |
| 72 | + " temperature=0.0,\n", |
| 73 | + " additional_kwargs={\"top_p\": 1, \"max_new_tokens\": 500},\n", |
| 74 | + ")\n", |
92 | 75 | "\n",
|
93 |
| - "YOUCOM_API_KEY = getpass()\n", |
94 |
| - "os.environ[\"YOUCOM_API_KEY\"] = YOUCOM_API_KEY" |
| 76 | + "Settings.embed_model = HuggingFaceEmbedding(\n", |
| 77 | + " model_name=\"BAAI/bge-small-en-v1.5\"\n", |
| 78 | + ")" |
95 | 79 | ]
|
96 | 80 | },
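Before building any index, a minimal smoke test — a sketch, not part of the original notebook, assuming `REPLICATE_API_TOKEN` was set in the earlier cell — confirms the Replicate-hosted model responds:

```python
# Minimal smoke test of the configured LLM (sketch; assumes
# REPLICATE_API_TOKEN was already set via getpass above).
completion = Settings.llm.complete("In one sentence, what is retrieval augmented generation?")
print(completion.text)
```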
|
97 | 81 | {
|
98 | 82 | "cell_type": "markdown",
|
99 |
| - "id": "cb210c7c", |
| 83 | + "id": "f8ff812b", |
100 | 84 | "metadata": {},
|
101 | 85 | "source": [
|
102 |
| - "We then call the Llama 2 model from replicate. \n", |
103 |
| - "\n", |
104 |
| - "We will use the llama 2 13b chat model. You can find more Llama 2 models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).\n", |
105 |
| - "You can add them here in the format: model_name/version" |
| 86 | + "Next you will use the [Trvily](https://tavily.com/) search engine to augment the Llama 3's responses. To create a free trial Trvily Search API, sign in with your Google or Github account [here](https://app.tavily.com/sign-in)." |
106 | 87 | ]
|
107 | 88 | },
|
108 | 89 | {
|
109 | 90 | "cell_type": "code",
|
110 | 91 | "execution_count": null,
|
111 |
| - "id": "c12fc2cb", |
| 92 | + "id": "75275628-5235-4b55-8033-601c76107528", |
112 | 93 | "metadata": {},
|
113 | 94 | "outputs": [],
|
114 | 95 | "source": [
|
115 |
| - "# set llm to be using Llama2 hosted on Replicate\n", |
116 |
| - "llama2_13b_chat = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\"\n", |
| 96 | + "from tavily import TavilyClient\n", |
117 | 97 | "\n",
|
118 |
| - "llm = Replicate(\n", |
119 |
| - " model=llama2_13b_chat,\n", |
120 |
| - " model_kwargs={\"temperature\": 0.01, \"top_p\": 1, \"max_new_tokens\":500}\n", |
121 |
| - ")" |
| 98 | + "TAVILY_API_KEY = getpass()\n", |
| 99 | + "tavily = TavilyClient(api_key=TAVILY_API_KEY)" |
122 | 100 | ]
|
123 | 101 | },
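If the default results turn out too shallow, `tavily-python`'s `search` also accepts optional parameters; the following is a hedged sketch (the parameter choices are illustrative, not from the original notebook):

```python
# Optional (sketch): a deeper Tavily search with a capped result count.
# "advanced" search_depth trades extra latency for more thorough results.
deeper = tavily.search(
    query="Llama 3 fine-tuning",
    search_depth="advanced",
    max_results=5,
)
```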
|
124 | 102 | {
|
125 | 103 | "cell_type": "markdown",
|
126 | 104 | "id": "476d72da",
|
127 | 105 | "metadata": {},
|
128 | 106 | "source": [
|
129 |
| - "Using our api key we set up earlier, we make a request from YOU.com for live data on a particular topic." |
| 107 | + "Do a live web search on \"Llama 3 fine-tuning\"." |
130 | 108 | ]
|
131 | 109 | },
|
132 | 110 | {
|
|
136 | 114 | "metadata": {},
|
137 | 115 | "outputs": [],
|
138 | 116 | "source": [
|
139 |
| - "\n", |
140 |
| - "import requests\n", |
141 |
| - "\n", |
142 |
| - "query = \"Meta Connect\" # you can try other live data query about sports score, stock market and weather info \n", |
143 |
| - "headers = {\"X-API-Key\": os.environ[\"YOUCOM_API_KEY\"]}\n", |
144 |
| - "data = requests.get(\n", |
145 |
| - " f\"https://api.ydc-index.io/search?query={query}\",\n", |
146 |
| - " headers=headers,\n", |
147 |
| - ").json()" |
| 117 | + "response = tavily.search(query=\"Llama 3 fine-tuning\")\n", |
| 118 | + "context = [{\"url\": obj[\"url\"], \"content\": obj[\"content\"]} for obj in response['results']]" |
148 | 119 | ]
|
149 | 120 | },
|
150 | 121 | {
|
|
154 | 125 | "metadata": {},
|
155 | 126 | "outputs": [],
|
156 | 127 | "source": [
|
157 |
| - "# check the query result in JSON\n", |
158 |
| - "import json\n", |
159 |
| - "\n", |
160 |
| - "print(json.dumps(data, indent=2))" |
161 |
| - ] |
162 |
| - }, |
163 |
| - { |
164 |
| - "cell_type": "markdown", |
165 |
| - "id": "b196e697", |
166 |
| - "metadata": {}, |
167 |
| - "source": [ |
168 |
| - "We then use the [`JSONLoader`](https://llamahub.ai/l/file-json) to extract the text from the returned data. The `JSONLoader` gives us the ability to load the data into LamaIndex.\n", |
169 |
| - "In the next cell we show how to load the JSON result with key info stored as \"snippets\".\n", |
170 |
| - "\n", |
171 |
| - "However, you can also add the snippets in the query result to documents like below:\n", |
172 |
| - "```python \n", |
173 |
| - "from llama_index import Document\n", |
174 |
| - "snippets = [snippet for hit in data[\"hits\"] for snippet in hit[\"snippets\"]]\n", |
175 |
| - "documents = [Document(text=s) for s in snippets]\n", |
176 |
| - "```\n", |
177 |
| - "This can be handy if you just need to add a list of text strings to doc" |
178 |
| - ] |
179 |
| - }, |
180 |
| - { |
181 |
| - "cell_type": "code", |
182 |
| - "execution_count": null, |
183 |
| - "id": "7c40e73f-ca13-4f4a-a753-e613df3d389e", |
184 |
| - "metadata": {}, |
185 |
| - "outputs": [], |
186 |
| - "source": [ |
187 |
| - "# one way to load the JSON result with key info stored as \"snippets\"\n", |
188 |
| - "from llama_index import download_loader\n", |
189 |
| - "\n", |
190 |
| - "JsonDataReader = download_loader(\"JsonDataReader\")\n", |
191 |
| - "loader = JsonDataReader()\n", |
192 |
| - "documents = loader.load_data([hit[\"snippets\"] for hit in data[\"hits\"]])\n" |
| 128 | + "context" |
193 | 129 | ]
|
194 | 130 | },
|
195 | 131 | {
|
196 | 132 | "cell_type": "markdown",
|
197 | 133 | "id": "8e5e3b4e",
|
198 | 134 | "metadata": {},
|
199 | 135 | "source": [
|
200 |
| - "With the data set up, we create a vector store for the data and a query engine for it.\n", |
201 |
| - "\n", |
202 |
| - "For our embeddings we will use `HuggingFaceEmbeddings` whose default embedding model is sentence-transformers/all-mpnet-base-v2. This model provides a good balance between speed and performance.\n", |
203 |
| - "To change the default model, call `HuggingFaceEmbeddings(model_name=<another_embedding_model>)`. \n", |
204 |
| - "\n", |
205 |
| - "For more info see https://huggingface.co/blog/mteb. " |
| 136 | + "Create documents based on the search results, index and save them to a vector store, then create a query engine." |
206 | 137 | ]
|
207 | 138 | },
|
208 | 139 | {
|
|
212 | 143 | "metadata": {},
|
213 | 144 | "outputs": [],
|
214 | 145 | "source": [
|
215 |
| - "# use HuggingFace embeddings \n", |
216 |
| - "from langchain.embeddings.huggingface import HuggingFaceEmbeddings\n", |
217 |
| - "from llama_index import LangchainEmbedding\n", |
| 146 | + "from llama_index.core import Document\n", |
218 | 147 | "\n",
|
| 148 | + "documents = [Document(text=ct['content']) for ct in context]\n", |
| 149 | + "index = VectorStoreIndex.from_documents(documents)\n", |
219 | 150 | "\n",
|
220 |
| - "embeddings = LangchainEmbedding(HuggingFaceEmbeddings())\n", |
221 |
| - "print(embeddings)\n", |
222 |
| - "\n", |
223 |
| - "# create a ServiceContext instance to use Llama2 and custom embeddings\n", |
224 |
| - "service_context = ServiceContext.from_defaults(llm=llm, chunk_size=800, chunk_overlap=20, embed_model=embeddings)\n", |
225 |
| - "\n", |
226 |
| - "# create vector store index from the documents created above\n", |
227 |
| - "index = VectorStoreIndex.from_documents(documents, service_context=service_context)\n", |
228 |
| - "\n", |
229 |
| - "# create query engine from the index\n", |
230 | 151 | "query_engine = index.as_query_engine(streaming=True)"
|
231 | 152 | ]
|
232 | 153 | },
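Note that the removed `ServiceContext` configured chunking explicitly (`chunk_size=800`, `chunk_overlap=20`). If you want the same control with the new global `Settings` API, a minimal sketch — run before `VectorStoreIndex.from_documents` — would be:

```python
# Optional (sketch): reproduce the old ServiceContext chunking values
# with the global Settings API; Settings was imported earlier.
Settings.chunk_size = 800
Settings.chunk_overlap = 20
```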
|
|
235 | 156 | "id": "2c4ea012",
|
236 | 157 | "metadata": {},
|
237 | 158 | "source": [
|
238 |
| - "We are now ready to ask Llama 2 a question about the live data using our query engine." |
| 159 | + "You are now ready to ask Llama 3 questions about the live data using the query engine." |
239 | 160 | ]
|
240 | 161 | },
|
241 | 162 | {
|
|
245 | 166 | "metadata": {},
|
246 | 167 | "outputs": [],
|
247 | 168 | "source": [
|
248 |
| - "# ask Llama2 a summary question about the search result\n", |
249 | 169 | "response = query_engine.query(\"give me a summary\")\n",
|
250 | 170 | "response.print_response_stream()"
|
251 | 171 | ]
|
|
257 | 177 | "metadata": {},
|
258 | 178 | "outputs": [],
|
259 | 179 | "source": [
|
260 |
| - "# more questions\n", |
261 |
| - "query_engine.query(\"what products were announced\").print_response_stream()" |
| 180 | + "query_engine.query(\"what's the latest about Llama 3 fine-tuning?\").print_response_stream()" |
262 | 181 | ]
|
263 | 182 | },
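To see which retrieved chunks grounded an answer, you can inspect the response's `source_nodes` — a sketch assuming the streaming query engine above, not part of the original notebook:

```python
# Optional (sketch): print the retrieval provenance behind an answer.
response = query_engine.query("give me a summary")
response.print_response_stream()
for node_with_score in response.source_nodes:
    # each entry pairs a retrieved chunk with its similarity score
    print(node_with_score.score, node_with_score.node.get_content()[:100])
```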
|
264 | 183 | {
|
|
268 | 187 | "metadata": {},
|
269 | 188 | "outputs": [],
|
270 | 189 | "source": [
|
271 |
| - "query_engine.query(\"tell me more about Meta AI assistant\").print_response_stream()" |
272 |
| - ] |
273 |
| - }, |
274 |
| - { |
275 |
| - "cell_type": "code", |
276 |
| - "execution_count": null, |
277 |
| - "id": "16a56542", |
278 |
| - "metadata": {}, |
279 |
| - "outputs": [], |
280 |
| - "source": [ |
281 |
| - "query_engine.query(\"what are Generative AI stickers\").print_response_stream()" |
| 190 | + "query_engine.query(\"tell me more about Llama 3 fine-tuning\").print_response_stream()" |
282 | 191 | ]
|
283 | 192 | }
|
284 | 193 | ],
|
|
298 | 207 | "name": "python",
|
299 | 208 | "nbconvert_exporter": "python",
|
300 | 209 | "pygments_lexer": "ipython3",
|
301 |
| - "version": "3.8.18" |
| 210 | + "version": "3.11.9" |
302 | 211 | }
|
303 | 212 | },
|
304 | 213 | "nbformat": 4,
|
|