|
220 | 220 | }, |
221 | 221 | { |
222 | 222 | "cell_type": "code", |
223 | | - "execution_count": null, |
| 223 | + "execution_count": 2, |
224 | 224 | "metadata": {}, |
225 | | - "outputs": [], |
| 225 | + "outputs": [ |
| 226 | + { |
| 227 | + "name": "stdout", |
| 228 | + "output_type": "stream", |
| 229 | + "text": [ |
| 230 | + "py-rag-tutorial-idx created\n" |
| 231 | + ] |
| 232 | + } |
| 233 | + ], |
226 | 234 | "source": [ |
227 | 235 | "from azure.identity import DefaultAzureCredential\n", |
228 | 236 | "from azure.identity import get_bearer_token_provider\n", |
|
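The body of this cell is cut off by the diff, but its output shows that it creates the py-rag-tutorial-idx index. As a rough sketch only (the endpoint and field names here are illustrative, not the notebook's actual definition, which also includes vector, locations, and chunk-related fields plus a vectorizer), index creation with keyless Microsoft Entra ID authentication looks like this:

```python
# Minimal sketch of index creation with keyless (Microsoft Entra ID) auth.
# The endpoint value and field names are illustrative placeholders; the
# notebook's real schema also defines vector and locations fields and a vectorizer.
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchableField,
    SimpleField,
    SearchFieldDataType,
)

AZURE_SEARCH_SERVICE = "https://<your-search-service>.search.windows.net"  # placeholder
credential = DefaultAzureCredential()
index_client = SearchIndexClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)

index = SearchIndex(
    name="py-rag-tutorial-idx",
    fields=[
        SimpleField(name="chunk_id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="title", type=SearchFieldDataType.String),
        SearchableField(name="chunk", type=SearchFieldDataType.String),
    ],
)

result = index_client.create_or_update_index(index)
print(f"{result.name} created")
```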
294 | 302 | }, |
295 | 303 | { |
296 | 304 | "cell_type": "code", |
297 | | - "execution_count": 4, |
| 305 | + "execution_count": 3, |
298 | 306 | "metadata": {}, |
299 | 307 | "outputs": [ |
300 | 308 | { |
|
337 | 345 | }, |
338 | 346 | { |
339 | 347 | "cell_type": "code", |
340 | | - "execution_count": 6, |
| 348 | + "execution_count": 4, |
341 | 349 | "metadata": {}, |
342 | 350 | "outputs": [ |
343 | 351 | { |
|
455 | 463 | }, |
456 | 464 | { |
457 | 465 | "cell_type": "code", |
458 | | - "execution_count": 7, |
| 466 | + "execution_count": 5, |
459 | 467 | "metadata": {}, |
460 | 468 | "outputs": [ |
461 | 469 | { |
|
503 | 511 | }, |
504 | 512 | { |
505 | 513 | "cell_type": "code", |
506 | | - "execution_count": 34, |
| 514 | + "execution_count": 6, |
507 | 515 | "metadata": {}, |
508 | 516 | "outputs": [ |
509 | 517 | { |
|
531 | 539 | "from azure.search.documents import SearchClient\n", |
532 | 540 | "from azure.search.documents.models import VectorizableTextQuery\n", |
533 | 541 | "\n", |
534 | | - "# Vector Search using text-to-vector conversion of the querystring\n", |
| 542 | + "# Vector Search using text-to-vector conversion of the query string\n", |
535 | 543 | "query = \"what's NASA's website?\" \n", |
536 | 544 | "\n", |
537 | 545 | "search_client = SearchClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential, index_name=index_name)\n", |
|
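The rest of this query cell is also truncated. Based on the search calls that appear later in the notebook, a sketch of the complete hybrid vector query looks roughly like the following; the k_nearest_neighbors value, select list, and result formatting in the original cell may differ, and the endpoint, credential, and index_name variables come from the earlier setup cells.

```python
# Sketch of the hybrid query pattern used throughout the notebook; exact
# parameter values in the truncated cell may differ.
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

query = "what's NASA's website?"

# The search service vectorizes the query text using the index's configured vectorizer.
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE, credential=credential, index_name=index_name
)

results = search_client.search(
    search_text=query,              # keyword component of the hybrid query
    vector_queries=[vector_query],  # vector component
    select=["title", "chunk", "locations"],
    top=5,
)

for result in results:
    print(f"{result['title']}: {result['chunk'][:120]}")
```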
555 | 563 | "source": [ |
556 | 564 | "## Search using a chat model\n", |
557 | 565 | "\n", |
558 | | - "This script sends a query, the query response, and a prompt to an LLM for chat completion. This time, the response is created using generative AI.\n", |
| 566 | + "This script sends a query, the query response, and a prompt to an LLM for chat completion. This time, the response is created using generative AI. We broke this step into three tasks: set up the clients, set up the search query, and call the LLM to get the response. We also give it a more interesting query.\n", |
559 | 567 | "\n", |
560 | | - "We broke this task out into three separate tasks: set up the clients, set up the search query, call the LLM and get the response. For more information about this step, its behaviors, and changing the settings, revisit [Search and generate answers](https://learn.microsoft.com/azure/search/tutorial-rag-build-solution-query) in the tutorial." |
| 568 | + "To learn more about this step, revisit [Search and generate answers](https://learn.microsoft.com/azure/search/tutorial-rag-build-solution-query) in the tutorial." |
561 | 569 | ] |
562 | 570 | }, |
563 | 571 | { |
564 | 572 | "cell_type": "code", |
565 | | - "execution_count": 43, |
| 573 | + "execution_count": 7, |
566 | 574 | "metadata": {}, |
567 | 575 | "outputs": [], |
568 | 576 | "source": [ |
|
603 | 611 | }, |
604 | 612 | { |
605 | 613 | "cell_type": "code", |
606 | | - "execution_count": 44, |
| 614 | + "execution_count": 8, |
607 | 615 | "metadata": {}, |
608 | 616 | "outputs": [], |
609 | 617 | "source": [ |
|
631 | 639 | }, |
632 | 640 | { |
633 | 641 | "cell_type": "code", |
634 | | - "execution_count": 45, |
| 642 | + "execution_count": 9, |
635 | 643 | "metadata": {}, |
636 | 644 | "outputs": [ |
637 | 645 | { |
638 | 646 | "name": "stdout", |
639 | 647 | "output_type": "stream", |
640 | 648 | "text": [ |
641 | | - "The NASA Earth book stands at the intersection of science and art, using NASA's unique vantage point and tools to study Earth’s physical processes from beneath the crust to the edge of the atmosphere. It presents Earth as a dynamic system, examining cycles and processes such as the water cycle, carbon cycle, ocean circulation, and the movement of heat. The book uses images to tell the story of Earth's land, wind, water, ice, and air as seen from above, showcasing the planet’s diverse colors, textures, and shapes.\n", |
642 | | - "\n", |
643 | | - "- It aims to inspire by presenting a 4.5-billion-year-old planet through striking images.\n", |
644 | | - "- The book highlights how light is observed and studied, reflecting NASA’s scientific pursuits and artistic sensibilities.\n", |
645 | | - "- It emphasizes the awe-inspiring beauty of Earth, which NASA captures from space.\n", |
| 649 | + "The NASA Earth book is about the intricate and captivating science of our planet, studied through NASA's unique perspective and tools. It presents Earth as a dynamic and complex system, observed through various cycles and processes such as the water cycle and ocean circulation. The book combines stunning satellite images with detailed scientific insights, portraying Earth’s beauty and the continuous interaction of land, wind, water, ice, and air seen from above. It aims to inspire and demonstrate that the truth of our planet is as compelling as any fiction.\n", |
646 | 650 | "\n", |
647 | | - "(Source: page-8.pdf)\n" |
| 651 | + "Source: page-8.pdf\n" |
648 | 652 | ] |
649 | 653 | } |
650 | 654 | ], |
|
666 | 670 | "cell_type": "markdown", |
667 | 671 | "metadata": {}, |
668 | 672 | "source": [ |
| 673 | + "## Try another query\n", |
| 674 | + "\n", |
| 675 | + "The first query is very broad. Let's ask another question that requires the search engine and the LLM to find more granular information." |
| 676 | + ] |
| 677 | + }, |
| 678 | + { |
| 679 | + "cell_type": "code", |
| 680 | + "execution_count": 10, |
| 681 | + "metadata": {}, |
| 682 | + "outputs": [ |
| 683 | + { |
| 684 | + "name": "stdout", |
| 685 | + "output_type": "stream", |
| 686 | + "text": [ |
| 687 | + "Yes, there are cloud formations specific to oceans and large bodies of water. These include:\n", |
| 688 | + "\n", |
| 689 | + "- **Cloud Streets**: Formed when wind blows from a cold surface like sea ice over the warmer, moister air near the open ocean. The winds create cylinders of spinning air, with clouds forming along the upward cycle of these cylinders. This phenomenon was observed over the Bering Strait in January 2010 (Source: page-21.pdf).\n", |
| 690 | + "- **Dense Marine Clouds**: Commonly form over the ocean due to cooler, moist marine air. For example, along the coast of China, onshore winds carry these clouds toward the land, but they tend to evaporate as they move onshore due to the warmer, drier landmass (Source: page-33.pdf).\n", |
| 691 | + "\n", |
| 692 | + "Summary: Specific cloud formations, such as cloud streets and dense marine clouds, occur over oceans and large bodies of water.\n" |
| 693 | + ] |
| 694 | + } |
| 695 | + ], |
| 696 | + "source": [ |
| 697 | + "# Focused query on cloud formations and bodies of water\n", |
| 698 | + "query=\"Are there any cloud formations specific to oceans and large bodies of water?\"\n", |
| 699 | + "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields=\"text_vector\")\n", |
| 700 | + "\n", |
| 701 | + "search_results = search_client.search(\n", |
| 702 | + " search_text=query,\n", |
| 703 | + " vector_queries= [vector_query],\n", |
| 704 | + " select=[\"title\", \"chunk\", \"locations\"],\n", |
| 705 | + " top=5,\n", |
| 706 | + ")\n", |
| 707 | + "\n", |
| 708 | + "sources_formatted = \"=================\\n\".join([f'TITLE: {document[\"title\"]}, CONTENT: {document[\"chunk\"]}, LOCATIONS: {document[\"locations\"]}' for document in search_results])\n", |
669 | 709 | "\n", |
| 710 | + "response = openai_client.chat.completions.create(\n", |
| 711 | + " messages=[\n", |
| 712 | + " {\n", |
| 713 | + " \"role\": \"user\",\n", |
| 714 | + " \"content\": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)\n", |
| 715 | + " }\n", |
| 716 | + " ],\n", |
| 717 | + " model=deployment_name\n", |
| 718 | + ")\n", |
| 719 | + "\n", |
| 720 | + "print(response.choices[0].message.content)" |
| 721 | + ] |
| 722 | + }, |
| 723 | + { |
| 724 | + "cell_type": "markdown", |
| 725 | + "metadata": {}, |
| 726 | + "source": [ |
670 | 727 | "## Update the schema for semantic ranking and scoring profile\n", |
671 | 728 | "\n", |
672 | | - "Semantic ranking and scoring profile configurations exist in the index schema. You can update an existing index to use both without incurring a [rebuild requirement](/azure/search/search-howto-reindex).\n", |
| 729 | + "Azure AI Search has multiple features and capabilities that improve relevance. In this step, we add two of them: semantic ranking and scoring profiles. \n", |
673 | 730 | "\n", |
674 | | - "An update request should include all of the existing schema definitions that you want to keep, plus the new or changed elements. It's a best practice to issue a GET INDEX request to retrieve the current index before adding new elements.\n", |
| 731 | + "Semantic ranking and scoring profile configurations exist in the index schema. You can update an existing index to use both without incurring a [rebuild requirement](/azure/search/search-howto-reindex). An update request should include all of the existing schema definitions that you want to keep, plus the new or changed elements. It's a best practice to issue a GET INDEX request to retrieve the current index before adding new elements.\n", |
675 | 732 | "\n", |
676 | | - "For more information about this step, its behaviors, see [Maximimze relevance](https://learn.microsoft.com/azure/search/tutorial-rag-build-maximize-relevance) in the RAG tutorial series." |
| 733 | + "To learn more about this step, see [Maximize relevance](https://learn.microsoft.com/azure/search/tutorial-rag-build-maximize-relevance) in the RAG tutorial series." |
677 | 734 | ] |
678 | 735 | }, |
679 | 736 | { |
680 | 737 | "cell_type": "code", |
681 | | - "execution_count": null, |
| 738 | + "execution_count": 11, |
682 | 739 | "metadata": {}, |
683 | | - "outputs": [], |
| 740 | + "outputs": [ |
| 741 | + { |
| 742 | + "name": "stdout", |
| 743 | + "output_type": "stream", |
| 744 | + "text": [ |
| 745 | + "py-rag-tutorial-idx updated\n" |
| 746 | + ] |
| 747 | + } |
| 748 | + ], |
684 | 749 | "source": [ |
685 | 750 | "# Update the classes to include the new fields\n", |
686 | 751 | "from azure.identity import DefaultAzureCredential\n", |
|
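The remainder of the schema-update cell is truncated. As a hedged sketch of the approach the markdown describes (retrieve the current index, attach a semantic configuration and a scoring profile, then push the update), something like the following works with the azure-search-documents index models. The configuration and profile names are illustrative, and this sketch boosts the locations field with simple text weights, whereas the notebook's actual profile may use a parameterized scoring function instead.

```python
# Sketch of updating an existing index with semantic ranking and a scoring
# profile. Names are illustrative; the endpoint and credential come from setup.
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    ScoringProfile,
    TextWeights,
)

index_client = SearchIndexClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)

# Best practice: GET the current index so the update keeps the existing schema.
index = index_client.get_index("py-rag-tutorial-idx")

# Semantic ranking: tell the ranker which fields carry the title and content.
index.semantic_search = SemanticSearch(
    configurations=[
        SemanticConfiguration(
            name="my-semantic-config",
            prioritized_fields=SemanticPrioritizedFields(
                title_field=SemanticField(field_name="title"),
                content_fields=[SemanticField(field_name="chunk")],
            ),
        )
    ]
)

# Scoring profile: boost matches found in the locations field (illustrative weighting).
index.scoring_profiles = [
    ScoringProfile(
        name="my-scoring-profile",
        text_weights=TextWeights(weights={"locations": 5.0}),
    )
]

result = index_client.create_or_update_index(index)
print(f"{result.name} updated")
```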
783 | 848 | "cell_type": "markdown", |
784 | 849 | "metadata": {}, |
785 | 850 | "source": [ |
786 | | - "## Update query request using semantic configuration and scoring profile\n", |
| 851 | + "## Rerun the query using semantic configuration and scoring profile\n", |
| 852 | + "\n", |
| 853 | + "This example updates the query request. It uses the same query as before, but adds semantic ranking and a scoring profile that boosts any matching search documents that mention water-related terms.\n", |
787 | 854 | "\n", |
788 | | - "This example updates the query request. An exlanation for this script can be found in [Maximimze relevance](https://learn.microsoft.com/azure/search/tutorial-rag-build-maximize-relevance) in the RAG tutorial series." |
| 855 | + "Compared to the \"before\" query that gave us a reasonable response to the question about cloud formations and water, this query should provide a better answer based on the extra relevance tuning capabilities.\n", |
| 856 | + "\n", |
| 857 | + "An explanation for this script can be found in [Maximize relevance](https://learn.microsoft.com/azure/search/tutorial-rag-build-maximize-relevance) in the RAG tutorial series." |
789 | 858 | ] |
790 | 859 | }, |
791 | 860 | { |
792 | 861 | "cell_type": "code", |
793 | | - "execution_count": null, |
| 862 | + "execution_count": 13, |
794 | 863 | "metadata": {}, |
795 | | - "outputs": [], |
| 864 | + "outputs": [ |
| 865 | + { |
| 866 | + "name": "stdout", |
| 867 | + "output_type": "stream", |
| 868 | + "text": [ |
| 869 | + "Yes, there are specific cloud formations associated with oceans and large bodies of water:\n", |
| 870 | + "\n", |
| 871 | + "- **Low Stratus Clouds**: Observed framing a hole over iceberg A-56 in the South Atlantic Ocean. These clouds can be influenced by thermal instabilities created by large obstacles like icebergs (page-39.pdf).\n", |
| 872 | + "- **Undular Bore/Solitary Wave**: Created by the interaction between cool, dry air from Africa and warm, moist air over the Atlantic Ocean off the coast of Mauritania. This results in a wave structure in the atmosphere that influences cloud formation (page-23.pdf).\n", |
| 873 | + "- **Ship Tracks**: Narrow clouds formed by water vapor condensing around pollution particles from ship exhaust, observed over the Pacific Ocean. These clouds can stretch for many hundreds of kilometers (page-31.pdf).\n", |
| 874 | + "- **Volcanic Eruption Plumes**: Ash and volcanic particles from eruptions in the South Sandwich Islands act as seeds for cloud formation. These plumes were observed in the South Atlantic Ocean (page-13.pdf).\n", |
| 875 | + "\n", |
| 876 | + "Summary: Specific cloud formations over oceans include low stratus clouds, undular bores, ship tracks, and volcanic eruption plumes, influenced by various factors such as thermal instabilities, air interactions, pollution, and volcanic activity.\n" |
| 877 | + ] |
| 878 | + } |
| 879 | + ], |
796 | 880 | "source": [ |
797 | 881 | "# Import libraries\n", |
798 | 882 | "from azure.search.documents import SearchClient\n", |
|
819 | 903 | "Answer the query using only the sources provided below.\n", |
820 | 904 | "Use bullets if the answer has multiple points.\n", |
821 | 905 | "If the answer is longer than 3 sentences, provide a summary.\n", |
822 | | - "Answer ONLY with the facts listed in the list of sources below.\n", |
| 906 | + "Answer ONLY with the facts listed in the list of sources below. Cite your source when you answer the question.\n", |
823 | 907 | "If there isn't enough information below, say you don't know.\n", |
824 | 908 | "Do not generate answers that don't use the sources below.\n", |
825 | 909 | "Query: {query}\n", |
826 | 910 | "Sources:\\n{sources}\n", |
827 | 911 | "\"\"\"\n", |
828 | 912 | "\n", |
829 | 913 | "# Queries are unchanged in this update\n", |
830 | | - "query=\"What's the NASA earth book about?\"\n", |
831 | | - "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields=\"text_vector\")\n", |
| 914 | + "query=\"Are there any cloud formations specific to oceans and large bodies of water?\"\n", |
| 915 | + "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields=\"text_vector\")\n", |
832 | 916 | "\n", |
833 | 917 | "# Add query_type semantic and semantic_configuration_name\n", |
834 | 918 | "# Add scoring_profile and scoring_parameters\n", |
|
842 | 926 | " select=\"title, chunk, locations\",\n", |
843 | 927 | " top=5,\n", |
844 | 928 | ")\n", |
845 | | - "sources_formatted = \"\\n\".join([f'{document[\"title\"]}:{document[\"chunk\"]}:{document[\"locations\"]}' for document in search_results])\n", |
| 929 | + "sources_formatted = \"=================\\n\".join([f'TITLE: {document[\"title\"]}, CONTENT: {document[\"chunk\"]}, LOCATIONS: {document[\"locations\"]}' for document in search_results])\n", |
846 | 930 | "\n", |
847 | 931 | "response = openai_client.chat.completions.create(\n", |
848 | 932 | " messages=[\n", |
|
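The rest of this final cell is cut off, but its comments indicate that the search call adds a semantic query type, a semantic configuration name, a scoring profile, and scoring parameters. A sketch of that request shape follows, with illustrative names that must match whatever the updated index actually defines; it builds on the query, search_client, and openai_client variables from the earlier cells.

```python
# Sketch of a semantic, boosted query request. Configuration and profile names
# are illustrative; scoring_parameters applies only when the profile uses a
# parameterized (for example, tag) scoring function, so it's shown commented out.
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="text_vector")

search_results = search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    select=["title", "chunk", "locations"],
    top=5,
    query_type="semantic",
    semantic_configuration_name="my-semantic-config",
    scoring_profile="my-scoring-profile",
    # scoring_parameters=["<parameter-name>-<comma-separated-values>"],
)

sources_formatted = "=================\n".join(
    f'TITLE: {doc["title"]}, CONTENT: {doc["chunk"]}, LOCATIONS: {doc["locations"]}'
    for doc in search_results
)
```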