Commit 67c92d2

Merge pull request #135 from HeidiSteen/main
Update Tutorial-RAG with new index that uses vector storage minimizat…
2 parents 9a460f3 + e55c718 commit 67c92d2

File tree

2 files changed: +172 -1 lines changed

Tutorial-RAG/Tutorial-rag-requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 python-dotenv
 azure-core
-azure-search-documents==11.5.1
+azure-search-documents==11.5.2
 azure-storage-blob
 azure-identity
 openai

Tutorial-RAG/Tutorial-rag.ipynb

Lines changed: 171 additions & 0 deletions
@@ -940,6 +940,177 @@
     "\n",
     "print(response.choices[0].message.content)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Create a second index with reduced vector size\n",
+    "\n",
+    "Azure AI Search has multiple approaches for reducing vector size, which lowers the cost of vector workloads. In this step, create a new index that uses the following capabilities:\n",
+    "\n",
+    "- Smaller vector indexes through scalar quantization, which compresses the vectors used during query execution.\n",
+    "- Smaller vector indexes by opting out of vector storage for search results. If you need vectors only for queries and not in the response payload, you can drop the vector copy used for search results.\n",
+    "- Smaller vector fields through narrow data types. You can specify Collection(Edm.Half) on the text_vector field to store incoming float32 dimensions as float16.\n",
+    "\n",
+    "All of these capabilities are specified in a search index. After you load the index, compare the difference between the original index and the new one.\n"
+   ]
+  },
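(Aside, not part of the commit.) The three options above attack vector size at different points, and the arithmetic is easy to sketch. Assuming the tutorial's 1024-dimension embeddings, a rough per-vector estimate looks like this; a real index also includes HNSW graph data and metadata that this sketch ignores:

```python
# Back-of-the-envelope per-vector storage for a 1024-dimension embedding.
# Illustrative only: actual index size also includes graph and metadata.
DIMS = 1024

bytes_per_vector = {
    "float32 (Edm.Single)": DIMS * 4,     # full-precision baseline
    "float16 (Edm.Half)": DIMS * 2,       # narrow data type
    "int8 (scalar quantized)": DIMS * 1,  # compressed copy used for search
}

for fmt, size in bytes_per_vector.items():
    print(f"{fmt}: {size:>5} bytes")
```

Setting stored=False removes yet another full copy of each vector (the one returned in results), which is why the savings from these options compound.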
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.identity import DefaultAzureCredential\n",
+    "from azure.identity import get_bearer_token_provider\n",
+    "from azure.search.documents.indexes import SearchIndexClient\n",
+    "from azure.search.documents.indexes.models import (\n",
+    "    SearchField,\n",
+    "    SearchFieldDataType,\n",
+    "    VectorSearch,\n",
+    "    HnswAlgorithmConfiguration,\n",
+    "    VectorSearchProfile,\n",
+    "    AzureOpenAIVectorizer,\n",
+    "    AzureOpenAIVectorizerParameters,\n",
+    "    ScalarQuantizationCompression,\n",
+    "    ScalarQuantizationParameters,\n",
+    "    SearchIndex,\n",
+    "    SemanticConfiguration,\n",
+    "    SemanticPrioritizedFields,\n",
+    "    SemanticField,\n",
+    "    SemanticSearch,\n",
+    "    ScoringProfile,\n",
+    "    TagScoringFunction,\n",
+    "    TagScoringParameters\n",
+    ")\n",
+    "\n",
+    "credential = DefaultAzureCredential()\n",
+    "\n",
+    "index_name = \"py-rag-tutorial-small-vectors-idx\"\n",
+    "index_client = SearchIndexClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)\n",
+    "fields = [\n",
+    "    SearchField(name=\"parent_id\", type=SearchFieldDataType.String),\n",
+    "    SearchField(name=\"title\", type=SearchFieldDataType.String),\n",
+    "    SearchField(name=\"locations\", type=SearchFieldDataType.Collection(SearchFieldDataType.String), filterable=True),\n",
+    "    SearchField(name=\"chunk_id\", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True, analyzer_name=\"keyword\"),\n",
+    "    SearchField(name=\"chunk\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),\n",
+    "    SearchField(name=\"text_vector\", type=\"Collection(Edm.Half)\", vector_search_dimensions=1024, vector_search_profile_name=\"myHnswProfile\", stored=False)\n",
+    "]\n",
+    "\n",
+    "# Configure the vector search configuration\n",
+    "vector_search = VectorSearch(\n",
+    "    algorithms=[\n",
+    "        HnswAlgorithmConfiguration(name=\"myHnsw\"),\n",
+    "    ],\n",
+    "    profiles=[\n",
+    "        VectorSearchProfile(\n",
+    "            name=\"myHnswProfile\",\n",
+    "            algorithm_configuration_name=\"myHnsw\",\n",
+    "            compression_name=\"myScalarQuantization\",\n",
+    "            vectorizer_name=\"myOpenAI\",\n",
+    "        )\n",
+    "    ],\n",
+    "    vectorizers=[\n",
+    "        AzureOpenAIVectorizer(\n",
+    "            vectorizer_name=\"myOpenAI\",\n",
+    "            kind=\"azureOpenAI\",\n",
+    "            parameters=AzureOpenAIVectorizerParameters(\n",
+    "                resource_url=AZURE_OPENAI_ACCOUNT,\n",
+    "                deployment_name=\"text-embedding-3-large\",\n",
+    "                model_name=\"text-embedding-3-large\"\n",
+    "            ),\n",
+    "        ),\n",
+    "    ],\n",
+    "    compressions=[\n",
+    "        ScalarQuantizationCompression(\n",
+    "            compression_name=\"myScalarQuantization\",\n",
+    "            rerank_with_original_vectors=True,\n",
+    "            default_oversampling=10,\n",
+    "            parameters=ScalarQuantizationParameters(quantized_data_type=\"int8\"),\n",
+    "        )\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "semantic_config = SemanticConfiguration(\n",
+    "    name=\"my-semantic-config\",\n",
+    "    prioritized_fields=SemanticPrioritizedFields(\n",
+    "        title_field=SemanticField(field_name=\"title\"),\n",
+    "        keywords_fields=[SemanticField(field_name=\"locations\")],\n",
+    "        content_fields=[SemanticField(field_name=\"chunk\")]\n",
+    "    )\n",
+    ")\n",
+    "\n",
+    "semantic_search = SemanticSearch(configurations=[semantic_config])\n",
+    "\n",
+    "scoring_profiles = [\n",
+    "    ScoringProfile(\n",
+    "        name=\"my-scoring-profile\",\n",
+    "        functions=[\n",
+    "            TagScoringFunction(\n",
+    "                field_name=\"locations\",\n",
+    "                boost=5.0,\n",
+    "                parameters=TagScoringParameters(\n",
+    "                    tags_parameter=\"tags\",\n",
+    "                ),\n",
+    "            )\n",
+    "        ]\n",
+    "    )\n",
+    "]\n",
+    "\n",
+    "index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search, scoring_profiles=scoring_profiles)\n",
+    "result = index_client.create_or_update_index(index)\n",
+    "print(f\"{result.name} created\")\n"
+   ]
+  },
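(Aside.) For intuition about what ScalarQuantizationCompression with quantized_data_type="int8" accomplishes, here is a hypothetical pure-Python sketch of min/max scalar quantization; the service's actual algorithm is internal and may differ:

```python
def quantize_int8(values):
    """Map floats to int8 codes via min/max scaling (illustrative sketch)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # guard against constant vectors
    codes = [round((v - lo) / scale) - 128 for v in values]  # range -128..127
    return codes, lo, scale

def dequantize_int8(codes, lo, scale):
    """Approximate reconstruction of the original floats."""
    return [(c + 128) * scale + lo for c in codes]

vec = [0.12, -0.53, 0.98, 0.0]
codes, lo, scale = quantize_int8(vec)
approx = dequantize_int8(codes, lo, scale)  # each value recovered to within one scale step
```

The mapping is lossy, which is why the index above sets rerank_with_original_vectors=True and default_oversampling=10: extra candidates are retrieved with the compact int8 codes, then re-scored against the full-precision vectors.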
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Run the indexer to create the new index\n",
+    "\n",
+    "This step reuses the data source and skillset created earlier in this tutorial, so run those cells first if you haven't already. A new indexer is created here so that no execution history or caching interferes with the run. The indexer picks up the notebook's current values for skillset_name, index_name, and the data source. Because the skillset and data source haven't changed, their references are still valid, and because the previous cell reassigned index_name, the indexer's target_index_name now resolves to \"py-rag-tutorial-small-vectors-idx\"."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.search.documents.indexes import SearchIndexerClient\n",
+    "from azure.search.documents.indexes.models import SearchIndexer\n",
+    "\n",
+    "# Create an indexer\n",
+    "indexer_name = \"py-rag-tutorial-small-vectors-idxr\"\n",
+    "\n",
+    "indexer_parameters = None\n",
+    "\n",
+    "indexer = SearchIndexer(\n",
+    "    name=indexer_name,\n",
+    "    description=\"Indexer to index documents and generate embeddings\",\n",
+    "    skillset_name=skillset_name,\n",
+    "    target_index_name=index_name,\n",
+    "    data_source_name=data_source.name,\n",
+    "    parameters=indexer_parameters\n",
+    ")\n",
+    "\n",
+    "# Create and run the indexer\n",
+    "indexer_client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)\n",
+    "indexer_result = indexer_client.create_or_update_indexer(indexer)\n",
+    "\n",
+    "print(f'{indexer_name} is created and running. Give the indexer a few minutes before running a query.')\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a final step, switch to the Azure portal to compare the vector storage requirements for the two indexes. The index created in this step stores the text vectors as half-precision floating-point numbers (float16), which halves their storage requirement compared to the previous index's single-precision (float32) vectors. Scalar quantization and the omission of the stored copy of the vectors account for the remaining storage savings. For more information about reducing vector size, see [Choose an approach for optimizing vector storage and processing](https://learn.microsoft.com/azure/search/vector-search-how-to-configure-compression-storage).\n",
+    "\n",
+    "Consider revisiting the queries from the previous lessons so that you can compare query speed and utility. You should expect some variation in LLM output whenever you repeat a query, but in general the storage-saving techniques you implemented shouldn't significantly degrade the quality of the search results."
+   ]
   }
  ],
  "metadata": {
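(Aside.) The float16 halving described above trades precision for space. The standard library's struct module can encode IEEE 754 binary16 (format code "e"), which makes the rounding easy to see:

```python
import struct

def to_half_and_back(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision (binary16)."""
    return struct.unpack("e", struct.pack("e", x))[0]

original = 0.123456789
halved = to_half_and_back(original)
error = abs(halved - original)
print(f"{original} -> {halved} (error {error:.2e})")
```

Half precision keeps roughly three decimal digits per component, which is generally tolerable for embedding similarity, but it's worth validating against your own relevance checks, as the closing paragraph suggests.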
