
Commit 7247c86

Merge pull request #1836 from HeidiSteen/heidist-rag2
[azure search] RAG tutorial for minimize cost/storage
2 parents 460ed07 + eef2518 commit 7247c86

6 files changed: +345 −8 lines

articles/search/toc.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -108,6 +108,8 @@ items:
       href: tutorial-rag-build-solution-query.md
     - name: Maximize relevance
       href: tutorial-rag-build-solution-maximize-relevance.md
+    - name: Minimize storage and costs
+      href: tutorial-rag-build-solution-minimize-storage.md
     - name: Skills tutorials
       items:
       - name: C#
```

articles/search/tutorial-rag-build-solution-index-schema.md

Lines changed: 2 additions & 4 deletions

```diff
@@ -65,10 +65,8 @@ In Azure AI Search, an index that works best for RAG workloads has these qualiti
 
 - Your schema should either be flat (no complex types or structures), or you should [format the complex type output as JSON](search-get-started-rag.md#send-a-complex-rag-query) before sending it to the LLM. This requirement is specific to the RAG pattern in Azure AI Search.
 
-<!-- Although Azure AI Search can't join indexes, you can create indexes that preserve parent-child relationships, and then use sequential queries in your search logic to pull from both (a query on the chunked data index, a lookup on the parent index). This exercise includes templates for parent-child elements in the same index and in separate indexes, where information from the parent index is retrieved using a lookup query. -->
-
-<!-- > [!NOTE]
-> Schema design affects storage and costs. This exercise is focused on schema fundamentals. In the [Minimize storage and costs](tutorial-rag-build-solution-minimize-storage.md) tutorial, you revisit schema design to consider narrow data types, attribution, and vector configurations that offer more efficient. -->
+> [!NOTE]
+> Schema design affects storage and costs. This exercise is focused on schema fundamentals. In the [Minimize storage and costs](tutorial-rag-build-solution-minimize-storage.md) tutorial, you revisit schemas to learn how narrow data types, compression, and storage options significantly reduce the amount of storage used by vectors.
 
 
 ## Create an index for RAG workloads
```

articles/search/tutorial-rag-build-solution-maximize-relevance.md

Lines changed: 2 additions & 3 deletions

```diff
@@ -327,8 +327,7 @@ Semantic ranking and scoring profiles operate on nonvector content, but you can
 - analyzers and normalizers
 - advanced query formats (regular expressions, fuzzy search) -->
 
-<!-- ## Next step
+## Next step
 
 > [!div class="nextstepaction"]
-> [Reduce vector storage and costs](tutorial-rag-build-solution-minimize-storage.md)
--->
+> [Minimize vector storage and costs](tutorial-rag-build-solution-minimize-storage.md)
```
articles/search/tutorial-rag-build-solution-minimize-storage.md

Lines changed: 338 additions & 0 deletions (new file)
---
title: 'RAG tutorial: Minimize storage and costs'
titleSuffix: Azure AI Search
description: Compress vectors using narrow data types and scalar quantization. Remove extra copies of stored vectors to further save on space.

manager: nitinme
author: HeidiSteen
ms.author: heidist
ms.service: azure-ai-search
ms.topic: tutorial
ms.date: 12/05/2024
---

# Tutorial: Minimize storage and costs (RAG in Azure AI Search)

Azure AI Search offers several approaches for reducing the size of vector indexes. These approaches range from vector compression to being more selective about what you store on your search service.

In this tutorial, you modify the existing search index to use:

> [!div class="checklist"]
> - Narrow data types
> - Scalar quantization
> - Reduced storage by opting out of vectors in search results

This tutorial reprises the search index created by the [indexing pipeline](tutorial-rag-build-solution-pipeline.md). All of these updates affect the existing content, requiring you to rerun the indexer. However, instead of deleting the search index, you create a second one so that you can compare reductions in vector index size after adding the new capabilities.

Altogether, the techniques illustrated in this tutorial can reduce vector storage by about half.

The following screenshot compares the [first index](tutorial-rag-build-solution-pipeline.md) from a previous tutorial to the index built in this one.

:::image type="content" source="media/tutorial-rag-solution/side-by-side-comparison.png" lightbox="media/tutorial-rag-solution/side-by-side-comparison.png" alt-text="Screenshot of the original vector index with the index created using the schema in this tutorial.":::

## Prerequisites

This tutorial is essentially a rerun of the [indexing pipeline](tutorial-rag-build-solution-pipeline.md). You need all of the Azure resources and permissions described in that tutorial.

For comparison, you should have an existing *py-rag-tutorial-idx* index on your Azure AI Search service. It should be almost 2 MB in size, and the vector index portion should be 348 KB.

You should also have the following objects:

- py-rag-tutorial-ds (data source)
- py-rag-tutorial-ss (skillset)

## Download the sample

[Download a Jupyter notebook](https://github.com/Azure-Samples/azure-search-python-samples/blob/main/Tutorial-RAG/Tutorial-rag.ipynb) from GitHub to send the requests to Azure AI Search. For more information, see [Downloading files from GitHub](https://docs.github.com/get-started/start-your-journey/downloading-files-from-github).

## Update the index for reduced storage

Azure AI Search has multiple approaches for reducing vector size, which lowers the cost of vector workloads. In this step, create a new index that uses the following capabilities:

- Smaller vector indexes by compressing the vectors used during query execution. Scalar quantization provides this capability.

- Smaller vector indexes by opting out of vector storage for search results. If you only need vectors for queries and not in the response payload, you can drop the vector copy used for search results.

- Smaller vector fields through narrow data types. You can specify `Collection(Edm.Half)` on the text_vector field to store incoming float32 dimensions as float16.

All of these capabilities are specified in a search index. After you load the index, compare the difference between the original index and the new one.

1. Name the new index `py-rag-tutorial-small-vectors-idx`.

1. Use the following definition for the new index. The differences between this schema and the schema updates from [Maximize relevance](tutorial-rag-build-solution-maximize-relevance.md) are new classes for scalar quantization and a new compressions section, a new data type (`Collection(Edm.Half)`) for the text_vector field, and a new `stored` property set to false.
```python
from azure.identity import DefaultAzureCredential
from azure.identity import get_bearer_token_provider
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    ScalarQuantizationCompression,
    ScalarQuantizationParameters,
    SearchIndex,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    ScoringProfile,
    TagScoringFunction,
    TagScoringParameters
)

credential = DefaultAzureCredential()

index_name = "py-rag-tutorial-small-vectors-idx"
index_client = SearchIndexClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)
fields = [
    SearchField(name="parent_id", type=SearchFieldDataType.String),
    SearchField(name="title", type=SearchFieldDataType.String),
    SearchField(name="locations", type=SearchFieldDataType.Collection(SearchFieldDataType.String), filterable=True),
    SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True, analyzer_name="keyword"),
    SearchField(name="chunk", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),
    # Narrow data type (float16), no stored copy for search results, and a
    # compression-enabled profile. vector_search_dimensions must match the
    # dimensions produced by the embedding skill.
    SearchField(name="text_vector", type="Collection(Edm.Half)", vector_search_dimensions=1024, vector_search_profile_name="myHnswProfile", stored=False)
]

# Configure the vector search configuration
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(name="myHnsw"),
    ],
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",
            algorithm_configuration_name="myHnsw",
            compression_name="myScalarQuantization",
            vectorizer_name="myOpenAI",
        )
    ],
    vectorizers=[
        AzureOpenAIVectorizer(
            vectorizer_name="myOpenAI",
            kind="azureOpenAI",
            parameters=AzureOpenAIVectorizerParameters(
                resource_url=AZURE_OPENAI_ACCOUNT,
                deployment_name="text-embedding-3-large",
                model_name="text-embedding-3-large"
            ),
        ),
    ],
    compressions=[
        ScalarQuantizationCompression(
            compression_name="myScalarQuantization",
            rerank_with_original_vectors=True,
            default_oversampling=10,
            parameters=ScalarQuantizationParameters(quantized_data_type="int8"),
        )
    ]
)

semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        keywords_fields=[SemanticField(field_name="locations")],
        content_fields=[SemanticField(field_name="chunk")]
    )
)

semantic_search = SemanticSearch(configurations=[semantic_config])

scoring_profiles = [
    ScoringProfile(
        name="my-scoring-profile",
        functions=[
            TagScoringFunction(
                field_name="locations",
                boost=5.0,
                parameters=TagScoringParameters(
                    tags_parameter="tags",
                ),
            )
        ]
    )
]

index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search, scoring_profiles=scoring_profiles)
result = index_client.create_or_update_index(index)
print(f"{result.name} created")
```
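To build intuition for why these settings shrink the index, here's a small standalone sketch of scalar quantization. It isn't part of the tutorial and is independent of how Azure AI Search implements quantization internally; it only illustrates the idea of linearly mapping float32 components onto the int8 range, which is what cuts the in-memory vector index to a quarter of its float32 size:

```python
import numpy as np

def scalar_quantize(vec: np.ndarray) -> tuple[np.ndarray, float, float]:
    """Map float values onto int8 by linearly rescaling [min, max] to [-127, 127]."""
    lo, hi = float(vec.min()), float(vec.max())
    scale = (hi - lo) / 254.0  # 254 steps between -127 and 127
    q = np.round((vec - lo) / scale - 127.0).astype(np.int8)
    return q, lo, scale

def dequantize(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original values from the int8 codes."""
    return (q.astype(np.float32) + 127.0) * scale + lo

rng = np.random.default_rng(seed=1)
vec = rng.standard_normal(1024).astype(np.float32)  # one 1,024-dimension embedding

q, lo, scale = scalar_quantize(vec)
restored = dequantize(q, lo, scale)

print(vec.nbytes, q.nbytes)  # 4096 1024 -> 4x smaller per vector
print(float(np.abs(vec - restored).max()) < scale)  # True: error bounded by one step
```

The `rerank_with_original_vectors` and `default_oversampling` settings in the index exist precisely because of that reconstruction error: the service can oversample quantized candidates and rescore them against full-precision vectors to recover ranking quality.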

## Create or reuse the data source

Here's the definition of the data source from the previous tutorial. If you already have this data source on your search service, you can skip creating a new one.

```python
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection
)

# Create a data source
indexer_client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)
container = SearchIndexerDataContainer(name="nasa-ebooks-pdfs-all")
data_source_connection = SearchIndexerDataSourceConnection(
    name="py-rag-tutorial-ds",
    type="azureblob",
    connection_string=AZURE_STORAGE_CONNECTION,
    container=container
)
data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)

print(f"Data source '{data_source.name}' created or updated")
```

## Create or reuse the skillset

The skillset is also unchanged from the previous tutorial. Here it is again so that you can review it.

```python
from azure.search.documents.indexes.models import (
    SplitSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    AzureOpenAIEmbeddingSkill,
    EntityRecognitionSkill,
    SearchIndexerIndexProjection,
    SearchIndexerIndexProjectionSelector,
    SearchIndexerIndexProjectionsParameters,
    IndexProjectionMode,
    SearchIndexerSkillset,
    CognitiveServicesAccountKey
)

# Create a skillset
skillset_name = "py-rag-tutorial-ss"

split_skill = SplitSkill(
    description="Split skill to chunk documents",
    text_split_mode="pages",
    context="/document",
    maximum_page_length=2000,
    page_overlap_length=500,
    inputs=[
        InputFieldMappingEntry(name="text", source="/document/content"),
    ],
    outputs=[
        OutputFieldMappingEntry(name="textItems", target_name="pages")
    ],
)

embedding_skill = AzureOpenAIEmbeddingSkill(
    description="Skill to generate embeddings via Azure OpenAI",
    context="/document/pages/*",
    resource_url=AZURE_OPENAI_ACCOUNT,
    deployment_name="text-embedding-3-large",
    model_name="text-embedding-3-large",
    dimensions=1024,  # must match vector_search_dimensions on the index's text_vector field
    inputs=[
        InputFieldMappingEntry(name="text", source="/document/pages/*"),
    ],
    outputs=[
        OutputFieldMappingEntry(name="embedding", target_name="text_vector")
    ],
)

entity_skill = EntityRecognitionSkill(
    description="Skill to recognize entities in text",
    context="/document/pages/*",
    categories=["Location"],
    default_language_code="en",
    inputs=[
        InputFieldMappingEntry(name="text", source="/document/pages/*")
    ],
    outputs=[
        OutputFieldMappingEntry(name="locations", target_name="locations")
    ]
)

index_projections = SearchIndexerIndexProjection(
    selectors=[
        SearchIndexerIndexProjectionSelector(
            target_index_name=index_name,
            parent_key_field_name="parent_id",
            source_context="/document/pages/*",
            mappings=[
                InputFieldMappingEntry(name="chunk", source="/document/pages/*"),
                InputFieldMappingEntry(name="text_vector", source="/document/pages/*/text_vector"),
                InputFieldMappingEntry(name="locations", source="/document/pages/*/locations"),
                InputFieldMappingEntry(name="title", source="/document/metadata_storage_name"),
            ],
        ),
    ],
    parameters=SearchIndexerIndexProjectionsParameters(
        projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS
    ),
)

cognitive_services_account = CognitiveServicesAccountKey(key=AZURE_AI_MULTISERVICE_KEY)

skills = [split_skill, embedding_skill, entity_skill]

skillset = SearchIndexerSkillset(
    name=skillset_name,
    description="Skillset to chunk documents and generate embeddings",
    skills=skills,
    index_projection=index_projections,
    cognitive_services_account=cognitive_services_account
)

client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)
client.create_or_update_skillset(skillset)
print(f"{skillset.name} created")
```
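The split skill's `maximum_page_length=2000` and `page_overlap_length=500` mean consecutive chunks share a 500-character overlap so that context isn't lost at chunk boundaries. As an illustrative sketch only (plain Python, not the service's actual chunking algorithm, which splits on sentence boundaries where it can):

```python
def chunk_text(text: str, max_len: int = 2000, overlap: int = 500) -> list[str]:
    """Fixed-size character chunking with overlap, mimicking pages-style splitting."""
    if len(text) <= max_len:
        return [text]
    step = max_len - overlap  # each new chunk starts this far past the previous one
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_len])
        if start + max_len >= len(text):
            break
        start += step
    return chunks

doc = "x" * 5000
chunks = chunk_text(doc)
print([len(c) for c in chunks])  # [2000, 2000, 2000]: no chunk exceeds max_len
print(all(chunks[i][-500:] == chunks[i + 1][:500] for i in range(len(chunks) - 1)))  # True
```

Larger overlaps improve recall at chunk edges but increase the number of chunks, and therefore the number of vectors, which is exactly the storage pressure this tutorial works to offset.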

## Create a new indexer and load the index

Although you could point the existing indexer at the new index and rerun it, it's just as easy to create a new indexer. Having two indexes and indexers preserves the execution history and allows for closer comparisons.

This indexer is identical to the previous indexer, except that it targets the new index from this tutorial.

```python
from azure.search.documents.indexes.models import (
    SearchIndexer
)

# Create an indexer
indexer_name = "py-rag-tutorial-small-vectors-idxr"

indexer_parameters = None

indexer = SearchIndexer(
    name=indexer_name,
    description="Indexer to index documents and generate embeddings",
    target_index_name="py-rag-tutorial-small-vectors-idx",
    skillset_name="py-rag-tutorial-ss",
    data_source_name="py-rag-tutorial-ds",
    parameters=indexer_parameters
)

# Create and run the indexer
indexer_client = SearchIndexerClient(endpoint=AZURE_SEARCH_SERVICE, credential=credential)
indexer_result = indexer_client.create_or_update_indexer(indexer)

print(f'{indexer_name} is created and running. Give the indexer a few minutes before running a query.')
```

As a final step, switch to the Azure portal to compare the vector storage requirements for the two indexes. You should see results similar to the following screenshot.

:::image type="content" source="media/tutorial-rag-solution/side-by-side-comparison.png" lightbox="media/tutorial-rag-solution/side-by-side-comparison.png" alt-text="Screenshot of the original vector index with the index created using the schema in this tutorial.":::

The index created in this tutorial uses half-precision floating-point numbers (float16) for the text vectors. This reduces the storage requirements for the vectors by half compared to the previous index, which used single-precision floating-point numbers (float32). Scalar quantization and the omission of one set of the vectors account for the remaining storage savings. For more information about reducing vector size, see [Choose an approach for optimizing vector storage and processing](vector-search-how-to-configure-compression-storage.md).

Consider revisiting the [queries from the previous tutorial](tutorial-rag-build-solution-query.md) so that you can compare query speed and utility. You should expect some variation in LLM output whenever you repeat a query, but in general the storage-saving techniques you implemented shouldn't degrade the quality of your search results.
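The savings can also be sanity-checked with back-of-the-envelope arithmetic per vector. The figures below assume 1,024 dimensions per chunk and count only raw vector bytes; actual service numbers include index overhead and vary by configuration:

```python
DIMS = 1024  # dimensions per embedding, matching the tutorial's index

float32_bytes = DIMS * 4  # original Edm.Single (Collection(Edm.Single)) storage per vector
float16_bytes = DIMS * 2  # Edm.Half: the narrow data type halves the stored field
int8_bytes = DIMS * 1     # scalar-quantized copy used by the in-memory vector index

print(f"float32 vector: {float32_bytes} bytes")        # 4096
print(f"float16 vector: {float16_bytes} bytes")        # 2048
print(f"int8 (quantized) vector: {int8_bytes} bytes")  # 1024
print(f"narrow type alone saves {1 - float16_bytes / float32_bytes:.0%}")  # 50%
```

Dropping the retrievable copy (`stored=False`) removes one of the duplicated vector representations entirely, which is why the combined effect lands near the roughly 50% overall reduction shown in the portal comparison.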

## Next step

There are code samples in all of the Azure SDKs that provide Azure AI Search programmability. You can also review vector sample code for specific use cases and technology combinations.

> [!div class="nextstepaction"]
> [azure-search-vector-samples](https://github.com/Azure/azure-search-vector-samples)

articles/search/tutorial-rag-build-solution-query.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -15,7 +15,7 @@ ms.date: 10/04/2024
 
 # Tutorial: Search your data using a chat model (RAG in Azure AI Search)
 
-The defining characteristic of a RAG solution on Azure AI Search is sending queries to a Large Language Model (LLM) and providing a conversational search experience over your indexed content. It can be surprisingly easy if you implement just the basics.
+The defining characteristic of a RAG solution on Azure AI Search is sending queries to a Large Language Model (LLM) for a conversational search experience over your indexed content. It can be surprisingly easy if you implement just the basics.
 
 In this tutorial, you:
```
