Commit 9756ee8

checkpoint 9/9
1 parent 1888160 commit 9756ee8

8 files changed: +163 -42 lines changed

articles/search/tutorial-rag-build-solution-app.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ ms.date: 09/12/2024
---

-# Deployment checklist for next-level testing (RAG tutorial - Azure AI Search)
+# Tutorial: Deployment checklist for next-level testing (RAG in Azure AI Search)

In this lesson, review options for setting up a simple web front-end for a RAG prototype. A simple app is useful for scenario testing with users and stakeholders. This lesson also provides a deployment checklist for broader distribution.

articles/search/tutorial-rag-build-solution-index-schema.md

Lines changed: 141 additions & 22 deletions
@@ -12,7 +12,7 @@ ms.date: 09/12/2024
---

-# Design an index (RAG tutorial - Azure AI Search)
+# Tutorial: Design an index (RAG in Azure AI Search)

An index contains searchable text and vector content, plus configurations. In a RAG pattern that uses a chat model for responses, you want an index that contains chunks of content that can be passed to an LLM at query time.

@@ -23,7 +23,7 @@ In this tutorial, you:
> - Create an index that accommodates vectors and hybrid queries
> - Add vector profiles and configurations
> - Add structured data
-> - Add filters
+> - Add filtering

## Prerequisites

@@ -37,42 +37,49 @@ In conversational search, LLMs compose the response that the user sees, not the

### Focus on chunks

-To generate a response, LLMs operate on chunks of content, and while they need to know where the chunk came from for citation purposes, what matters most is the quality of message inputs and its relevance to the user's question. Whether the chunks come from one document or a thousand, the LLM ingests the information or *grounding data*, and formulates the response using instructions provided in a system prompt.
+When LLMs generate a response, they operate on chunks of content for message inputs, and while they need to know where the chunk came from for citation purposes, what matters most is the quality of the message inputs and their relevance to the user's question. Whether the chunks come from one document or a thousand, the LLM ingests the information or *grounding data*, and formulates the response using instructions provided in a system prompt.

-Chunks are the focus of the schema, and each chunk is the definitive element of a search document in a RAG pattern. You can think of your index as a large collection of chunks, as opposed to traditional search documents that have more structure and fields containing uniform content for a name field, versus a description field, versus a category field.
+Chunks are the focus of the schema, and each chunk is the defining element of a search document in a RAG pattern. You can think of your index as a large collection of chunks, as opposed to traditional search documents that probably have more structure, such as fields containing uniform content for names, descriptions, categories, and addresses.

-A minimal index for LLM is designed to store chunks of content. It includes vector fields if you want similarity search for highly relevant results, and nonvector fields for human-readable inputs to the LLM for conversational search. Nonvector chunked content in the search results becomes the grounding data sent to the LLM.
+### Focus on content

-### Checklist of schema considerations
+In addition to structural considerations, such as chunked content, you also want to consider the substance of your content because it informs which fields are indexed.

-An index that works best for RAG workloads has these qualities:
+In this tutorial, we use PDFs and content from the NASA Earth Book. This content is descriptive and informative, with numerous references to geographies, countries, and areas across the world. To capture this information in our index and potentially use it in queries, we can include skills in our indexing pipeline that recognize and extract this information, loading it into a searchable and filterable `locations` field, as sketched in the example that follows.
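To make that idea concrete, here's a rough sketch (an editor's illustration, not code from this commit) of an entity recognition skill that pulls place names out of each chunk. The skill type and its built-in `locations` output are real Azure AI Search constructs; the skill name, `context`, and `source` paths are assumptions that depend on how the pipeline splits documents:

```json
{
  "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
  "name": "example-extract-locations",
  "description": "Hypothetical configuration: recognize place names in each chunk",
  "context": "/document/pages/*",
  "categories": [ "Location" ],
  "defaultLanguageCode": "en",
  "inputs": [
    { "name": "text", "source": "/document/pages/*" }
  ],
  "outputs": [
    { "name": "locations", "targetName": "locations" }
  ]
}
```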

-- Returns chunks that are relevant to the query and readable to the LLM. LLMs can handle a certain level of dirty data in chunks, such as mark up, redundancy, and incomplete strings. While chunks need to be readable and relevant to the query, they don't need to be pristine.
+The original ebook is large, over 100 pages and 35 MB in size. We broke it up into smaller PDFs, one per page of text, to stay under the REST API payload limit of 16 MB per API call.

-- Maintains a parent-child relationship between chunks of a document and the properties of the parent document, such as the file name, file type, title, author, and so forth. To answer a query, chunks could be pulled from anywhere in the index. Association with the parent document providing the chunk is useful for context, citations, and follow up queries.
+For simplicity, we omit image vectorization for this exercise.

-- Accommodates the queries you want create. You should have fields for vector and hybrid content, and those fields should be attributed to support specific query behaviors. You can only query one index at a time (no joins) so your fields collection should define all of your searchable content.
+### Focus on parent-child indexes

-- Your schema should be flat (no complex types or structures). This requirement is specific to the RAG pattern in Azure AI Search.
+Chunked content typically derives from a larger document. And although the schema is organized around chunks, you also want to capture properties and content at the parent level. Examples of these properties might include the parent file path, title, authors, publication date, and summary.

-Although Azure AI Search can't join indexes, you can create indexes that preserve parent-child relationship, and then use sequential or parallel queries in your search logic to pull from both. This exercise includes templates for parent-child elements in the same index and in separate indexes, where information from the parent index is retrieved using a look up query.
+An inflection point in schema design is whether to have two indexes for parent and child/chunked content, or a single index that repeats parent elements for each chunk.

-Schema design affects storage and costs. This exercise is focused on schema fundamentals. In the [Minimize storage and costs](tutorial-rag-build-solution-optimize.md) tutorial, we revisit schema design to consider narrow data types, attribution, and vector configurations that are more efficient.
+In this tutorial, because all of the chunks of text originate from a single parent (NASA Earth Book), you don't need a separate index dedicated to parent-level fields. If you index from multiple parent PDFs, you might want a parent-child index pair to capture parent-specific fields and then send lookup queries to the parent index to retrieve the fields relevant to each chunk. We include an example of that parent-child index template in this exercise for comparison.

-### Sample content for this tutorial
+### Checklist of schema considerations

-The content you're indexing informs what fields are in the index.
+In Azure AI Search, an index that works best for RAG workloads has these qualities:

-In this tutorial, we use PDFs and content from the NASA Earth at Night ebook. The original ebook is large, over 100 pages and 35 MB in size. We broke it up into smaller PDFs, one per page of text, to stay under the REST API payload limit of 16 MB per API call.
+- Returns chunks that are relevant to the query and readable to the LLM. LLMs can handle a certain level of dirty data in chunks, such as markup, redundancy, and incomplete strings. While chunks need to be readable and relevant to the question, they don't need to be pristine.

-We omit image vectorization for this exercise.
+- Maintains a parent-child relationship between chunks of a document and the properties of the parent document, such as the file name, file type, title, author, and so forth. To answer a query, chunks could be pulled from anywhere in the index. Association with the parent document providing the chunk is useful for context, citations, and follow-up queries.

+- Accommodates the queries you want to create. You should have fields for vector and hybrid content, and those fields should be attributed to support specific query behaviors. You can only query one index at a time (no joins), so your fields collection should define all of your searchable content.

+- Your schema should be flat (no complex types or structures). This requirement is specific to the RAG pattern in Azure AI Search.

-The sample content is descriptive and informative. It also mentions places, regions, and countries across the world. We can include skills in our indexing pipeline that extracts this information and loads it into a queryable and filterable `locations` field.
+Although Azure AI Search can't join indexes, you can create indexes that preserve parent-child relationships, and then use sequential or parallel queries in your search logic to pull from both. This exercise includes templates for parent-child elements in the same index and in separate indexes, where information from the parent index is retrieved using a lookup query, as sketched in the example that follows.
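
To illustrate the sequential pattern (an editor's sketch under assumed names, not a template from this commit), the first request searches the chunk index and selects each chunk's parent key, assuming a `parent_id` field like the one in the schema later in this article:

```json
{
  "search": "urban lighting at night",
  "select": "chunk, parent_id",
  "top": 5
}
```

A second request then retrieves parent properties for context and citations, for example with a document lookup such as `GET /indexes/{parent-index}/docs/{parent_id}?api-version=2024-07-01` against the hypothetical parent index.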

-Because all of the chunks of text originate from the same parent (Earth at Night ebook), we don't need a separate index dedicated to parent fields. If we were indexing from multiple parent PDFs, we would want a parent-child index pair to capture PDF-specific fields (path, title, authors, publication date, summary) and then send look up queries to the parent index to retrieve those fields relevant to each chunk. We include an example of that parent-child index template in this exercise for comparison.
+> [!NOTE]
+> Schema design affects storage and costs. This exercise is focused on schema fundamentals. In the [Minimize storage and costs](tutorial-rag-build-solution-minimize-storage.md) tutorial, you revisit schema design to consider narrow data types, attribution, and vector configurations that are more efficient.

## Create a basic index

+A minimal index for an LLM is designed to store chunks of content. It includes vector fields if you want similarity search for highly relevant results, and nonvector fields for human-readable inputs to the LLM for conversational search. Nonvector chunked content in the search results becomes the grounding data sent to the LLM.

1. Open Visual Studio Code and create a new file. It doesn't have to be a Python file type for this exercise.

1. Here's a minimal index definition for RAG solutions that support vector and hybrid search. Review it for an introduction to required elements: name, fields, and a `vectorSearch` configuration for the vector fields.
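
   The minimal definition itself is unchanged in this commit, so the diff doesn't show it. As a rough sketch only (the field names, dimensions, and configuration names here are illustrative assumptions, not the tutorial's actual definition), it has this shape:

```json
{
  "name": "example-minimal-rag-index",
  "fields": [
    { "name": "chunk_id", "type": "Edm.String", "key": true, "analyzer": "keyword" },
    { "name": "chunk", "type": "Edm.String", "searchable": true, "retrievable": true },
    {
      "name": "text_vector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "dimensions": 1536,
      "vectorSearchProfile": "example-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [ { "name": "example-hnsw", "kind": "hnsw" } ],
    "profiles": [ { "name": "example-profile", "algorithm": "example-hnsw" } ]
  }
}
```
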
@@ -101,7 +108,121 @@ Because all of the chunks of text originate from the same parent (Earth at Night

Vector fields have [specific types](/rest/api/searchservice/supported-data-types#edm-data-types-for-vector-fields) and extra attributes for embedding model dimensions and configuration. `Edm.Single` is a data type that works for the more commonly used LLMs. For more information about vector fields, see [Create a vector index](vector-search-how-to-create-index.md).

-1. Here's the index schema for the tutorial and the NASA ebook content. It's similar to the basic schema, but adds a parent ID and metadata. It also includes fields for storing generated content that's created in the indexing pipeline.
+1. Here's the index schema for the tutorial and the Earth Book content. It's similar to the basic schema, but adds a parent ID, metadata (`title`), strings (`chunk`), and vectors for similarity search (`text_vector`). It also includes a `locations` field for storing generated content that's created in the [indexing pipeline](tutorial-rag-build-solution-pipeline.md).

```json
{
  "name": "rag-tutorial-earth-book",
  "defaultScoringProfile": null,
  "fields": [
    {
      "name": "chunk_id",
      "type": "Edm.String",
      "key": true,
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "analyzer": "keyword"
    },
    {
      "name": "parent_id",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "analyzer": null
    },
    {
      "name": "chunk",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "analyzer": null
    },
    {
      "name": "title",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "analyzer": null
    },
    {
      "name": "text_vector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "retrievable": true,
      "stored": true,
      "dimensions": 1536,
      "vectorSearchProfile": "rag-tutorial-earth-book-azureOpenAi-text-profile",
      "vectorEncoding": null
    },
    {
      "name": "locations",
      "type": "Collection(Edm.String)",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "analyzer": "standard.lucene"
    }
  ],
  "vectorSearch": {
    "algorithms": [
      {
        "name": "rag-tutorial-earth-book-algorithm",
        "kind": "hnsw",
        "hnswParameters": {
          "metric": "cosine",
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500
        },
        "exhaustiveKnnParameters": null
      }
    ],
    "profiles": [
      {
        "name": "rag-tutorial-earth-book-azureOpenAi-text-profile",
        "algorithm": "rag-tutorial-earth-book-algorithm",
        "vectorizer": "rag-tutorial-earth-book-azureOpenAi-text-vectorizer",
        "compression": null
      }
    ],
    "vectorizers": [
      {
        "name": "rag-tutorial-earth-book-azureOpenAi-text-vectorizer",
        "kind": "azureOpenAI",
        "azureOpenAIParameters": {
          "resourceUri": "https://heidistazureopenaieastus.openai.azure.com",
          "deploymentId": "text-embedding-ada-002",
          "apiKey": null,
          "modelName": "text-embedding-ada-002",
          "authIdentity": null
        },
        "customWebApiParameters": null,
        "aiServicesVisionParameters": null,
        "amlParameters": null
      }
    ],
    "compressions": []
  }
}
```
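Once an index with this schema exists, a hybrid query is what exercises both the nonvector `chunk` field and the `text_vector` field. The request body below is an editor's sketch rather than part of this commit; it assumes an API version with integrated vectorization (for example, `2024-07-01`), so the index's vectorizer can embed the query string at search time:

```json
{
  "search": "Where on Earth are city lights brightest at night?",
  "select": "title, chunk, locations",
  "top": 5,
  "vectorQueries": [
    {
      "kind": "text",
      "text": "Where on Earth are city lights brightest at night?",
      "fields": "text_vector",
      "k": 5
    }
  ]
}
```

Passing the same string as both `search` and the vector query is what makes the request hybrid: BM25 ranking over the searchable string fields plus vector similarity over `text_vector`, fused into a single result set.
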
<!-- Objective:
@@ -129,8 +250,6 @@ Tasks:

<!--

-ps 1: We have another physical resource limit for our services: vector index size. HNSW requires vector indices to reside entirely in memory. "Vector index size" is our customer-facing resource limit that governs the memory consumed by their vector data. (and this is a big reason why the beefiest VMs have 512 GB of RAM). Increasing partitions also increases the amount of vector quota for customers as well.
-
ps 2: A richer index has more fields and configurations, and is often better because extra fields support richer queries and more opportunities for relevance tuning. Filters and scoring profiles for boosting apply to nonvector fields. If you have content that should be matched precisely and not similarly, such as a name or employee number, then create fields to contain that information.*

## BLOCKED: Index for hybrid queries and relevance tuning

articles/search/tutorial-rag-build-solution-maximize-relevance.md

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ ms.date: 09/12/2024
---

-# Maximize relevance (RAG tutorial - Azure AI Search)
+# Tutorial: Maximize relevance (RAG in Azure AI Search)

In this tutorial, learn how to improve the relevance of search results used in RAG solutions. Azure AI Search includes a broad range of relevance tuning capabilities. Learn which ones are best for solving specific problems.

@@ -36,4 +36,4 @@ Key points:
## Next step

> [!div class="nextstepaction"]
-> [Reduce vector storage and costs](tutorial-rag-build-solution-optimize.md)
+> [Reduce vector storage and costs](tutorial-rag-build-solution-minimize-storage.md)

articles/search/tutorial-rag-build-solution-minimize-storage.md

Lines changed: 3 additions & 1 deletion
@@ -12,7 +12,7 @@ ms.date: 09/12/2024
---

-# Minimize storage and costs using vector compression and narrow data types (RAG tutorial - Azure AI Search)
+# Tutorial: Minimize storage and costs using vector compression and narrow data types (RAG in Azure AI Search)

In this tutorial, learn the techniques for reducing index size, with a focus on vector compression and storage.

@@ -24,6 +24,8 @@ Key points:
- narrow data types
- hnsw vs eknn, does hnsw produce a smaller footprint?

+<!-- ps 1: We have another physical resource limit for our services: vector index size. HNSW requires vector indices to reside entirely in memory. "Vector index size" is our customer-facing resource limit that governs the memory consumed by their vector data. (and this is a big reason why the beefiest VMs have 512 GB of RAM). Increasing partitions also increases the amount of vector quota for customers as well. -->
+
## Next step

> [!div class="nextstepaction"]
