articles/search/tutorial-rag-build-solution-app.md (+1 −1)

@@ -12,7 +12,7 @@ ms.date: 09/12/2024
---
-# Deployment checklist for next-level testing (RAG tutorial - Azure AI Search)
+# Tutorial: Deployment checklist for next-level testing (RAG in Azure AI Search)

In this lesson, review options for setting up a simple web front-end for a RAG prototype. A simple app is useful for scenario testing with users and stakeholders. This lesson also provides a deployment checklist for broader distribution.

-# Design an index (RAG tutorial - Azure AI Search)
+# Tutorial: Design an index (RAG in Azure AI Search)

An index contains searchable text and vector content, plus configurations. In a RAG pattern that uses a chat model for responses, you want an index that contains chunks of content that can be passed to an LLM at query time.
@@ -23,7 +23,7 @@ In this tutorial, you:
> - Create an index that accommodates vectors and hybrid queries
> - Add vector profiles and configurations
> - Add structured data
-> - Add filters
+> - Add filtering

## Prerequisites
@@ -37,42 +37,49 @@ In conversational search, LLMs compose the response that the user sees, not the
### Focus on chunks
-To generate a response, LLMs operate on chunks of content, and while they need to know where the chunk came from for citation purposes, what matters most is the quality of message inputs and its relevance to the user's question. Whether the chunks come from one document or a thousand, the LLM ingests the information or *grounding data*, and formulates the response using instructions provided in a system prompt.
+When LLMs generate a response, they operate on chunks of content for message inputs, and while they need to know where each chunk came from for citation purposes, what matters most is the quality of message inputs and their relevance to the user's question. Whether the chunks come from one document or a thousand, the LLM ingests the information or *grounding data*, and formulates the response using instructions provided in a system prompt.

-Chunks are the focus of the schema, and each chunk is the definitive element of a search document in a RAG pattern. You can think of your index as a large collection of chunks, as opposed to traditional search documents that have more structure and fields containing uniform content for a name field, versus a description field, versus a category field.
+Chunks are the focus of the schema, and each chunk is the defining element of a search document in a RAG pattern. You can think of your index as a large collection of chunks, as opposed to traditional search documents that probably have more structure, such as fields containing uniform content for a name, description, category, and address.

-A minimal index for LLM is designed to store chunks of content. It includes vector fields if you want similarity search for highly relevant results, and nonvector fields for human-readable inputs to the LLM for conversational search. Nonvector chunked content in the search results becomes the grounding data sent to the LLM.
+### Focus on content

-### Checklist of schema considerations
+In addition to structural considerations, like chunked content, you also want to consider the substance of your content, because it informs which fields are indexed.

-An index that works best for RAG workloads has these qualities:
+In this tutorial, we use PDFs and content from the NASA Earth Book. This content is descriptive and informative, with numerous references to geographies, countries, and areas across the world. To capture this information in our index and potentially use it in queries, we can include skills in our indexing pipeline that recognize and extract this information, loading it into a searchable and filterable `locations` field.
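
As a sketch of how that extraction might be wired up, the built-in entity recognition skill can target the `Location` category. This is a hypothetical fragment using the azure-search-documents Python SDK; the input source path and the skill's placement in the skillset are assumptions, not the tutorial's actual pipeline definition.

```python
# Hypothetical skill definition: recognize entities in chunked text and keep
# only locations. The source path "/document/pages/*" is an assumption.
from azure.search.documents.indexes.models import (
    EntityRecognitionSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
)

locations_skill = EntityRecognitionSkill(
    description="Extract locations from chunked text",
    categories=["Location"],
    inputs=[InputFieldMappingEntry(name="text", source="/document/pages/*")],
    outputs=[OutputFieldMappingEntry(name="locations", target_name="locations")],
)
```
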
-- Returns chunks that are relevant to the query and readable to the LLM. LLMs can handle a certain level of dirty data in chunks, such as mark up, redundancy, and incomplete strings. While chunks need to be readable and relevant to the query, they don't need to be pristine.
+The original ebook is large, over 100 pages and 35 MB in size. We broke it up into smaller PDFs, one per page of text, to stay under the REST API payload limit of 16 MB per API call.
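
The splitting step itself isn't shown in this diff. As a rough sketch of how you might reproduce it, assuming the open-source pypdf library and an illustrative file name:

```python
# Hypothetical preprocessing step: split a large ebook into one-page PDFs so
# each upload stays under the 16 MB REST payload limit. File names are assumed.
from pypdf import PdfReader, PdfWriter

reader = PdfReader("earth-book.pdf")  # assumed local file name
for i, page in enumerate(reader.pages, start=1):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"earth-book-page-{i}.pdf", "wb") as out:
        writer.write(out)
```
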
-- Maintains a parent-child relationship between chunks of a document and the properties of the parent document, such as the file name, file type, title, author, and so forth. To answer a query, chunks could be pulled from anywhere in the index. Association with the parent document providing the chunk is useful for context, citations, and follow up queries.
+For simplicity, we omit image vectorization for this exercise.

-- Accommodates the queries you want create. You should have fields for vector and hybrid content, and those fields should be attributed to support specific query behaviors. You can only query one index at a time (no joins) so your fields collection should define all of your searchable content.
+### Focus on parent-child indexes

-- Your schema should be flat (no complex types or structures). This requirement is specific to the RAG pattern in Azure AI Search.
+Chunked content typically derives from a larger document. Although the schema is organized around chunks, you also want to capture properties and content at the parent level. Examples of these properties include the parent file path, title, authors, publication date, and summary.

-Although Azure AI Search can't join indexes, you can create indexes that preserve parent-child relationship, and then use sequential or parallel queries in your search logic to pull from both. This exercise includes templates for parent-child elements in the same index and in separate indexes, where information from the parent index is retrieved using a look up query.
+An inflection point in schema design is whether to have two indexes for parent and child/chunked content, or a single index that repeats parent elements for each chunk.

-Schema design affects storage and costs. This exercise is focused on schema fundamentals. In the [Minimize storage and costs](tutorial-rag-build-solution-optimize.md) tutorial, we revisit schema design to consider narrow data types, attribution, and vector configurations that are more efficient.
+In this tutorial, because all of the chunks of text originate from a single parent (the NASA Earth Book), you don't need a separate index dedicated to parent-level fields. If you index from multiple parent PDFs, you might want a parent-child index pair to capture parent-specific fields and then send lookup queries to the parent index to retrieve those fields relevant to each chunk. We include an example of that parent-child index template in this exercise for comparison.
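
To make the two-index option concrete, here's a hedged sketch of the sequential pattern: query the chunk index, then issue a lookup query against the parent index. The index names, field names, and query text are illustrative assumptions, not the tutorial's templates.

```python
# Hypothetical two-index pattern: search chunks, then look up the parent
# document for context and citations. All names below are assumptions.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

endpoint = "https://<your-service>.search.windows.net"
credential = AzureKeyCredential("<your-query-key>")
chunk_client = SearchClient(endpoint, "example-chunk-index", credential)
parent_client = SearchClient(endpoint, "example-parent-index", credential)

results = chunk_client.search(search_text="tropical cyclones at night", top=3)
for chunk in results:
    # Lookup query: fetch the parent document by its key.
    parent = parent_client.get_document(key=chunk["parent_id"])
    print(chunk["chunks"][:100], "->", parent["title"])
```
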
-### Sample content for this tutorial
+### Checklist of schema considerations

-The content you're indexing informs what fields are in the index.
+In Azure AI Search, an index that works best for RAG workloads has these qualities:

-In this tutorial, we use PDFs and content from the NASA Earth at Night ebook. The original ebook is large, over 100 pages and 35 MB in size. We broke it up into smaller PDFs, one per page of text, to stay under the REST API payload limit of 16 MB per API call.
+- Returns chunks that are relevant to the query and readable to the LLM. LLMs can handle a certain level of dirty data in chunks, such as markup, redundancy, and incomplete strings. While chunks need to be readable and relevant to the question, they don't need to be pristine.

-We omit image vectorization for this exercise.
+- Maintains a parent-child relationship between chunks of a document and the properties of the parent document, such as the file name, file type, title, author, and so forth. To answer a query, chunks could be pulled from anywhere in the index. Association with the parent document providing the chunk is useful for context, citations, and follow-up queries.
+
+- Accommodates the queries you want to create. You should have fields for vector and hybrid content, and those fields should be attributed to support specific query behaviors. You can only query one index at a time (no joins), so your fields collection should define all of your searchable content.
+
+- Your schema should be flat (no complex types or structures). This requirement is specific to the RAG pattern in Azure AI Search.

-The sample content is descriptive and informative. It also mentions places, regions, and countries across the world. We can include skills in our indexing pipeline that extracts this information and loads it into a queryable and filterable `locations` field.
+Although Azure AI Search can't join indexes, you can create indexes that preserve parent-child relationships, and then use sequential or parallel queries in your search logic to pull from both. This exercise includes templates for parent-child elements in the same index and in separate indexes, where information from the parent index is retrieved using a lookup query.

-Because all of the chunks of text originate from the same parent (Earth at Night ebook), we don't need a separate index dedicated to parent fields. If we were indexing from multiple parent PDFs, we would want a parent-child index pair to capture PDF-specific fields (path, title, authors, publication date, summary) and then send look up queries to the parent index to retrieve those fields relevant to each chunk. We include an example of that parent-child index template in this exercise for comparison.
+> [!NOTE]
+> Schema design affects storage and costs. This exercise is focused on schema fundamentals. In the [Minimize storage and costs](tutorial-rag-build-solution-minimize-storage.md) tutorial, you revisit schema design to consider narrow data types, attribution, and vector configurations that are more efficient.

## Create a basic index

+A minimal index for an LLM is designed to store chunks of content. It includes vector fields if you want similarity search for highly relevant results, and nonvector fields for human-readable inputs to the LLM for conversational search. Nonvector chunked content in the search results becomes the grounding data sent to the LLM.

1. Open Visual Studio Code and create a new file. It doesn't have to be a Python file type for this exercise.
1. Here's a minimal index definition for RAG solutions that support vector and hybrid search. Review it for an introduction to required elements: name, fields, and a `vectorSearch` configuration for the vector fields.
@@ -101,7 +108,121 @@ Because all of the chunks of text originate from the same parent (Earth at Night
Vector fields have [specific types](/rest/api/searchservice/supported-data-types#edm-data-types-for-vector-fields) and extra attributes for embedding model dimensions and configuration. `Edm.Single` is a data type that works for the more commonly used LLMs. For more information about vector fields, see [Create a vector index](vector-search-how-to-create-index.md).
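
   The JSON definition referenced above is collapsed out of this diff. For orientation, here's a rough equivalent built with the azure-search-documents Python SDK; the index name, field names, and the 1536-dimension embedding size are illustrative assumptions rather than the tutorial's actual values.

```python
# Sketch of a minimal RAG index using the azure-search-documents Python SDK.
# Names and dimensions are assumptions for illustration.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration, SearchField, SearchFieldDataType,
    SearchIndex, VectorSearch, VectorSearchProfile,
)

index = SearchIndex(
    name="example-rag-index",
    fields=[
        # Key field that uniquely identifies each chunk (search document).
        SearchField(name="chunk_id", type=SearchFieldDataType.String,
                    key=True, filterable=True, sortable=True),
        # Human-readable chunk text: the grounding data sent to the LLM.
        SearchField(name="chunk", type=SearchFieldDataType.String,
                    searchable=True),
        # Vector field; dimensions must match the embedding model you use.
        SearchField(name="text_vector",
                    type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                    searchable=True, vector_search_dimensions=1536,
                    vector_search_profile_name="vector-profile"),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw-config")],
        profiles=[VectorSearchProfile(name="vector-profile",
                                      algorithm_configuration_name="hnsw-config")],
    ),
)

client = SearchIndexClient(endpoint="https://<your-service>.search.windows.net",
                           credential=AzureKeyCredential("<your-admin-key>"))
client.create_or_update_index(index)
```
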
-1. Here's the index schema for the tutorial and the NASA ebook content. It's similar to the basic schema, but adds a parent ID and metadata. It also includes fields for storing generated content that's created in the indexing pipeline.
+1. Here's the index schema for the tutorial and the Earth Book content. It's similar to the basic schema, but adds a parent ID, metadata (`title`), strings (`chunks`), and vectors for similarity search (`text_vectors`). It also includes a `locations` field for storing generated content that's created in the [indexing pipeline](tutorial-rag-build-solution-pipeline.md).
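
   The full schema is also collapsed in this diff. As a hedged sketch of the fields collection, using the field names called out above (the attribute choices are assumptions):

```python
# Sketch of the tutorial's fields collection; attributes are assumptions
# since the full definition isn't shown in this diff.
from azure.search.documents.indexes.models import SearchField, SearchFieldDataType

fields = [
    SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True),
    # Parent ID relates each chunk back to its source document.
    SearchField(name="parent_id", type=SearchFieldDataType.String, filterable=True),
    # Metadata carried over from the parent document.
    SearchField(name="title", type=SearchFieldDataType.String, searchable=True),
    # Generated content: locations extracted by a skill in the indexing pipeline.
    SearchField(name="locations",
                type=SearchFieldDataType.Collection(SearchFieldDataType.String),
                searchable=True, filterable=True),
    # Chunked text and its vector representation.
    SearchField(name="chunks", type=SearchFieldDataType.String, searchable=True),
    SearchField(name="text_vectors",
                type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=1536,
                vector_search_profile_name="vector-profile"),
]
```
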
ps 1: We have another physical resource limit for our services: vector index size. HNSW requires vector indices to reside entirely in memory. "Vector index size" is our customer-facing resource limit that governs the memory consumed by their vector data. (and this is a big reason why the beefiest VMs have 512 GB of RAM). Increasing partitions also increases the amount of vector quota for customers as well.
ps 2: A richer index has more fields and configurations, and is often better because extra fields support richer queries and more opportunities for relevance tuning. Filters and scoring profiles for boosting apply to nonvector fields. If you have content that should be matched precisely and not similarly, such as a name or employee number, then create fields to contain that information.
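
As a brief illustration of that point, a filter runs against nonvector fields such as `locations`, while free text goes to the searchable fields. The index name, key, and filter values below are assumptions:

```python
# Hypothetical query showing exact matching via an OData filter on a
# nonvector collection field, alongside a free-text search.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient("https://<your-service>.search.windows.net",
                      "example-rag-index", AzureKeyCredential("<your-query-key>"))

results = client.search(
    search_text="volcanic activity",
    filter="locations/any(loc: loc eq 'Iceland')",  # exact match, nonvector field
    select=["title", "chunks", "locations"],
    top=5,
)
for doc in results:
    print(doc["title"], doc["locations"])
```
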
## BLOCKED: Index for hybrid queries and relevance tuning

articles/search/tutorial-rag-build-solution-maximize-relevance.md (+2 −2)

@@ -12,7 +12,7 @@ ms.date: 09/12/2024
---
-# Maximize relevance (RAG tutorial - Azure AI Search)
+# Tutorial: Maximize relevance (RAG in Azure AI Search)

In this tutorial, learn how to improve the relevance of search results used in RAG solutions. Azure AI Search includes a broad range of relevance tuning capabilities. Learn which ones are best for solving specific problems.
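
As one hedged illustration of those capabilities, a scoring profile can boost matches on specific nonvector fields. The field names and weights below are assumptions, not values from this tutorial:

```python
# Illustrative scoring profile that weights title matches more heavily
# than chunk matches. Names and weights are assumptions.
from azure.search.documents.indexes.models import ScoringProfile, TextWeights

title_boost = ScoringProfile(
    name="boost-title",
    text_weights=TextWeights(weights={"title": 3.0, "chunks": 1.0}),
)
# Attach it to an index via SearchIndex(..., scoring_profiles=[title_boost]).
```
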
@@ -36,4 +36,4 @@ Key points:
## Next step
> [!div class="nextstepaction"]
-> [Reduce vector storage and costs](tutorial-rag-build-solution-optimize.md)
+> [Reduce vector storage and costs](tutorial-rag-build-solution-minimize-storage.md)

articles/search/tutorial-rag-build-solution-minimize-storage.md (+3 −1)

@@ -12,7 +12,7 @@ ms.date: 09/12/2024
---
-# Minimize storage and costs using vector compression and narrow data types (RAG tutorial - Azure AI Search)
+# Tutorial: Minimize storage and costs using vector compression and narrow data types (RAG in Azure AI Search)

In this tutorial, learn the techniques for reducing index size, with a focus on vector compression and storage.
@@ -24,6 +24,8 @@ Key points:
- narrow data types
- hnsw vs eknn, does hnsw produce a smaller footprint?

+<!-- ps 1: We have another physical resource limit for our services: vector index size. HNSW requires vector indices to reside entirely in memory. "Vector index size" is our customer-facing resource limit that governs the memory consumed by their vector data. (and this is a big reason why the beefiest VMs have 512 GB of RAM). Increasing partitions also increases the amount of vector quota for customers as well. -->
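
As a sketch of the two size-reduction levers named in the key points, assuming azure-search-documents 11.5 or later (names, dimensions, and attribute choices are illustrative):

```python
# Hedged sketch: scalar quantization compression plus a narrow vector type.
# API shapes assume azure-search-documents 11.5+; verify against your version.
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration, ScalarQuantizationCompression, SearchField,
    VectorSearch, VectorSearchProfile,
)

vector_search = VectorSearch(
    algorithms=[HnswAlgorithmConfiguration(name="hnsw-config")],
    compressions=[ScalarQuantizationCompression(compression_name="sq-compression")],
    profiles=[VectorSearchProfile(name="compressed-profile",
                                  algorithm_configuration_name="hnsw-config",
                                  compression_name="sq-compression")],
)

# Narrow data type: Collection(Edm.Half) halves per-dimension storage
# compared to Edm.Single; stored=False skips a retrievable copy of vectors.
half_vector = SearchField(name="text_vectors", type="Collection(Edm.Half)",
                          searchable=True, vector_search_dimensions=1536,
                          vector_search_profile_name="compressed-profile",
                          stored=False)
```
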