articles/search/tutorial-rag-build-solution-index-schema.md
+1-1 Lines changed: 1 addition & 1 deletion
````diff
@@ -41,7 +41,7 @@ When LLMs generate a response, they operate on chunks of content for message inp
 
 Chunks are the focus of the schema, and each chunk is the defining element of a search document in a RAG pattern. You can think of your index as a large collection of chunks, as opposed to traditional search documents that probably have more structure, such as fields containing uniform content for a name, descriptions, categories, and addresses.
 
-### Content-aware
+### Content centricity and structured data
 
 In addition to structural considerations, like chunked content, you also want to consider the substance of your content because it also informs what fields are indexed.
````
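The schema hunk describes the index as a large collection of chunks, each one its own search document. As a rough sketch of what that means for the data, and not the tutorial's own chunking logic, the helper below splits text into overlapping fixed-size character windows and emits one document per chunk. The field names `chunk_id`, `parent_id`, `title`, and `chunk`, and the window sizes, are assumptions chosen for illustration.

```python
from typing import Dict, List


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def to_search_documents(parent_id: str, title: str, text: str,
                        size: int = 200, overlap: int = 50) -> List[Dict[str, str]]:
    """Emit one search document per chunk; the field names are hypothetical."""
    return [
        {
            "chunk_id": f"{parent_id}_{i}",  # unique key for each chunk document
            "parent_id": parent_id,           # links the chunk back to its source file
            "title": title,                   # repeated on every chunk of the same file
            "chunk": chunk,                   # the content the LLM eventually receives
        }
        for i, chunk in enumerate(chunk_text(text, size, overlap))
    ]


docs = to_search_documents("doc1", "sample.pdf", "x" * 450)
print(len(docs), [len(d["chunk"]) for d in docs])  # 3 [200, 200, 150]
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some duplicated content in the index.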
articles/search/tutorial-rag-build-solution-pipeline.md
+1-1 Lines changed: 1 addition & 1 deletion
````diff
@@ -95,7 +95,7 @@ print(f"{result.name} created")
 
 ## Create a data source connection
 
-In this step, set up a connection to Azure Blob Storage. The indexer retrieves PDFs from a container. You can create the container and upload files in the Azure portal.
+In this step, set up the sample data and a connection to Azure Blob Storage. The indexer retrieves PDFs from a container. You create the container and upload files in this step.
 
 1. Sign in to the Azure portal and find your Azure Storage account.
````
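The pipeline hunk sets up an indexer connection to a Blob container of PDFs. As a hedged sketch, a data source definition in the Azure AI Search REST API is a JSON object naming the source type, the credentials, and the container; the helper below only builds that request body and sends nothing. The data source name, container name, and connection string are placeholders, and the tutorial's notebook may create the connection differently (for example, through the Python SDK), so check the REST reference for the API version you target.

```python
import json
from typing import Any, Dict, Optional


def blob_data_source_definition(name: str, connection_string: str,
                                container: str, folder: Optional[str] = None) -> Dict[str, Any]:
    """Build a data source definition for an Azure Blob container (request body only)."""
    container_spec: Dict[str, Any] = {"name": container}
    if folder:
        # Optional virtual folder inside the container that limits what the indexer reads.
        container_spec["query"] = folder
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": container_spec,
    }


# Placeholder values; substitute your own storage account details.
body = blob_data_source_definition(
    name="rag-tutorial-ds",
    connection_string="<your-storage-connection-string>",
    container="<your-container>",
)
print(json.dumps(body, indent=2))
```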
articles/search/tutorial-rag-build-solution-query.md
+18-8 Lines changed: 18 additions & 8 deletions
````diff
@@ -36,11 +36,11 @@ This tutorial builds on the previous tutorials. It assumes you have a search ind
 
 ## Download the sample
 
-You use the same notebook from the previous indexing pipeline tutorial. Scripts for querying the LLM follow the pipeline steps. If you don't already have the notebook, [download it](https://github.com/Azure-Samples/azure-search-python-samples/blob/main/Tutorial-RAG/Tutorial-rag.ipynb) from GitHub.
+You use the same notebook from the previous indexing pipeline tutorial. Scripts for querying the LLM follow the pipeline creation steps. If you don't already have the notebook, [download it](https://github.com/Azure-Samples/azure-search-python-samples/blob/main/Tutorial-RAG/Tutorial-rag.ipynb) from GitHub.
 
 ## Configure clients for sending queries
 
-The RAG pattern in Azure AI Search is a synchronized connection to a search index to obtain the grounding data, followed by a connection to an LLM to formulate a response to the user's question. The same query string is used by both clients.
+The RAG pattern in Azure AI Search is a synchronized series of connections to a search index to obtain the grounding data, followed by a connection to an LLM to formulate a response to the user's question. The same query string is used by both clients.
 
 You're setting up two clients, so you need permissions on both resources. We use API keys for this exercise. The following endpoints and keys are used for queries:
````
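The query hunk describes the RAG flow as a search request for grounding data followed by an LLM request, with one query string shared by both clients. A minimal, dependency-free sketch of that ordering follows; `search_fn` and `chat_fn` are hypothetical stand-ins for the Azure AI Search and Azure OpenAI clients, not real SDK calls, and the toy corpus is placeholder text.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str, int], List[Dict[str, str]]]
ChatFn = Callable[[str], str]


def answer(query: str, search_fn: SearchFn, chat_fn: ChatFn, top: int = 1) -> str:
    """Run the two-step RAG flow with one query string shared by both clients."""
    results = search_fn(query, top)  # step 1: the search client returns the top chunks
    sources = "\n".join(r["chunk"] for r in results)
    prompt = f"Query: {query}\nSources:\n{sources}"  # step 2: the same query goes to the LLM
    return chat_fn(prompt)


# Toy stand-ins so the sketch runs without any Azure resources.
def fake_search(query: str, top: int) -> List[Dict[str, str]]:
    corpus = [{"chunk": "first chunk text"}, {"chunk": "second chunk text"}]
    return corpus[:top]


def fake_chat(prompt: str) -> str:
    return prompt  # echoes the prompt so you can see exactly what the LLM would receive


print(answer("how much of earth is covered by water", fake_search, fake_chat))
```

Passing the clients in as functions keeps the ordering visible: the LLM call can't start until the search results exist.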
Here's the Python script that instantiates the clients, defines the prompt, and sets up the query. You can run this script in the notebook to generate a response from your chat model deployment.
In this example, the answer is based on a single input (`top=1`) consisting of the one chunk determined by the search engine to be the most relevant. Instructions in the prompt tell the LLM to use only the information in the `sources`, or formatted search results. Results from the first query`"how much of earth is covered by water"` should look similar to the following example.
````diff
+In this response, the answer is based on a single input (`top=1`) consisting of the one chunk determined by the search engine to be the most relevant. Instructions in the prompt tell the LLM to use only the information in the `sources`, or formatted search results.
+
+Results from the first query `"how much of earth is covered by water"` should look similar to the following example.
 
 :::image type="content" source="media/tutorial-rag-solution/chat-results-1.png" alt-text="Screenshot of an LLM response to a simple question using a single match from search results.":::
````
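The response text explains that the LLM sees only `sources`, the formatted search results, capped by `top`. A hedged sketch of that formatting step follows; the field names `title` and `chunk`, the divider line, and the prompt wording are assumptions modeled on the description, not copied from the notebook.

```python
from typing import Dict, List

# Illustrative grounding instructions; the notebook's actual prompt wording may differ.
GROUNDED_PROMPT = """You are a friendly assistant.
Answer the query using only the sources provided below.
If the answer isn't in the sources, say you don't know.
Query: {query}
Sources:
{sources}"""


def format_sources(results: List[Dict[str, str]], top: int) -> str:
    """Render at most `top` results as labeled blocks separated by a divider line.

    Results are assumed to arrive already ranked, most relevant first.
    """
    blocks = [f"TITLE: {r['title']}, CONTENT: {r['chunk']}" for r in results[:top]]
    return "\n=================\n".join(blocks)


def build_prompt(query: str, results: List[Dict[str, str]], top: int = 1) -> str:
    """Fill the grounded prompt with the query and the formatted sources."""
    return GROUNDED_PROMPT.format(query=query, sources=format_sources(results, top))


results = [
    {"title": "doc-a", "chunk": "most relevant chunk"},
    {"title": "doc-b", "chunk": "second chunk"},
    {"title": "doc-c", "chunk": "third chunk"},
]
print(build_prompt("how much of earth is covered by water", results, top=1))
```

With `top=1` only the highest-ranked chunk reaches the model; raising `top` appends more blocks to the same prompt.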
````diff
 
 ### Changing the inputs
 
-Increasing or decreasing the number of inputs to the LLM can have a large effect on the response. Try running the same query again after setting `top=3`. When you increase the inputs, the model returns different results each time, even if the query doesn't change. Here's one example of what the model returns after increasing the inputs to 3.
+Increasing or decreasing the number of inputs to the LLM can have a large effect on the response. Try running the same query again after setting `top=3`. When you increase the inputs, the model returns different results each time, even if the query doesn't change.
+
+Here's one example of what the model returns after increasing the inputs to 3.
 
 :::image type="content" source="media/tutorial-rag-solution/chat-results-2.png" alt-text="Screenshot of an LLM response to a simple question using a larger result set.":::
 
-Because the model is bound to just the grounding data, the answer is larger also more vague. You can use relevance tuning to potentially generate more focused answers.
+Because the model is bound to just the grounding data, the answer becomes more expansive as you increase the size of the input. You can use relevance tuning to potentially generate more focused answers.
````
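The inputs section ties a larger input to a more expansive answer and points to relevance tuning as the remedy. Relevance tuning proper happens inside Azure AI Search, which the follow-on tutorials cover; as a much simpler, client-side illustration of the same idea, the sketch below drops low-scoring results and caps the total grounding size before anything reaches the LLM. The `@search.score` key assumes results shaped like the Python SDK's result dictionaries, and the scores, threshold, and budget are arbitrary toy values; score ranges depend on the query type, so a real threshold would need tuning.

```python
from typing import Any, Dict, List


def select_grounding(results: List[Dict[str, Any]], top: int, min_score: float,
                     max_chars: int, score_key: str = "@search.score") -> List[Dict[str, Any]]:
    """Keep up to `top` results that clear `min_score` and fit within a character budget.

    Results are assumed to be sorted by descending score.
    Character count is only a rough stand-in for the model's token limit.
    """
    selected: List[Dict[str, Any]] = []
    used = 0
    for r in results[:top]:
        if r[score_key] < min_score:
            break  # later results score even lower, so stop here
        if used + len(r["chunk"]) > max_chars:
            break  # adding this chunk would exceed the budget
        selected.append(r)
        used += len(r["chunk"])
    return selected


results = [
    {"chunk": "a" * 300, "@search.score": 0.92},
    {"chunk": "b" * 300, "@search.score": 0.85},
    {"chunk": "c" * 300, "@search.score": 0.40},
]
print(len(select_grounding(results, top=3, min_score=0.5, max_chars=1000)))  # 2
```

A cap like this also bounds how much of the model's quota a single request can consume.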
````diff
 
 ### Changing the prompt
 
-You can also change the prompt to control the format of the output, tone, and whether you want the model to supplement the answer with its own training data by changing the prompt. Here's another example of LLM output if we refocus the prompt.
+You can also change the prompt to control the format of the output, the tone, and whether you want the model to supplement the answer with its own training data.
+
+Here's another example of LLM output if we refocus the prompt.
````
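The prompt section lists three things the instructions control: output format, tone, and whether the model may supplement the sources with its training data. One hedged way to keep those choices explicit is to assemble the instruction text from a few options; the function name, parameters, and wording below are hypothetical and are not the tutorial's prompt.

```python
def build_instructions(output_format: str = "bulleted list", tone: str = "friendly",
                       allow_training_data: bool = False) -> str:
    """Assemble model instructions from explicit choices (illustrative wording only)."""
    lines = [
        f"You are a {tone} assistant.",
        f"Format the answer as a {output_format}.",
    ]
    if allow_training_data:
        # Looser grounding: the model may add what it learned in training, but must say so.
        lines.append("Use the sources first; you may add general knowledge, and say when you do.")
    else:
        # Strict grounding: the answer must come from the retrieved sources alone.
        lines.append("Answer using only the sources below. "
                     "If they don't contain the answer, say you don't know.")
    return "\n".join(lines)


print(build_instructions(output_format="short paragraph", tone="concise"))
```

Keeping the grounding rule as a separate switch makes it easy to compare strict and relaxed answers for the same query and `top` value.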
````diff
 
 ```python
 # Provide instructions to the model
@@ -149,15 +157,17 @@ Output from changing just the prompt, retaining `top=3` from the previous query,
 
 :::image type="content" source="media/tutorial-rag-solution/chat-results-3.png" alt-text="Screenshot of an LLM response to a change in prompt composition.":::
 
-In this tutorial, assessing the quality of the answer is subjective, but since the model is working with the same results as the previous query, the answer feels incomplete given the body of content available. Let's try the request one last time, increasing `top=10`.
+In this tutorial, assessing the quality of the answer is subjective, but since the model is working with the same results as the previous query, the answer feels less focused, and some bullets seem only tangential to a question about the surface area of water on earth. Let's try the request one last time, increasing `top=10`.
 
 :::image type="content" source="media/tutorial-rag-solution/chat-results-4.png" alt-text="Screenshot of an LLM response to a simple question using top set to 10.":::
 
 There are several observations to note:
 
 - Raising the `top` value can exhaust available quota on the model. If there's no quota, an error message is returned.
 
-- Improving the relevance of the search results from Azure AI Search is the most effective approach for maximizing the utility of your LLM.
+- Raising the `top` value doesn't necessarily improve the outcome. The answer isn't the same as with `top=3`, but it's similar. This observation underscores an important and possibly counterintuitive point: throwing more content at an LLM doesn't always yield better results.
+
+- So what might help? Typically, the answer is relevance tuning. Improving the relevance of the search results from Azure AI Search is usually the most effective approach for maximizing the utility of your LLM.
 
 In the next series of tutorials, the focus shifts to maximizing relevance and optimizing query performance for speed and concision. We revisit the schema definition and query logic to implement relevance features, but the rest of the pipeline and models remain intact.
````