
Commit e4d306f

resized images, edits to query doc
1 parent fb10a26 commit e4d306f

4 files changed: +20 -10 lines changed

4 files changed

+20
-10
lines changed
27.4 KB
Loading

articles/search/tutorial-rag-build-solution-index-schema.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ When LLMs generate a response, they operate on chunks of content for message inp

 Chunks are the focus of the schema, and each chunk is the defining element of a search document in a RAG pattern. You can think of your index as a large collection of chunks, as opposed to traditional search documents that probably have more structure, such as fields containing uniform content for a name, descriptions, categories, and addresses.

-### Content-aware
+### Content centricity and structured data

 In addition to structural considerations, like chunked content, you also want to consider the substance of your content because it also informs what fields are indexed.

articles/search/tutorial-rag-build-solution-pipeline.md

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ print(f"{result.name} created")

 ## Create a data source connection

-In this step, set up a connection to Azure Blob Storage. The indexer retrieves PDFs from a container. You can create the container and upload files in the Azure portal.
+In this step, set up the sample data and a connection to Azure Blob Storage. The indexer retrieves PDFs from a container, which you create and populate as part of this step.

 1. Sign in to the Azure portal and find your Azure Storage account.

articles/search/tutorial-rag-build-solution-query.md

Lines changed: 18 additions & 8 deletions
@@ -36,11 +36,11 @@ This tutorial builds on the previous tutorials. It assumes you have a search ind

 ## Download the sample

-You use the same notebook from the previous indexing pipeline tutorial. Scripts for querying the LLM follow the pipeline steps. If you don't already have the notebook, [download it](https://github.com/Azure-Samples/azure-search-python-samples/blob/main/Tutorial-RAG/Tutorial-rag.ipynb) from GitHub.
+You use the same notebook from the previous indexing pipeline tutorial. Scripts for querying the LLM follow the pipeline creation steps. If you don't already have the notebook, [download it](https://github.com/Azure-Samples/azure-search-python-samples/blob/main/Tutorial-RAG/Tutorial-rag.ipynb) from GitHub.

 ## Configure clients for sending queries

-The RAG pattern in Azure AI Search is a synchronized connection to a search index to obtain the grounding data, followed by a connection to an LLM to formulate a response to the user's question. The same query string is used by both clients.
+The RAG pattern in Azure AI Search is a synchronized series of connections: first to a search index to obtain the grounding data, then to an LLM to formulate a response to the user's question. The same query string is used by both clients.

 You're setting up two clients, so you need permissions on both resources. We use API keys for this exercise. The following endpoints and keys are used for queries:
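The two-connection flow described in this hunk — retrieve grounding data, then ask the LLM, with the same query string driving both steps — can be sketched independently of the Azure SDKs. The function and variable names below are illustrative stand-ins, not code from the tutorial:

```python
# Minimal sketch of the RAG query flow: the same query string drives both
# the retrieval step and the LLM step. search_fn and llm_fn stand in for
# the Azure AI Search and Azure OpenAI clients configured in the notebook.
def rag_query(query, search_fn, llm_fn, top=1):
    # Step 1: retrieve the top-ranked chunks as grounding data.
    chunks = search_fn(query, top)
    sources = "\n".join(chunks)
    # Step 2: send the same query plus the formatted sources to the LLM.
    return llm_fn(query=query, sources=sources)

# Stub "clients" to demonstrate the flow without live services.
def fake_search(query, top):
    corpus = ["About 71% of Earth's surface is water.", "Oceans hold 96.5% of it."]
    return corpus[:top]

def fake_llm(query, sources):
    return f"Answer to '{query}' grounded in {len(sources.splitlines())} source(s)."

print(rag_query("how much of earth is covered by water", fake_search, fake_llm, top=2))
```

In the notebook, `search_fn` and `llm_fn` correspond to calls on the two real clients, which is why both resources need permissions.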
@@ -54,6 +54,8 @@ AZURE_OPENAI_KEY: str = "PUT YOUR AZURE OPENAI KEY HERE"

 ## Example script for prompt and query

+Here's the Python script that instantiates the clients, defines the prompt, and sets up the query. You can run this script in the notebook to generate a response from your chat model deployment.
+
 ```python
 # Import libraries
 from azure.search.documents import SearchClient
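The diff truncates the script after its first import. The key step between the two clients — flattening search results into a `sources` block the prompt can reference — might look like the following sketch. The `title` and `chunk` field names and the prompt wording are assumptions, not the tutorial's actual script:

```python
# Hypothetical search results, shaped like documents from a chunked index
# (the 'title' and 'chunk' field names are assumptions for illustration).
results = [
    {"title": "earth_at_night.pdf", "chunk": "About 71% of Earth's surface is covered by water."},
    {"title": "earth_at_night.pdf", "chunk": "Oceans contain roughly 96.5% of all Earth's water."},
]

# Flatten each result into one line so the prompt can cite its sources.
sources = "\n".join(f"{doc['title']}: {doc['chunk']}" for doc in results)

# The prompt instructs the model to answer only from the sources block.
GROUNDED_PROMPT = (
    "Answer the query using only the sources provided below.\n"
    "Query: {query}\n"
    "Sources:\n{sources}"
)
prompt = GROUNDED_PROMPT.format(
    query="how much of earth is covered by water", sources=sources
)
print(prompt)
```

The formatted string is what the chat client receives; the number of entries in `results` is what the `top` parameter controls in the sections that follow.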
@@ -115,21 +117,27 @@ print(response.choices[0].message.content)

 ## Review results

-In this example, the answer is based on a single input (`top=1`) consisting of the one chunk determined by the search engine to be the most relevant. Instructions in the prompt tell the LLM to use only the information in the `sources`, or formatted search results. Results from the first query`"how much of earth is covered by water"` should look similar to the following example.
+In this response, the answer is based on a single input (`top=1`) consisting of the one chunk determined by the search engine to be the most relevant. Instructions in the prompt tell the LLM to use only the information in the `sources`, or formatted search results.
+
+Results from the first query, `"how much of earth is covered by water"`, should look similar to the following example.

 :::image type="content" source="media/tutorial-rag-solution/chat-results-1.png" alt-text="Screenshot of an LLM response to a simple question using a single match from search results.":::

 ### Changing the inputs

-Increasing or decreasing the number of inputs to the LLM can have a large effect on the response. Try running the same query again after setting `top=3`. When you increase the inputs, the model returns different results each time, even if the query doesn't change. Here's one example of what the model returns after increasing the inputs to 3.
+Increasing or decreasing the number of inputs to the LLM can have a large effect on the response. Try running the same query again after setting `top=3`. When you increase the inputs, the model returns different results each time, even if the query doesn't change.
+
+Here's one example of what the model returns after increasing the inputs to 3.

 :::image type="content" source="media/tutorial-rag-solution/chat-results-2.png" alt-text="Screenshot of an LLM response to a simple question using a larger result set.":::

-Because the model is bound to just the grounding data, the answer is larger also more vague. You can use relevance tuning to potentially generate more focused answers.
+Because the model is bound to just the grounding data, the answer becomes more expansive as you increase the size of the input. You can use relevance tuning to potentially generate more focused answers.

 ### Changing the prompt

-You can also change the prompt to control the format of the output, tone, and whether you want the model to supplement the answer with its own training data by changing the prompt. Here's another example of LLM output if we refocus the prompt.
+You can also change the prompt to control the format of the output, the tone, and whether you want the model to supplement the answer with its own training data.
+
+Here's another example of LLM output if we refocus the prompt.

 ```python
 # Provide instructions to the model
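The refocused prompt itself is truncated in this hunk. As an illustration only — the wording below is an assumption, not the tutorial's actual instruction text — refocusing might constrain the format and tone while keeping the same query and sources:

```python
# Illustrative only: a refocused prompt that constrains format and tone.
# The tutorial's actual instruction text is not shown in this diff.
REFOCUSED_PROMPT = (
    "You are a friendly assistant. Answer the query as a bulleted list, "
    "using only the facts found in the sources below. If the sources don't "
    "contain the answer, say you don't know.\n"
    "Query: {query}\nSources:\n{sources}"
)

message = REFOCUSED_PROMPT.format(
    query="how much of earth is covered by water",
    sources="About 71% of Earth's surface is covered by water.",
)
print(message)
```

Only the instruction text changes; the query string and the retrieved sources are reused unchanged, which is what isolates the effect of the prompt in the comparison below.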
@@ -149,15 +157,17 @@ Output from changing just the prompt, retaining `top=3` from the previous query,

 :::image type="content" source="media/tutorial-rag-solution/chat-results-3.png" alt-text="Screenshot of an LLM response to a change in prompt composition.":::

-In this tutorial, assessing the quality of the answer is subjective, but since the model is working with the same results as the previous query, the answer feels incomplete given the body of content available. Let's try the request one last time, increasing `top=10`.
+In this tutorial, assessing the quality of the answer is subjective, but since the model is working with the same results as the previous query, the answer feels less focused, and some bullets seem only tangential to a question about the surface area of water on earth. Let's try the request one last time, increasing `top=10`.

 :::image type="content" source="media/tutorial-rag-solution/chat-results-4.png" alt-text="Screenshot of an LLM response to a simple question using top set to 10.":::

 There are several observations to note:

 - Raising the `top` value can exhaust available quota on the model. If there's no quota, an error message is returned.

-- Improving the relevance of the search results from Azure AI Search is the most effective approach for maximizing the utility of your LLM.
+- Raising the `top` value doesn't necessarily improve the outcome. The answer isn't the same as `top=3`, but it's similar. This observation underscores a point that might be counterintuitive: throwing more content at an LLM doesn't always yield better results.
+
+- So what might help? Typically, the answer is relevance tuning. Improving the relevance of the search results from Azure AI Search is usually the most effective approach for maximizing the utility of your LLM.

 In the next series of tutorials, the focus shifts to maximizing relevance and optimizing query performance for speed and concision. We revisit the schema definition and query logic to implement relevance features, but the rest of the pipeline and models remain intact.
