
Commit 30a1f46

Merge pull request #284794 from HeidiSteen/heidist-august
RAG article update, replacing code sample
2 parents cc36824 + ffc9cab

File tree

1 file changed (+75 lines, -91 lines)

articles/search/retrieval-augmented-generation-overview.md

Lines changed: 75 additions & 91 deletions
@@ -10,7 +10,7 @@ ms.service: cognitive-search
ms.custom:
  - ignite-2023
ms.topic: conceptual
-ms.date: 07/29/2024
+ms.date: 08/15/2024
---

# Retrieval Augmented Generation (RAG) in Azure AI Search
@@ -90,15 +90,15 @@ Since you probably know what kind of content you want to search over, consider t
| Content type | Indexed as | Features |
|--------------|------------|----------|
| text | tokens, unaltered text | [Indexers](search-indexer-overview.md) can pull plain text from other Azure resources like Azure Storage and Cosmos DB. You can also [push any JSON content](search-what-is-data-import.md) to an index. To modify text in flight, use [analyzers](search-analyzers.md) and [normalizers](search-normalizers.md) to add lexical processing during indexing. [Synonym maps](search-synonyms.md) are useful if source documents are missing terminology that might be used in a query. |
-| text | vectors <sup>1</sup> | Text can be chunked and vectorized externally and then [indexed as vector fields](vector-search-how-to-create-index.md) in your index. |
+| text | vectors <sup>1</sup> | Text can be chunked and vectorized in an indexer pipeline, or handled externally and then [indexed as vector fields](vector-search-how-to-create-index.md) in your index. |
| image | tokens, unaltered text <sup>2</sup> | [Skills](cognitive-search-working-with-skillsets.md) for OCR and Image Analysis can process images for text recognition or image characteristics. Image information is converted to searchable text and added to the index. Skills have an indexer requirement. |
-| image | vectors <sup>1</sup> | Images can be vectorized externally for a mathematical representation of image content and then [indexed as vector fields](vector-search-how-to-create-index.md) in your index. You can use an open source model like [OpenAI CLIP](https://github.com/openai/CLIP/blob/main/README.md) to vectorize text and images in the same embedding space.|
+| image | vectors <sup>1</sup> | Images can be vectorized in an indexer pipeline, or handled externally for a mathematical representation of image content and then [indexed as vector fields](vector-search-how-to-create-index.md) in your index. You can use [Azure AI Vision multimodal](/azure/ai-services/computer-vision/how-to/image-retrieval) or an open source model like [OpenAI CLIP](https://github.com/openai/CLIP/blob/main/README.md) to vectorize text and images in the same embedding space.|
<!-- | audio | vectors <sup>1</sup> | Vectorized audio content can be [indexed as vector fields](vector-search-how-to-create-index.md) in your index. Vectorization of audio content often requires intermediate processing that converts audio to text, and then text to vectors. [Azure AI Speech](/azure/ai-services/speech-service/overview) and [OpenAI Whisper](https://platform.openai.com/docs/guides/speech-to-text) are two examples for this scenario. |
| video | vectors <sup>1</sup> | Vectorized video content can be [indexed as vector fields](vector-search-how-to-create-index.md) in your index. Similar to audio, vectorization of video content also requires extra processing, such as breaking up the video into frames or smaller chunks for vectorization. | -->

-<sup>1</sup> The generally available functionality of [vector support](vector-search-overview.md) requires that you call other libraries or models for data chunking and vectorization. However, [integrated vectorization](vector-search-integrated-vectorization.md) embeds these steps. For code samples showing both approaches, see [azure-search-vectors repo](https://github.com/Azure/azure-search-vector-samples).
+<sup>1</sup> Azure AI Search provides [integrated data chunking and vectorization](vector-search-integrated-vectorization.md), but you must take a dependency on indexers and skillsets. If you can't use an indexer, Microsoft's [Semantic Kernel](/semantic-kernel/overview/) or other community offerings can help you with a full-stack solution. For code samples showing both approaches, see the [azure-search-vectors repo](https://github.com/Azure/azure-search-vector-samples).

-<sup>2</sup> [Skills](cognitive-search-working-with-skillsets.md) are built-in support for [AI enrichment](cognitive-search-concept-intro.md). For OCR and Image Analysis, the indexing pipeline makes an internal call to the Azure AI Vision APIs. These skills pass an extracted image to Azure AI for processing, and receive the output as text that's indexed by Azure AI Search.
+<sup>2</sup> [Skills](cognitive-search-working-with-skillsets.md) are built-in support for [applied AI](cognitive-search-concept-intro.md). For OCR and Image Analysis, the indexing pipeline makes an internal call to the Azure AI Vision APIs. These skills pass an extracted image to Azure AI for processing, and receive the output as text that's indexed by Azure AI Search. Skills are also used for integrated data chunking (Text Split skill) and integrated embedding (skills that call Azure AI Vision multimodal, Azure OpenAI, and models in the Azure AI Studio model catalog).

Vectors provide the best accommodation for dissimilar content (multiple file formats and languages) because content is expressed universally in mathematic representations. Vectors also support similarity search: matching on the coordinates that are most similar to the vector query. Compared to keyword search (or term search) that matches on tokenized terms, similarity search is more nuanced. It's a better choice if there's ambiguity or interpretation requirements in the content or in queries.
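To make footnote 1's externally handled path concrete, here's a minimal sketch that chunks text, embeds each chunk with an Azure OpenAI deployment, and pushes the chunks into vector fields. The endpoint, index name, field names, chunk size, and deployment name are assumptions for illustration, not values from the article:

```python
# Hypothetical sketch of the "handled externally" path from footnote 1:
# chunk text, embed each chunk, and push the results to vector fields.
# Endpoint, index name, field names, and deployment name are placeholders.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from openai import AzureOpenAI

credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    azure_ad_token_provider=token_provider,
)
search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index>",
    credential=credential,
)

text = open("source-document.txt").read()

# Naive fixed-size chunking; real pipelines add overlap and respect sentence boundaries.
chunks = [text[i:i + 2000] for i in range(0, len(text), 2000)]

# Embed each chunk and upload it as a document with a vector field.
docs = []
for i, chunk in enumerate(chunks):
    embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002",  # your embedding deployment name (placeholder)
        input=chunk,
    ).data[0].embedding
    docs.append({"id": str(i), "content": chunk, "embedding": embedding})

search_client.upload_documents(documents=docs)
```

Integrated vectorization moves the chunking and embedding steps into an indexer skillset so you don't run this loop yourself.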

@@ -131,115 +131,99 @@ Fields appear in search results when the attribute is "retrievable". A field def
Rows are matches to the query, ranked by relevance, similarity, or both. By default, results are capped at the top 50 matches for full text search or k-nearest-neighbor matches for vector search. You can change the defaults to increase or decrease the limit up to the maximum of 1,000 documents. You can also use top and skip paging parameters to retrieve results as a series of paged results.
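As a quick sketch of that top and skip paging pattern, assuming the hotels-sample-index used later in this article (the query string and page count are arbitrary):

```python
# Hypothetical paging sketch: retrieve results in pages of 50 using top and skip.
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",  # placeholder
    index_name="hotels-sample-index",
    credential=DefaultAzureCredential(),
)

page_size = 50
for page in range(3):
    results = search_client.search(
        search_text="beach access",
        top=page_size,            # page size
        skip=page * page_size,    # offset into the ranked results
    )
    for result in results:
        print(result["HotelName"])
```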

-### Rank by relevance
+### Maximize relevance and recall

When you're working with complex processes, a large amount of data, and expectations for millisecond responses, it's critical that each step adds value and improves the quality of the end result. On the information retrieval side, *relevance tuning* is an activity that improves the quality of the results sent to the LLM. Only the most relevant or the most similar matching documents should be included in results.

-Relevance applies to keyword (nonvector) search and to hybrid queries (over the nonvector fields). In Azure AI Search, there's no relevance tuning for similarity search and vector queries. [BM25 ranking](index-similarity-and-scoring.md) is the ranking algorithm for full text search.
+Here are some tips for maximizing relevance and recall:

-Relevance tuning is supported through features that enhance BM25 ranking. These approaches include:
++ [Hybrid queries](hybrid-search-how-to-query.md) that combine keyword (nonvector) search and vector search give you maximum recall when the inputs are the same. In a hybrid query, a text string and its vector equivalent generate parallel queries for keywords and similarity search, returning the most relevant matches from each query type in a unified result set.

-+ [Scoring profiles](index-add-scoring-profiles.md) that boost the search score if matches are found in a specific search field or on other criteria.
-+ [Semantic ranking](semantic-ranking.md) that re-ranks a BM25 results set, using semantic models from Bing to reorder results for a better semantic fit to the original query.
++ Hybrid queries can also be expansive. You can run similarity search over verbose chunked content, and keyword search over names, all in the same request.

-In comparison and benchmark testing, hybrid queries with text and vector fields, supplemented with semantic ranking over the BM25-ranked results, produce the most relevant results.
++ Relevance tuning is supported through:
+
+  + [Scoring profiles](index-add-scoring-profiles.md) that boost the search score if matches are found in a specific search field or on other criteria.
+
+  + [Semantic ranking](semantic-ranking.md) that re-ranks an initial results set, using semantic models from Bing to reorder results for a better semantic fit to the original query.
+
+  + Query parameters for fine-tuning. You can [bump up the importance of vector queries](vector-search-how-to-query.md#vector-weighting) or [adjust the amount of BM25-ranked results](vector-search-how-to-query.md#maxtextsizerecall-for-hybrid-search-preview) in a hybrid query. You can also [set minimum thresholds to exclude low scoring results](vector-search-how-to-query.md#set-thresholds-to-exclude-low-scoring-results-preview) from a vector query.
+
+In comparison and benchmark testing, hybrid queries with text and vector fields, supplemented with semantic ranking, produce the most relevant results.

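A minimal sketch of the pattern those tips describe, a hybrid query with semantic reranking; the endpoint, index name, vector field `embedding`, semantic configuration `default`, and the `get_embedding` helper are hypothetical:

```python
# Hypothetical sketch: hybrid query (keyword + vector) with semantic reranking.
# Endpoint, index name, field names, and semantic configuration are placeholders;
# query_vector would come from the same embedding model used at indexing time.
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index>",
    credential=DefaultAzureCredential(),
)

query = "quiet hotel walkable to live music"
query_vector = get_embedding(query)  # hypothetical helper returning a list of floats

results = search_client.search(
    search_text=query,  # keyword (BM25) half of the hybrid query
    vector_queries=[
        VectorizedQuery(vector=query_vector, k_nearest_neighbors=50, fields="embedding")
    ],
    query_type="semantic",  # rerank the fused results with semantic ranking
    semantic_configuration_name="default",
    top=5,
)

for result in results:
    print(result["@search.score"], result["HotelName"])
```

Hybrid retrieval fuses the BM25 and vector result sets (with Reciprocal Rank Fusion) before the semantic reranker reorders the top candidates.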
### Example code of an Azure AI Search query for RAG scenarios

-The following code is copied from the [retrievethenread.py](https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/approaches/retrievethenread.py) file from a demo site. It produces `sources_content` for the LLM from hybrid query search results. You can write a simpler query, but this example is inclusive of vector search and keyword search with semantic reranking and spell check. In the demo, this query is used to get initial content.
+The following Python code demonstrates the essential components of a RAG workflow in Azure AI Search. You need to set up the clients, define a system prompt, and provide a query. The prompt tells the LLM to use just the results from the query, and how to return the results. For more steps based on this example, see this [RAG quickstart](search-get-started-rag.md).

```python
-# Use semantic ranker if requested and if retrieval mode is text or hybrid (vectors + text)
-if overrides.get("semantic_ranker") and has_text:
-    r = await self.search_client.search(query_text,
-                                        filter=filter,
-                                        query_type=QueryType.SEMANTIC,
-                                        query_language="en-us",
-                                        query_speller="lexicon",
-                                        semantic_configuration_name="default",
-                                        top=top,
-                                        query_caption="extractive|highlight-false" if use_semantic_captions else None,
-                                        vector=query_vector,
-                                        top_k=50 if query_vector else None,
-                                        vector_fields="embedding" if query_vector else None)
-else:
-    r = await self.search_client.search(query_text,
-                                        filter=filter,
-                                        top=top,
-                                        vector=query_vector,
-                                        top_k=50 if query_vector else None,
-                                        vector_fields="embedding" if query_vector else None)
-if use_semantic_captions:
-    results = [doc[self.sourcepage_field] + ": " + nonewlines(" . ".join([c.text for c in doc['@search.captions']])) async for doc in r]
-else:
-    results = [doc[self.sourcepage_field] + ": " + nonewlines(doc[self.content_field]) async for doc in r]
-content = "\n".join(results)
+# Set up the query for generating responses
+from azure.identity import DefaultAzureCredential
+from azure.identity import get_bearer_token_provider
+from azure.search.documents import SearchClient
+from openai import AzureOpenAI
+
+credential = DefaultAzureCredential()
+token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
+openai_client = AzureOpenAI(
+    api_version="2024-06-01",
+    azure_endpoint=AZURE_OPENAI_ACCOUNT,
+    azure_ad_token_provider=token_provider
+)
+
+search_client = SearchClient(
+    endpoint=AZURE_SEARCH_SERVICE,
+    index_name="hotels-sample-index",
+    credential=credential
+)
+
+# This prompt provides instructions to the model
+GROUNDED_PROMPT="""
+You are a friendly assistant that recommends hotels based on activities and amenities.
+Answer the query using only the sources provided below in a friendly and concise bulleted manner.
+Answer ONLY with the facts listed in the list of sources below.
+If there isn't enough information below, say you don't know.
+Do not generate answers that don't use the sources below.
+Query: {query}
+Sources:\n{sources}
+"""
+
+# Query is the question being asked
+query="Can you recommend a few hotels near the ocean with beach access and good views"
+
+# Retrieve the selected fields from the search index related to the question
+search_results = search_client.search(
+    search_text=query,
+    top=5,
+    select="Description,HotelName,Tags"
+)
+sources_formatted = "\n".join([f'{document["HotelName"]}:{document["Description"]}:{document["Tags"]}' for document in search_results])
+
+response = openai_client.chat.completions.create(
+    messages=[
+        {
+            "role": "user",
+            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
+        }
+    ],
+    model="gpt-4o"
+)
+
+print(response.choices[0].message.content)
```
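The sample assumes `AZURE_OPENAI_ACCOUNT` and `AZURE_SEARCH_SERVICE` are defined elsewhere; reading them from environment variables is one way to supply them (an assumption, not part of the sample):

```python
# Assumed setup for the sample above: read endpoints from environment variables.
import os

AZURE_OPENAI_ACCOUNT = os.environ["AZURE_OPENAI_ACCOUNT"]  # e.g. https://<resource>.openai.azure.com
AZURE_SEARCH_SERVICE = os.environ["AZURE_SEARCH_SERVICE"]  # e.g. https://<service>.search.windows.net
```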
## Integration code and LLMs

-A RAG solution that includes Azure AI Search requires other components and code to create a complete solution. Whereas the previous sections covered information retrieval through Azure AI Search and which features are used to create and query searchable content, this section introduces LLM integration and interaction.
+A RAG solution that includes Azure AI Search can leverage [built-in data chunking and vectorization capabilities](vector-search-integrated-vectorization.md), or you can build your own using platforms like Semantic Kernel, LangChain, or LlamaIndex.

183-
Notebooks in the demo repositories are a great starting point because they show patterns for passing search results to an LLM. Most of the code in a RAG solution consists of calls to the LLM so you need to develop an understanding of how those APIs work, which is outside the scope of this article.
184-
185-
The following cell block in the [chat-read-retrieve-read.ipynb](https://github.com/Azure-Samples/openai/blob/main/End_to_end_Solutions/AOAISearchDemo/notebooks/chat-read-retrieve-read.ipynb) notebook shows search calls in the context of a chat session:
186-
187-
```python
188-
# Execute this cell multiple times updating user_input to accumulate chat history
189-
user_input = "Does my plan cover annual eye exams?"
190-
191-
# Exclude category, to simulate scenarios where there's a set of docs you can't see
192-
exclude_category = None
193-
194-
if len(history) > 0:
195-
completion = openai.Completion.create(
196-
engine=AZURE_OPENAI_GPT_DEPLOYMENT,
197-
prompt=summary_prompt_template.format(summary="\n".join(history), question=user_input),
198-
temperature=0.7,
199-
max_tokens=32,
200-
stop=["\n"])
201-
search = completion.choices[0].text
202-
else:
203-
search = user_input
204-
205-
# Alternatively simply use search_client.search(q, top=3) if not using semantic search
206-
print("Searching:", search)
207-
print("-------------------")
208-
filter = "category ne '{}'".format(exclude_category.replace("'", "''")) if exclude_category else None
209-
r = search_client.search(search,
210-
filter=filter,
211-
query_type=QueryType.SEMANTIC,
212-
query_language="en-us",
213-
query_speller="lexicon",
214-
semantic_configuration_name="default",
215-
top=3)
216-
results = [doc[KB_FIELDS_SOURCEPAGE] + ": " + doc[KB_FIELDS_CONTENT].replace("\n", "").replace("\r", "") for doc in r]
217-
content = "\n".join(results)
218-
219-
prompt = prompt_prefix.format(sources=content) + prompt_history + user_input + turn_suffix
220-
221-
completion = openai.Completion.create(
222-
engine=AZURE_OPENAI_CHATGPT_DEPLOYMENT,
223-
prompt=prompt,
224-
temperature=0.7,
225-
max_tokens=1024,
226-
stop=["<|im_end|>", "<|im_start|>"])
227-
228-
prompt_history += user_input + turn_suffix + completion.choices[0].text + "\n<|im_end|>" + turn_prefix
229-
history.append("user: " + user_input)
230-
history.append("assistant: " + completion.choices[0].text)
231-
232-
print("\n-------------------\n".join(history))
233-
print("\n-------------------\nPrompt:\n" + prompt)
234-
```
218+
[Notebooks in the demo repository](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python/code/community-integration) are a great starting point because they show patterns for LLM integration. Much of the code in a RAG solution consists of calls to the LLM so you need to develop an understanding of how those APIs work, which is outside the scope of this article.
235219
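To give a sense of that LLM-side code, here's a hedged sketch of a multi-turn variant of the earlier example: each user turn is grounded in fresh search results while prior turns accumulate in the messages list. It reuses the `openai_client`, `search_client`, and `GROUNDED_PROMPT` names from the example above; the pattern is illustrative, not a prescribed API.

```python
# Hypothetical multi-turn sketch: ground each user turn in fresh search results
# while carrying prior turns in the messages list. Reuses openai_client,
# search_client, and GROUNDED_PROMPT from the example above.
messages = []

def ask(query):
    # Retrieve grounding data for this turn.
    results = search_client.search(search_text=query, top=5, select="Description,HotelName,Tags")
    sources = "\n".join(f'{doc["HotelName"]}:{doc["Description"]}:{doc["Tags"]}' for doc in results)

    # Append the grounded user turn, then the model's reply, to the history.
    messages.append({"role": "user", "content": GROUNDED_PROMPT.format(query=query, sources=sources)})
    response = openai_client.chat.completions.create(messages=messages, model="gpt-4o")
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer

print(ask("Which hotels have rooftop bars?"))
print(ask("Of those, which is closest to the beach?"))
```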

## How to get started
+ [Use Azure AI Studio to create a search index](/azure/ai-studio/how-to/index-add).
+ [Use Azure OpenAI Studio and "bring your own data"](/azure/ai-services/openai/concepts/use-your-data) to experiment with prompts on an existing search index in a playground. This step helps you decide what model to use, and shows you how well your existing index works in a RAG scenario.

-+ [Try this quickstart](search-get-started-rag.md) for a demonstration of query integration with chat models over a search index.
++ [Try this RAG quickstart](search-get-started-rag.md) for a demonstration of query integration with chat models over a search index.

+ Start with solution accelerators: