
Commit 401c510

Merge pull request #280276 from khelanmodi/origin/ragvcore
RAG with vCore conceptual doc
2 parents 4e8d179 + 0623896 commit 401c510

File tree

5 files changed

+263
-0
lines changed


articles/cosmos-db/mongodb/vcore/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -36,6 +36,8 @@
3636
href: vector-search.md
3737
- name: Open-source vector databases
3838
href: vector-search-ai.md
39+
- name: RAG with Langchain & OpenAI
40+
href: rag.md
3941
- name: MongoDB feature support
4042
href: compatibility.md
4143
- name: High availability (HA)

articles/cosmos-db/mongodb/vcore/index.yml

Lines changed: 2 additions & 0 deletions
@@ -36,6 +36,8 @@ landingContent:
3636
url: vector-search.md
3737
- text: Open-source vector database
3838
url: vector-search-ai.md
39+
- text: RAG
40+
url: rag.md
3941
- title: Develop applications
4042
linkLists:
4143
- linkListType: tutorial
Lines changed: 259 additions & 0 deletions
@@ -0,0 +1,259 @@
---
title: Optimize Retrieval-Augmented Generation (RAG) with Azure Cosmos DB for MongoDB (vCore), LangChain, and OpenAI
titleSuffix: Azure Cosmos DB
description: Learn how to enhance AI-based applications using Retrieval-Augmented Generation (RAG) with Azure Cosmos DB for MongoDB (vCore), LangChain, and OpenAI. Discover key concepts, architecture, and real-world applications.
author: khelanmodi
ms.author: khelanmodi
ms.reviewer: gahllevy
ms.service: cosmos-db
ms.subservice: mongodb-vcore
ms.topic: conceptual
ms.date: 07/08/2024
---
13+
14+
# RAG with vCore-based Azure Cosmos DB for MongoDB
15+
In the fast-evolving realm of generative AI, Large Language Models (LLMs) like GPT-3.5 have transformed natural language processing. An emerging trend builds on these models: vector stores, which play a pivotal role in enhancing AI applications.
16+
17+
This tutorial shows how to use Azure Cosmos DB for MongoDB (vCore), LangChain, and OpenAI to implement Retrieval-Augmented Generation (RAG). It first covers LLMs and their limitations, then introduces the rapidly adopted RAG paradigm, the LangChain framework, and Azure OpenAI models. Finally, it integrates these concepts into a real-world application. By the end, readers will have a solid understanding of these concepts.
18+
19+
## Understand Large Language Models (LLMs) and their limitations
20+
21+
Large Language Models (LLMs) are advanced deep neural network models trained on extensive text datasets, enabling them to understand and generate human-like text. While revolutionary in natural language processing, LLMs have inherent limitations:
22+
23+
- **Hallucinations**: LLMs sometimes generate factually incorrect or ungrounded information, known as "hallucinations."
24+
- **Stale Data**: LLMs are trained on static datasets that might not include the most recent information, limiting their current relevance.
25+
- **No Access to User’s Local Data**: LLMs don't have direct access to personal or localized data, restricting their ability to provide personalized responses.
26+
- **Token Limits**: LLMs have a maximum token limit per interaction, constraining the amount of text they can process at once. For example, the original version of OpenAI’s `gpt-3.5-turbo` has a context window of 4,096 tokens.
27+
28+
## Leverage Retrieval-Augmented Generation (RAG)
29+
30+
Retrieval-augmented generation (RAG) is an architecture designed to overcome LLM limitations. RAG uses vector search to retrieve relevant documents based on an input query, providing these documents as context to the LLM for generating more accurate responses. Instead of relying solely on pretrained patterns, RAG enhances responses by incorporating up-to-date, relevant information. This approach helps to:
31+
32+
- **Minimize Hallucinations**: Grounding responses in factual information.
33+
- **Ensure Current Information**: Retrieving the most recent data to ensure up-to-date responses.
34+
- **Utilize External Databases**: Though it doesn't grant direct access to personal data, RAG allows integration with external, user-specific knowledge bases.
35+
- **Optimize Token Usage**: By focusing on the most relevant documents, RAG makes token usage more efficient.
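To make the grounding and token-usage points concrete, here is a minimal sketch of how retrieved passages might be packed into a prompt under a size budget. The `build_prompt` helper, the sample passages, and the use of a character count as a stand-in for a token count are all illustrative assumptions, not part of the sample application.

```python
def build_prompt(question: str, passages: list[str], max_chars: int = 500) -> str:
    """Assemble a grounded prompt from retrieved passages within a size budget."""
    context_parts: list[str] = []
    used = 0
    for passage in passages:  # passages are assumed to be sorted by relevance
        if used + len(passage) > max_chars:
            break  # stop before the context exceeds the budget
        context_parts.append(passage)
        used += len(passage)
    context = "\n".join(f"- {part}" for part in context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical retrieved passages, most relevant first.
passages = [
    "Hamachi Fig: hamachi sashimi in a fig sauce. Price: 16.0 USD",
    "A second, less relevant passage about another dish.",
]
prompt = build_prompt("How much is the Hamachi Fig?", passages, max_chars=80)
print(prompt)
```

With the 80-character budget, only the first passage fits, so the less relevant one never reaches the model. A real application would count tokens with the model's tokenizer rather than characters.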
36+
37+
This tutorial demonstrates how RAG can be implemented using Azure Cosmos DB for MongoDB (vCore) to build a question-answering application tailored to your data.
38+
39+
## Application architecture overview
40+
41+
The architecture diagram below illustrates the key components of our RAG implementation:
42+
43+
![Architecture Diagram](./media/vector/architecture-diagram.png)
44+
45+
## Key components and frameworks
46+
47+
We'll now discuss the various frameworks, models, and components used in this tutorial, emphasizing their roles and nuances.
48+
49+
### Azure Cosmos DB for MongoDB (vCore)
50+
51+
Azure Cosmos DB for MongoDB (vCore) supports semantic similarity searches, which are essential for AI-powered applications. It allows data in various formats to be represented as vector embeddings, which can be stored alongside source data and metadata. Using an approximate nearest neighbors algorithm, such as Hierarchical Navigable Small World (HNSW), these embeddings can be queried for fast semantic similarity searches.
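As a rough illustration of what a semantic similarity search computes, the sketch below ranks a few toy three-dimensional vectors by cosine similarity to a query vector. It is an exact, brute-force search over made-up data; an approximate index such as HNSW aims to return similar results without comparing the query against every stored vector. The function names and sample vectors are assumptions for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], items: dict[str, list[float]],
          k: int = 2, score_threshold: float = 0.0) -> list[tuple[float, str]]:
    """Return up to k (score, text) pairs at or above the threshold, best first."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in items.items()]
    kept = [pair for pair in scored if pair[0] >= score_threshold]
    return sorted(kept, reverse=True)[:k]


# Toy embeddings; real embeddings have far more dimensions (for example, 1536).
toy_embeddings = {
    "grilled beef skewers": [0.9, 0.1, 0.0],
    "beef tartare": [0.8, 0.2, 0.1],
    "fig sorbet": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]
results = top_k(query_vector, toy_embeddings, k=2, score_threshold=0.5)
for score, text in results:
    print(f"{score:.3f}  {text}")
```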
52+
53+
### LangChain framework
54+
55+
LangChain simplifies the creation of LLM applications by providing a standard interface for chains, multiple tool integrations, and end-to-end chains for common tasks. It enables AI developers to build LLM applications that leverage external data sources.
56+
57+
Key aspects of LangChain:
58+
59+
- **Chains**: Sequences of components solving specific tasks.
60+
- **Components**: Modules like LLM wrappers, vector store wrappers, prompt templates, data loaders, text splitters, and retrievers.
61+
- **Modularity**: Simplifies development, debugging, and maintenance.
62+
- **Popularity**: An open-source project rapidly gaining adoption and evolving to meet user needs.
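The idea of a chain as a sequence of components can be sketched in plain Python. This is a conceptual illustration only and does not use LangChain's API: `make_chain`, `retrieve`, `format_prompt`, and `fake_llm` are hypothetical names, and the "model" is a stub that returns a canned string.

```python
from functools import reduce
from typing import Callable

Step = Callable[[dict], dict]


def make_chain(*steps: Step) -> Step:
    """Compose steps so that each one receives the previous step's output."""
    def run(state: dict) -> dict:
        return reduce(lambda acc, step: step(acc), steps, state)
    return run


def retrieve(state: dict) -> dict:
    # Stub retriever: a real one would query a vector store.
    return {**state, "context": ["Hamachi Fig costs 16.0 USD"]}


def format_prompt(state: dict) -> dict:
    context = "\n".join(state["context"])
    prompt = f"Context:\n{context}\n\nQuestion: {state['question']}"
    return {**state, "prompt": prompt}


def fake_llm(state: dict) -> dict:
    # Stub model: a real one would send the prompt to an LLM.
    return {**state, "answer": f"(model reply to {len(state['prompt'])} prompt characters)"}


chain = make_chain(retrieve, format_prompt, fake_llm)
result = chain({"question": "How much is the Hamachi Fig?"})
print(result["answer"])
```

Because each step only reads and extends a shared dictionary, individual components can be swapped or tested in isolation, which is the modularity benefit described above.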
63+
64+
### Azure App Service interface
65+
66+
Azure App Service provides a robust platform for hosting user-friendly web interfaces for generative AI applications. This tutorial uses Azure App Service to host an interactive web interface for the application.
67+
68+
### OpenAI models
69+
70+
OpenAI is a leader in AI research, providing various models for language generation, text vectorization, image creation, and audio-to-text conversion. This tutorial uses OpenAI’s embedding and language models, which are central to building language-based applications.
71+
72+
### Embedding models vs. Language generation models
73+
74+
| | **Text Embedding Model** | **Language Model** |
|---|---|---|
| **Purpose** | Converting text into vector embeddings. | Understanding and generating natural language. |
| **Function** | Transforms textual data into high-dimensional arrays of numbers, capturing the semantic meaning of the text. | Comprehends and produces human-like text based on given input. |
| **Output** | Array of numbers (vector embeddings). | Text, answers, translations, code, etc. |
| **Example Output** | Each embedding represents the semantic meaning of the text in numerical form, with a dimensionality determined by the model. For example, `text-embedding-ada-002` generates vectors with 1536 dimensions. | Contextually relevant and coherent text generated based on the input provided. For example, `gpt-3.5-turbo` can generate responses to questions, translate text, write code, and more. |
| **Typical Use Cases** | - Semantic search | - Chatbots |
| | - Recommendation systems | - Automated content creation |
| | - Clustering and classification of text data | - Language translation |
| | - Information retrieval | - Summarization |
| **Data Representation** | Numerical representation (embeddings) | Natural language text |
| **Dimensionality** | The length of the array corresponds to the number of dimensions in the embedding space, for example, 1536 dimensions. | Typically represented as a sequence of tokens, with the context determining the length. |
86+
87+
88+
### Main components of the application
89+
90+
- **Azure Cosmos DB for MongoDB vCore**: Storing and querying vector embeddings.
- **LangChain**: Constructing the application’s LLM workflow. Utilizes tools such as:
  - **Document Loader**: For loading and processing documents from a directory.
  - **Vector Store Integration**: For storing and querying vector embeddings in Azure Cosmos DB.
  - **AzureCosmosDBVectorSearch**: Wrapper around Cosmos DB vector search.
- **Azure App Service**: Building the user interface for the Cosmic Food app.
- **Azure OpenAI**: For providing LLM and embedding models, including:
  - **text-embedding-ada-002**: A text embedding model that converts text into vector embeddings with 1536 dimensions.
  - **gpt-3.5-turbo**: A language model for understanding and generating natural language.
99+
100+
### Set up the environment
101+
102+
To get started with optimizing retrieval-augmented generation (RAG) using Azure Cosmos DB for MongoDB (vCore), follow these steps:
103+
104+
- **Create the following resources on Microsoft Azure:**
  - **Azure Cosmos DB for MongoDB vCore cluster**: See the [Quick Start guide here](https://aka.ms/tryvcore).
  - **Azure OpenAI resource with:**
    - **Embedding model deployment** (for example, `text-embedding-ada-002`).
    - **Chat model deployment** (for example, `gpt-35-turbo`).
109+
110+
### Sample documents
111+
In this tutorial, we load a single JSON file, `food_items.json`, using a LangChain [document loader](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/). Save the file in a directory named **data** in the **src** folder. The file contains entries like the following:

```json
{
    "category": "Cold Dishes",
    "name": "Hamachi Fig",
    "description": "Hamachi sashimi lightly tossed in a fig sauce with rum raisins, and serrano peppers then topped with fried lotus root.",
    "price": "16.0 USD"
}
```
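As a hedged sketch of the loading step, the snippet below turns menu entries into document-like objects with text content and metadata. It assumes the file holds a JSON array of entries shaped like the one above. `SimpleDocument` and `to_documents` are hypothetical stand-ins written with the standard library; the sample application may load and shape its documents differently.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document: text to embed plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def to_documents(raw_json: str) -> list[SimpleDocument]:
    """Convert a JSON array of menu items into one document per item."""
    items = json.loads(raw_json)
    return [
        SimpleDocument(
            page_content=json.dumps(item),  # the full item becomes the text to embed
            metadata={"category": item["category"], "name": item["name"]},
        )
        for item in items
    ]


# Inline sample standing in for the contents of the data file.
sample = json.dumps([
    {
        "category": "Cold Dishes",
        "name": "Hamachi Fig",
        "description": "Hamachi sashimi lightly tossed in a fig sauce.",
        "price": "16.0 USD",
    }
])
docs = to_documents(sample)
print(docs[0].metadata)
```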
120+
121+
### Load documents
122+
1. Set the Cosmos DB for MongoDB (vCore) connection string, database name, collection name, and index name:

    ```python
    import os
    from pymongo import MongoClient

    # The environment variable name is an assumption; use your own configuration.
    mongo_connection_string = os.getenv("AZURE_COSMOS_CONNECTION_STRING")
    mongo_client = MongoClient(mongo_connection_string)
    database_name = "Contoso"
    db = mongo_client[database_name]
    collection_name = "ContosoCollection"
    index_name = "ContosoIndex"
    collection = db[collection_name]
    ```
131+
132+
2. Initialize the embedding client.

    ```python
    import os
    from langchain_openai import AzureOpenAIEmbeddings

    # Model and deployment names fall back to defaults when the variables are unset.
    openai_embeddings_model = os.getenv("AZURE_OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002")
    openai_embeddings_deployment = os.getenv("AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME", "text-embedding")

    azure_openai_embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(
        model=openai_embeddings_model,
        azure_deployment=openai_embeddings_deployment,
    )
    ```
144+
145+
3. Create embeddings from the data, save them to the database, and return a connection to your vector store, Cosmos DB for MongoDB (vCore).

    ```python
    from langchain_community.vectorstores.azure_cosmos_db import AzureCosmosDBVectorSearch

    # json_data is the list of documents loaded from the data directory.
    vector_store: AzureCosmosDBVectorSearch = AzureCosmosDBVectorSearch.from_documents(
        json_data,
        azure_openai_embeddings,
        collection=collection,
        index_name=index_name,
    )
    ```
154+
155+
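4. Create the following [HNSW vector index](./vector-search.md) on the collection. The index name is the same as the one set in step 1.

    ```python
    from langchain_community.vectorstores.azure_cosmos_db import (
        CosmosDBSimilarityType,
        CosmosDBVectorSearchType,
    )

    num_lists = 100  # relevant to IVF indexes; passed here to fill the positional argument
    dimensions = 1536  # must match the embedding model's output size
    similarity_algorithm = CosmosDBSimilarityType.COS  # cosine similarity
    kind = CosmosDBVectorSearchType.VECTOR_HNSW
    m = 16  # maximum number of connections per layer
    ef_construction = 64  # candidate list size used while building the graph

    vector_store.create_index(
        num_lists, dimensions, similarity_algorithm, kind, m, ef_construction
    )
    ```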
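For orientation, the index created above can also be described as a raw `createIndexes` command with `cosmosSearchOptions`. The sketch below only builds the command document and does not connect to a cluster. The helper name `hnsw_index_command` is hypothetical, and the `vectorContent` field name is an assumption based on the default embedding field used by the LangChain wrapper; adjust it to match your collection.

```python
def hnsw_index_command(
    collection_name: str,
    index_name: str,
    dimensions: int = 1536,
    m: int = 16,
    ef_construction: int = 64,
    embedding_key: str = "vectorContent",  # assumed name of the embedding field
) -> dict:
    """Build a createIndexes command for an HNSW vector index with cosine similarity."""
    return {
        "createIndexes": collection_name,
        "indexes": [
            {
                "name": index_name,
                "key": {embedding_key: "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": "vector-hnsw",
                    "m": m,
                    "efConstruction": ef_construction,
                    "similarity": "COS",
                    "dimensions": dimensions,
                },
            }
        ],
    }


command = hnsw_index_command("ContosoCollection", "ContosoIndex")
print(command["indexes"][0]["cosmosSearchOptions"])
# With a live cluster, this could be sent with pymongo as: db.command(command)
```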
168+
169+
### Perform vector search using Cosmos DB for MongoDB (vCore)
170+
171+
1. Connect to your vector store.

    ```python
    vector_store: AzureCosmosDBVectorSearch = AzureCosmosDBVectorSearch.from_connection_string(
        connection_string=mongo_connection_string,
        namespace=f"{database_name}.{collection_name}",
        embedding=azure_openai_embeddings,
    )
    ```
179+
180+
2. Run a semantic similarity search on a sample query using Cosmos DB vector search. This snippet is only a quick test.

    ```python
    query = "beef dishes"
    docs = vector_store.similarity_search(query)
    # Print the text of the closest match.
    print(docs[0].page_content)
    ```
186+
187+
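3. Initialize the chat client used by the RAG chain.

    ```python
    from langchain_openai import AzureChatOpenAI
    # Environment variable names and defaults are assumptions that mirror the embeddings step.
    openai_chat_model = os.getenv("AZURE_OPENAI_CHAT_MODEL_NAME", "gpt-35-turbo")
    openai_chat_deployment = os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME", "gpt-35-turbo")
    azure_openai_chat: AzureChatOpenAI = AzureChatOpenAI(
        model=openai_chat_model,
        azure_deployment=openai_chat_deployment,
    )
    ```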
194+
195+
4. Define the prompt templates used by the RAG chain. The first prompt turns the conversation into a search query; the second asks the model to answer from the retrieved context.

    ```python
    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

    history_prompt = ChatPromptTemplate.from_messages(
        [
            MessagesPlaceholder(variable_name="chat_history"),
            ("user", "{input}"),
            (
                "user",
                """Given the above conversation,
                generate a search query to look up to get information relevant to the conversation""",
            ),
        ]
    )

    context_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "Answer the user's questions based on the below context:\n\n{context}"),
            MessagesPlaceholder(variable_name="chat_history"),
            ("user", "{input}"),
        ]
    )
    ```
217+
218+
5. Convert the vector store into a retriever, which can search for relevant documents based on specified parameters.

    ```python
    # search_type, limit, and score_threshold are supplied by the caller (for example, as function arguments).
    vector_store_retriever = vector_store.as_retriever(
        search_type=search_type, search_kwargs={"k": limit, "score_threshold": score_threshold}
    )
    ```
224+
225+
6. Create a retriever chain that is aware of the conversation history, ensuring contextually relevant document retrieval using the **azure_openai_chat** model and **vector_store_retriever**.

    ```python
    from langchain.chains import create_history_aware_retriever
    retriever_chain = create_history_aware_retriever(azure_openai_chat, vector_store_retriever, history_prompt)
    ```
229+
230+
7. Create a chain that combines retrieved documents into a coherent response using the language model (**azure_openai_chat**) and a specified prompt (**context_prompt**).

    ```python
    from langchain.chains.combine_documents import create_stuff_documents_chain
    context_chain = create_stuff_documents_chain(llm=azure_openai_chat, prompt=context_prompt)
    ```
234+
235+
8. Create a chain that handles the entire retrieval process, integrating the history-aware retriever chain and the document combination chain. This RAG chain can be executed to retrieve and generate contextually accurate responses.

    ```python
    from langchain.chains import create_retrieval_chain
    from langchain_core.runnables import Runnable

    rag_chain: Runnable = create_retrieval_chain(
        retriever=retriever_chain,
        combine_docs_chain=context_chain,
    )
    ```
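Once the chain is assembled, it can be invoked with the user's question and the prior conversation. The sketch below wraps that call in a small helper, relying on the fact that a retrieval chain takes `input` and `chat_history` keys (as used in the prompts above) and returns a dictionary that includes an `answer` key. The `ask` helper and the `EchoChain` stub are hypothetical and exist only so the example runs without Azure resources; in the application you would pass `rag_chain` instead.

```python
from typing import Any, Protocol


class SupportsInvoke(Protocol):
    """Anything with an invoke(dict) -> dict method, such as the RAG chain."""
    def invoke(self, inputs: dict) -> dict: ...


def ask(chain: SupportsInvoke, question: str, chat_history: list[Any]) -> str:
    """Run the chain on one question and return only the generated answer."""
    result = chain.invoke({"input": question, "chat_history": chat_history})
    return result["answer"]


class EchoChain:
    """Hypothetical stub that mimics the output shape of a retrieval chain."""
    def invoke(self, inputs: dict) -> dict:
        return {**inputs, "context": [], "answer": f"You asked: {inputs['input']}"}


answer = ask(EchoChain(), "Which beef dishes do you have?", chat_history=[])
print(answer)
```

To carry a conversation across turns, append each question and answer to `chat_history` before the next call so the history-aware retriever can take earlier turns into account.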
242+
243+
### Sample outputs
244+
The screenshot below illustrates the outputs for various questions. A purely semantic-similarity search returns the raw text from the source documents, while the question-answering app using the RAG architecture generates precise and personalized answers by combining retrieved document contents with the language model.
245+
246+
![Screenshot of the Cosmic Food RAG app](./media/vector/rag-cosmic-screenshot.png)
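For comparison with the RAG output, the purely semantic-similarity search can also be expressed as a MongoDB aggregation pipeline that uses the `cosmosSearch` operator. The sketch below only builds the pipeline and does not query a cluster. The helper name `vector_search_pipeline`, the `vectorContent` field name, and the default values are assumptions for illustration; the query vector would normally come from the embedding model.

```python
def vector_search_pipeline(
    query_vector: list[float],
    k: int = 4,
    ef_search: int = 40,
    embedding_key: str = "vectorContent",  # assumed name of the embedding field
) -> list[dict]:
    """Build an aggregation pipeline that returns the k nearest documents with scores."""
    return [
        {
            "$search": {
                "cosmosSearch": {
                    "vector": query_vector,
                    "path": embedding_key,
                    "k": k,
                    "efSearch": ef_search,
                },
                "returnStoredSource": True,
            }
        },
        {
            "$project": {
                "similarityScore": {"$meta": "searchScore"},
                "document": "$$ROOT",
            }
        },
    ]


# A toy three-dimensional vector stands in for a real 1536-dimensional embedding.
pipeline = vector_search_pipeline([0.1, 0.2, 0.3], k=2)
print(pipeline[0]["$search"]["cosmosSearch"]["k"])
# With a live cluster, this could be run with pymongo as: collection.aggregate(pipeline)
```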
247+
248+
### Conclusion
249+
In this tutorial, we explored how to build a question-answering app that interacts with your private data using Cosmos DB as a vector store. By leveraging the retrieval-augmented generation (RAG) architecture with LangChain and Azure OpenAI, we demonstrated how vector stores are essential for LLM applications.
250+
251+
RAG is a significant advancement in AI, particularly in natural language processing, and combining these technologies allows for the creation of powerful AI-driven applications for various use cases.
252+
253+
## Next steps
254+
255+
For a detailed, hands-on experience and to see how RAG can be implemented using Azure Cosmos DB for MongoDB (vCore), LangChain, and OpenAI models, visit our GitHub repository.
256+
257+
> [!div class="nextstepaction"]
258+
> [Check out RAG sample on GitHub](https://github.com/Azure-Samples/Cosmic-Food-RAG-app)
259+
