---
title: Use data with Azure OpenAI
titleSuffix: Azure Cosmos DB
description: Use Retrieval Augmented Generation (RAG) and vector search to ground your Azure OpenAI models with data stored in Azure Cosmos DB.
author: jacodel
ms.author: sidandrews
ms.service: cosmos-db
ms.topic: conceptual
ms.date: 08/16/2023
---

# Use Azure Cosmos DB data with Azure OpenAI

[!INCLUDE[NoSQL, MongoDB vCore, PostgreSQL](includes/appliesto-nosql-mongodbvcore-postgresql.md)]

The Large Language Models (LLMs) in Azure OpenAI are incredibly powerful tools that can take your AI-powered applications to the next level. The utility of LLMs can increase significantly when the models have access to the right data, at the right time, from your application's data store. This process is known as Retrieval Augmented Generation (RAG), and there are many ways to do it today with Azure Cosmos DB.

In this article, we review key concepts for RAG and then provide links to tutorials and sample code that demonstrate some of the most powerful RAG patterns using *vector search* to bring the most semantically relevant data to your LLMs. These tutorials can help you become comfortable with using your Azure Cosmos DB data in Azure OpenAI models.

To jump right into tutorials and sample code for RAG patterns with Azure Cosmos DB, use the following links:

| Option | Description |
| --- | --- |
| **[Azure Cosmos DB for NoSQL with Azure Cognitive Search](#azure-cosmos-db-for-nosql-and-azure-cognitive-search)** | Augment your Azure Cosmos DB data with the semantic and vector search capabilities of Azure Cognitive Search. |
| **[Azure Cosmos DB for MongoDB vCore](#azure-cosmos-db-for-mongodb-vcore)** | Featuring native support for vector search, store your application data and vector embeddings together in a single MongoDB-compatible service. |
| **[Azure Cosmos DB for PostgreSQL](#azure-cosmos-db-for-postgresql)** | Offering native support for vector search, store your data and vectors together in a scalable PostgreSQL offering. |

## Key concepts

This section includes key concepts that are critical to implementing RAG with Azure Cosmos DB and Azure OpenAI.

### Retrieval Augmented Generation (RAG)

RAG is the process of retrieving supplementary data so that an LLM can use it when generating responses. When presented with a user's question or prompt, RAG aims to select the most pertinent and current domain-specific knowledge from external sources, such as articles or documents. This retrieved information serves as a valuable reference for the model when generating its response. For example, a simple RAG pattern using Azure Cosmos DB for NoSQL could be the following (a condensed code sketch follows the list):

1. Insert data into an Azure Cosmos DB for NoSQL database and collection.
2. Create embeddings from a data property using an Azure OpenAI Embeddings model.
3. Link the Azure Cosmos DB for NoSQL database to Azure Cognitive Search (for vector indexing and search).
4. Create a vector index over the embeddings properties.
5. Create a function to perform vector similarity search based on a user prompt.
6. Perform question answering over the data using an Azure OpenAI Completions model.

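The following is a condensed sketch of steps 2, 5, and 6 in Python, using the pre-1.0 `openai` package. The endpoint, key, and deployment names are placeholders, and for brevity the similarity search runs in memory over a toy list rather than a Cognitive Search vector index.

```python
import numpy as np
import openai

openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "<your-api-key>"

def embed(text: str) -> np.ndarray:
    # Step 2: create an embedding from a data property.
    response = openai.Embedding.create(input=text, engine="<embedding-deployment>")
    return np.array(response["data"][0]["embedding"])

documents = ["Azure Cosmos DB is a NoSQL database.", "Azure OpenAI hosts LLMs."]
doc_vectors = [embed(doc) for doc in documents]

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 5: vector similarity search based on the user prompt.
question = "Which Azure service stores JSON documents?"
query_vector = embed(question)
best = max(range(len(documents)), key=lambda i: cosine_similarity(query_vector, doc_vectors[i]))

# Step 6: question answering grounded in the best-matching document.
answer = openai.ChatCompletion.create(
    engine="<chat-deployment>",
    messages=[
        {"role": "system", "content": f"Answer using only this context: {documents[best]}"},
        {"role": "user", "content": question},
    ],
)
print(answer["choices"][0]["message"]["content"])
```
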
The RAG pattern, combined with prompt engineering, serves to enhance response quality by offering more contextual information to the model. RAG enables the model to apply a broader knowledge base by incorporating relevant external sources into the generation process, resulting in more comprehensive and informed responses. For more information on "grounding" LLMs, see [grounding LLMs - Microsoft Community Hub](https://techcommunity.microsoft.com/t5/fasttrack-for-azure/grounding-llms/ba-p/3843857).

### Prompts and prompt engineering

A prompt refers to specific text or information that can serve as an instruction to an LLM, or as contextual data that the LLM can build upon. A prompt can take various forms, such as a question, a statement, or even a code snippet. Prompts can serve as:

- **Instructions**: provide directives to the LLM
- **Primary content**: gives information to the LLM for processing
- **Examples**: help condition the model to a particular task or process
- **Cues**: direct the LLM's output in the right direction
- **Supporting content**: represents supplemental information the LLM can use to generate output

The process of creating good prompts for a scenario is called *prompt engineering*. For more information about prompts and best practices for prompt engineering, see [Azure OpenAI Service prompt engineering techniques](../ai-services/openai/concepts/prompt-engineering.md).
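
As an illustration, a single prompt string often combines several of these roles; the wording below is invented for the example:

```python
retrieved_context = "Azure Cosmos DB offers single-digit millisecond reads."  # supporting content

prompt = (
    "You are a helpful assistant for Azure documentation. "  # instructions
    "Answer using only the context provided.\n\n"
    f"Context: {retrieved_context}\n\n"                      # supporting content
    "Q: How fast are reads in Azure Cosmos DB?\n"            # primary content
    "A:"                                                     # cue
)
```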

### Tokens

Tokens are small chunks of text generated by splitting the input text into smaller segments. These segments can either be words or groups of characters, varying in length from a single character to an entire word. For instance, the word `hamburger` would be divided into tokens such as `ham`, `bur`, and `ger`, while a short and common word like `pear` would be considered a single token.

In Azure OpenAI, input text provided to the API is turned into tokens (tokenized). The number of tokens processed in each API request depends on factors such as the length of the input, output, and request parameters. The quantity of tokens being processed also impacts the response time and throughput of the models. There are limits to the number of tokens each model can take in a single request/response from Azure OpenAI. To learn more, see [Azure OpenAI Service quotas and limits](../ai-services/openai/quotas-limits.md).
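
To experiment with tokenization, the open-source `tiktoken` package exposes the encodings used by OpenAI models; the exact token boundaries vary by encoding, so treat the split as illustrative:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI models

tokens = encoding.encode("hamburger")
print(tokens)                                    # a short list of token IDs
print([encoding.decode([t]) for t in tokens])    # the text piece behind each token ID
```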

### Vectors

Vectors are ordered arrays of numbers (typically floats) that can represent information about some data. For example, an image can be represented as a vector of pixel values, or a string of text can be represented as a vector of ASCII values. The process of turning data into a vector is called *vectorization*.
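
For example, a crude vectorization of a string maps each character to its ASCII code point:

```python
text = "pear"
vector = [ord(ch) for ch in text]  # ASCII/Unicode code point per character
print(vector)  # [112, 101, 97, 114]
```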

### Embeddings

Embeddings are vectors that represent important features of data. Embeddings are often learned by using a deep learning model, and machine learning and AI models then utilize them as features. Embeddings can also capture semantic similarity between similar concepts. For example, in generating an embedding for the words `person` and `human`, we would expect their embeddings (vector representations) to be similar in value since the words are also semantically similar.

Azure OpenAI features models for creating embeddings from text data. The service breaks text out into tokens and generates embeddings using models pretrained by OpenAI. [Learn more about creating embeddings with Azure OpenAI.](../ai-services/openai/concepts/understand-embeddings.md)
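
A minimal call, assuming the pre-1.0 `openai` Python package and an Azure OpenAI resource with a `text-embedding-ada-002` deployment (the resource and deployment names are placeholders):

```python
import openai

openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "<your-api-key>"

response = openai.Embedding.create(
    input="person",
    engine="<your-embedding-deployment>",  # a text-embedding-ada-002 deployment
)
embedding = response["data"][0]["embedding"]
print(len(embedding))  # 1536 dimensions for text-embedding-ada-002
```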

### Vector search

Vector search refers to the process of finding all vectors in a dataset that are semantically similar to a specific query vector. For example, if I take a query vector for the word `human` and search the entire dictionary for semantically similar words, I would expect to find the word `person` as a close match. This closeness, or distance, is measured using a similarity metric such as cosine similarity. The more similar the vectors are, the smaller the distance between them.
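
To make the metric concrete, here's a small sketch with SciPy; the three-dimensional vectors are toys standing in for real embeddings:

```python
from scipy.spatial import distance

human = [0.9, 0.1, 0.2]
person = [0.85, 0.15, 0.25]
banana = [0.1, 0.9, 0.4]

# SciPy returns cosine *distance*, which is 1 - cosine similarity.
print(distance.cosine(human, person))  # small distance: semantically close
print(distance.cosine(human, banana))  # larger distance: less similar
```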

Consider a scenario where you have a query and millions of documents, and you want to find the most similar document in your data. You can create embeddings for your data and the query document using Azure OpenAI. Then, you can perform a vector search to find the most similar documents from your dataset. Performing a vector search across a few examples is trivial, but performing this same search across thousands or millions of data points becomes challenging. There are also trade-offs between exhaustive search and approximate nearest neighbor (ANN) search methods, including latency, throughput, accuracy, and cost, all of which depend on the requirements of your application.

Adding Azure Cosmos DB vector search capabilities to Azure OpenAI Service enables you to store long-term memory and chat history to improve your LLM solution. Vector search allows you to efficiently retrieve the most relevant context to personalize Azure OpenAI prompts in a token-efficient manner. Storing vector embeddings alongside the data in an integrated solution minimizes the need to manage data synchronization and accelerates your time-to-market for AI app development.

## Using Azure Cosmos DB data with Azure OpenAI

The RAG pattern harnesses external knowledge and models to effectively handle custom data or domain-specific knowledge. It involves extracting pertinent information from an external data source and integrating it into the model request through prompt engineering.

A robust mechanism is necessary to identify the most relevant data from the external source that can be passed to the model, considering the limitation of a restricted number of tokens per request. This limitation is where embeddings play a crucial role. By converting the data in our database into embeddings and storing them as vectors for future use, we gain the advantage of capturing the semantic meaning of the text, going beyond mere keywords to comprehend the context.

Prior to sending a request to Azure OpenAI, the user input/query/request is also transformed into an embedding, and vector search techniques are employed to locate the most similar embeddings within the database. This technique enables the identification of the most relevant data records in the database. These retrieved records are then supplied as input to the model request using prompt engineering.
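
Because of the per-request token limits discussed earlier, the retrieved records are typically trimmed to a token budget before being placed in the prompt. A sketch using `tiktoken`, with an invented budget:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def fit_to_budget(records: list[str], budget_tokens: int = 2000) -> list[str]:
    """Keep the top-ranked records that fit within the token budget."""
    kept, used = [], 0
    for record in records:  # assumed sorted by similarity, best match first
        cost = len(encoding.encode(record))
        if used + cost > budget_tokens:
            break
        kept.append(record)
        used += cost
    return kept

context = "\n\n".join(fit_to_budget(["first retrieved record...", "second record..."]))
```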

## Azure Cosmos DB for NoSQL and Azure Cognitive Search

Implement RAG patterns with Azure Cosmos DB for NoSQL and Azure Cognitive Search. This approach enables powerful integration of your data residing in Azure Cosmos DB for NoSQL into your AI-oriented applications. Azure Cognitive Search empowers you to efficiently index and query high-dimensional vector data that is stored in Azure Cosmos DB for NoSQL.
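
A sketch of a vector query with the `azure-search-documents` Python SDK. The vector query surface changed across the 11.4 preview releases, so this assumes the 11.4 GA shape; the service, index, and field names are placeholders:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<your-query-key>"),
)

query_embedding = [0.0] * 1536  # placeholder; use a real Azure OpenAI embedding

# Ask for the three nearest neighbors over the index's vector field.
vector_query = VectorizedQuery(
    vector=query_embedding, k_nearest_neighbors=3, fields="contentVector"
)

results = client.search(search_text=None, vector_queries=[vector_query])
for doc in results:
    print(doc["id"])
```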

### Code samples

- [.NET retail chatbot demo](https://github.com/AzureCosmosDB/VectorSearchAiAssistant/tree/cognitive-search-vector-v2)
- [.NET samples - Hackathon project](https://github.com/AzureCosmosDB/OpenAIHackathon)
- [.NET tutorial - recipe chatbot](https://github.com/microsoft/AzureDataRetrievalAugmentedGenerationSamples/tree/main/C%23/CosmosDB-NoSQL_CognitiveSearch)
- [.NET tutorial - recipe chatbot w/ Semantic Kernel](https://github.com/microsoft/AzureDataRetrievalAugmentedGenerationSamples/tree/main/C%23/CosmosDB-NoSQL_CognitiveSearch_SemanticKernel)
- [Python notebook tutorial - Azure product chatbot](https://github.com/microsoft/AzureDataRetrievalAugmentedGenerationSamples/tree/main/Python/CosmosDB-NoSQL_CognitiveSearch)

## Azure Cosmos DB for MongoDB vCore

RAG can be applied using the native vector search feature in Azure Cosmos DB for MongoDB vCore, facilitating a smooth integration of your AI-centric applications with your data stored in Azure Cosmos DB. Vector search offers an efficient way to store, index, and search high-dimensional vector data directly within Azure Cosmos DB for MongoDB vCore alongside other application data. This approach removes the necessity of migrating your data to costlier alternatives for vector search.
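
A sketch with `pymongo` against a vCore cluster, following the `cosmosSearch` index and `$search` syntax documented for this feature; the connection string, names, and dimensions are placeholders:

```python
from pymongo import MongoClient

client = MongoClient("<your-vcore-connection-string>")
db = client["<database>"]
collection = db["<collection>"]

# One-time setup: create an IVF vector index over the embedding field.
db.command({
    "createIndexes": "<collection>",
    "indexes": [{
        "name": "vectorSearchIndex",
        "key": {"contentVector": "cosmosSearch"},
        "cosmosSearchOptions": {
            "kind": "vector-ivf",
            "numLists": 1,
            "similarity": "COS",   # cosine similarity
            "dimensions": 1536,
        },
    }],
})

# Query: the three documents nearest to the query embedding.
query_embedding = [0.0] * 1536  # placeholder; use a real Azure OpenAI embedding
results = collection.aggregate([{
    "$search": {
        "cosmosSearch": {"vector": query_embedding, "path": "contentVector", "k": 3}
    }
}])
for doc in results:
    print(doc["_id"])
```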

### Code samples

- [.NET retail chatbot demo](https://github.com/AzureCosmosDB/VectorSearchAiAssistant/tree/mongovcorev2)
- [.NET tutorial - recipe chatbot](https://github.com/microsoft/AzureDataRetrievalAugmentedGenerationSamples/tree/main/C%23/CosmosDB-MongoDBvCore)
- [Python notebook tutorial - Azure product chatbot](https://github.com/microsoft/AzureDataRetrievalAugmentedGenerationSamples/tree/main/Python/CosmosDB-MongoDB-vCore)

## Azure Cosmos DB for PostgreSQL

You can employ RAG by utilizing native vector search within Azure Cosmos DB for PostgreSQL. This strategy provides seamless integration of your AI-driven applications, including ones developed using Azure OpenAI embeddings, with your data housed in Azure Cosmos DB. By taking advantage of vector search, you can effectively store, index, and execute queries on high-dimensional vector data directly within Azure Cosmos DB for PostgreSQL along with the rest of your data.
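
A sketch using `psycopg2` and the `pgvector` extension that underpins this feature; the connection string and schema are placeholders:

```python
import psycopg2

conn = psycopg2.connect("<your-cosmos-db-postgresql-connection-string>")
cur = conn.cursor()

# One-time setup: enable pgvector and store 1536-dimension embeddings.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(
    "CREATE TABLE IF NOT EXISTS docs ("
    "  id bigserial PRIMARY KEY,"
    "  content text,"
    "  embedding vector(1536));"
)

# Query: <=> is pgvector's cosine distance operator (smaller is closer).
query_embedding = [0.0] * 1536  # placeholder; use a real Azure OpenAI embedding
cur.execute(
    "SELECT content FROM docs ORDER BY embedding <=> %s::vector LIMIT 3;",
    (str(query_embedding),),
)
print(cur.fetchall())
```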

### Code samples

- [Python notebook tutorial - food review chatbot](https://github.com/microsoft/AzureDataRetrievalAugmentedGenerationSamples/tree/main/Python/CosmosDB-PostgreSQL_CognitiveSearch)

## Next steps

- [Vector search with Azure Cognitive Search](../search/vector-search-overview.md)
- [Vector search with Azure Cosmos DB for MongoDB vCore](mongodb/vcore/vector-search.md)
- [Vector search with Azure Cosmos DB for PostgreSQL](postgresql/howto-use-pgvector.md)