Commit 501df48

Merge pull request #1041 from aahill/mongo-db
Mongo db for On Your Data
2 parents 3a411d0 + f154139 commit 501df48

7 files changed: +264 -3 lines changed

articles/ai-services/openai/concepts/use-your-data.md

Lines changed: 52 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ ms.service: azure-ai-openai
88
ms.topic: quickstart
99
author: aahill
1010
ms.author: aahi
11-
ms.date: 04/08/2024
11+
ms.date: 10/25/2024
1212
recommendations: false
1313
ms.custom: references_regions
1414
---
@@ -26,14 +26,14 @@ Azure OpenAI On Your Data enables you to run advanced AI models such as GPT-35-T
2626
:::image type="content" source="../media/use-your-data/workflow-diagram.png" alt-text="A diagram showing an example workflow.":::
2727

2828
Typically, the development process you'd use with Azure OpenAI On Your Data is:
29-
1. **Ingest**: Upload files using either Azure OpenAI Studio or the ingestion API. This enables your data to be cracked, chunked and embedded into an Azure AI Search instance that can be used by Azure Open AI models. If you have an existing [supported data source](#supported-data-sources), you can also connect it directly.
29+
1. **Ingest**: Upload files using either Azure OpenAI Studio or the ingestion API. This enables your data to be cracked, chunked and embedded into an Azure AI Search instance that can be used by Azure OpenAI models. If you have an existing [supported data source](#supported-data-sources), you can also connect it directly.
3030

3131
1. **Develop**: After trying Azure OpenAI On Your Data, begin developing your application using the available REST API and SDKs, which are available in several languages. It will create prompts and search intents to pass to the Azure OpenAI service.
3232

3333
1. **Inference**: After your application is deployed in your preferred environment, it will send prompts to Azure OpenAI, which will perform several steps before returning a response:
3434
1. **Intent generation**: The service will determine the intent of the user's prompt to determine a proper response.
3535

36-
1. **Retrieval**: The service retrieves relevant chunks of available data from the connected data source by querying it. For example by using a semantic or vector search. [Parameters](#runtime-parameters) such as strictness and number of documents to retreive are utilized to influence the retrieval.
36+
1. **Retrieval**: The service retrieves relevant chunks of available data from the connected data source by querying it. For example by using a semantic or vector search. [Parameters](#runtime-parameters) such as strictness and number of documents to retrieve are utilized to influence the retrieval.
3737

3838
1. **Filtration and reranking**: Search results from the retrieval step are improved by ranking and filtering data to refine relevance.
3939

@@ -307,6 +307,55 @@ Mapping these fields correctly helps ensure the model has better response and ci
307307

308308
Along with using Elasticsearch databases in Azure OpenAI Studio, you can also use your Elasticsearch database using the [API](../references/elasticsearch.md).
309309

310+
# [MongoDB Atlas (preview)](#tab/mongo-db-atlas)
311+
312+
You can connect your MongoDB Atlas vector index with Azure OpenAI On Your Data for inferencing. You can use it through Azure AI Studio, the API, and the SDK.
313+
314+
### Prerequisites
315+
316+
* A [MongoDB Atlas account](https://account.mongodb.com/account/register)
317+
* An [Azure OpenAI ada002 embedding model](./models.md#embeddings)
318+
* To achieve good retrieval quality, make sure your vector index is created with the Azure OpenAI ada002 embedding model.
319+
320+
We recommend using one of the following models for MongoDB Atlas:
321+
* gpt-4 (0613)
322+
* gpt-4 (turbo-2024-04-09)
323+
* gpt-4o (2024-05-13)
324+
* gpt-35-turbo (1106)
325+
326+
### Configuration
327+
328+
Only public network access is supported. Make sure the database allows public access.
329+
:::image type="content" source="../media/use-your-data/mongo-db-network-access.png" alt-text="A screenshot showing the network access screen for MongoDB.":::
330+
331+
### Data preparation
332+
333+
If you want to create a new vector search index with your documents, you can use the [available script on GitHub](https://github.com/microsoft/sample-app-aoai-chatGPT/blob/rawan/mongodbdataprep/scripts/mongo_vector_db_data_preparation.py) to prepare your data for use with Azure OpenAI On Your Data.
334+
335+
### Connection to MongoDB account
336+
337+
To add your data source, you first need to create a connection to MongoDB Atlas. This connection includes information such as authentication (username and password). Enter the endpoint of your MongoDB Atlas connection string using the following format: `mongodb+srv://{user_name}:{password}@{endpoint}/?appName={application_name}`. See the [MongoDB documentation](https://aka.ms/mongodb-connection-string) for more information about connection string methods.
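
As a minimal sketch of the format above, the connection string can be assembled in Python. The helper name `build_atlas_connection_string` and the sample values are hypothetical. Percent-encoding the credentials with `urllib.parse.quote_plus` is an assumption based on the general rule that reserved characters in MongoDB URI usernames and passwords must be escaped.

```python
from urllib.parse import quote_plus


def build_atlas_connection_string(user_name, password, endpoint, application_name):
    """Assemble a mongodb+srv connection string (hypothetical helper)."""
    # Escape reserved characters such as '@' and ':' in the credentials.
    return (
        f"mongodb+srv://{quote_plus(user_name)}:{quote_plus(password)}"
        f"@{endpoint}/?appName={quote_plus(application_name)}"
    )


# Placeholder values for illustration only.
connection_string = build_atlas_connection_string(
    "app_user", "p@ss:word", "cluster0.example.mongodb.net", "my-app"
)
print(connection_string)
```

Escaping matters here because an unescaped `@` or `:` in the password would otherwise be read as a delimiter in the URI.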
338+
339+
:::image type="content" source="../media/use-your-data/mongo-db-atlas-connection.png" alt-text="A screenshot showing the connection screen for MongoDB." lightbox="../media/use-your-data/mongo-db-atlas-connection.png":::
340+
341+
### Source index
342+
343+
Once you have created a connection or chosen an existing one, enter the information needed to connect to a specific vector index within the connected account: the names of your database, collection, and vector index. The connection can only be built if these values are entered correctly.
344+
345+
:::image type="content" source="../media/use-your-data/mongo-db-atlas-source-index.png" alt-text="A screenshot showing the required fields for MongoDB Atlas." lightbox="../media/use-your-data/mongo-db-atlas-source-index.png":::
346+
347+
To use MongoDB Atlas, you'll need an Azure OpenAI ada002 embedding model. This model will be created for you if you don't already have one, which will incur [usage](https://go.microsoft.com/fwlink/?linkid=2264246) on your account.
348+
349+
### Index field mapping
350+
351+
When you add your MongoDB Atlas data source, you can specify data fields to properly map your data for retrieval.
352+
353+
* Content data (required): This is the main text content of each document. For multiple fields, separate the values with commas, with no spaces.
354+
* Vector field (required): The field name in your MongoDB Atlas search index that contains the vectors.
355+
* File name/title/URL: Used to display more information when a document is referenced in the chat.
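
The mapping described in the list above can be sketched in Python. This is an illustration only: the helper `to_fields_mapping` is hypothetical, and it assumes the Studio inputs correspond to the `content_fields`, `vector_fields`, `title_field`, `url_field`, and `filepath_field` properties used by the API.

```python
def to_fields_mapping(content_fields, vector_field,
                      title_field=None, url_field=None, filepath_field=None):
    """Convert Studio-style inputs into a fields_mapping dict (hypothetical helper)."""
    mapping = {
        # Multiple content fields arrive as "a,b,c" with no spaces.
        "content_fields": [name for name in content_fields.split(",") if name],
        "vector_fields": [vector_field],
    }
    optional = {
        "title_field": title_field,
        "url_field": url_field,
        "filepath_field": filepath_field,
    }
    # Include only the optional display fields that were provided.
    mapping.update({key: value for key, value in optional.items() if value})
    return mapping


# Placeholder field names for illustration only.
fields_mapping = to_fields_mapping("content,summary", "contentvector", title_field="title")
print(fields_mapping)
```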
356+
357+
:::image type="content" source="../media/use-your-data/mongo-db-atlas-field-mapping.png" alt-text="A screenshot showing the field mapping options for MongoDB Atlas." lightbox="../media/use-your-data/mongo-db-atlas-field-mapping.png":::
358+
310359
---
311360

312361
## Deploy to a copilot (preview), Teams app (preview), or web app
Lines changed: 210 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,210 @@
1+
---
2+
title: Azure OpenAI on your MongoDB Atlas data Python & REST API reference
3+
titleSuffix: Azure OpenAI
4+
description: Learn how to use Azure OpenAI on your MongoDB Atlas data with Python & REST API.
5+
manager: nitinme
6+
ms.service: azure-ai-openai
7+
ms.topic: conceptual
8+
ms.date: 10/25/2024
9+
author: aahill
10+
ms.author: aahi
11+
recommendations: false
12+
ms.custom: devx-track-python
13+
---
14+
15+
# Data source - MongoDB Atlas
16+
17+
This article describes the configurable options for MongoDB Atlas when using Azure OpenAI On Your Data. This data source is supported starting in API version `2024-08-01`.
18+
19+
20+
|Name | Type | Required | Description |
21+
|--- | --- | --- | --- |
22+
|`parameters`| [Parameters](#parameters)| True| The parameters to use when configuring MongoDB Atlas.|
23+
| `type`| string| True | Must be `mongo_db`. |
24+
25+
## Parameters
26+
27+
|Name | Type | Required | Description |
28+
|--- | --- | --- | --- |
29+
| `authentication` | object | True | The [authentication options](#authentication) for Azure OpenAI On Your Data when using a username and a password. |
30+
| `app_name` | string | True | The name of the MongoDB Atlas application. |
31+
| `collection_name` | string | True | The name of the MongoDB Atlas collection. |
32+
| `database_name` | string | True | The name of the MongoDB Atlas database. |
33+
| `endpoint` | string | True | The name of the MongoDB Atlas cluster endpoint. |
34+
| `embedding_dependency` | One of [DeploymentNameVectorizationSource](#deployment-name-vectorization-source), [EndpointVectorizationSource](#endpoint-vectorization-source) | True | The embedding dependency for vector search.|
35+
| `fields_mapping` | object | True | [Settings](#field-mapping-options) to control how fields are processed when using a configured MongoDB Atlas resource. |
36+
| `index_name` | string | True | The name of the MongoDB Atlas index.|
37+
| `top_n_documents` | integer | False | The configured top number of documents to feature for the configured query.|
38+
| `max_search_queries` | integer | False | The maximum number of rewritten queries that should be sent to the search provider for one user message. If not specified, the system decides the number of queries to send.|
39+
| `allow_partial_result` | boolean | False | If set to true, the system allows partial search results to be used, and the request fails only if all the queries fail. If not specified, or set to false, the request fails if any search query fails.|
40+
| `in_scope` | boolean | False | Whether queries should be restricted to use indexed data. |
41+
| `strictness` | integer | False | The configured strictness of the search relevance filtering, from 1 to 5. The higher the strictness, the higher the precision but the lower the recall of the answer. |
42+
| `include_contexts` | array | False | The included properties of the output context. If not specified, the default value is `citations` and `intent`. Valid properties are `all_retrieved_documents`, `citations` and `intent`. |
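
To show how the required and optional parameters in this table fit together, here is a minimal Python sketch. The helper `build_mongo_db_data_source` and the `REQUIRED_PARAMETERS` set are hypothetical names introduced for illustration; the client-side checks simply mirror the table and aren't a service feature.

```python
REQUIRED_PARAMETERS = {
    "authentication", "app_name", "collection_name", "database_name",
    "endpoint", "embedding_dependency", "fields_mapping", "index_name",
}


def build_mongo_db_data_source(required, top_n_documents=None,
                               strictness=None, in_scope=None):
    """Return a data_sources entry of type mongo_db (hypothetical helper)."""
    missing = REQUIRED_PARAMETERS - set(required)
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")
    # The table documents strictness as an integer from 1 to 5.
    if strictness is not None and not 1 <= strictness <= 5:
        raise ValueError("strictness must be between 1 and 5")

    parameters = dict(required)
    optional = {
        "top_n_documents": top_n_documents,
        "strictness": strictness,
        "in_scope": in_scope,
    }
    # Send only the optional settings that were explicitly provided.
    parameters.update({k: v for k, v in optional.items() if v is not None})
    return {"type": "mongo_db", "parameters": parameters}
```

The returned dictionary can then be placed in the `data_sources` list of a chat completions request.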
43+
44+
## Authentication
45+
46+
The authentication options for Azure OpenAI On Your Data when using a username and a password.
47+
48+
|Name | Type | Required | Description |
49+
|--- | --- | --- | --- |
50+
| `type` | string | True | Must be `username_and_password`. |
51+
| `username` | string | True | The username to use for authentication. |
52+
| `password` | string | True | The password to use for authentication. |
53+
54+
## Deployment name vectorization source
55+
56+
The details of the vectorization source, used by Azure OpenAI On Your Data when applying vector search. This vectorization source is based on an internal embeddings model deployment name in the same Azure OpenAI resource. It enables you to use vector search without an Azure OpenAI API key and without Azure OpenAI public network access.
57+
58+
|Name | Type | Required | Description |
59+
|--- | --- | --- | --- |
60+
| `deployment_name`|string|True|The embedding model deployment name within the same Azure OpenAI resource. |
61+
| `type`|string|True| Must be `deployment_name`.|
62+
63+
## Endpoint vectorization source
64+
65+
The details of the vectorization source, used by Azure OpenAI On Your Data when applying vector search. This vectorization source is based on the Azure OpenAI embedding API endpoint.
66+
67+
|Name | Type | Required | Description |
68+
|--- | --- | --- | --- |
69+
| `endpoint`|string|True|Specifies the resource endpoint URL from which embeddings should be retrieved. It should be in the format of `https://{YOUR_RESOURCE_NAME}.openai.azure.com/openai/deployments/{YOUR_DEPLOYMENT_NAME}/embeddings`. The api-version query parameter isn't allowed.|
70+
| `authentication`| [ApiKeyAuthenticationOptions](#authentication)|True | Specifies the authentication options to use when retrieving embeddings from the specified endpoint.|
71+
| `type`|string|True| Must be `endpoint`.|
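
The two vectorization sources can be sketched side by side in Python. Both helper names are hypothetical. The shape of the API key `authentication` object (`type` of `api_key` with a `key` property) isn't defined in this article and is an assumption here.

```python
def deployment_name_source(deployment_name):
    """embedding_dependency that uses a deployment in the same resource."""
    return {"type": "deployment_name", "deployment_name": deployment_name}


def endpoint_source(resource_name, deployment_name, api_key):
    """embedding_dependency that uses an embedding API endpoint."""
    # The URL follows the documented format and carries no api-version parameter.
    endpoint = (
        f"https://{resource_name}.openai.azure.com"
        f"/openai/deployments/{deployment_name}/embeddings"
    )
    return {
        "type": "endpoint",
        "endpoint": endpoint,
        # Assumed shape for API key authentication options.
        "authentication": {"type": "api_key", "key": api_key},
    }


# Placeholder names for illustration only.
print(deployment_name_source("my-embedding-deployment"))
print(endpoint_source("my-resource", "my-embedding-deployment", "<api-key>")["endpoint"])
```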
72+
73+
## Field mapping options
74+
75+
Settings to control how fields are processed when using a configured MongoDB Atlas resource.
76+
77+
|Name | Type | Required | Description |
78+
|--- | --- | --- | --- |
79+
| `content_fields`|string[] |True| The names of index fields that should be treated as content. |
80+
| `vector_fields`|string[] |True| The names of fields that represent vector data. |
81+
| `title_field`|string |False | The name of the index field to use as a title. |
82+
| `url_field`|string |False | The name of the index field to use as a URL. |
83+
| `filepath_field`|string |False | The name of the index field to use as a filepath. |
84+
| `content_fields_separator` | string | False | The separator pattern that content fields should use.|
85+
86+
## Examples
87+
88+
89+
# [Python 1.x](#tab/python)
90+
91+
Install the latest `openai` and `azure-identity` pip packages.
92+
93+
```python
import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Azure OpenAI settings, read from environment variables
endpoint = os.environ.get("AzureOpenAIEndpoint")
deployment = os.environ.get("ChatCompletionsDeploymentName")
embedding_name = os.environ.get("EmbeddingName")
# This example passes a deployment name, so the type is expected to be "deployment_name"
embedding_type = os.environ.get("EmbeddingType")

# Additional variables for MongoDB Atlas
mongo_db_username = os.environ.get("MongoDBUsername")
mongo_db_password = os.environ.get("MongoDBPassword")
mongo_db_endpoint = os.environ.get("MongoDBEndpoint")
mongo_db_app_name = os.environ.get("MongoDBAppName")
mongo_db_database_name = os.environ.get("MongoDBName")
mongo_db_collection = os.environ.get("MongoDBCollection")
mongo_db_index = os.environ.get("MongoDBIndex")

# Authenticate to Azure OpenAI with Microsoft Entra ID instead of an API key
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_endpoint=endpoint,
    azure_ad_token_provider=token_provider,
    api_version="2024-05-01-preview",
)

# The data source configuration is passed through extra_body
completion = client.chat.completions.create(
    model=deployment,
    messages=[
        {
            "role": "user",
            "content": "Who is DRI?",
        },
    ],
    extra_body={
        "data_sources": [
            {
                "type": "mongo_db",
                "parameters": {
                    "authentication": {
                        "type": "username_and_password",
                        "username": mongo_db_username,
                        "password": mongo_db_password
                    },
                    "endpoint": mongo_db_endpoint,
                    "app_name": mongo_db_app_name,
                    "database_name": mongo_db_database_name,
                    "collection_name": mongo_db_collection,
                    "index_name": mongo_db_index,
                    "embedding_dependency": {
                        "type": embedding_type,
                        "deployment_name": embedding_name
                    },
                    "fields_mapping": {
                        "content_fields": [
                            "content"
                        ],
                        "vector_fields": [
                            "contentvector"
                        ]
                    }
                }
            }
        ]
    }
)

print(completion.model_dump_json(indent=2))
```
167+
168+
# [REST](#tab/rest)
169+
170+
```bash
az rest --method POST \
 --uri "$AzureOpenAIEndpoint/openai/deployments/$ChatCompletionsDeploymentName/chat/completions?api-version=2024-05-01-preview" \
 --resource https://cognitiveservices.azure.com/ \
 --body \
'
{
    "messages": [
        {
            "role": "user",
            "content": "Who is DRI?"
        }
    ],
    "data_sources": [
        {
            "type": "mongo_db",
            "parameters": {
                "authentication": {
                    "type": "username_and_password",
                    "username": "<username>",
                    "password": "<password>"
                },
                "endpoint": "<endpoint_name>",
                "app_name": "<application name>",
                "database_name": "sampledb",
                "collection_name": "samplecollection",
                "index_name": "sampleindex",
                "embedding_dependency": {
                    "type": "deployment_name",
                    "deployment_name": "{embedding deployment name}"
                },
                "fields_mapping": {
                    "content_fields": [
                        "content"
                    ],
                    "vector_fields": [
                        "contentvector"
                    ]
                }
            }
        }
    ]
}
'
```
209+
210+
---

articles/ai-services/openai/toc.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -286,6 +286,8 @@ items:
286286
href: ./references/elasticsearch.md
287287
- name: Data source - Pinecone (preview)
288288
href: ./references/pinecone.md
289+
- name: Data source - MongoDB Atlas (preview)
290+
href: ./references/mongo-db.md
289291
- name: Ingestion API (preview)
290292
href: /rest/api/azureopenai/ingestion-jobs?context=/azure/ai-services/openai/context/context
291293
- name: Azure Resource Manager/Bicep/Terraform
