Commit 871421b

Mongo db reference
1 parent a837949 commit 871421b

4 files changed, +233 -1 lines changed

articles/ai-services/openai/concepts/use-your-data.md

Lines changed: 22 additions & 1 deletion
@@ -8,7 +8,7 @@ ms.service: azure-ai-openai
 ms.topic: quickstart
 author: aahill
 ms.author: aahi
-ms.date: 04/08/2024
+ms.date: 10/25/2024
 recommendations: false
 ms.custom: references_regions
 ---
@@ -307,6 +307,27 @@ Mapping these fields correctly helps ensure the model has better response and ci
 
 Along with using Elasticsearch databases in Azure OpenAI Studio, you can also use your Elasticsearch database using the [API](../references/elasticsearch.md).
 
+# [MongoDB Atlas (preview)](#tab/elasticsearch)
+
+### Prerequisites
+
+* A [MongoDB Atlas account](https://account.mongodb.com/account/register)
+
+We recommend using one of the following models for MongoDB Atlas:
+* gpt-4 (0613)
+* gpt-4 (turbo-2024-04-09)
+* gpt-4o (2024-05-13)
+* gpt-35-turbo (1106)
+
+### Configuration
+
+Only public network access is supported. Make sure the database allows public access.
+:::image type="content" source="../media/use-your-data/mongo-db-network-access.png" alt-text="A screenshot showing the network access screen for MongoDB":::
+
+### Data preparation
+
+Use the [available script on GitHub](https://github.com/microsoft/sample-app-aoai-chatGPT/blob/rawan/mongodbdataprep/scripts/mongo_vector_db_data_preparation.py) to prepare your data for use with Azure OpenAI On Your Data.
+
 ---
 
 ## Deploy to a copilot (preview), Teams app (preview), or web app
Binary file not shown (48.7 KB)
Lines changed: 209 additions & 0 deletions
@@ -0,0 +1,209 @@
+---
+title: Azure OpenAI on your Mongo DB Atlas data Python & REST API reference
+titleSuffix: Azure OpenAI
+description: Learn how to use Azure OpenAI on your Mongo DB Atlas data with Python & REST API.
+manager: nitinme
+ms.service: azure-ai-openai
+ms.topic: conceptual
+ms.date: 10/25/2024
+author: aahill
+ms.author: aahi
+recommendations: false
+ms.custom: devx-track-python
+---
+
+# Data source - Mongo DB Atlas
+
+The configurable options of Mongo DB Atlas when using Azure OpenAI On Your Data. This data source is supported starting in API version `2024-08-01`.
+
+
+|Name | Type | Required | Description |
+|--- | --- | --- | --- |
+|`parameters`| [Parameters](#parameters)| True| The parameters to use when configuring Mongo DB Atlas.|
+| `type`| string| True | Must be `mongo_db`. |
+
+## Parameters
+
+|Name | Type | Required | Description |
+|--- | --- | --- | --- |
+| `authentication` | object | True | The [authentication options](#authentication) for Azure OpenAI On Your Data when using a username and a password. |
+| `app_name` | string | True | The name of the Mongo DB Atlas Application. |
+| `collection_name` | string | True | The name of the Mongo DB Atlas Collection. |
+| `database_name` | string | True | The name of the Mongo DB Atlas database. |
+| `endpoint` | string | True | The name of the Mongo DB Atlas cluster endpoint. |
+| `embedding_dependency` | One of [DeploymentNameVectorizationSource](#deployment-name-vectorization-source), [EndpointVectorizationSource](#endpoint-vectorization-source) | True | The embedding dependency for vector search.|
+| `fields_mapping` | object | True | [Settings](#field-mapping-options) to control how fields are processed when using a configured Mongo DB Atlas resource. |
+| `index_name` | string | True | The name of the Mongo DB Atlas index.|
+| `top_n_documents` | integer | False | The configured top number of documents to feature for the configured query.|
+| `max_search_queries` | integer | False | The maximum number of rewritten queries to send to the search provider for one user message. If not specified, the system decides the number of queries to send.|
+| `allow_partial_result` | boolean | False | If specified as true, the system allows partial search results to be used, and the request fails only if all the queries fail. If not specified, or specified as false, the request fails if any search query fails.|
+| `in_scope` | boolean | False | Whether queries should be restricted to use indexed data. |
+| `strictness` | integer | False | The configured strictness of the search relevance filtering, from 1 to 5. The higher the strictness, the higher the precision but the lower the recall of the answer. |
+| `include_contexts` | array | False | The included properties of the output context. If not specified, the default value is `citations` and `intent`. Valid properties are `all_retrieved_documents`, `citations` and `intent`. |
+
+## Authentication
+
+The authentication options for Azure OpenAI On Your Data when using a username and a password.
+
+|Name | Type | Required | Description |
+|--- | --- | --- | --- |
+| `username` | string | True | The username to use for authentication. |
+| `password` | string | True | The password to use for authentication. |
+
+## Deployment name vectorization source
+
+The details of the vectorization source, used by Azure OpenAI On Your Data when applying vector search. This vectorization source is based on an internal embeddings model deployment name in the same Azure OpenAI resource. This vectorization source enables you to use vector search without an Azure OpenAI API key and without Azure OpenAI public network access.
+
+|Name | Type | Required | Description |
+|--- | --- | --- | --- |
+| `deployment_name`|string|True|The embedding model deployment name within the same Azure OpenAI resource. |
+| `type`|string|True| Must be `deployment_name`.|
+
+## Endpoint vectorization source
+
+The details of the vectorization source, used by Azure OpenAI On Your Data when applying vector search. This vectorization source is based on the Azure OpenAI embedding API endpoint.
+
+|Name | Type | Required | Description |
+|--- | --- | --- | --- |
+| `endpoint`|string|True|Specifies the resource endpoint URL from which embeddings should be retrieved. It should be in the format of `https://{YOUR_RESOURCE_NAME}.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings`. The api-version query parameter isn't allowed.|
+| `authentication`| [ApiKeyAuthenticationOptions](#authentication)|True | Specifies the authentication options to use when retrieving embeddings from the specified endpoint.|
+| `type`|string|True| Must be `endpoint`.|
+
+## Field mapping options
+
+Settings to control how fields are processed when using a configured Mongo DB Atlas resource.
+
+|Name | Type | Required | Description |
+|--- | --- | --- | --- |
+| `content_fields`|string[] |True| The names of index fields that should be treated as content. |
+| `vector_fields`|string[] |True| The names of fields that represent vector data. |
+| `title_field`|string |False | The name of the index field to use as a title. |
+| `url_field`|string |False | The name of the index field to use as a URL. |
+| `filepath_field`|string |False | The name of the index field to use as a filepath. |
+| `content_fields_separator` | string | False | The separator pattern that content fields should use.|
+
+## Examples
+
+
+# [Python 1.x](#tab/python)
+
+Install the latest pip packages `openai`, `azure-identity`.
+
+```python
+import os
+from openai import AzureOpenAI
+from azure.identity import DefaultAzureCredential, get_bearer_token_provider
+
+endpoint = os.environ.get("AzureOpenAIEndpoint")
+deployment = os.environ.get("ChatCompletionsDeploymentName")
+index_name = os.environ.get("IndexName")
+key = os.environ.get("Key")
+embedding_name = os.environ.get("EmbeddingName")
+embedding_type = os.environ.get("EmbeddingType")
+
+# Additional variables for Mongo DB Atlas
+mongo_db_username = os.environ.get("MongoDBUsername")
+mongo_db_password = os.environ.get("MongoDBPassword")
+mongo_db_endpoint = os.environ.get("MongoDBEndpoint")
+mongo_db_app_name = os.environ.get("MongoDBAppName")
+mongo_db_database_name = os.environ.get("MongoDBName")
+mongo_db_collection = os.environ.get("MongoDBCollection")
+mongo_db_index = os.environ.get("MongoDBIndex")
+
+token_provider = get_bearer_token_provider(
+    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")
+
+client = AzureOpenAI(
+    azure_endpoint=endpoint,
+    azure_ad_token_provider=token_provider,
+    api_version="2024-05-01-preview",
+)
+
+completion = client.chat.completions.create(
+    model=deployment,
+    messages=[
+        {
+            "role": "user",
+            "content": "Who is DRI?",
+        },
+    ],
+    extra_body={
+        "data_sources": [
+            {
+                "type": "mongo_db",
+                "parameters": {
+                    "authentication": {
+                        "type": "username_and_password",
+                        "username": mongo_db_username,
+                        "password": mongo_db_password
+                    },
+                    "endpoint": mongo_db_endpoint,
+                    "app_name": mongo_db_app_name,
+                    "database_name": mongo_db_database_name,
+                    "collection_name": mongo_db_collection,
+                    "index_name": mongo_db_index,
+                    "embedding_dependency": {
+                        "type": embedding_type,
+                        "deployment_name": embedding_name
+                    },
+                    "fields_mapping": {
+                        "content_fields": [
+                            "content"
+                        ],
+                        "vector_fields": [
+                            "contentvector"
+                        ]
+                    }
+                }
+            }
+        ]
+    }
+)
+
+print(completion.model_dump_json(indent=2))
+
+```
+
+# [REST](#tab/rest)
+
+```bash
+az rest --method POST \
+ --uri "$AzureOpenAIEndpoint/openai/deployments/$ChatCompletionsDeploymentName/chat/completions?api-version=2024-05-01-preview" \
+ --resource https://cognitiveservices.azure.com/ \
+ --body \
+'
+{
+    "data_sources": [
+        {
+            "type": "mongo_db",
+            "parameters": {
+                "authentication": {
+                    "type": "username_and_password",
+                    "username": "<username>",
+                    "password": "<password>"
+                },
+                "endpoint": "<endpoint_name>",
+                "app_name": "<application name>",
+                "database_name": "sampledb",
+                "collection_name": "samplecollection",
+                "index_name": "sampleindex",
+                "embedding_dependency": {
+                    "type": "deployment_name",
+                    "deployment_name": "{embedding deployment name}"
+                },
+                "fields_mapping": {
+                    "content_fields": [
+                        "content"
+                    ],
+                    "vector_fields": [
+                        "contentvector"
+                    ]
+                }
+            }
+        }
+    ]
+}
+'
+```
+
+---
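
Note on the reference file above: the Parameters and Field mapping tables mark several keys as required, and `strictness` is documented as an integer from 1 to 5. A minimal sketch of a client-side check built from those tables might look like the following. The helper name `build_mongo_db_data_source` is hypothetical (it is not part of the `openai` package or of this commit), and treating an empty `content_fields` or `vector_fields` list as invalid is an assumption rather than something the tables state.

```python
from typing import Any

# Required keys, copied from the "Parameters" table in the reference file.
REQUIRED_PARAMETERS = (
    "authentication",
    "app_name",
    "collection_name",
    "database_name",
    "endpoint",
    "embedding_dependency",
    "fields_mapping",
    "index_name",
)


def build_mongo_db_data_source(parameters: dict[str, Any]) -> dict[str, Any]:
    """Check `parameters` against the documented tables and wrap them.

    Hypothetical helper for illustration only; the service performs its
    own validation and may accept or reject inputs differently.
    """
    missing = [name for name in REQUIRED_PARAMETERS if name not in parameters]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")

    # Both field lists are marked required in "Field mapping options".
    # Rejecting empty lists is an assumption made for this sketch.
    mapping = parameters["fields_mapping"]
    for name in ("content_fields", "vector_fields"):
        if not mapping.get(name):
            raise ValueError(f"fields_mapping.{name} must be a non-empty list")

    # `strictness` is optional; when present it is documented as 1 to 5.
    strictness = parameters.get("strictness")
    if strictness is not None and not 1 <= strictness <= 5:
        raise ValueError("strictness must be between 1 and 5")

    return {"type": "mongo_db", "parameters": dict(parameters)}


# Placeholder values only; substitute your own cluster and deployment details.
data_source = build_mongo_db_data_source(
    {
        "authentication": {
            "type": "username_and_password",
            "username": "<username>",
            "password": "<password>",
        },
        "endpoint": "<endpoint_name>",
        "app_name": "<application name>",
        "database_name": "sampledb",
        "collection_name": "samplecollection",
        "index_name": "sampleindex",
        "embedding_dependency": {
            "type": "deployment_name",
            "deployment_name": "<embedding deployment name>",
        },
        "fields_mapping": {
            "content_fields": ["content"],
            "vector_fields": ["contentvector"],
        },
        "strictness": 3,
    }
)
print(data_source["type"])
```

The returned dictionary has the same shape as the single entry in `data_sources` in the Python example, so it could be passed as `extra_body={"data_sources": [data_source]}`. Failing early on a missing key is a design choice for faster feedback, not a replacement for the service's own checks.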

articles/ai-services/openai/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -286,6 +286,8 @@ items:
   href: ./references/elasticsearch.md
 - name: Data source - Pinecone (preview)
   href: ./references/pinecone.md
+- name: Data source - Mongo DB (preview)
+  href: ./references/mongo-db.md
 - name: Ingestion API (preview)
   href: /rest/api/azureopenai/ingestion-jobs?context=/azure/ai-services/openai/context/context
 - name: Azure Resource Manager/Bicep/Terraform
