
Commit 3309758

Merge pull request #188508 from HeidiSteen/heidist-fresh
[azure search] Cosmos DB topic updates
2 parents 77769e0 + 32db9d0 commit 3309758

9 files changed (688 lines added, 481 lines deleted)

articles/search/TOC.yml

Lines changed: 3 additions & 1 deletion
```diff
@@ -355,8 +355,10 @@
 href: search-howto-index-changed-deleted-blobs.md
 - name: Azure Cosmos DB
 items:
-- name: SQL and MongoDB APIs
+- name: SQL API
 href: search-howto-index-cosmosdb.md
+- name: MongoDB API
+href: search-howto-index-cosmosdb-mongodb.md
 - name: Gremlin API
 href: search-howto-index-cosmosdb-gremlin.md
 - name: Azure DB for MySQL
```

articles/search/search-howto-index-azure-data-lake-storage.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -127,7 +127,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept the conten
 
 + A custom metadata property that you add to blobs. This option requires that your blob upload process adds that metadata property to all blobs. Since the key is a required property, any blobs that are missing a value will fail to be indexed. If you use a custom metadata property as a key, avoid making changes to that property. Indexers will add duplicate documents for the same blob if the key property changes.
 
-Metadata properties often include characters, such as `/` and `-`, that are invalid for document keys. Because the indexer has a "base64EncodeKeys" property (true by default), it automatically encodes the metadata property, with no configuration or field mapping required.
+Metadata properties often include characters, such as `/` and `-`, that are invalid for document keys. Because the indexer has a "base64EncodeKeys" property (true by default), it automatically encodes the metadata property, with no configuration or field mapping required.
 
 1. Add a "content" field to store extracted text from each file through the blob's "content" property. You aren't required to use this name, but doing so lets you take advantage of implicit field mappings.
 
```

articles/search/search-howto-index-cosmosdb-gremlin.md

Lines changed: 140 additions & 113 deletions
Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@

---
title: Azure Cosmos DB MongoDB indexer
titleSuffix: Azure Cognitive Search
description: Set up a search indexer to index data stored in Azure Cosmos DB for full text search in Azure Cognitive Search. This article explains how to index data using the MongoDB API protocol.

author: mgottein
ms.author: magottei
ms.service: cognitive-search
ms.topic: how-to
ms.date: 02/15/2022
---

# Index data from Azure Cosmos DB using the MongoDB API

> [!IMPORTANT]
> MongoDB API support is currently in public preview under [supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). Currently, there is no SDK support.

This article shows you how to configure an Azure Cosmos DB [indexer](search-indexer-overview.md) to extract content and make it searchable in Azure Cognitive Search. This workflow creates an Azure Cognitive Search index and loads it with existing text extracted from Azure Cosmos DB using the [MongoDB API](../cosmos-db/choose-api.md#api-for-mongodb).

Because terminology can be confusing, it's worth noting that [Azure Cosmos DB indexing](../cosmos-db/index-overview.md) and [Azure Cognitive Search indexing](search-what-is-an-index.md) are different operations. Indexing in Cognitive Search creates and loads a search index on your search service.

Although indexing Cosmos DB content is easiest with the [Import data wizard](search-import-data-portal.md), this article uses the REST APIs to explain concepts and steps.

## Prerequisites

+ [Register for the preview](https://aka.ms/azure-cognitive-search/indexer-preview) to provide feedback and get help with any issues you encounter.

+ An [Azure Cosmos DB account, database, collection, and documents](../cosmos-db/sql/create-cosmosdb-resources-portal.md). Use the same region for both Cognitive Search and Cosmos DB for lower latency and to avoid bandwidth charges.

+ An [automatic indexing policy](../cosmos-db/index-policy.md) on the Cosmos DB collection, set to [Consistent](../cosmos-db/index-policy.md#indexing-mode). This is the default configuration. Lazy indexing isn't recommended and may result in missing data.

Unfamiliar with indexers? Start with [**Create an indexer**](search-howto-create-indexers.md) for more background.

## Define the data source

The data source definition specifies the data to index, credentials, and policies for identifying changes in the data. A data source is defined as an independent resource so that it can be used by multiple indexers.

For this call, specify a [preview REST API version](search-api-preview.md) (2020-06-30-Preview or 2021-04-30-Preview) to create a data source that connects using the MongoDB API.

1. [Create or update a data source](/rest/api/searchservice/preview-api/create-or-update-data-source) to set its definition:

```http
POST https://[service name].search.windows.net/datasources?api-version=2021-04-30-Preview
Content-Type: application/json
api-key: [Search service admin key]

{
  "name": "[my-cosmosdb-mongodb-ds]",
  "type": "cosmosdb",
  "credentials": {
    "connectionString": "AccountEndpoint=https://[cosmos-account-name].documents.azure.com;AccountKey=[cosmos-account-key];Database=[cosmos-database-name];ApiKind=MongoDb;"
  },
  "container": {
    "name": "[cosmos-db-collection]",
    "query": null
  },
  "dataChangeDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
    "highWaterMarkColumnName": "_ts"
  },
  "dataDeletionDetectionPolicy": null,
  "encryptionKey": null,
  "identity": null
}
```

1. Set "type" to `"cosmosdb"` (required).

1. Set "credentials" to a connection string. The next section describes the supported formats.

1. Set "container" to the collection. The "name" property is required and specifies the ID of the database collection to be indexed. For the MongoDB API, "query" isn't supported.

1. [Set "dataChangeDetectionPolicy"](#DataChangeDetectionPolicy) if data is volatile and you want the indexer to pick up just the new and updated items on subsequent runs.

1. [Set "dataDeletionDetectionPolicy"](#DataDeletionDetectionPolicy) if you want to remove search documents from a search index when the source item is deleted.
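
The data source steps can also be scripted. The following is a minimal Python sketch that uses only the standard library; the helper name `build_mongodb_datasource_request` and every argument value are hypothetical placeholders. It builds, but doesn't send, the same POST request as the REST example, pinned to 2021-04-30-Preview because the MongoDB API requires a preview API version.

```python
import json
import urllib.request

# The article requires a preview API version for MongoDB API data sources.
API_VERSION = "2021-04-30-Preview"


def build_mongodb_datasource_request(
    service: str,
    admin_key: str,
    ds_name: str,
    connection_string: str,
    collection: str,
) -> urllib.request.Request:
    """Build, but don't send, a Create Data Source request (hypothetical helper)."""
    body = {
        "name": ds_name,
        "type": "cosmosdb",  # required
        "credentials": {"connectionString": connection_string},
        # "query" isn't supported for the MongoDB API, so it's left null.
        "container": {"name": collection, "query": None},
        # Pick up only new and updated items on later runs.
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
        "dataDeletionDetectionPolicy": None,
    }
    url = f"https://{service}.search.windows.net/datasources?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. The sketch stops short of that so it runs without a live search service.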

<a name="credentials"></a>

### Supported credentials and connection strings

Indexers can connect to a collection using the following connection strings. For connections that target the [MongoDB API](../cosmos-db/mongodb/mongodb-introduction.md), be sure to include "ApiKind" in the connection string.

Avoid port numbers in the endpoint URL. If you include the port number, the connection will fail.

| Full access connection string |
|-----------------------------------------------|
|`{ "connectionString" : "AccountEndpoint=https://<Cosmos DB account name>.documents.azure.com;AccountKey=<Cosmos DB auth key>;Database=<Cosmos DB database id>;ApiKind=MongoDb" }` |
| You can get the connection string from the Cosmos DB account page in the Azure portal by selecting **Keys** in the left navigation pane. Make sure to select a full connection string and not just a key. |

| Managed identity connection string |
|------------------------------------|
|`{ "connectionString" : "ResourceId=/subscriptions/<your subscription ID>/resourceGroups/<your resource group name>/providers/Microsoft.DocumentDB/databaseAccounts/<your cosmos db account name>/;ApiKind=MongoDb;" }`|
|This connection string doesn't require an account key, but you must have previously configured a search service to [connect using a managed identity](search-howto-managed-identities-data-sources.md) and created a role assignment that grants **Cosmos DB Account Reader Role** permissions. See [Setting up an indexer connection to a Cosmos DB database using a managed identity](search-howto-managed-identities-cosmos-db.md) for more information. |
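
The two connection string rules above (include `ApiKind`, omit the port) are easy to enforce in code. This hypothetical helper, which isn't part of any Azure SDK, assembles the full access format from the table and rejects an endpoint that carries a port number. It assumes you pass the account endpoint as an `https://` URL.

```python
from urllib.parse import urlsplit


def build_mongodb_connection_string(endpoint: str, account_key: str, database: str) -> str:
    """Assemble a full access connection string for a MongoDB API account (hypothetical helper)."""
    parts = urlsplit(endpoint)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"Expected an https:// account endpoint, got {endpoint!r}")
    # The article warns that a port number in the endpoint makes the connection fail.
    if parts.port is not None:
        raise ValueError("Remove the port number from the account endpoint")
    return (
        f"AccountEndpoint=https://{parts.hostname};"
        f"AccountKey={account_key};"
        f"Database={database};"
        "ApiKind=MongoDb;"  # required for MongoDB API connections
    )
```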

## Add search fields to an index

In a [search index](search-what-is-an-index.md), add fields to accept the source JSON documents. Ensure that the search index schema is compatible with source data. For content in Cosmos DB, your search index schema should correspond to the [Azure Cosmos DB items](../cosmos-db/account-databases-containers-items.md#azure-cosmos-items) in your data source.

1. [Create or update an index](/rest/api/searchservice/create-index) to define search fields that will store data:

```http
POST https://[service name].search.windows.net/indexes?api-version=2020-06-30
Content-Type: application/json
api-key: [Search service admin key]

{
  "name": "mysearchindex",
  "fields": [{
    "name": "id",
    "type": "Edm.String",
    "key": true,
    "retrievable": true,
    "searchable": false
  }, {
    "name": "description",
    "type": "Edm.String",
    "filterable": false,
    "searchable": true,
    "sortable": false,
    "facetable": false
  }]
}
```

1. Create a document key field ("key": true). For MongoDB collections, Azure Cognitive Search automatically renames the `_id` property to `id` because field names can't start with an underscore character. If `_id` contains characters that are invalid for search document keys, the `id` values are Base64 encoded.

1. Create additional fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details.

### Mapping between JSON data types and Azure Cognitive Search data types

| JSON data type | Compatible target index field types |
| --- | --- |
| Bool | Edm.Boolean, Edm.String |
| Numbers that look like integers | Edm.Int32, Edm.Int64, Edm.String |
| Numbers that look like floating-points | Edm.Double, Edm.String |
| String | Edm.String |
| Arrays of primitive types such as ["a", "b", "c"] | Collection(Edm.String) |
| Strings that look like dates | Edm.DateTimeOffset, Edm.String |
| GeoJSON objects such as { "type": "Point", "coordinates": [long, lat] } | Edm.GeographyPoint |
| Other JSON objects | N/A |
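
The mapping table can be expressed as a lookup function, which is one way to sanity-check a sample document against a planned schema before creating the index. This sketch mirrors the table under two stated assumptions: "strings that look like dates" means ISO 8601 strings that `datetime.fromisoformat` accepts, and integer ranges aren't checked, so a value too large for Edm.Int32 still lists it.

```python
from datetime import datetime
from typing import Any, List

_PRIMITIVES = (str, int, float, bool)


def compatible_edm_types(value: Any) -> List[str]:
    """Return the index field types that the mapping table lists for a JSON value."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return ["Edm.Boolean", "Edm.String"]
    if isinstance(value, int):  # range isn't checked
        return ["Edm.Int32", "Edm.Int64", "Edm.String"]
    if isinstance(value, float):
        return ["Edm.Double", "Edm.String"]
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # assumption: "looks like a date" = ISO 8601
        except ValueError:
            return ["Edm.String"]
        return ["Edm.DateTimeOffset", "Edm.String"]
    if isinstance(value, list) and all(isinstance(v, _PRIMITIVES) for v in value):
        return ["Collection(Edm.String)"]
    if (
        isinstance(value, dict)
        and value.get("type") == "Point"
        and isinstance(value.get("coordinates"), list)
    ):
        return ["Edm.GeographyPoint"]
    return []  # other JSON objects have no compatible field type
```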

## Configure and run the Cosmos DB indexer

Indexer configuration specifies the inputs, parameters, and properties controlling run-time behaviors.

1. [Create or update an indexer](/rest/api/searchservice/create-indexer) to use the predefined data source and search index.

```http
POST https://[service name].search.windows.net/indexers?api-version=2020-06-30
Content-Type: application/json
api-key: [search service admin key]

{
  "name" : "[my-cosmosdb-indexer]",
  "dataSourceName" : "[my-cosmosdb-mongodb-ds]",
  "targetIndexName" : "[my-search-index]",
  "disabled": null,
  "schedule": null,
  "parameters": {
    "batchSize": null,
    "maxFailedItems": 0,
    "maxFailedItemsPerBatch": 0,
    "base64EncodeKeys": false,
    "configuration": {}
  },
  "fieldMappings": [],
  "encryptionKey": null
}
```

1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index.

1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties.
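
As a sketch of the indexer step, the following builds the indexer definition as a Python dictionary with one field mapping. The helper name and the source field `summary` are hypothetical; `sourceFieldName` and `targetFieldName` are the field mapping properties covered in the field mappings article.

```python
import json
from typing import Any, Dict, List, Optional


def build_indexer_definition(
    name: str,
    datasource: str,
    index: str,
    field_mappings: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, Any]:
    """Return an indexer definition shaped like the REST example (hypothetical helper)."""
    return {
        "name": name,
        "dataSourceName": datasource,
        "targetIndexName": index,
        "parameters": {
            # As in the REST example: no failed items are tolerated.
            "maxFailedItems": 0,
            "maxFailedItemsPerBatch": 0,
        },
        # Each mapping routes a source field to a differently named index field.
        "fieldMappings": field_mappings or [],
    }


definition = build_indexer_definition(
    "my-cosmosdb-indexer",
    "my-cosmosdb-mongodb-ds",
    "mysearchindex",
    # Hypothetical: a source field "summary" feeds the index field "description".
    field_mappings=[{"sourceFieldName": "summary", "targetFieldName": "description"}],
)
print(json.dumps(definition, indent=2))
```

The printed JSON has the shape of the request body in the REST example, so it could be sent with the same POST call.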

<a name="DataChangeDetectionPolicy"></a>

## Indexing changed documents

The purpose of a data change detection policy is to efficiently identify changed data items. Currently, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB, which is specified in the data source definition as follows:

```http
"dataChangeDetectionPolicy": {
  "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
  "highWaterMarkColumnName": "_ts"
},
```

Using this policy is highly recommended to ensure good indexer performance.
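
Conceptually, a high water mark works like the following in-memory sketch: each run picks up only the items whose `_ts` is greater than the highest `_ts` recorded by the previous run, then advances the mark. The documents are made up, and this illustrates the idea rather than the indexer's internal implementation (for example, it doesn't model how the indexer stores or resets its state).

```python
from typing import Dict, List, Tuple


def changed_since(docs: List[Dict], high_water_mark: int) -> Tuple[List[Dict], int]:
    """Return docs with _ts above the mark, plus the new mark for the next run."""
    changed = [d for d in docs if d["_ts"] > high_water_mark]
    new_mark = max((d["_ts"] for d in changed), default=high_water_mark)
    return changed, new_mark


docs = [
    {"_id": "a", "_ts": 100},
    {"_id": "b", "_ts": 105},
]
first, mark = changed_since(docs, 0)      # first run: everything is new
docs.append({"_id": "c", "_ts": 110})     # a later insert or update bumps _ts
second, mark = changed_since(docs, mark)  # second run: only "c" is picked up
```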

<a name="DataDeletionDetectionPolicy"></a>

## Indexing deleted documents

When documents are deleted from the collection, you normally want to delete them from the search index as well. The purpose of a data deletion detection policy is to efficiently identify deleted data items. Currently, the only supported policy is the `Soft Delete` policy (deletion is marked by a flag property on the source document), which is specified in the data source definition as follows:

```http
"dataDeletionDetectionPolicy": {
  "@odata.type" : "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
  "softDeleteColumnName" : "the property that specifies whether a document was deleted",
  "softDeleteMarkerValue" : "the value that identifies a document as deleted"
}
```

Because the MongoDB API doesn't support a custom "query" on the data source, make sure that the property referenced by `softDeleteColumnName` is present in the source documents themselves.
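
The soft-delete policy can be illustrated the same way. In this sketch, a document whose soft-delete property matches the marker value becomes a delete action and every other document becomes an upload. The function name and the `action` labels are hypothetical, and comparing values by their JSON text (so that a boolean `true` matches the marker string `"true"`) is an assumption of the sketch, not documented indexer behavior.

```python
import json
from typing import Any, Dict, List


def _as_text(value: Any) -> str:
    # Strings stay as-is; other JSON values use their JSON spelling (True -> "true").
    return value if isinstance(value, str) else json.dumps(value)


def plan_actions(docs: List[Dict], column: str, marker: str) -> List[Dict]:
    """Classify source docs into delete or upload actions under a soft-delete policy."""
    actions = []
    for doc in docs:
        deleted = column in doc and _as_text(doc[column]) == marker
        actions.append({"action": "delete" if deleted else "upload", "id": doc["_id"]})
    return actions
```

With the policy from the example below (`isDeleted` / `"true"`), a document carrying `"isDeleted": true` would be removed from the index, while documents without the flag are indexed normally.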

The following example creates a data source with a soft-deletion policy:

```http
POST https://[service name].search.windows.net/datasources?api-version=2021-04-30-Preview
Content-Type: application/json
api-key: [Search service admin key]

{
  "name": "[my-cosmosdb-mongodb-ds]",
  "type": "cosmosdb",
  "credentials": {
    "connectionString": "AccountEndpoint=https://[cosmos-account-name].documents.azure.com;AccountKey=[cosmos-account-key];Database=[cosmos-database-name];ApiKind=MongoDb;"
  },
  "container": { "name": "[my-cosmos-collection]" },
  "dataChangeDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
    "highWaterMarkColumnName": "_ts"
  },
  "dataDeletionDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "isDeleted",
    "softDeleteMarkerValue": "true"
  }
}
```

## Next steps

You can now control how you [run the indexer](search-howto-run-reset-indexers.md), [monitor status](search-howto-monitor-indexers.md), or [schedule indexer execution](search-howto-schedule-indexers.md). The following articles apply to indexers that pull content from Azure Cosmos DB:

+ [Set up an indexer connection to a Cosmos DB database using a managed identity](search-howto-managed-identities-cosmos-db.md)
+ [Index large data sets](search-howto-large-index.md)
