|
| 1 | +--- |
| 2 | +title: Azure Cosmos DB MongoDB indexer |
| 3 | +titleSuffix: Azure Cognitive Search |
| 4 | +description: Set up a search indexer to index data stored in Azure Cosmos DB for full text search in Azure Cognitive Search. This article explains how to index data using the MongoDB API protocol. |
| 5 | + |
| 6 | +author: mgottein |
| 7 | +ms.author: magottei |
| 8 | +ms.service: cognitive-search |
| 9 | +ms.topic: how-to |
| 10 | +ms.date: 02/15/2022 |
| 11 | +--- |
| 12 | + |
| 13 | +# Index data from Azure Cosmos DB using the MongoDB API |
| 14 | + |
| 15 | +> [!IMPORTANT] |
| 16 | +> MongoDB API support is currently in public preview under [supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). Currently, there is no SDK support. |
| 17 | +
|
| 18 | +This article shows you how to configure an Azure Cosmos DB [indexer](search-indexer-overview.md) to extract content and make it searchable in Azure Cognitive Search. This workflow creates an Azure Cognitive Search index and loads it with existing text extracted from Azure Cosmos DB using the [MongoDB API](../cosmos-db/choose-api.md#api-for-mongodb). |
| 19 | + |
| 20 | +Because terminology can be confusing, it's worth noting that [Azure Cosmos DB indexing](../cosmos-db/index-overview.md) and [Azure Cognitive Search indexing](search-what-is-an-index.md) are different operations. Indexing in Cognitive Search creates and loads a search index on your search service. |
| 21 | + |
| 22 | +Although Cosmos DB indexing is easiest with the [Import data wizard](search-import-data-portal.md), this article uses the REST APIs to explain concepts and steps. |
| 23 | + |
| 24 | +## Prerequisites |
| 25 | + |
| 26 | ++ [Register for the preview](https://aka.ms/azure-cognitive-search/indexer-preview) to provide feedback and get help with any issues you encounter. |
| 27 | + |
| 28 | ++ An [Azure Cosmos DB account, database, collection, and documents](../cosmos-db/sql/create-cosmosdb-resources-portal.md). Use the same region for both Cognitive Search and Cosmos DB for lower latency and to avoid bandwidth charges. |
| 29 | + |
| 30 | ++ An [automatic indexing policy](../cosmos-db/index-policy.md) on the Cosmos DB collection, set to [Consistent](../cosmos-db/index-policy.md#indexing-mode). This is the default configuration. Lazy indexing isn't recommended and may result in missing data. |
| 31 | + |
| 32 | +Unfamiliar with indexers? Start with [**Create an indexer**](search-howto-create-indexers.md) for more background. |
| 33 | + |
| 34 | +## Define the data source |
| 35 | + |
| 36 | +The data source definition specifies the data to index, credentials, and policies for identifying changes in the data. A data source is defined as an independent resource so that it can be used by multiple indexers. |
| 37 | + |
| 38 | +For this call, specify a [preview REST API version](search-api-preview.md) (2020-06-30-Preview or 2021-04-30-Preview) to create a data source that connects using the MongoDB API. |
| 39 | + |
| 40 | +1. [Create or update a data source](/rest/api/searchservice/preview-api/create-or-update-data-source) to set its definition: |
| 41 | + |
| 42 | + ```http |
| 43 | + POST https://[service name].search.windows.net/datasources?api-version=2021-04-30-Preview |
| 44 | + Content-Type: application/json |
| 45 | + api-key: [Search service admin key] |
| 46 | + { |
| 47 | + "name": "[my-cosmosdb-mongodb-ds]", |
| 48 | + "type": "cosmosdb", |
| 49 | + "credentials": { |
| 50 | + "connectionString": "AccountEndpoint=https://[cosmos-account-name].documents.azure.com;AccountKey=[cosmos-account-key];Database=[cosmos-database-name];ApiKind=MongoDb;" |
| 51 | + }, |
| 52 | + "container": { |
| 53 | + "name": "[cosmos-db-collection]", |
| 54 | + "query": null |
| 55 | + }, |
| 56 | + "dataChangeDetectionPolicy": { |
| 57 | + "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy", |
| 58 | + "highWaterMarkColumnName": "_ts" |
| 59 | + }, |
| 60 | + "dataDeletionDetectionPolicy": null, |
| 61 | + "encryptionKey": null, |
| 62 | + "identity": null |
| 63 | + } |
| 64 | + ``` |
| 65 | +
|
| 66 | +1. Set "type" to `"cosmosdb"` (required). |
| 67 | +
|
| 68 | +1. Set "credentials" to a connection string. The next section describes the supported formats. |
| 69 | +
|
| 70 | +1. Set "container" to the collection. The "name" property is required and it specifies the ID of the database collection to be indexed. For the MongoDB API, "query" isn't supported. |
| 71 | +
|
| 72 | +1. [Set "dataChangeDetectionPolicy"](#DataChangeDetectionPolicy) if data is volatile and you want the indexer to pick up just the new and updated items on subsequent runs. |
| 73 | +
|
| 74 | +1. [Set "dataDeletionDetectionPolicy"](#DataDeletionDetectionPolicy) if you want to remove search documents from a search index when the source item is deleted. |
| 75 | +
|
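The steps above can be sketched in Python using only the standard library, because the preview has no SDK support. This is a minimal sketch, not an official client: the helper name `build_data_source_request` and the sample values (`my-search-service`, `my-collection`, and so on) are hypothetical placeholders. The request is built but not sent.

```python
import json
import urllib.request

# Preview API version named in this article.
API_VERSION = "2021-04-30-Preview"


def build_data_source_request(service_name, admin_key, definition):
    """Return an unsent POST request that creates a data source (hypothetical helper)."""
    url = (
        f"https://{service_name}.search.windows.net/datasources"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


# Placeholder values; replace them with your own account details.
definition = {
    "name": "my-cosmosdb-mongodb-ds",
    "type": "cosmosdb",  # required
    "credentials": {
        "connectionString": (
            "AccountEndpoint=https://my-cosmos-account.documents.azure.com;"
            "AccountKey=<cosmos-account-key>;"
            "Database=my-database;ApiKind=MongoDb;"
        )
    },
    # "query" stays null because it isn't supported for the MongoDB API.
    "container": {"name": "my-collection", "query": None},
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName": "_ts",
    },
}

request = build_data_source_request("my-search-service", "<admin-key>", definition)
# To send it, you would call urllib.request.urlopen(request).
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and body before anything is sent.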
| 76 | +<a name="credentials"></a> |
| 77 | +
|
| 78 | +### Supported credentials and connection strings |
| 79 | +
|
| 80 | +Indexers can connect to a collection using the following connections. For connections that target the [MongoDB API](../cosmos-db/mongodb/mongodb-introduction.md), be sure to include "ApiKind" in the connection string. |
| 81 | +
|
| 82 | +Avoid port numbers in the endpoint URL. If you include the port number, the connection will fail. |
| 83 | +
|
| 84 | +| Full access connection string | |
| 85 | +|-----------------------------------------------| |
| 86 | +|`{ "connectionString" : "AccountEndpoint=https://<Cosmos DB account name>.documents.azure.com;AccountKey=<Cosmos DB auth key>;Database=<Cosmos DB database id>;ApiKind=MongoDb" }` | |
| 87 | +| You can get the connection string from the Cosmos DB account page in Azure portal by selecting **Keys** in the left navigation pane. Make sure to select a full connection string and not just a key. | |
| 88 | +
|
| 89 | +| Managed identity connection string | |
| 90 | +|------------------------------------| |
| 91 | +|`{ "connectionString" : "ResourceId=/subscriptions/<your subscription ID>/resourceGroups/<your resource group name>/providers/Microsoft.DocumentDB/databaseAccounts/<your cosmos db account name>/;(ApiKind=[api-kind];)" }`| |
| 92 | +|This connection string doesn't require an account key, but you must have previously configured a search service to [connect using a managed identity](search-howto-managed-identities-data-sources.md) and created a role assignment that grants **Cosmos DB Account Reader Role** permissions. See [Setting up an indexer connection to a Cosmos DB database using a managed identity](search-howto-managed-identities-cosmos-db.md) for more information. | |
| 93 | +
|
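As an illustration of the full access format, the following sketch assembles a connection string and rejects endpoints that include a port number. The function name `build_mongodb_connection_string` and the sample values are assumptions made for this example; only the connection string layout comes from the table above.

```python
from urllib.parse import urlsplit


def build_mongodb_connection_string(endpoint, account_key, database):
    """Assemble a full access connection string for the MongoDB API (hypothetical helper)."""
    # The article warns that a port number in the endpoint makes the connection fail.
    if urlsplit(endpoint).port is not None:
        raise ValueError("Remove the port number from the endpoint URL")
    return (
        f"AccountEndpoint={endpoint.rstrip('/')};"
        f"AccountKey={account_key};"
        f"Database={database};"
        "ApiKind=MongoDb"
    )


connection_string = build_mongodb_connection_string(
    "https://my-cosmos-account.documents.azure.com",  # placeholder account
    "<cosmos-account-key>",
    "my-database",
)
```

Validating the endpoint before you create the data source surfaces the port problem locally instead of as a failed indexer connection.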
| 94 | +## Add search fields to an index |
| 95 | +
|
| 96 | +In a [search index](search-what-is-an-index.md), add fields to accept the source JSON documents. Ensure that the search index schema is compatible with source data. For content in Cosmos DB, your search index schema should correspond to the [Azure Cosmos DB items](../cosmos-db/account-databases-containers-items.md#azure-cosmos-items) in your data source. |
| 97 | +
|
| 98 | +1. [Create or update an index](/rest/api/searchservice/create-index) to define search fields that will store data: |
| 99 | +
|
| 100 | + ```http |
| 101 | + POST https://[service name].search.windows.net/indexes?api-version=2020-06-30 |
| 102 | + Content-Type: application/json |
| 103 | + api-key: [Search service admin key] |
| 104 | + |
| 105 | + { |
| 106 | + "name": "[my-search-index]", |
| 107 | + "fields": [{ |
| 108 | + "name": "id", |
| 109 | + "type": "Edm.String", |
| 110 | + "key": true, |
| 111 | + "retrievable": true, |
| 112 | + "searchable": false |
| 113 | + }, { |
| 114 | + "name": "description", |
| 115 | + "type": "Edm.String", |
| 116 | + "filterable": false, |
| 117 | + "searchable": true, |
| 118 | + "sortable": false, |
| 119 | + "facetable": false, |
| 120 | + "suggestions": true |
| 121 | + }] |
| 122 | + } |
| 123 | + ``` |
| 124 | +
|
| 125 | +1. Create a document key field ("key": true). For MongoDB collections, Azure Cognitive Search automatically renames the `_id` property to `id` because field names can’t start with an underscore character. If `_id` contains characters that are invalid for search document keys, the `id` values are Base64 encoded. |
| 126 | +
|
| 127 | +1. Create additional fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details. |
| 128 | +
|
| 129 | +### Mapping between JSON Data Types and Azure Cognitive Search Data Types |
| 130 | +
|
| 131 | +| JSON data type | Compatible target index field types | |
| 132 | +| --- | --- | |
| 133 | +| Bool |Edm.Boolean, Edm.String | |
| 134 | +| Numbers that look like integers |Edm.Int32, Edm.Int64, Edm.String | |
| 135 | +| Numbers that look like floating-points |Edm.Double, Edm.String | |
| 136 | +| String |Edm.String | |
| 137 | +| Arrays of primitive types such as ["a", "b", "c"] |Collection(Edm.String) | |
| 138 | +| Strings that look like dates |Edm.DateTimeOffset, Edm.String | |
| 139 | +| GeoJSON objects such as { "type": "Point", "coordinates": [long, lat] } |Edm.GeographyPoint | |
| 140 | +| Other JSON objects |N/A | |
| 141 | +
|
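The table can be read as a lookup from a parsed JSON value to the field types it can populate. The sketch below mirrors the table rows in Python. The function names are hypothetical, and the "looks like a date" check (ISO 8601 parsing) is an assumption, since the service's own detection rules aren't described here.

```python
from datetime import datetime


def looks_like_date(text):
    """Assumed heuristic: the string parses as an ISO 8601 date or date-time."""
    try:
        datetime.fromisoformat(text.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def compatible_edm_types(value):
    """Return the index field types from the table that can hold this JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return ["Edm.Boolean", "Edm.String"]
    if isinstance(value, int):
        return ["Edm.Int32", "Edm.Int64", "Edm.String"]
    if isinstance(value, float):
        return ["Edm.Double", "Edm.String"]
    if isinstance(value, str):
        if looks_like_date(value):
            return ["Edm.DateTimeOffset", "Edm.String"]
        return ["Edm.String"]
    if isinstance(value, list) and all(
        isinstance(v, (str, int, float, bool)) for v in value
    ):
        return ["Collection(Edm.String)"]
    if (
        isinstance(value, dict)
        and value.get("type") == "Point"
        and isinstance(value.get("coordinates"), list)
    ):
        return ["Edm.GeographyPoint"]
    return []  # other JSON objects have no compatible field type
```

For example, `compatible_edm_types(json.loads(text)["price"])` could be used to sanity-check a sample document against a planned index schema before you create the index.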
| 142 | +## Configure and run the Cosmos DB indexer |
| 143 | +
|
| 144 | +Indexer configuration specifies the inputs, parameters, and properties controlling runtime behaviors. |
| 145 | +
|
| 146 | +1. [Create or update an indexer](/rest/api/searchservice/create-indexer) to use the predefined data source and search index. |
| 147 | +
|
| 148 | + ```http |
| 149 | + POST https://[service name].search.windows.net/indexers?api-version=2020-06-30 |
| 150 | + Content-Type: application/json |
| 151 | + api-key: [search service admin key] |
| 152 | + { |
| 153 | + "name" : "[my-cosmosdb-indexer]", |
| 154 | + "dataSourceName" : "[my-cosmosdb-mongodb-ds]", |
| 155 | + "targetIndexName" : "[my-search-index]", |
| 156 | + "disabled": null, |
| 157 | + "schedule": null, |
| 158 | + "parameters": { |
| 159 | + "batchSize": null, |
| 160 | + "maxFailedItems": 0, |
| 161 | + "maxFailedItemsPerBatch": 0, |
| 162 | + "base64EncodeKeys": false, |
| 163 | + "configuration": {} |
| 164 | + }, |
| 165 | + "fieldMappings": [], |
| 166 | + "encryptionKey": null |
| 167 | + } |
| 168 | + ``` |
| 169 | +
|
| 170 | +1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index. |
| 171 | +
|
| 172 | +1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties. |
| 173 | +
|
| 174 | +<a name="DataChangeDetectionPolicy"></a> |
| 175 | +
|
| 176 | +## Indexing changed documents |
| 177 | +
|
| 178 | +The purpose of a data change detection policy is to efficiently identify changed data items. Currently, the only supported policy is the [`HighWaterMarkChangeDetectionPolicy`](/dotnet/api/azure.search.documents.indexes.models.highwatermarkchangedetectionpolicy) using the `_ts` (timestamp) property provided by Azure Cosmos DB, which is specified in the data source definition as follows: |
| 179 | +
|
| 180 | +```http |
| 181 | +"dataChangeDetectionPolicy": { |
| 182 | + "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy", |
| 183 | + "highWaterMarkColumnName": "_ts" |
| 184 | +}, |
| 185 | +``` |
| 186 | + |
| 187 | +Using this policy is highly recommended to ensure good indexer performance. |
| 188 | + |
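To see why a high-water mark keeps incremental runs cheap, consider this conceptual sketch. It isn't the service's implementation; it only illustrates the idea of selecting items whose `_ts` value is greater than the mark recorded by the previous run. The function name `changed_since` and the sample items are made up for the example.

```python
def changed_since(items, high_water_mark):
    """Return items modified after the mark, oldest first, plus the new mark."""
    changed = sorted(
        (item for item in items if item["_ts"] > high_water_mark),
        key=lambda item: item["_ts"],
    )
    # If nothing changed, the mark stays where it was.
    new_mark = changed[-1]["_ts"] if changed else high_water_mark
    return changed, new_mark


# Illustrative items; _ts is the last-modified timestamp set by Cosmos DB.
items = [
    {"id": "a", "_ts": 100},
    {"id": "b", "_ts": 250},
    {"id": "c", "_ts": 300},
]

first_run, mark = changed_since(items, high_water_mark=0)      # all three items
second_run, mark = changed_since(items, high_water_mark=mark)  # nothing new
```

Without a change detection policy, every run would have to read the whole collection again, which is why the policy matters for large or frequently updated collections.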
| 189 | +<a name="DataDeletionDetectionPolicy"></a> |
| 190 | + |
| 191 | +## Indexing deleted documents |
| 192 | + |
| 193 | +When documents are deleted from the collection, you normally want to delete the corresponding documents from the search index as well. The purpose of a data deletion detection policy is to efficiently identify deleted data items. Currently, the only supported policy is the `Soft Delete` policy (deletion is marked with a flag of some sort), which is specified in the data source definition as follows: |
| 194 | + |
| 195 | +```http |
| 196 | +"dataDeletionDetectionPolicy": { |
| 197 | + "@odata.type" : "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy", |
| 198 | + "softDeleteColumnName" : "the property that specifies whether a document was deleted", |
| 199 | + "softDeleteMarkerValue" : "the value that identifies a document as deleted" |
| 200 | +} |
| 201 | +``` |
| 202 | + |
| 203 | +Because "query" isn't supported for the MongoDB API, make sure that the property referenced by `softDeleteColumnName` is present on the source documents themselves. |
| 204 | + |
| 205 | +The following example creates a data source with a soft-deletion policy: |
| 206 | + |
| 207 | +```http |
| 208 | +POST https://[service name].search.windows.net/datasources?api-version=2021-04-30-Preview |
| 209 | +Content-Type: application/json |
| 210 | +api-key: [Search service admin key] |
| 211 | +
|
| 212 | +{ |
| 213 | + "name": "[my-cosmosdb-mongodb-ds]", |
| 214 | + "type": "cosmosdb", |
| 215 | + "credentials": { |
| 216 | + "connectionString": "AccountEndpoint=https://[cosmos-account-name].documents.azure.com;AccountKey=[cosmos-account-key];Database=[cosmos-database-name];ApiKind=MongoDb" |
| 217 | + }, |
| 218 | + "container": { "name": "[my-cosmos-collection]" }, |
| 219 | + "dataChangeDetectionPolicy": { |
| 220 | + "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy", |
| 221 | + "highWaterMarkColumnName": "_ts" |
| 222 | + }, |
| 223 | + "dataDeletionDetectionPolicy": { |
| 224 | + "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy", |
| 225 | + "softDeleteColumnName": "isDeleted", |
| 226 | + "softDeleteMarkerValue": "true" |
| 227 | + } |
| 228 | +} |
| 229 | +``` |
| 230 | + |
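The effect of the soft-delete policy in the example above can be sketched as a simple partition of source documents. This is a conceptual illustration only. The function name `split_soft_deleted` is hypothetical, and the comparison rule (non-string values are compared by their JSON text, so a Boolean `true` matches the marker `"true"`) is an assumption rather than documented service behavior.

```python
import json


def split_soft_deleted(documents, column="isDeleted", marker="true"):
    """Partition documents into (to_delete, to_index) using a soft-delete marker."""
    to_delete, to_index = [], []
    for doc in documents:
        value = doc.get(column)
        # Assumption: compare non-string values by their JSON representation.
        text = value if isinstance(value, str) else json.dumps(value)
        (to_delete if text == marker else to_index).append(doc)
    return to_delete, to_index


# Illustrative documents only.
documents = [
    {"id": "1", "description": "kept"},
    {"id": "2", "description": "flagged", "isDeleted": True},
    {"id": "3", "description": "restored", "isDeleted": False},
]

to_delete, to_index = split_soft_deleted(documents)
```

With this approach, the application marks a document as deleted and leaves it in the collection long enough for the indexer to see the marker. Physically removing the document before the next indexer run would leave the corresponding search document in the index.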
| 231 | +## Next steps |
| 232 | + |
| 233 | +You can now control how you [run the indexer](search-howto-run-reset-indexers.md), [monitor status](search-howto-monitor-indexers.md), or [schedule indexer execution](search-howto-schedule-indexers.md). The following articles apply to indexers that pull content from Azure Cosmos DB: |
| 234 | + |
| 235 | ++ [Set up an indexer connection to a Cosmos DB database using a managed identity](search-howto-managed-identities-cosmos-db.md) |
| 236 | ++ [Index large data sets](search-howto-large-index.md) |