Commit 51bdaec

Cosmos DB updates for portal BYOE
1 parent 2806ce7 commit 51bdaec

4 files changed (+105, -23 lines)

articles/search/search-get-started-portal-import-vectors.md

Lines changed: 5 additions & 5 deletions
@@ -42,7 +42,7 @@ The **Import and vectorize data** wizard supports the following data sources:
4242
+ [Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart), [Azure SQL Managed Instance](/azure/azure-sql/managed-instance/instance-create-quickstart), and Azure SQL Server virtual machines.
4343

4444
> [!NOTE]
45-
> This quickstart provides steps for just those data sources that work with whole files: Azure Blob storage, ADLS Gen2, and OneLake. For more information about using this wizard with other data sources, see [Azure Table indexer](search-howto-indexing-azure-tables.md), [Cosmos DB for NoSQL indexer](search-howto-index-cosmosdb.md), and [Azure SQL indexer](search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md).
45+
> This quickstart provides steps for just those data sources that work with whole files: Azure Blob storage, ADLS Gen2, and OneLake. For more information about using this wizard with other data sources, see [Azure Table indexer](search-howto-indexing-azure-tables.md), [Cosmos DB for NoSQL indexer](search-howto-index-cosmosdb.md), and [Azure SQL indexer](search-how-to-index-sql-database.md).
4646
4747
### Supported embedding models
4848

@@ -230,7 +230,7 @@ The next step is to connect to a data source to use for the search index.
230230

231231
### [Azure Blob Storage](#tab/connect-data-storage)
232232

233-
1. On the **Set up your data connection** page, select **Azure Blob Storage**.
233+
1. On **Connect to your data**, select **Azure Blob Storage**.
234234

235235
1. Specify the Azure subscription.
236236

@@ -256,7 +256,7 @@ The next step is to connect to a data source to use for the search index.
256256

257257
### [ADLS Gen2](#tab/connect-data-adlsgen2)
258258

259-
1. On the **Set up your data connection** page, select **Azure Data Lake**.
259+
1. On **Connect to your data**, select **Azure Data Lake**.
260260

261261
1. Specify the Azure subscription.
262262

@@ -284,7 +284,7 @@ The next step is to connect to a data source to use for the search index.
284284

285285
Support for OneLake indexing is in preview. For more information about supported shortcuts and limitations, see [OneLake indexing](search-how-to-index-onelake-files.md).
286286

287-
1. On the **Set up your data connection** page, select **OneLake**.
287+
1. On **Connect to your data**, select **OneLake**.
288288

289289
1. Specify the type of connection:
290290

@@ -303,7 +303,7 @@ Support for OneLake indexing is in preview. For more information about supported
303303

304304
In this step, specify the embedding model for vectorizing chunked data.
305305

306-
Chunking is built-in and nonconfigurable. The effective settings are:
306+
Chunking is built in and nonconfigurable. The effective settings are:
307307

308308
```json
309309
"textSplitMode": "pages",

articles/search/search-how-to-index-sql-database.md

Lines changed: 14 additions & 5 deletions
@@ -17,7 +17,7 @@ ms.date: 11/20/2024
1717

1818
In this article, learn how to configure an [**indexer**](search-indexer-overview.md) that imports content from Azure SQL Database or an Azure SQL managed instance and makes it searchable in Azure AI Search.
1919

20-
This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to Azure SQL. It uses the Azure portal and REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer.
20+
This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to Azure SQL. It uses the Azure portal and REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.
2121

2222
This article also provides:
2323

@@ -42,7 +42,9 @@ To work through the examples in this article, you need the Azure portal or a [RE
4242

4343
## Try with sample data
4444

45-
[Download hotels-azure-sql.sql](hotels/hotel-sql/hotels-azure-sql.sql) from GitHub to create a table on Azure SQL Database that contains a subset of the sample hotels data set.
45+
Use these instructions to create a table in Azure SQL that you can use with an indexer on Azure AI Search. The portal approach, using either of the import wizards, is the quickest way to create and load an index from a table in a SQL database.
46+
47+
1. [Download hotels-azure-sql.sql](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/hotels/hotel-sql) from GitHub to create a table on Azure SQL Database that contains a subset of the sample hotels data set.
4648

4749
1. Sign in to the Azure portal and [create an Azure SQL database and database server](/azure/azure-sql/database/single-database-create-quickstart). Consider configuring both SQL Server authentication and Microsoft Entra ID authentication. If you don't have permissions to configure roles on Azure, you can use SQL authentication as a workaround.
4850

@@ -108,19 +110,25 @@ You can use either the **Import data** wizard or **Import and vectorize data** w
108110
109111
1. Specify the server name, database name, and table or view name.
110112
111-
The portal validates the connection. If the database is unavailable due to inactivity, navigate to the database server page and make sure database status is *online*.
113+
The portal validates the connection. If the database is paused due to inactivity, navigate to the database server page and make sure the database status is *online*. You can run a query on any table to activate the database.
114+
115+
:::image type="content" source="media/search-how-to-index-sql-database/database-online.png" alt-text="Screenshot of the database status page in the Azure portal.":::
112116
113117
1. Specify an authentication method, either a SQL Server login defined during server setup, or a managed identity.
114118
115-
If you [configure Azure AI Search to use a managed identity](search-howto-managed-identities-data-sources.md), and you create a role assignment on the database server that grants **SQL Server Contributor** or **SQL Server DB Contributor** permissions to the identity, you can connect using Microsoft Entra ID and roles.
119+
If you [configure Azure AI Search to use a managed identity](search-howto-managed-identities-data-sources.md), and you create a role assignment on the database server that grants **SQL Server Contributor** or **SQL Server DB Contributor** permissions to the identity, your indexer can connect to Azure SQL using Microsoft Entra ID and roles.
116120
117121
1. For the **Import and vectorize data** wizard, you can specify options for change and deletion tracking.
118122
119123
+ Deletion tracking is based on [soft delete using custom metadata](#soft-delete-column-deletion-detection-policy).
120124
121125
+ Change tracking is based on [SQL Server integrated change tracking](#sql-integrated-change-tracking-policy) or [high water mark change tracking](#high-water-mark-change-detection-policy).
122126
123-
1. Continue with the remaining steps to complete the wizard. For more information, see [Quickstart: Import data wizard](search-get-started-portal.md) or [Quickstart: Import and vectorize data wizard](search-get-started-portal-import-vectors.md).
127+
1. Continue with the remaining steps to complete the wizard:
128+
129+
+ [Quickstart: Import data wizard](search-get-started-portal.md)
130+
131+
+ [Quickstart: Import and vectorize data wizard](search-get-started-portal-import-vectors.md)
124132
125133
## Use the REST APIs
126134
@@ -362,6 +370,7 @@ api-key: admin-key
362370
"container" : { "name" : "table name" },
363371
"dataChangeDetectionPolicy" : {
364372
"@odata.type" : "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
373+
}
365374
}
366375
```
367376
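The corrected JSON in the hunk above closes both the `dataChangeDetectionPolicy` object and the data source object. Building the request body in code is one way to avoid unbalanced braces. The following Python sketch uses only the standard library and prepares (but doesn't send) a Create Data Source request. The service name, data source name, table name, and connection string are hypothetical placeholders, and the `azuresql` type value is taken from the Azure AI Search REST API rather than from this commit.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"  # API version used in this commit's REST examples


def sql_data_source(name: str, connection_string: str, table: str) -> dict:
    """Build an Azure SQL data source definition with integrated change tracking."""
    return {
        "name": name,
        "type": "azuresql",  # assumption: data source type for Azure SQL
        "credentials": {"connectionString": connection_string},
        "container": {"name": table},
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
        },
    }


def create_request(service: str, admin_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but don't send) a Create Data Source request."""
    url = f"https://{service}.search.windows.net/datasources?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # json.dumps always emits balanced braces
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


# Hypothetical values; replace with your own service, key, and connection string.
ds = sql_data_source("hotels-sql-ds", "<your-connection-string>", "hotels")
req = create_request("my-search-service", "<admin-key>", ds)
print(req.get_method(), req.full_url)
# To send the request: urllib.request.urlopen(req)
```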

articles/search/search-howto-index-cosmosdb.md

Lines changed: 86 additions & 13 deletions
@@ -10,14 +10,14 @@ ms.custom:
1010
- devx-track-dotnet
1111
- ignite-2023
1212
ms.topic: how-to
13-
ms.date: 06/18/2024
13+
ms.date: 11/20/2024
1414
---
1515

1616
# Index data from Azure Cosmos DB for NoSQL for queries in Azure AI Search
1717

1818
In this article, learn how to configure an [**indexer**](search-indexer-overview.md) that imports content from [Azure Cosmos DB for NoSQL](/azure/cosmos-db/nosql/) and makes it searchable in Azure AI Search.
1919

20-
This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to Cosmos DB. It uses the REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.
20+
This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to Cosmos DB. It uses the Azure portal and REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.
2121

2222
Because terminology can be confusing, it's worth noting that [Azure Cosmos DB indexing](/azure/cosmos-db/index-overview) and [Azure AI Search indexing](search-what-is-an-index.md) are different operations. Indexing in Azure AI Search creates and loads a search index on your search service.
2323
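The three-part workflow can be sketched as three Create requests sent in order, with the indexer referring to the data source and index by name. This Python sketch only assembles the requests; the service and object names are hypothetical, and the trimmed bodies omit credentials and fields. The `disabled` flag is an indexer property in the Azure AI Search REST API that lets you create an indexer without running it; treat its use here as illustrative.

```python
import json

API_VERSION = "2024-07-01"  # API version used in this commit's REST examples


def create_requests(service, data_source, index, indexer_name, run_on_create=True):
    """Return (method, url, body) for the three Create calls, in the order to send them."""
    base = f"https://{service}.search.windows.net"
    indexer = {
        "name": indexer_name,
        "dataSourceName": data_source["name"],  # must match the data source created first
        "targetIndexName": index["name"],  # must match the index created second
        "disabled": not run_on_create,  # True creates the indexer without running it
    }
    return [
        ("POST", f"{base}/datasources?api-version={API_VERSION}", data_source),
        ("POST", f"{base}/indexes?api-version={API_VERSION}", index),
        # Data extraction happens when this request is processed (unless disabled).
        ("POST", f"{base}/indexers?api-version={API_VERSION}", indexer),
    ]


# Hypothetical, trimmed bodies; real definitions need credentials and fields.
ds = {"name": "hotels-cosmos-ds", "type": "cosmosdb"}
idx = {"name": "hotels-index", "fields": []}
for method, url, body in create_requests("my-search-service", ds, idx, "hotels-indexer"):
    print(method, url, json.dumps(body))
```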

@@ -27,11 +27,73 @@ Because terminology can be confusing, it's worth noting that [Azure Cosmos DB in
2727

2828
+ An [automatic indexing policy](/azure/cosmos-db/index-policy) on the Azure Cosmos DB collection, set to [Consistent](/azure/cosmos-db/index-policy#indexing-mode). This is the default configuration. Lazy indexing isn't recommended and can result in missing data.
2929

30-
+ Read permissions. A "full access" connection string includes a key that grants access to the content, but if you're using Azure RBAC (Microsoft Entra ID), make sure the [search service managed identity](search-howto-managed-identities-data-sources.md) is assigned both **Cosmos DB Account Reader Role** and [**Cosmos DB Built-in Data Reader Role**](/azure/cosmos-db/how-to-setup-rbac#built-in-role-definitions).
30+
+ Read permissions. A "full access" connection string includes a key that grants access to the content, but if you're using identities (Microsoft Entra ID), make sure the [search service managed identity](search-howto-managed-identities-data-sources.md) is assigned both **Cosmos DB Account Reader Role** and [**Cosmos DB Built-in Data Reader Role**](/azure/cosmos-db/how-to-setup-rbac#built-in-role-definitions).
3131

32-
+ A [REST client](search-get-started-rest.md) to create the data source, index, and indexer.
32+
To work through the examples in this article, you need the Azure portal or a [REST client](search-get-started-rest.md). If you're using the Azure portal, make sure that access to all public networks is enabled in Cosmos DB and that the client has access via an inbound rule. For a REST client that runs locally, configure the network firewall to allow inbound access from your device IP address. Other approaches for creating a Cosmos DB indexer include Azure SDKs.
3333

34-
## Define the data source
34+
## Try with sample data
35+
36+
Use these instructions to create a container and database in Cosmos DB that you can use with an indexer on Azure AI Search. The portal approach, using either of the import wizards, is the quickest way to create and load an index from a container in Cosmos DB.
37+
38+
1. [Download HotelsData_toCosmosDB.JSON](https://github.com/HeidiSteen/azure-search-sample-data/blob/main/hotels/HotelsData_toCosmosDB.JSON) from GitHub to create a container in Cosmos DB that contains a subset of the sample hotels data set.
39+
40+
1. Sign in to the Azure portal and [create an account, database, and container](/azure/cosmos-db/nosql/quickstart-portal) on Cosmos DB.
41+
42+
1. In Cosmos DB, select **Data Explorer** for the new container and provide the following values:
43+
44+
| Property | Value |
45+
|----------|-------|
46+
| Database | Create new |
47+
| Database ID | hotelsdb |
48+
| Share throughput across containers | Don't select |
49+
| Container ID | hotels |
50+
| Partition key | /HotelId |
51+
| Container throughput (autoscale) | Autoscale |
52+
| Container Max RU/s | 1000 |
53+
54+
1. In **Data Explorer**, expand *hotelsdb* and *hotels*, and then select **Items**.
55+
56+
1. Select **Upload Item**, and then select the *HotelsData_toCosmosDB.JSON* file that you downloaded from GitHub.
57+
58+
1. Right-click **Items** and select **New SQL query**. The default query is `SELECT * FROM c`.
59+
60+
1. Select **Execute query** to run the query and view results. You should have 50 hotel documents.
61+
62+
You can now use this content for indexing in the Azure portal, REST client, or an Azure SDK.
63+
64+
## Use the Azure portal
65+
66+
You can use either the **Import data** wizard or the **Import and vectorize data** wizard to automate indexing from a Cosmos DB container. The data source configuration is similar for both wizards.
67+
68+
1. [Start the wizard](search-import-data-portal.md#starting-the-wizards).
69+
70+
1. On **Connect to your data**, select or verify that the data source type is either *Azure Cosmos DB* or a *NoSQL account*.
71+
72+
The data source name refers to the data source connection object in Azure AI Search. If you use the vector wizard, your data source name is autogenerated using a custom prefix specified at the end of the wizard workflow.
73+
74+
1. Specify the database name and collection. The query is optional. It's useful if you have hierarchical data and you want to import a specific slice.
75+
76+
1. Specify an authentication method, either a managed identity or built-in API key. If you don't specify a managed identity connection, the portal uses the key.
77+
78+
If you [configure Azure AI Search to use a managed identity](search-howto-managed-identities-data-sources.md), and you create a role assignment on Cosmos DB that grants **Cosmos DB Account Reader Role** and [**Cosmos DB Built-in Data Reader Role**](/azure/cosmos-db/how-to-setup-rbac#built-in-role-definitions) permissions to the identity, your indexer can connect to Cosmos DB using Microsoft Entra ID and roles.
79+
80+
1. For the **Import and vectorize data** wizard, you can specify options for change and deletion tracking.
81+
82+
[Change detection](#incremental-indexing-and-custom-queries) is supported by default through a `_ts` field (timestamp). If you upload content using the approach described in [Try with sample data](#try-with-sample-data), the collection is created with a `_ts` field.
83+
84+
[Deletion detection](#indexing-deleted-documents) requires a pre-existing top-level field in your Cosmos DB items that can be used as a soft-delete flag. It should be a Boolean field (for example, *IsDeleted*). In the search index, add a corresponding search field called *IsDeleted* that's retrievable and filterable. Specify `true` as the soft-delete value.
85+
86+
1. Continue with the remaining steps to complete the wizard:
87+
88+
+ [Quickstart: Import data wizard](search-get-started-portal.md)
89+
90+
+ [Quickstart: Import and vectorize data wizard](search-get-started-portal-import-vectors.md)
91+
92+
## Use the REST APIs
93+
94+
This section demonstrates the REST API calls that create a data source, index, and indexer.
95+
96+
### Define the data source
3597

3698
The data source definition specifies the data to index, credentials, and policies for identifying changes in the data. A data source is an independent resource that can be used by multiple indexers.
3799
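The portal options for change and deletion tracking described earlier map to policies in the data source definition. As a sketch, assuming the sample `hotelsdb` database and `hotels` container, a key-based connection string placeholder, and a Boolean `IsDeleted` property on your items, an equivalent REST body could be assembled in Python like this. The `cosmosdb` type and the policy `@odata.type` names come from the Azure AI Search REST API, not from this commit.

```python
import json


def cosmos_data_source(name, connection_string, container, query=None):
    """Build a Cosmos DB for NoSQL data source with _ts change detection and soft delete."""
    return {
        "name": name,
        "type": "cosmosdb",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container, "query": query},
        # Change detection: the indexer tracks the system-maintained _ts timestamp.
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
        # Deletion detection: items whose IsDeleted value matches the marker are removed.
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": "IsDeleted",  # hypothetical Boolean property on your items
            "softDeleteMarkerValue": "true",
        },
    }


# Placeholder connection string for the sample hotelsdb database.
ds = cosmos_data_source(
    "hotels-cosmos-ds",
    "AccountEndpoint=https://<account>.documents.azure.com;AccountKey=<key>;Database=hotelsdb",
    "hotels",
)
print(json.dumps(ds, indent=2))
```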

@@ -73,7 +135,7 @@ The data source definition specifies the data to index, credentials, and policie
73135
74136
<a name="credentials"></a>
75137
76-
### Supported credentials and connection strings
138+
#### Supported credentials and connection strings
77139
78140
Indexers can connect to a collection using the following connections.
79141
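The guidance to avoid port numbers in the endpoint URL suggests normalizing the account endpoint before assembling a key-based connection string. This Python sketch assumes the `AccountEndpoint=...;AccountKey=...;Database=...` format for key-based connections; the account name, key, and database are placeholders.

```python
from urllib.parse import urlsplit


def normalize_endpoint(endpoint: str) -> str:
    """Drop any port number and trailing path so the endpoint is https://host."""
    parts = urlsplit(endpoint)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"expected an https URL, got {endpoint!r}")
    return f"https://{parts.hostname}"


def cosmos_key_connection_string(endpoint: str, key: str, database: str) -> str:
    """Assemble a key-based connection string from a normalized endpoint."""
    return f"AccountEndpoint={normalize_endpoint(endpoint)};AccountKey={key};Database={database}"


# The :443 port and trailing slash are removed before the string is assembled.
print(cosmos_key_connection_string("https://contoso.documents.azure.com:443/", "<key>", "hotelsdb"))
```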
@@ -91,7 +153,7 @@ Avoid port numbers in the endpoint URL. If you include the port number, the conn
91153
92154
<a name="flatten-structures"></a>
93155
94-
### Using queries to shape indexed data
156+
#### Using queries to shape indexed data
95157
96158
In the "query" property under "container", you can specify a SQL query to flatten nested properties or arrays, project JSON properties, and filter the data to be indexed.
97159
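To preview how a flattening query reshapes items before you put it in the "query" property, you can emulate it locally. This Python sketch mimics `SELECT c.id, c.userId, tag, c._ts FROM c JOIN tag IN c.tags` (without the high-water-mark filter) on made-up items: each element of `tags` becomes its own row, and items with no tags produce no rows. It illustrates the semantics only; Cosmos DB evaluates the real query.

```python
def flatten_tags(items):
    """Emulate SELECT c.id, c.userId, tag, c._ts FROM c JOIN tag IN c.tags."""
    rows = []
    for c in items:
        # A JOIN over an array yields one row per element; empty or missing arrays yield none.
        for tag in c.get("tags", []):
            rows.append({"id": c["id"], "userId": c["userId"], "tag": tag, "_ts": c["_ts"]})
    return rows


# Made-up sample items for illustration.
items = [
    {"id": "1", "userId": "u1", "tags": ["wifi", "pool"], "_ts": 1700000000},
    {"id": "2", "userId": "u2", "tags": [], "_ts": 1700000100},
]
for row in flatten_tags(items):
    print(row)
```

Both rows share `id` "1", so if `id` is the document key in your index, one row would overwrite the other. A flattened projection generally needs a key that stays unique per row.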
@@ -135,7 +197,7 @@ SELECT c.id, c.userId, tag, c._ts FROM c JOIN tag IN c.tags WHERE c._ts >= @High
135197

136198
<a name="SelectDistinctQuery"></a>
137199

138-
#### Unsupported queries (DISTINCT and GROUP BY)
200+
##### Unsupported queries (DISTINCT and GROUP BY)
139201

140202
Queries using the [DISTINCT keyword](/azure/cosmos-db/sql-query-keywords#distinct) or [GROUP BY clause](/azure/cosmos-db/sql-query-group-by) aren't supported. Azure AI Search relies on [SQL query pagination](/azure/cosmos-db/sql-query-pagination) to fully enumerate the results of the query. Neither the DISTINCT keyword nor GROUP BY clauses are compatible with the [continuation tokens](/azure/cosmos-db/sql-query-pagination#continuation-tokens) used to paginate results.
141203
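The dependency on continuation tokens is why DISTINCT and GROUP BY are excluded: the query results must be enumerable page by page until they're exhausted. The loop below is a generic Python sketch of that pattern; `fetch_page` is a hypothetical callable standing in for a Cosmos DB query request, not an Azure SDK function.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is (items, continuation_token); a token of None means there are no more pages.
Page = Tuple[List[dict], Optional[str]]


def enumerate_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every item by following continuation tokens until none is returned."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:  # last page reached
            return


# Fake three-page result set keyed by continuation token (hypothetical data).
PAGES = {
    None: ([{"id": "1"}, {"id": "2"}], "t1"),
    "t1": ([{"id": "3"}], "t2"),
    "t2": ([], None),
}
print([doc["id"] for doc in enumerate_all(lambda t: PAGES[t])])
```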

@@ -156,7 +218,7 @@ Although Azure Cosmos DB has a workaround to support [SQL query pagination with
156218
SELECT DISTINCT VALUE c.name FROM c ORDER BY c.name
157219
```
158220

159-
## Add search fields to an index
221+
### Add search fields to an index
160222

161223
In a [search index](search-what-is-an-index.md), add fields to accept the source JSON documents or the output of your custom query projection. Ensure that the search index schema is compatible with source data. For content in Azure Cosmos DB, your search index schema should correspond to the [Azure Cosmos DB items](/azure/cosmos-db/resource-model#azure-cosmos-db-items) in your data source.
162224

@@ -191,7 +253,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source
191253
192254
1. Create more fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details.
193255
194-
### Mapping data types
256+
#### Mapping data types
195257
196258
| JSON data types | Azure AI Search field types |
197259
| --- | --- |
@@ -204,7 +266,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source
204266
| GeoJSON objects such as { "type": "Point", "coordinates": [long, lat] } |Edm.GeographyPoint |
205267
| Other JSON objects |N/A |
206268
207-
## Configure and run the Azure Cosmos DB for NoSQL indexer
269+
### Configure and run the Azure Cosmos DB for NoSQL indexer
208270
209271
Once the index and data source have been created, you're ready to create the indexer. Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors.
210272
@@ -240,8 +302,17 @@ An indexer runs automatically when it's created. You can prevent this by setting
240302
241303
## Check indexer status
242304
243-
To monitor the indexer status and execution history, send a [Get Indexer Status](/rest/api/searchservice/indexers/get-status) request:
305+
To monitor the indexer status and execution history, check the indexer execution history in the Azure portal, or send a [Get Indexer Status](/rest/api/searchservice/indexers/get-status) REST API request.
306+
307+
### [**Portal**](#tab/portal-check-indexer)
308+
309+
1. On the search service page, open **Search management** > **Indexers**.
244310
311+
1. Select an indexer to access configuration and execution history.
312+
313+
1. Select a specific indexer job to view details, warnings, and errors.
314+
315+
### [**REST**](#tab/rest-check-indexer)
245316
```http
246317
GET https://myservice.search.windows.net/indexers/myindexer/status?api-version=2024-07-01
247318
Content-Type: application/json
@@ -282,6 +353,8 @@ The response includes status and the number of items processed. It should look s
282353
}
283354
```
284355

356+
---
357+
285358
Execution history contains up to 50 of the most recently completed executions, sorted in reverse chronological order so that the latest execution comes first.
286359

287360
<a name="DataChangeDetectionPolicy"></a>
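Because execution history lists the latest run first, the most recent result is the first entry. This Python sketch sends the Get Indexer Status request shown earlier using only the standard library and summarizes the newest execution. The service name, indexer name, and key are placeholders, and the response properties it reads (`executionHistory`, `status`, `itemsProcessed`, `itemsFailed`) are assumed from the Get Indexer Status response format; the offline example uses a made-up, trimmed response.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"  # API version used in this commit's REST examples


def get_indexer_status(service, indexer, admin_key):
    """Send Get Indexer Status and return the parsed JSON response."""
    url = f"https://{service}.search.windows.net/indexers/{indexer}/status?api-version={API_VERSION}"
    req = urllib.request.Request(url, headers={"api-key": admin_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def latest_run(status):
    """Summarize the most recent execution; history is sorted newest first."""
    history = status.get("executionHistory", [])
    if not history:
        return None
    run = history[0]
    return {k: run.get(k) for k in ("status", "itemsProcessed", "itemsFailed")}


# Offline example with a made-up, trimmed response body.
sample = {"executionHistory": [
    {"status": "success", "itemsProcessed": 50, "itemsFailed": 0},
    {"status": "transientFailure", "itemsProcessed": 0, "itemsFailed": 0},
]}
print(latest_run(sample))
# Live call: latest_run(get_indexer_status("myservice", "myindexer", "<admin-key>"))
```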
@@ -340,7 +413,7 @@ When rows are deleted from the collection, you normally want to delete those row
340413

341414
If you're using a custom query, make sure that the property referenced by `softDeleteColumnName` is projected by the query.
342415

343-
The `softDeleteColumnName` must be a top-level field in the index. Using nested fields within complex data types as the `softDeleteColumnName` is not supported.
416+
The `softDeleteColumnName` must be a top-level field in the index. Using nested fields within complex data types as the `softDeleteColumnName` isn't supported.
344417

345418
The following example creates a data source with a soft-deletion policy:
346419
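Because `softDeleteColumnName` must refer to a top-level field in the index, you might check your index definition before creating a data source that uses a soft-deletion policy. This Python sketch assumes an index definition shaped like the REST API body, where complex fields list their subfields under `fields` and field paths use `/`; the index and field names are hypothetical.

```python
def find_field_path(fields, name, prefix=""):
    """Depth-first search for a field called `name`; returns a path such as "Address/City"."""
    for field in fields:
        path = prefix + field["name"]
        if field["name"] == name:
            return path
        nested = find_field_path(field.get("fields", []), name, path + "/")
        if nested:
            return nested
    return None


def check_soft_delete_column(index_definition, column):
    """Raise ValueError unless `column` is a top-level field of the index."""
    fields = index_definition.get("fields", [])
    if any(field["name"] == column for field in fields):
        return True
    path = find_field_path(fields, column)
    if path:
        raise ValueError(f"{column!r} is nested at {path!r}; softDeleteColumnName must be top level")
    raise ValueError(f"{column!r} isn't a field in the index")


# Hypothetical index definition with one complex field.
index = {
    "name": "hotels-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "IsDeleted", "type": "Edm.Boolean", "retrievable": True, "filterable": True},
        {"name": "Address", "type": "Edm.ComplexType",
         "fields": [{"name": "City", "type": "Edm.String"}]},
    ],
}
print(check_soft_delete_column(index, "IsDeleted"))  # True
```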
