articles/search/search-get-started-portal-import-vectors.md (+5 −5 lines)
@@ -42,7 +42,7 @@ The **Import and vectorize data** wizard supports the following data sources:
+[Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart), [Azure SQL Managed Instance](/azure/azure-sql/managed-instance/instance-create-quickstart), and Azure SQL Server virtual machines.
> [!NOTE]
> This quickstart provides steps for just those data sources that work with whole files: Azure Blob Storage, ADLS Gen2, and OneLake. For more information about using this wizard with other data sources, see [Azure Table indexer](search-howto-indexing-azure-tables.md), [Cosmos DB for NoSQL indexer](search-howto-index-cosmosdb.md), and [Azure SQL indexer](search-how-to-index-sql-database.md).
### Supported embedding models
@@ -230,7 +230,7 @@ The next step is to connect to a data source to use for the search index.
1. On **Connect to your data**, select **Azure Blob Storage**.
1. Specify the Azure subscription.
@@ -256,7 +256,7 @@ The next step is to connect to a data source to use for the search index.
### [ADLS Gen2](#tab/connect-data-adlsgen2)
1. On **Connect to your data**, select **Azure Data Lake**.
1. Specify the Azure subscription.
@@ -284,7 +284,7 @@ The next step is to connect to a data source to use for the search index.
Support for OneLake indexing is in preview. For more information about supported shortcuts and limitations, see [OneLake indexing](search-how-to-index-onelake-files.md).
1. On **Connect to your data**, select **OneLake**.
1. Specify the type of connection:
@@ -303,7 +303,7 @@ Support for OneLake indexing is in preview. For more information about supported
In this step, specify the embedding model for vectorizing chunked data.
Chunking is built-in and nonconfigurable. The effective settings are:
articles/search/search-how-to-index-sql-database.md (+14 −5 lines)
@@ -17,7 +17,7 @@ ms.date: 11/20/2024
In this article, learn how to configure an [**indexer**](search-indexer-overview.md) that imports content from Azure SQL Database or an Azure SQL managed instance and makes it searchable in Azure AI Search.
This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to Azure SQL. It uses the Azure portal and REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.
This article also provides:
@@ -42,7 +42,9 @@ To work through the examples in this article, you need the Azure portal or a [RE
## Try with sample data
Use these instructions to create a table in Azure SQL that you can use with an indexer on Azure AI Search. The portal approach, using either import wizard, is the quickest way to create and load an index from a table in a SQL database.

1. [Download hotels-azure-sql.sql](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/hotels/hotel-sql) from GitHub to create a table on Azure SQL Database that contains a subset of the sample hotels data set.
1. Sign in to the Azure portal and [create an Azure SQL database and database server](/azure/azure-sql/database/single-database-create-quickstart). Consider configuring both SQL Server authentication and Microsoft Entra ID authentication. If you don't have permissions to configure roles on Azure, you can use SQL authentication as a workaround.
@@ -108,19 +110,25 @@ You can use either the **Import data** wizard or **Import and vectorize data** w
1. Specify the server name, database name, and table or view name.
   The portal validates the connection. If the database is paused due to inactivity, navigate to the database server page and make sure the database status is *online*. You can run a query on any table to activate the database.

   :::image type="content" source="media/search-how-to-index-sql-database/database-online.png" alt-text="Screenshot of the database status page in the Azure portal.":::
1. Specify an authentication method, either a SQL Server login defined during server setup, or a managed identity.
   If you [configure Azure AI Search to use a managed identity](search-howto-managed-identities-data-sources.md), and you create a role assignment on the database server that grants **SQL Server Contributor** or **SQL Server DB Contributor** permissions to the identity, your indexer can connect to Azure SQL using Microsoft Entra ID and roles.
1. For the **Import and vectorize data** wizard, you can specify options for change and deletion tracking.
   Deletion tracking is based on [soft delete using custom metadata](#soft-delete-column-deletion-detection-policy).
   Change tracking is based on [SQL Server integrated change tracking](#sql-integrated-change-tracking-policy) or [high water mark change tracking](#high-water-mark-change-detection-policy).
1. Continue with the remaining steps to complete the wizard:

   + [Quickstart: Import data wizard](search-get-started-portal.md)

   + [Quickstart: Import and vectorize data wizard](search-get-started-portal-import-vectors.md)
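In REST workflows, the change and deletion tracking options set in the wizard correspond to policies on the data source definition. The following is a hedged sketch only: the data source name, table name, and the `LastUpdated` and `IsDeleted` column names are illustrative placeholders, not values from this article, and the connection string is elided.

```json
{
  "name": "hotels-sql-ds",
  "type": "azuresql",
  "credentials": { "connectionString": "<connection string>" },
  "container": { "name": "hotels" },
  "dataChangeDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
    "highWaterMarkColumnName": "LastUpdated"
  },
  "dataDeletionDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "IsDeleted",
    "softDeleteMarkerValue": "true"
  }
}
```

With SQL Server integrated change tracking, you would instead use the `SqlIntegratedChangeTrackingPolicy`, which detects deletes natively and doesn't require a soft-delete column.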
articles/search/search-howto-index-cosmosdb.md (+86 −13 lines)
@@ -10,14 +10,14 @@ ms.custom:
- devx-track-dotnet
- ignite-2023
ms.topic: how-to
ms.date: 11/20/2024
---
# Index data from Azure Cosmos DB for NoSQL for queries in Azure AI Search
In this article, learn how to configure an [**indexer**](search-indexer-overview.md) that imports content from [Azure Cosmos DB for NoSQL](/azure/cosmos-db/nosql/) and makes it searchable in Azure AI Search.
This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to Cosmos DB. It uses the Azure portal and REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.
Because terminology can be confusing, it's worth noting that [Azure Cosmos DB indexing](/azure/cosmos-db/index-overview) and [Azure AI Search indexing](search-what-is-an-index.md) are different operations. Indexing in Azure AI Search creates and loads a search index on your search service.
@@ -27,11 +27,73 @@ Because terminology can be confusing, it's worth noting that [Azure Cosmos DB in
+ An [automatic indexing policy](/azure/cosmos-db/index-policy) on the Azure Cosmos DB collection, set to [Consistent](/azure/cosmos-db/index-policy#indexing-mode). This is the default configuration. Lazy indexing isn't recommended and can result in missing data.
+ Read permissions. A "full access" connection string includes a key that grants access to the content, but if you're using identities (Microsoft Entra ID), make sure the [search service managed identity](search-howto-managed-identities-data-sources.md) is assigned both **Cosmos DB Account Reader Role** and [**Cosmos DB Built-in Data Reader Role**](/azure/cosmos-db/how-to-setup-rbac#built-in-role-definitions).
To work through the examples in this article, you need the Azure portal or a [REST client](search-get-started-rest.md). If you're using the Azure portal, make sure that access to all public networks is enabled in Cosmos DB and that the client has access via an inbound rule. For a REST client that runs locally, configure the network firewall to allow inbound access from your device IP address. Other approaches for creating a Cosmos DB indexer include Azure SDKs.
## Try with sample data
Use these instructions to create a container and database in Cosmos DB that you can use with an indexer on Azure AI Search. The portal approach, using either import wizard, is the quickest way to create and load an index from a container in Cosmos DB.
1. [Download HotelsData_toCosmosDB.JSON](https://github.com/HeidiSteen/azure-search-sample-data/blob/main/hotels/HotelsData_toCosmosDB.JSON) from GitHub to create a container in Cosmos DB that contains a subset of the sample hotels data set.
1. Sign in to the Azure portal and [create an account, database, and container](/azure/cosmos-db/nosql/quickstart-portal) on Cosmos DB.
1. In Cosmos DB, select **Data Explorer** for the new container, and provide the following values.
   | Property | Value |
   |----------|-------|
   | Database | Create new |
   | Database ID | hotelsdb |
   | Share throughput across containers | Don't select |
   | Container ID | hotels |
   | Partition key | /HotelId |
   | Container throughput (autoscale) | Autoscale |
   | Container Max RU/s | 1000 |
1. In **Data Explorer**, expand *hotelsdb* and *hotels*, and then select **Items**.
1. Select **Upload Item**, and then select the *HotelsData_toCosmosDB.JSON* file that you downloaded from GitHub.
1. Right-click **Items** and select **New SQL query**. The default query is `SELECT * FROM c`.
1. Select **Execute query** to run the query and view results. You should have 50 hotel documents.
You can now use this content for indexing in the Azure portal, REST client, or an Azure SDK.
## Use the Azure portal
You can use either the **Import data** wizard or **Import and vectorize data** wizard to automate indexing from a database and container in Cosmos DB. The data source configuration is similar for both wizards.
1. [Start the wizard](search-import-data-portal.md#starting-the-wizards).
1. On **Connect to your data**, select or verify that the data source type is either *Azure Cosmos DB* or a *NoSQL account*.
The data source name refers to the data source connection object in Azure AI Search. If you use the vector wizard, your data source name is autogenerated using a custom prefix specified at the end of the wizard workflow.
1. Specify the database name and collection. The query is optional. It's useful if you have hierarchical data and you want to import a specific slice.
1. Specify an authentication method, either a managed identity or built-in API key. If you don't specify a managed identity connection, the portal uses the key.
If you [configure Azure AI Search to use a managed identity](search-howto-managed-identities-data-sources.md), and you create a role assignment on Cosmos DB that grants **Cosmos DB Account Reader Role** and [**Cosmos DB Built-in Data Reader Role**](/azure/cosmos-db/how-to-setup-rbac#built-in-role-definitions) permissions to the identity, your indexer can connect to Cosmos DB using Microsoft Entra ID and roles.
1. For the **Import and vectorize data** wizard, you can specify options for change and deletion tracking.
[Change detection](#incremental-indexing-and-custom-queries) is supported by default through a `_ts` field (timestamp). If you upload content using the approach described in [Try with sample data](#try-with-sample-data), the collection is created with a `_ts` field.
[Deletion detection](#indexing-deleted-documents) requires that you have a pre-existing top-level field in the index that can be used as a soft-delete flag. It should be a Boolean field (you could name it IsDeleted). In the search index, add a corresponding search field called *IsDeleted* set to retrievable and filterable. Specify `true` as the soft-delete value.
1. Continue with the remaining steps to complete the wizard:
   + [Quickstart: Import data wizard](search-get-started-portal.md)

   + [Quickstart: Import and vectorize data wizard](search-get-started-portal-import-vectors.md)
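In REST workflows, the change and deletion detection behavior described above is expressed as policies on the data source definition. A sketch of the two policy fragments, assuming the collection has the `_ts` field and a Boolean soft-delete field (the *IsDeleted* name is illustrative, not prescribed by this article):

```json
{
  "dataChangeDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
    "highWaterMarkColumnName": "_ts"
  },
  "dataDeletionDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "IsDeleted",
    "softDeleteMarkerValue": "true"
  }
}
```

Both fragments sit at the top level of the data source definition shown in the REST section of this article.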
## Use the REST APIs
This section demonstrates the REST API calls that create a data source, index, and indexer.
### Define the data source
The data source definition specifies the data to index, credentials, and policies for identifying changes in the data. A data source is an independent resource that can be used by multiple indexers.
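As a sketch, a minimal Cosmos DB for NoSQL data source might look like the following. The data source name is a placeholder, and the account name and key in the connection string are elided:

```json
{
  "name": "cosmosdb-hotels-ds",
  "type": "cosmosdb",
  "credentials": {
    "connectionString": "AccountEndpoint=https://<account>.documents.azure.com;AccountKey=<key>;Database=hotelsdb"
  },
  "container": { "name": "hotels", "query": null }
}
```

Submit it with `POST https://[service name].search.windows.net/datasources?api-version=2024-07-01` and an admin API key, as described in the REST quickstart.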
@@ -73,7 +135,7 @@ The data source definition specifies the data to index, credentials, and policie
<a name="credentials"></a>
#### Supported credentials and connection strings
Indexers can connect to a collection using the following connections.
@@ -91,7 +153,7 @@ Avoid port numbers in the endpoint URL. If you include the port number, the conn
<a name="flatten-structures"></a>
#### Using queries to shape indexed data
In the "query" property under "container", you can specify a SQL query to flatten nested properties or arrays, project JSON properties, and filter the data to be indexed.
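For example, a container definition with a shaping query might look like the following sketch. The `HotelName` and `Address.City` properties are illustrative placeholders for properties in your own items, and projecting `_ts` keeps the query compatible with high water mark change detection:

```json
{
  "container": {
    "name": "hotels",
    "query": "SELECT c.id, c.HotelName, c.Address.City AS City, c._ts FROM c WHERE c._ts >= @HighWaterMark"
  }
}
```

The `AS City` projection flattens the nested `Address.City` value into a top-level field that can map directly onto a search field.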
@@ -135,7 +197,7 @@ SELECT c.id, c.userId, tag, c._ts FROM c JOIN tag IN c.tags WHERE c._ts >= @High
<a name="SelectDistinctQuery"></a>
##### Unsupported queries (DISTINCT and GROUP BY)
Queries using the [DISTINCT keyword](/azure/cosmos-db/sql-query-keywords#distinct) or [GROUP BY clause](/azure/cosmos-db/sql-query-group-by) aren't supported. Azure AI Search relies on [SQL query pagination](/azure/cosmos-db/sql-query-pagination) to fully enumerate the results of the query. Neither the DISTINCT keyword nor the GROUP BY clause is compatible with the [continuation tokens](/azure/cosmos-db/sql-query-pagination#continuation-tokens) used to paginate results.
@@ -156,7 +218,7 @@ Although Azure Cosmos DB has a workaround to support [SQL query pagination with
```
SELECT DISTINCT VALUE c.name FROM c ORDER BY c.name
```
### Add search fields to an index
In a [search index](search-what-is-an-index.md), add fields to accept the source JSON documents or the output of your custom query projection. Ensure that the search index schema is compatible with source data. For content in Azure Cosmos DB, your search index schema should correspond to the [Azure Cosmos DB items](/azure/cosmos-db/resource-model#azure-cosmos-db-items) in your data source.
@@ -191,7 +253,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source
1. Create more fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details.
#### Mapping data types
| JSON data types | Azure AI Search field types |
| --- | --- |
| GeoJSON objects such as { "type": "Point", "coordinates": [long, lat] } | Edm.GeographyPoint |
| Other JSON objects | N/A |
### Configure and run the Azure Cosmos DB for NoSQL indexer
Once the index and data source have been created, you're ready to create the indexer. Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors.
@@ -240,8 +302,17 @@ An indexer runs automatically when it's created. You can prevent this by setting
## Check indexer status
To monitor the indexer status and execution history, check the indexer execution history in the Azure portal, or send a [Get Indexer Status](/rest/api/searchservice/indexers/get-status) REST API request.
### [**Portal**](#tab/portal-check-indexer)
1. On the search service page, open **Search management** > **Indexers**.
1. Select an indexer to access configuration and execution history.
1. Select a specific indexer job to view details, warnings, and errors.
### [**REST**](#tab/rest-check-indexer)
```http
GET https://myservice.search.windows.net/indexers/myindexer/status?api-version=2024-07-01
Content-Type: application/json
```
@@ -282,6 +353,8 @@ The response includes status and the number of items processed. It should look s
---
Execution history contains up to 50 of the most recently completed executions, which are sorted in reverse chronological order so that the latest execution comes first.
<a name="DataChangeDetectionPolicy"></a>
@@ -340,7 +413,7 @@ When rows are deleted from the collection, you normally want to delete those row
If you're using a custom query, make sure that the property referenced by `softDeleteColumnName` is projected by the query.
The `softDeleteColumnName` must be a top-level field in the index. Using nested fields within complex data types as the `softDeleteColumnName` isn't supported.
The following example creates a data source with a soft-deletion policy: