Cosmos DB updates for portal BYOE

HeidiSteen · HeidiSteen · commit 51bdaeca8494 · 2024-11-17T13:07:06.000-08:00
diff --git a/articles/search/media/search-how-to-index-sql-database/database-online.png b/articles/search/media/search-how-to-index-sql-database/database-online.png
diff --git a/articles/search/search-get-started-portal-import-vectors.md b/articles/search/search-get-started-portal-import-vectors.md
@@ -42,7 +42,7 @@ The **Import and vectorize data** wizard supports the following data sources:
 + [Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart), [Azure SQL Managed Instance](/azure/azure-sql/managed-instance/instance-create-quickstart), and Azure SQL Server virtual machines.
 
 > [!NOTE]
-> This quicktart provides steps for just those data sources that work with whole files: Azure Blob storage, ADLS Gen2, OneLake. For more information about using this wizard with other data soruces, see [Azure Table indexer](search-howto-indexing-azure-tables.md), [Cosmos DB for NoSQL indexer](search-howto-index-cosmosdb.md), and [Azuer SQL indexer](search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md).
+> This quickstart provides steps for just those data sources that work with whole files: Azure Blob storage, ADLS Gen2, OneLake. For more information about using this wizard with other data sources, see [Azure Table indexer](search-howto-indexing-azure-tables.md), [Cosmos DB for NoSQL indexer](search-howto-index-cosmosdb.md), and [Azure SQL indexer](search-how-to-index-sql-database.md).
 
 ### Supported embedding models
 
@@ -230,7 +230,7 @@ The next step is to connect to a data source to use for the search index.
 
 ### [Azure Blob Storage](#tab/connect-data-storage)
 
-1. On the **Set up your data connection** page, select **Azure Blob Storage**.
+1. On **Connect to your data**, select **Azure Blob Storage**.
 
 1. Specify the Azure subscription.
 
@@ -256,7 +256,7 @@ The next step is to connect to a data source to use for the search index.
 
 ### [ADLS Gen2](#tab/connect-data-adlsgen2)
 
-1. On the **Set up your data connection** page, select **Azure Data Lake**.
+1. On **Connect to your data**, select **Azure Data Lake**.
 
 1. Specify the Azure subscription.
 
@@ -284,7 +284,7 @@ The next step is to connect to a data source to use for the search index.
 
 Support for OneLake indexing is in preview. For more information about supported shortcuts and limitations, see ([OneLake indexing](search-how-to-index-onelake-files.md)).
 
-1. On the **Set up your data connection** page, select **OneLake**.
+1. On **Connect to your data**, select **OneLake**.
 
 1. Specify the type of connection:
 
@@ -303,7 +303,7 @@ Support for OneLake indexing is in preview. For more information about supported
 
 In this step, specify the embedding model for vectorizing chunked data.
 
-Chunking is built-in and nonconfigurable. The effective settings are:
+Chunking is built in and nonconfigurable. The effective settings are:
 
 ```json
 "textSplitMode": "pages",
diff --git a/articles/search/search-how-to-index-sql-database.md b/articles/search/search-how-to-index-sql-database.md
@@ -17,7 +17,7 @@ ms.date: 11/20/2024
 
 In this article, learn how to configure an [**indexer**](search-indexer-overview.md) that imports content from Azure SQL Database or an Azure SQL managed instance and makes it searchable in Azure AI Search. 
 
-This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to Azure SQL. It uses the Azure portal and REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. 
+This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to Azure SQL. It uses the Azure portal and REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.
 
 This article also provides:
 
@@ -42,7 +42,9 @@ To work through the examples in this article, you need the Azure portal or a [RE
 
 ## Try with sample data
 
-[Download hotels-azure-sql.sql](hotels/hotel-sql/hotels-azure-sql.sql) from GitHub to create a table on Azure SQL Database that contains a subset of the sample hotels data set.
+Use these instructions to create a table in Azure SQL that you can use with an indexer on Azure AI Search. The portal approach, using either import data wizard, is the quickest way to create and load an index from a table in a SQL database.
+
+1. [Download hotels-azure-sql.sql](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/hotels/hotel-sql) from GitHub to create a table on Azure SQL Database that contains a subset of the sample hotels data set.
 
 1. Sign in to the Azure portal and [create an Azure SQL database and database server](/azure/azure-sql/database/single-database-create-quickstart). Consider configuring both SQL Server authentication and Microsoft Entra ID authentication. If you don't have permissions to configure roles on Azure, you can use SQL authentication as a workaround.
 
@@ -108,19 +110,25 @@ You can use either the **Import data** wizard or **Import and vectorize data** w
 
 1. Specify the server name, database name, and table or view name.
 
-   The portal validates the connection. If the database is unavailable due to inactivity, navigate to the database server page and make sure database status is *online*.
+   The portal validates the connection. If the database is paused due to inactivity, navigate to the database server page and make sure database status is *online*. You can run a query on any table to activate the database.
+
+   :::image type="content" source="media/search-how-to-index-sql-database/database-online.png" alt-text="Screenshot of the database status page in the Azure portal.":::
 
 1. Specify an authentication method, either a SQL Server login defined during server setup, or a managed identity.
 
-   If you [configure Azure AI Search to use a managed identity](search-howto-managed-identities-data-sources.md), and you create a role assignment on the database server that grants **SQL Server Contributor** or **SQL Server DB Contributor** permissions to the identity, you can connect using Microsoft Entra ID and roles.
+   If you [configure Azure AI Search to use a managed identity](search-howto-managed-identities-data-sources.md), and you create a role assignment on the database server that grants **SQL Server Contributor** or **SQL Server DB Contributor** permissions to the identity, your indexer can connect to Azure SQL using Microsoft Entra ID and roles.
 
 1. For the **Import and vectorize data** wizard, you can specify options for change and deletion tracking.
 
    + Deletion tracking is based on [soft delete using custom metadata](#soft-delete-column-deletion-detection-policy).
 
    + Change tracking is based on [SQL Server integrated change tracking](#sql-integrated-change-tracking-policy) or [high water mark change tracking](#high-water-mark-change-detection-policy).
 
-1. Continue with the remaining steps to complete the wizard. For more information, see [Quickstart: Import data wizard](search-get-started-portal.md) or [Quickstart: Import and vectorize data wizard](search-get-started-portal-import-vectors.md).
+1. Continue with the remaining steps to complete the wizard:
+
+   + [Quickstart: Import data wizard](search-get-started-portal.md)
+
+   + [Quickstart: Import and vectorize data wizard](search-get-started-portal-import-vectors.md)
 
 ## Use the REST APIs
 
@@ -362,6 +370,7 @@ api-key: admin-key
         "container" : { "name" : "table name" },
         "dataChangeDetectionPolicy" : {
             "@odata.type" : "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
+        }
     }
 ```
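
The data source body in this hunk can also be built programmatically, which avoids the kind of unbalanced brace that this change corrects. The following is a minimal Python sketch, not part of the article. The `name`, `type`, `credentials`, `container`, and `dataChangeDetectionPolicy` properties follow the Azure AI Search data source definition; the function name, data source name, and connection string placeholder are assumptions made for illustration.

```python
import json


def build_sql_data_source(name: str, connection_string: str, table: str) -> dict:
    """Return a data source body that uses SQL integrated change tracking."""
    return {
        "name": name,
        "type": "azuresql",
        "credentials": {"connectionString": connection_string},
        "container": {"name": table},
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
        },
    }


# Placeholder values (assumptions); substitute your own names and secrets.
body = build_sql_data_source("my-sql-ds", "<connection string>", "hotels")

# Serializing from a dict guarantees balanced braces in the request body.
payload = json.dumps(body, indent=4)
print(payload)
```

Building the body as a dictionary and serializing it is one way to keep the JSON well formed; hand-edited payloads work equally well if you validate them first.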
 
diff --git a/articles/search/search-howto-index-cosmosdb.md b/articles/search/search-howto-index-cosmosdb.md
@@ -10,14 +10,14 @@ ms.custom:
   - devx-track-dotnet
   - ignite-2023
 ms.topic: how-to
-ms.date: 06/18/2024
+ms.date: 11/20/2024
 ---
 
 # Index data from Azure Cosmos DB for NoSQL for queries in Azure AI Search
 
 In this article, learn how to configure an [**indexer**](search-indexer-overview.md) that imports content from [Azure Cosmos DB for NoSQL](/azure/cosmos-db/nosql/) and makes it searchable in Azure AI Search.
 
-This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to Cosmos DB. It uses the REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.
+This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to Cosmos DB. It uses the Azure portal and REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.
 
 Because terminology can be confusing, it's worth noting that [Azure Cosmos DB indexing](/azure/cosmos-db/index-overview) and [Azure AI Search indexing](search-what-is-an-index.md) are different operations. Indexing in Azure AI Search creates and loads a search index on your search service.
 
@@ -27,11 +27,73 @@ Because terminology can be confusing, it's worth noting that [Azure Cosmos DB in
 
 + An [automatic indexing policy](/azure/cosmos-db/index-policy) on the Azure Cosmos DB collection, set to [Consistent](/azure/cosmos-db/index-policy#indexing-mode). This is the default configuration. Lazy indexing isn't recommended and can result in missing data.
 
-+ Read permissions. A "full access" connection string includes a key that grants access to the content, but if you're using Azure RBAC (Microsoft Entra ID), make sure the [search service managed identity](search-howto-managed-identities-data-sources.md) is assigned both **Cosmos DB Account Reader Role** and [**Cosmos DB Built-in Data Reader Role**](/azure/cosmos-db/how-to-setup-rbac#built-in-role-definitions).
++ Read permissions. A "full access" connection string includes a key that grants access to the content, but if you're using identities (Microsoft Entra ID), make sure the [search service managed identity](search-howto-managed-identities-data-sources.md) is assigned both **Cosmos DB Account Reader Role** and [**Cosmos DB Built-in Data Reader Role**](/azure/cosmos-db/how-to-setup-rbac#built-in-role-definitions).
 
-+ A [REST client](search-get-started-rest.md) to create the data source, index, and indexer. 
+To work through the examples in this article, you need the Azure portal or a [REST client](search-get-started-rest.md). If you're using the Azure portal, make sure that access to all public networks is enabled in Cosmos DB and that the client has access via an inbound rule. For a REST client that runs locally, configure the network firewall to allow inbound access from your device IP address. Other approaches for creating a Cosmos DB indexer include Azure SDKs.
 
-## Define the data source
+## Try with sample data
+
+Use these instructions to create a container and database in Cosmos DB that you can use with an indexer on Azure AI Search. The portal approach, using either import data wizard, is the quickest way to create and load an index from a container in Cosmos DB.
+
+1. [Download HotelsData_toCosmosDB.JSON](https://github.com/HeidiSteen/azure-search-sample-data/blob/main/hotels/HotelsData_toCosmosDB.JSON) from GitHub to create a container in Cosmos DB that contains a subset of the sample hotels data set.
+
+1. Sign in to the Azure portal and [create an account, database, and container](/azure/cosmos-db/nosql/quickstart-portal) on Cosmos DB. 
+
+1. In Cosmos DB, select **Data Explorer**. For the new container, provide the following values.
+
+    | Property | Value |
+    |----------|-------|
+    | Database | Create new |
+    | Database ID | hotelsdb |
+    | Share throughput across containers | Don't select |
+    | Container ID | hotels |
+    | Partition key | /HotelId |
+    | Container throughput (autoscale) | Autoscale |
+    | Container Max RU/s | 1000 |
+
+1. In **Data Explorer**, expand *hotelsdb* and *hotels*, and then select **Items**.
+
+1. Select **Upload Item** and then select the *HotelsData_toCosmosDB.JSON* file that you downloaded from GitHub.
+
+1. Right-click **Items** and select **New SQL query**. The default query is `SELECT * FROM c`.
+
+1. Select **Execute query** to run the query and view results. You should have 50 hotel documents.
+
+You can now use this content for indexing in the Azure portal, REST client, or an Azure SDK.
+
+## Use the Azure portal
+
+You can use either the **Import data** wizard or the **Import and vectorize data** wizard to automate indexing from a Cosmos DB container. The data source configuration is similar for both wizards.
+
+1. [Start the wizard](search-import-data-portal.md#starting-the-wizards).
+
+1. On **Connect to your data**, select or verify that the data source type is either *Azure Cosmos DB* or a *NoSQL account*.
+
+   The data source name refers to the data source connection object in Azure AI Search. If you use the vector wizard, your data source name is autogenerated using a custom prefix specified at the end of the wizard workflow.
+
+1. Specify the database name and collection. The query is optional. It's useful if you have hierarchical data and you want to import a specific slice.
+
+1. Specify an authentication method, either a managed identity or built-in API key. If you don't specify a managed identity connection, the portal uses the key.
+
+   If you [configure Azure AI Search to use a managed identity](search-howto-managed-identities-data-sources.md), and you create a role assignment on Cosmos DB that grants **Cosmos DB Account Reader Role** and [**Cosmos DB Built-in Data Reader Role**](/azure/cosmos-db/how-to-setup-rbac#built-in-role-definitions) permissions to the identity, your indexer can connect to Cosmos DB using Microsoft Entra ID and roles.
+
+1. For the **Import and vectorize data** wizard, you can specify options for change and deletion tracking.
+
+   [Change detection](#incremental-indexing-and-custom-queries) is supported by default through a `_ts` field (timestamp). If you upload content using the approach described in [Try with sample data](#try-with-sample-data), the collection is created with a `_ts` field.
+
+   [Deletion detection](#indexing-deleted-documents) requires that you have a pre-existing top-level field in the index that can be used as a soft-delete flag. It should be a Boolean field (you could name it IsDeleted). In the search index, add a corresponding search field called *IsDeleted* set to retrievable and filterable. Specify `true` as the soft-delete value.
+
+1. Continue with the remaining steps to complete the wizard:
+
+   + [Quickstart: Import data wizard](search-get-started-portal.md)
+
+   + [Quickstart: Import and vectorize data wizard](search-get-started-portal-import-vectors.md)
+
+## Use the REST APIs
+
+This section demonstrates the REST API calls that create a data source, index, and indexer.
+
+### Define the data source
 
 The data source definition specifies the data to index, credentials, and policies for identifying changes in the data. A data source is an independent resource that can be used by multiple indexers.
 
@@ -73,7 +135,7 @@ The data source definition specifies the data to index, credentials, and policie
 
 <a name="credentials"></a>
 
-### Supported credentials and connection strings
+#### Supported credentials and connection strings
 
 Indexers can connect to a collection using the following connections.
 
@@ -91,7 +153,7 @@ Avoid port numbers in the endpoint URL. If you include the port number, the conn
 
 <a name="flatten-structures"></a>
 
-### Using queries to shape indexed data
+#### Using queries to shape indexed data
 
 In the "query" property under "container", you can specify a SQL query to flatten nested properties or arrays, project JSON properties, and filter the data to be indexed. 
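
As a rough illustration of where the query sits in the request body, the Python sketch below builds a Cosmos DB data source with an optional `query` under `container`. It is an assumption-laden example rather than the article's own: the function name, data source name, and projected properties (`HotelName`, `Address.City`) are hypothetical, and the connection string uses placeholders. The `@HighWaterMark` parameter and `_ts` ordering are included so that the query stays compatible with incremental indexing.

```python
import json
from typing import Optional


def build_cosmos_data_source(
    name: str, connection_string: str, collection: str, query: Optional[str] = None
) -> dict:
    """Return a Cosmos DB data source body, adding a query only when one is given."""
    container = {"name": collection}
    if query:
        container["query"] = query
    return {
        "name": name,
        "type": "cosmosdb",
        "credentials": {"connectionString": connection_string},
        "container": container,
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
    }


# Hypothetical projection that flattens a nested property into the result.
flatten_query = (
    "SELECT c.id, c.HotelName, c.Address.City, c._ts "
    "FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts"
)

data_source = build_cosmos_data_source(
    "my-cosmos-ds",
    "AccountEndpoint=https://<account>.documents.azure.com;AccountKey=<key>;Database=hotelsdb",
    "hotels",
    flatten_query,
)
print(json.dumps(data_source, indent=2))
```

Omitting the `query` argument leaves `container` with only a name, in which case the indexer reads whole documents from the collection.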
 
@@ -135,7 +197,7 @@ SELECT c.id, c.userId, tag, c._ts FROM c JOIN tag IN c.tags WHERE c._ts >= @High
 
 <a name="SelectDistinctQuery"></a>
 
-#### Unsupported queries (DISTINCT and GROUP BY)
+##### Unsupported queries (DISTINCT and GROUP BY)
 
 Queries using the [DISTINCT keyword](/azure/cosmos-db/sql-query-keywords#distinct) or [GROUP BY clause](/azure/cosmos-db/sql-query-group-by) aren't supported. Azure AI Search relies on [SQL query pagination](/azure/cosmos-db/sql-query-pagination) to fully enumerate the results of the query. Neither the DISTINCT keyword nor the GROUP BY clause is compatible with the [continuation tokens](/azure/cosmos-db/sql-query-pagination#continuation-tokens) used to paginate results.
 
@@ -156,7 +218,7 @@ Although Azure Cosmos DB has a workaround to support [SQL query pagination with
 SELECT DISTINCT VALUE c.name FROM c ORDER BY c.name
 ```
 
-## Add search fields to an index
+### Add search fields to an index
 
 In a [search index](search-what-is-an-index.md), add fields to accept the source JSON documents or the output of your custom query projection. Ensure that the search index schema is compatible with source data. For content in Azure Cosmos DB, your search index schema should correspond to the [Azure Cosmos DB items](/azure/cosmos-db/resource-model#azure-cosmos-db-items) in your data source.
 
@@ -191,7 +253,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source
 
 1. Create more fields for more searchable content. See [Create an index](search-how-to-create-search-index.md) for details.
 
-### Mapping data types
+#### Mapping data types
 
 | JSON data types | Azure AI Search field types |
 | --- | --- |
@@ -204,7 +266,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source
 | GeoJSON objects such as { "type": "Point", "coordinates": [long, lat] } |Edm.GeographyPoint |
 | Other JSON objects |N/A |
 
-## Configure and run the Azure Cosmos DB for NoSQL indexer
+### Configure and run the Azure Cosmos DB for NoSQL indexer
 
 Once the index and data source have been created, you're ready to create the indexer. Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors.
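
To make the shape of an indexer definition concrete, here is a minimal Python sketch of a request body. The `name`, `dataSourceName`, `targetIndexName`, `disabled`, and `schedule` properties belong to the Azure AI Search indexer definition; the function name and the resource names passed to it are placeholders chosen for this example.

```python
import json
from typing import Optional


def build_indexer(
    name: str,
    data_source_name: str,
    index_name: str,
    run_on_create: bool = True,
    interval: Optional[str] = None,
) -> dict:
    """Return an indexer body that ties a data source to a target index."""
    indexer = {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        # A disabled indexer is created but doesn't run until it's enabled.
        "disabled": not run_on_create,
    }
    if interval:
        # ISO 8601 duration, for example "PT2H" for every two hours.
        indexer["schedule"] = {"interval": interval}
    return indexer


# Placeholder names (assumptions) for the three resources.
indexer = build_indexer("my-cosmos-indexer", "my-cosmos-ds", "hotels-index", interval="PT2H")
print(json.dumps(indexer, indent=2))
```

Passing `run_on_create=False` produces a body with `"disabled": true`, which is useful when you want to review the definition before the first run.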
 
@@ -240,8 +302,17 @@ An indexer runs automatically when it's created. You can prevent this by setting
 
 ## Check indexer status
 
-To monitor the indexer status and execution history, send a [Get Indexer Status](/rest/api/searchservice/indexers/get-status) request:
+To monitor the indexer status and execution history, check the indexer execution history in the Azure portal, or send a [Get Indexer Status](/rest/api/searchservice/indexers/get-status) REST API request.
+
+### [**Portal**](#tab/portal-check-indexer)
+
+1. On the search service page, open **Search management** > **Indexers**.
 
+1. Select an indexer to access configuration and execution history.
+
+1. Select a specific indexer job to view details, warnings, and errors.
+
+### [**REST**](#tab/rest-check-indexer)
 ```http
 GET https://myservice.search.windows.net/indexers/myindexer/status?api-version=2024-07-01
   Content-Type: application/json  
@@ -282,6 +353,8 @@ The response includes status and the number of items processed. It should look s
     }
 ```
 
+---
+
 Execution history contains up to 50 of the most recently completed executions, which are sorted in the reverse chronological order so that the latest execution comes first.
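
Because the history is ordered newest first, a client can read the most recent run from the first element of `executionHistory`. The Python sketch below shows one way to summarize a status response. The sample document is fabricated for illustration and trimmed to a few fields (`status`, `itemsProcessed`, `itemsFailed`, `errorMessage`); the function names are hypothetical.

```python
from typing import Optional

# Illustrative, hand-written response; real responses contain more fields.
sample_status = {
    "status": "running",
    "executionHistory": [
        {"status": "success", "itemsProcessed": 50, "itemsFailed": 0, "errorMessage": None},
        {"status": "transientFailure", "itemsProcessed": 0, "itemsFailed": 0,
         "errorMessage": "placeholder error text"},
    ],
}


def latest_execution(status_doc: dict) -> Optional[dict]:
    """Return the newest execution, relying on reverse chronological order."""
    history = status_doc.get("executionHistory") or []
    return history[0] if history else None


def summarize(status_doc: dict) -> str:
    """Build a one-line summary of the most recent indexer run."""
    latest = latest_execution(status_doc)
    if latest is None:
        return "no executions recorded"
    return (
        f"{latest['status']}: {latest['itemsProcessed']} processed, "
        f"{latest['itemsFailed']} failed"
    )


print(summarize(sample_status))
```

A status document with an empty or missing history yields the fallback message rather than raising an error.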
 
 <a name="DataChangeDetectionPolicy"></a>
@@ -340,7 +413,7 @@ When rows are deleted from the collection, you normally want to delete those row
 
 If you're using a custom query, make sure that the property referenced by `softDeleteColumnName` is projected by the query.
 
-The `softDeleteColumnName` must be a top-level field in the index. Using nested fields within complex data types as the `softDeleteColumnName` is not supported.
+The `softDeleteColumnName` must be a top-level field in the index. Using nested fields within complex data types as the `softDeleteColumnName` isn't supported.
 
 The following example creates a data source with a soft-deletion policy:
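
For readers who script this step, a minimal Python sketch of the soft-delete policy portion of such a data source follows. The policy type and the `softDeleteColumnName` and `softDeleteMarkerValue` properties follow the Azure AI Search data source definition; the function name, the `isDeleted` column, and the validation checks are assumptions. The checks are naive heuristics that mirror the two constraints above and don't replace validation by the service.

```python
from typing import Optional


def build_soft_delete_policy(
    column_name: str, marker_value: str = "true", query: Optional[str] = None
) -> dict:
    """Return a soft-delete policy after two simple sanity checks."""
    # Nested fields aren't supported, so reject path-like names.
    if "." in column_name or "/" in column_name:
        raise ValueError("softDeleteColumnName must be a top-level field")
    # With a custom query, the column must be projected (substring check only).
    if query is not None and column_name not in query:
        raise ValueError("custom query must project the soft-delete column")
    return {
        "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName": column_name,
        "softDeleteMarkerValue": marker_value,
    }


# Hypothetical column name; attach the result to the data source body
# under the "dataDeletionDetectionPolicy" property.
policy = build_soft_delete_policy(
    "isDeleted",
    query="SELECT c.id, c.isDeleted, c._ts FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts",
)
print(policy)
```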