Merge pull request #1376 from HeidiSteen/heidist-ignite

prmerger-automator[bot] · web-flow · commit f694996716be · 2024-11-08T01:03:51.000Z
[release-azure-search] List portal data support for structured data sources
diff --git a/articles/search/search-file-storage-integration.md b/articles/search/search-file-storage-integration.md
@@ -9,7 +9,7 @@ ms.service: azure-ai-search
 ms.custom:
   - ignite-2023
 ms.topic: how-to
-ms.date: 08/23/2024
+ms.date: 11/19/2024
 ---
 
 # Index data from Azure Files
@@ -19,7 +19,12 @@ ms.date: 08/23/2024
 
 In this article, learn how to configure an [**indexer**](search-indexer-overview.md) that imports content from Azure Files and makes it searchable in Azure AI Search. Inputs to the indexer are your files in a single share. Output is a search index with searchable content and metadata stored in individual fields.
 
-This article supplements [**Create an indexer**](search-howto-create-indexers.md) with information that's specific to indexing files in Azure Storage. It uses the REST APIs to demonstrate a three-part workflow common to all indexers: create a data source, create an index, create an indexer. Data extraction occurs when you submit the Create Indexer request.
+To configure and run the indexer, you can use:
+
++ [Search Service preview REST APIs](/rest/api/searchservice), any preview version.
++ An Azure SDK package, any version.
++ [Import data](search-get-started-portal.md) wizard in the Azure portal.
++ [Import and vectorize data](search-get-started-portal-import-vectors.md) wizard in the Azure portal.
 
 ## Prerequisites
 
@@ -33,6 +38,16 @@ This article supplements [**Create an indexer**](search-howto-create-indexers.md
 
 + Use a [REST client](search-get-started-rest.md) to formulate REST calls similar to the ones shown in this article.
 
+## Supported tasks
+
+You can use this indexer for the following tasks:
+
++ **Data indexing and incremental indexing:** The indexer can index files and associated metadata from a file share. It detects new and updated files and metadata through built-in change detection. You can configure data refresh on a schedule or on demand.
++ **Deletion detection:** The indexer can [detect deletions through custom metadata](search-howto-index-changed-deleted-blobs.md).
++ **Applied AI through skillsets:** [Skillsets](cognitive-search-concept-intro.md) are fully supported by the indexer. This includes key features like [integrated vectorization](vector-search-integrated-vectorization.md) that adds data chunking and embedding steps.
++ **Parsing modes:** The indexer supports [JSON parsing modes](search-howto-index-json-blobs.md) if you want to parse JSON arrays or lines into individual search documents. It also supports [Markdown parsing mode](search-how-to-index-markdown-blobs.md).
++ **Compatibility with other features:** The indexer is designed to work seamlessly with other indexer features, such as [debug sessions](cognitive-search-debug-session.md), [indexer cache for incremental enrichments](search-howto-incremental-index.md), and [knowledge store](knowledge-store-concept-intro.md).
+
 ## Supported document formats
 
 The Azure Files indexer can extract text from the following document formats:
diff --git a/articles/search/search-get-started-portal-import-vectors.md b/articles/search/search-get-started-portal-import-vectors.md
@@ -8,47 +8,44 @@ ms.service: azure-ai-search
 ms.custom:
   - build-2024
 ms.topic: quickstart
-ms.date: 10/18/2024
+ms.date: 11/19/2024
 ---
 
 # Quickstart: Vectorize text and images by using the Azure portal
 
 This quickstart helps you get started with [integrated vectorization](vector-search-integrated-vectorization.md) by using the **Import and vectorize data** wizard in the Azure portal. The wizard chunks your content and calls an embedding model to vectorize content during indexing and for queries.
 
-Key points about the wizard:
+## Prerequisites
 
-+ Supported data sources are Azure Blob Storage, Azure Data Lake Storage (ADLS) Gen2, or OneLake files and shortcuts.
-+ Supported embedding models are hosted on Azure OpenAI, Azure AI Studio model catalog, Azure AI Vision multimodal.
-+ Index schema provides vector and nonvector fields for chunked data. 
-+ You can add fields, but you can't delete or modify generated fields.
-+ Document parsing mode creates chunks (one search document per chunk).
-+ Chunking is nonconfigurable. The effective settings are:
++ An Azure subscription. [Create one for free](https://azure.microsoft.com/free/).
 
-  ```json
-   "textSplitMode": "pages",
-   "maximumPageLength": 2000,
-   "pageOverlapLength": 500,
-   "maximumPagesToTake": 0, #unlimited
-   "unit": "characters",
-  ```
++ [An Azure AI Search service](search-create-service-portal.md) in the same region as your Azure AI embedding model. We recommend the Basic tier or higher.
 
-## Prerequisites
++ [A supported data source](#supported-data-sources).
 
-+ An Azure subscription. [Create one for free](https://azure.microsoft.com/free/).
++ [A supported embedding model](#supported-embedding-models).
+
+### Supported data sources
+
++ [Azure Data Lake Storage (ADLS) Gen2](/azure/storage/blobs/create-data-lake-storage-account) (a storage account with a hierarchical namespace).
 
-+ [Azure AI Search service](search-create-service-portal.md) in the same region as Azure AI. We recommend the Basic tier or higher.
++ [Azure Storage](/azure/storage/common/storage-account-create) for blobs, files, and tables. Azure Storage must be a standard performance (general-purpose v2) account. The access tier can be hot, cool, or cold.
 
-+ [Azure Blob Storage](/azure/storage/common/storage-account-create), [Azure Data Lake Storage (ADLS) Gen2](/azure/storage/blobs/create-data-lake-storage-account) (a storage account with a hierarchical namespace), or a [OneLake lakehouse](search-how-to-index-onelake-files.md).
++ [Azure Cosmos DB](/azure/cosmos-db/nosql/quickstart-portal) for NoSQL, MongoDB, and Apache Gremlin.
 
-  Azure Storage must be a standard performance (general-purpose v2) account. Access tiers can be hot, cool, and cold.
++ [Azure SQL Database](/azure/azure-sql/database/single-database-create-quickstart), [Azure SQL Managed Instance](/azure/azure-sql/managed-instance/instance-create-quickstart), and SQL Server on Azure virtual machines.
 
-+ An embedding model on an Azure AI platform in the [same region as Azure AI Search](search-create-service-portal.md#regions-with-the-most-overlap). [Deployment instructions](#set-up-embedding-models) are in this article.
++ [OneLake lakehouse](search-how-to-index-onelake-files.md).
 
-  | Provider | Supported models |
-  |---|---|
-  | [Azure OpenAI Service](https://aka.ms/oai/access) | text-embedding-ada-002, text-embedding-3-large, or text-embedding-3-small. |
-  | [Azure AI Studio model catalog](/azure/ai-studio/what-is-ai-studio) |  Azure, Cohere, and Facebook embedding models. |
-  | [Azure AI services multi-service account](/azure/ai-services/multi-service-resource) | [Azure AI Vision multimodal](/azure/ai-services/computer-vision/how-to/image-retrieval) for image and text vectorization. Azure AI Vision multimodal is available in selected regions. [Check the documentation](/azure/ai-services/computer-vision/how-to/image-retrieval?tabs=csharp) for an updated list. **To use this resource, the account must be in an available region and in the same region as Azure AI Search**. |
+### Supported embedding models
+
+Use an embedding model on an Azure AI platform in the [same region as Azure AI Search](search-create-service-portal.md#regions-with-the-most-overlap). [Deployment instructions](#set-up-embedding-models) are in this article.
+
+| Provider | Supported models |
+|---|---|
+| [Azure OpenAI Service](https://aka.ms/oai/access) | text-embedding-ada-002, text-embedding-3-large, or text-embedding-3-small. |
+| [Azure AI Studio model catalog](/azure/ai-studio/what-is-ai-studio) |  Azure, Cohere, and Facebook embedding models. |
+| [Azure AI services multi-service account](/azure/ai-services/multi-service-resource) | [Azure AI Vision multimodal](/azure/ai-services/computer-vision/how-to/image-retrieval) for image and text vectorization. Azure AI Vision multimodal is available in selected regions. [Check the documentation](/azure/ai-services/computer-vision/how-to/image-retrieval?tabs=csharp) for an updated list. **To use this resource, the account must be in an available region and in the same region as Azure AI Search**. |
 
 If using the Azure OpenAI Service, it must have an associated [custom subdomain](/azure/ai-services/cognitive-services-custom-subdomains). If the service was created through the Azure portal, this subdomain is automatically generated as part of your service setup. Ensure that your service includes a custom subdomain before using it with the Azure AI Search integration.
 
@@ -60,17 +57,17 @@ For the purposes of this quickstart, all of the preceding resources must have pu
 
 If private endpoints are already present and you can't disable them, the alternative option is to run the respective end-to-end flow from a script or program on a virtual machine. The virtual machine must be on the same virtual network as the private endpoint. [Here's a Python code sample](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python/code/integrated-vectorization) for integrated vectorization. The same [GitHub repo](https://github.com/Azure/azure-search-vector-samples/tree/main) has samples in other programming languages.
 
-### Role-based access control requirements
+### Role requirements
 
 We recommend role assignments for search service connections to other resources.
 
 1. On Azure AI Search, [enable roles](search-security-enable-roles.md).
 
 1. Configure your search service to [use a managed identity](search-howto-managed-identities-data-sources.md#create-a-system-managed-identity).
 
-1. On your data source platform and embedding model provider, create role assignments that allow search service to access data and models. [Prepare sample data](#prepare-sample-data) provides instructions for setting up roles.
+1. On your data source platform and embedding model provider, create role assignments that allow search service to access data and models. [Prepare sample data](#prepare-sample-data) provides instructions for setting up roles for each supported data source.
 
-A free search service supports RBAC on connections to Azure AI Search, but it doesn't support managed identities on outbound connections to Azure Storage or Azure AI Vision. This level of support means you must use key-based authentication on connections between a free search service and other Azure services. 
+A free search service supports role-based connections to Azure AI Search, but it doesn't support managed identities on outbound connections to Azure Storage or Azure AI Vision. This level of support means you must use key-based authentication on connections between a free search service and other Azure services. 
 
 For more secure connections:
 
@@ -216,7 +213,7 @@ The wizard supports Azure, Cohere, and Facebook embedding models in the Azure AI
 
 1. Deploy a supported embedding model to the model catalog in your project.
 
-1. For RBAC, create two role assignments: one for Azure AI Search, and another for the AI Studio project. Assign the [Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control) role for embeddings and vectorization.
+1. For role-based connections, create two role assignments: one for Azure AI Search, and another for the AI Studio project. Assign the [Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control) role for embeddings and vectorization.
 
 ---
 
@@ -307,6 +304,16 @@ Support for OneLake indexing is in preview. For more information about supported
 
 In this step, specify the embedding model for vectorizing chunked data.
 
+Chunking is built-in and nonconfigurable. The effective settings are as follows, where a `maximumPagesToTake` value of 0 means unlimited:
+
+```json
+"textSplitMode": "pages",
+"maximumPageLength": 2000,
+"pageOverlapLength": 500,
+"maximumPagesToTake": 0,
+"unit": "characters"
+```
+
 1. On the **Vectorize your text** page, choose the source of the embedding model:
 
    + Azure OpenAI
@@ -361,6 +368,12 @@ On the **Advanced settings** page, you can optionally add [semantic ranking](sem
 
 ## Map new fields
 
+Key points about this step:
+
++ The index schema provides vector and nonvector fields for chunked data.
++ You can add fields, but you can't delete or modify generated fields.
++ Document parsing mode creates chunks (one search document per chunk).
+
 On the **Advanced settings** page, you can optionally add new fields. By default, the wizard generates the following fields with these attributes:
 
 | Field | Applies to | Description |
diff --git a/articles/search/search-get-started-portal.md b/articles/search/search-get-started-portal.md
@@ -7,7 +7,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: azure-ai-search
 ms.topic: quickstart
-ms.date: 05/30/2024
+ms.date: 11/19/2024
 ms.custom:
   - mode-ui
   - ignite-2023
diff --git a/articles/search/search-how-to-index-onelake-files.md b/articles/search/search-how-to-index-onelake-files.md
@@ -9,22 +9,19 @@ ms.service: azure-ai-search
 ms.custom:
   - build-2024
 ms.topic: how-to
-ms.date: 06/12/2024
+ms.date: 11/19/2024
 ---
 
 # Index data from OneLake files and shortcuts
   
 In this article, learn how to configure a OneLake files indexer for extracting searchable data and metadata data from a [lakehouse](/fabric/onelake/create-lakehouse-onelake) on top of [OneLake](/fabric/onelake/onelake-overview). 
 
-Use this indexer for the following tasks:
-  
-+ **Data indexing and incremental indexing:** The indexer can index files and associated metadata from data paths within a lakehouse. It detects new and updated files and metadata through built-in change detection. You can configure data refresh on a schedule or on demand. 
-+ **Deletion detection:** The indexer can [detect deletions via custom metadata](#detect-deletions-via-custom-metadata) for most files and shortcuts. This requires adding metadata to files to signify that they have been "soft deleted", enabling their removal from the search index. Currently, it's not possible to detect deletions in Google Cloud Storage or Amazon S3 shortcut files because custom metadata isn't supported for those data sources.
-+ **Applied AI through skillsets:** [Skillsets](cognitive-search-concept-intro.md) are fully supported by the OneLake files indexer. This includes key features like [integrated vectorization](vector-search-integrated-vectorization.md) that adds data chunking and embedding steps.
-+ **Parsing modes:** The indexer supports [JSON parsing modes](search-howto-index-json-blobs.md) if you want to parse JSON arrays or lines into individual search documents.
-+ **Compatibility with other features:** The OneLake indexer is designed to work seamlessly with other indexer features, such as [debug sessions](cognitive-search-debug-session.md), [indexer cache for incremental enrichments](search-howto-incremental-index.md), and [knowledge store](knowledge-store-concept-intro.md). 
+To configure and run the indexer, you can use:
 
-Use the [2024-05-01-preview REST API](/rest/api/searchservice/data-sources/create-or-update?view=rest-searchservice-2024-05-01-preview&tabs=HTTP&preserve-view=true), a beta Azure SDK package, or [Import and vectorize data](search-get-started-portal-import-vectors.md) in the Azure portal to index from OneLake.
++ [2024-05-01-preview REST API](/rest/api/searchservice/data-sources/create-or-update?view=rest-searchservice-2024-05-01-preview&tabs=HTTP&preserve-view=true) or a newer preview REST API.
++ An Azure SDK beta package that provides the feature.
++ [Import data](search-get-started-portal.md) wizard in the Azure portal.
++ [Import and vectorize data](search-get-started-portal-import-vectors.md) wizard in the Azure portal.
 
 This article uses the REST APIs to illustrate each step.
   
@@ -47,7 +44,17 @@ This article uses the REST APIs to illustrate each step.
 + A Contributor role assignment in the Microsoft Fabric workspace where the lakehouse is located. Steps are outlined in the [Grant permissions](#assign-service-permissions) section of this article.
 
 + [A REST client](search-get-started-rest.md) to formulate REST calls similar to the ones shown in this article.
+
+## Supported tasks
+
+You can use this indexer for the following tasks:
   
++ **Data indexing and incremental indexing:** The indexer can index files and associated metadata from data paths within a lakehouse. It detects new and updated files and metadata through built-in change detection. You can configure data refresh on a schedule or on demand. 
++ **Deletion detection:** The indexer can [detect deletions via custom metadata](#detect-deletions-via-custom-metadata) for most files and shortcuts. This requires adding metadata to files to signify that they have been "soft deleted", enabling their removal from the search index. Currently, it's not possible to detect deletions in Google Cloud Storage or Amazon S3 shortcut files because custom metadata isn't supported for those data sources.
++ **Applied AI through skillsets:** [Skillsets](cognitive-search-concept-intro.md) are fully supported by the OneLake files indexer. This includes key features like [integrated vectorization](vector-search-integrated-vectorization.md) that adds data chunking and embedding steps.
++ **Parsing modes:** The indexer supports [JSON parsing modes](search-howto-index-json-blobs.md) if you want to parse JSON arrays or lines into individual search documents. It also supports [Markdown parsing mode](search-how-to-index-markdown-blobs.md).
++ **Compatibility with other features:** The OneLake indexer is designed to work seamlessly with other indexer features, such as [debug sessions](cognitive-search-debug-session.md), [indexer cache for incremental enrichments](search-howto-incremental-index.md), and [knowledge store](knowledge-store-concept-intro.md).
+
 <a name="SupportedFormats"></a>
 
 ## Supported document formats  
@@ -69,7 +76,7 @@ The following OneLake shortcuts are supported by the OneLake files indexer:
 + [Google Cloud Storage shortcut](/fabric/onelake/create-gcs-shortcut)
 
 ## Limitations in this preview
- 
+
 + Parquet (including delta parquet) file types aren't currently supported.
 
 + File deletion isn't supported for Amazon S3 and Google Cloud Storage shortcuts.
diff --git a/articles/search/search-howto-indexing-azure-blob-storage.md b/articles/search/search-howto-indexing-azure-blob-storage.md