Merge pull request #177789 from HeidiSteen/heidist-gh

v-andreaco · web-flow · commit 12377914e932 · 2021-11-02T09:52:04.000-06:00
[azure search] indexer edits
diff --git a/articles/search/knowledge-store-concept-intro.md b/articles/search/knowledge-store-concept-intro.md
@@ -40,7 +40,7 @@ The following video explains both of these benefits and more.
 
 A knowledge store is defined inside a skillset definition and it has two components: 
 
-+ a connection string to Azure Storage
++ A connection string to Azure Storage
 
 + [**Projections**](knowledge-store-projection-overview.md) that determine whether the knowledge store consists of tables, objects or files. 
 
diff --git a/articles/search/search-howto-create-indexers.md b/articles/search/search-howto-create-indexers.md
@@ -8,20 +8,23 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 01/28/2021
+ms.date: 11/02/2021
 ---
 
 # Creating indexers in Azure Cognitive Search
 
-A search indexer provides an automated workflow for transferring documents and content from an external data source, to a search index on your search service. As originally designed, it extracts text and metadata from an Azure data source, serializes documents into JSON, and passes off the resulting documents to a search engine for indexing. It's since been extended to support [AI enrichment](cognitive-search-concept-intro.md) for deep content processing. 
+A search indexer provides an automated workflow for reading content from an external data source, and ingesting that content into a search index on your search service. Indexers support two workflows: 
+
++ Extracting text and metadata for full text search
++ Analyzing images and large undifferentiated text for text and structure, adding [AI enrichment](cognitive-search-concept-intro.md) to the pipeline for deeper content processing. 
 
 Using indexers significantly reduces the quantity and complexity of the code you need to write. This article focuses on the mechanics of creating an indexer as preparation for more advanced work with source-specific indexers and [skillsets](cognitive-search-working-with-skillsets.md).
 
-## What's an indexer definition?
+## Indexer structure
 
-Indexers are used for either text-based indexing that pulls alphanumeric content from source fields into index fields, or AI-based processing that analyzes undifferentiated text for structure, or analyzes images for text and information, also adding that content to an index. The following index definitions are typical of what you might create for either scenario.
+The following index definitions are typical of what you might create for text-based and AI enrichment scenarios.
 
-### Indexers for text content
+### Indexing for full text search
 
 The original purpose of an indexer was to simplify the complex process of loading an index by providing a mechanism for connecting to and reading text and numeric content from fields in a data source, serialize that content as JSON documents, and hand off those documents to the search engine for indexing. This is still a primary use case, and for this operation, you'll need to create an indexer with the properties defined in the following example.
 
@@ -45,7 +48,7 @@ The **`parameters`** property modifies run time behaviors, such as how many erro
 
 The **`field mappings`** property is used to explicitly map source-to-destination fields if those fields differ by name or type. Other properties (not shown), are used to [specify a schedule](search-howto-schedule-indexers.md), create the indexer in a disabled state, or specify an [encryption key](search-security-manage-encryption-keys.md) for supplemental encryption of data at rest.
 
-### Indexers for AI indexing
+### Indexing for AI enrichment
 
 Because indexers are the mechanism by which a search service makes outbound requests, indexers were extended to support AI enrichments, adding infrastructure and objects to implement this use case.
 
@@ -73,27 +76,27 @@ All of the above properties and parameters apply to indexers that perform AI enr
 
 AI enrichment is beyond the scope of this article. For more information, start with these articles: [AI enrichment](cognitive-search-concept-intro.md), [Skillsets in Azure Cognitive Search](cognitive-search-working-with-skillsets.md), and [Create Skillset (REST)](/rest/api/searchservice/create-skillset).
 
-## Choose an indexer client and create the indexer
+## Prerequisites
 
-When you are ready to create an indexer on a remote search service, you will need a search client in the form of a tool, like Azure portal or Postman, or code that instantiates an indexer client. We recommend the Azure portal or REST APIs for early development and proof-of-concept testing.
++ Use a [supported data source](search-indexer-overview.md#supported-data-sources).
 
-### Permissions
++ Have admin rights. All operations related to indexers, including GET requests for status or definitions, require an [admin api-key](search-security-api-keys.md) on the request.
 
-All operations related to indexers, including GET requests for status or definitions, require an [admin api-key](search-security-api-keys.md) on the request.
+All [service tiers limit](search-limits-quotas-capacity.md#indexer-limits) the number of objects that you can create. If you are experimenting on the Free tier, you can only have 3 objects of each type and 2 minutes of indexer processing (not including skillset processing).
 
-### Limits
+## How to create indexers
 
-All [service tiers limit](search-limits-quotas-capacity.md#indexer-limits) the number of objects that you can create. If you are experimenting on the Free tier, you can only have 3 objects of each type and 2 minutes of indexer processing (not including skillset processing).
+When you are ready to create an indexer on a remote search service, you will need a search client in the form of a tool, like Azure portal or Postman, or code that instantiates an indexer client. We recommend the Azure portal or REST APIs for early development and proof-of-concept testing.
 
-### Use Azure portal to create an indexer
+### [**Azure portal**](#tab/indexer-portal)
 
 The portal provides two options for creating an indexer: [**Import data wizard**](search-import-data-portal.md) and **New Indexer** that provides fields for specifying an indexer definition. The wizard is unique in that it creates all of the required elements. Other approaches require that you have predefined a data source and index.
 
 The following screenshot shows where you can find these features in the portal. 
 
   :::image type="content" source="media/search-howto-create-indexers/portal-indexer-client.png" alt-text="hotels indexer" border="true":::
 
-### Use a REST client
+### [**REST**](#tab/kstore-rest)
 
 Both Postman and Visual Studio Code (with an extension for Azure Cognitive Search) can function as an indexer client. Using either tool, you can connect to your search service and send [Create Indexer (REST)](/rest/api/searchservice/create-indexer) requests. There are numerous tutorials and examples that demonstrate REST clients for creating objects. 
 
@@ -104,7 +107,7 @@ Start with either of these articles to learn about each client:
 
 Refer to the [Indexer operations (REST)](/rest/api/searchservice/Indexer-operations) for help with formulating indexer requests.
 
-### Use an SDK
+### [**.NET SDK**](#tab/kstore-dotnet)
 
 For Cognitive Search, the Azure SDKs implement generally available features. As such, you can use any of the SDKs to create indexer-related objects. All of them provide a **SearchIndexerClient** that has methods for creating indexers and related objects, including skillsets.
 
@@ -115,9 +118,11 @@ For Cognitive Search, the Azure SDKs implement generally available features. As
 | JavaScript | [SearchIndexerClient](/javascript/api/@azure/search-documents/searchindexerclient) | [Indexers](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/search/search-documents/samples/v11/javascript) |
 | Python | [SearchIndexerClient](/python/api/azure-search-documents/azure.search.documents.indexes.searchindexerclient) | [sample_indexers_operations.py](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/search/azure-search-documents/samples/sample_indexers_operations.py) |
 
+---
+
 ## Run the indexer
 
-An indexer runs automatically when you create the indexer on the service. This is the moment of truth where you will find out if there are data source connection errors, field mapping issues, or skillset problems. 
+Unless you set the **`disabled=true`** in the indexer definition, an indexer runs immediately when you create the indexer on the service. This is the moment of truth where you will find out if there are data source connection errors, field mapping issues, or skillset problems. 
 
 There are several ways to run an indexer:
 
@@ -127,9 +132,6 @@ There are several ways to run an indexer:
 
 + Run a program that calls SearchIndexerClient methods for create, update, or run.
 
-> [!NOTE]
-> To avoid immediately running an indexer upon creation, include **`disabled=true`** in the indexer definition.
-
 Alternatively, put the indexer [on a schedule](search-howto-schedule-indexers.md) to invoke processing at regular intervals. 
 
 Scheduled processing usually coincides with a need for incremental indexing of changed content. Change detection logic is a capability that's built into source platforms. Changes in a blob container are detected by the indexer automatically. For guidance on leveraging change detection in other data sources, refer to the indexer docs for specific data sources:
@@ -153,17 +155,17 @@ For large indexing loads, an indexer also keeps track of the last document it pr
 
 If you need to clear the high water mark to re-index in full, you can use [Reset Indexer](/rest/api/searchservice/reset-indexer). For more selective re-indexing, use [Reset Skills](/rest/api/searchservice/preview-api/reset-skills) or [Reset Documents](/rest/api/searchservice/preview-api/reset-documents). Through the reset APIs, you can clear internal state, and also flush the cache if you enabled [incremental enrichment](search-howto-incremental-index.md). For more background and comparison of each reset option, see [Run or reset indexers, skills, and documents](search-howto-run-reset-indexers.md).
 
-## Know your data
+## Data preparation
 
-Indexers expect a tabular row set, where each row becomes a full or partial search document in the index. Often, there is a one-to-one correspondence between a row and the resulting search document, where all the fields in the row set fully populate each document. But you can use indexers to generate just part of a document, for example if you're using multiple indexers or approaches to build out the index. 
+Indexers expect a tabular row set, where each row becomes a full or partial search document in the index. Often, there is a one-to-one correspondence between a row in a database and the resulting search document, where all the fields in the row set fully populate each document. But you can use indexers to generate a subset of a document's fields, and fill in the remaining fields using a different indexer or methodology. 
 
 To flatten relational data into a row set, you should create a SQL view, or build a query that returns parent and child records in the same row. For example, the built-in hotels sample dataset is a SQL database that has 50 records (one for each hotel), linked to room records in a related table. The query that flattens the collective data into a row set embeds all of the room information in JSON documents in each hotel record. The embedded room information is a generated by a query that uses a **FOR JSON AUTO** clause. You can learn more about this technique in [define a query that returns embedded JSON](index-sql-relational-data.md#define-a-query-that-returns-embedded-json). This is just one example; you can find other approaches that will produce the same effect.
 
 In addition to flattened data, it's important to pull in only searchable data. Searchable data is alphanumeric. Cognitive Search cannot search over binary data in any format, although it can extract and infer text descriptions of image files (see [AI enrichment](cognitive-search-concept-intro.md)) to create searchable content. Likewise, using AI enrichment, large text can be analyzed by natural language models to find structure or relevant information, generating new content that you can add to a search document.
 
 Given that indexers don't fix data problems, other forms of data cleansing or manipulation might be needed. For more information, you should refer to the product documentation of your [Azure database product](../index.yml?product=databases).
 
-## Know your index
+## Index preparation
 
 Recall that indexers pass off the search documents to the search engine for indexing. Just as indexers have properties that determine execution behavior, an index schema has properties that profoundly affect how strings are indexed (only strings are analyzed and tokenized). Depending on analyzer assignments, indexed strings might be different from what you passed in. You can evaluate the effects of analyzers using [Analyze Text (REST)](/rest/api/searchservice/test-analyzer). For more information about analyzers, see [Analyzers for text processing](search-analyzers.md).
 
diff --git a/articles/search/search-howto-run-reset-indexers.md b/articles/search/search-howto-run-reset-indexers.md