Merge pull request #220614 from HeidiSteen/heidist-work

prmerger-automator[bot] · web-flow · commit 59b06c9031cf · 2022-12-07T16:43:01.000Z
[azure search] Revised indexer maximum limits
diff --git a/articles/search/search-howto-create-indexers.md b/articles/search/search-howto-create-indexers.md
@@ -10,7 +10,7 @@ ms.author: heidist
 ms.service: cognitive-search
 ms.custom: ignite-2022
 ms.topic: how-to
-ms.date: 05/11/2022
+ms.date: 12/06/2022
 ---
 
 # Creating indexers in Azure Cognitive Search
@@ -31,7 +31,7 @@ When you create an indexer, the definition will adhere to one of two patterns: t
 
 ### Indexer definition for text-based indexing
 
-Full text search is the primary use case for indexers, and for this workflow, an indexer will look like this example.
+Text-based indexing for full text search is the primary use case for indexers, and for this workflow, an indexer will look like this example.
 
 ```json
 {
diff --git a/articles/search/search-howto-run-reset-indexers.md b/articles/search/search-howto-run-reset-indexers.md
@@ -8,7 +8,7 @@ ms.author: heidist
 ms.service: cognitive-search
 ms.custom: ignite-2022
 ms.topic: how-to
-ms.date: 01/07/2022
+ms.date: 12/06/2022
 ---
 
 # Run or reset indexers, skills, or documents
@@ -25,7 +25,12 @@ This article explains how to run indexers on demand, with and without a reset.
 
 [Run Indexer](/rest/api/searchservice/run-indexer) will detect and process only what it necessary to synchronize the search index with changes in the underlying data source. Incremental indexing starts by locating an internal high-water mark to find the last updated search document, which becomes the starting point for indexer execution over new and updated documents in the data source.
 
-Change detection is essential for determining what's new or updated in the data source. If the content is unchanged, Run has no effect. Blob storage has built-in change detection through its LastModified property. Other data sources, such as Azure SQL or Azure Cosmos DB, have to be configured for change detection before the indexer can read new and updated rows. 
+Change detection is essential for determining what's new or updated in the data source. Indexers use the change detection capabilities of the underlying data source to determine what's new or updated in the data source. 
+
++ Azure Storage has built-in change detection through its LastModified property
++ Other data sources, such as Azure SQL or Azure Cosmos DB, have to be configured for change detection before the indexer can read new and updated rows. 
+
+If the underlying content is unchanged, a run operation has no effect. In this case, indexer execution history will indicate `0\0` documents processed.
 
 ## Indexer execution
 
@@ -37,21 +42,21 @@ Indexer limits vary by the workload. For each workload, the following job limits
 
 | Workload | Maximum duration | Maximum jobs | Execution environment <sup>1</sup> |
 |----------|------------------|---------------------|-----------------------------|
-| Text-based indexing <sup>3</sup> | 2 or 24 hours | One per search unit <sup>2</sup> | Typically runs on the search service. It may also run on internally managed, multi-tenant content processing cluster. |
+| Text-based indexing | 2 or 24 hours <sup>2</sup> | One per search unit <sup>3</sup> | Typically runs on the search service. It may also run on internally managed, multi-tenant content processing cluster. |
 | Skills-based indexing | 2 hours | Indeterminate | Typically runs on an internally managed, multi-tenant content processing cluster, depending on how complex the skillset is. A simple skill might execute on your search service if the service has capacity. Otherwise, skills-based indexer jobs execute off-service. Because the content processing cluster is multi-tenant, nodes are added to meet demand. If you experience a delay in on-demand or scheduled execution, it's probably because the system is either adding nodes or waiting for one to become available.|
 
-<sup>1</sup> For optimum processing, a search service determines the internal execution environment for the indexer operation. The execution environment is either the search service or a multi-tenant environment that's managed and secured by Microsoft at no extra cost. You can't control or configure which environment is used. Using an internally managed cluster for skillset processing leaves more service-specific resources available for routine operations like queries and text indexing.
+<sup>1</sup> Indexers have an internal execution environment that's determined by the search service. The execution environment is either your search service or a multi-tenant environment that's managed and secured by Microsoft at no extra cost. You can't control or configure which environment is used. Using an internally managed cluster for content processing leaves more service-specific resources available for routine operations like queries and text indexing.
 
-<sup>2</sup> Search units can be [flexible combinations](search-capacity-planning.md#partition-and-replica-combinations) of partitions and replicas, and maximum indexer jobs aren't tied to one or the other. In other words, if you have four units, you can have four text-based indexer jobs running concurrently, no matter how the search units are deployed.
+<sup>2</sup> Indexer-based indexing is subject to a 2-hour maximum duration. Currently, some indexers have a longer 24-hour maximum execution window, but that behavior isn’t the norm. The longer window only applies if a service or its indexers can’t be internally migrated to the newer runtime behavior. If more than 2 hours are needed to process all of the data, [enable change detection](search-howto-create-indexers.md#change-detection-and-internal-state) and [schedule the indexer](search-howto-schedule-indexers.md) to run at 2-hour intervals.
 
-<sup>3</sup>  Indexer maximum run time for Basic tier or higher can be 2 or 24 hours, depending on system resources, product implementation and other factors.
+<sup>3</sup> Search units can be [flexible combinations](search-capacity-planning.md#partition-and-replica-combinations) of partitions and replicas, and maximum indexer jobs aren't tied to one or the other. In other words, if you have four units, you can have four text-based indexer jobs running concurrently, no matter how the search units are deployed.
 
 > [!TIP]
 > If you are [indexing a large data set](search-howto-large-index.md), you can stretch processing out by putting the indexer [on a schedule](search-howto-schedule-indexers.md). For the full list of all indexer-related limits, see [indexer limits](search-limits-quotas-capacity.md#indexer-limits)
 
 ## Resetting indexers
 
-After the initial run, an indexer keeps track of which search documents have been indexed through an internal *high-water mark*. The marker is never exposed, but internally the indexer knows where it last stopped, so that it can pick up where it left off on the next run.
+After the initial run, an indexer keeps track of which search documents have been indexed through an internal *high-water mark*. The marker is never exposed, but internally the indexer knows where it last stopped.
 
 If you need to rebuild all or part of an index, you can clear the indexer's high-water mark through a reset. Reset APIs are available at decreasing levels in the object hierarchy:
 
@@ -77,7 +82,7 @@ As previously noted, reset is a passive operation: you must follow up a Run requ
 
 Reset/run operations apply to a search index or a knowledge store, to specific documents or projections, and to cached enrichments if a reset explicitly or implicitly includes skills.
 
-Reset also applies to just new and update operations. It will not trigger deletion or clean up of orphaned documents in the search index. For more information about deleting documents, see [Add, Update or Delete Documents](/rest/api/searchservice/AddUpdate-or-Delete-Documents).
+Reset also applies to create and update operations. It will not trigger deletion or clean up of orphaned documents in the search index. For more information about deleting documents, see [Add, Update or Delete Documents](/rest/api/searchservice/AddUpdate-or-Delete-Documents).
 
 Once you reset an indexer, you can't undo the action.
 
diff --git a/articles/search/search-howto-schedule-indexers.md b/articles/search/search-howto-schedule-indexers.md
@@ -9,7 +9,7 @@ ms.author: heidist
 ms.service: cognitive-search
 ms.custom: ignite-2022
 ms.topic: how-to
-ms.date: 12/01/2022
+ms.date: 12/06/2022
 ---
 
 # Schedule an indexer in Azure Cognitive Search
@@ -22,7 +22,9 @@ Indexers can be configured to run on a schedule when you set the "schedule" prop
 
 + Source data is very large and you want to spread the indexer processing over time. 
 
-  Indexer jobs are subject to a maximum running time of 2 hours or 24 hours (this period will vary based on different implementation factors that aren't exposed). If indexing can't complete within the maximum interval, you can configure a schedule that runs every 2 hours. Indexers can automatically pick up where they left off, based on an internal high water mark that marks where indexing last ended. Running an indexer on a recurring 2-hour schedule allows it to process a very large data set (many millions of documents). For more information about indexing large data volumes, see [How to index large data sets in Azure Cognitive Search](search-howto-large-index.md).
+Indexer jobs are subject to a 2-hour maximum duration. Currently, some indexers have a longer 24-hour maximum execution window, but that behavior isn’t the norm. The longer window only applies if a service or its indexers can’t be internally migrated to the newer runtime behavior. 
+
+If indexing can't complete within the maximum interval, you can [schedule the indexer](search-howto-schedule-indexers.md) to run every 2 hours. As long as your data source supports [change detection logic](search-howto-create-indexers.md#change-detection-and-internal-state), indexers can automatically pick up where they left off, based on an internal high water mark that marks where indexing last ended. Running an indexer on a recurring 2-hour schedule allows it to process a very large data set (many millions of documents). For more information about indexing large data volumes, see [How to index large data sets in Azure Cognitive Search](search-howto-large-index.md).
 
 ## Prerequisites
 
@@ -36,16 +38,16 @@ A schedule is part of the indexer definition. If the "schedule" property is omit
 
 | Property | Description |
 |----------|-------------|
-| "interval" | (required) The amount of time between the start of two consecutive indexer executions. The smallest interval allowed is 5 minutes, and the longest is 1440 minutes (24 hours). It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an [ISO 8601 duration](https://www.w3.org/TR/xmlschema11-2/#dayTimeDuration) value). </br></br>The pattern for this is: `P(nD)(T(nH)(nM))`. </br></br>Examples: `PT15M` for every 15 minutes, `PT2H` for every 2 hours.|
+| "interval" | (required) The amount of time between the start of two consecutive indexer executions. The smallest interval allowed is 5 minutes, and the longest is 1440 minutes (24 hours). It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an [ISO 8601 duration](https://www.w3.org/TR/xmlschema11-2/#dayTimeDuration) value). </br></br>The pattern for this is: `P(nD)(T(nH)(nM))`. </br></br>Examples: `PT15M` for every 15 minutes, `PT2H` for every two hours.|
 | "startTime" | (optional) Start time is specified in coordinated universal time (UTC). If omitted, the current time is used. This time can be in the past, in which case the first execution is scheduled as if the indexer has been running continuously since the original start time.|
 
-The following example is a schedule that starts on January 1 at midnight and runs every 50 minutes.
+The following example is a schedule that starts on January 1 at midnight and runs every two hours.
 
 ```json
 {
     "dataSourceName" : "hotels-ds",
     "targetIndexName" : "hotels-idx",
-    "schedule" : { "interval" : "PT50M", "startTime" : "2022-01-01T00:00:00Z" }
+    "schedule" : { "interval" : "PT2H", "startTime" : "2022-01-01T00:00:00Z" }
 }
 ```
 
@@ -103,7 +105,7 @@ await indexerClient.CreateOrUpdateIndexerAsync(indexer);
 
 ## Scheduling behavior
 
-For text-based indexing, the scheduler can kick off as many indexer jobs as the search service supports, which is determined by the number of search units. For example, if the service has three replicas and four partitions, you can generally have twelve indexer jobs in active execution, whether initiated on demand or on a schedule.
+For text-based indexing, the scheduler can kick off as many indexer jobs as the search service supports, which is determined by the number of search units. For example, if the service has three replicas and four partitions, you can generally have 12 indexer jobs in active execution, whether initiated on demand or on a schedule.
 
 Skills-based indexers run in a different [execution environment](search-howto-run-reset-indexers.md#indexer-execution). For this reason, the number of service units has no bearing on the number of skills-based indexer jobs you can run. Multiple skills-based indexers can run in parallel, but doing so depends on node availability within the execution environment.
 
diff --git a/articles/search/search-limits-quotas-capacity.md b/articles/search/search-limits-quotas-capacity.md
@@ -46,7 +46,7 @@ Maximum limits on storage, workloads, and quantities of indexes and other object
 
 <sup>1</sup> Basic services created before December 2017 have lower limits (5 instead of 15) on indexes. Basic tier is the only SKU with a lower limit of 100 fields per index. 
 
-<sup>2</sup> The upper limit on fields includes both first-level fields and nested subfields in a complex collection. For example, if an index contains 15 fields and has two complex collections with 5 subfields each, the field count of your index is 25. Indexes with a very large fields collection can be slow. [Limit fields and attributes](search-what-is-an-index.md#physical-structure-and-size) to just those you need, and run indexing and query test to ensure performance is acceptable.
+<sup>2</sup> The upper limit on fields includes both first-level fields and nested subfields in a complex collection. For example, if an index contains 15 fields and has two complex collections with five subfields each, the field count of your index is 25. Indexes with a very large fields collection can be slow. [Limit fields and attributes](search-what-is-an-index.md#physical-structure-and-size) to just those you need, and run indexing and query test to ensure performance is acceptable.
 
 <sup>3</sup> An upper limit exists for elements because having a large number of them significantly increases the storage required for your index. An element of a complex collection is defined as a member of that collection. For example, assume a [Hotel document with a Rooms complex collection](search-howto-complex-data-types.md#indexing-complex-types), each room in the Rooms collection is considered an element. During indexing, the indexing engine can safely process a maximum of 3000 elements across the document as a whole. [This limit](search-api-migration.md#upgrade-to-2019-05-06) was introduced in `api-version=2019-05-06` and applies to complex collections only, and not to string collections or to complex fields.
 
@@ -93,7 +93,7 @@ Maximum running times exist to provide balance and stability to the service as a
 
 <sup>5</sup> AI enrichment and image analysis are computationally intensive and consume disproportionate amounts of available processing power. Running time for these workloads has been shortened to give other jobs in the queue more opportunity to run.
 
-<sup>6</sup> Indexer maximum run time for Basic tier or higher can be 2 or 24 hours, depending on system resources, product implementation and other factors.
+<sup>6</sup> Indexer execution and indexer-skillset combined execution is subject to a 2-hour maximum duration.  Currently, some indexers have a longer 24-hour maximum execution window, but that behavior isn't the norm. The longer window only applies if a service or its indexers can't be internally migrated to the newer runtime behavior. If more than 2 hours are needed to complete an indexer or indexer-skillset process, [schedule the indexer](search-howto-schedule-indexers.md) to run at 2-hour intervals.
 
 > [!NOTE]
 > As stated in the [Index limits](#index-limits), indexers will also enforce the upper limit of 3000 elements across all complex collections per document starting with the latest GA API version that supports complex types (`2019-05-06`) onwards. This means that if you've created your indexer with a prior API version, you will not be subject to this limit. To preserve maximum compatibility, an indexer that was created with a prior API version and then updated with an API version `2019-05-06` or later, will still be **excluded** from the limits. Customers should be aware of the adverse impact of having very large complex collections (as stated previously) and we highly recommend creating any new indexers with the latest GA API version.