Merge pull request #7500 from gmndrg/main

prmerger-automator[bot] · web-flow · commit 1309b4a2c816 · 2025-10-06T21:33:26.000Z
Large Blob datasets behavior improvements
diff --git a/articles/search/search-blob-storage-integration.md b/articles/search/search-blob-storage-integration.md
@@ -7,7 +7,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: azure-ai-search
 ms.topic: conceptual
-ms.date: 07/25/2025
+ms.date: 10/06/2025
 ms.update-cycle: 365-days
 ms.custom:
   - ignite-2023
@@ -143,3 +143,4 @@ A more permanent solution is to gather query inputs and present the response as
 
 + [Upload, download, and list blobs with the Azure portal (Azure Blob storage)](/azure/storage/blobs/storage-quickstart-blobs-portal)
 + [Set up a blob indexer (Azure AI Search)](search-howto-indexing-azure-blob-storage.md)
++ [Index large data sets](search-how-to-large-index.md)
diff --git a/articles/search/search-how-to-large-index.md b/articles/search/search-how-to-large-index.md
@@ -9,7 +9,7 @@ ms.service: azure-ai-search
 ms.custom:
   - ignite-2023
 ms.topic: conceptual
-ms.date: 08/01/2025
+ms.date: 10/06/2025
 ms.update-cycle: 180-days
 ---
 
@@ -82,7 +82,8 @@ Default batch sizes are data-source specific. Azure SQL Database and Azure Cosmo
 
 Indexer scheduling is an important mechanism for processing large data sets and for accommodating slow-running processes like image analysis in an enrichment pipeline. 
 
-Typically, indexer processing runs within a two-hour window. If the indexing workload takes days rather than hours to complete, you can put the indexer on a consecutive, recurring schedule that starts every two hours. Assuming the data source has [change tracking enabled](search-howto-create-indexers.md#change-detection-and-internal-state), the indexer resumes processing where it last left off. At this cadence, an indexer can work its way through a document backlog over a series of days until all unprocessed documents are processed. 
+Typically, indexer processing runs within a two-hour window. If the indexing workload takes days rather than hours to complete, you can put the indexer on a consecutive, recurring schedule that starts every two hours. Assuming the data source has [change tracking enabled](search-howto-create-indexers.md#change-detection-and-internal-state), the indexer resumes processing where it last left off. At this cadence, an indexer can work its way through a document backlog over a series of days until all unprocessed documents are processed. This pattern is especially important during the initial run or when indexing large blob containers, where the blob listing phase alone can take multiple hours or days. During this phase, the indexer shows no blobs being processed, but unless an error is reported, it's likely still iterating through the blob list. This behavior is expected: document processing and enrichment begin only after the listing phase completes.
+
 
 ```json
 {
diff --git a/articles/search/search-indexer-troubleshooting.md b/articles/search/search-indexer-troubleshooting.md
@@ -8,7 +8,7 @@ ms.service: azure-ai-search
 ms.custom:
   - ignite-2023
 ms.topic: conceptual
-ms.date: 05/29/2025
+ms.date: 10/06/2025
 ms.update-cycle: 365-days
 ---
 
@@ -19,6 +19,11 @@ Occasionally, indexers run into problems that don't produce errors or that occur
 > [!NOTE]
 > If you have an Azure AI Search error to investigate, see [Troubleshooting common indexer errors and warnings](cognitive-search-common-errors-warnings.md) instead.
 
+## Best practice: indexers are designed to run on a schedule
+> For reliable indexing, configure your indexers to run on a [regular schedule](search-howto-schedule-indexers.md). Scheduled runs automatically pick up any documents missed in previous runs due to transient errors, network interruptions, or temporary service issues. This approach helps maintain data consistency and minimizes the need for manual intervention.  
+>  
+> For [large data sources](search-how-to-large-index.md), the initial enumeration and indexing can take hours or even days. Running your indexer on a schedule ensures that progress continues and that errors are retried automatically. Avoid relying solely on manual or on-demand indexer runs, because they don't provide the same reliability or transient error recovery.
+
 <a name="connection-errors"></a>
 
 ## Troubleshoot connections to restricted resources
@@ -291,3 +296,4 @@ If you have [sensitivity labels set on documents](/microsoft-365/compliance/sens
 
 * [Troubleshooting common indexer errors and warnings](cognitive-search-common-errors-warnings.md)
 * [Monitor indexer-based indexing](search-monitor-indexers.md)
+* [Index large data sets](search-how-to-large-index.md)
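
The scheduling guidance in this change (a recurring run that starts every two hours) can be sketched as an indexer definition payload. This is a minimal illustration, not part of the commit: the helper `build_indexer_definition`, the resource names, and the start time are hypothetical, and only the `schedule` object with its `interval` (an ISO 8601 duration) and `startTime` properties is meant to reflect the indexer definition shape.

```python
import json


def build_indexer_definition(name, data_source_name, target_index_name,
                             interval="PT2H",
                             start_time="2025-10-06T00:00:00Z"):
    """Return an indexer definition dict with a recurring schedule.

    All argument values are hypothetical placeholders. "PT2H" is the
    ISO 8601 duration for two hours, matching the two-hour cadence
    described in the article text.
    """
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": target_index_name,
        "schedule": {
            "interval": interval,
            "startTime": start_time,
        },
    }


if __name__ == "__main__":
    # Hypothetical resource names, for illustration only.
    definition = build_indexer_definition(
        "my-blob-indexer", "my-blob-datasource", "my-index"
    )
    print(json.dumps(definition, indent=2))
```

The resulting JSON would be sent as the body of a create-or-update indexer request. With change tracking enabled on the data source, each scheduled run resumes where the previous one left off.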