Commit 1309b4a

Merge pull request #7500 from gmndrg/main
Large Blob datasets behavior improvements
2 parents 9b3c38e + f4fba1f commit 1309b4a

3 files changed: +12 -4 lines changed

articles/search/search-blob-storage-integration.md

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: azure-ai-search
 ms.topic: conceptual
-ms.date: 07/25/2025
+ms.date: 10/06/2025
 ms.update-cycle: 365-days
 ms.custom:
 - ignite-2023
@@ -143,3 +143,4 @@ A more permanent solution is to gather query inputs and present the response as
 
 + [Upload, download, and list blobs with the Azure portal (Azure Blob storage)](/azure/storage/blobs/storage-quickstart-blobs-portal)
 + [Set up a blob indexer (Azure AI Search)](search-howto-indexing-azure-blob-storage.md)
++ [Index large data sets](search-how-to-large-index.md)

articles/search/search-how-to-large-index.md

Lines changed: 3 additions & 2 deletions
@@ -9,7 +9,7 @@ ms.service: azure-ai-search
 ms.custom:
 - ignite-2023
 ms.topic: conceptual
-ms.date: 08/01/2025
+ms.date: 10/06/2025
 ms.update-cycle: 180-days
 ---
 
@@ -82,7 +82,8 @@ Default batch sizes are data-source specific. Azure SQL Database and Azure Cosmo
 
 Indexer scheduling is an important mechanism for processing large data sets and for accommodating slow-running processes like image analysis in an enrichment pipeline.
 
-Typically, indexer processing runs within a two-hour window. If the indexing workload takes days rather than hours to complete, you can put the indexer on a consecutive, recurring schedule that starts every two hours. Assuming the data source has [change tracking enabled](search-howto-create-indexers.md#change-detection-and-internal-state), the indexer resumes processing where it last left off. At this cadence, an indexer can work its way through a document backlog over a series of days until all unprocessed documents are processed.
+Typically, indexer processing runs within a two-hour window. If the indexing workload takes days rather than hours to complete, you can put the indexer on a consecutive, recurring schedule that starts every two hours. Assuming the data source has [change tracking enabled](search-howto-create-indexers.md#change-detection-and-internal-state), the indexer resumes processing where it last left off. At this cadence, an indexer can work its way through a document backlog over a series of days until all unprocessed documents are processed. This pattern is especially important during the initial run or when indexing large blob containers, where the blob listing phase alone can take multiple hours or days. During this time, the indexer would show no blobs being processed, but unless an error is reported, it is likely still iterating through the blob list. Document processing and enrichment begin only after this phase completes, and this behavior is expected.
+
 
 ```json
 {

articles/search/search-indexer-troubleshooting.md

Lines changed: 7 additions & 1 deletion
@@ -8,7 +8,7 @@ ms.service: azure-ai-search
 ms.custom:
 - ignite-2023
 ms.topic: conceptual
-ms.date: 05/29/2025
+ms.date: 10/06/2025
 ms.update-cycle: 365-days
 ---
 
@@ -19,6 +19,11 @@ Occasionally, indexers run into problems that don't produce errors or that occur
 > [!NOTE]
 > If you have an Azure AI Search error to investigate, see [Troubleshooting common indexer errors and warnings](cognitive-search-common-errors-warnings.md) instead.
 
+## Best practice: indexers are designed to run on a schedule
+> For reliable indexing, configure your indexers to run on a [regular schedule](search-howto-schedule-indexers.md). Scheduled runs automatically pick up any documents missed in previous runs due to transient errors, network interruptions, or temporary service issues. This approach helps maintain data consistency and minimizes the need for manual intervention.
+>
+> For [large data sources](search-how-to-large-index.md), the initial enumeration and indexing can take hours or even days. Running your indexer on a schedule allows that progress continues and errors are retried automatically. Avoid relying solely on manual or on-demand indexer runs, as these do not provide the same reliability or transient error recovery.
+
 <a name="connection-errors"></a>
 
 ## Troubleshoot connections to restricted resources
@@ -291,3 +296,4 @@ If you have [sensitivity labels set on documents](/microsoft-365/compliance/sens
 
 * [Troubleshooting common indexer errors and warnings](cognitive-search-common-errors-warnings.md)
 * [Monitor indexer-based indexing](search-monitor-indexers.md)
+* [Index large data sets](search-how-to-large-index.md)
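
The recurring two-hour schedule described in the `search-how-to-large-index.md` change can be expressed as a schedule fragment in an indexer definition. The Python below is a minimal sketch and is not part of this commit: `build_indexer_schedule` is a hypothetical helper, and the `schedule`, `interval`, and `startTime` property names, along with the ISO 8601 duration `PT2H`, are assumptions based on the Azure AI Search indexer definition. Check them against the current API reference before relying on them.

```python
import json
from datetime import datetime, timezone


def build_indexer_schedule(interval_hours: int, start: datetime) -> dict:
    """Build a schedule fragment for an indexer definition (hypothetical helper).

    Assumes the service accepts an ISO 8601 duration for "interval" and a
    UTC timestamp for "startTime"; both property names are assumptions.
    """
    if not 1 <= interval_hours <= 24:
        raise ValueError("interval_hours must be between 1 and 24")
    return {
        "schedule": {
            # Two hours becomes the ISO 8601 duration "PT2H".
            "interval": f"PT{interval_hours}H",
            # Normalize to UTC and format with a trailing "Z".
            "startTime": start.astimezone(timezone.utc).strftime(
                "%Y-%m-%dT%H:%M:%SZ"
            ),
        }
    }


if __name__ == "__main__":
    fragment = build_indexer_schedule(
        2, datetime(2025, 10, 6, tzinfo=timezone.utc)
    )
    print(json.dumps(fragment, indent=2))
```

Merging a fragment like this into an indexer definition would, under the assumptions above, let each run pick up where the previous one stopped, including during a long blob listing phase.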
