Commit 469e01a

Merge pull request #89540 from HeidiSteen/heidist-master
Azure Search: incorporate feedback on "how to index large data sets"
2 parents 98690c7 + aaf0076 commit 469e01a

1 file changed

+6
-6
lines changed

articles/search/search-howto-large-index.md

Lines changed: 6 additions & 6 deletions
@@ -13,17 +13,17 @@ ms.author: heidist
1313
---
1414
# How to index large data sets in Azure Search
1515

16-
As data volumes grow or processing needs change, you might find that simple or default indexing strategies are no longer productive. For Azure Search, there are several approaches for accommodating larger data sets, ranging from how you structure a data upload request, to using a source-specific indexer for scheduled and distributed workloads.
16+
As data volumes grow or processing needs change, you might find that simple or default indexing strategies are no longer practical. For Azure Search, there are several approaches for accommodating larger data sets, ranging from how you structure a data upload request, to using a source-specific indexer for scheduled and distributed workloads.
1717

18-
The same techniques for large data also apply to long-running processes. In particular, the steps outlined in [parallel indexing](#parallel-indexing) are helpful for computationally intensive indexing, such as image analysis or natural language processing in [cognitive search pipelines](cognitive-search-concept-intro.md).
18+
The same techniques also apply to long-running processes. In particular, the steps outlined in [parallel indexing](#parallel-indexing) are helpful for computationally intensive indexing, such as image analysis or natural language processing in [cognitive search pipelines](cognitive-search-concept-intro.md).
1919

2020
The following sections explore three techniques for indexing large amounts of data.
2121

2222
## Option 1: Pass multiple documents
2323

24-
One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you are using the [REST API](https://docs.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents) or [IndexBatch](https://docs.microsoft.com/otnet/api/microsoft.azure.search.models.indexbatch?view=azure-dotnet) in the .NET SDK. For either API, you would package 1000 documents in the body of each request.
24+
One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you are using the [Add Documents (REST)](https://docs.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents) or [Index class](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.models.index?view=azure-dotnet) in the .NET SDK. For either API, you would package 1000 documents in the body of each request.
2525

26-
Batch indexing is implemented for individual requests using REST or .NET, or through indexers. A few indexers operate under different limits. Specifically, Azure Blob indexing sets batch size at 10 documents in recognition of the larger average document size. For indexers based on the [Create Indexer REST API](https://docs.microsoft.com/rest/api/searchservice/Create-Indexer ), you can set the `BatchSize` argument to customize this setting to better match the characteristics of your data.
26+
Batch indexing is implemented for individual requests using REST or .NET, or through indexers. A few indexers operate under different limits. Specifically, Azure Blob indexing sets batch size at 10 documents in recognition of the larger average document size. For indexers based on the [Create Indexer (REST)](https://docs.microsoft.com/rest/api/searchservice/Create-Indexer ), you can set the `BatchSize` argument to customize this setting to better match the characteristics of your data.
2727

2828
> [!NOTE]
2929
> To keep document size down, avoid adding non-queryable data to an index. Images and other binary data are not directly searchable and shouldn't be stored in the index. To integrate non-queryable data into search results, you should define a non-searchable field that stores a URL reference to the resource.
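
Editor's illustration (not part of the commit): the batching limits described under Option 1 can be sketched in Python as follows. The function name `make_batches`, the choice of the `upload` action, and the byte-size estimate are assumptions made for this example. The `value` array with an `@search.action` property per document follows the request body shape of the Add Documents REST API; the 1000-document and 16 MB limits are the ones stated in the article.

```python
import json
from typing import Dict, Iterable, Iterator, List

MAX_DOCS_PER_BATCH = 1000              # per-request document limit from the article
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024   # 16 MB payload limit from the article


def make_batches(
    docs: Iterable[Dict],
    max_docs: int = MAX_DOCS_PER_BATCH,
    max_bytes: int = MAX_PAYLOAD_BYTES,
) -> Iterator[Dict]:
    """Yield request bodies that stay within the document and size limits.

    The size is an estimate based on json.dumps with default separators.
    A single document larger than max_bytes is still yielded on its own,
    so the caller has to handle that case separately.
    """
    envelope = len(json.dumps({"value": []}).encode("utf-8"))
    batch: List[Dict] = []
    size = envelope
    for doc in docs:
        action = {"@search.action": "upload", **doc}
        # +2 accounts for the ", " separator between array items.
        encoded = len(json.dumps(action).encode("utf-8")) + 2
        if batch and (len(batch) >= max_docs or size + encoded >= max_bytes):
            yield {"value": batch}
            batch, size = [], envelope
        batch.append(action)
        size += encoded
    if batch:
        yield {"value": batch}
```

Each yielded dictionary would then be serialized and sent as the body of one indexing request. Sending the request is left out here because the endpoint, API version, and key handling depend on the service configuration.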
@@ -36,11 +36,11 @@ Increasing replicas and partitions are billable events that increase your cost,
3636

3737
## Option 3: Use indexers
3838

39-
[Indexers](search-indexer-overview.md) are used to crawl external data sources on supported Azure data platforms for searchable content. While not specifically intended for large-scale indexing, several indexer capabilities are particularly useful for accommodating larger data sets:
39+
[Indexers](search-indexer-overview.md) are used to crawl supported Azure data sources for searchable content. While not specifically intended for large-scale indexing, several indexer capabilities are particularly useful for accommodating larger data sets:
4040

4141
+ Schedulers allow you to parcel out indexing at regular intervals so that you can spread it out over time.
4242
+ Scheduled indexing can resume at the last known stopping point. If a data source is not fully crawled within a 24-hour window, the indexer will resume indexing on day two at wherever it left off.
43-
+ Partitioning data into smaller individual data sources enables parallel processing. You can break a large data set into smaller data sets, and then create multiple indexer data source definitions that can be indexed in parallel.
43+
+ Partitioning data into smaller individual data sources enables parallel processing. You can break a large data set into smaller data sets on your source data platform (such as Azure Blob storage or Azure SQL Database), and then create multiple [data source objects](https://docs.microsoft.com/rest/api/searchservice/create-data-source) on Azure Search that can be indexed in parallel.
4444

4545
> [!NOTE]
4646
> Indexers are data-source-specific, so using an indexer approach is only viable for selected data sources on Azure: [SQL Database](search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md), [Blob storage](search-howto-indexing-azure-blob-storage.md), [Table storage](search-howto-indexing-azure-tables.md), [Cosmos DB](search-howto-index-cosmosdb.md).
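
Editor's illustration (not part of the commit): the partitioning approach in the last bullet under Option 3 can be sketched as building one data source definition and one indexer definition per slice of the source data. This sketch assumes Azure Blob storage partitioned by virtual folder. The function name, the naming scheme, and the two-hour schedule interval are hypothetical. The property names (`type`, `credentials`, `container`, `dataSourceName`, `targetIndexName`, `schedule`) reflect the Create Data Source and Create Indexer REST request bodies as I understand them and should be checked against the API version in use.

```python
from typing import Dict, List, Tuple


def build_partitioned_definitions(
    index_name: str,
    container: str,
    folders: List[str],
    connection_string: str,
    interval: str = "PT2H",  # hypothetical schedule: every two hours
) -> List[Tuple[Dict, Dict]]:
    """Return one (data source, indexer) definition pair per blob folder."""
    pairs = []
    for folder in folders:
        # Derive a name suffix from the folder path (hypothetical scheme).
        suffix = folder.strip("/").replace("/", "-").lower()
        data_source = {
            "name": f"{index_name}-ds-{suffix}",
            "type": "azureblob",
            "credentials": {"connectionString": connection_string},
            # "query" limits this data source to one virtual folder.
            "container": {"name": container, "query": folder},
        }
        indexer = {
            "name": f"{index_name}-indexer-{suffix}",
            "dataSourceName": data_source["name"],
            "targetIndexName": index_name,  # all indexers feed the same index
            "schedule": {"interval": interval},
        }
        pairs.append((data_source, indexer))
    return pairs
```

For example, `build_partitioned_definitions("hotels", "docs", ["2018/", "2019/"], "<connection-string>")` produces two pairs whose indexers write to the same `hotels` index and can run in parallel, subject to the service's capacity.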
