Commit b6d8520

Merge pull request #89564 from HeidiSteen/heidist-master
Azure Search: How to index large data sets (round 3)
2 parents f6513c5 + 32b766b commit b6d8520

1 file changed, +3 -3 lines changed

articles/search/search-howto-large-index.md

Lines changed: 3 additions & 3 deletions
@@ -21,9 +21,9 @@ The following sections explore three techniques for indexing large amounts of da

 ## Option 1: Pass multiple documents

-One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you are using the [Add Documents (REST)](https://docs.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents) or [Index class](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.models.index?view=azure-dotnet) in the .NET SDK. For either API, you would package 1000 documents in the body of each request.
+One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you are using the [Add Documents REST API](https://docs.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents) or the [Index method](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.documentsoperationsextensions.index?view=azure-dotnet) in the .NET SDK. For either API, you would package 1000 documents in the body of each request.

-Batch indexing is implemented for individual requests using REST or .NET, or through indexers. A few indexers operate under different limits. Specifically, Azure Blob indexing sets batch size at 10 documents in recognition of the larger average document size. For indexers based on the [Create Indexer (REST)](https://docs.microsoft.com/rest/api/searchservice/Create-Indexer ), you can set the `BatchSize` argument to customize this setting to better match the characteristics of your data.
+Batch indexing is implemented for individual requests using REST or .NET, or through indexers. A few indexers operate under different limits. Specifically, Azure Blob indexing sets batch size at 10 documents in recognition of the larger average document size. For indexers based on the [Create Indexer REST API](https://docs.microsoft.com/rest/api/searchservice/Create-Indexer), you can set the `BatchSize` argument to customize this setting to better match the characteristics of your data.

 > [!NOTE]
 > To keep document size down, avoid adding non-queryable data to an index. Images and other binary data are not directly searchable and shouldn't be stored in the index. To integrate non-queryable data into search results, you should define a non-searchable field that stores a URL reference to the resource.
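
The batching limits described under Option 1 (up to 1000 documents and under 16 MB per request) can be sketched in Python as follows. This is only an illustration under stated assumptions: `make_batches` is a hypothetical helper, not part of any SDK; the payload size is estimated from a local JSON serialization rather than measured by the service; and sending each body to the Add Documents endpoint is left out.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

MAX_DOCS_PER_BATCH = 1000             # per-request document limit cited in the article
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024  # per-request payload limit cited in the article


def make_batches(
    docs: Iterable[Dict[str, Any]],
    max_docs: int = MAX_DOCS_PER_BATCH,
    max_bytes: int = MAX_PAYLOAD_BYTES,
) -> Iterator[Dict[str, List[Dict[str, Any]]]]:
    """Yield request bodies that respect both limits (hypothetical helper)."""
    # Size of the empty {"value": []} envelope that wraps every batch.
    envelope = len(json.dumps({"value": []}).encode("utf-8"))
    batch: List[Dict[str, Any]] = []
    size = envelope
    for doc in docs:
        # Each entry carries the action to perform on that document.
        action = {"@search.action": "upload", **doc}
        # +2 approximates the ", " separator json.dumps writes between items.
        action_size = len(json.dumps(action).encode("utf-8")) + 2
        # Flush before adding a document that would break either limit.
        # A single document larger than max_bytes is still emitted alone.
        if batch and (len(batch) >= max_docs or size + action_size >= max_bytes):
            yield {"value": batch}
            batch, size = [], envelope
        batch.append(action)
        size += action_size
    if batch:
        yield {"value": batch}
```

Each yielded dictionary is meant to be serialized as the body of one indexing request. Flushing on whichever limit is reached first keeps batches of large documents under the size cap without wasting capacity on small ones.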
@@ -40,7 +40,7 @@ Increasing replicas and partitions are billable events that increase your cost,

 + Schedulers allow you to parcel out indexing at regular intervals so that you can spread it out over time.
 + Scheduled indexing can resume at the last known stopping point. If a data source is not fully crawled within a 24-hour window, the indexer will resume indexing on day two at wherever it left off.
-+ Partitioning data into smaller individual data sources enables parallel processing. You can break a large data set into smaller data sets on your source data platform (such as Azure Blob storage or Azure SQL Database), and then create multiple [data source objects](https://docs.microsoft.com/rest/api/searchservice/create-data-source) on Azure Search that can be indexed in parallel.
++ Partitioning data into smaller individual data sources enables parallel processing. You can break up source data into smaller components, such as into multiple containers in Azure Blob storage, and then create corresponding, multiple [data source objects](https://docs.microsoft.com/rest/api/searchservice/create-data-source) in Azure Search that can be indexed in parallel.

 > [!NOTE]
 > Indexers are data-source-specific, so using an indexer approach is only viable for selected data sources on Azure: [SQL Database](search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md), [Blob storage](search-howto-indexing-azure-blob-storage.md), [Table storage](search-howto-indexing-azure-tables.md), [Cosmos DB](search-howto-index-cosmosdb.md).
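
The partitioning approach in the revised bullet (one data source object per Blob container, each paired with its own scheduled indexer) might look roughly like the sketch below. It assumes the REST request bodies take the shape shown, with the batch size carried as `batchSize` under `parameters`. The helper `build_parallel_definitions`, the `-ds` and `-indexer` naming scheme, and the two-hour interval are illustrative assumptions, and submitting the definitions to the service is omitted.

```python
from typing import Any, Dict, List, Tuple

Definition = Tuple[Dict[str, Any], Dict[str, Any]]


def build_parallel_definitions(
    containers: List[str],
    index_name: str,
    connection_string: str,
    batch_size: int = 10,      # the Blob indexer batch size mentioned in the article
    interval: str = "PT2H",    # assumed schedule, ISO 8601 duration
) -> List[Definition]:
    """Build one (data source, indexer) pair per container (hypothetical helper)."""
    definitions: List[Definition] = []
    for container in containers:
        data_source = {
            "name": f"{container}-ds",          # naming scheme is an assumption
            "type": "azureblob",
            "credentials": {"connectionString": connection_string},
            "container": {"name": container},
        }
        indexer = {
            "name": f"{container}-indexer",     # naming scheme is an assumption
            "dataSourceName": data_source["name"],
            "targetIndexName": index_name,      # every indexer feeds the same index
            "schedule": {"interval": interval},
            "parameters": {"batchSize": batch_size},
        }
        definitions.append((data_source, indexer))
    return definitions
```

Because each indexer reads from its own data source but writes to the same target index, the pairs can run side by side, subject to the capacity of the service.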
