articles/search/search-howto-large-index.md
Lines changed: 6 additions & 6 deletions
@@ -13,17 +13,17 @@ ms.author: heidist
13
13
---
14
14
# How to index large data sets in Azure Search
15
15
16
-
As data volumes grow or processing needs change, you might find that simple or default indexing strategies are no longer productive. For Azure Search, there are several approaches for accommodating larger data sets, ranging from how you structure a data upload request, to using a source-specific indexer for scheduled and distributed workloads.
16
+
As data volumes grow or processing needs change, you might find that simple or default indexing strategies are no longer practical. For Azure Search, there are several approaches for accommodating larger data sets, ranging from how you structure a data upload request, to using a source-specific indexer for scheduled and distributed workloads.
17
17
18
-
The same techniques for large data also apply to long-running processes. In particular, the steps outlined in [parallel indexing](#parallel-indexing) are helpful for computationally intensive indexing, such as image analysis or natural language processing in [cognitive search pipelines](cognitive-search-concept-intro.md).
18
+
The same techniques also apply to long-running processes. In particular, the steps outlined in [parallel indexing](#parallel-indexing) are helpful for computationally intensive indexing, such as image analysis or natural language processing in [cognitive search pipelines](cognitive-search-concept-intro.md).
19
19
20
20
The following sections explore three techniques for indexing large amounts of data.
21
21
22
22
## Option 1: Pass multiple documents
23
23
24
-
One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you are using the [REST API](https://docs.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents) or [IndexBatch](https://docs.microsoft.com/otnet/api/microsoft.azure.search.models.indexbatch?view=azure-dotnet) in the .NET SDK. For either API, you would package 1000 documents in the body of each request.
24
+
One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you are using [Add Documents (REST)](https://docs.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents) or the [Index class](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.models.index?view=azure-dotnet) in the .NET SDK. For either API, you would package up to 1000 documents in the body of each request.
25
25
26
-
Batch indexing is implemented for individual requests using REST or .NET, or through indexers. A few indexers operate under different limits. Specifically, Azure Blob indexing sets batch size at 10 documents in recognition of the larger average document size. For indexers based on the [Create Indexer REST API](https://docs.microsoft.com/rest/api/searchservice/Create-Indexer), you can set the `BatchSize` argument to customize this setting to better match the characteristics of your data.
26
+
Batch indexing is implemented for individual requests using REST or .NET, or through indexers. A few indexers operate under different limits. Specifically, Azure Blob indexing sets the batch size at 10 documents in recognition of the larger average document size. For indexers based on the [Create Indexer (REST)](https://docs.microsoft.com/rest/api/searchservice/Create-Indexer), you can set the `batchSize` parameter to customize this setting to better match the characteristics of your data.
27
27
28
28
> [!NOTE]
29
29
> To keep document size down, avoid adding non-queryable data to an index. Images and other binary data are not directly searchable and shouldn't be stored in the index. To integrate non-queryable data into search results, you should define a non-searchable field that stores a URL reference to the resource.
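
The batching limits described under Option 1 can be sketched in a few lines of Python. This is a minimal illustration, not the service's or the SDK's implementation: the helper name `make_batches` is hypothetical, 16 MB is assumed to mean 16 × 1024 × 1024 bytes, and the request body shape (`value` array with an `@search.action` per document) follows the Add Documents REST convention. The size check slightly overestimates each payload so that a batch stays under the limit.

```python
import json

MAX_DOCS_PER_BATCH = 1000            # per-request document limit from the text
MAX_BATCH_BYTES = 16 * 1024 * 1024   # assumption: "16 MB" read as 16 MiB


def make_batches(docs, max_docs=MAX_DOCS_PER_BATCH, max_bytes=MAX_BATCH_BYTES):
    """Yield request bodies ({"value": [...]}) that respect both limits.

    Hypothetical helper for illustration; it only builds payloads and
    does not send anything to a search service.
    """
    # Bytes used by the empty wrapper: {"value": []}
    envelope = len(json.dumps({"value": []}).encode("utf-8"))
    batch, size = [], envelope
    for doc in docs:
        action = {"@search.action": "upload", **doc}
        # +2 covers the ", " separator that json.dumps puts between items.
        action_bytes = len(json.dumps(action).encode("utf-8")) + 2
        if envelope + action_bytes >= max_bytes:
            raise ValueError("a single document exceeds the payload limit")
        # Start a new batch when either limit would be reached.
        if batch and (len(batch) >= max_docs or size + action_bytes >= max_bytes):
            yield {"value": batch}
            batch, size = [], envelope
        batch.append(action)
        size += action_bytes
    if batch:
        yield {"value": batch}
```

Each yielded dictionary could then be serialized and posted as the body of one indexing request. For indexers, the equivalent knob is the batch size setting mentioned above rather than client-side chunking.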
@@ -36,11 +36,11 @@ Increasing replicas and partitions are billable events that increase your cost,
36
36
37
37
## Option 3: Use indexers
38
38
39
-
[Indexers](search-indexer-overview.md) are used to crawl external data sources on supported Azure data platforms for searchable content. While not specifically intended for large-scale indexing, several indexer capabilities are particularly useful for accommodating larger data sets:
39
+
[Indexers](search-indexer-overview.md) are used to crawl supported Azure data sources for searchable content. While not specifically intended for large-scale indexing, several indexer capabilities are particularly useful for accommodating larger data sets:
40
40
41
41
+ Schedulers allow you to parcel out indexing at regular intervals so that you can spread it out over time.
42
42
+ Scheduled indexing can resume at the last known stopping point. If a data source is not fully crawled within a 24-hour window, the indexer will resume indexing on day two where it left off.
43
-
+ Partitioning data into smaller individual data sources enables parallel processing. You can break a large data set into smaller data sets, and then create multiple indexer data source definitions that can be indexed in parallel.
43
+
+ Partitioning data into smaller individual data sources enables parallel processing. You can break a large data set into smaller data sets on your source data platform (such as Azure Blob storage or Azure SQL Database), and then create multiple [data source objects](https://docs.microsoft.com/rest/api/searchservice/create-data-source) in Azure Search that can be indexed in parallel.
44
44
45
45
> [!NOTE]
46
46
> Indexers are data-source-specific, so using an indexer approach is only viable for selected data sources on Azure: [SQL Database](search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md), [Blob storage](search-howto-indexing-azure-blob-storage.md), [Table storage](search-howto-indexing-azure-tables.md), [Cosmos DB](search-howto-index-cosmosdb.md).
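
The partitioning idea under Option 3 can be sketched as follows. This is an illustration under stated assumptions, not a prescribed setup: the function name `build_parallel_definitions`, the naming scheme, and the example values are hypothetical; the data is assumed to live in one blob container split by virtual folder; and the JSON property names (`type`, `credentials`, `container`, `dataSourceName`, `targetIndexName`, `schedule`, `parameters`) are taken to match the Create Data Source and Create Indexer REST bodies. The code only builds definitions and makes no service calls.

```python
def build_parallel_definitions(index_name, container, folders,
                               connection_string, interval="PT2H",
                               batch_size=10):
    """Return one (data source, indexer) definition pair per folder.

    Hypothetical helper: each pair targets the same index, so the
    indexers can run side by side on their own schedules.
    """
    definitions = []
    for folder in folders:
        # Derive a name suffix from the folder, e.g. "2017/" -> "2017".
        suffix = folder.strip("/").replace("/", "-").lower()
        data_source = {
            "name": f"{index_name}-ds-{suffix}",
            "type": "azureblob",
            "credentials": {"connectionString": connection_string},
            # Assumption: "query" narrows the container to one virtual folder.
            "container": {"name": container, "query": folder},
        }
        indexer = {
            "name": f"{index_name}-ixr-{suffix}",
            "dataSourceName": data_source["name"],
            "targetIndexName": index_name,
            # ISO 8601 duration: run this indexer every two hours.
            "schedule": {"interval": interval},
            "parameters": {"batchSize": batch_size},
        }
        definitions.append((data_source, indexer))
    return definitions
```

For example, `build_parallel_definitions("hotels", "docs", ["2017/", "2018/"], "<connection-string>")` returns two pairs, each of which could be submitted as the bodies of a data source request and an indexer request. How many indexers actually run concurrently depends on the capacity of the service.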