articles/search/search-howto-large-index.md
The following sections explore techniques for indexing large amounts of data using the push API.
## Push API

When pushing data into an index, there are several key considerations that impact indexing speeds for the push API. These factors are outlined in the sections below.

In addition to the information in this article, see the code samples in the [optimizing indexing speeds tutorial](tutorial-optimize-indexing-pushapi.md) to learn more.

### Service tier and number of partitions/replicas
Adding partitions or increasing the tier of your search service will both increase indexing speeds.

Adding replicas may also increase indexing speeds, but this isn't guaranteed. On the other hand, additional replicas will increase the query volume your search service can handle. Replicas are also a key component for getting an [SLA](https://azure.microsoft.com/en-us/support/legal/sla/search/v1_0/).

Before adding partitions or replicas or upgrading to a higher tier, consider the monetary cost and allocation time. Adding partitions can significantly increase indexing speed, but adding or removing them can take anywhere from 15 minutes to several hours. For more information, see the documentation on [adjusting capacity](search-capacity-planning.md).

### Index Schema
In general, we recommend only adding additional properties to fields if you intend to use them.

### Batch Size

One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you're using the [Add Documents REST API](https://docs.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents) or the [Index method](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.documentsoperationsextensions.index?view=azure-dotnet) in the .NET SDK. For either API, you would package 1000 documents in the body of each request.
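
As a rough illustration, a helper like the following (a hypothetical sketch, not part of any SDK) splits a document set into batches that respect both limits:

```python
import json

MAX_DOCS_PER_BATCH = 1000           # Add Documents API limit per request
MAX_BATCH_BYTES = 16 * 1024 * 1024  # 16 MB payload limit

def split_into_batches(documents):
    """Group JSON-serializable documents into batches that stay under
    both the 1000-document and 16 MB per-request limits."""
    batches, current, current_bytes = [], [], 0
    for doc in documents:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        # Start a new batch if adding this document would exceed a limit.
        if current and (len(current) >= MAX_DOCS_PER_BATCH
                        or current_bytes + doc_bytes > MAX_BATCH_BYTES):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(doc)
        current_bytes += doc_bytes
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be sent as the body of one indexing request.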

Using batches to index documents will significantly improve indexing performance. Determining the optimal batch size for your data is a key component of optimizing indexing speeds. The two primary factors influencing the optimal batch size are:

1. The schema of your index
1. The size of your data

Because the optimal batch size depends on your index and your data, the best approach is to test different batch sizes to determine what results in the fastest indexing speeds for your scenario. This [tutorial](tutorial-optimize-indexing-pushapi.md) provides sample code for testing batch sizes using the .NET SDK.
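
One way to run such a test is to time uploads at several candidate batch sizes and compare throughput. The sketch below illustrates the idea; `upload_batch` is a placeholder for whatever function actually sends a batch to your service:

```python
import time

def measure_throughput_mbps(upload_batch, documents, batch_size, payload_mb):
    """Index `documents` in slices of `batch_size` using the supplied
    `upload_batch` function and return throughput in MB/s, where
    `payload_mb` is the total serialized payload size."""
    start = time.perf_counter()
    for i in range(0, len(documents), batch_size):
        upload_batch(documents[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return payload_mb / elapsed
```

You would call this for several candidate sizes (for example 100, 500, and 1000) and keep the size that yields the highest MB/s.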
### Number of threads/workers
To take full advantage of Azure Cognitive Search's indexing speeds, you'll likely need to use multiple threads to send batch indexing requests concurrently to the service.

The optimal number of threads is determined by:

1. The tier of your search service
1. The number of partitions
1. The size of your batches
1. The schema of your index

You can modify the tutorial's sample code and test with different thread counts to determine the optimal thread count for your scenario. However, as long as you have several threads running concurrently, you should be able to take advantage of most of the efficiency gains.

> [!NOTE]
> As you increase the tier of your search service or increase the partitions, you should also increase the number of concurrent threads.
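
As an illustration (a hedged sketch, not the tutorial's actual code), batches can be sent concurrently with a thread pool; `upload_batch` is again a placeholder for your upload function:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def index_concurrently(upload_batch, batches, max_workers=4):
    """Submit every batch to a thread pool so that several indexing
    requests are in flight at once; returns the per-batch results."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_batch, batch) for batch in batches]
        for future in as_completed(futures):
            results.append(future.result())
    return results
```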

As you ramp up the requests hitting the search service, you may encounter [HTTP status codes](http-status-codes.md) indicating the request didn't fully succeed. During indexing, two common HTTP status codes are:

* **503 Service Unavailable** - This error means that the system is under heavy load and your request can't be processed at this time.
* **207 Multi-Status** - This error means that some documents succeeded, but at least one failed.

If a failure happens, requests should be retried using an [exponential backoff retry strategy](https://docs.microsoft.com/dotnet/architecture/microservices/implement-resilient-applications/implement-retries-exponential-backoff).

Azure Cognitive Search's .NET SDK automatically retries 503s and other failed requests, but you'll need to implement your own logic to retry 207s. Open-source tools such as [Polly](https://github.com/App-vNext/Polly) can also be used to implement a retry strategy.
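
A minimal sketch of such a strategy, assuming a caller-supplied `send` function that returns the HTTP status code of the request:

```python
import random
import time

def upload_with_backoff(send, batch, max_attempts=5, base_delay=1.0):
    """Retry a batch upload with exponential backoff plus jitter,
    retrying on 503. A 207 would additionally require inspecting the
    response body and resubmitting only the documents that failed."""
    status = send(batch)
    for attempt in range(1, max_attempts):
        if status != 503:
            return status
        # Wait base_delay, 2*base_delay, 4*base_delay, ... plus jitter.
        time.sleep(base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1))
        status = send(batch)
    return status
```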