Commit 8ddacd9

small updates per acrolinx
1 parent 5136169 commit 8ddacd9

2 files changed (+48 -43 lines changed)

articles/search/search-howto-large-index.md

Lines changed: 17 additions & 11 deletions
@@ -23,18 +23,17 @@ The following sections explore techniques for indexing large amounts of data usi
## Push API

-When pushing data into an index, there's several key considerations that impact indexing speeds for the Push API. These factors are outlined in the section below.
+When pushing data into an index, there are several key considerations that affect indexing speeds for the push API. These factors are outlined in the following sections.

-In addition to the guidance in this article, you can also take advantage of the code samples provided in the tutorial on [optimizing indexing speeds with the push API](tutorial-optimize-indexing-pushapi.md).
+In addition to the information in this article, you can use the code samples in the [optimizing indexing speeds tutorial](tutorial-optimize-indexing-pushapi.md) to learn more.

### Service tier and number of partitions/replicas

Adding partitions or increasing the tier of your search service will both increase indexing speeds.

-Adding additional replicas may also increase indexing speeds but it is not guaranteed. On the other hand, additional replicas will increase the query volume your search service can handle.
+Adding replicas may also increase indexing speeds, but it isn't guaranteed. On the other hand, additional replicas will increase the query volume your search service can handle. Replicas are also a key component for getting an [SLA](https://azure.microsoft.com/en-us/support/legal/sla/search/v1_0/).

-Before adding partition/replicas or upgrading to a higher tier, consider the cost and allocation time. Adding partitions can significantly increase indexing speed but adding/removing them can take anywhere from 15 minutes to several hours.
-For additional guidance, see the documentation on [adjusting capacity](search-capacity-planning.md)
+Before adding partitions/replicas or upgrading to a higher tier, consider the monetary cost and allocation time. Adding partitions can significantly increase indexing speed, but adding or removing them can take anywhere from 15 minutes to several hours. For more information, see the documentation on [adjusting capacity](search-capacity-planning.md).

### Index Schema

@@ -47,24 +46,31 @@ In general, we recommend only adding additional properties to fields if you inte
### Batch Size

-One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you are using the [Add Documents REST API](https://docs.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents) or the [Index method](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.documentsoperationsextensions.index?view=azure-dotnet) in the .NET SDK. For either API, you would package 1000 documents in the body of each request.
+One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you're using the [Add Documents REST API](https://docs.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents) or the [Index method](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.documentsoperationsextensions.index?view=azure-dotnet) in the .NET SDK. For either API, you can package up to 1000 documents in the body of each request.

-Indexing documents in batches will significantly improve indexing performance. Determining the optimal batch size for your data is a key component of optimizing indexing speeds. The two primary factorings influencing the optimal batch size are:
+Using batches to index documents will significantly improve indexing performance. Determining the optimal batch size for your data is a key component of optimizing indexing speeds. The two primary factors influencing the optimal batch size are:
1. The schema of your index
1. The size of your data

-Because, the optimal batch size is dependent on your index and your data, the best approach is to test different batch sizes to determine what results in the fastest indexing speeds in terms of MB/s for your scenario. This [tutorial](tutorial-optimize-indexing-pushapi.md) provides sample code for testing batch sizes using the .NET SDK.
+Because the optimal batch size depends on your index and your data, the best approach is to test different batch sizes to determine what results in the fastest indexing speeds for your scenario. This [tutorial](tutorial-optimize-indexing-pushapi.md) provides sample code for testing batch sizes using the .NET SDK.

### Number of threads/workers

To take full advantage of Azure Cognitive Search's indexing speeds, you'll likely need to use multiple threads to send batch indexing requests concurrently to the service.

-The optimal number of threads is determined by the tier of your search service, the number of batches, the size of your batches, and the schema of your index. You can modify this sample and test with different thread counts to determine the optimal thread count for your scenario. However, as long as you have several threads running concurrently, you should be able to take advantage of most of the efficiency gains.
+The optimal number of threads is determined by:
+
+1. The tier of your search service
+1. The number of partitions
+1. The size of your batches
+1. The schema of your index
+
+You can modify the sample code in the [tutorial](tutorial-optimize-indexing-pushapi.md) and test with different thread counts to determine the optimal thread count for your scenario. However, as long as you have several threads running concurrently, you should be able to take advantage of most of the efficiency gains.

> [!NOTE]
> As you increase the tier of your search service or increase the partitions, you should also increase the number of concurrent threads.
-As you ramp up the requests hitting the search service, you may encounter [HTTP status codes](http-status-codes.md) indicating the request did not fully succeed. During indexing, two common HTTP status codes are:
+As you ramp up the requests hitting the search service, you may encounter [HTTP status codes](http-status-codes.md) indicating the request didn't fully succeed. During indexing, two common HTTP status codes are:

* **503 Service Unavailable** - This error means that the system is under heavy load and your request can't be processed at this time.
* **207 Multi-Status** - This error means that some documents succeeded, but at least one failed.
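The batch-size and thread-count guidance above can be illustrated with a short Python sketch. It is not the .NET SDK or the tutorial's code. `make_batches`, `index_concurrently`, and the `send_batch` callable are hypothetical names. The request body is assumed to be a JSON object of the form `{"value": [...]}`, and 16 MB is taken to mean 16 * 1024 * 1024 bytes. The sketch groups documents so that each request stays within both the 1000-document limit and the payload limit, then submits the batches from a thread pool.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Limits quoted in the article; 16 MB is assumed to mean 16 * 1024 * 1024 bytes.
MAX_DOCS_PER_BATCH = 1000
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024

# Size of an empty '{"value": []}' body (assumed request shape);
# json.dumps puts ", " between list items by default.
ENVELOPE_BYTES = len(json.dumps({"value": []}))
SEPARATOR_BYTES = 2


def payload_size(docs: List[dict]) -> int:
    """Byte length of the assumed {"value": [...]} request body for these docs."""
    return len(json.dumps({"value": docs}).encode("utf-8"))


def make_batches(docs: Iterable[dict], batch_size: int = MAX_DOCS_PER_BATCH) -> Iterator[List[dict]]:
    """Yield batches of at most batch_size docs whose payload stays under MAX_PAYLOAD_BYTES."""
    if not 1 <= batch_size <= MAX_DOCS_PER_BATCH:
        raise ValueError("batch_size must be between 1 and 1000")
    batch: List[dict] = []
    size = ENVELOPE_BYTES
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if ENVELOPE_BYTES + doc_bytes > MAX_PAYLOAD_BYTES:
            raise ValueError("a single document exceeds the payload limit")
        extra = doc_bytes + (SEPARATOR_BYTES if batch else 0)
        if batch and (len(batch) == batch_size or size + extra > MAX_PAYLOAD_BYTES):
            yield batch  # current batch is full; start a new one with this doc
            batch, size, extra = [], ENVELOPE_BYTES, doc_bytes
        batch.append(doc)
        size += extra
    if batch:
        yield batch


def index_concurrently(
    batches: Iterable[List[dict]],
    send_batch: Callable[[List[dict]], T],
    max_workers: int = 8,
) -> List[T]:
    """Send batches from a thread pool and return the results in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send_batch, batch) for batch in batches]
        # result() re-raises any exception raised inside a worker thread.
        return [future.result() for future in futures]
```

A call such as `index_concurrently(make_batches(docs, batch_size=500), send_batch, max_workers=8)` lets you vary the batch size and the worker count independently. That is the kind of experiment the article recommends for finding the fastest settings. This sketch submits every batch up front, so for very large data sets you would want to bound the number of in-flight batches.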
@@ -73,7 +79,7 @@ As you ramp up the requests hitting the search service, you may encounter [HTTP

If a failure happens, requests should be retried using an [exponential backoff retry strategy](https://docs.microsoft.com/dotnet/architecture/microservices/implement-resilient-applications/implement-retries-exponential-backoff).

-Azure Cognitive Search's .NET SDK automatically retries 503s and other failed requests but you'll need to implement your own logic to retry 207s. Open-source tools such as [Polly](https://github.com/App-vNext/Polly) can also be used to implement a retry strategy. In this sample, we implement our own exponential backoff retry strategy.
+Azure Cognitive Search's .NET SDK automatically retries 503s and other failed requests, but you'll need to implement your own logic to retry 207s. You can also use open-source tools such as [Polly](https://github.com/App-vNext/Polly) to implement a retry strategy.

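The retry guidance above can be sketched the same way. This is a minimal Python illustration, not the .NET SDK's built-in retry policy or Polly. `index_with_backoff` and the `send` callable are hypothetical, and `send` is assumed to return the HTTP status code together with the documents that failed and are worth retrying. On a 503 it resends the whole batch. On a 207 it resends only the failed documents. Each wait roughly doubles, with random jitter, up to a cap.

```python
import random
import time
from typing import Callable, List, Tuple

# Hypothetical transport signature (not an Azure SDK API): it sends one batch and
# returns (HTTP status code, documents that failed and are worth retrying).
SendFn = Callable[[List[dict]], Tuple[int, List[dict]]]


def index_with_backoff(
    batch: List[dict],
    send: SendFn,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send a batch, retrying 503s and 207s with exponential backoff; return attempts used."""
    pending = list(batch)
    for attempt in range(max_attempts):
        status, failed = send(pending)
        if status == 200:
            return attempt + 1
        if status == 207:
            pending = list(failed)  # resend only the documents that failed
            if not pending:
                return attempt + 1
        elif status != 503:
            raise RuntimeError(f"non-retryable status {status}")
        # A 503 keeps the same pending documents for the next attempt.
        if attempt < max_attempts - 1:
            # The delay bound doubles each attempt (1s, 2s, 4s, ...), capped at max_delay;
            # random "full jitter" keeps concurrent workers from retrying in lockstep.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise RuntimeError(f"{len(pending)} documents still failing after {max_attempts} attempts")
```

Passing `sleep` as a parameter lets tests run without waiting. A real 207 response reports a status for each document, and some failures, such as a malformed document, won't succeed on retry. A production client would therefore inspect each document's status before resending it.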
### Network data transfer speeds
