articles/search/search-howto-large-index.md (3 additions & 3 deletions)
@@ -13,7 +13,7 @@ ms.date: 5/8/2020

 # How to index large data sets in Azure Cognitive Search

-Azure Cognitive Search supports [two basic approaches](https://docs.microsoft.com/en-us/azure/search/search-what-is-data-import) for importing data into a search index: *pushing* your data into the index programmatically, or pointing an [Azure Cognitive Search indexer](https://docs.microsoft.com/en-us/azure/search/search-indexer-overview) at a supported data source to *pull* in the data.
+Azure Cognitive Search supports [two basic approaches](search-what-is-data-import.md) for importing data into a search index: *pushing* your data into the index programmatically, or pointing an [Azure Cognitive Search indexer](search-indexer-overview.md) at a supported data source to *pull* in the data.

 As data volumes grow or processing needs change, you might find that simple or default indexing strategies are no longer practical. For Azure Cognitive Search, there are several approaches for accommodating larger data sets, ranging from how you structure a data upload request, to using a source-specific indexer for scheduled and distributed workloads.
@@ -64,14 +64,14 @@ The optimal number of threads is determined by the tier of your search service,
 > [!NOTE]
 > As you increase the tier of your search service or increase the partitions, you should also increase the number of concurrent threads.

-As you ramp up the requests hitting the search service, you may encounter [HTTP status codes](https://docs.microsoft.com/en-us/rest/api/searchservice/http-status-codes) indicating the request did not fully succeed. During indexing, two common HTTP status codes are:
+As you ramp up the requests hitting the search service, you may encounter [HTTP status codes](http-status-codes.md) indicating the request did not fully succeed. During indexing, two common HTTP status codes are:

 * **503 Service Unavailable** - This error means that the system is under heavy load and your request can't be processed at this time.
 * **207 Multi-Status** - This error means that some documents succeeded, but at least one failed.

 ### Retry strategy

-If a failure happens, requests should be retried using an [exponential backoff retry strategy](https://docs.microsoft.com/en-us/dotnet/architecture/microservices/implement-resilient-applications/implement-retries-exponential-backoff).
+If a failure happens, requests should be retried using an [exponential backoff retry strategy](https://docs.microsoft.com/dotnet/architecture/microservices/implement-resilient-applications/implement-retries-exponential-backoff).

 Azure Cognitive Search's .NET SDK automatically retries 503s and other failed requests but you'll need to implement your own logic to retry 207s. Open-source tools such as [Polly](https://github.com/App-vNext/Polly) can also be used to implement a retry strategy. In this sample, we implement our own exponential backoff retry strategy.
articles/search/tutorial-optimize-indexing-pushapi.md (12 additions & 12 deletions)
@@ -1,5 +1,5 @@
 ---
-title: 'C# Tutorial: Optimize indexing with the push API'
+title: 'C# tutorial optimize indexing with the push API'
 titleSuffix: Azure Cognitive Search
 description: Learn how to efficiently index data using Azure Cognitive Search's push API. This tutorial and sample code are in C#.
@@ -13,9 +13,9 @@ ms.date: 05/08/2020

 # Tutorial: Optimize indexing with the push API

-Azure Cognitive Search supports [two basic approaches](https://docs.microsoft.com/en-us/azure/search/search-what-is-data-import) for importing data into a search index: *pushing* your data into the index programmatically, or pointing an [Azure Cognitive Search indexer](https://docs.microsoft.com/en-us/azure/search/search-indexer-overview) at a supported data source to *pull* in the data.
+Azure Cognitive Search supports [two basic approaches](search-what-is-data-import.md) for importing data into a search index: *pushing* your data into the index programmatically, or pointing an [Azure Cognitive Search indexer](search-indexer-overview.md) at a supported data source to *pull* in the data.

-This tutorial describes how to efficiently index data using the [push model](https://docs.microsoft.com/en-us/azure/search/search-what-is-data-import#pushing-data-to-an-index). A .NET Core C# console application has been created so you can [download and run the application](https://github.com/Azure-Samples/azure-search-dotnet-samples/tree/master/optimize-data-indexing). This article explains the key aspects of the application as well as factors to consider when indexing data.
+This tutorial describes how to efficiently index data using the [push model](search-what-is-data-import.md#pushing-data-to-an-index) by batching requests and leveraging an exponential backoff retry strategy. You can [download and run the application](https://github.com/Azure-Samples/azure-search-dotnet-samples/tree/master/optimize-data-indexing). This article explains the key aspects of the application as well as factors to consider when indexing data.

 This tutorial uses C# and the [.NET SDK](https://aka.ms/search-sdk) to perform the following tasks:
@@ -30,7 +30,7 @@ If you don't have an Azure subscription, create a [free account](https://azure.m

 ## Prerequisites

-The following services and tools are required for this quickstart.
+The following services and tools are required for this tutorial.

 + [Visual Studio](https://visualstudio.microsoft.com/downloads/), any edition. Sample code and instructions were tested on the free Community edition.
@@ -88,7 +88,7 @@ API calls require the service URL and an access key. A search service is created

 Once you update *appsettings.json*, the sample program in **OptimizeDataIndexing.sln** should be ready to build and run.

-This code is derived from the [C# Quickstart](https://docs.microsoft.com/en-us/azure/search/search-get-started-dotnet) and you can find more detailed information on creating indexes and the basics of working with the .NET SDK in that article.
+This code is derived from the [C# Quickstart](search-get-started-dotnet.md) and you can find more detailed information on creating indexes and the basics of working with the .NET SDK in that article.

 This simple C#/.NET console app performs the following tasks:
@@ -167,7 +167,7 @@ Determining the optimal batch size for your data is a key component of optimizin
 1. The schema of your index
 1. The size of your data

-Because, the optimal batch size is dependent on your index and your data, the best approach is to test different batch sizes to determine what results in the fastest indexing speeds in terms of MB/s for your scenario.
+Because the optimal batch size is dependent on your index and your data, the best approach is to test different batch sizes to determine what results in the fastest indexing speeds in terms of MB/s for your scenario.

 The following function demonstrates a simple approach to testing batch sizes.
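
The function that the last context line refers to falls outside this hunk. As a rough illustration of the idea only, the sketch below times one upload per candidate batch size and reports throughput in MB/s. It is not the sample's actual code: `BatchSizeProbe`, `uploadBatch`, and `estimateSizeInMb` are hypothetical names, and the upload delegate stands in for whatever SDK call the application makes.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class BatchSizeProbe
{
    // uploadBatch (hypothetical) stands in for the real indexing call.
    // estimateSizeInMb (hypothetical) returns the payload size of a batch in MB.
    public static async Task<Dictionary<int, double>> MeasureAsync<T>(
        IReadOnlyList<T> documents,
        IEnumerable<int> batchSizes,
        Func<IReadOnlyList<T>, Task> uploadBatch,
        Func<IReadOnlyList<T>, double> estimateSizeInMb)
    {
        var throughput = new Dictionary<int, double>();

        foreach (int size in batchSizes)
        {
            // Take the first `size` documents as the test batch.
            List<T> batch = documents.Take(size).ToList();

            var timer = Stopwatch.StartNew();
            await uploadBatch(batch);
            timer.Stop();

            // Guard against a zero elapsed time on very fast (or mocked) uploads.
            double seconds = Math.Max(timer.Elapsed.TotalSeconds, 1e-9);
            throughput[size] = estimateSizeInMb(batch) / seconds;
        }

        return throughput; // batch size -> MB/s
    }
}
```

A single timing per batch size is noisy, so in practice each size would probably be run several times and averaged before picking the fastest one.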
@@ -256,14 +256,14 @@ The optimal number of threads is determined by the tier of your search service,
 > [!NOTE]
 > As you increase the tier of your search service or increase the partitions, you should also increase the number of concurrent threads.

-As you ramp up the requests hitting the search service, you may encounter [HTTP status codes](https://docs.microsoft.com/en-us/rest/api/searchservice/http-status-codes) indicating the request did not fully succeed. During indexing, two common HTTP status codes are:
+As you ramp up the requests hitting the search service, you may encounter [HTTP status codes](http-status-codes.md) indicating the request did not fully succeed. During indexing, two common HTTP status codes are:

 * **503 Service Unavailable** - This error means that the system is under heavy load and your request can't be processed at this time.
 * **207 Multi-Status** - This error means that some documents succeeded, but at least one failed.

 ### Implement an exponential backoff retry strategy

-If a failure happens, requests should be retried using an [exponential backoff retry strategy](https://docs.microsoft.com/en-us/dotnet/architecture/microservices/implement-resilient-applications/implement-retries-exponential-backoff).
+If a failure happens, requests should be retried using an [exponential backoff retry strategy](https://docs.microsoft.com/dotnet/architecture/microservices/implement-resilient-applications/implement-retries-exponential-backoff).

 Azure Cognitive Search's .NET SDK automatically retries 503s and other failed requests but you'll need to implement your own logic to retry 207s. Open-source tools such as [Polly](https://github.com/App-vNext/Polly) can also be used to implement a retry strategy. In this sample, we implement our own exponential backoff retry strategy.
-It's important to catch [IndexBatchException](https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.search.indexbatchexception?view=azure-dotnet) as this indicates that the indexing operation only partially succeeded (207s). Failed items should be retried using the `FindFailedActionsToRetry` method, which makes it easy to create a new batch containing only the failed items.
+It's important to catch [IndexBatchException](https://docs.microsoft.com/dotnet/api/microsoft.azure.search.indexbatchexception?view=azure-dotnet) as this indicates that the indexing operation only partially succeeded (207s). Failed items should be retried using the `FindFailedActionsToRetry` method, which makes it easy to create a new batch containing only the failed items.

 Exceptions other than `IndexBatchException` should also be caught and indicate the request failed completely. These exceptions are less common, particularly with the .NET SDK as it retries 503s automatically.
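
The retry flow described in this hunk can be sketched without the SDK. This is a minimal illustration under stated assumptions, not the sample's implementation: `BackoffRetry` and `sendBatch` are hypothetical names, and `sendBatch` is assumed to return the items that failed. In the real application that subset would come from `FindFailedActionsToRetry` inside the `IndexBatchException` handler. The starting delay of 2 seconds and the limit of 5 attempts are arbitrary choices for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BackoffRetry
{
    // sendBatch (hypothetical) submits the items and returns those that failed;
    // an empty list means the whole batch succeeded.
    public static async Task<IReadOnlyList<T>> SendWithBackoffAsync<T>(
        IReadOnlyList<T> items,
        Func<IReadOnlyList<T>, Task<IReadOnlyList<T>>> sendBatch,
        int maxAttempts = 5,
        TimeSpan? initialDelay = null)
    {
        TimeSpan delay = initialDelay ?? TimeSpan.FromSeconds(2);
        IReadOnlyList<T> pending = items;

        for (int attempt = 1; attempt <= maxAttempts && pending.Count > 0; attempt++)
        {
            try
            {
                // Partial failure: only the failed items remain pending.
                pending = await sendBatch(pending);
            }
            catch (Exception)
            {
                // Complete failure: every pending item is retried as-is.
            }

            if (pending.Count > 0 && attempt < maxAttempts)
            {
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2); // double the wait each round
            }
        }

        return pending; // items that still failed after all attempts
    }
}
```

Returning the leftover items instead of throwing lets the caller decide whether to log them, queue them for later, or fail the run.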
@@ -343,7 +343,7 @@ You can explore the populated search index after the program has run programatic

 ### Programmatically

-There are two main options for checking the number of documents in an index: the [Count Documents API](https://docs.microsoft.com/en-us/rest/api/searchservice/count-documents) and the [Get Index Statistics API](https://docs.microsoft.com/en-us/rest/api/searchservice/get-index-statistics). Both paths
+There are two main options for checking the number of documents in an index: the [Count Documents API](https://docs.microsoft.com/rest/api/searchservice/count-documents) and the [Get Index Statistics API](https://docs.microsoft.com/rest/api/searchservice/get-index-statistics). Both paths
 In Azure portal, open the search service **Overview** page, and find the **optimize-indexing** index in the **Indexes** list.

-
+

-The *Document Count* and *Storage Size* are based on [Get Index Statistics API](https://docs.microsoft.com/en-us/rest/api/searchservice/get-index-statistics) and may take several minutes to update.
+The *Document Count* and *Storage Size* are based on [Get Index Statistics API](https://docs.microsoft.com/rest/api/searchservice/get-index-statistics) and may take several minutes to update.