articles/search/search-indexer-tutorial.md
Lines changed: 2 additions & 2 deletions
@@ -52,7 +52,7 @@ This tutorial provides *hotels.sql* file in the sample download to populate the
 
 If you have an existing Azure SQL Database resource, you can add the hotels table to it, starting at the **Open query** step.
 
-1. Create an Azure SQL database, using the instructions in [Quickstart: Create a single database](/azure-sql/database/single-database-create-quickstart).
+1. Create an Azure SQL database, using the instructions in [Quickstart: Create a single database](/azure/azure-sql/database/single-database-create-quickstart).
 
 Server configuration for the database is important.
 
@@ -64,7 +64,7 @@ If you have an existing Azure SQL Database resource, you can add the hotels tabl
 
 1. In the Azure portal, go to the new resource.
 
-1. Add a firewall rule to allow access from your client, using the instructions in [Quickstart: Create a server-level firewall rule in Azure portal](/azure/azure-sql/database/firewall-create-server-level-portal-quickstart0). You can run `ipconfig` from a command prompt to get your IP address.
+1. Add a firewall rule to allow access from your client, using the instructions in [Quickstart: Create a server-level firewall rule in Azure portal](/azure/azure-sql/database/firewall-create-server-level-portal-quickstart). You can run `ipconfig` from a command prompt to get your IP address.
 
 1. Use the Query editor to load the sample data. On the navigation pane, select **Query editor (preview)** and enter the user name and password of server admin.
articles/search/tutorial-optimize-indexing-push-api.md
Lines changed: 12 additions & 15 deletions
@@ -6,19 +6,19 @@ author: gmndrg
 ms.author: gimondra
 ms.service: cognitive-search
 ms.topic: tutorial
-ms.date: 1/05/2023
+ms.date: 1/18/2024
 ms.custom:
 - devx-track-csharp
 - ignite-2023
 ---
 
 # Tutorial: Optimize indexing with the push API
 
-Azure AI Search supports [two basic approaches](search-what-is-data-import.md) for importing data into a search index: *pushing* your data into the index programmatically, or pointing an [Azure AI Search indexer](search-indexer-overview.md) at a supported data source to *pull* in the data.
+Azure AI Search supports [two basic approaches](search-what-is-data-import.md) for importing data into a search index: *push* your data into the index programmatically, or pointing an [Azure AI Search indexer](search-indexer-overview.md) at a supported data source to *pull* in the data.
 
 This tutorial describes how to efficiently index data using the [push model](search-what-is-data-import.md#pushing-data-to-an-index) by batching requests and using an exponential backoff retry strategy. You can [download and run the sample application](https://github.com/Azure-Samples/azure-search-dotnet-scale/tree/main/optimize-data-indexing). This article explains the key aspects of the application and factors to consider when indexing data.
 
-This tutorial uses C# and the [.NET SDK](/dotnet/api/overview/azure/search) to perform the following tasks:
+This tutorial uses C# and the [Azure.Search.Documents library from the Azure SDK for .NET](/dotnet/api/overview/azure/search) to perform the following tasks:
 
 > [!div class="checklist"]
 > * Create an index
@@ -45,9 +45,7 @@ Source code for this tutorial is in the [optimize-data-indexing/v11](https://git
 
 ## Key considerations
 
-When pushing data into an index, there's several key considerations that impact indexing speeds. You can learn more about these factors in the [index large data sets article](search-howto-large-index.md).
-
-Six key factors to consider are:
+Factors affecting indexing speeds are listed next. You can learn more in [Index large data sets](search-howto-large-index.md).
 
 + **Service tier and number of partitions/replicas** - Adding partitions and increasing your tier will both increase indexing speeds.
 + **Index Schema** - Adding fields and adding additional properties to fields (such as *searchable*, *facetable*, or *filterable*) both reduce indexing speeds.
@@ -56,7 +54,6 @@ Six key factors to consider are:
 + **Retry strategy** - An exponential backoff retry strategy should be used to optimize indexing.
 + **Network data transfer speeds** - Data transfer speeds can be a limiting factor. Index data from within your Azure environment to increase data transfer speeds.
 
-
 ## 1 - Create Azure AI Search service
 
 To complete this tutorial, you'll need an Azure AI Search service, which you can [create in the portal](search-create-service-portal.md). We recommend using the same tier you plan to use in production so that you can accurately test and optimize indexing speeds.
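The **Retry strategy** factor above recommends exponential backoff. The following C# is a minimal sketch, not code from the sample application: the helper name `BackoffDelay`, the 2-second base delay, the 60-second cap, and the seven retries are assumptions chosen for illustration. It shows how the wait before each retry doubles until it reaches the cap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: delay before retry number `attempt` (0-based).
// The delay doubles on every attempt and never exceeds maxDelay.
TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
{
    double seconds = baseDelay.TotalSeconds * Math.Pow(2, attempt);
    return TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds));
}

// Assumed values for illustration: 2 s base delay, 60 s cap, 7 retries.
var schedule = new List<TimeSpan>();
for (int i = 0; i < 7; i++)
{
    schedule.Add(BackoffDelay(i, TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(60)));
}

// Prints the delays in seconds: 2, 4, 8, 16, 32, 60, 60
Console.WriteLine(string.Join(", ", schedule.Select(d => d.TotalSeconds)));
```

Retry loops in production commonly add a small random jitter to each delay so that many concurrent clients don't retry in lockstep; the sketch leaves it out so the printed schedule is deterministic.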
 There are two sizes of hotels available for testing in this sample: **small** and **large**.
 
-The schema of your index can have a significant impact on indexing speeds. Because of this impact, it makes sense to convert this class to generate data matching your intended index schema after you run through this tutorial.
+The schema of your index has an effect on indexing speeds. For this reason, it makes sense to convert this class to generate data that best matches your intended index schema after you run through this tutorial.
 
 ## 4 - Test batch sizes
 
@@ -230,7 +227,7 @@ public static double EstimateObjectSize(object data)
 }
 ```
 
-The function requires an `SearchClient` as well as the number of tries you'd like to test for each batch size. As there may be some variability in indexing times for each batch, we try each batch three times by default to make the results more statistically significant.
+The function requires a `SearchClient` plus the number of tries you'd like to test for each batch size. Because there might be variability in indexing times for each batch, we try each batch three times by default to make the results more statistically significant.
@@ -240,7 +237,7 @@ When you run the function, you should see an output like below in your console:
 
 
 
-Identify which batch size is most efficient and then use that batch size in the next step of the tutorial. You may see a plateau in MB/s across different batch sizes.
+Identify which batch size is most efficient and then use that batch size in the next step of the tutorial. You might see a plateau in MB/s across different batch sizes.
 
 ## 5 - Index data
 
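The batch-size test described in these hunks runs each batch size several times and compares throughput in MB/s. The sketch below shows one way to make that comparison in C#. It assumes a fixed average document size instead of the per-batch serialization estimate that `EstimateObjectSize` provides in the sample, and `ThroughputMBps`, `PickBatchSize`, and the timings are hypothetical placeholders rather than measured results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: throughput in MB/s for one indexing request,
// assuming every document has the same average size.
double ThroughputMBps(int batchSize, double avgDocSizeMB, double elapsedSeconds) =>
    batchSize * avgDocSizeMB / elapsedSeconds;

// Hypothetical helper: average the throughput of all tries for each batch size,
// then return the batch size with the highest mean MB/s.
int PickBatchSize(Dictionary<int, double[]> elapsedSecondsByBatchSize, double avgDocSizeMB)
{
    return elapsedSecondsByBatchSize
        .Select(kv => (BatchSize: kv.Key,
                       MeanMBps: kv.Value.Average(s => ThroughputMBps(kv.Key, avgDocSizeMB, s))))
        .OrderByDescending(x => x.MeanMBps)
        .First()
        .BatchSize;
}

// Placeholder timings in seconds, three tries per batch size.
// These are made-up numbers for illustration, not measurements from the sample.
var timings = new Dictionary<int, double[]>
{
    [100] = new[] { 0.50, 0.55, 0.45 },
    [500] = new[] { 1.60, 1.50, 1.70 },
    [1000] = new[] { 3.40, 3.60, 3.50 },
};

int best = PickBatchSize(timings, avgDocSizeMB: 0.002);
Console.WriteLine($"Most efficient batch size: {best}");
```

With these placeholder timings, mean throughput rises from 100 to 500 documents per batch and then drops at 1,000, so the sketch selects 500.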
@@ -253,9 +250,9 @@ Now that we've identified the batch size we intend to use, the next step is to b
 
 To take full advantage of Azure AI Search's indexing speeds, you'll likely need to use multiple threads to send batch indexing requests concurrently to the service.
 
-Several of the key considerations mentioned above impact the optimal number of threads. You can modify this sample and test with different thread counts to determine the optimal thread count for your scenario. However, as long as you have several threads running concurrently, you should be able to take advantage of most of the efficiency gains.
+Several of the key considerations previously mentioned can affect the optimal number of threads. You can modify this sample and test with different thread counts to determine the optimal thread count for your scenario. However, as long as you have several threads running concurrently, you should be able to take advantage of most of the efficiency gains.
 
-As you ramp up the requests hitting the search service, you may encounter [HTTP status codes](/rest/api/searchservice/http-status-codes) indicating the request didn't fully succeed. During indexing, two common HTTP status codes are:
+As you ramp up the requests hitting the search service, you might encounter [HTTP status codes](/rest/api/searchservice/http-status-codes) indicating the request didn't fully succeed. During indexing, two common HTTP status codes are:
 
 + **503 Service Unavailable** - This error means that the system is under heavy load and your request can't be processed at this time.
 + **207 Multi-Status** - This error means that some documents succeeded, but at least one failed.
-The results of the indexing operation are stored in the variable `IndexDocumentResult result`. This variable is important because it allows you to check if any documents in the batch failed as shown below. If there is a partial failure, a new batch is created based on the failed documents' ID.
+The results of the indexing operation are stored in the variable `IndexDocumentResult result`. This variable is important because it allows you to check if any documents in the batch failed as shown below. If there's a partial failure, a new batch is created based on the failed documents' ID.
 
 `RequestFailedException` exceptions should also be caught as they indicate the request failed completely and should also be retried.
 
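The 503 and 207 handling described in these hunks can be sketched without a live service. In this hedged C# example, `IndexWithRetryAsync` and `FakeSendBatch` are hypothetical: the fake returns the keys of failed documents to mimic a partial failure, or throws `InvalidOperationException` to mimic a request that failed completely. With the Azure.Search.Documents library, a complete failure surfaces as `RequestFailedException` instead, and the failed keys come from the per-document results of the indexing call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical retry helper. `sendBatch` stands in for the real indexing call:
// it returns the keys of documents that failed (a partial failure, as with 207)
// or throws to signal that the whole request failed (as with 503).
async Task<bool> IndexWithRetryAsync(
    List<string> documentKeys,
    Func<List<string>, Task<List<string>>> sendBatch,
    int maxRetries,
    TimeSpan baseDelay)
{
    var pending = documentKeys;
    for (int attempt = 0; attempt <= maxRetries; attempt++)
    {
        try
        {
            var failed = await sendBatch(pending);
            if (failed.Count == 0)
            {
                return true;          // every document in the batch succeeded
            }
            pending = failed;         // partial failure: resend only the failed keys
        }
        catch (InvalidOperationException)
        {
            // Whole request failed: resend the same pending batch.
            // (With the real library, catch RequestFailedException here.)
        }

        if (attempt < maxRetries)
        {
            // Exponential backoff: baseDelay, 2x, 4x, ...
            await Task.Delay(TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * Math.Pow(2, attempt)));
        }
    }
    return false;                     // retries exhausted
}

// Simulated service, purely illustrative: call 1 fails completely,
// call 2 fails keys "3" and "7", call 3 succeeds.
int calls = 0;
var sentBatches = new List<List<string>>();
Task<List<string>> FakeSendBatch(List<string> batch)
{
    calls++;
    sentBatches.Add(new List<string>(batch));
    if (calls == 1)
    {
        throw new InvalidOperationException("Simulated 503 Service Unavailable");
    }
    var failedKeys = calls == 2
        ? batch.Where(k => k == "3" || k == "7").ToList()
        : new List<string>();
    return Task.FromResult(failedKeys);
}

var allKeys = Enumerable.Range(1, 10).Select(n => n.ToString()).ToList();
bool ok = await IndexWithRetryAsync(allKeys, FakeSendBatch, maxRetries: 5, baseDelay: TimeSpan.FromMilliseconds(10));
Console.WriteLine($"Succeeded: {ok} after {calls} calls; last batch: {string.Join(",", sentBatches[^1])}");
```

Running it prints `Succeeded: True after 3 calls; last batch: 3,7`: the first request fails outright and is resent in full, the second succeeds for all but two documents, and only those two are sent in the third request.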
@@ -366,7 +363,7 @@ You can explore the populated search index after the program has run programatic
 
 ### Programatically
 
-There are two main options for checking the number of documents in an index: the [Count Documents API](/rest/api/searchservice/count-documents) and the [Get Index Statistics API](/rest/api/searchservice/get-index-statistics). Both paths may require some additional time to update so don't be alarmed if the number of documents returned is lower than you expected initially.
+There are two main options for checking the number of documents in an index: the [Count Documents API](/rest/api/searchservice/count-documents) and the [Get Index Statistics API](/rest/api/searchservice/get-index-statistics). Both paths require time to process so don't be alarmed if the number of documents returned is initially lower than you expect.
 
 #### Count Documents
 
@@ -390,7 +387,7 @@ In Azure portal, open the search service **Overview** page, and find the **optim
 
 
 
-The *Document Count* and *Storage Size* are based on [Get Index Statistics API](/rest/api/searchservice/get-index-statistics) and may take several minutes to update.
+The *Document Count* and *Storage Size* are based on [Get Index Statistics API](/rest/api/searchservice/get-index-statistics) and can take several minutes to update.