Commit 85aa48a

Reordered the H2s in large indexing
1 parent 5044c40 commit 85aa48a

2 files changed

+21
-21
lines changed

articles/search/search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md

Lines changed: 0 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -358,10 +358,6 @@ Yes. However, you need to allow your search service to connect to your database.
358358

359359
Not directly. We do not recommend or support a direct connection, as doing so would require you to open your databases to Internet traffic. Customers have succeeded with this scenario using bridge technologies like Azure Data Factory. For more information, see [Push data to an Azure Cognitive Search index using Azure Data Factory](../data-factory/v1/data-factory-azure-search-connector.md).
360360

361-
**Q: Does running an indexer affect my query workload?**
362-
363-
Yes. Indexer runs on one of the nodes in your search service, and that node’s resources are shared between indexing and serving query traffic and other API requests. If you run intensive indexing and query workloads and encounter a high rate of 503 errors or increasing response times, consider [scaling up your search service](search-capacity-planning.md).
364-
365361
**Q: Can I use a secondary replica in a [failover cluster](../azure-sql/database/auto-failover-group-overview.md) as a data source?**
366362

367363
It depends. For full indexing of a table or view, you can use a secondary replica.

articles/search/search-howto-large-index.md

Lines changed: 21 additions & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -19,29 +19,23 @@ As data volumes grow or processing needs change, you might find that simple or d
1919

2020
The same techniques also apply to long-running processes. In particular, the steps outlined in [parallel indexing](#parallel-indexing) are helpful for computationally intensive indexing, such as image analysis or natural language processing in an [AI enrichment pipeline](cognitive-search-concept-intro.md).
2121

22-
The following sections explain techniques for indexing large amounts of data using both the push API and indexers.For more information and code samples that illustrate push model indexing, see [Tutorial: Optimize indexing speeds](tutorial-optimize-indexing-push-api.md).
22+
The following sections explain techniques for indexing large amounts of data using both the push API and indexers. For more information and code samples that illustrate push model indexing, see [Tutorial: Optimize indexing speeds](tutorial-optimize-indexing-push-api.md).
2323

24-
## Indexing with the "push" API
24+
## General tips
2525

26-
When pushing data into an index using the [Add Documents REST API](/rest/api/searchservice/addupdate-or-delete-documents) or the [IndexDocuments method (.NET)](/dotnet/api/azure.search.documents.searchclient.indexdocuments), there are several key considerations that impact indexing speed. Those factors are outlined in the section below, and range from setting service capacity to code optimizations.
26+
When you're indexing a large volume of data, there are a few simple tips that can make a difference regardless of how the indexing is being done.
2727

28-
+ [Index schema](#review-index-schema)
29-
+ [Data location and transfer speed](#check-data-location)
30-
+ [Batch multiple documents per request](#check-the-batch-size)
31-
+ [Service capacity](#check-service-capacity-and-partitions)
32-
+ [Manage threads](#add-threads-and-a-retry-strategy)
33-
34-
## Review index schema
28+
### Simplify the index schema
3529

3630
The schema of your index plays an important role in indexing data. The more fields you have, and the more properties you set (such as *searchable*, *facetable*, or *filterable*), the longer indexing takes.
3731

3832
To keep document size down, avoid adding non-queryable data to an index. Every field that you add to an index should be there for a reason. If you need to integrate non-queryable data such as images into search results, you should define a non-searchable field that stores a URL reference to the resource.
3933

40-
## Check data location
34+
### Check data location
4135

4236
Network data transfer speeds can be a limiting factor when indexing data. Indexing data from within your Azure environment is an easy way to speed up indexing.
4337

44-
## Check service capacity and partitions
38+
### Check service capacity and partitions
4539

4640
1. Review the characteristics and [limits](search-limits-quotas-capacity.md) of the tier at which you provisioned the service. Service tiers differ by the size and speed of partitions, which has a direct impact on indexing speed. If the tier is insufficient for the workload, upgrading might be the easiest and most effective solution for increasing indexing throughput.
4741

@@ -53,7 +47,17 @@ Adding more replicas may also increase indexing speeds but it isn't guaranteed.
5347
> When [adding partition and replicas](search-capacity-planning.md#add-or-reduce-replicas-and-partitions), or provisioning a service at a higher tier, consider the monetary cost and allocation time. Adding partitions can significantly increase indexing speed, but adding and removing them can take anywhere from 15 minutes to several hours.
5448
>
5549
56-
## Check the batch size
50+
## Indexing with the "push" API
51+
52+
When pushing data into an index using the [Add Documents REST API](/rest/api/searchservice/addupdate-or-delete-documents) or the [IndexDocuments method (.NET)](/dotnet/api/azure.search.documents.searchclient.indexdocuments), there are several key considerations that impact indexing speed. Those factors are outlined in the section below, and range from setting service capacity to code optimizations.
53+
54+
+ [Index schema](#review-index-schema)
55+
+ [Data location and transfer speed](#check-data-location)
56+
+ [Batch multiple documents per request](#check-the-batch-size)
57+
+ [Service capacity](#check-service-capacity-and-partitions)
58+
+ [Manage threads](#add-threads-and-a-retry-strategy)
59+
60+
### Check the batch size
5761

5862
One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you're using the [Add Documents REST API](/rest/api/searchservice/addupdate-or-delete-documents) or the [IndexDocuments method](/dotnet/api/azure.search.documents.searchclient.indexdocuments) in the .NET SDK. For either API, you would package up to 1000 documents in the body of each request.
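
As an illustration of the two limits described above, the following is a minimal Python sketch that splits a stream of documents into batches. It is not taken from the article or from any SDK: `make_batches` is a hypothetical helper, the size estimate is based on each document's JSON encoding, and treating 16 MB as 16 × 1024 × 1024 bytes is an assumption. In practice you would likely leave headroom for the request envelope.

```python
import json
from typing import Iterable, Iterator

MAX_DOCS_PER_BATCH = 1000              # document-count limit quoted above
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024   # assumed byte value for the 16 MB limit


def make_batches(docs: Iterable[dict],
                 max_docs: int = MAX_DOCS_PER_BATCH,
                 max_bytes: int = MAX_PAYLOAD_BYTES) -> Iterator[list]:
    """Yield lists of documents that respect both the count and size limits.

    Size is estimated from each document's JSON encoding only.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if doc_bytes > max_bytes:
            raise ValueError("a single document exceeds the payload limit")
        # Start a new batch when adding this document would break a limit.
        if batch and (len(batch) >= max_docs
                      or batch_bytes + doc_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


if __name__ == "__main__":
    sample = [{"id": str(i), "title": f"Document {i}"} for i in range(2500)]
    print([len(b) for b in make_batches(sample)])
```

Passing a smaller `max_docs` value is one way to experiment with batch sizes below the maximum.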
5963

@@ -64,7 +68,7 @@ Using batches to index documents will significantly improve indexing performance
6468

6569
Because the optimal batch size depends on your index and your data, the best approach is to test different batch sizes to determine what results in the fastest indexing speeds for your scenario. [Tutorial: Optimize indexing with the push API](tutorial-optimize-indexing-push-api.md) provides sample code for testing batch sizes using the .NET SDK.
6670

67-
## Add threads and a retry strategy
71+
### Add threads and a retry strategy
6872

6973
In contrast with indexer APIs, when you are using the push APIs to index documents, your application code should ensure there are sufficient threads to make full use of the available capacity.
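
One way to picture this is a small worker pool that uploads batches concurrently and backs off when the service signals that it is busy. The sketch below is a generic Python illustration, not the article's or the SDK's implementation: `ServiceBusyError`, `send_with_retry`, and `upload_all` are hypothetical names, and `send` stands in for whatever function actually posts a batch to the index.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


class ServiceBusyError(Exception):
    """Hypothetical stand-in for a throttled response such as HTTP 503."""


def send_with_retry(send: Callable[[list], None], batch: list,
                    max_attempts: int = 5, base_delay: float = 0.5) -> int:
    """Call send(batch), backing off exponentially on ServiceBusyError.

    Returns the number of attempts used; re-raises once max_attempts is hit.
    """
    attempt = 1
    while True:
        try:
            send(batch)
            return attempt
        except ServiceBusyError:
            if attempt >= max_attempts:
                raise
            # Exponential backoff with jitter so threads don't retry in lockstep.
            time.sleep(base_delay * 2 ** (attempt - 1) * (1 + random.random()))
            attempt += 1


def upload_all(send: Callable[[list], None], batches: Sequence[list],
               workers: int = 4, base_delay: float = 0.5) -> List[int]:
    """Upload batches on a thread pool; returns attempts per batch, in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda b: send_with_retry(send, b, base_delay=base_delay),
            batches))
```

The `workers` value is a tuning knob: raising it until throttling errors start to appear is one rough way to find the point where available capacity is fully used.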
7074

@@ -90,13 +94,13 @@ The Azure .NET SDK automatically retries 503s and other failed requests but you'
9094

9195
+ Partitioning data into smaller individual data sources enables parallel processing. You can break up source data into smaller components, such as into multiple containers in Azure Blob Storage, create a [data source](/rest/api/searchservice/create-data-source) for each partition, and then run multiple indexers in parallel.
9296

93-
### Check indexer batchSize
97+
### Check indexer batch size
9498

9599
As with the push API, indexers allow you to configure the number of items per batch. For indexers based on the [Create Indexer REST API](/rest/api/searchservice/Create-Indexer), you can set the `batchSize` argument to customize this setting to better match the characteristics of your data.
96100

97101
Default batch sizes are data source specific. Azure SQL Database and Azure Cosmos DB have a default batch size of 1000. In contrast, Azure Blob indexing sets batch size at 10 documents in recognition of the larger average document size.
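
To make the setting concrete, here is a minimal Python sketch that builds a request body for the Create Indexer REST API with an explicit `batchSize` under `parameters`. The helper name `indexer_definition` and the indexer, data source, and index names are hypothetical, and the value 500 is only an example, not a recommendation.

```python
import json


def indexer_definition(name: str, data_source: str, index: str,
                       batch_size: int) -> dict:
    """Build an indexer definition that overrides the default batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        # batchSize sits under "parameters" in the indexer definition.
        "parameters": {"batchSize": batch_size},
    }


if __name__ == "__main__":
    body = indexer_definition("hotels-indexer", "hotels-sql", "hotels", 500)
    print(json.dumps(body, indent=2))
```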
98102

99-
## Scheduled indexers for long-running processes
103+
### Schedule indexers for long-running processes
100104

101105
Indexer scheduling is an important mechanism for processing large data sets, and slow-running processes like image analysis in a cognitive search pipeline. Indexer processing operates within a 24-hour window. If processing fails to finish within 24 hours, the behaviors of indexer scheduling can work to your advantage.
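
A schedule is expressed in the indexer definition as a `schedule` object whose `interval` is an ISO 8601 duration. The following Python sketch shows one way to produce that value from a `timedelta`. The helper `schedule_block` is hypothetical, and the 5-minute minimum and 24-hour maximum it enforces are assumptions about the service's accepted range that you should confirm against the current limits.

```python
from datetime import timedelta


def schedule_block(interval: timedelta) -> dict:
    """Return a schedule object with an ISO 8601 duration interval.

    Assumes whole-minute intervals between 5 minutes and 24 hours.
    """
    minutes, leftover_seconds = divmod(int(interval.total_seconds()), 60)
    if leftover_seconds or not 5 <= minutes <= 24 * 60:
        raise ValueError("interval must be whole minutes, 5 min to 24 h")
    if minutes == 24 * 60:
        return {"interval": "P1D"}
    hours, mins = divmod(minutes, 60)
    duration = "PT" + (f"{hours}H" if hours else "") + (f"{mins}M" if mins else "")
    return {"interval": duration}


if __name__ == "__main__":
    # Hypothetical indexer that is re-run every two hours.
    indexer = {
        "name": "images-indexer",
        "dataSourceName": "images-blob",
        "targetIndexName": "images",
        "schedule": schedule_block(timedelta(hours=2)),
    }
    print(indexer["schedule"])
```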
102106

@@ -106,7 +110,7 @@ In practical terms, for index loads spanning several days, you can put the index
106110

107111
<a name="parallel-indexing"></a>
108112

109-
## Parallel indexers
113+
### Run indexers in parallel
110114

111115
If you have partitioned data, you can create indexer-data-source combinations that pull from each data source and write to the same search index. Because each indexer is distinct, you can run them at the same time, populating a search index more quickly than if you ran them sequentially.
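
The pattern described above can be sketched as a set of indexer definitions that differ only in name and data source while sharing one target index. This Python example is illustrative only: `parallel_indexers` and all of the names passed to it are hypothetical, and each definition would still need to be submitted through the Create Indexer REST API or an SDK.

```python
from typing import List


def parallel_indexers(index: str, data_sources: List[str]) -> List[dict]:
    """Build one indexer definition per data source, all writing to one index."""
    return [
        {
            "name": f"{index}-indexer-{i}",
            "dataSourceName": data_source,
            "targetIndexName": index,
        }
        for i, data_source in enumerate(data_sources, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical data sources, one per partition of the source data.
    for definition in parallel_indexers(
            "hotels", ["hotels-part-a", "hotels-part-b", "hotels-part-c"]):
        print(definition)
```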
112116
