articles/search/search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md
Lines changed: 0 additions & 4 deletions
@@ -358,10 +358,6 @@ Yes. However, you need to allow your search service to connect to your database.
 
 Not directly. We do not recommend or support a direct connection, as doing so would require you to open your databases to Internet traffic. Customers have succeeded with this scenario using bridge technologies like Azure Data Factory. For more information, see [Push data to an Azure Cognitive Search index using Azure Data Factory](../data-factory/v1/data-factory-azure-search-connector.md).
 
-**Q: Does running an indexer affect my query workload?**
-
-Yes. Indexer runs on one of the nodes in your search service, and that node’s resources are shared between indexing and serving query traffic and other API requests. If you run intensive indexing and query workloads and encounter a high rate of 503 errors or increasing response times, consider [scaling up your search service](search-capacity-planning.md).
-
 **Q: Can I use a secondary replica in a [failover cluster](../azure-sql/database/auto-failover-group-overview.md) as a data source?**
 
 It depends. For full indexing of a table or view, you can use a secondary replica.

articles/search/search-howto-large-index.md
Lines changed: 21 additions & 17 deletions
@@ -19,29 +19,23 @@ As data volumes grow or processing needs change, you might find that simple or d
 
 The same techniques also apply to long-running processes. In particular, the steps outlined in [parallel indexing](#parallel-indexing) are helpful for computationally intensive indexing, such as image analysis or natural language processing in an [AI enrichment pipeline](cognitive-search-concept-intro.md).
 
-The following sections explain techniques for indexing large amounts of data using both the push API and indexers. For more information and code samples that illustrate push model indexing, see [Tutorial: Optimize indexing speeds](tutorial-optimize-indexing-push-api.md).
+The following sections explain techniques for indexing large amounts of data using both the push API and indexers. For more information and code samples that illustrate push model indexing, see [Tutorial: Optimize indexing speeds](tutorial-optimize-indexing-push-api.md).
 
-## Indexing with the "push" API
+## General tips
 
-When pushing data into an index using the [Add Documents REST API](/rest/api/searchservice/addupdate-or-delete-documents) or the [IndexDocuments method (.NET)](/dotnet/api/azure.search.documents.searchclient.indexdocuments), there are several key considerations that impact indexing speed. Those factors are outlined in the section below, and range from setting service capacity to code optimizations.
+When you're indexing a large volume of data, there are a few simple tips that can make a difference regardless of how the indexing is being done.
 
-+ [Index schema](#review-index-schema)
-+ [Data location and transfer speed](#check-data-location)
-+ [Batch multiple documents per request](#check-the-batch-size)
 The schema of your index plays an important role in indexing data. The more fields you have, and the more properties you set (such as *searchable*, *facetable*, or *filterable*), all contribute to increased indexing time.
 
 To keep document size down, avoid adding non-queryable data to an index. Every field that you add to an index should be there for a reason. If you need to integrate non-queryable data such as images into search results, you should define a non-searchable field that stores a URL reference to the resource.
 
-## Check data location
+### Check data location
 
 Network data transfer speeds can be a limiting factor when indexing data. Indexing data from within your Azure environment is an easy way to speed up indexing.
 
-## Check service capacity and partitions
+### Check service capacity and partitions
 
 1. Review the characteristics and [limits](search-limits-quotas-capacity.md) of the tier at which you provisioned the service. Service tiers differ by the size and speed of partitions, which has a direct impact on indexing speed. If the tier is insufficient for the workload, upgrading might be the easiest and most effective solution for increasing indexing throughput.
 
@@ -53,7 +47,17 @@ Adding more replicas may also increase indexing speeds but it isn't guaranteed.
 > When [adding partition and replicas](search-capacity-planning.md#add-or-reduce-replicas-and-partitions), or provisioning a service at a higher tier, consider the monetary cost and allocation time. Adding partitions can significantly increase indexing speed, but adding and removing them can take anywhere from 15 minutes to several hours.
 >
 
-## Check the batch size
+## Indexing with the "push" API
+
+When pushing data into an index using the [Add Documents REST API](/rest/api/searchservice/addupdate-or-delete-documents) or the [IndexDocuments method (.NET)](/dotnet/api/azure.search.documents.searchclient.indexdocuments), there are several key considerations that impact indexing speed. Those factors are outlined in the section below, and range from setting service capacity to code optimizations.
+
++ [Index schema](#review-index-schema)
++ [Data location and transfer speed](#check-data-location)
++ [Batch multiple documents per request](#check-the-batch-size)
 One of the simplest mechanisms for indexing a larger data set is to submit multiple documents or records in a single request. As long as the entire payload is under 16 MB, a request can handle up to 1000 documents in a bulk upload operation. These limits apply whether you're using the [Add Documents REST API](/rest/api/searchservice/addupdate-or-delete-documents) or the [IndexDocuments method](/dotnet/api/azure.search.documents.searchclient.indexdocuments) in the .NET SDK. For either API, you would package 1000 documents in the body of each request.
 
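The batch limits quoted in the hunk above (up to 1000 documents per request, with the whole payload under 16 MB) can be enforced client-side with a greedy packer. This is a minimal sketch under stated assumptions. `make_batches` is a hypothetical helper. The size is measured as the compact JSON of the `{"value": [...]}` body that the Add Documents REST API accepts, and "16 MB" is read as 16 MiB. If your client serializes the body differently (for example with extra whitespace), the real size will differ.

```python
import json
from typing import Iterable, Iterator, List

MAX_DOCS_PER_BATCH = 1000
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024  # assumption: "16 MB" read as 16 MiB

# Bytes in the empty compact request body '{"value":[]}'.
_ENVELOPE_BYTES = len(b'{"value":[]}')


def make_batches(docs: Iterable[dict],
                 max_docs: int = MAX_DOCS_PER_BATCH,
                 max_bytes: int = MAX_PAYLOAD_BYTES) -> Iterator[List[dict]]:
    """Greedily pack documents into batches that respect both limits."""
    batch: List[dict] = []
    size = _ENVELOPE_BYTES
    for doc in docs:
        doc_bytes = len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
        if _ENVELOPE_BYTES + doc_bytes > max_bytes:
            raise ValueError("a single document exceeds the payload limit")
        extra = doc_bytes + (1 if batch else 0)  # +1 for the separating comma
        if batch and (len(batch) >= max_docs or size + extra > max_bytes):
            yield batch  # current batch is full: emit it and start a new one
            batch, size, extra = [], _ENVELOPE_BYTES, doc_bytes
        batch.append(doc)
        size += extra
    if batch:
        yield batch
```

For example, 2,500 small documents produce batches of 1000, 1000 and 500. Very large documents split earlier, on the byte limit. As the article notes, the best batch size is workload-specific, so `max_docs` is a parameter to tune rather than a fixed value.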
@@ -64,7 +68,7 @@ Using batches to index documents will significantly improve indexing performance
64
68
65
69
Because the optimal batch size depends on your index and your data, the best approach is to test different batch sizes to determine what results in the fastest indexing speeds for your scenario. [Tutorial: Optimize indexing with the push API](tutorial-optimize-indexing-push-api.md) provides sample code for testing batch sizes using the .NET SDK.
66
70
67
-
## Add threads and a retry strategy
71
+
###Add threads and a retry strategy
68
72
69
73
In contrast with indexer APIs, when you are using the push APIs to index documents, your application code should ensure there are sufficient threads to make full use of the available capacity.
70
74
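The hunk above says push-API clients must supply their own concurrency. The surrounding text also mentions retrying 503 responses. The sketch below combines the two: a thread pool for concurrency, and an exponential-backoff retry for each batch. It rests on several assumptions. `send_batch` is a hypothetical callable that you would implement with your HTTP client or SDK, and it returns the HTTP status code. Only 503 is treated as retryable. The delays (doubling from `base_delay`, plus random jitter) are illustrative defaults, not service-recommended values. Partial failures inside a batch are not handled here.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical callable that sends one batch and returns the HTTP status code.
SendBatch = Callable[[list], int]

# Assumption: only 503 (Service Unavailable) is treated as transient here.
RETRYABLE_STATUSES = {503}


def upload_with_backoff(send_batch: SendBatch, batch: list,
                        max_attempts: int = 5, base_delay: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Send one batch, retrying retryable statuses with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status = send_batch(batch)
        if status not in RETRYABLE_STATUSES:
            return status
        if attempt < max_attempts - 1:
            # Delay doubles per attempt (1x, 2x, 4x, ...) plus up to base_delay of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return status  # still retryable after the last attempt


def upload_all(send_batch: SendBatch, batches: Sequence[list],
               threads: int = 4, **retry_kwargs) -> List[int]:
    """Upload batches concurrently; returns the final status of each batch, in order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(
            lambda b: upload_with_backoff(send_batch, b, **retry_kwargs),
            batches))
```

The `sleep` parameter is injectable so the backoff can be tested without real waiting. The `threads` value should be raised gradually while you watch for more 503s. Persistent 503s indicate the service is saturated, and more threads will not help.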
@@ -90,13 +94,13 @@ The Azure .NET SDK automatically retries 503s and other failed requests but you'
90
94
91
95
+ Partitioning data into smaller individual data sources enables parallel processing. You can break up source data into smaller components, such as into multiple containers in Azure Blob Storage, create a [data source](/rest/api/searchservice/create-data-source) for each partition, and then run multiple indexers in parallel.
92
96
93
-
### Check indexer batchSize
97
+
### Check indexer batch size
94
98
95
99
As with the push API, indexers allow you to configure the number of items per batch. For indexers based on the [Create Indexer REST API](/rest/api/searchservice/Create-Indexer), you can set the `batchSize` argument to customize this setting to better match the characteristics of your data.
96
100
97
101
Default batch sizes are data source specific. Azure SQL Database and Azure Cosmos DB have a default batch size of 1000. In contrast, Azure Blob indexing sets batch size at 10 documents in recognition of the larger average document size.
98
102
99
-
##Scheduled indexers for long-running processes
103
+
### Schedule indexers for long-running processes
100
104
101
105
Indexer scheduling is an important mechanism for processing large data sets, and slow-running processes like image analysis in a cognitive search pipeline. Indexer processing operates within a 24-hour window. If processing fails to finish within 24 hours, the behaviors of indexer scheduling can work to your advantage.
102
106
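The `batchSize` and scheduling guidance in the hunk above corresponds to two parts of an indexer definition. These are `parameters.batchSize` and `schedule.interval`, where the interval is an ISO 8601 duration. The sketch below builds such a request body as a Python dict. It makes several assumptions. `indexer_definition` is a hypothetical helper. The names are placeholders. `"PT2H"` is only an example interval. The defaults table keys the article's per-source defaults by the data source `type` strings `azuresql`, `cosmosdb` and `azureblob`.

```python
import json
from typing import Optional

# Default batch sizes quoted in the article, keyed by the REST API data source
# "type" value (the mapping to these type strings is this sketch's assumption).
DEFAULT_BATCH_SIZE = {"azuresql": 1000, "cosmosdb": 1000, "azureblob": 10}


def indexer_definition(name: str, data_source: str, index: str,
                       source_type: str, batch_size: Optional[int] = None,
                       interval: str = "PT2H") -> dict:
    """Build a Create Indexer request body with explicit batchSize and schedule."""
    if batch_size is None:
        batch_size = DEFAULT_BATCH_SIZE[source_type]
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        # ISO 8601 duration: "PT2H" asks the service to run the indexer every 2 hours.
        "schedule": {"interval": interval},
        # batchSize lives under "parameters" in the indexer definition.
        "parameters": {"batchSize": batch_size},
    }
```

For example, `json.dumps(indexer_definition("blob-indexer", "blob-ds", "docs-index", "azureblob"), indent=2)` yields a body with `"batchSize": 10`. That body could be sent in a Create Indexer request. Lowering the batch size for large blobs, or raising it for small SQL rows, is the tuning knob the article describes.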
@@ -106,7 +110,7 @@ In practical terms, for index loads spanning several days, you can put the index
106
110
107
111
<aname="parallel-indexing"></a>
108
112
109
-
##Parallel indexers
113
+
### Run indexers in parallel
110
114
111
115
If you have partitioned data, you can create indexer-data-source combinations that pull from each data source and write to the same search index. Because each indexer is distinct, you can run them at the same time, populating a search index more quickly than if you ran them sequentially.
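The indexer-per-partition pattern in the hunks above pairs one data source per Blob Storage container with one indexer, and every indexer writes to the same search index. The sketch below only generates the request bodies. It makes several assumptions. `partitioned_indexers` and its naming scheme are hypothetical. The connection string is a placeholder. Submitting the bodies through Create Data Source and Create Indexer requests is left to your client.

```python
from typing import List, Tuple


def partitioned_indexers(index: str, containers: List[str],
                         connection_string: str) -> List[Tuple[dict, dict]]:
    """Pair one data source with one indexer per container; all share one index."""
    pairs = []
    for i, container in enumerate(containers):
        ds_name = f"{index}-ds-{i}"  # hypothetical naming scheme
        data_source = {
            "name": ds_name,
            "type": "azureblob",
            "credentials": {"connectionString": connection_string},
            "container": {"name": container},
        }
        indexer = {
            "name": f"{index}-indexer-{i}",
            "dataSourceName": ds_name,
            "targetIndexName": index,  # every indexer writes to the same index
        }
        pairs.append((data_source, indexer))
    return pairs
```

Because each indexer has its own name and data source, the service can run them independently of one another. How many run at once is still bounded by your service's capacity, as the capacity section of the article notes.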