Commit f3c6ec1

throttling doc fixes for semantic ranker
1 parent 8b0dfe9 commit f3c6ec1

1 file changed

+15
-2
lines changed

articles/search/search-limits-quotas-capacity.md

Lines changed: 15 additions & 2 deletions
@@ -197,9 +197,22 @@ Static rate request limits for operations related to a service:
197197

198198
+ Service Statistics (GET /servicestats): 4 per second per search unit
199199

200-
L2 reranking using the semantic reranker has an expected volume:
200+
### Semantic ranker throttling limits
201201

202-
+ Up to 10 concurrent queries per replica. If you anticipate consistent throughput requirements near, at, or higher than this level, please file a support ticket so that we can provision for your workload.
202+
[Semantic ranker](search-get-started-semantic.md) uses a queuing system to manage concurrent requests. This system allows search services to achieve the highest possible number of queries per second. When the limit of concurrent requests is reached, additional requests are placed in a queue. If the queue is full, further requests are rejected and must be retried.
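
Because rejected requests must be retried, a client typically needs some retry logic. The following is a minimal Python sketch of retrying with exponential backoff and jitter. It assumes a hypothetical `send_query` callable that raises a hypothetical `ThrottledError` when a request is rejected; neither name comes from an Azure SDK, and the attempt count and delays are illustrative values, not recommendations from this article.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error raised when a request is rejected because the queue is full."""


def query_with_retry(send_query, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send_query(), retrying with exponential backoff and jitter on throttling."""
    for attempt in range(max_attempts):
        try:
            return send_query()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the rejection to the caller
            # The base delay doubles on each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

The `sleep` parameter exists only so that the delay can be replaced in tests; in normal use the default `time.sleep` applies.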
203+
204+
The total number of semantic ranker queries per second varies based on the following factors:
205+
+ The SKU of the search service. Both queue capacity and concurrent request limits vary by SKU.
206+
+ The number of search units in the search service. The simplest way to increase the maximum number of concurrent semantic ranker queries is to [add search units to your search service](search-capacity-planning.md#how-to-change-capacity).
207+
+ The total available semantic ranker capacity in the region.
208+
+ The amount of time it takes to serve a query using semantic ranker. This varies based on how busy the search service is.
209+
210+
The following table describes the semantic ranker throttling limits by SKU. These limits may be increased through a support request, subject to available capacity in the region:
211+
212+
| Resource | Basic | S1 | S2 | S3 | S3-HD | L1 | L2 |
213+
|----------|-------|----|----|----|-------|----|----|
214+
| Maximum Concurrent Requests (per Search Unit) | 2 | 3 | 4 | 4 | 4 | 4 |
215+
| Maximum Request Queue Size (per Search Unit) | 4 | 6 | 8 | 8 | 8 | 8 |
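
As a rough illustration of how the per-search-unit limits combine, the sketch below multiplies them by the number of search units. It assumes the limits add up linearly across search units, which the "per Search Unit" wording suggests but the text does not state outright. The function name is hypothetical, and the inputs (3 concurrent and 6 queued per unit, 2 search units) are example values.

```python
def semantic_ranker_capacity(concurrent_per_unit, queue_per_unit, search_units):
    """Return (max concurrent, max queued, max outstanding before rejection).

    Assumes per-unit limits scale linearly with the number of search units.
    """
    if search_units < 1:
        raise ValueError("search_units must be at least 1")
    concurrent = concurrent_per_unit * search_units
    queued = queue_per_unit * search_units
    return concurrent, queued, concurrent + queued


# Example values: 3 concurrent and 6 queued requests per unit, 2 search units.
print(semantic_ranker_capacity(3, 6, 2))  # (6, 12, 18)
```

Under these assumptions, a request arriving while 18 requests are already outstanding would be rejected and would need to be retried.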
203216

204217
## API request limits
205218
