
Commit 76daeb5

add links to related articles
1 parent 1cf6129 commit 76daeb5

1 file changed: +2 −2 lines changed

articles/ai-studio/concepts/model-benchmarks.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -73,15 +73,15 @@ This approach uses the following default parameters for benchmarking:
 
 | Parameter | Value | Applies to |
 |-----------|-------|----------------|
-| Region | East US/East US2 | Serverless APIs and Azure OpenAI |
+| Region | East US/East US2 | [Serverless APIs](../how-to/model-catalog-overview.md#serverless-apis-with-pay-as-you-go-billing) and [Azure OpenAI](/azure/ai-services/openai/overview) |
 | Tokens per minute (TPM) rate limit | 30k (180 RPM based on Azure OpenAI) <br> N/A (serverless APIs) | For Azure OpenAI models, selection is available for users with rate limit ranges based on deployment type (standard, global, global standard, and so on.) <br> For serverless APIs, this setting is abstracted. |
 | Number of requests | 128 | Serverless APIs, Azure OpenAI |
 | Prompt/Context length | Moderate length | Serverless APIs, Azure OpenAI |
 | Number of tokens processed (moderate) | 80:20 ratio for input to output tokens, that is, 800 input tokens to 200 output tokens. | Serverless APIs, Azure OpenAI |
 | Number of concurrent requests | 16 | Serverless APIs, Azure OpenAI |
 | Data | Synthetic (input prompts prepared from static text) | Serverless APIs, Azure OpenAI |
 | Deployment type | Standard | Applicable only for Azure OpenAI |
-| Streaming | True | Applies to serverless APIs and Azure OpenAI. For models deployed via managed compute, set max_token = 1 to replicate streaming scenario, which allows for calculating metrics like total time to first token (TTFT) for managed compute. |
+| Streaming | True | Applies to serverless APIs and Azure OpenAI. For models deployed via [managed compute](../how-to/model-catalog-overview.md#managed-compute), set max_token = 1 to replicate streaming scenario, which allows for calculating metrics like total time to first token (TTFT) for managed compute. |
 | Tokenizer | Tiktoken package (Azure OpenAI) <br> Hugging Face model ID (Serverless APIs) | Hugging Face model ID (Azure serverless APIs) |
 
 #### Performance metrics calculated as an aggregate
```
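The table in this diff amounts to a load-test recipe, so a sketch of how those defaults might be replayed can make them concrete. The following is a minimal, hypothetical example, not the benchmark's actual harness: it assumes an OpenAI-style streaming chat endpoint and the httpx library, and the URL, auth header, and request schema are placeholders. Only the numeric defaults (128 requests, 16 concurrent, streaming enabled, roughly 200 output tokens per request) come from the table.

```python
# Hypothetical sketch only -- not the benchmark's actual code.
# Endpoint URL, auth header, and request schema are placeholder assumptions.
import asyncio
import time

import httpx

ENDPOINT = "https://<your-deployment>/v1/chat/completions"  # placeholder URL
HEADERS = {"api-key": "<your-key>"}  # auth scheme varies by service
NUM_REQUESTS = 128   # "Number of requests" default from the table
CONCURRENCY = 16     # "Number of concurrent requests" default

async def time_to_first_token(client: httpx.AsyncClient, prompt: str) -> float:
    """Send one streaming request and time the arrival of the first chunk."""
    start = time.perf_counter()
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,  # ~80:20 input:output ratio with an 800-token prompt
        "stream": True,     # "Streaming | True" default
    }
    async with client.stream("POST", ENDPOINT, headers=HEADERS, json=payload) as resp:
        async for _chunk in resp.aiter_bytes():
            # First streamed bytes stand in for the first generated token.
            return time.perf_counter() - start
    return float("nan")  # stream ended without producing any output

async def main() -> None:
    sem = asyncio.Semaphore(CONCURRENCY)  # cap in-flight requests at 16
    async with httpx.AsyncClient(timeout=60) as client:
        async def one_request(prompt: str) -> float:
            async with sem:
                return await time_to_first_token(client, prompt)
        # Synthetic data: the same static prompt replayed for every request.
        ttfts = await asyncio.gather(
            *(one_request("static synthetic prompt") for _ in range(NUM_REQUESTS))
        )
    print(f"mean TTFT over {NUM_REQUESTS} requests: {sum(ttfts)/len(ttfts):.3f}s")

asyncio.run(main())
```

The table's streaming requirement exists precisely so that first-chunk latency can stand in for TTFT; on managed compute, where streaming isn't used, capping output at one token (the table's max_token = 1 setting) approximates the same measurement.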
