Commit d4f8558

Merge pull request #266454 from ChrisHMSFT/chrhoder/20240215PTUupdate
[Azure OpenAI] updated statement on concurrency
2 parents 8eff7d6 + 25413d8

1 file changed: +1 -1 lines changed


articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ We use a variation of the leaky bucket algorithm to maintain utilization below 1

#### How many concurrent calls can I have on my deployment?

- The number of concurrent calls you can have at one time is dependent on each call's shape. The service will continue to accept calls until the utilization is above 100%. To determine the approximate number of concurrent calls you can model out the maximum requests per minute for a particular call shape in the [capacity calculator](https://oai.azure.com/portal/calculator). If `max_tokens` is empty, you can assume a value of 1000
+ The number of concurrent calls you can achieve depends on each call's shape (prompt size, `max_tokens` parameter, etc.). The service will continue to accept calls until utilization reaches 100%. To determine the approximate number of concurrent calls, you can model out the maximum requests per minute for a particular call shape in the [capacity calculator](https://oai.azure.com/portal/calculator). If the system generates fewer sampling tokens than `max_tokens`, it will accept more requests.

## Next steps
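The updated text models concurrency from a maximum requests-per-minute figure for a given call shape. One way to sketch that relationship is Little's law (concurrency ≈ arrival rate × time in system); the RPM and latency values below are hypothetical placeholders, not outputs of the capacity calculator:

```python
# Hedged sketch: estimate how many calls are in flight at once, assuming
# a maximum requests-per-minute (RPM) for a particular call shape and an
# average end-to-end call duration. Both inputs are hypothetical -- real
# values come from the capacity calculator and your own latency data.

def estimate_concurrency(max_rpm: float, avg_call_seconds: float) -> float:
    """Little's law: concurrency = arrival rate * time in system."""
    requests_per_second = max_rpm / 60.0
    return requests_per_second * avg_call_seconds

# Example: 300 RPM capacity for this call shape, ~8 s per call.
print(estimate_concurrency(300, 8.0))  # -> 40.0
```

Shorter generations (fewer sampling tokens than `max_tokens`) shrink the average call duration, which is consistent with the diff's point that the service can then accept more requests.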
