Commit f782f2a

updated statement on concurrency

1 parent: 852523d

File tree: 1 file changed (+1, -1 lines)


articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ We use a variation of the leaky bucket algorithm to maintain utilization below 1

 #### How many concurrent calls can I have on my deployment?

-The number of concurrent calls you can have at one time is dependent on each call's shape. The service will continue to accept calls until the utilization is above 100%. To determine the approximate number of concurrent calls you can model out the maximum requests per minute for a particular call shape in the [capacity calculator](https://oai.azure.com/portal/calculator). If `max_tokens` is empty, you can assume a value of 1000
+The number of concurrent calls you can have at one time is dependent on each call's shape. The service will continue to accept calls until the utilization is above 100%. To determine the approximate number of concurrent calls you can model out the maximum requests per minute for a particular call shape in the [capacity calculator](https://oai.azure.com/portal/calculator). This will be the worst case scenario when the `max_tokens` parameter is set to match the sized generation token size. The service may take higher concurrency in some situations.

 ## Next steps
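The rewritten paragraph says to derive an approximate concurrency figure by modeling the maximum requests per minute for a call shape in the capacity calculator. As a rough back-of-the-envelope sketch (not the service's actual admission logic), that conversion can be approximated with Little's Law: concurrent calls ≈ arrival rate × average call duration. The function name and the example numbers below are illustrative assumptions, not values from the documentation.

```python
def estimate_max_concurrency(max_requests_per_minute: float,
                             avg_call_duration_seconds: float) -> float:
    """Rough estimate via Little's Law:
    concurrency ~= arrival rate (req/s) * average time in system (s)."""
    requests_per_second = max_requests_per_minute / 60.0
    return requests_per_second * avg_call_duration_seconds

# Hypothetical example: the calculator models 120 RPM for a call shape,
# and each call takes ~15 s end to end.
print(estimate_max_concurrency(120, 15))  # 30.0
```

Because the diff notes the modeled figure is a worst case (with `max_tokens` set to the full generation size), shorter real-world generations would finish faster and allow higher effective concurrency.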
