Commit bdd84b5

Merge pull request #121664 from nkgami/patch-40
Fix wrong parameter name
2 parents a0a18b1 + bb4b62f

File tree: 1 file changed (+2 −2)
articles/ai-services/openai/how-to/latency.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -59,7 +59,7 @@ Latency varies based on what model you're using. For an identical request, expec
 When you send a completion request to the Azure OpenAI endpoint, your input text is converted to tokens that are then sent to your deployed model. The model receives the input tokens and then begins generating a response. It's an iterative sequential process, one token at a time. Another way to think of it is like a for loop with `n tokens = n iterations`. For most models, generating the response is the slowest step in the process.
 
 At the time of the request, the requested generation size (max_tokens parameter) is used as an initial estimate of the generation size. The compute-time for generating the full size is reserved by the model as the request is processed. Once the generation is completed, the remaining quota is released. Ways to reduce the number of tokens:
-- Set the `max_token` parameter on each call as small as possible.
+- Set the `max_tokens` parameter on each call as small as possible.
 - Include stop sequences to prevent generating extra content.
 - Generate fewer responses: The best_of & n parameters can greatly increase latency because they generate multiple outputs. For the fastest response, either don't specify these values or set them to 1.
```
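For reference, the settings this hunk describes look roughly like the following with the `openai` Python SDK (v1+). This is a minimal sketch, not part of the commit; the endpoint, API key, API version, and deployment name are placeholders:

```python
# Minimal sketch: bound generation with max_tokens, stop, and n.
# Endpoint, key, API version, and deployment name below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-API-KEY",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",
    messages=[{"role": "user", "content": "Summarize this ticket in two sentences."}],
    max_tokens=100,  # small upper bound: less compute time reserved per request
    stop=["\n\n"],   # stop sequence cuts off extra content early
    n=1,             # single output; n > 1 (or best_of on the legacy
                     # Completions API) generates multiple outputs
)
print(response.choices[0].message.content)
```

Note that the fixed parameter name is `max_tokens` (plural); `max_token` is not accepted by the API, which is what this commit corrects in the doc.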

```diff
@@ -136,4 +136,4 @@ Time from the first token to the last token, divided by the number of generated
 
 * **Streaming**: Enabling streaming can be useful in managing user expectations in certain situations by allowing the user to see the model response as it is being generated rather than having to wait until the last token is ready.
 
-* **Content Filtering** improves safety, but it also impacts latency. Evaluate if any of your workloads would benefit from [modified content filtering policies](./content-filters.md).
+* **Content Filtering** improves safety, but it also impacts latency. Evaluate if any of your workloads would benefit from [modified content filtering policies](./content-filters.md).
```
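The streaming option mentioned in this hunk can be sketched as below, assuming the same `client` and placeholder deployment name as in the earlier sketch:

```python
# Minimal sketch: stream tokens as they are generated so the user sees
# partial output immediately instead of waiting for the last token.
stream = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
    stream=True,
)
for chunk in stream:
    # Some chunks can arrive with an empty choices list (for example,
    # content-filter annotations on Azure), so guard before reading the delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```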
