Commit bdc8a1c

Commit message: update
1 parent e41ba34 commit bdc8a1c

File tree: 1 file changed (+1 −1 lines changed)

1 file changed

+1
-1
lines changed

articles/ai-services/openai/how-to/latency.md

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ When you send a completion request to the Azure OpenAI endpoint, your input text

  At the time of the request, the requested generation size (the `max_tokens` parameter) is used as an initial estimate of the generation size. The compute time for generating the full size is reserved by the model as the request is processed. Once the generation is completed, the remaining quota is released. Ways to reduce the number of tokens:

  - Set the `max_tokens` parameter on each call as small as possible.
  - Include stop sequences to prevent generating extra content.
- - Generate fewer responses: The `best_of` and `n` parameters can greatly increase latency because they generate multiple outputs. For the fastest response, either don't specify these values or set them to 1.
+ - Generate fewer responses: The `best_of` and `n` parameters can greatly increase latency because they generate multiple outputs. For the fastest response, either don't specify these values or set them to 1.

  In summary, reducing the number of tokens generated per request reduces the latency of each request.
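A minimal sketch of the settings this section recommends, assuming the OpenAI Python SDK's chat-completions parameter names (`max_tokens`, `stop`, `n`); the deployment name, prompt, and helper function are placeholders, not part of the documented change:

```python
# Sketch: build a low-latency completion request, assuming the OpenAI
# Python SDK parameter names. "my-deployment" and the prompt are placeholders.
def low_latency_request(prompt: str) -> dict:
    return {
        "model": "my-deployment",  # Azure OpenAI deployment name (placeholder)
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 50,  # keep the requested generation size as small as possible
        "stop": ["\n\n"],  # stop sequence prevents generating extra content
        "n": 1,            # single output; leaving best_of unset avoids extra generations
    }

params = low_latency_request("Summarize the request in one sentence.")
# client.chat.completions.create(**params)  # real call needs an AzureOpenAI client
```

Because `max_tokens` reserves compute time for the full requested size, tightening it (and relying on `stop` rather than a large cap) shortens both the reservation and the generation itself.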
