
Commit 8c76d8b

Merge pull request #277989 from mrbullwinkle/mrb_06_12_2024_max_tokens

[Azure OpenAI] Clarify default max token limit for vision models

2 parents: 6596993 + 9729fb1

File tree: 2 files changed (+8 −2 lines)

articles/ai-services/openai/faq.yml

Lines changed: 6 additions & 1 deletion
```diff
@@ -7,7 +7,7 @@ metadata:
   manager: nitinme
   ms.service: azure-ai-openai
   ms.topic: faq
-  ms.date: 04/24/2024
+  ms.date: 06/12/2024
   ms.author: mbullwin
   author: mrbullwinkle
   title: Azure OpenAI Service frequently asked questions
@@ -228,6 +228,11 @@ sections:
         What are the known limitations of GPT-4 Turbo with Vision?
       answer: |
         See the [limitations](./concepts/gpt-with-vision.md#limitations) section of the GPT-4 Turbo with Vision concepts guide.
+    - question: |
+        I keep getting truncated responses when I use GPT-4 Turbo vision models. Why is this happening?
+      answer:
+        By default, GPT-4 `vision-preview` and GPT-4 `turbo-2024-04-09` have a `max_tokens` value of 16. Depending on your request, this value is often too low and can lead to truncated responses. To resolve this issue, pass a larger `max_tokens` value as part of your chat completions API requests. GPT-4o defaults to a `max_tokens` value of 4096.
+
   - name: Assistants
     questions:
     - question: |
```
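The fix the new FAQ entry describes, passing a larger `max_tokens` with every chat completions call, can be sketched as plain request kwargs for the chat completions API. This is a minimal sketch: the deployment name, prompt, and image URL are hypothetical placeholders, and the actual network call is shown commented out because it needs a live Azure OpenAI resource and client.

```python
def build_vision_request(prompt: str, image_url: str, max_tokens: int = 4096) -> dict:
    """Assemble chat-completions kwargs for a GPT-4 Turbo with Vision deployment.

    Explicitly sets max_tokens to override the 16-token default that causes
    truncated responses on `vision-preview` and `turbo-2024-04-09`.
    """
    return {
        "model": "my-gpt4-turbo-deployment",  # hypothetical deployment name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        # The key fix: without this, vision model responses stop after 16 tokens.
        "max_tokens": max_tokens,
    }

# With an AzureOpenAI client from the `openai` Python package, the call would be:
# response = client.chat.completions.create(
#     **build_vision_request("Describe this image.", "https://example.com/photo.png")
# )
```

The same `max_tokens` field applies when calling the REST endpoint directly; the dict above matches the request body shape.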

articles/ai-services/openai/quotas-limits.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -10,7 +10,7 @@ ms.custom:
   - ignite-2023
   - references_regions
 ms.topic: conceptual
-ms.date: 06/05/2024
+ms.date: 06/12/2024
 ms.author: mbullwin
 ---
 
@@ -46,6 +46,7 @@ The following sections provide you with a quick guide to the default quotas and
 | Max file size for Assistants & fine-tuning | 512 MB |
 | Assistants token limit | 2,000,000 token limit |
 | GPT-4o max images per request (# of images in the messages array/conversation history) | 10 |
+| GPT-4 `vision-preview` & GPT-4 `turbo-2024-04-09` default max tokens | 16 <br><br> Increase the `max_tokens` parameter value to avoid truncated responses. GPT-4o max tokens defaults to 4096. |
 
 ## Regional quota limits
 
```
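To confirm whether the default limit in the quota row above is what is cutting off a response, check the `finish_reason` field on the chat completions result: the API reports `"length"` when generation stopped because `max_tokens` was reached. A minimal sketch (the helper name and the inline response fragments are illustrative, not from this commit):

```python
def was_truncated(response: dict) -> bool:
    """True when any choice stopped because it hit max_tokens ("length")."""
    return any(
        choice.get("finish_reason") == "length"
        for choice in response.get("choices", [])
    )

# A response shaped like one cut off by the old 16-token default:
clipped = {"choices": [{"finish_reason": "length",
                        "message": {"content": "The image shows a"}}]}
# A response that finished normally:
complete = {"choices": [{"finish_reason": "stop",
                         "message": {"content": "The image shows a red bicycle."}}]}

print(was_truncated(clipped))   # True
print(was_truncated(complete))  # False
```

When this check returns `True`, retrying with a larger `max_tokens` value, as the FAQ entry in this commit advises, is the fix.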