You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-services/openai/how-to/gpt-with-vision.md
+5-5Lines changed: 5 additions & 5 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -347,18 +347,18 @@ Every response includes a `"finish_details"` field. The subfield `"type"` has th
347
347
348
348
If `finish_details.type` is `stop`, then there is another `"stop"` property that specifies the token that caused the output to end.
349
349
350
-
## Low or high fidelity image understanding
350
+
## Detail parameter settings in image processing: Low, High, Auto
351
351
352
-
By controlling the _detail_ parameter, which has two options,`low`or `high`, you can control how the model processes the image and generates its textual understanding.
353
-
-`low`disables the "high res" mode. The model receives a low-res 512x512 version of the image and represents the image with a budget of 65 tokens. This allows the API to return faster responses and consume fewer input tokens for use cases that don't require high detail.
354
-
-`high`enables "high res" mode, which first allows the model to see the low res image and then creates detailed crops of input images as 512x512 squares based on the input image size. Each of the detailed crops uses twice the token budget (65 tokens) for a total of 129 tokens.
352
+
The detail parameter in the model offers three choices:`low`, `high`, or `auto`, to adjust the way the model interprets and processes images. The default setting is auto, where the model decides between low or high based on the size of the image input.
353
+
-`low`setting: the model does not activate the "high res" mode, instead processing a lower resolution 512x512 version of the image using 65 tokens, resulting in quicker responses and reduced token consumption for scenarios where fine detail isn't crucial.
354
+
-`high`setting activates "high res" mode. Here, the model initially views the low-resolution image and then generates detailed 512x512 segments from the input image. Each segment uses double the token budget, amounting to 129 tokens per segment, allowing for a more detailed interpretation of the image.
355
355
356
356
## Limitations
357
357
358
358
### Image support
359
359
360
360
-**Limitation on image enhancements per chat session**: Enhancements cannot be applied to multiple images within a single chat call.
361
-
-**Maximum input image size**: The maximum size for input images is restricted to 4 MB.
361
+
-**Maximum input image size**: The maximum size for input images is restricted to 20 MB.
362
362
-**Object grounding in enhancement API**: When the enhancement API is used for object grounding, and the model detects duplicates of an object, it will generate one bounding box and label for all the duplicates instead of separate ones for each.
363
363
-**Low resolution accuracy**: When images are analyzed using the "low resolution" setting, it allows for faster responses and uses fewer input tokens for certain use cases. However, this could impact the accuracy of object and text recognition within the image.
364
364
-**Image chat restriction**: When uploading images in the chat playground or the API, there is a limit of 10 images per chat call.
0 commit comments