
Commit bb2b398

Merge pull request #261148 from PatrickFarley/openai-gptnext
Openai gptnext
2 parents 8622cf7 + 677453b commit bb2b398

2 files changed: +15 −15 lines changed


articles/ai-services/openai/how-to/gpt-with-vision.md

Lines changed: 5 additions & 5 deletions
@@ -347,18 +347,18 @@ Every response includes a `"finish_details"` field. The subfield `"type"` has th

If `finish_details.type` is `stop`, then there is another `"stop"` property that specifies the token that caused the output to end.

-## Low or high fidelity image understanding
+## Detail parameter settings in image processing: Low, High, Auto

-By controlling the _detail_ parameter, which has two options, `low` or `high`, you can control how the model processes the image and generates its textual understanding.
-- `low` disables the "high res" mode. The model receives a low-res 512x512 version of the image and represents the image with a budget of 65 tokens. This allows the API to return faster responses and consume fewer input tokens for use cases that don't require high detail.
-- `high` enables "high res" mode, which first allows the model to see the low res image and then creates detailed crops of input images as 512x512 squares based on the input image size. Each of the detailed crops uses twice the token budget (65 tokens) for a total of 129 tokens.
+The detail parameter in the model offers three choices, `low`, `high`, or `auto`, to adjust the way the model interprets and processes images. The default setting is `auto`, where the model decides between `low` and `high` based on the size of the image input.
+- `low` setting: the model does not activate "high res" mode; instead it processes a lower-resolution 512x512 version of the image using 65 tokens, resulting in quicker responses and reduced token consumption for scenarios where fine detail isn't crucial.
+- `high` setting: activates "high res" mode. Here, the model initially views the low-resolution image and then generates detailed 512x512 segments from the input image. Each segment uses double the token budget, amounting to 129 tokens per segment, allowing for a more detailed interpretation of the image.

## Limitations

### Image support

- **Limitation on image enhancements per chat session**: Enhancements cannot be applied to multiple images within a single chat call.
-- **Maximum input image size**: The maximum size for input images is restricted to 4 MB.
+- **Maximum input image size**: The maximum size for input images is restricted to 20 MB.
- **Object grounding in enhancement API**: When the enhancement API is used for object grounding, and the model detects duplicates of an object, it will generate one bounding box and label for all the duplicates instead of separate ones for each.
- **Low resolution accuracy**: When images are analyzed using the "low resolution" setting, it allows for faster responses and uses fewer input tokens for certain use cases. However, this could impact the accuracy of object and text recognition within the image.
- **Image chat restriction**: When uploading images in the chat playground or the API, there is a limit of 10 images per chat call.
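
To make the token arithmetic in the new detail-parameter section concrete, here is a minimal sketch based only on the budgets quoted above (65 tokens for the low-resolution view, 129 tokens per detailed 512x512 segment). Whether the initial low-resolution pass in `high` mode adds its own 65 tokens, and how the segment count is derived from the image size, aren't spelled out here, so both are treated as explicit assumptions:

```python
def estimate_image_tokens(detail: str, n_segments: int = 1) -> int:
    """Rough per-image input-token estimate using the budgets quoted above.

    detail:     "low" or "high" ("auto" lets the service pick one of the two).
    n_segments: number of detailed 512x512 segments created in "high" mode;
                the article doesn't say how this is derived from image size,
                so it is left as an input (assumption).
    """
    LOW_RES_VIEW = 65   # budget for the 512x512 low-resolution view
    PER_SEGMENT = 129   # budget quoted per detailed 512x512 segment

    if detail == "low":
        return LOW_RES_VIEW
    if detail == "high":
        # Assumption: the initial low-resolution view is billed in addition
        # to the detailed segments.
        return LOW_RES_VIEW + n_segments * PER_SEGMENT
    raise ValueError("pass 'low' or 'high'; 'auto' resolves to one of these")


print(estimate_image_tokens("low"))                  # 65
print(estimate_image_tokens("high", n_segments=4))   # 65 + 4 * 129 = 581
```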

articles/ai-services/openai/includes/gpt-v-rest.md

Lines changed: 10 additions & 10 deletions
@@ -61,11 +61,11 @@ Create a new Python file named _quickstart.py_. Open the new file in your prefer
endpoint = f"{base_url}/chat/completions?api-version=2023-12-01-preview"
data = {
    "messages": [
-        { "role": "system", "content": "You are a helpful assistant." }, # Content can be a string, OR
-        { "role": "user", "content": [ # It can be an array containing strings and images.
-            "Describe this picture:",
-            { "image": "<base_64_encoded_image>" } # Images are represented like this.
-        ] }
+        { "role": "system", "content": "You are a helpful assistant." },
+        { "role": "user", "content": [
+            { "type": "text", "text": "Describe this picture:" },
+            { "type": "image_url", "url": "<URL or base-64-encoded image>" }
+        ] }
    ],
    "max_tokens": 100
}
@@ -136,11 +136,11 @@ The **object grounding** integration brings a new layer to data analysis and use
        }
    }],
    "messages": [
-        { "role": "system", "content": "You are a helpful assistant." }, # Content can be a string, OR
-        { "role": "user", "content": [ # It can be an array containing strings and images.
-            "Describe this picture:",
-            { "image": "<base_64_encoded_image>" } # Images are represented like this.
-        ]}
+        { "role": "system", "content": "You are a helpful assistant." },
+        { "role": "user", "content": [
+            { "type": "text", "text": "Describe this picture:" },
+            { "type": "image_url", "url": "<URL or base-64-encoded image>" }
+        ]}
    ],
    "max_tokens": 100
}
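
The same content-array change appears in both snippets: each part now carries an explicit `"type"` (`text` or `image_url`) instead of mixing bare strings with `{ "image": ... }` objects. As a hypothetical end-to-end version of the updated request, the sketch below guesses at the surrounding quickstart code that isn't shown in this diff; the `api-key` header, the shape of `base_url`, and the placeholder values are assumptions, not taken from the changed lines:

```python
import json

import requests  # third-party; pip install requests

# Assumed placeholders; real values come from your Azure OpenAI resource.
api_key = "<your-api-key>"
base_url = "https://<resource>.openai.azure.com/openai/deployments/<deployment>"  # assumed shape

endpoint = f"{base_url}/chat/completions?api-version=2023-12-01-preview"
headers = {"Content-Type": "application/json", "api-key": api_key}  # assumed auth header

data = {
    "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": [
            { "type": "text", "text": "Describe this picture:" },
            { "type": "image_url", "url": "<URL or base-64-encoded image>" }
        ] }
    ],
    "max_tokens": 100
}

response = requests.post(endpoint, headers=headers, data=json.dumps(data))
print(response.json())
```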
