> You can use the Object detection feature through the [Azure OpenAI](/azure/ai-services/openai/overview) service. The **GPT-4 Turbo with Vision** model lets you chat with an AI assistant that can analyze the images you share, and the Vision Enhancement option uses Image Analysis to provide the AI assistance with more details (readable text and object locations) about the image. For more information, see the [GPT-4 Turbo with Vision quickstart](/azure/ai-services/openai/gpt-v-quickstart).
articles/ai-services/computer-vision/concept-ocr.md
Lines changed: 0 additions & 2 deletions
@@ -23,8 +23,6 @@ OCR is a machine-learning-based technique for extracting text from in-the-wild a
 
 The new Azure AI Vision Image Analysis 4.0 REST API offers the ability to extract printed or handwritten text from images in a unified performance-enhanced synchronous API that makes it easy to get all image insights including OCR results in a single API operation. The Read OCR engine is built on top of multiple deep learning models supported by universal script-based models for [global language support](./language-support.md).
 
-> [!TIP]
-> You can also use the OCR feature in conjunction with the [Azure OpenAI](/azure/ai-services/openai/overview) service. The **GPT-4 Turbo with Vision** model lets you chat with an AI assistant that can analyze the images you share, and the Vision Enhancement option uses Image Analysis to give the AI assistant more details (readable text and object locations) about the image. For more information, see the [GPT-4 Turbo with Vision quickstart](/azure/ai-services/openai/gpt-v-quickstart).
articles/ai-services/computer-vision/overview-image-analysis.md
Lines changed: 0 additions & 2 deletions
@@ -61,8 +61,6 @@ You can analyze images to provide insights about their visual features and chara
 |**Detect the color scheme** (v3.2 only) |Analyze color usage within an image. Azure AI Vision can determine whether an image is black & white or color and, for color images, identify the dominant and accent colors.|[Detect the color scheme](concept-detecting-color-schemes.md)|
 |**Moderate content in images** (v3.2 only) |You can use Azure AI Vision to detect adult content in an image and return confidence scores for different classifications. The threshold for flagging content can be set on a sliding scale to accommodate your preferences.|[Detect adult content](concept-detecting-adult-content.md)|
 
-> [!TIP]
-> You can leverage the Read text and Object detection features of Image Analysis through the [Azure OpenAI](/azure/ai-services/openai/overview) service. The **GPT-4 Turbo with Vision** model lets you chat with an AI assistant that can analyze the images you share, and the Vision Enhancement option uses Image Analysis to give the AI assistant more details about the image (readable text and object locations). For more information, see the [GPT-4 Turbo with Vision quickstart](/azure/ai-services/openai/gpt-v-quickstart).
articles/ai-services/openai/concepts/gpt-with-vision.md
Lines changed: 0 additions & 36 deletions
@@ -20,31 +20,6 @@ To try out GPT-4 Turbo with Vision, see the [quickstart](/azure/ai-services/open
 
 The GPT-4 Turbo with Vision model answers general questions about what's present in the images or videos you upload.
 
-## Enhancements
-
-Enhancements let you incorporate other Azure AI services (such as Azure AI Vision) to add new functionality to the chat-with-vision experience.
-
-> [!IMPORTANT]
-> To use Vision enhancement, you need a Computer Vision resource. It must be in the paid (S1) tier and in the same Azure region as your GPT-4 Turbo with Vision resource.
-
-> [!IMPORTANT]
-> Vision enhancements are not supported by the GPT-4 Turbo GA model. They are only available with the preview models.
-
-**Object grounding**: Azure AI Vision complements GPT-4 Turbo with Vision’s text response by identifying and locating salient objects in the input images. This lets the chat model give more accurate and detailed responses about the contents of the image.
-
-:::image type="content" source="../media/concepts/gpt-v/object-grounding.png" alt-text="Screenshot of an image with object grounding applied. Objects have bounding boxes with labels.":::
-
-:::image type="content" source="../media/concepts/gpt-v/object-grounding-response.png" alt-text="Screenshot of a chat response to an image prompt about an outfit. The response is an itemized list of clothing items seen in the image.":::
-
-**Optical Character Recognition (OCR)**: Azure AI Vision complements GPT-4 Turbo with Vision by providing high-quality OCR results as supplementary information to the chat model. It allows the model to produce higher quality responses for images with dense text, transformed images, and numbers-heavy financial documents, and increases the variety of languages the model can recognize in text.
-
-:::image type="content" source="../media/concepts/gpt-v/receipts.png" alt-text="Photo of several receipts.":::
-
-:::image type="content" source="../media/concepts/gpt-v/ocr-response.png" alt-text="Screenshot of the JSON response of an OCR call.":::
-
-**Video prompt**: The **video prompt** enhancement lets you use video clips as input for AI chat, enabling the model to generate summaries and answers about video content. It uses Azure AI Vision Video Retrieval to sample a set of frames from a video and create a transcript of the speech in the video.
@@ -59,15 +34,6 @@ Base Pricing for GPT-4 Turbo with Vision is:
 
 See the [Tokens section of the overview](/azure/ai-services/openai/overview#tokens) for information on how text and images translate to tokens.
 
-If you turn on Enhancements, additional usage applies for using GPT-4 Turbo with Vision with Azure AI Vision functionality.
-
-| Model | Price |
-|-----------------|-----------------|
-| + Enhanced add-on features for OCR | $1.5 per 1000 transactions |
-| + Enhanced add-on features for Object Detection | $1.5 per 1000 transactions |
-| + Enhanced add-on feature for “Video Retrieval” integration **<sup>1</sup>**| Ingestion: $0.05 per minute of video <br>Transactions: $0.25 per 1000 queries of the Video Retrieval index |
-
-**<sup>1</sup>** Processing videos involves the use of extra tokens to identify key frames for analysis. The number of these additional tokens will be roughly equivalent to the sum of the tokens in the text input, plus 700 tokens.
 
 ### Example image price calculation
 > [!IMPORTANT]
@@ -108,9 +74,7 @@ This section describes the limitations of GPT-4 Turbo with Vision.
 
 ### Image support
 
-- **Limitation on image enhancements per chat session**: Enhancements cannot be applied to multiple images within a single chat call.
 - **Maximum input image size**: The maximum size for input images is restricted to 20 MB.
-- **Object grounding in enhancement API**: When the enhancement API is used for object grounding, and the model detects duplicates of an object, it will generate one bounding box and label for all the duplicates instead of separate ones for each.
 - **Low resolution accuracy**: When images are analyzed using the "low resolution" setting, it allows for faster responses and uses fewer input tokens for certain use cases. However, this could impact the accuracy of object and text recognition within the image.
 - **Image chat restriction**: When you upload images in Azure OpenAI Studio or the API, there is a limit of 10 images per chat call.
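
Reviewer note: the two limits that remain in the "Image support" list after this change (20 MB per input image, 10 images per chat call) could be checked client-side before a request is sent. The following is a minimal sketch, not part of the commit or of any Azure SDK. The function name `check_image_limits` and both constants are hypothetical, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption; the docs do not say whether decimal or binary megabytes are meant.

```python
# Hypothetical pre-flight check for the documented image limits.
# Assumption: "20 MB" is interpreted as 20 MiB; the docs do not specify.
MAX_IMAGE_BYTES = 20 * 1024 * 1024
MAX_IMAGES_PER_CALL = 10


def check_image_limits(image_sizes: list[int]) -> list[str]:
    """Return human-readable violations; an empty list means within limits.

    image_sizes holds the size in bytes of each image in one chat call.
    """
    problems = []
    if len(image_sizes) > MAX_IMAGES_PER_CALL:
        problems.append(
            f"{len(image_sizes)} images exceeds the limit of "
            f"{MAX_IMAGES_PER_CALL} per chat call"
        )
    for index, size in enumerate(image_sizes):
        if size > MAX_IMAGE_BYTES:
            problems.append(
                f"image {index} is {size} bytes, over the "
                f"{MAX_IMAGE_BYTES}-byte limit"
            )
    return problems
```

A caller might run this on the byte lengths of the images it is about to attach and skip the request if the returned list is non-empty.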