articles/ai-services/openai/concepts/gpt-with-vision.md
Lines changed: 8 additions & 7 deletions
@@ -62,15 +62,16 @@ Base Pricing for GPT-4 Turbo with Vision is:
 
 See the [Tokens section of the overview](/azure/ai-services/openai/overview#tokens) for information on how text and images translate to tokens.
 
-If you turn on Enhancements to employ Azure Computer Vision foundational models to enhance the capabilities with GPT-4 Turbo with Vision, this does incur additional costs.
-- Any image with text will incur usage for the **Enhanced add on features for Optical Character Recognition**: $1.50 per 1000 transactions
-- Any image with objects detected will incur usage for the **Enhanced add-on features for Object Grounding**: $1.50 per 1000 transactions
+If you turn on Enhancements, additional usage applies for using GPT-4 Turbo with Vision with Azure AI Vision functionality.
 
-Additionally, if you use video prompt integration with the Video Retrieval add-on, it accrues other costs:
-- Ingestion: $0.05 per minute of video
-- Transactions: $0.25 per 1000 queries of the Video Retrieval index
+| Model | Price |
+|-----------------|-----------------|
+| + Enhanced add-on features for OCR | $1.5 per 1000 transactions |
+| + Enhanced add-on features for Object Detection | $1.5 per 1000 transactions |
+| + Enhanced add-on feature for “Add your Image” Image Embeddings | $1.5 per 1000 transactions |
+| + Enhanced add-on feature for “Video Retrieval” integration **<sup>1</sup>**| Ingestion: $0.05 per minute of video <br>Transactions: $0.25 per 1000 queries of the Video Retrieval index |
 
-Processing videos involves the use of extra tokens to identify key frames for analysis. The number of these additional tokens will be roughly equivalent to the sum of the tokens in the text input, plus 700 tokens.
+**<sup>1</sup>**Processing videos involves the use of extra tokens to identify key frames for analysis. The number of these additional tokens will be roughly equivalent to the sum of the tokens in the text input, plus 700 tokens.
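
The add-on prices in this change can be turned into a rough cost estimate. The sketch below is illustrative only: the function and constant names are hypothetical, the prices are copied from the diff, and it assumes usage is billed pro rata rather than rounded up to whole blocks of 1000 transactions, which the source does not state. It also reads the footnote as "extra tokens ≈ text input tokens + 700", which is one interpretation of the wording.

```python
# Prices in USD, copied from the pricing table in this diff.
OCR_PER_1000 = 1.50
OBJECT_DETECTION_PER_1000 = 1.50
IMAGE_EMBEDDINGS_PER_1000 = 1.50
VIDEO_INGESTION_PER_MINUTE = 0.05
VIDEO_QUERIES_PER_1000 = 0.25
VIDEO_EXTRA_TOKEN_OVERHEAD = 700  # "plus 700 tokens" from the footnote


def estimate_enhancement_cost(
    ocr_transactions: int = 0,
    object_detection_transactions: int = 0,
    image_embedding_transactions: int = 0,
    video_minutes: float = 0.0,
    video_queries: int = 0,
) -> float:
    """Estimate add-on cost in USD.

    Assumption (not confirmed by the source): billing is linear,
    i.e. 500 transactions cost half as much as 1000.
    """
    return (
        ocr_transactions / 1000 * OCR_PER_1000
        + object_detection_transactions / 1000 * OBJECT_DETECTION_PER_1000
        + image_embedding_transactions / 1000 * IMAGE_EMBEDDINGS_PER_1000
        + video_minutes * VIDEO_INGESTION_PER_MINUTE
        + video_queries / 1000 * VIDEO_QUERIES_PER_1000
    )


def estimate_video_extra_tokens(text_input_tokens: int) -> int:
    """Approximate extra tokens used to identify key frames.

    Assumption: the footnote means text input tokens plus 700.
    """
    return text_input_tokens + VIDEO_EXTRA_TOKEN_OVERHEAD


if __name__ == "__main__":
    cost = estimate_enhancement_cost(
        ocr_transactions=2000,
        object_detection_transactions=1000,
        video_minutes=10,
        video_queries=4000,
    )
    print(f"Estimated add-on cost: ${cost:.2f}")
    print(f"Estimated extra video tokens: {estimate_video_extra_tokens(300)}")
```

These figures cover only the Enhancements add-ons; base GPT-4 Turbo with Vision token pricing is billed separately and is not modeled here.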