Commit 49143c8

authored
Update gpt-with-vision.md
Updates to pricing example structure
1 parent 149b520 commit 49143c8

1 file changed, +8 -7 lines changed

articles/ai-services/openai/concepts/gpt-with-vision.md

Lines changed: 8 additions & 7 deletions
@@ -62,15 +62,16 @@ Base Pricing for GPT-4 Turbo with Vision is:

 See the [Tokens section of the overview](/azure/ai-services/openai/overview#tokens) for information on how text and images translate to tokens.

-If you turn on Enhancements to employ Azure Computer Vision foundational models to enhance the capabilities with GPT-4 Turbo with Vision, this does incur additional costs.
-- Any image with text will incur usage for the **Enhanced add on features for Optical Character Recognition**: $1.50 per 1000 transactions
-- Any image with objects detected will incur usage for the **Enhanced add-on features for Object Grounding**: $1.50 per 1000 transactions
+If you turn on Enhancements, additional usage applies for using GPT-4 Turbo with Vision with Azure AI Vision functionality.

-Additionally, if you use video prompt integration with the Video Retrieval add-on, it accrues other costs:
-- Ingestion: $0.05 per minute of video
-- Transactions: $0.25 per 1000 queries of the Video Retrieval index
+| Model | Price |
+|-----------------|-----------------|
+| + Enhanced add-on features for OCR | $1.5 per 1000 transactions |
+| + Enhanced add-on features for Object Detection | $1.5 per 1000 transactions |
+| + Enhanced add-on feature for “Add your Image” Image Embeddings | $1.5 per 1000 transactions |
+| + Enhanced add-on feature for “Video Retrieval” integration **<sup>1</sup>** | Ingestion: $0.05 per minute of video <br>Transactions: $0.25 per 1000 queries of the Video Retrieval index |

-Processing videos involves the use of extra tokens to identify key frames for analysis. The number of these additional tokens will be roughly equivalent to the sum of the tokens in the text input, plus 700 tokens.
+**<sup>1</sup>** Processing videos involves the use of extra tokens to identify key frames for analysis. The number of these additional tokens will be roughly equivalent to the sum of the tokens in the text input, plus 700 tokens.

 ### Example image price calculation
 > [!IMPORTANT]