
Commit 7e5f61e

Merge pull request #3 from msakande/Frogglew-mistral-ocr
Update use-image-models.md
2 parents 0d5a985 + f1c48a5 commit 7e5f61e

File tree

1 file changed (+40, −9)


articles/ai-foundry/how-to/use-image-models.md

Lines changed: 40 additions & 9 deletions
@@ -1,7 +1,7 @@
 ---
 title: How to use image models in the model catalog
 titleSuffix: Azure AI Foundry
-description: Learn how to use image models from the AI Foundry model catalog.
+description: Learn how to use image-to-text models from the AI Foundry model catalog.
 manager: scottpolly
 author: msakande
 reviewer: frogglew
@@ -13,9 +13,13 @@ ms.reviewer: frogglew
 ms.custom: references_regions, tool_generated
 ---
 
-# How to use image models in the model catalog
+# How to use image-to-text models in the model catalog
 
-This article explains how to use _image_ models in the AI Foundry model catalog. Some models have unique parameters or data format requirements.
+This article explains how to use _image-to-text_ models in the AI Foundry model catalog.
+
+Image-to-text models analyze images and generate descriptive text based on what they see. Think of them as a combination of a camera and a writer. You provide an image as input, and the model looks at it and identifies different elements, such as objects, people, scenes, and even text. Based on its analysis, the model then generates a written description of the image, summarizing what it sees.
+
+Image-to-text models excel at use cases such as accessibility features, content organization (tagging), creating product and educational visual descriptions, and digitizing content (via optical character recognition). Image-to-text models bridge the gap between visual content and written language, making information more accessible and easier to process in various contexts.
 
 ## Prerequisites
 
@@ -31,7 +35,7 @@ To use image models in your application, you need:
 
 - The endpoint URL and key.
 
-## Use image model
+## Use image-to-text model
 
 1. Authenticate using an API key. First, deploy the model to generate the endpoint URL and an API key to authenticate against the service. In this example, the endpoint and key are strings holding the endpoint URL and the API key. You can find the endpoint URL and API key on the **Deployments + Endpoint** page once the model is deployed.
 

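The samples later in the article read the endpoint URL and API key from the `AZURE_AI_CHAT_ENDPOINT` and `AZURE_AI_CHAT_KEY` environment variables. A minimal sketch of wiring those up (both values below are placeholders, not real credentials):

```shell
# Placeholder values -- substitute the endpoint URL and API key
# shown on your Deployments + Endpoint page.
export AZURE_AI_CHAT_ENDPOINT="https://your-resource.services.ai.azure.com"
export AZURE_AI_CHAT_KEY="your-api-key"

# Each request authenticates by sending the key as a Bearer token:
auth_header="Authorization: Bearer ${AZURE_AI_CHAT_KEY}"
echo "$auth_header"
```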
@@ -65,14 +69,16 @@ To use image models in your application, you need:
     "document": {
       "type": "document_url",
       "document_name": "test",
-      "document_url": "data:application/pdf;base64,JVBER..."
+      "document_url": "data:application/pdf;base64,JVBER... <replace with your base64 encoded image data>"
     }
   }'
 ```
 
 **More code samples for Mistral OCR 25.03**
 
+**Processing PDF files**
 ```bash
+# Read the pdf file
 input_file_path="assets/2201.04234v3.pdf"
 base64_value=$(base64 "$input_file_path")
 input_base64_value="data:application/pdf;base64,${base64_value}"
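The data-URL construction in the fragment above can be sanity-checked in isolation. A minimal sketch using a hypothetical stand-in file (note that some `base64` implementations wrap long output, so stripping newlines keeps the data URL on one line):

```shell
# Stand-in document at a hypothetical path; replace with your real file.
printf '%%PDF-1.4 fake content' > /tmp/sample.pdf

# Encode the file and strip any wrapping newlines the base64 tool adds.
base64_value=$(base64 "/tmp/sample.pdf" | tr -d '\n')
input_base64_value="data:application/pdf;base64,${base64_value}"
echo "$input_base64_value"
```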
@@ -96,17 +102,42 @@ echo "$payload_body" | curl ${AZURE_AI_CHAT_ENDPOINT}/v1/ocr \
   -H "Authorization: Bearer ${AZURE_AI_CHAT_KEY}" \
   -d @- -o ocr_pdf_output.json
 ```
-
+**Processing an image file**
+```bash
+# Read the image file
+input_file_path="assets/receipt.png"
+base64_value=$(base64 "$input_file_path")
+input_base64_value="data:image/png;base64,${base64_value}"
+# echo $input_base64_value
+
+# Prepare JSON data
+payload_body=$(cat <<EOF
+{
+  "model": "mistral-ocr-2503",
+  "document": {
+    "type": "image_url",
+    "image_url": "$input_base64_value"
+  },
+  "include_image_base64": true
+}
+EOF
+)
+
+# Process the base64 data with the ocr endpoint
+echo "$payload_body" | curl ${AZURE_AI_CHAT_ENDPOINT}/v1/ocr \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer ${AZURE_AI_CHAT_KEY}" \
+  -d @- -o ocr_png_output.json
+```
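Once a response is saved, the recognized text can be pulled back out of the JSON file. The `pages` array and per-page `markdown` field used below are an assumption about the response shape (verify them against your actual `ocr_pdf_output.json`), and the hard-coded sample response stands in for a real API call:

```shell
# Stand-in for a saved OCR response (assumed shape: pages[].markdown).
printf '%s' '{"pages":[{"index":0,"markdown":"# Receipt\nTotal: 12.50"}]}' > sample_ocr_output.json

# Extract the recognized text from every page.
python3 -c '
import json
with open("sample_ocr_output.json") as f:
    response = json.load(f)
print("\n\n".join(page["markdown"] for page in response["pages"]))
'
```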
 
 ## Model-specific parameters
 
-Some image models only support specific data formats. Mistral OCR 25.03, for example, requires `base64 encoded image data` for their `document_url` parameter. The following table lists the supported and unsupported data formats for image models in the model catalog.
+Some image-to-text models only support specific data formats. Mistral OCR 25.03, for example, requires `base64 encoded image data` for its `document_url` parameter. The following table lists the supported and unsupported data formats for image-to-text models in the model catalog.
 
 | Model | Supported | Not supported |
 | :---- | ----- | ----- |
 | Mistral OCR 25.03 | base64 encoded image data | document url, image url |
-| dall-e-3 | document url, image url, b64_json | base64 encoded image data |
-| gpt-image-1 | base64 encoded image data, image url | document url |
+
 
 ## Related content
