
Commit 3e3fb04

Merge branch 'main' into release-public-preview-translator-v4

2 parents a14cb5e + 81c7e68

File tree

21 files changed: +1137 -254 lines changed

articles/ai-foundry/concepts/models-featured.md

Lines changed: 1 addition & 1 deletion
@@ -262,7 +262,7 @@ Mistral AI offers two categories of models, namely:
  | [Mistral-Large-2411](https://ai.azure.com/explore/models/Mistral-Large-2411/version/2/registry/azureml-mistral) | [chat-completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context) | - **Input:** text (128,000 tokens) <br /> - **Output:** text (4,096 tokens) <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
  | [Mistral-large-2407](https://ai.azure.com/explore/models/Mistral-large-2407/version/1/registry/azureml-mistral) <br /> (deprecated) | [chat-completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context) | - **Input:** text (131,072 tokens) <br /> - **Output:** text (4,096 tokens) <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
  | [Mistral-large](https://ai.azure.com/explore/models/Mistral-large/version/1/registry/azureml-mistral) <br /> (deprecated) | [chat-completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context) | - **Input:** text (32,768 tokens) <br /> - **Output:** text (4,096 tokens) <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
- | [Mistral-OCR-2503](https://aka.ms/aistudio/landing/mistral-ocr-2503) | image to text | - **Input:** image or PDF pages (1,000 pages, max 50MB PDF file) <br> - **Output:** text <br /> - **Tool calling:** No <br /> - **Response formats:** Text, JSON, Markdown |
+ | [Mistral-OCR-2503](https://aka.ms/aistudio/landing/mistral-ocr-2503) | [image to text](../how-to/use-image-models.md) | - **Input:** image or PDF pages (1,000 pages, max 50MB PDF file) <br> - **Output:** text <br /> - **Tool calling:** No <br /> - **Response formats:** Text, JSON, Markdown |
  | [Mistral-small-2503](https://aka.ms/aistudio/landing/mistral-small-2503) | [chat-completion (with images)](../model-inference/how-to/use-chat-multi-modal.md?context=/azure/ai-foundry/context/context) | - **Input:** text and images (131,072 tokens), <br> image-based tokens are 16px x 16px <br> blocks of the original images <br /> - **Output:** text (4,096 tokens) <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |
  | [Mistral-small](https://ai.azure.com/explore/models/Mistral-small/version/1/registry/azureml-mistral) | [chat-completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context) | - **Input:** text (32,768 tokens) <br /> - **Output:** text (4,096 tokens) <br /> - **Tool calling:** Yes <br /> - **Response formats:** Text, JSON |

articles/ai-foundry/how-to/use-image-models.md (new file)

Lines changed: 149 additions & 0 deletions

@@ -0,0 +1,149 @@
---
title: How to use image-to-text models in the model catalog
titleSuffix: Azure AI Foundry
description: Learn how to use image-to-text models from the AI Foundry model catalog.
manager: scottpolly
author: msakande
reviewer: frogglew
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 05/02/2025
ms.author: mopeakande
ms.reviewer: frogglew
ms.custom: references_regions, tool_generated
---

# How to use image-to-text models in the model catalog

This article explains how to use _image-to-text_ models in the AI Foundry model catalog.

Image-to-text models analyze images and generate descriptive text based on what they see. Think of them as a combination of a camera and a writer: you provide an image as input, the model identifies the elements within it, such as objects, people, scenes, and even text, and it then generates a written description summarizing what it sees.

Image-to-text models excel at use cases such as accessibility features, content organization (tagging), product and educational visual descriptions, and digitizing content via Optical Character Recognition (OCR). In short, image-to-text models bridge the gap between visual content and written language, making information more accessible and easier to process in various contexts.

## Prerequisites

To use image models in your application, you need:

- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a [paid Azure account](https://azure.microsoft.com/pricing/purchase-options/pay-as-you-go) to begin.
- An [Azure AI Foundry project](create-projects.md).
- An image model deployment on Azure AI Foundry.
  - This article uses a __Mistral OCR__ model deployment.
- The endpoint URL and key.

## Use an image-to-text model

1. Authenticate using an API key. First, deploy the model to generate the endpoint URL and an API key to authenticate against the service. In this example, the endpoint and key are strings holding the endpoint URL and the API key. You can find both on the **Deployments + Endpoint** page after the model is deployed.

    If you're using Bash:

    ```bash
    export AZURE_API_KEY="<your-api-key>"
    ```

    If you're in PowerShell:

    ```powershell
    $Env:AZURE_API_KEY = "<your-api-key>"
    ```

    If you're using Windows command prompt:

    ```cmd
    set AZURE_API_KEY=<your-api-key>
    ```

1. Run a basic code sample. Different image models accept different data formats. In this example, _Mistral OCR 25.03_ supports only base64-encoded data; document URLs and image URLs aren't supported. Paste the following code into a shell:

    ```bash
    curl --request POST \
      --url https://<your_serverless_endpoint>/v1/ocr \
      --header 'Authorization: Bearer <api_key>' \
      --header 'Content-Type: application/json' \
      --data '{
        "model": "mistral-ocr-2503",
        "document": {
            "type": "document_url",
            "document_name": "test",
            "document_url": "data:application/pdf;base64,JVBER... <replace with your base64 encoded image data>"
        }
    }'
    ```
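Building the base64 data URL by hand is error-prone, so a small helper can standardize it. The following is a sketch of my own (the `to_data_url` function and its extension-to-MIME mapping aren't part of the official samples); it assumes a GNU or BSD `base64` utility is on the path:

```shell
#!/usr/bin/env bash
# Wrap a local file in a base64 data URL, picking the MIME type from
# the file extension. Mistral OCR expects application/pdf for PDFs and
# an image MIME type such as image/png for images.
to_data_url() {
  local path="$1" mime b64
  case "${path##*.}" in
    pdf)      mime="application/pdf" ;;
    png)      mime="image/png" ;;
    jpg|jpeg) mime="image/jpeg" ;;
    *) echo "unsupported file type: $path" >&2; return 1 ;;
  esac
  # GNU base64 wraps output at 76 columns by default; -w 0 keeps it on
  # one line so the resulting JSON stays valid. Fall back for BSD base64.
  b64=$(base64 -w 0 < "$path" 2>/dev/null) || b64=$(base64 < "$path" | tr -d '\n')
  printf 'data:%s;base64,%s' "$mime" "$b64"
}
```

With this helper, the `document_url` value in the request above could be written as `"$(to_data_url mydoc.pdf)"`.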

## More code samples for Mistral OCR 25.03

To process PDF files:

```bash
# Read the PDF file; strip newlines so the data URL stays valid JSON
input_file_path="assets/2201.04234v3.pdf"
base64_value=$(base64 "$input_file_path" | tr -d '\n')
input_base64_value="data:application/pdf;base64,${base64_value}"
# echo $input_base64_value

# Prepare JSON data
payload_body=$(cat <<EOF
{
    "model": "mistral-ocr-2503",
    "document": {
        "type": "document_url",
        "document_url": "$input_base64_value"
    },
    "include_image_base64": true
}
EOF
)

# Process the base64 data with the OCR endpoint
echo "$payload_body" | curl ${AZURE_AI_CHAT_ENDPOINT}/v1/ocr \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${AZURE_AI_CHAT_KEY}" \
    -d @- -o ocr_pdf_output.json
```
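The result lands in `ocr_pdf_output.json`. Assuming the response carries one `markdown` field per recognized page (this response shape is an assumption, so verify it against your own output), a quick `grep` can serve as a sanity check that the OCR ran:

```shell
# Count per-page "markdown" fields in a saved OCR response file as a
# quick sanity check that the service returned recognized text.
count_ocr_pages() {
  grep -o '"markdown"' "$1" | wc -l | tr -d '[:space:]'
}
```

For example, `count_ocr_pages ocr_pdf_output.json` should print the number of pages processed.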

To process an image file:

```bash
# Read the image file; strip newlines so the data URL stays valid JSON
input_file_path="assets/receipt.png"
base64_value=$(base64 "$input_file_path" | tr -d '\n')
input_base64_value="data:image/png;base64,${base64_value}"
# echo $input_base64_value

# Prepare JSON data
payload_body=$(cat <<EOF
{
    "model": "mistral-ocr-2503",
    "document": {
        "type": "image_url",
        "image_url": "$input_base64_value"
    },
    "include_image_base64": true
}
EOF
)

# Process the base64 data with the OCR endpoint
echo "$payload_body" | curl ${AZURE_AI_CHAT_ENDPOINT}/v1/ocr \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${AZURE_AI_CHAT_KEY}" \
    -d @- -o ocr_png_output.json
```

## Model-specific parameters

Some image-to-text models support only specific data formats. Mistral OCR 25.03, for example, requires base64-encoded image data for its `document_url` parameter. The following table lists the supported and unsupported data formats for image models in the model catalog.

| Model | Supported | Not supported |
| :---- | ----- | ----- |
| Mistral OCR 25.03 | Base64-encoded image data | Document URL, image URL |
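The distinction in the table matters when scripting: the request body's `document.type` must be `document_url` for PDF input and `image_url` for image input, with the matching field name. The helper below sketches that branching (the `build_ocr_payload` name is mine, not an SDK function):

```shell
# Build a Mistral OCR request body, choosing "document_url" for PDF
# input and "image_url" for image input, per the table above.
build_ocr_payload() {
  local data_url="$1" kind="$2" field
  case "$kind" in
    pdf)          field="document_url" ;;
    png|jpg|jpeg) field="image_url" ;;
    *) echo "unsupported input kind: $kind" >&2; return 1 ;;
  esac
  printf '{"model":"mistral-ocr-2503","document":{"type":"%s","%s":"%s"},"include_image_base64":true}' \
    "$field" "$field" "$data_url"
}
```

The emitted JSON can be piped straight into the curl commands shown earlier.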

## Related content

- [How to use image generation models on Azure OpenAI](../../ai-services/openai/how-to/dall-e.md)

articles/ai-foundry/model-inference/includes/use-embeddings/python.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ If you have configured the resource to with **Microsoft Entra ID** support, you
  ```python
  import os
  from azure.ai.inference import EmbeddingsClient
- from azure.core.credentials import AzureKeyCredential
+ from azure.identity import DefaultAzureCredential

  model = EmbeddingsClient(
      endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],

articles/ai-foundry/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -130,6 +130,8 @@ items:
      href: ../ai-foundry/model-inference/how-to/use-chat-reasoning.md?context=/azure/ai-foundry/context/context
  - name: Work with multimodal models
    href: ../ai-foundry/model-inference/how-to/use-chat-multi-modal.md?context=/azure/ai-foundry/context/context
+ - name: Work with image models
+   href: how-to/use-image-models.md
  - name: Azure OpenAI and AI services
    items:
    - name: Use Azure OpenAI Service in Azure AI Foundry portal
Lines changed: 18 additions & 15 deletions
@@ -1,36 +1,39 @@
  ---
- title: "Azure AI Foundry docs: What's new for March 2025"
- description: "What's new in the Azure AI Foundry docs for March 2025."
+ title: "Azure AI Foundry docs: What's new for April 2025"
+ description: "What's new in the Azure AI Foundry docs for April 2025."
  ms.author: smcdowell
  author: skpmcdowell
  ms.topic: whats-new
  ms.subject: ai-studio
- ms.custom: March-2025
- ms.date: 04/02/2025
+ ms.custom: April-2025
+ ms.date: 05/03/2025
  ---

- # Azure AI Foundry docs: What's new for March 2025
+ # Azure AI Foundry docs: What's new for April 2025

- Welcome to what's new in the Azure AI Foundry docs for March 2025. This article lists some of the major changes to docs during this period.
+ Welcome to what's new in the Azure AI Foundry docs for April 2025. This article lists some of the major changes to docs during this period.


  ## Azure AI Foundry

  ### New articles

- - [Featured models of Azure AI Foundry](../ai-foundry/concepts/models-featured.md)
- - [How to deploy NVIDIA Inference Microservices](../ai-foundry/how-to/deploy-nvidia-inference-microservice.md)
- - [How to use image and audio in chat completions with Azure AI model inference](../ai-foundry/model-inference/how-to/use-chat-multi-modal.md)
- - [Tutorial: Get started with DeepSeek-R1 reasoning model in Azure AI model inference](../ai-foundry/model-inference/tutorials/get-started-deepseek-r1.md)
+ - [AI Red Teaming Agent (preview)](../ai-foundry/concepts/ai-red-teaming-agent.md)
+ - [Evaluate your AI agents locally with Azure AI Evaluation SDK (preview)](../ai-foundry/how-to/develop/agent-evaluate-sdk.md)
+ - [How to use structured outputs for chat models](../ai-foundry/model-inference/how-to/use-structured-outputs.md)
+ - [Run automated safety scans with AI Red Teaming Agent (preview)](../ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent.md)
+ - [Work with Azure AI Agent Service in Visual Studio Code (Preview)](../ai-foundry/how-to/develop/vs-code-agents.md)
+ - [Work with the Azure AI Foundry for Visual Studio Code extension (Preview)](../ai-foundry/how-to/develop/get-started-projects-vs-code.md)


  ### Updated articles

- - [Deploy a flow for real-time inference](../ai-foundry/how-to/flow-deploy.md)
- - [Fine-tune models using serverless APIs in Azure AI Foundry](../ai-foundry/how-to/fine-tune-serverless.md)
- - [How to deploy and inference a managed compute deployment with code](../ai-foundry/how-to/deploy-models-managed.md)
- - [How to trace your application with Azure AI Foundry project library](../ai-foundry/how-to/develop/trace-local-sdk.md)
- - [Monitor quality and token usage of deployed prompt flow applications](../ai-foundry/how-to/monitor-quality-safety.md)
+ - [Evaluate your AI agents locally with Azure AI Evaluation SDK (preview)](../ai-foundry/how-to/develop/agent-evaluate-sdk.md)
+ - [Evaluate your Generative AI application locally with the Azure AI Evaluation SDK](../ai-foundry/how-to/develop/evaluate-sdk.md)
+ - [Evaluation and monitoring metrics for generative AI](../ai-foundry/concepts/evaluation-metrics-built-in.md)
+ - [Fine-tune models using serverless APIs in Azure AI Foundry](../ai-foundry/how-to/fine-tune-serverless.md)
+ - [How to configure a private link for Azure AI Foundry hubs](../ai-foundry/how-to/configure-private-link.md)
+ - [How to use MedImageParse healthcare AI models for segmentation of medical images](../ai-foundry/how-to/healthcare-ai/deploy-medimageparse.md)

articles/ai-services/computer-vision/index.yml

Lines changed: 21 additions & 21 deletions
@@ -32,27 +32,6 @@ highlightedContent:

  conceptualContent:
    items:
-   - title: Optical character recognition
-     links:
-     - itemType: overview
-       text: About OCR
-       url: overview-ocr.md
-     - itemType: quickstart
-       text: Get started with OCR
-       url: quickstarts-sdk/client-library.md
-     - itemType: how-to-guide
-       text: Call the Read API
-       url: how-to/call-read-api.md
-     - itemType: deploy
-       text: Use the Read OCR container
-       url: computer-vision-how-to-install-containers.md
-     - itemType: learn
-       text: Microsoft Learn training
-       url: /training/modules/read-text-images-documents-with-computer-vision-service/
-     - itemType: reference
-       text: OCR API reference
-       url: /rest/api/computervision/recognize-printed-text?view=rest-computervision-v3.2
    - title: Image Analysis
      links:
      - itemType: overview
@@ -119,6 +98,27 @@ conceptualContent:
    - itemType: how-to-guide
      text: Call the Video Retrieval APIs
      url: how-to/video-retrieval.md
+   - title: Optical character recognition (legacy)
+     links:
+     - itemType: overview
+       text: About OCR
+       url: overview-ocr.md
+     - itemType: quickstart
+       text: Get started with OCR
+       url: quickstarts-sdk/client-library.md
+     - itemType: how-to-guide
+       text: Call the Read API
+       url: how-to/call-read-api.md
+     - itemType: deploy
+       text: Use the Read OCR container
+       url: computer-vision-how-to-install-containers.md
+     - itemType: learn
+       text: Microsoft Learn training
+       url: /training/modules/read-text-images-documents-with-computer-vision-service/
+     - itemType: reference
+       text: OCR API reference
+       url: /rest/api/computervision/recognize-printed-text?view=rest-computervision-v3.2

  tools:
    title: Software development kits (SDKs)

articles/ai-services/computer-vision/overview-ocr.md

Lines changed: 7 additions & 3 deletions
@@ -16,6 +16,12 @@ ms.custom: devx-track-csharp

  # OCR - Optical Character Recognition

+ > [!WARNING]
+ > This service, including the Azure AI Vision legacy [OCR API in v3.2](/rest/api/computervision/recognize-printed-text?view=rest-computervision-v3.2) and [RecognizeText API in v2.1](/rest/api/computervision/recognize-printed-text/recognize-printed-text?view=rest-computervision-v2.1), is not recommended for use.
+
+ [!INCLUDE [read-editions](includes/read-editions.md)]
+
  OCR or Optical Character Recognition is also referred to as text recognition or text extraction. Machine-learning-based OCR techniques allow you to extract printed or handwritten text from images such as posters, street signs and product labels, as well as from documents like articles, reports, forms, and invoices. The text is typically extracted as words, text lines, and paragraphs or text blocks, enabling access to digital version of the scanned text. This eliminates or significantly reduces the need for manual data entry.

@@ -24,10 +30,8 @@ OCR or Optical Character Recognition is also referred to as text recognition or

  Microsoft's **Read** OCR engine is composed of multiple advanced machine-learning based models supporting [global languages](./language-support.md). It can extract printed and handwritten text including mixed languages and writing styles. **Read** is available as cloud service and on-premises container for deployment flexibility. It's also available as a synchronous API for single, non-document, image-only scenarios with performance enhancements that make it easier to implement OCR-assisted user experiences.

- > [!WARNING]
- > The Azure AI Vision legacy [OCR API in v3.2](/rest/api/computervision/recognize-printed-text?view=rest-computervision-v3.2) and [RecognizeText API in v2.1](/rest/api/computervision/recognize-printed-text/recognize-printed-text?view=rest-computervision-v2.1) operations are not recommended for use.

- [!INCLUDE [read-editions](includes/read-editions.md)]
+

  ## How is OCR related to Intelligent Document Processing (IDP)?