
Commit c7e6bca

Merge pull request #50212 from MicrosoftDocs/NEW-gen-ai-vision
New gen ai vision
2 parents 5d1b96e + 2ecbb0a commit c7e6bca

16 files changed: +241 -6 lines


learn-pr/paths/create-computer-vision-solutions-azure-ai/index.yml

Lines changed: 6 additions & 6 deletions
@@ -1,20 +1,19 @@
 ### YamlMime:LearningPath
 uid: learn.wwl.create-computer-vision-solutions-azure-ai
 metadata:
-  title: Create computer vision solutions with Azure AI Vision AI-3004
-  description: How to create computer vision solutions with Azure AI Vision (AI-3004)
-  ms.date: 1/29/2024
+  title: Create computer vision solutions in Azure
+  description: How to create computer vision solutions in Azure
+  ms.date: 4/29/2025
   author: wwlpublish
   ms.author: berryivor
   ms.topic: learning-path
-title: Create computer vision solutions with Azure AI Vision
+title: Create computer vision solutions in Azure
 prerequisites: |
   Before starting this learning path, you should already have:
   - Familiarity with Azure and the Azure portal.
   - Experience programming with C# or Python.
-  If you have no previous programming experience, we recommend you complete the [Take your first steps with C#](/training/paths/csharp-first-steps/) or [Take your first steps with Python](/training/paths/python-first-steps/) learning path before starting this one.
 summary: |
-  Computer vision is an area of artificial intelligence that deals with visual perception. Azure AI Vision includes multiple services that support common computer vision scenarios.
+  Computer vision is an area of artificial intelligence that deals with visual perception. Azure AI includes multiple services that support common computer vision scenarios.
 iconUrl: /training/achievements/cognitive-services-computer-vision.svg
 levels:
 - intermediate
@@ -35,5 +34,6 @@ modules:
 - learn.wwl.detect-analyze-recognize-faces
 - learn.wwl.read-text-images-documents-with-computer-vision-service
 - learn.wwl.analyze-video
+- learn.wwl.develop-generative-ai-vision-apps
 trophy:
   uid: learn.wwl.create-computer-vision-solutions-azure-ai.trophy

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.introduction
title: Introduction
metadata:
  title: Introduction
  description: "Get started with vision-enabled generative AI models."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 1
content: |
  [!include[](includes/1-introduction.md)]

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.deploy-multimodal-models
title: Deploy a multimodal model
metadata:
  title: Deploy a multimodal model
  description: "Deploy a multimodal model that can respond to image-based prompts."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 3
content: |
  [!include[](includes/2-deploy-multimodal-model.md)]

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.develop-visual-chat-apps
title: Develop a vision-based chat app
metadata:
  title: Develop a vision-based chat app
  description: "Use Azure AI Foundry, Azure AI Model Inference, and Azure OpenAI SDKs to develop a vision-based chat app."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/3-develop-visual-chat-app.md)]

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.exercise
title: Exercise - Develop a vision-enabled chat app
metadata:
  title: Exercise - Develop a vision-enabled chat app
  description: "Get practical experience of deploying a multimodal model and creating a vision-enabled chat app."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 30
content: |
  [!include[](includes/4-exercise.md)]

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.knowledge-check
title: Module assessment
metadata:
  title: Module assessment
  description: "Check your learning on vision-enabled generative AI."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 3
content: |
quiz:
  questions:
  - content: "Which kind of model can you use to respond to visual input?"
    choices:
    - content: "Only OpenAI GPT models"
      isCorrect: false
      explanation: "Incorrect."
    - content: "Embedding models"
      isCorrect: false
      explanation: "Incorrect."
    - content: "Multimodal models"
      isCorrect: true
      explanation: "Correct."
  - content: "How can you submit a prompt that asks a model to analyze an image?"
    choices:
    - content: "Submit one prompt with an image-based message followed by another prompt with a text-based message."
      isCorrect: false
      explanation: "Incorrect."
    - content: "Submit a prompt that contains a multi-part user message, containing both text content and image content."
      isCorrect: true
      explanation: "Correct."
    - content: "Submit the image as the system message and the instruction or question as the user message."
      isCorrect: false
      explanation: "Incorrect."
  - content: "How can you include an image in a message?"
    choices:
    - content: "As a URL or as binary data"
      isCorrect: true
      explanation: "Correct."
    - content: "Only as a URL"
      isCorrect: false
      explanation: "Incorrect."
    - content: "Only as binary data"
      isCorrect: false
      explanation: "Incorrect."

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.summary
title: Summary
metadata:
  title: Summary
  description: "Reflect on what you've learned about vision-enabled generative AI models."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 1
content: |
  [!include[](includes/6-summary.md)]

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
Generative AI models enable you to develop chat-based applications that reason over and respond to input. Often this input takes the form of a text-based prompt, but increasingly multimodal models that can respond to visual input are becoming available.

In this module, we'll discuss vision-enabled generative AI and explore how you can use Azure AI Foundry to create generative AI solutions that respond to prompts that include a mix of text and image data.

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
To handle prompts that include images, you need to deploy a *multimodal* generative AI model - in other words, a model that supports not only text-based input, but image-based (and in some cases, audio-based) input as well. Multimodal models available in Azure AI Foundry include (among others):

- Microsoft **Phi-4-multimodal-instruct**
- OpenAI **gpt-4o**
- OpenAI **gpt-4o-mini**

> [!TIP]
> To learn more about available models in Azure AI Foundry, see the **[Model catalog and collections in Azure AI Foundry portal](/azure/ai-foundry/how-to/model-catalog-overview)** article in the Azure AI Foundry documentation.

## Testing multimodal models with image-based prompts

After deploying a multimodal model, you can test it in the chat playground in Azure AI Foundry portal.

![Screenshot of the chat playground with an image-based prompt.](../media/image-prompt.png)

In the chat playground, you can upload an image from a local file and add text to the message to elicit a response from a multimodal model.
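
If you also want to confirm the deployment from code rather than only in the playground, a simple text-only test call is usually enough. The following is a minimal sketch (not part of this module's files) that assumes the deployed model is reachable through the Azure AI Model Inference API; the endpoint, key, and model name values are placeholders.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholders (assumption): set these to the endpoint and key for your deployment.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_INFERENCE_KEY"]),
)

# A plain text prompt is enough to verify that the deployed model responds.
response = client.complete(
    model="Phi-4-multimodal-instruct",  # assumed deployment name
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Briefly confirm that you received this prompt."),
    ],
)

print(response.choices[0].message.content)
```
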
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
To develop a client app that engages in vision-based chats with a multimodal model, you can use the same basic techniques used for text-based chats. You need a connection to the endpoint where the model is deployed, and you use that endpoint to submit prompts that consist of messages to the model and process the responses.

The key difference is that prompts for a vision-based chat include multi-part user messages that contain both a *text* (or *audio*, where supported) content item and an *image* content item.

![Diagram of a multi-part prompt being submitted to a model.](../media/multi-part-prompt.png)

The JSON representation of a prompt that includes a multi-part user message looks something like this:

```json
{
    "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": [
            {
                "type": "text",
                "text": "Describe this picture:"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://....."
                }
            }
        ] }
    ]
}
```
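
To illustrate how that JSON shape maps to code, here's a rough sketch (not the module's exercise code) that assumes a **gpt-4o** deployment on an Azure OpenAI endpoint and uses the OpenAI Python SDK; the endpoint, key, deployment name, and image URL are placeholders.

```python
import os

from openai import AzureOpenAI

# Placeholders (assumption): your Azure OpenAI endpoint, key, and API version.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

# Submit a multi-part user message: one text content item and one image content item.
response = client.chat.completions.create(
    model="gpt-4o",  # assumed deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this picture:"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        },
    ],
)

print(response.choices[0].message.content)
```
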
The image content item can be either:

- A URL to an image file on a web site.
- Binary image data.

When using binary data to submit a local image file, the **image_url** content takes the form of a base64-encoded value in data URL format:

```json
{
    "type": "image_url",
    "image_url": {
        "url": "data:image/jpeg;base64,<binary_image_data>"
    }
}
```
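
For example, you can build that data URL from a local file with nothing more than the Python standard library. This minimal sketch assumes a JPEG file named photo.jpg (a placeholder path) and constructs the image content item as a plain dictionary:

```python
import base64
from pathlib import Path

# Read the local image file and base64-encode its bytes.
image_bytes = Path("photo.jpg").read_bytes()  # placeholder file name
b64_data = base64.b64encode(image_bytes).decode("utf-8")

# Build the image content item in the data URL format shown above.
image_content_item = {
    "type": "image_url",
    "image_url": {
        "url": f"data:image/jpeg;base64,{b64_data}",
    },
}
```

The resulting dictionary can then be used in place of the URL-based image content item in the multi-part message.
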

Depending on the model type and where you deployed it, you can use the Microsoft Azure AI Model Inference API or the OpenAI API to submit vision-based prompts. Both APIs are also available through language-specific SDKs that abstract the underlying REST calls.
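
As a rough sketch of the SDK approach (not the module's exercise code), the following assumes a **Phi-4-multimodal-instruct** deployment reachable through the Azure AI Model Inference API, with placeholder endpoint and key values, and builds the multi-part message from text and image content items; `ImageUrl.load` reads a local image file, so the file path and format are also placeholders.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    SystemMessage,
    UserMessage,
    TextContentItem,
    ImageContentItem,
    ImageUrl,
)
from azure.core.credentials import AzureKeyCredential

# Placeholders (assumption): endpoint and key for your model deployment.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_INFERENCE_KEY"]),
)

# Multi-part user message: a text content item plus an image content item
# loaded from a local file (placeholder path and format).
response = client.complete(
    model="Phi-4-multimodal-instruct",  # assumed deployment name
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content=[
            TextContentItem(text="Describe this picture:"),
            ImageContentItem(
                image_url=ImageUrl.load(image_file="photo.jpg", image_format="jpeg"),
            ),
        ]),
    ],
)

print(response.choices[0].message.content)
```
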

In the exercise that follows in this module, you can use the Python or .NET SDK for the Azure AI Model Inference API and the OpenAI API to develop a vision-enabled chat application.
