
Commit c7e6bca

Merge pull request #50212 from MicrosoftDocs/NEW-gen-ai-vision
New gen ai vision
2 parents 5d1b96e + 2ecbb0a commit c7e6bca

16 files changed: +241 -6 lines


learn-pr/paths/create-computer-vision-solutions-azure-ai/index.yml

Lines changed: 6 additions & 6 deletions
@@ -1,20 +1,19 @@
 ### YamlMime:LearningPath
 uid: learn.wwl.create-computer-vision-solutions-azure-ai
 metadata:
-  title: Create computer vision solutions with Azure AI Vision AI-3004
-  description: How to create computer vision solutions with Azure AI Vision (AI-3004)
-  ms.date: 1/29/2024
+  title: Create computer vision solutions in Azure
+  description: How to create computer vision solutions in Azure
+  ms.date: 4/29/2025
   author: wwlpublish
   ms.author: berryivor
   ms.topic: learning-path
-title: Create computer vision solutions with Azure AI Vision
+title: Create computer vision solutions in Azure
 prerequisites: |
   Before starting this learning path, you should already have:
   - Familiarity with Azure and the Azure portal.
   - Experience programming with C# or Python.
-  If you have no previous programming experience, we recommend you complete the [Take your first steps with C#](/training/paths/csharp-first-steps/) or [Take your first steps with Python](/training/paths/python-first-steps/) learning path before starting this one.
 summary: |
-  Computer vision is an area of artificial intelligence that deals with visual perception. Azure AI Vision includes multiple services that support common computer vision scenarios.
+  Computer vision is an area of artificial intelligence that deals with visual perception. Azure AI includes multiple services that support common computer vision scenarios.
 iconUrl: /training/achievements/cognitive-services-computer-vision.svg
 levels:
 - intermediate
@@ -35,5 +34,6 @@ modules:
 - learn.wwl.detect-analyze-recognize-faces
 - learn.wwl.read-text-images-documents-with-computer-vision-service
 - learn.wwl.analyze-video
+- learn.wwl.develop-generative-ai-vision-apps
 trophy:
   uid: learn.wwl.create-computer-vision-solutions-azure-ai.trophy

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.introduction
title: Introduction
metadata:
  title: Introduction
  description: "Get started with vision-enabled generative AI models."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 1
content: |
  [!include[](includes/1-introduction.md)]

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.deploy-multimodal-models
title: Deploy a multimodal model
metadata:
  title: Deploy a multimodal model
  description: "Deploy a multimodal model that can respond to image-based prompts."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 3
content: |
  [!include[](includes/2-deploy-multimodal-model.md)]

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.develop-visual-chat-apps
title: Develop a vision-based chat app
metadata:
  title: Develop a vision-based chat app
  description: "Use Azure AI Foundry, Azure AI Model Inference, and Azure OpenAI SDKs to develop a vision-based chat app."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 5
content: |
  [!include[](includes/3-develop-visual-chat-app.md)]

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.exercise
title: Exercise - Develop a vision-enabled chat app
metadata:
  title: Exercise - Develop a vision-enabled chat app
  description: "Get practical experience of deploying a multimodal model and creating a vision-enabled chat app."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 30
content: |
  [!include[](includes/4-exercise.md)]

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.knowledge-check
title: Module assessment
metadata:
  title: Module assessment
  description: "Check your learning on vision-enabled generative AI."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 3
content: |
quiz:
  questions:
  - content: "Which kind of model can you use to respond to visual input?"
    choices:
    - content: "Only OpenAI GPT models"
      isCorrect: false
      explanation: "Incorrect."
    - content: "Embedding models"
      isCorrect: false
      explanation: "Incorrect."
    - content: "Multimodal models"
      isCorrect: true
      explanation: "Correct."
  - content: "How can you submit a prompt that asks a model to analyze an image?"
    choices:
    - content: "Submit one prompt with an image-based message followed by another prompt with a text-based message."
      isCorrect: false
      explanation: "Incorrect."
    - content: "Submit a prompt that contains a multi-part user message, containing both text content and image content."
      isCorrect: true
      explanation: "Correct."
    - content: "Submit the image as the system message and the instruction or question as the user message."
      isCorrect: false
      explanation: "Incorrect."
  - content: "How can you include an image in a message?"
    choices:
    - content: "As a URL or as binary data"
      isCorrect: true
      explanation: "Correct."
    - content: "Only as a URL"
      isCorrect: false
      explanation: "Incorrect."
    - content: "Only as binary data"
      isCorrect: false
      explanation: "Incorrect."

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
### YamlMime:ModuleUnit
uid: learn.wwl.develop-generative-ai-vision-apps.summary
title: Summary
metadata:
  title: Summary
  description: "Reflect on what you've learned about vision-enabled generative AI models."
  ms.date: 04/29/2025
  author: gmalc
  ms.author: gmalc
  ms.topic: unit
durationInMinutes: 1
content: |
  [!include[](includes/6-summary.md)]

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
Generative AI models enable you to develop chat-based applications that reason over and respond to input. Often this input takes the form of a text-based prompt, but increasingly multimodal models that can respond to visual input are becoming available.

In this module, we'll discuss vision-enabled generative AI and explore how you can use Azure AI Foundry to create generative AI solutions that respond to prompts that include a mix of text and image data.

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
To handle prompts that include images, you need to deploy a *multimodal* generative AI model - in other words, a model that supports not only text-based input, but image-based (and in some cases, audio-based) input as well. Multimodal models available in Azure AI Foundry include (among others):

- Microsoft **Phi-4-multimodal-instruct**
- OpenAI **gpt-4o**
- OpenAI **gpt-4o-mini**

> [!TIP]
> To learn more about available models in Azure AI Foundry, see the **[Model catalog and collections in Azure AI Foundry portal](/azure/ai-foundry/how-to/model-catalog-overview)** article in the Azure AI Foundry documentation.

## Testing multimodal models with image-based prompts

After deploying a multimodal model, you can test it in the chat playground in Azure AI Foundry portal.

![Screenshot of the chat playground with an image-based prompt.](../media/image-prompt.png)

In the chat playground, you can upload an image from a local file and add text to the message to elicit a response from a multimodal model.
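
If you also want to confirm the deployment from code rather than only in the playground, a simple text-only test call is usually enough. The following is a minimal sketch (not part of this module's files) that assumes the deployed model is reachable through the Azure AI Model Inference API; the endpoint, key, and model name values are placeholders.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholders (assumption): set these to the endpoint and key for your deployment.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_INFERENCE_KEY"]),
)

# A plain text prompt is enough to verify that the deployed model responds.
response = client.complete(
    model="Phi-4-multimodal-instruct",  # assumed deployment name
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Briefly confirm that you received this prompt."),
    ],
)

print(response.choices[0].message.content)
```
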
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
To develop a client app that engages in vision-based chats with a multimodal model, you can use the same basic techniques used for text-based chats. You need a connection to the endpoint where the model is deployed, and you use that endpoint to submit prompts that consist of messages to the model and process the responses.

The key difference is that prompts for a vision-based chat include multi-part user messages that contain both a *text* (or *audio*, where supported) content item and an *image* content item.

![Diagram of a multi-part prompt being submitted to a model.](../media/multi-part-prompt.png)

The JSON representation of a prompt that includes a multi-part user message looks something like this:

```json
{
    "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": [
            {
                "type": "text",
                "text": "Describe this picture:"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://....."
                }
            }
        ] }
    ]
}
```
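
To illustrate how that JSON shape maps to code, here's a rough sketch (not the module's exercise code) that assumes a **gpt-4o** deployment on an Azure OpenAI endpoint and uses the OpenAI Python SDK; the endpoint, key, deployment name, and image URL are placeholders.

```python
import os

from openai import AzureOpenAI

# Placeholders (assumption): your Azure OpenAI endpoint, key, and API version.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

# Submit a multi-part user message: one text content item and one image content item.
response = client.chat.completions.create(
    model="gpt-4o",  # assumed deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this picture:"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        },
    ],
)

print(response.choices[0].message.content)
```
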
The image content item can be either:

- A URL to an image file on a web site.
- Binary image data.

When using binary data to submit a local image file, the **image_url** content takes the form of a base64-encoded value in data URL format:

```json
{
    "type": "image_url",
    "image_url": {
        "url": "data:image/jpeg;base64,<binary_image_data>"
    }
}
```
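
For example, you can build that data URL from a local file with nothing more than the Python standard library. This minimal sketch assumes a JPEG file named photo.jpg (a placeholder path) and constructs the image content item as a plain dictionary:

```python
import base64
from pathlib import Path

# Read the local image file and base64-encode its bytes.
image_bytes = Path("photo.jpg").read_bytes()  # placeholder file name
b64_data = base64.b64encode(image_bytes).decode("utf-8")

# Build the image content item in the data URL format shown above.
image_content_item = {
    "type": "image_url",
    "image_url": {
        "url": f"data:image/jpeg;base64,{b64_data}",
    },
}
```

The resulting dictionary can then be used in place of the URL-based image content item in the multi-part message.
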

Depending on the model type and where you deployed it, you can use the Microsoft Azure AI Model Inference API or the OpenAI API to submit vision-based prompts. Both APIs are also available through language-specific SDKs that abstract the underlying REST calls.
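
As a rough sketch of the SDK approach (not the module's exercise code), the following assumes a **Phi-4-multimodal-instruct** deployment reachable through the Azure AI Model Inference API, with placeholder endpoint and key values, and builds the multi-part message from text and image content items; `ImageUrl.load` reads a local image file, so the file path and format are also placeholders.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    SystemMessage,
    UserMessage,
    TextContentItem,
    ImageContentItem,
    ImageUrl,
)
from azure.core.credentials import AzureKeyCredential

# Placeholders (assumption): endpoint and key for your model deployment.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_INFERENCE_KEY"]),
)

# Multi-part user message: a text content item plus an image content item
# loaded from a local file (placeholder path and format).
response = client.complete(
    model="Phi-4-multimodal-instruct",  # assumed deployment name
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content=[
            TextContentItem(text="Describe this picture:"),
            ImageContentItem(
                image_url=ImageUrl.load(image_file="photo.jpg", image_format="jpeg"),
            ),
        ]),
    ],
)

print(response.choices[0].message.content)
```
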

In the exercise that follows in this module, you can use the Python or .NET SDK for the Azure AI Model Inference API and the OpenAI API to develop a vision-enabled chat application.
