articles/ai-studio/how-to/flow-process-image.md

@@ -1,33 +1,29 @@
---
title: Process images in prompt flow
titleSuffix: Azure AI Studio
description: Learn how to use images in prompt flow.
ms.service: azure-ai-studio
ms.topic: how-to
ms.date: 2/26/2024
ms.reviewer: jinzhong
ms.author: lagayhar
author: lgayhardt
---

# Process images in prompt flow

[!INCLUDE [Azure AI Studio preview](../includes/preview-ai-studio.md)]

Multimodal Large Language Models (LLMs), which can process and interpret diverse forms of data inputs, present a powerful tool that can elevate the capabilities of language-only systems to new heights. Among the various data types, images are important for many real-world applications. The incorporation of image data into AI systems provides an essential layer of visual understanding.

In this article, you learn:

> [!div class="checklist"]
> - How to use image data in prompt flow.
> - How to use the built-in GPT-4V tool to analyze image inputs.
> - How to build a chatbot that can process image and text inputs.
> - How to create a batch run using image data.
> - How to consume an online endpoint with image data.

## Image type in prompt flow

Prompt flow input and output support Image as a new data type.
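For example, an image input is declared in the flow definition like any other input type. A minimal sketch of the relevant `flow.dag.yaml` fragment, assuming an input named `input_image` and a node named `gpt4v_node` (both illustrative; check the promptflow DAG schema for your version):

```yaml
inputs:
  input_image:
    type: image      # the Image data type described above
outputs:
  answer:
    type: string
    reference: ${gpt4v_node.output}
```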
@@ -38,10 +34,10 @@ To use image data in prompt flow authoring page:

   :::image type="content" source="../media/prompt-flow/how-to-process-image/add-image-type-input.png" alt-text="Screenshot of flow authoring page showing adding flow input as Image type." lightbox = "../media/prompt-flow/how-to-process-image/add-image-type-input.png":::

2. Preview the image. If the image isn't displayed correctly, delete the image and add it again.
3. You might want to preprocess the image using the [Python tool](./prompt-flow-tools/python-tool.md) before feeding it to the LLM. For example, you can resize or crop the image to a smaller size.

   :::image type="content" source="../media/prompt-flow/how-to-process-image/process-image-using-python.png" alt-text="Screenshot of using python tool to do image preprocessing." lightbox = "../media/prompt-flow/how-to-process-image/process-image-using-python.png":::

   > [!IMPORTANT]
   > To process images using a Python function, you need to use the `Image` class that you import from the `promptflow.contracts.multimedia` package. The `Image` class represents the image type within prompt flow. It's designed to work with image data in byte format, which is convenient when you need to handle or manipulate the image data directly.
   >
   > To return the processed image data, use the `Image` class to wrap the image data. Create an `Image` object by providing the image data in bytes and its [MIME type](https://developer.mozilla.org/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types) `mime_type`. The MIME type lets the system understand the format of the image data, or it can be `*` for an unknown type.
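The resize-then-wrap pattern described in the note above can be sketched as a Python tool function. This is a minimal sketch, assuming the `pillow` and `promptflow` packages are installed; the function names and the 512-pixel limit are illustrative, and the exact `Image` constructor may vary by promptflow version:

```python
import io

from PIL import Image as PILImage


def resize_image_bytes(data: bytes, max_side: int = 512) -> bytes:
    """Downscale raw image bytes so the longest side is at most max_side."""
    img = PILImage.open(io.BytesIO(data))
    img.thumbnail((max_side, max_side))  # resizes in place, keeping aspect ratio
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()


def resize_tool(input_image):
    # Imported lazily so resize_image_bytes stays usable without promptflow.
    # In a real flow this function would also carry promptflow's @tool decorator.
    from promptflow.contracts.multimedia import Image

    resized = resize_image_bytes(bytes(input_image))
    # Wrap the bytes in Image so prompt flow treats the node output as an image.
    return Image(resized, mime_type="image/png")
```

Because the flow output is an `Image` object, downstream nodes and the flow preview can render it directly.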
@@ -51,7 +47,7 @@ If the Image object from Python node is set as the flow output, you can preview 

## Use GPT-4V tool

The [Azure OpenAI GPT-4 Turbo with Vision tool](./prompt-flow-tools/azure-open-ai-gpt-4v-tool.md) and OpenAI GPT-4V are built-in tools in prompt flow that can use the OpenAI GPT-4V model to answer questions based on input images. You can find the tools by selecting **+ More tools** in the flow authoring page.

Add the [Azure OpenAI GPT-4 Turbo with Vision tool](./prompt-flow-tools/azure-open-ai-gpt-4v-tool.md) to the flow. Make sure you have an Azure OpenAI connection with GPT-4 vision-preview models available.
@@ -65,11 +61,11 @@ You can assign a value to the image input through the following ways:

- Reference from the flow input of Image type.
- Reference from another node's output of Image type.
- Upload, drag, or paste an image, or specify an image URL or the relative image path.

## Build a chatbot to process images

In this section, you learn how to build a chatbot that can process image and text inputs.

Assume you want to build a chatbot that can answer any questions about the image and text together. You can achieve this by following the steps below:
@@ -120,13 +116,13 @@ If the batch run outputs contain images, you can check the **flow_outputs datase

You can [deploy a flow to an online endpoint for real-time inference](./flow-deploy.md).

Currently the **Test** tab in the deployment detail page doesn't support image inputs or outputs.

For now, you can test the endpoint by sending a request that includes image inputs.

To consume the online endpoint with image input, you should represent the image by using the format `{"data:<mime type>;<representation>": "<value>"}`. In this case, `<representation>` can be either `url` or `base64`.

If the flow generates image output, it's returned in `base64` format, for example, `{"data:<mime type>;base64": "<base64 string>"}`.
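The request format above can be assembled with a short helper. A minimal sketch, assuming a flow input named `image` and placeholder endpoint details; take the real scoring URL and key from your deployment's consume page:

```python
import base64
import json


def image_payload(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Represent an image as {"data:<mime type>;base64": "<base64 string>"}."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {f"data:{mime_type};base64": encoded}


# Hypothetical flow input named "image". The URL representation works the
# same way: {"data:image/png;url": "https://example.com/cat.png"}
body = json.dumps({"image": image_payload(b"\x89PNG\r\n")})

# Send with any HTTP client, for example:
# requests.post("https://<endpoint>/score", data=body,
#               headers={"Content-Type": "application/json",
#                        "Authorization": "Bearer <key>"})
```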

articles/ai-studio/how-to/prompt-flow-tools/azure-open-ai-gpt-4v-tool.md
@@ -5,7 +5,7 @@ description: This article introduces the Azure OpenAI GPT-4 Turbo with Vision to

manager: nitinme
ms.service: azure-ai-studio
ms.topic: how-to
ms.date: 2/26/2024
ms.reviewer: keli19
ms.author: lagayhar
author: lgayhardt
@@ -27,18 +27,51 @@ The prompt flow *Azure OpenAI GPT-4 Turbo with Vision* tool enables you to use y

- An [Azure AI hub resource](../../how-to/create-azure-ai-resource.md) with a GPT-4 Turbo with Vision model deployed in one of the regions that support GPT-4 Turbo with Vision: Australia East, Switzerland North, Sweden Central, and West US. When you deploy from your project's **Deployments** page, select `gpt-4` as the model name and `vision-preview` as the model version.

## Build with the Azure OpenAI GPT-4 Turbo with Vision tool

1. Create or open a flow in [Azure AI Studio](https://ai.azure.com). For more information, see [Create a flow](../flow-develop.md).
1. Select **+ More tools** > **Azure OpenAI GPT-4 Turbo with Vision** to add the Azure OpenAI GPT-4 Turbo with Vision tool to your flow.

   :::image type="content" source="../../media/prompt-flow/azure-openai-gpt-4-vision-tool.png" alt-text="Screenshot of the Azure OpenAI GPT-4 Turbo with Vision tool added to a flow in Azure AI Studio." lightbox="../../media/prompt-flow/azure-openai-gpt-4-vision-tool.png":::

1. Select the connection to your Azure OpenAI Service. For example, you can select the **Default_AzureOpenAI** connection. For more information, see [Prerequisites](#prerequisites).
1. Enter values for the Azure OpenAI GPT-4 Turbo with Vision tool input parameters described in [Inputs](#inputs). For example, you can use this example prompt:

   ```jinja
   # system:
   As an AI assistant, your task involves interpreting images and responding to questions about the image.
   Remember to provide accurate answers based on the information present in the image.

   # user:
   Can you tell me what the image depicts?
   
   ```

1. Select **Validate and parse input** to validate the tool inputs.
1. Specify an image to analyze in the `image_input` input parameter. For example, you can upload an image or enter the URL of an image to analyze. Otherwise, you can paste or drag and drop an image into the tool.
1. Add more tools to your flow as needed, or select **Run** to run the flow.
1. View the outputs, which are described in [Outputs](#outputs).

Here's an example output response:

```json
{
  "system_metrics": {
    "completion_tokens": 96,
    "duration": 4.874329,
    "prompt_tokens": 1157,
    "total_tokens": 1253
  },
  "output": "The image depicts a user interface for Azure's OpenAI GPT-4 service. It is showing a configuration screen where settings related to the AI's behavior can be adjusted, such as the model (GPT-4), temperature, top_p, frequency penalty, etc. There's also an area where users can enter a prompt to generate text, and an option to include an image input for the AI to interpret, suggesting that this particular interface supports both text and image inputs."
}
```
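Under the hood, the system and user sections of the jinja prompt map onto the chat messages format, with the image sent as an inline `data:` URL content part. A hedged sketch of building that request body outside prompt flow (the message layout follows the GPT-4 Turbo with Vision chat contract; the function name and input values are illustrative):

```python
import base64


def vision_messages(question: str, image_bytes: bytes, mime_type: str = "image/png") -> list:
    """Build chat messages mirroring the jinja prompt above: a system turn
    plus a user turn combining text and an inline base64-encoded image."""
    data_url = f"data:{mime_type};base64," + base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "system",
            "content": (
                "As an AI assistant, your task involves interpreting images "
                "and responding to questions about the image."
            ),
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        },
    ]
```

The tool builds an equivalent request from your prompt and `image_input` automatically; this sketch is only meant to show what the validated prompt resolves to.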