Commit a9641ac

Merge branch 'main' into release-proj-wednesday-ga
2 parents d9a59c9 + 9d9db0d commit a9641ac

109 files changed: +1063 −631 lines changed


.openpublishing.redirection.json

Lines changed: 0 additions & 5 deletions
@@ -10640,11 +10640,6 @@
       "redirect_url": "/azure/orbital/overview",
       "redirect_document_id": false
     },
-    {
-      "source_path_from_root": "/articles/load-balancer/cross-region-overview.md",
-      "redirect_url": "/azure/reliability/reliability-load-balancer",
-      "redirect_document_id": false
-    },
     {
       "source_path_from_root": "/articles/load-balancer/load-balancer-standard-availability-zones.md",
       "redirect_url": "/azure/reliability/reliability-load-balancer",

articles/ai-services/document-intelligence/concept-read.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ ms.author: lajanuar

 > [!NOTE]
 >
-> For extracting text from external images like labels, street signs, and posters, use the [Azure AI Vision v4.0 preview Read](../../ai-services/Computer-vision/concept-ocr.md) feature optimized for general, non-document images with a performance-enhanced synchronous API that makes it easier to embed OCR in your user experience scenarios.
+> For extracting text from external images like labels, street signs, and posters, use the [Azure AI Image Analysis v4.0 Read](../../ai-services/Computer-vision/concept-ocr.md) feature optimized for general, non-document images with a performance-enhanced synchronous API that makes it easier to embed OCR in your user experience scenarios.
 >

 Document Intelligence Read Optical Character Recognition (OCR) model runs at a higher resolution than Azure AI Vision Read and extracts print and handwritten text from PDF documents and scanned images. It also includes support for extracting text from Microsoft Word, Excel, PowerPoint, and HTML documents. It detects paragraphs, text lines, words, locations, and languages. The Read model is the underlying OCR engine for other Document Intelligence prebuilt models like Layout, General Document, Invoice, Receipt, Identity (ID) document, Health insurance card, W2 in addition to custom models.

articles/ai-services/openai/reference.md

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ POST https://{your-resource-name}.openai.azure.com/openai/deployments/{deploymen
 - `2023-08-01-preview` (retiring April 2, 2024) [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2023-08-01-preview/inference.json)
 - `2023-09-01-preview` (retiring April 2, 2024) [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2023-09-01-preview/inference.json)
 - `2023-12-01-preview` [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2023-12-01-preview/inference.json)
-- `2024-02-15-preview`[Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2024-12-15-preview/inference.json)
+- `2024-02-15-preview`[Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2024-02-15-preview/inference.json)

 **Request body**

articles/ai-services/speech-service/how-to-pronunciation-assessment.md

Lines changed: 6 additions & 6 deletions
@@ -621,15 +621,15 @@ The following table summarizes which features that locales support. For more spe

 | Phoneme alphabet | IPA | SAPI |
 |:-----------------|:--------|:-----|
-| Phoneme name | `en-US` | `en-US`, `en-GB`, `zh-CN` |
-| Syllable group | `en-US` | `en-US`, `en-GB` |
-| Spoken phoneme | `en-US` | `en-US`, `en-GB` |
+| Phoneme name | `en-US` | `en-US`, `zh-CN` |
+| Syllable group | `en-US` | `en-US`|
+| Spoken phoneme | `en-US` | `en-US` |

 ### Syllable groups

 Pronunciation assessment can provide syllable-level assessment results. A word is typically pronounced syllable by syllable rather than phoneme by phoneme. Grouping in syllables is more legible and aligned with speaking habits.

-Pronunciation assessment supports syllable groups only in `en-US` with IPA and in both `en-US` and `en-GB` with SAPI.
+Pronunciation assessment supports syllable groups only in `en-US` with IPA and with SAPI.

 The following table compares example phonemes with the corresponding syllables.

@@ -644,7 +644,7 @@ To request syllable-level results along with phonemes, set the granularity [conf

 ### Phoneme alphabet format

-Pronunciation assessment supports phoneme name in `en-US` with IPA and in `en-US`, `en-GB` and `zh-CN` with SAPI.
+Pronunciation assessment supports phoneme name in `en-US` with IPA and in `en-US` and `zh-CN` with SAPI.

 For locales that support phoneme name, the phoneme name is provided together with the score. Phoneme names help identify which phonemes were pronounced accurately or inaccurately. For other locales, you can only get the phoneme score.

@@ -722,7 +722,7 @@ pronunciationAssessmentConfig?.phonemeAlphabet = "IPA"

 With spoken phonemes, you can get confidence scores that indicate how likely the spoken phonemes matched the expected phonemes.

-Pronunciation assessment supports spoken phonemes in `en-US` with IPA and in both `en-US` and `en-GB` with SAPI.
+Pronunciation assessment supports spoken phonemes in `en-US` with IPA and with SAPI.

 For example, to obtain the complete spoken sound for the word `Hello`, you can concatenate the first spoken phoneme for each expected phoneme with the highest confidence score. In the following assessment result, when you speak the word `hello`, the expected IPA phonemes are `h ɛ l oʊ`. However, the actual spoken phonemes are `h ə l oʊ`. You have five possible candidates for each expected phoneme in this example. The assessment result shows that the most likely spoken phoneme was `ə` instead of the expected phoneme `ɛ`. The expected phoneme `ɛ` only received a confidence score of 47. Other potential matches received confidence scores of 52, 17, and 2.

articles/ai-services/speech-service/personal-voice-how-to-use.md

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,8 @@ Here's example SSML in a request for text to speech with the voice name and the
 You can use the SSML via the [Speech SDK](./get-started-text-to-speech.md), [REST API](rest-text-to-speech.md), or [batch synthesis API](batch-synthesis.md).

 * **Real-time speech synthesis**: Use the [Speech SDK](./get-started-text-to-speech.md) or [REST API](rest-text-to-speech.md) to convert text to speech.
+    * When you use the Speech SDK, don't set an endpoint ID, just as with a prebuilt voice.
+    * When you use the REST API, use the prebuilt neural voices endpoint.

 * **Asynchronous synthesis of long audio**: Use the [batch synthesis API](batch-synthesis.md) (Preview) to asynchronously synthesize text to speech files longer than 10 minutes (for example, audio books or lectures). Unlike synthesis performed via the Speech SDK or Speech to text REST API, responses aren't returned in real-time. The expectation is that requests are sent asynchronously, responses are polled for, and synthesized audio is downloaded when the service makes it available.

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
---
title: Process images in prompt flow (preview)
titleSuffix: Azure Machine Learning
description: Learn how to incorporate images into prompt flow.
services: machine-learning
ms.service: machine-learning
ms.subservice: prompt-flow
ms.topic: how-to
ms.author: jinzhong
author: zhongj
ms.reviewer: lagayhar
ms.date: 02/05/2024
---

# Process images in prompt flow (preview)

Multimodal Large Language Models (LLMs), which can process and interpret diverse forms of data inputs, present a powerful tool that can elevate the capabilities of language-only systems to new heights. Among the various data types, images are important for many real-world applications. The incorporation of image data into AI systems provides an essential layer of visual understanding.

In this article, you'll learn:
> [!div class="checklist"]
> - How to use image data in prompt flow.
> - How to use the built-in GPT-4V tool to analyze image inputs.
> - How to build a chatbot that can process image and text inputs.
> - How to create a batch run using image data.
> - How to consume an online endpoint with image data.

> [!IMPORTANT]
> Prompt flow image support is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

## Image type in prompt flow

Prompt flow input and output support Image as a new data type.

To use image data in the prompt flow authoring page:

1. Add a flow input and select **Image** as the data type. You can upload or drag and drop an image file, paste an image from the clipboard, or specify an image URL or the relative image path in the flow folder.
   :::image type="content" source="../media/prompt-flow/how-to-process-image/add-image-type-input.png" alt-text="Screenshot of flow authoring page showing adding flow input as Image type." lightbox = "../media/prompt-flow/how-to-process-image/add-image-type-input.png":::
2. Preview the image. If the image isn't displayed correctly, delete the image and add it again.
   :::image type="content" source="../media/prompt-flow/how-to-process-image/flow-input-image-preview.png" alt-text="Screenshot of flow authoring page showing image preview flow input." lightbox = "../media/prompt-flow/how-to-process-image/flow-input-image-preview.png":::
3. You might want to **preprocess the image using the Python tool** before feeding it to the LLM. For example, you can resize or crop the image to a smaller size.
   :::image type="content" source="../media/prompt-flow/how-to-process-image/process-image-using-python.png" alt-text="Screenshot of using python tool to do image preprocessing." lightbox = "../media/prompt-flow/how-to-process-image/process-image-using-python.png":::
   > [!IMPORTANT]
   > To process an image in a Python function, you need to use the `Image` class, imported from the `promptflow.contracts.multimedia` package. The Image class is used to represent an Image type within prompt flow. It is designed to work with image data in byte format, which is convenient when you need to handle or manipulate the image data directly.
   >
   > To return the processed image data, you need to use the `Image` class to wrap the image data. Create an `Image` object by providing the image data in bytes and the [MIME type](https://developer.mozilla.org/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types) `mime_type`. The MIME type lets the system understand the format of the image data, or it can be `*` for unknown type. A sketch of such a function appears after this list.
4. Run the Python node and check the output. In this example, the Python function returns the processed Image object. Select the image output to preview the image.
   :::image type="content" source="../media/prompt-flow/how-to-process-image/python-node-image-output.png" alt-text="Screenshot of Python node's image output." lightbox = "../media/prompt-flow/how-to-process-image/python-node-image-output.png":::
   If the Image object from the Python node is set as the flow output, you can preview the image in the flow output page as well.
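The following is a minimal sketch of such a preprocessing function. It assumes the Pillow library is available in the flow runtime and that the input name matches one you define on the node; the `Image` construction from bytes plus a `mime_type` follows the description in the note above.

```python
import io

from PIL import Image as PILImage  # assumption: Pillow is installed in the runtime
from promptflow import tool
from promptflow.contracts.multimedia import Image


@tool
def resize_image(input_image: Image, max_size: int = 512) -> Image:
    # The prompt flow Image type carries the raw image bytes, so it can be
    # opened directly from an in-memory buffer.
    pil_image = PILImage.open(io.BytesIO(input_image))
    pil_image.thumbnail((max_size, max_size))  # shrink in place, keep aspect ratio

    # Serialize back to bytes and wrap the result in the prompt flow Image class
    # so downstream nodes (and the output preview) recognize it as an image.
    buffer = io.BytesIO()
    pil_image.save(buffer, format="PNG")
    return Image(buffer.getvalue(), mime_type="image/png")
```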
## Use GPT-4V tool

Azure OpenAI GPT-4 Turbo with Vision tool and OpenAI GPT-4V are built-in tools in prompt flow that can use the OpenAI GPT-4V model to answer questions based on input images. You can find these tools by selecting **More tool** in the flow authoring page.

Add the [Azure OpenAI GPT-4 Turbo with Vision tool](./prompt-flow-tools/azure-open-ai-gpt-4v-tool.md) to the flow. Make sure you have an Azure OpenAI connection with GPT-4 vision-preview models available.

:::image type="content" source="../media/prompt-flow/how-to-process-image/gpt-4v-tool.png" alt-text="Screenshot of GPT-4V tool." lightbox = "../media/prompt-flow/how-to-process-image/gpt-4v-tool.png":::

The Jinja template for composing prompts in the GPT-4V tool follows a similar structure to the chat API in the LLM tool. To represent an image input within your prompt, use the syntax `![image]({{INPUT NAME}})`. Image input can be passed in the `user`, `system`, and `assistant` messages.
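For instance, a prompt that asks the model to describe a single image input might look like the following sketch; the `input_image` input name is illustrative and should match an input you define in the tool.

```jinja
# system:
You are an AI assistant that describes images in detail.

# user:
Describe what is shown in this picture:
![image]({{input_image}})
```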
Once you've composed the prompt, select the **Validate and parse input** button to parse the input placeholders. The image input represented by `![image]({{INPUT NAME}})` will be parsed as image type with the input name as INPUT NAME.

You can assign a value to the image input in the following ways:

- Reference from the flow input of Image type.
- Reference from another node's output of Image type.
- Upload, drag, or paste an image, or specify an image URL or the relative image path.

## Build a chatbot to process images

In this section, you'll learn how to build a chatbot that can process image and text inputs.

Assume you want to build a chatbot that can answer any questions about the image and text together. You can achieve this by following the steps below:

1. Create a **chat flow**.
1. Add a **chat input** and select **list** as the data type. In the chat box, the user can enter a mixed sequence of text and images, and the prompt flow service transforms that into a list.
   :::image type="content" source="../media/prompt-flow/how-to-process-image/chat-input-definition.png" alt-text="Screenshot of chat input type configuration." lightbox = "../media/prompt-flow/how-to-process-image/chat-input-definition.png":::
1. Add the **GPT-4V** tool to the flow.
   :::image type="content" source="../media/prompt-flow/how-to-process-image/gpt-4v-tool-in-chatflow.png" alt-text="Screenshot of GPT-4V tool in chat flow." lightbox = "../media/prompt-flow/how-to-process-image/gpt-4v-tool-in-chatflow.png":::

   In this example, `{{question}}` refers to the chat input, which is a list of texts and images.
1. (Optional) You can add any custom logic to the flow to process the GPT-4V output. For example, you can add the content safety tool to detect whether the answer contains any inappropriate content and return a final answer to the user.
   :::image type="content" source="../media/prompt-flow/how-to-process-image/chat-flow-postprocess.png" alt-text="Screenshot of processing gpt-4v output with content safety tool." lightbox = "../media/prompt-flow/how-to-process-image/chat-flow-postprocess.png":::
1. Now you can **test the chatbot**. Open the chat window and enter any questions with images. The chatbot answers the questions based on the image and text inputs. The chat input value is automatically backfilled from the input in the chat window; the text and images you enter in the chat box are translated into a list of texts and images.
   :::image type="content" source="../media/prompt-flow/how-to-process-image/chatbot-test.png" alt-text="Screenshot of chatbot interaction with images." lightbox = "../media/prompt-flow/how-to-process-image/chatbot-test.png":::

> [!NOTE]
> To enable your chatbot to respond with rich text and images, make the chat output `list` type. The list should consist of strings (for text) and prompt flow Image objects (for images) in custom order.
> :::image type="content" source="../media/prompt-flow/how-to-process-image/chatbot-image-output.png" alt-text="Screenshot of chatbot responding with rich text and images." lightbox = "../media/prompt-flow/how-to-process-image/chatbot-image-output.png":::
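For instance, a final Python node along the following lines (a sketch; the node and input names are illustrative) could produce such a mixed list as the chat output.

```python
from promptflow import tool
from promptflow.contracts.multimedia import Image


@tool
def compose_answer(answer_text: str, related_image: Image) -> list:
    # Strings are rendered as text and Image objects as pictures,
    # in the order they appear in the list.
    return [answer_text, "Here's a related image:", related_image]
```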
## Create a batch run using image data

A batch run allows you to test the flow with an extensive dataset. There are three methods to represent image data: through an image file, a public image URL, or a Base64 string.

- **Image file:** To test with image files in a batch run, you need to prepare a **data folder**. This folder should contain a batch run entry file in `jsonl` format located in the root directory, along with all image files stored in the same folder or subfolders.
  :::image type="content" source="../media/prompt-flow/how-to-process-image/batch-run-sample-data.png" alt-text="Screenshot of batch run sample data with images." lightbox = "../media/prompt-flow/how-to-process-image/batch-run-sample-data.png":::
  In the entry file, you should use the format `{"data:<mime type>;path": "<image relative path>"}` to reference each image file. For example, `{"data:image/png;path": "./images/1.png"}`.
- **Public image URL:** You can also reference the image URL in the entry file using this format: `{"data:<mime type>;url": "<image URL>"}`. For example, `{"data:image/png;url": "https://www.example.com/images/1.png"}`.
- **Base64 string:** A Base64 string can be referenced in the entry file using this format: `{"data:<mime type>;base64": "<base64 string>"}`. For example, `{"data:image/png;base64": "iVBORw0KGgoAAAANSUhEUgAAAGQAAABLAQMAAAC81rD0AAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAABlBMVEUAAP7////DYP5JAAAAAWJLR0QB/wIt3gAAAAlwSFlzAAALEgAACxIB0t1+/AAAAAd0SU1FB+QIGBcKN7/nP/UAAAASSURBVDjLY2AYBaNgFIwCdAAABBoAAaNglfsAAAAZdEVYdGNvbW1lbnQAQ3JlYXRlZCB3aXRoIEdJTVDnr0DLAAAAJXRFWHRkYXRlOmNyZWF0ZQAyMDIwLTA4LTI0VDIzOjEwOjU1KzAzOjAwkHdeuQAAACV0RVh0ZGF0ZTptb2RpZnkAMjAyMC0wOC0yNFQyMzoxMDo1NSswMzowMOEq5gUAAAAASUVORK5CYII="}`.

In summary, prompt flow uses a unique dictionary format to represent an image, which is `{"data:<mime type>;<representation>": "<value>"}`. Here, `<mime type>` refers to HTML standard [MIME](https://developer.mozilla.org/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types) image types, and `<representation>` refers to the supported image representations: `path`, `url`, and `base64`.
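As an illustration, an entry file for a flow with a `question` input and an `input_image` input (both names are placeholders for your own flow inputs) might contain lines like these, mixing the path and URL representations:

```jsonl
{"question": "What is in this image?", "input_image": {"data:image/png;path": "./images/1.png"}}
{"question": "Describe this diagram.", "input_image": {"data:image/jpeg;url": "https://www.example.com/images/2.jpg"}}
```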
### Create a batch run

In the flow authoring page, select the **Evaluate** > **Custom evaluation** button to initiate a batch run. In the batch run settings, select a dataset, which can be either a folder (containing the entry file and image files) or a file (containing only the entry file). You can preview the entry file and perform input mapping to align the columns in the entry file with the flow inputs.
:::image type="content" source="../media/prompt-flow/how-to-process-image/batch-run-data-selection.png" alt-text="Screenshot of batch run data selection." lightbox = "../media/prompt-flow/how-to-process-image/batch-run-data-selection.png":::

### View batch run results

You can check the batch run outputs in the run detail page. Select the image object in the output table to easily preview the image.

:::image type="content" source="../media/prompt-flow/how-to-process-image/batch-run-output.png" alt-text="Screenshot of batch run output." lightbox = "../media/prompt-flow/how-to-process-image/batch-run-output.png":::

If the batch run outputs contain images, you can check the **flow_outputs dataset**, which includes the output `jsonl` file and the output images.

:::image type="content" source="../media/prompt-flow/how-to-process-image/explore-run-outputs.png" alt-text="Screenshot of batch run flow output." lightbox = "../media/prompt-flow/how-to-process-image/explore-run-outputs.png":::

## Consume online endpoint with image data

You can [deploy a flow to an online endpoint for real-time inference](./flow-deploy.md).

Currently the **Test** tab in the deployment detail page doesn't support image inputs or outputs. For now, you can test the endpoint by sending a request that includes image inputs.

To consume the online endpoint with image input, represent the image by using the format `{"data:<mime type>;<representation>": "<value>"}`, where `<representation>` can be either `url` or `base64`.

If the flow generates image output, it's returned in `base64` format, for example, `{"data:<mime type>;base64": "<base64 string>"}`.
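A minimal sketch of such a request follows; the endpoint URL, key, and the `question`/`input_image` input names are placeholders for your own deployment.

```python
import json
import urllib.request

# Placeholders: replace with your endpoint's scoring URL and key.
endpoint_url = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
api_key = "<your-endpoint-key>"

# Image input in the prompt flow dictionary format described above.
payload = {
    "question": "What is shown in this image?",
    "input_image": {"data:image/png;url": "https://www.example.com/images/1.png"},
}

request = urllib.request.Request(
    endpoint_url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())
    # Any image output comes back base64-encoded, for example:
    # {"answer": {"data:image/png;base64": "<base64 string>"}}
    print(result)
```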
## Next steps

- [Iterate and optimize your flow by tuning prompts using variants](./flow-tune-prompts-using-variants.md)
- [Deploy a flow](./flow-deploy.md)
Four binary files changed (39.1 KB, 144 KB, 182 KB, 66.8 KB); previews not shown.
