Commit 66adf79

Merge pull request #4753 from MicrosoftDocs/release-build-azure-search
Release build azure search
2 parents 0c19a27 + 468c8d3 commit 66adf79

File tree

91 files changed

+7879
-505
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

91 files changed

+7879
-505
lines changed
Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
---
title: Use the content generation capabilities of language models in a content ingestion pipeline
titleSuffix: Azure AI Search
description: Use language models to caption your images and facilitate an image search through your data.
author: amitkalay
ms.author: amitkalay
ms.service: azure-ai-search
ms.topic: how-to
ms.date: 05/05/2025
ms.custom:
  - devx-track-csharp
  - build-2025
---

# Generate captions for images in another language

In this article, learn how to generate image captions by using AI enrichment and a skillset. Images often contain useful information that's relevant in search scenarios. You can [vectorize images](search-get-started-portal-image-search.md) to represent visual content in your search index. Or, you can use [AI enrichment and skillsets](cognitive-search-concept-intro.md) to create and extract searchable *text* from images.

The GenAI Prompt skill (preview) generates a description of each image in your data source, and the indexer pushes that description into a search index. To view the descriptions, you can run a query that includes them in the response.

## Prerequisites

To work with image content in a skillset, you need:

+ A supported data source
+ Files or blobs containing images
+ Read access on the supported data source. This article uses key-based authentication, but indexers can also connect using the search service identity and Microsoft Entra ID authentication. For role-based access control, assign roles on the data source to allow read access by the service identity. If you're testing on a local development machine, make sure you also have read access on the supported data source. A minimal data source definition is shown after this list.
+ A search indexer, configured for image actions
+ A skillset with the new custom GenAI Prompt skill
+ A search index with fields to receive the verbalized text output, plus output field mappings in the indexer that establish the association
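
For reference, the following is a minimal sketch of a data source definition that satisfies the data source prerequisite, assuming Azure Blob Storage and key-based authentication. The data source name, connection string, and container name are placeholders.

```json
{
  "name": "my-image-blob-datasource",
  "type": "azureblob",
  "credentials": {
    "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<account-key>;"
  },
  "container": {
    "name": "my-image-container"
  }
}
```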

Optionally, you can define projections to accept image-analyzed output into a [knowledge store](knowledge-store-concept-intro.md) for data mining scenarios.
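
If you use the knowledge store option, a `knowledgeStore` section in the skillset defines where projections land. The following is a rough sketch under assumed names; the connection string and container name are placeholders.

```json
"knowledgeStore": {
  "storageConnectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<account-key>;",
  "projections": [
    {
      "tables": [],
      "objects": [
        {
          "storageContainer": "image-captions",
          "source": "/document/normalized_images/*"
        }
      ],
      "files": []
    }
  ]
}
```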

<a name="get-normalized-images"></a>

## Configure indexers for image processing

After the source files are set up, enable image normalization by setting the `imageAction` parameter in the indexer configuration. Image normalization helps make images more uniform for downstream processing. Image normalization includes the following operations:

+ Large images are resized to a maximum height and width to make them uniform.
+ For images that have metadata that specifies orientation, image rotation is adjusted for vertical loading.

Enabling `imageAction` (setting this parameter to a value other than `none`) incurs an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).

1. [Create or update an indexer](/rest/api/searchservice/indexers/create-or-update) to set the configuration properties:

    ```json
    {
      "parameters":
      {
        "configuration":
        {
          "dataToExtract": "contentAndMetadata",
          "parsingMode": "default",
          "imageAction": "generateNormalizedImages"
        }
      }
    }
    ```

1. Set `dataToExtract` to `contentAndMetadata` (required).

1. Verify that `parsingMode` is set to *default* (required).

    This parameter determines the granularity of search documents created in the index. The default mode sets up a one-to-one correspondence so that one blob results in one search document. If documents are large, or if skills require smaller chunks of text, you can add the Text Split skill to subdivide a document into pages for processing purposes. But for search scenarios, one blob per document is required if enrichment includes image processing.

1. Set `imageAction` to enable the `normalized_images` node in an enrichment tree (required):

    + `generateNormalizedImages` to generate an array of normalized images as part of document cracking.

    + `generateNormalizedImagePerPage` (applies to PDF only) to generate an array of normalized images where each page in the PDF is rendered to one output image. For non-PDF files, the behavior of this parameter is similar to setting `generateNormalizedImages`. However, setting `generateNormalizedImagePerPage` can make the indexing operation less performant by design (especially for large documents) because several images have to be generated.

1. Optionally, adjust the width or height of the generated normalized images:

    + `normalizedImageMaxWidth` in pixels. Default is 2,000. Maximum value is 10,000.

    + `normalizedImageMaxHeight` in pixels. Default is 2,000. Maximum value is 10,000.
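
Putting these settings together, a complete indexer definition might look like the following sketch. The indexer, data source, index, and skillset names are placeholders rather than values taken from this article.

```json
{
  "name": "my-image-caption-indexer",
  "dataSourceName": "my-image-blob-datasource",
  "targetIndexName": "my-image-caption-index",
  "skillsetName": "my-image-caption-skillset",
  "parameters": {
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "default",
      "imageAction": "generateNormalizedImages",
      "normalizedImageMaxWidth": 2000,
      "normalizedImageMaxHeight": 2000
    }
  }
}
```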

### About normalized images

When `imageAction` is set to a value other than *none*, the new `normalized_images` field contains an array of images. Each image is a complex type that has the following members:

| Image member | Description |
|--------------------|-----------------------------------------|
| data | BASE64 encoded string of the normalized image in JPEG format. |
| width | Width of the normalized image in pixels. |
| height | Height of the normalized image in pixels. |
| originalWidth | The original width of the image before normalization. |
| originalHeight | The original height of the image before normalization. |
| rotationFromOriginal | Counter-clockwise rotation in degrees that occurred to create the normalized image. A value between 0 degrees and 360 degrees. This step reads the metadata from the image that is generated by a camera or scanner. Usually a multiple of 90 degrees. |
| contentOffset | The character offset within the content field where the image was extracted from. This field is only applicable for files with embedded images. The `contentOffset` for images extracted from PDF documents is always at the end of the text on the page it was extracted from in the document. This means images appear after all text on that page, regardless of the original location of the image in the page. |
| pageNumber | If the image was extracted or rendered from a PDF, this field contains the page number in the PDF it was extracted or rendered from, starting from 1. If the image isn't from a PDF, this field is 0. |

Sample value of `normalized_images`:

```json
[
  {
    "data": "BASE64 ENCODED STRING OF A JPEG IMAGE",
    "width": 500,
    "height": 300,
    "originalWidth": 5000,
    "originalHeight": 3000,
    "rotationFromOriginal": 90,
    "contentOffset": 500,
    "pageNumber": 2
  }
]
```

## Define skillsets for image processing

This section supplements the [skill reference](cognitive-search-defining-skillset.md) articles by providing context for working with skill inputs, outputs, and patterns, as they relate to image processing.

+ Create or update a skillset to add skills.

Once the basic framework of your skillset is created and Azure AI services is configured, you can focus on each individual image skill, defining inputs and source context, and mapping outputs to fields in either an index or knowledge store.
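
As a point of reference, the following sketch shows where an individual skill definition sits within that basic skillset framework. The skillset name and description are placeholder values, and the skill object is collapsed; a full GenAI Prompt skill definition appears in the next section.

```json
{
  "name": "my-image-caption-skillset",
  "description": "Generates image captions with a chat completion model",
  "skills": [
    { "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill" }
  ]
}
```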

> [!NOTE]
> For an example skillset that combines image processing with downstream natural language processing, see [REST Tutorial: Use REST and AI to generate searchable content from Azure blobs](cognitive-search-tutorial-blob.md). It shows how to feed skill imaging output into entity recognition and key phrase extraction.

### Example inputs for image processing

As noted, images are extracted during document cracking and then normalized as a preliminary step. The normalized images are the inputs to any image processing skill, and are always represented in an enriched document tree in one of two ways:

+ `/document/normalized_images/*` is for documents that are processed whole.

```json
{
  "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill",
  "context": "/document/normalized_images/*",
  "uri": "https://contoso.openai.azure.com/openai/deployments/contoso-gpt-4o/chat/completions?api-version=2025-01-01-preview",
  "timeout": "PT1M",
  "apiKey": "<YOUR-API-KEY here>",
  "inputs": [
    {
      "name": "image",
      "source": "/document/normalized_images/*/data"
    },
    {
      "name": "systemMessage",
      "source": "='You are a useful artificial intelligence assistant that helps people.'"
    },
    {
      "name": "userMessage",
      "source": "='Describe what you see in this image in 20 words or less in Spanish.'"
    }
  ],
  "outputs": [
    {
      "name": "response",
      "targetName": "captionedImage"
    }
  ]
},
```
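
Assuming the skill runs successfully, each normalized image node in the enriched document gains a `captionedImage` child that holds the model's response. Conceptually, the enriched tree contains values like the following; the caption text here is purely illustrative.

```json
{
  "normalized_images": [
    {
      "data": "BASE64 ENCODED STRING OF A JPEG IMAGE",
      "captionedImage": "Un gato naranja durmiendo sobre un sofá junto a una ventana soleada."
    }
  ]
}
```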

### Example using JSON schema responses with text inputs

This example illustrates how you can use structured outputs with language models. This capability is currently supported mainly by OpenAI language models, although that might change in the future.

```json
{
  "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill",
  "context": "/document/content",
  "uri": "https://contoso.openai.azure.com/openai/deployments/contoso-gpt-4o/chat/completions?api-version=2025-01-01-preview",
  "timeout": "PT1M",
  "apiKey": "<YOUR-API-KEY here>",
  "inputs": [
    {
      "name": "systemMessage",
      "source": "='You are a useful artificial intelligence assistant that helps people.'"
    },
    {
      "name": "userMessage",
      "source": "='How many languages are there in the world and what are they?'"
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "structured_output",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "total": { "type": "number" },
          "languages": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        },
        "required": ["total", "languages"],
        "additionalProperties": false
      }
    }
  },
  "outputs": [
    {
      "name": "response",
      "targetName": "responseJsonForLanguages"
    }
  ]
},
```
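
With this schema in place, the `responseJsonForLanguages` output contains a JSON string that conforms to the schema. A hypothetical response, with illustrative values only, might look like this:

```json
{
  "total": 7000,
  "languages": [ "English", "Mandarin Chinese", "Hindi", "Spanish", "Arabic" ]
}
```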

<a name="output-field-mappings"></a>

## Map outputs to search fields

Output text is represented as nodes in an internal enriched document tree, and each node must be mapped to fields in a search index, or to projections in a knowledge store, to make the content available in your app.

1. [Create or update a search index](/rest/api/searchservice/indexes/create-or-update) to add fields to accept the skill outputs.

    In the following fields collection example, *content* is blob content. *Metadata_storage_name* contains the name of the file (set `retrievable` to *true*). *Metadata_storage_path* is the unique path of the blob and is the default document key. A *merged_content* field (output from Text Merge) is useful when images are embedded in source documents, but it isn't included in this example.

    *Captioned_image* receives the skill output (*captionedImage*) and must be a string field to capture all of the language model output in the search index.

    ```json
    "fields": [
      {
        "name": "content",
        "type": "Edm.String",
        "filterable": false,
        "retrievable": true,
        "searchable": true,
        "sortable": false
      },
      {
        "name": "metadata_storage_name",
        "type": "Edm.String",
        "filterable": true,
        "retrievable": true,
        "searchable": true,
        "sortable": false
      },
      {
        "name": "metadata_storage_path",
        "type": "Edm.String",
        "filterable": false,
        "key": true,
        "retrievable": true,
        "searchable": false,
        "sortable": false
      },
      {
        "name": "captioned_image",
        "type": "Edm.String",
        "filterable": false,
        "retrievable": true,
        "searchable": true,
        "sortable": false
      }
    ]
    ```

1. [Update the indexer](/rest/api/searchservice/indexers/create-or-update) to map skillset output (nodes in an enrichment tree) to index fields.

    Enriched documents are internal. To externalize the nodes in an enriched document tree, set up an output field mapping that specifies which index field receives node content. Enriched data is accessed by your app through an index field. The following example shows the *captionedImage* node in an enriched document mapped to the *captioned_image* field in a search index.

    ```json
    "outputFieldMappings": [
      {
        "sourceFieldName": "/document/normalized_images/*/captionedImage",
        "targetFieldName": "captioned_image"
      }
    ]
    ```

1. Run the indexer to invoke source document retrieval, image processing via language model captions, and indexing.
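
    If you're calling the REST APIs directly, you can run the indexer on demand with a request like the following. The service endpoint prefix is omitted, and the indexer name and API version are placeholders.

    ```http
    POST /indexers/[indexer name]/run?api-version=[api-version]
    ```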

### Verify results

Run a query against the index to check the results of image processing. Use [Search Explorer](search-explorer.md) as a search client, or any tool that sends HTTP requests. The following query selects fields that contain the output of image processing.

```http
POST /indexes/[index name]/docs/search?api-version=[api-version]
{
    "search": "A cat in a picture",
    "select": "metadata_storage_name, captioned_image"
}
```
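
The response returns the caption alongside the file name for each matching document. The following result is hypothetical; your captions depend on your images and on the prompt you provided.

```json
{
  "value": [
    {
      "@search.score": 1.58,
      "metadata_storage_name": "cat-photo.jpg",
      "captioned_image": "Un gato atigrado descansando sobre una alfombra frente a una chimenea."
    }
  ]
}
```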

## Related content

+ [Create indexer (REST)](/rest/api/searchservice/indexers/create)
+ [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md)
+ [How to create a skillset](cognitive-search-defining-skillset.md)
+ [Map enriched output to fields](cognitive-search-output-field-mapping.md)

articles/search/cognitive-search-aml-skill.md

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ ms.custom:
   - ignite-2023
   - build-2024
 ms.topic: reference
-ms.date: 04/18/2025
+ms.date: 05/08/2025
 ---

 # AML skill in an Azure AI Search enrichment pipeline
@@ -37,7 +37,7 @@ Starting in 2024-05-01-preview REST API and in the Azure portal (which also targ

 During indexing, the **AML** skill can connect to the model catalog to generate vectors for the index. At query time, queries can use a vectorizer to connect to the same model to vectorize text strings for a vector query. In this workflow, the **AML** skill and the model catalog vectorizer should be used together so that you're using the same embedding model for both indexing and queries. See [Use embedding models from Azure AI Foundry model catalog](vector-search-integrated-vectorization-ai-studio.md) for details and for a list of the [supported embedding models](vector-search-integrated-vectorization-ai-studio.md#supported-embedding-models).

-We recommend using the [**Import and vectorize data**](search-get-started-portal-import-vectors.md) wizard to generate a skillset that includes an AML skill for deployed embedding models on Azure AI Foundry. AML skill definition for inputs, outputs, and mappings are generated by the wizard, which gives you an easy way to test a model before writing any code.
+We recommend using the [**Import and vectorize data wizard**](search-get-started-portal-import-vectors.md) to generate a skillset that includes an AML skill for deployed embedding models on Azure AI Foundry. AML skill definition for inputs, outputs, and mappings are generated by the wizard, which gives you an easy way to test a model before writing any code.

 ## Prerequisites

articles/search/cognitive-search-concept-image-scenarios.md

Lines changed: 9 additions & 3 deletions
@@ -6,7 +6,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: azure-ai-search
 ms.topic: how-to
-ms.date: 04/28/2025
+ms.date: 05/01/2025
 ms.custom:
   - devx-track-csharp
   - ignite-2023
@@ -100,7 +100,7 @@ Metadata adjustments are captured in a complex type created for each image. You

 The default of 2,000 pixels for the normalized images maximum width and height is based on the maximum sizes supported by the [OCR skill](cognitive-search-skill-ocr.md) and the [image analysis skill](cognitive-search-skill-image-analysis.md). The [OCR skill](cognitive-search-skill-ocr.md) supports a maximum width and height of 4,200 for non-English languages, and 10,000 for English. If you increase the maximum limits, processing could fail on larger images depending on your skillset definition and the language of the documents.

-+ Optionally, [set file type criteria](search-blob-storage-integration.md#PartsOfBlobToIndex) if the workload targets a specific file type. Blob indexer configuration includes file inclusion and exclusion settings. You can filter out files you don't want.
+1. Optionally, [set file type criteria](search-blob-storage-integration.md#PartsOfBlobToIndex) if the workload targets a specific file type. Blob indexer configuration includes file inclusion and exclusion settings. You can filter out files you don't want.

     ```json
     {
@@ -127,6 +127,7 @@ When `imageAction` is set to a value other than *none*, the new `normalized_imag
 | rotationFromOriginal | Counter-clockwise rotation in degrees that occurred to create the normalized image. A value between 0 degrees and 360 degrees. This step reads the metadata from the image that is generated by a camera or scanner. Usually a multiple of 90 degrees. |
 | contentOffset | The character offset within the content field where the image was extracted from. This field is only applicable for files with embedded images. The `contentOffset` for images extracted from PDF documents is always at the end of the text on the page it was extracted from in the document. This means images appear after all text on that page, regardless of the original location of the image in the page. |
 | pageNumber | If the image was extracted or rendered from a PDF, this field contains the page number in the PDF it was extracted or rendered from, starting from 1. If the image isn't from a PDF, this field is 0. |
+| boundingPolygon | If the image was extracted or rendered from a PDF, this field contains the coordinates of the bounding polygon that encloses the image on the page. The polygon is represented as a nested array of points, where each point has x and y coordinates normalized to the dimensions of the page. This only applies to images extracted using `imageAction: generateNormalizedImages`. |

 Sample value of `normalized_images`:

@@ -140,11 +141,16 @@ When `imageAction` is set to a value other than *none*, the new `normalized_imag
     "originalHeight": 3000,
     "rotationFromOriginal": 90,
     "contentOffset": 500,
-    "pageNumber": 2
+    "pageNumber": 2,
+    "boundingPolygon": "[[{\"x\":0.0,\"y\":0.0},{\"x\":500.0,\"y\":0.0},{\"x\":0.0,\"y\":300.0},{\"x\":500.0,\"y\":300.0}]]"
   }
 ]
 ```

+> [!NOTE]
+> Bounding polygon data is represented as a string containing a double-nested, JSON-encoded array of polygons. Each polygon is an array of points, where each point has x and y coordinates. Coordinates are relative to the PDF page, with the origin (0, 0) at the top-left corner.
+> Currently, images extracted using `imageAction: generateNormalizedImages` will always produce a single polygon, but the double-nested structure is maintained for consistency with the Document Layout skill, which supports multiple polygons.
+
 ## Define skillsets for image processing

 This section supplements the [skill reference](cognitive-search-predefined-skills.md) articles by providing context for working with skill inputs, outputs, and patterns, as they relate to image processing.
articles/search/cognitive-search-concept-troubleshooting.md

Lines changed: 2 additions & 2 deletions
@@ -8,10 +8,10 @@ ms.service: azure-ai-search
 ms.custom:
   - ignite-2023
 ms.topic: conceptual
-ms.date: 12/10/2024
+ms.date: 05/08/2025
 ---

-# Tips for AI enrichment in Azure AI Search 
+# Tips for AI enrichment in Azure AI Search

 This article contains tips to help you get started with AI enrichment and skillsets used during indexing.