|
---
title: Use the content generation capabilities of language models in a content ingestion pipeline
titleSuffix: Azure AI Search
description: Use language models to caption your images and facilitate an image search through your data.
author: amitkalay
ms.author: amitkalay
ms.service: azure-ai-search
ms.topic: how-to
ms.date: 05/05/2025
ms.custom:
  - devx-track-csharp
  - build-2025
---

# Generate captions for images in another language

In this article, learn how to generate image captions by using AI enrichment and a skillset. Images often contain useful information that's relevant in search scenarios. You can [vectorize images](search-get-started-portal-image-search.md) to represent visual content in your search index. Or, you can use [AI enrichment and skillsets](cognitive-search-concept-intro.md) to create and extract searchable *text* from images.

The GenAI Prompt skill (preview) generates a description of each image in your data source, and the indexer pushes that description into a search index. To view the descriptions, you can run a query that includes them in the response.

## Prerequisites

To work with image content in a skillset, you need:

+ A supported data source
+ Files or blobs containing images
+ Read access on the supported data source. This article uses key-based authentication, but indexers can also connect by using the search service identity and Microsoft Entra ID authentication. For role-based access control, assign roles on the data source to allow read access by the service identity. If you're testing on a local development machine, make sure you also have read access on the supported data source.
+ A search indexer, configured for image actions
+ A skillset with the GenAI Prompt skill (preview)
+ A search index with fields to receive the verbalized text output, plus output field mappings in the indexer that establish the association

Optionally, you can define projections to accept image-analyzed output into a [knowledge store](knowledge-store-concept-intro.md) for data mining scenarios.
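
If you go this route, a minimal knowledge store definition in the skillset might look like the following sketch. The connection string and container name are placeholders rather than values from this article, so substitute your own.

```json
"knowledgeStore": {
  "storageConnectionString": "<YOUR-STORAGE-CONNECTION-STRING>",
  "projections": [
    {
      "tables": [],
      "objects": [],
      "files": [
        {
          "storageContainer": "normalized-images",
          "source": "/document/normalized_images/*"
        }
      ]
    }
  ]
}
```

A file projection like this one writes the normalized images to blob storage, where you can pair them with the generated captions for data mining.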

<a name="get-normalized-images"></a>

## Configure indexers for image processing

After the source files are set up, enable image normalization by setting the `imageAction` parameter in the indexer configuration. Image normalization makes images more uniform for downstream processing and includes the following operations:

+ Large images are resized to a maximum height and width to make them uniform.
+ Images that have orientation metadata are rotated so that they load upright.

Enabling `imageAction` (setting this parameter to a value other than `none`) incurs an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).

1. [Create or update an indexer](/rest/api/searchservice/indexers/create-or-update) to set the configuration properties:

    ```json
    {
      "parameters":
      {
        "configuration":
        {
          "dataToExtract": "contentAndMetadata",
          "parsingMode": "default",
          "imageAction": "generateNormalizedImages"
        }
      }
    }
    ```

1. Set `dataToExtract` to `contentAndMetadata` (required).

1. Verify that `parsingMode` is set to *default* (required).

   This parameter determines the granularity of search documents created in the index. The default mode sets up a one-to-one correspondence so that one blob results in one search document. If documents are large, or if skills require smaller chunks of text, you can add the Text Split skill, which subdivides a document into pages of text for processing purposes. But for search scenarios, one blob per document is required if enrichment includes image processing.

1. Set `imageAction` to enable the `normalized_images` node in an enrichment tree (required):

    + `generateNormalizedImages` to generate an array of normalized images as part of document cracking.

    + `generateNormalizedImagePerPage` (applies to PDF only) to generate an array of normalized images where each page in the PDF is rendered to one output image. For non-PDF files, the behavior is the same as if you had set `generateNormalizedImages`. However, setting `generateNormalizedImagePerPage` can make the indexing operation less performant by design (especially for large documents) because several images have to be generated.

1. Optionally, adjust the width or height of the generated normalized images (see the example after these steps):

    + `normalizedImageMaxWidth` in pixels. Default is 2,000. Maximum value is 10,000.

    + `normalizedImageMaxHeight` in pixels. Default is 2,000. Maximum value is 10,000.
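
For example, an indexer configuration that raises both limits to 4,000 pixels might look like the following sketch. The 4,000-pixel values are illustrative, not a recommendation.

```json
{
  "parameters":
  {
    "configuration":
    {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "default",
      "imageAction": "generateNormalizedImages",
      "normalizedImageMaxWidth": 4000,
      "normalizedImageMaxHeight": 4000
    }
  }
}
```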

### About normalized images

When `imageAction` is set to a value other than *none*, the new `normalized_images` field contains an array of images. Each image is a complex type that has the following members:

| Image member | Description |
|--------------------|-----------------------------------------|
| data | BASE64 encoded string of the normalized image in JPEG format. |
| width | Width of the normalized image in pixels. |
| height | Height of the normalized image in pixels. |
| originalWidth | The original width of the image before normalization. |
| originalHeight | The original height of the image before normalization. |
| rotationFromOriginal | Counter-clockwise rotation in degrees that occurred to create the normalized image. A value between 0 degrees and 360 degrees. This step reads the metadata from the image that is generated by a camera or scanner. Usually a multiple of 90 degrees. |
| contentOffset | The character offset within the content field where the image was extracted from. This field is only applicable for files with embedded images. The `contentOffset` for images extracted from PDF documents is always at the end of the text on the page it was extracted from in the document. This means images appear after all text on that page, regardless of the original location of the image in the page. |
| pageNumber | If the image was extracted or rendered from a PDF, this field contains the page number in the PDF it was extracted or rendered from, starting from 1. If the image isn't from a PDF, this field is 0. |

Sample value of `normalized_images`:

```json
[
    {
        "data": "BASE64 ENCODED STRING OF A JPEG IMAGE",
        "width": 500,
        "height": 300,
        "originalWidth": 5000,
        "originalHeight": 3000,
        "rotationFromOriginal": 90,
        "contentOffset": 500,
        "pageNumber": 2
    }
]
```

## Define skillsets for image processing

This section supplements the [skill reference](cognitive-search-defining-skillset.md) articles by providing context for working with skill inputs, outputs, and patterns as they relate to image processing.

+ [Create or update a skillset](/rest/api/searchservice/skillsets/create-or-update) to add skills.

After the basic framework of your skillset is created and the connection to your chat completion model is configured, you can focus on each individual image skill, defining inputs and source context, and mapping outputs to fields in either an index or a knowledge store.
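
For orientation, each skill definition shown in the following sections is a single object that belongs in the `skills` array of the skillset payload. A minimal wrapper might look like the following sketch; the skillset name and description are placeholders.

```json
{
  "name": "image-captioning-skillset",
  "description": "Generates image captions by calling a chat completion model.",
  "skills": []
}
```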

> [!NOTE]
> For an example skillset that combines image processing with downstream natural language processing, see [REST Tutorial: Use REST and AI to generate searchable content from Azure blobs](cognitive-search-tutorial-blob.md). It shows how to feed image skill output into entity recognition and key phrase extraction.

### Example inputs for image processing

As noted, images are extracted during document cracking and then normalized as a preliminary step. The normalized images are the inputs to any image processing skill and are represented in the enriched document tree as follows:

+ `/document/normalized_images/*` is for documents that are processed whole. The following skill definition uses this path to generate a short Spanish caption for each image:

```json
  {
    "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill",
    "context": "/document/normalized_images/*",
    "uri": "https://contoso.openai.azure.com/openai/deployments/contoso-gpt-4o/chat/completions?api-version=2025-01-01-preview",
    "timeout": "PT1M",
    "apiKey": "<YOUR-API-KEY here>",
    "inputs": [
      {
        "name": "image",
        "source": "/document/normalized_images/*/data"
      },
      {
        "name": "systemMessage",
        "source": "='You are a useful artificial intelligence assistant that helps people.'"
      },
      {
        "name": "userMessage",
        "source": "='Describe what you see in this image in 20 words or less in Spanish.'"
      }
    ],
    "outputs": [
      {
        "name": "response",
        "targetName": "captionedImage"
      }
    ]
  },
```

### Example using JSON schema responses with text inputs

This example illustrates how you can use structured outputs with language models. This capability is currently supported mainly by OpenAI language models, although that might change in the future.

```json
  {
    "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill",
    "context": "/document/content",
    "uri": "https://contoso.openai.azure.com/openai/deployments/contoso-gpt-4o/chat/completions?api-version=2025-01-01-preview",
    "timeout": "PT1M",
    "apiKey": "<YOUR-API-KEY here>",
    "inputs": [
      {
        "name": "systemMessage",
        "source": "='You are a useful artificial intelligence assistant that helps people.'"
      },
      {
        "name": "userMessage",
        "source": "='How many languages are there in the world and what are they?'"
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "structured_output",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "total": { "type": "number" },
            "languages": {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          },
          "required": ["total", "languages"],
          "additionalProperties": false
        }
      }
    },
    "outputs": [
      {
        "name": "response",
        "targetName": "responseJsonForLanguages"
      }
    ]
  },
```

<a name="output-field-mappings"></a>

## Map outputs to search fields

Output text is represented as nodes in an internal enriched document tree, and each node must be mapped to fields in a search index, or to projections in a knowledge store, to make the content available in your app.

1. [Create or update a search index](/rest/api/searchservice/indexes/create-or-update) to add fields that accept the skill outputs.

   In the following fields collection example, *content* is blob content. *Metadata_storage_name* contains the name of the file (set `retrievable` to *true*). *Metadata_storage_path* is the unique path of the blob and is the default document key.

   *Captioned_image* receives the GenAI Prompt skill output and must be a string type so that it can capture all of the language model output in the search index.

    ```json
    "fields": [
      {
        "name": "content",
        "type": "Edm.String",
        "filterable": false,
        "retrievable": true,
        "searchable": true,
        "sortable": false
      },
      {
        "name": "metadata_storage_name",
        "type": "Edm.String",
        "filterable": true,
        "retrievable": true,
        "searchable": true,
        "sortable": false
      },
      {
        "name": "metadata_storage_path",
        "type": "Edm.String",
        "filterable": false,
        "key": true,
        "retrievable": true,
        "searchable": false,
        "sortable": false
      },
      {
        "name": "captioned_image",
        "type": "Edm.String",
        "filterable": false,
        "retrievable": true,
        "searchable": true,
        "sortable": false
      }
    ]
    ```

1. [Update the indexer](/rest/api/searchservice/indexers/create-or-update) to map skillset output (nodes in an enrichment tree) to index fields.

   Enriched documents are internal. To externalize the nodes in an enriched document tree, set up an output field mapping that specifies which index field receives node content. Your app then accesses the enriched data through an index field. The following example maps the *captionedImage* node in the enriched document to the *captioned_image* field in the search index.

    ```json
    "outputFieldMappings": [
      {
        "sourceFieldName": "/document/normalized_images/*/captionedImage",
        "targetFieldName": "captioned_image"
      }
    ]
    ```

1. Run the indexer to invoke source document retrieval, image captioning by the language model, and indexing.
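
   If you're using REST, running the indexer is a single request. The following sketch uses the same placeholder conventions as the query example later in this article.

    ```http
    POST /indexers/[indexer name]/run?api-version=[api-version]
    ```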

### Verify results

Run a query against the index to check the results of image processing. Use [Search Explorer](search-explorer.md) as a search client, or any tool that sends HTTP requests. The following query selects fields that contain the output of image processing.

```http
POST /indexes/[index name]/docs/search?api-version=[api-version]
{
    "search": "A cat in a picture",
    "select": "metadata_storage_name, captioned_image"
}
```

## Related content
+ [Create indexer (REST)](/rest/api/searchservice/indexers/create)
+ [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md)
+ [How to create a skillset](cognitive-search-defining-skillset.md)
+ [Map enriched output to fields](cognitive-search-output-field-mapping.md)