Commit e77ec88

Merge pull request #282236 from HeidiSteen/main
[azure search] Refresh output field mappings
2 parents 2bde1c9 + 6563761 commit e77ec88

1 file changed

+139
-58
lines changed

articles/search/cognitive-search-output-field-mapping.md

Lines changed: 139 additions & 58 deletions
@@ -7,15 +7,17 @@ ms.author: heidist
77
ms.service: cognitive-search
88
ms.custom:
99
- ignite-2023
10-
ms.topic: conceptual
11-
ms.date: 01/18/2024
10+
ms.topic: how-to
11+
ms.date: 07/30/2024
1212
---
1313

1414
# Map enriched output to fields in a search index in Azure AI Search
1515

1616
![Indexer Stages](./media/cognitive-search-output-field-mapping/indexer-stages-output-field-mapping.png "indexer stages")
1717

18-
This article explains how to set up *output field mappings*, defining a data path between in-memory data structures created during [skillset processing](cognitive-search-concept-intro.md), and target fields in a search index. An output field mapping is defined in an [indexer](search-indexer-overview.md) and has the following elements:
18+
This article explains how to set up *output field mappings*, which define a data path between in-memory data generated during [skillset processing](cognitive-search-concept-intro.md) and target fields in a search index. During indexer execution, skill-generated information exists in memory only. To persist this information in a search index, you need to tell the indexer where to send the data.
19+
20+
An output field mapping is defined in an [indexer](search-indexer-overview.md) and has the following elements:
1921

2022
```json
2123
"outputFieldMappings": [
@@ -27,83 +29,110 @@ This article explains how to set up *output field mappings*, defining a data pat
2729
],
2830
```
2931

30-
In contrast with a [`fieldMappings`](search-indexer-field-mappings.md) definition that maps a path between two physical data structures, an `outputFieldMappings` definition maps in-memory enrichments to fields in a search index.
32+
In contrast with a [`fieldMappings`](search-indexer-field-mappings.md) definition that maps a path between verbatim source fields and index fields, an `outputFieldMappings` definition maps in-memory enrichments to fields in a search index.
3133

32-
Output field mappings are required if your indexer has an attached [skillset](cognitive-search-working-with-skillsets.md) that creates new information, such as text translation or key phrase extraction. During indexer execution, AI-generated information exists in memory only. To persist this information in a search index, you'll need to tell the indexer where to send the data.
34+
## Prerequisites
3335

34-
Output field mappings can also be used to retrieve specific nodes in a source document's complex type. For example, you might want just "FullName/LastName" in a multi-part "FullName" property. When you don't need the full complex structure, you can [flatten individual nodes in a nested data structures](#flattening-information-from-complex-types), and then use an output field mapping to send the output to a string collection in your search index.
36+
- An indexer, index, data source, and skillset.
3537

36-
Output field mappings apply to:
38+
- Index fields must be simple or top-level fields. You can't output to a [complex type](search-howto-complex-data-types.md), but if you have a complex type, you can use an output field definition to flatten parts of the complex type and send them to a collection in a search index.
3739

38-
+ In-memory content that's created by skills or extracted by an indexer. The source field is a node in an enriched document tree.
40+
## When to use an output field mapping
3941

40-
+ Search indexes. If you're populating a [knowledge store](knowledge-store-concept-intro.md), use [projections](knowledge-store-projections-examples.md) for data path configuration. If you're populating vector fields, output field mappings aren't used.
42+
Output field mappings are required if your indexer has an attached [skillset](cognitive-search-working-with-skillsets.md) that creates new information that you want in your index. Examples include:
4143

42-
Output field mappings are applied after [skillset execution](cognitive-search-working-with-skillsets.md) or after document cracking if there's no associated skillset.
44+
- Vectors from embedding skills
45+
- OCR text from image skills
46+
- Locations, organizations, or people from entity recognition skills
4347

44-
## Define an output field mapping
48+
Output field mappings can also be used to:
4549

46-
Output field mappings are added to the `outputFieldMappings` array in an indexer definition, typically placed after the `fieldMappings` array. An output field mapping consists of three parts.
50+
- Create multiple copies of your generated content (one-to-many output field mappings).
4751

48-
```json
49-
"fieldMappings": []
50-
"outputFieldMappings": [
51-
{
52-
"sourceFieldName": "/document/path-to-a-node-in-an-enriched-document",
53-
"targetFieldName": "some-search-field-in-an-index",
54-
"mappingFunction": null
55-
}
56-
],
57-
```
52+
- Flatten a source document's complex type. For example, assume source documents have a complex type, such as a multipart address, and you want just the city. You can [flatten a nested data structure](#flattening-information-from-complex-types), and then use an output field mapping to send the output to a string collection in your search index.
53+
54+
Output field mappings apply to search indexes only. If you're populating a [knowledge store](knowledge-store-concept-intro.md), use [projections](knowledge-store-projections-examples.md) for data path configuration.
5855

59-
| Property | Description |
60-
|----------|-------------|
61-
| sourceFieldName | Required. Specifies a path to enriched content. An example might be `/document/content`. See [Reference enrichments in an Azure AI Search skillset](cognitive-search-concept-annotations-syntax.md) for path syntax and examples. |
62-
| targetFieldName | Optional. Specifies the search field that receives the enriched content. Target fields must be top-level simple fields or collections. It can't be a path to a subfield in a complex type. If you want to retrieve specific nodes in a complex structure, you can [flatten individual nodes](#flattening-information-from-complex-types) in memory, and then send the output to a string collection in your index. |
63-
| mappingFunction | Optional. Adds extra processing provided by [mapping functions](search-indexer-field-mappings.md#mappingFunctions) supported by indexers. For enrichment nodes, encoding and decoding are the most commonly used functions. |
56+
## Define an output field mapping
57+
58+
Output field mappings are added to the `outputFieldMappings` array in an indexer definition, typically placed after the `fieldMappings` array. Each output field mapping has three properties: `sourceFieldName`, `targetFieldName`, and `mappingFunction`.
6459

6560
You can use the REST API or an Azure SDK to define output field mappings.
6661

6762
> [!TIP]
68-
> Indexers created by the [Import data wizard](search-import-data-portal.md) include output field mappings generated by the wizard. If you need examples, run the wizard over your data source to see the rendered definition.
63+
> Indexers created by the [Import data wizard](search-import-data-portal.md) include output field mappings generated by the wizard. If you need examples, run the wizard over your data source to see the output field mappings in the indexer.
6964
7065
### [**REST APIs**](#tab/rest)
7166

72-
Use [Create Indexer (REST)](/rest/api/searchservice/create-Indexer) or [Update Indexer (REST)](/rest/api/searchservice/update-indexer), any API version.
67+
1. Use [Create Indexer](/rest/api/searchservice/indexers/create), [Create or Update Indexer](/rest/api/searchservice/indexers/create-or-update), or an equivalent method in an Azure SDK. Here's an example of an indexer definition.
68+
69+
```json
70+
{
71+
"name": "myindexer",
72+
"description": null,
73+
"dataSourceName": "mydatasource",
74+
"targetIndexName": "myindex",
75+
"schedule": { },
76+
"parameters": { },
77+
"fieldMappings": [],
78+
"outputFieldMappings": [],
79+
"disabled": false,
80+
"encryptionKey": { }
81+
}
82+
```
7383

74-
This example adds entities and sentiment labels extracted from a blob's content property to fields in a search index.
84+
1. Fill out the `outputFieldMappings` array to specify the mappings. An output field mapping consists of three parts.
7585

76-
```JSON
77-
PUT https://[service name].search.windows.net/indexers/myindexer?api-version=[api-version]
78-
Content-Type: application/json
79-
api-key: [admin key]
80-
{
81-
"name": "myIndexer",
82-
"dataSourceName": "myDataSource",
83-
"targetIndexName": "myIndex",
84-
"skillsetName": "myFirstSkillSet",
85-
"fieldMappings": [],
86+
```json
8687
"outputFieldMappings": [
87-
{
88-
"sourceFieldName": "/document/content/organizations/*/description",
89-
"targetFieldName": "descriptions",
90-
"mappingFunction": {
91-
"name": "base64Decode"
92-
}
93-
},
94-
{
95-
"sourceFieldName": "/document/content/organizations",
96-
"targetFieldName": "orgNames"
97-
},
98-
{
99-
"sourceFieldName": "/document/content/sentiment",
100-
"targetFieldName": "sentiment"
101-
}
88+
{
89+
"sourceFieldName": "/document/path-to-a-node-in-an-enriched-document",
90+
"targetFieldName": "some-search-field-in-an-index",
91+
"mappingFunction": null
92+
}
10293
]
103-
}
104-
```
94+
```
95+
96+
| Property | Description |
97+
|----------|-------------|
98+
| sourceFieldName | Required. Specifies a path to enriched content. An example might be `/document/content`. See [Reference enrichments in an Azure AI Search skillset](cognitive-search-concept-annotations-syntax.md) for path syntax and examples. |
99+
| targetFieldName | Optional. Specifies the search field that receives the enriched content. Target fields must be top-level simple fields or collections. It can't be a path to a subfield in a complex type. If you want to retrieve specific nodes in a complex structure, you can [flatten individual nodes](#flattening-information-from-complex-types) in memory, and then send the output to a string collection in your index. |
100+
| mappingFunction | Optional. Adds extra processing provided by [mapping functions](search-indexer-field-mappings.md#mappingFunctions) supported by indexers. For enrichment nodes, encoding and decoding are the most commonly used functions. |
101+
102+
1. The `targetFieldName` is always the name of the field in the search index.
105103

106-
For each output field mapping, set the location of the data in the enriched document tree (sourceFieldName), and the name of the field as referenced in the index (targetFieldName). Assign any [mapping functions](search-indexer-field-mappings.md#mappingFunctions) needed to transform the content of a field before it's stored in the index.
104+
1. The `sourceFieldName` is a path to a node in the enriched document. It's the output of a skill. The path always starts with `/document`, and if you're indexing from a blob, the second element of the path is `/content`. The third element is the value produced by the skill. For more information and examples, see [Reference enrichments in an Azure AI Search skillset](cognitive-search-concept-annotations-syntax.md).
105+
106+
This example adds entities and sentiment labels extracted from a blob's content property to fields in a search index.
107+
108+
```JSON
109+
{
110+
"name": "myIndexer",
111+
"dataSourceName": "myDataSource",
112+
"targetIndexName": "myIndex",
113+
"skillsetName": "myFirstSkillSet",
114+
"fieldMappings": [],
115+
"outputFieldMappings": [
116+
{
117+
"sourceFieldName": "/document/content/organizations/*/description",
118+
"targetFieldName": "descriptions",
119+
"mappingFunction": {
120+
"name": "base64Decode"
121+
}
122+
},
123+
{
124+
"sourceFieldName": "/document/content/organizations",
125+
"targetFieldName": "orgNames"
126+
},
127+
{
128+
"sourceFieldName": "/document/content/sentiment",
129+
"targetFieldName": "sentiment"
130+
}
131+
]
132+
}
133+
```
134+
135+
1. Assign any [mapping functions](search-indexer-field-mappings.md#mappingFunctions) needed to transform the content of a field before it's stored in the index. For enrichment nodes, encoding and decoding are the most commonly used functions.
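
The rules in the preceding steps can be checked before you send the indexer definition. The following Python sketch is illustrative only: `validate_output_field_mappings` is a hypothetical helper, not part of any Azure SDK, and it checks only what this article states (a required source path that starts with `/document`, and an optional target that is a top-level field rather than a path into a complex type). The indexer, data source, index, and skillset names are the placeholder values from the example above.

```python
import json


def validate_output_field_mappings(mappings):
    """Return a list of problems found in an outputFieldMappings array.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    problems = []
    for i, mapping in enumerate(mappings):
        source = mapping.get("sourceFieldName")
        target = mapping.get("targetFieldName")

        # sourceFieldName is required and is a path rooted at /document.
        if not source:
            problems.append(f"mapping {i}: sourceFieldName is required")
        elif source != "/document" and not source.startswith("/document/"):
            problems.append(f"mapping {i}: sourceFieldName must start with /document")

        # targetFieldName is optional, but when present it names a top-level
        # index field, so it can't be a path into a complex type.
        if target is not None and "/" in target:
            problems.append(f"mapping {i}: targetFieldName must be a top-level field")
    return problems


output_field_mappings = [
    {
        "sourceFieldName": "/document/content/organizations/*/description",
        "targetFieldName": "descriptions",
        "mappingFunction": {"name": "base64Decode"},
    },
    {"sourceFieldName": "/document/content/organizations", "targetFieldName": "orgNames"},
    {"sourceFieldName": "/document/content/sentiment", "targetFieldName": "sentiment"},
]

indexer_definition = {
    "name": "myIndexer",
    "dataSourceName": "myDataSource",
    "targetIndexName": "myIndex",
    "skillsetName": "myFirstSkillSet",
    "fieldMappings": [],
    "outputFieldMappings": output_field_mappings,
}

problems = validate_output_field_mappings(output_field_mappings)
if problems:
    raise ValueError("; ".join(problems))

# JSON request body for Create Indexer or Create or Update Indexer.
request_body = json.dumps(indexer_definition, indent=2)
print(request_body)
```

A check like this catches only structural mistakes. It doesn't confirm that the source node exists in the enriched document or that the target field exists in the index.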
107136

108137
### [**.NET SDK (C#)**](#tab/csharp)
109138

@@ -132,6 +161,58 @@ await indexerClient.CreateIndexerAsync(indexer);
132161

133162
---
134163

164+
## One-to-many output field mapping
165+
166+
You can use an output field mapping to route a single source field to multiple fields in a search index. You might do this for comparison testing or if you want fields with different attributes.
167+
168+
Assume a skillset that generates embeddings for a vector field, and an index that has multiple vector fields that vary by algorithm and compression settings. Within the indexer, map the embedding skill's output to each of the multiple vector fields in a search index.
169+
170+
```json
171+
"outputFieldMappings": [
172+
{ "sourceFieldName" : "/document/content/text_vector", "targetFieldName" : "vector_hnsw" },
173+
{ "sourceFieldName" : "/document/content/text_vector", "targetFieldName" : "vector_eknn" },
174+
{ "sourceFieldName" : "/document/content/text_vector", "targetFieldName" : "vector_narrow" },
175+
{ "sourceFieldName" : "/document/content/text_vector", "targetFieldName" : "vector_no_stored" },
176+
{ "sourceFieldName" : "/document/content/text_vector", "targetFieldName" : "vector_scalar" }
177+
]
178+
```
179+
180+
The source field path is skill output. In this example, the skill's `embedding` output is given the target name `text_vector`, so the path is `/document/content/text_vector`. The target name on a skill output is an optional property. If the skill output has no target name, the node keeps the output name `embedding`, and the path would be `/document/content/embedding`.
181+
182+
```json
183+
{
184+
"name": "test-vector-size-ss",
185+
"description": "Generate embeddings using AOAI",
186+
"skills": [
187+
{
188+
"@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
189+
"name": "#1",
190+
"description": null,
191+
"context": "/document/content",
192+
"resourceUri": "https://my-demo-eastus.openai.azure.com",
193+
"apiKey": null,
194+
"deploymentId": "text-embedding-ada-002",
195+
"dimensions": 1536,
196+
"modelName": "text-embedding-ada-002",
197+
"inputs": [
198+
{
199+
"name": "text",
200+
"source": "/document/content"
201+
}
202+
],
203+
"outputs": [
204+
{
205+
"name": "embedding",
206+
"targetName": "text_vector"
207+
}
208+
],
209+
"authIdentity": null
210+
}
211+
]
212+
}
213+
```
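
The one-to-many mapping in this section can also be generated rather than typed by hand. The following Python sketch is a minimal illustration under stated assumptions: `enrichment_path` and `fan_out` are hypothetical helpers, not Azure SDK functions, and the sketch assumes that a skill output lands at the skill's `context` path followed by the output's `targetName`, or by its `name` when no target name is set. The field names come from the example above.

```python
def enrichment_path(skill, output_name):
    """Build the enriched-document path for one skill output (hypothetical helper)."""
    for output in skill["outputs"]:
        if output["name"] == output_name:
            # targetName is optional; fall back to the output name when it's absent.
            node = output.get("targetName") or output["name"]
            return f"{skill['context'].rstrip('/')}/{node}"
    raise KeyError(f"skill has no output named {output_name!r}")


def fan_out(source_path, target_fields):
    """Create one output field mapping per target field, all reading the same source."""
    return [
        {"sourceFieldName": source_path, "targetFieldName": name}
        for name in target_fields
    ]


# Only the parts of the skill definition that determine the path.
embedding_skill = {
    "context": "/document/content",
    "outputs": [{"name": "embedding", "targetName": "text_vector"}],
}

source_path = enrichment_path(embedding_skill, "embedding")
mappings = fan_out(
    source_path,
    ["vector_hnsw", "vector_eknn", "vector_narrow", "vector_no_stored", "vector_scalar"],
)

for mapping in mappings:
    print(mapping)
```

Deriving the source path from the skill definition keeps the indexer consistent with the skillset if the output's target name changes later.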
214+
215+
135216
<a name="flattening-information-from-complex-types"></a>
136217

137218
## Flatten complex structures into a string collection
