@@ -30,26 +30,15 @@ In contrast with a [`fieldMappings`](search-indexer-field-mappings.md) definitio
30
30
31
31
Output field mappings are required if your indexer has an attached [skillset](cognitive-search-working-with-skillsets.md) that creates new information, such as text translation or key phrase extraction. During indexer execution, AI-generated information exists in memory only. To persist this information in a search index, you'll need to tell the indexer where to send the data.
32
32
33
-
Output field mappings can also be used to flatten nested data structures during indexing. A regular [fieldMapping definition](search-indexer-field-mappings.md) doesn't support target fields of a complex type. If you need to set up field associations for hierarchical or nested data structures, you can use a skillset and an output field mapping to create the data path.
33
+
Output field mappings can also be used to [flatten nested data structures]() during indexing. A regular [fieldMapping definition](search-indexer-field-mappings.md) doesn't support target fields of a complex type. If you need to set up field associations for hierarchical or nested data structures, you can use a skillset and an output field mapping to create the data path.
34
34
35
35
Output field mappings apply to:
36
36
37
-
+ Content that's created by skills or extracted by an indexer.
37
+
+ Content that's created by skills or extracted by an indexer. The source field is a node in an enriched document residing in memory.
38
38
39
39
+ Search indexes. If you're populating a [knowledge store](knowledge-store-concept-intro.md), use [projections](knowledge-store-projections-examples.md) for data path configuration.
40
40
41
-
Output field mappings always occur after [skillset execution](cognitive-search-working-with-skillsets.md), although it's possible for this stage to run even if no skillset is defined.
42
-
43
-
<!--
44
-
The enriched document is really a tree of information, and even though the index supports complex types, sometimes you may want to transform information from the enriched tree into a simpler type (for instance, an array of strings).
45
-
46
-
Examples of output field mapping scenarios:
47
-
48
-
* **Content consolidation.** Your skillset extracts the names of organizations mentioned within each page of a document. Now you want to map each of those organization names into a field in your index of type Collection(Edm.String).
49
-
50
-
* **Content creation.** As part of your skillset, you produced a new node called "document/translated_text". You would like to map this new information to a specific field in your index.
51
-
52
-
* **Content extraction.** You don’t have a skillset but are indexing a complex type from a Cosmos DB database. You'd like to get to a node on that complex type and map it into a field in your index. -->
41
+
Output field mappings always occur after [skillset execution](cognitive-search-working-with-skillsets.md), although it's possible to process an output field mapping even if no skillset is defined. See
53
42
54
43
## Define an output field mapping
55
44
@@ -69,8 +58,8 @@ Output field mappings are added to the `outputFieldMappings` array in an indexer
69
58
| Property | Description |
70
59
|----------|-------------|
71
60
| "sourceFieldName" | Required. Specifies a path to enriched content. See [Reference annotations in an Azure Cognitive Search skillset](cognitive-search-concept-annotations-syntax.md) for path syntax. |
72
-
| "targetFieldName" | Optional. Specifies the search field that receives the enriched content. This is always a single top-level field or a collection. |
73
-
| "mappingFunction" | Optional. Adds extra processing provided by [search-indexer-field-mappings.md#predefined functions](#mappingFunctions) supported by indexers. In the case of enrichment nodes, encoding and decoding are the most commonly used functions. |
61
+
| "targetFieldName" | Optional. Specifies the search field that receives the enriched content. Target fields must be top-level simple fields or collections. A target field can't be a path to a subfield in a complex type. Although a target field can't resolve to a subfield, you can [flatten a nested source structure into a string collection](#flatten-information-from-complex-types) in memory, and then send it to a string collection in your index. |
62
+
| "mappingFunction" | Optional. Adds extra processing provided by [mapping functions](search-indexer-field-mappings.md#mappingFunctions) supported by indexers. In the case of enrichment nodes, encoding and decoding are the most commonly used functions. |
74
63
75
64
You can use the REST API or an Azure SDK to define output field mappings.
The path in a sourceFieldName can represent one element or multiple elements. In the example above, ```/document/content/sentiment``` represents a single numeric value, while ```/document/content/organizations/*/description``` represents several organization descriptions.
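As a rough illustration of the shape of an `outputFieldMappings` array, the following Python sketch builds one as plain data and prints it as JSON. The property names come from the table above and the source paths from the preceding paragraph. The helper `output_field_mapping` and the target field name `sentiment` are hypothetical, chosen only for this example; the sketch doesn't call the REST API or any Azure SDK.

```python
import json


def output_field_mapping(source_field_name, target_field_name=None, mapping_function=None):
    """Build one output field mapping entry (hypothetical helper, not an SDK call)."""
    mapping = {"sourceFieldName": source_field_name}
    # Only emit the optional properties when a value was supplied.
    if target_field_name is not None:
        mapping["targetFieldName"] = target_field_name
    if mapping_function is not None:
        mapping["mappingFunction"] = mapping_function
    return mapping


# Fragment of an indexer definition; the target field names are assumptions.
indexer_fragment = {
    "outputFieldMappings": [
        # A path that represents a single numeric value.
        output_field_mapping("/document/content/sentiment", "sentiment"),
        # A path with "*" that represents several organization descriptions.
        output_field_mapping("/document/content/organizations/*/description", "descriptions"),
    ]
}

print(json.dumps(indexer_fragment, indent=2))
```

Building the array as data first makes it easy to inspect or validate the mappings before sending them to the service by whichever client you use.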
137
+
## Flatten complex structures into a string collection
151
138
152
-
In cases where there are several elements, they are "flattened" into an array that contains each of the elements.
139
+
If your source data is composed of nested or hierarchical JSON, you can't use field mappings to set up the data paths. Instead, your search index must mirror the source data structure at each level for a full import.
153
140
154
-
More concretely, for the ```/document/content/organizations/*/description``` example, the data in the *descriptions* field would look like a flat array of descriptions before it gets indexed:
155
-
156
-
```
157
-
["Microsoft is a company in Seattle","LinkedIn's office is in San Francisco"]
158
-
```
159
-
160
-
This is an important principle, so we'll provide another example. Imagine that you have an array of complex types as part of the enrichment tree. Let's say there's a member called customEntities that has an array of complex types like the one described below.
141
+
Sample source JSON document in Cosmos DB with nested JSON:
161
142
162
143
```json
163
144
{
164
-
"document/customEntities":[
145
+
"palette":"primary colors",
146
+
"colors":[
165
147
{
166
-
"name":"heart failure",
167
-
"matches":[
168
-
{
169
-
"text":"heart failure",
170
-
"offset":10,
171
-
"length":12,
172
-
"matchDistance":0.0
173
-
}
148
+
"name":"blue",
149
+
"medium":[
150
+
"acrylic",
151
+
"oil",
152
+
"pastel"
174
153
]
175
154
},
176
155
{
177
-
"name":"morquio",
178
-
"matches":[
179
-
{
180
-
"text":"morquio",
181
-
"offset":25,
182
-
"length":7,
183
-
"matchDistance":0.0
184
-
}
156
+
"name":"red",
157
+
"medium":[
158
+
"acrylic",
159
+
"pastel",
160
+
"watercolor"
161
+
]
162
+
},
163
+
{
164
+
"name":"yellow",
165
+
"medium":[
166
+
"acrylic",
167
+
"watercolor"
185
168
]
186
169
}
187
170
]
188
171
}
189
172
```
190
173
191
-
Let's assume that your index has a field called 'diseases' of type Collection(Edm.String), where you would like to store each of the names of the entities.
174
+
Sample index definition in Cognitive Search, where names, levels, and types are reflected as a complex type:
192
175
193
-
This can be done easily by using the "\*" symbol, as follows:
An alternative rendering is to flatten the source's nested structure into a string collection in the search index.
257
+
258
+
To accomplish this task, you'll need an `outputFieldMapping` that maps an in-memory node to a string collection in the index. Although output field mappings primarily apply to skill outputs, you can also use them to address nodes after "document cracking", the stage where the indexer opens a source document and reads it into memory.
259
+
260
+
Below is a sample index definition in Cognitive Search, using string collections to receive flattened output:
This operation will simply “flatten” each of the names of the customEntities elements into a single array of strings like this:
275
+
Here's the sample indexer definition, using `outputFieldMappings` to associate the nested JSON with the string collection fields. Notice that the source field uses the path syntax for enrichment nodes, even though there's no skillset. Enriched documents are created in the system during document cracking, which means you can access nodes in each document tree as long as those nodes exist when the document is cracked.
205
276
206
277
```json
207
-
"diseases" : ["heart failure","morquio"]
278
+
{
279
+
"name": "my-test-indexer",
280
+
"dataSourceName": "my-test-ds",
281
+
"skillsetName": null,
282
+
"targetIndexName": "my-new-flattened-index",
283
+
"parameters": { },
284
+
"fieldMappings": [ ],
285
+
"outputFieldMappings": [
286
+
{
287
+
"sourceFieldName": "/document/colors/*/name",
288
+
"targetFieldName": "color_names"
289
+
},
290
+
{
291
+
"sourceFieldName": "/document/colors/*/medium",
292
+
"targetFieldName": "color_mediums"
293
+
}
294
+
]
295
+
}
296
+
```
297
+
298
+
Results from the above definition are as follows. Simplifying the structure loses context in this case. There are no longer any associations between a given color and the mediums it's available in. However, depending on your scenario, a result similar to the one shown below might be exactly what you need.
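To make the effect of the `*` wildcard in a source path more concrete, here's a minimal Python sketch that walks the sample Cosmos DB document and collects every value selected by `/document/colors/*/name`. The function `resolve_path` is hypothetical and written only for this illustration. It approximates how a path selects nodes in a document tree and is not the indexer's actual implementation.

```python
from typing import Any, List


def resolve_path(node: Any, path: str) -> List[Any]:
    """Collect every value selected by a '/document/...' style path (illustrative only)."""
    segments = [s for s in path.split("/") if s]
    # Assumption: the leading "document" segment refers to the root of the tree.
    if segments and segments[0] == "document":
        segments = segments[1:]

    current = [node]
    for segment in segments:
        selected = []
        for item in current:
            if segment == "*":
                # "*" fans out over every element of an array.
                if isinstance(item, list):
                    selected.extend(item)
            elif isinstance(item, dict) and segment in item:
                selected.append(item[segment])
        current = selected
    return current


source_document = {
    "palette": "primary colors",
    "colors": [
        {"name": "blue", "medium": ["acrylic", "oil", "pastel"]},
        {"name": "red", "medium": ["acrylic", "pastel", "watercolor"]},
        {"name": "yellow", "medium": ["acrylic", "watercolor"]},
    ],
}

color_names = resolve_path(source_document, "/document/colors/*/name")
print(color_names)  # ['blue', 'red', 'yellow']
```

Each color's `medium` value is itself an array, so the same function returns a list of lists for `/document/colors/*/medium`. How the service serializes such nested arrays into a string collection isn't modeled here.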
articles/search/search-indexer-field-mappings.md (4 additions, 2 deletions)
@@ -26,7 +26,9 @@ Field mappings apply to:
26
26
27
27
+ Search indexes only. If you're populating a [knowledge store](knowledge-store-concept-intro.md), use [projections](knowledge-store-projections-examples.md) for data path configuration.
28
28
29
-
+ Top-level search fields only, where the "targetFieldName" is either a simple field or a collection. If you're working with complex data (nested or hierarchical structures), see [outputFieldMappings](cognitive-search-output-field-mapping.md) for workarounds.
29
+
+ Top-level search fields only, where the "targetFieldName" is either a simple field or a collection. A target field can't be a complex type.
30
+
31
+
If you're working with complex data (nested or hierarchical structures), and you'd like to mirror that data structure in your search index, your search index must match the source structure exactly (same field names, levels, and types) so that the default mappings will work. Optionally, you can flatten incoming data into a string collection (see [outputFieldMappings](cognitive-search-output-field-mapping.md) for this workaround).
30
32
31
33
## Supported scenarios
32
34
@@ -55,7 +57,7 @@ Field mappings are added to the "fieldMappings" array of an indexer definition.
55
57
| Property | Description |
56
58
|----------|-------------|
57
59
| "sourceFieldName" | Required. Represents a field in your data source. |
58
-
| "targetFieldName" | Optional. Represents a field in your search index. If omitted, the value of "sourceFieldName" is assumed for the target. Target fields must be top-level simple fields or collections. It can't be a path to a subfield in a complex type. If you need this functionality, use an [outputFieldMapping](cognitive-search-output-field-mapping.md) instead.|
60
+
| "targetFieldName" | Optional. Represents a field in your search index. If omitted, the value of "sourceFieldName" is assumed for the target. Target fields must be top-level simple fields or collections. A target field can't be a complex type or a complex collection. |
59
61
| "mappingFunction" | Optional. Consists of [predefined functions](#mappingFunctions) that transform data. You can apply functions to both source and target field mappings. |
60
62
61
63
Azure Cognitive Search uses case-insensitive comparison to resolve the field and function names in field mappings. This is convenient (you don't have to get all the casing right), but it means that your data source or index can't have fields that differ only by case.
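One practical consequence of case-insensitive resolution is that it's worth checking field names for collisions before creating an indexer. The following Python sketch groups names that are equal when case is ignored. The function `find_case_collisions` and the sample field names are hypothetical, and `str.casefold` is used as an assumption about what "case-insensitive" means; the service's exact comparison rules may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_case_collisions(field_names: Iterable[str]) -> Dict[str, List[str]]:
    """Return groups of field names that differ only by case (illustrative only)."""
    groups = defaultdict(list)
    for name in field_names:
        # Assumption: casefold() approximates the service's case-insensitive comparison.
        groups[name.casefold()].append(name)
    # Keep only the groups where more than one spelling maps to the same key.
    return {key: names for key, names in groups.items() if len(names) > 1}


collisions = find_case_collisions(["id", "Title", "title", "color_names"])
print(collisions)  # {'title': ['Title', 'title']}
```

An empty result means no two names in the list differ only by case.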