Commit 65c46da

Revised the flattened data section
1 parent 2489757 commit 65c46da

File tree

2 files changed

+168
-54
lines changed

articles/search/cognitive-search-output-field-mapping.md

Lines changed: 164 additions & 52 deletions
Original file line numberDiff line numberDiff line change
@@ -30,26 +30,15 @@ In contrast with a [`fieldMappings`](search-indexer-field-mappings.md) definitio
3030

3131
Output field mappings are required if your indexer has an attached [skillset](cognitive-search-working-with-skillsets.md) that creates new information, such as text translation or key phrase extraction. During indexer execution, AI-generated information exists in memory only. To persist this information in a search index, you'll need to tell the indexer where to send the data.
3232

33-
Output field mappings can also be used to flatten nested data structures during indexing. A regular [fieldMapping definition](search-indexer-field-mappings.md) doesn't support target fields of a complex type. If you need to set up field associations for hierarchical or nested data structures, you can use a skillset and an output field mapping to create the data path.
33+
Output field mappings can also be used to [flatten nested data structures](#flatten-information-from-complex-types) during indexing. A regular [fieldMapping definition](search-indexer-field-mappings.md) doesn't support target fields of a complex type. If you need to set up field associations for hierarchical or nested data structures, you can use a skillset and an output field mapping to create the data path.
3434

3535
Output field mappings apply to:
3636

37-
+ Content that's created by skills or extracted by an indexer.
37+
+ Content that's created by skills or extracted by an indexer. The source field is a node in an enriched document residing in memory.
3838

3939
+ Search indexes. If you're populating a [knowledge store](knowledge-store-concept-intro.md), use [projections](knowledge-store-projections-examples.md) for data path configuration.
4040

41-
Output field mappings always occur after [skillset execution](cognitive-search-working-with-skillsets.md), although it's possible for this stage to run even if no skillset is defined.
42-
43-
<!--
44-
The enriched document is really a tree of information, and even though there is support for complex types in the index, sometimes you may want to transform the information from the enriched tree into a more simple type (for instance, an array of strings).
45-
46-
Examples of output field mapping scenarios:
47-
48-
* **Content consolidation.** Your skillset extracts the names of organizations mentioned in within each page of a document. Now you want to map each of those organization names into a field in your index of type Edm.Collection(Edm.String).
49-
50-
* **Content creation.** As part of your skillset, you produced a new node called "document/translated_text". You would like to map this new information to a specific field in your index.
51-
52-
* **Content extraction.** You don’t have a skillset but are indexing a complex type from a Cosmos DB database. You'd like to get to a node on that complex type and map it into a field in your index. -->
41+
Output field mappings always occur after [skillset execution](cognitive-search-working-with-skillsets.md), although it's possible to process an output field mapping even if no skillset is defined.
5342

5443
## Define an output field mapping
5544

@@ -69,8 +58,8 @@ Output field mappings are added to the `outputFieldMappings` array in an indexer
6958
| Property | Description |
7059
|----------|-------------|
7160
| "sourceFieldName" | Required. Specifies a path to enriched content. See [Reference annotations in an Azure Cognitive Search skillset](cognitive-search-concept-annotations-syntax.md) for path syntax. |
72-
| "targetFieldName" | Optional. Specifies the search field that receives the enriched content. This is always a single top-level field or a collection. |
73-
| "mappingFunction" | Optional. Adds extra processing provided by [search-indexer-field-mappings.md#predefined functions](#mappingFunctions) supported by indexers. In the case of enrichment nodes, encoding and decoding are the most commonly used functions. |
61+
| "targetFieldName" | Optional. Specifies the search field that receives the enriched content. The target must be a top-level simple field or a collection; it can't be a path to a subfield in a complex type. Although a target field can't resolve to a subfield, you can [flatten a nested source structure into a string collection](#flatten-information-from-complex-types) in memory and then send it to a string collection field in your index. |
62+
| "mappingFunction" | Optional. Adds extra processing provided by [mapping functions](search-indexer-field-mappings.md#mappingFunctions) supported by indexers. In the case of enrichment nodes, encoding and decoding are the most commonly used functions. |
7463

7564
You can use the REST API or an Azure SDK to define output field mappings.
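
As a rough illustration of the three properties in the table above, the following Python sketch assembles an `outputFieldMappings` fragment and serializes it as JSON. The helper `make_output_field_mapping`, its checks, and the target field names are hypothetical and exist only for this example. The `base64Encode` function name is taken from the predefined mapping functions for indexers, and applying it to this particular node is purely illustrative.

```python
import json


def make_output_field_mapping(source_path, target_field=None, mapping_function=None):
    """Build one entry for an indexer's outputFieldMappings array (hypothetical helper)."""
    # Assumption: paths to enriched content are rooted at /document.
    if not source_path.startswith("/document"):
        raise ValueError("sourceFieldName should be an enrichment path such as /document/...")
    mapping = {"sourceFieldName": source_path}
    if target_field is not None:
        # The target is a top-level field name, never a path to a subfield.
        if "/" in target_field:
            raise ValueError("targetFieldName must be a top-level field, not a path")
        mapping["targetFieldName"] = target_field
    if mapping_function is not None:
        mapping["mappingFunction"] = {"name": mapping_function}
    return mapping


# Field names on the right-hand side are made up for the example.
indexer_fragment = {
    "outputFieldMappings": [
        make_output_field_mapping(
            "/document/content/organizations/*/description", "descriptions"
        ),
        make_output_field_mapping(
            "/document/translated_text", "translated_text", "base64Encode"
        ),
    ]
}

print(json.dumps(indexer_fragment, indent=2))
```

The printed fragment could be pasted into the body of an indexer definition; the checks in the helper only mirror the constraints described in the table and are not a substitute for validation by the service.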
7665

@@ -145,66 +134,189 @@ await indexerClient.CreateIndexerAsync(indexer);
145134

146135
<a name="flatten-information-from-complex-types"></a>
147136

148-
## Flatten complex structures for data import
149-
150-
The path in a sourceFieldName can represent one element or multiple elements. In the example above, ```/document/content/sentiment``` represents a single numeric value, while ```/document/content/organizations/*/description``` represents several organization descriptions.
137+
## Flatten complex structures into a string collection
151138

152-
In cases where there are several elements, they are "flattened" into an array that contains each of the elements.
139+
If your source data is composed of nested or hierarchical JSON, you can't use field mappings to set up the data paths. Instead, for a full import, your search index must mirror the source data structure at each level.
153140

154-
More concretely, for the ```/document/content/organizations/*/description``` example, the data in the *descriptions* field would look like a flat array of descriptions before it gets indexed:
155-
156-
```
157-
["Microsoft is a company in Seattle","LinkedIn's office is in San Francisco"]
158-
```
159-
160-
This is an important principle, so we'll provide another example. Imagine that you have an array of complex types as part of the enrichment tree. Let's say there's a member called customEntities that has an array of complex types like the one described below.
141+
Sample source JSON document in Cosmos DB with nested JSON:
161142

162143
```json
163144
{
164-
"document/customEntities":[
145+
"palette":"primary colors",
146+
"colors":[
165147
{
166-
"name":"heart failure",
167-
"matches":[
168-
{
169-
"text":"heart failure",
170-
"offset":10,
171-
"length":12,
172-
"matchDistance":0.0
173-
}
148+
"name":"blue",
149+
"medium":[
150+
"acrylic",
151+
"oil",
152+
"pastel"
174153
]
175154
},
176155
{
177-
"name":"morquio",
178-
"matches":[
179-
{
180-
"text":"morquio",
181-
"offset":25,
182-
"length":7,
183-
"matchDistance":0.0
184-
}
156+
"name":"red",
157+
"medium":[
158+
"acrylic",
159+
"pastel",
160+
"watercolor"
161+
]
162+
},
163+
{
164+
"name":"yellow",
165+
"medium":[
166+
"acrylic",
167+
"watercolor"
185168
]
186169
}
187170
]
188171
}
189172
```
190173

191-
Let's assume that your index has a field called 'diseases' of type Collection(Edm.String), where you would like to store each of the names of the entities.
174+
Sample index definition in Cognitive Search, where names, levels, and types are reflected as a complex type:
192175

193-
This can be done easily by using the "\*" symbol, as follows:
176+
```json
177+
{
178+
"name": "my-test-index",
179+
"defaultScoringProfile": "",
180+
"fields": [
181+
{ "name": "id", "type": "Edm.String", "searchable": false, "retrievable": true, "key": true},
182+
{ "name": "palette", "type": "Edm.String", "searchable": true, "retrievable": true },
183+
{ "name": "colors", "type": "Collection(Edm.ComplexType)",
184+
"fields": [
185+
{
186+
"name": "name",
187+
"type": "Edm.String",
188+
"searchable": true,
189+
"retrievable": true
190+
},
191+
{
192+
"name": "medium",
193+
"type": "Collection(Edm.String)",
194+
"searchable": true,
195+
"retrievable": true
196+
}
197+
]
198+
}
199+
]
200+
}
201+
```
202+
203+
Sample indexer definition that runs the import (notice there are no field mappings and no skillset):
194204

195205
```json
196-
"outputFieldMappings": [
206+
{
207+
"name": "my-test-indexer",
208+
"dataSourceName": "my-test-ds",
209+
"skillsetName": null,
210+
"targetIndexName": "my-test-index",
211+
212+
"fieldMappings": [],
213+
"outputFieldMappings": []
214+
}
215+
```
216+
217+
Sample search document, post-import, is similar to the original in Cosmos DB:
218+
219+
```json
220+
{
221+
"value": [
222+
{
223+
"@search.score": 1,
224+
"id": "240a98f5-90c9-406b-a8c8-f50ff86f116c",
225+
"palette": "primary colors",
226+
"colors": [
227+
{
228+
"name": "blue",
229+
"medium": [
230+
"acrylic",
231+
"oil",
232+
"pastel"
233+
]
234+
},
235+
{
236+
"name": "red",
237+
"medium": [
238+
"acrylic",
239+
"pastel",
240+
"watercolor"
241+
]
242+
},
197243
{
198-
"sourceFieldName": "/document/customEntities/*/name",
199-
"targetFieldName": "diseases"
244+
"name": "yellow",
245+
"medium": [
246+
"acrylic",
247+
"watercolor"
248+
]
200249
}
201-
]
250+
]
251+
}
252+
]
253+
}
254+
```
255+
256+
Alternatively, you can flatten the source's nested structure into a string collection in the search index.
257+
258+
To accomplish this task, you'll need an `outputFieldMapping` that maps an in-memory node to a string collection in the index. Although output field mappings primarily apply to skill outputs, you can also use them to address nodes after "document cracking" where the indexer opens a source document and reads it into memory.
259+
260+
Below is a sample index definition in Cognitive Search, using string collections to receive flattened output:
261+
262+
```json
263+
{
264+
"name": "my-new-flattened-index",
265+
"defaultScoringProfile": "",
266+
"fields": [
267+
{ "name": "id", "type": "Edm.String", "searchable": false, "retrievable": true, "key": true },
268+
{ "name": "palette", "type": "Edm.String", "searchable": true, "retrievable": true },
269+
{ "name": "color_names", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true },
270+
{ "name": "color_mediums", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true}
271+
]
272+
}
202273
```
203274

204-
This operation will simply “flatten” each of the names of the customEntities elements into a single array of strings like this:
275+
Here's the sample indexer definition, using `outputFieldMappings` to associate the nested JSON with the string collection fields. Notice that the source field uses the path syntax for enrichment nodes, even though there's no skillset. Enriched documents are created in the system during document cracking, which means you can access nodes in each document tree as long as those nodes exist when the document is cracked.
205276

206277
```json
207-
"diseases" : ["heart failure","morquio"]
278+
{
279+
"name": "my-test-indexer",
280+
"dataSourceName": "my-test-ds",
281+
"skillsetName": null,
282+
"targetIndexName": "my-new-flattened-index",
283+
"parameters": { },
284+
"fieldMappings": [ ],
285+
"outputFieldMappings": [
286+
{
287+
"sourceFieldName": "/document/colors/*/name",
288+
"targetFieldName": "color_names"
289+
},
290+
{
291+
"sourceFieldName": "/document/colors/*/medium",
292+
"targetFieldName": "color_mediums"
293+
}
294+
]
295+
}
296+
```
297+
298+
Results from the above definition are as follows. Simplifying the structure loses context in this case: there are no longer any associations between a given color and the mediums it's available in. However, depending on your scenario, a result similar to the one shown below might be exactly what you need.
299+
300+
```json
301+
{
302+
"value": [
303+
{
304+
"@search.score": 1,
305+
"id": "240a98f5-90c9-406b-a8c8-f50ff86f116c",
306+
"palette": "primary colors",
307+
"color_names": [
308+
"blue",
309+
"red",
310+
"yellow"
311+
],
312+
"color_mediums": [
313+
"[\"acrylic\",\"oil\",\"pastel\"]",
314+
"[\"acrylic\",\"pastel\",\"watercolor\"]",
315+
"[\"acrylic\",\"watercolor\"]"
316+
]
317+
}
318+
]
319+
}
208320
```
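
To make the effect of the `*` wildcard more concrete, here is a minimal Python sketch that models the flattening on the sample document. It is not the indexer's implementation: `resolve_path` and `to_string_collection` are hypothetical helpers, and the choice to serialize non-string elements as compact JSON text is an assumption inferred from the `color_mediums` values in the sample result above.

```python
import json

# The sample source document from this section.
source = {
    "palette": "primary colors",
    "colors": [
        {"name": "blue", "medium": ["acrylic", "oil", "pastel"]},
        {"name": "red", "medium": ["acrylic", "pastel", "watercolor"]},
        {"name": "yellow", "medium": ["acrylic", "watercolor"]},
    ],
}


def resolve_path(document, path):
    """Collect the values addressed by a path such as /document/colors/*/name."""
    parts = path.strip("/").split("/")
    if parts[0] != "document":
        raise ValueError("path must start with /document")
    nodes = [document]
    for part in parts[1:]:
        next_nodes = []
        for node in nodes:
            if part == "*":
                next_nodes.extend(node)        # fan out over every array element
            else:
                next_nodes.append(node[part])  # step into a named child
        nodes = next_nodes
    return nodes


def to_string_collection(values):
    """Assumption: elements that aren't strings are stored as compact JSON text."""
    return [
        v if isinstance(v, str) else json.dumps(v, separators=(",", ":"))
        for v in values
    ]


color_names = to_string_collection(resolve_path(source, "/document/colors/*/name"))
color_mediums = to_string_collection(resolve_path(source, "/document/colors/*/medium"))

print(color_names)
print(color_mediums)
```

In this model, each `medium` array becomes a single string element, which is why the link between a color and its mediums can no longer be queried as structured data.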
209321

210322
## See also

articles/search/search-indexer-field-mappings.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,9 @@ Field mappings apply to:
2626

2727
+ Search indexes only. If you're populating a [knowledge store](knowledge-store-concept-intro.md), use [projections](knowledge-store-projections-examples.md) for data path configuration.
2828

29-
+ Top-level search fields only, where the "targetFieldName" is either a simple field or a collection. If you're working with complex data (nested or hierarchical structures), see [outputFieldMappings](cognitive-search-output-field-mapping.md) for workarounds.
29+
+ Top-level search fields only, where the "targetFieldName" is either a simple field or a collection. A target field can't be a complex type.
30+
31+
If you're working with complex data (nested or hierarchical structures), and you'd like to mirror that data structure in your search index, your search index must match the source structure exactly (same field names, levels, and types) so that the default mappings will work. Optionally, you can flatten incoming data into a string collection (see [outputFieldMappings](cognitive-search-output-field-mapping.md) for this workaround).
3032

3133
## Supported scenarios
3234

@@ -55,7 +57,7 @@ Field mappings are added to the "fieldMappings" array of an indexer definition.
5557
| Property | Description |
5658
|----------|-------------|
5759
| "sourceFieldName" | Required. Represents a field in your data source. |
58-
| "targetFieldName" | Optional. Represents a field in your search index. If omitted, the value of "sourceFieldName" is assumed for the target. Target fields must be top-level simple fields or collections. It can't be a path to a subfield in a complex type. If you need this functionality, use an [outputFieldMapping](cognitive-search-output-field-mapping.md) instead.|
60+
| "targetFieldName" | Optional. Represents a field in your search index. If omitted, the value of "sourceFieldName" is assumed for the target. Target fields must be top-level simple fields or collections. A target field can't be a complex type or a collection of complex types. |
5961
| "mappingFunction" | Optional. Consists of [predefined functions](#mappingFunctions) that transform data. You can apply functions to both source and target field mappings. |
6062

6163
Azure Cognitive Search uses case-insensitive comparison to resolve the field and function names in field mappings. This is convenient (you don't have to get all the casing right), but it means that your data source or index can't have fields that differ only by case.
