Commit b19c6e0

Merge pull request #211200 from HeidiSteen/heidist-fresh
[azure search] Freshness pass on field mapping docs
2 parents e53e209 + 05021c4 commit b19c6e0

File tree

5 files changed

+303
-128
lines changed

Lines changed: 254 additions & 73 deletions
@@ -1,61 +1,87 @@
11
---
2-
title: Map skill output fields
2+
title: Map enrichments in indexers
33
titleSuffix: Azure Cognitive Search
44
description: Export the enriched content created by a skillset by mapping its output fields to fields in a search index.
55

6-
author: LiamCavanagh
7-
ms.author: liamca
6+
author: HeidiSteen
7+
ms.author: heidist
88
ms.service: cognitive-search
99
ms.topic: conceptual
10-
ms.date: 08/10/2021
10+
ms.date: 09/14/2022
1111
---
1212

13-
# Map enrichment output to fields in a search index
13+
# Map enriched output to fields in a search index in Azure Cognitive Search
1414

1515
![Indexer Stages](./media/cognitive-search-output-field-mapping/indexer-stages-output-field-mapping.png "indexer stages")
1616

17-
In this article, you learn how to map enriched input fields to output fields in a searchable index. Once you've [defined a skillset](cognitive-search-defining-skillset.md), you must map the output fields of any skill that directly contributes values to a given field in your search index.
17+
This article explains how to set up *output field mappings* that determine a data path between in-memory data structures created during skill processing, and target fields in a search index. An output field mapping is defined in an [indexer](search-indexer-overview.md) and has the following elements:
1818

19-
Output Field Mappings are required for moving content from enriched documents into the index. The enriched document is really a tree of information, and even though there is support for complex types in the index, sometimes you may want to transform the information from the enriched tree into a more simple type (for instance, an array of strings). Output field mappings allow you to perform data shape transformations by flattening information. Output field mappings always occur after skillset execution, although it is possible for this stage to run even if no skillset is defined.
19+
```json
20+
"outputFieldMappings": [
21+
{
22+
"sourceFieldName": "/document/path-to-a-node-in-an-enriched-document",
23+
"targetFieldName": "some-search-field-in-an-index",
24+
"mappingFunction": null
25+
}
26+
],
27+
```
2028

21-
Examples of output field mappings:
29+
In contrast with a [`fieldMappings`](search-indexer-field-mappings.md) definition that maps a path between two physical data structures, an `outputFieldMappings` definition maps in-memory data to fields in a search index.
2230

23-
* As part of your skillset, you extracted the names of organizations mentioned in each of the pages of your document. Now you want to map each of those organization names into a field in your index of type Edm.Collection(Edm.String).
31+
Output field mappings are required if your indexer has an attached [skillset](cognitive-search-working-with-skillsets.md) that creates new information, such as text translation or key phrase extraction. During indexer execution, AI-generated information exists in memory only. To persist this information in a search index, you'll need to tell the indexer where to send the data.
2432

25-
* As part of your skillset, you produced a new node called “document/translated_text”. You would like to map the information on this node to a specific field in your index.
33+
Output field mappings can also be used to retrieve specific nodes in a source document's complex type. If you don't need the full complex structure, you can [flatten individual nodes in nested data structures](#flattening-information-from-complex-types), and then use an output field mapping to send the output to a string collection in your search index.
2634

27-
* You don’t have a skillset but are indexing a complex type from a Cosmos DB database. You would like to get to a node on that complex type and map it into a field in your index.
35+
Output field mappings apply to:
2836

29-
> [!NOTE]
30-
> Output field mappings apply to search indexes only. For indexers that create [knowledge stores](knowledge-store-concept-intro.md), output field mappings are ignored.
37+
+ Content that's created by skills or extracted by an indexer. The source field is a node in an enriched document residing in memory.
3138

32-
## Use outputFieldMappings
39+
+ Search indexes. If you're populating a [knowledge store](knowledge-store-concept-intro.md), use [projections](knowledge-store-projections-examples.md) for data path configuration.
3340

34-
To map fields, add `outputFieldMappings` to your indexer definition as shown below:
41+
Output field mappings are applied after [skillset execution](cognitive-search-working-with-skillsets.md) or after document cracking if there's no associated skillset.
3542

36-
```http
37-
PUT https://[servicename].search.windows.net/indexers/[indexer name]?api-version=2020-06-30
38-
api-key: [admin key]
39-
Content-Type: application/json
40-
```
43+
## Define an output field mapping
4144

42-
The body of the request is structured as follows:
45+
Output field mappings are added to the `outputFieldMappings` array in an indexer definition, typically placed after the `fieldMappings` array. An output field mapping consists of three parts.
4346

4447
```json
48+
"fieldMappings": [],
49+
"outputFieldMappings": [
50+
{
51+
"sourceFieldName": "/document/path-to-a-node-in-an-enriched-document",
52+
"targetFieldName": "some-search-field-in-an-index",
53+
"mappingFunction": null
54+
}
55+
],
56+
```
57+
58+
| Property | Description |
59+
|----------|-------------|
60+
| sourceFieldName | Required. Specifies a path to enriched content. An example might be `/document/content`. See [Reference annotations in an Azure Cognitive Search skillset](cognitive-search-concept-annotations-syntax.md) for path syntax and examples. |
61+
| targetFieldName | Optional. Specifies the search field that receives the enriched content. Target fields must be top-level simple fields or collections. It can't be a path to a subfield in a complex type. If you want to retrieve specific nodes in a complex structure, you can [flatten individual nodes](#flattening-information-from-complex-types) in memory, and then send the output to a string collection in your index. |
62+
| mappingFunction | Optional. Adds extra processing provided by [mapping functions](search-indexer-field-mappings.md#mappingFunctions) supported by indexers. In the case of enrichment nodes, encoding and decoding are the most commonly used functions. |
63+
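As a rough sketch of how the three properties fit together, the following fragment attaches a mapping function to an output field mapping. The `/document/content` path comes from the table above and `base64Encode` is one of the indexer mapping functions; the target field name `encoded_content` is a hypothetical field that would need to exist in your index.

```json
{
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/content",
      "targetFieldName": "encoded_content",
      "mappingFunction": {
        "name": "base64Encode"
      }
    }
  ]
}
```

When no extra processing is needed, `mappingFunction` can be set to null or left out, as in the other examples in this article.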
64+
You can use the REST API or an Azure SDK to define output field mappings.
65+
66+
> [!TIP]
67+
> Indexers created by the [Import data wizard](search-import-data-portal.md) include output field mappings generated by the wizard. If you need examples, run the wizard over your data source to see the rendered definition.
68+
69+
### [**REST APIs**](#tab/rest)
70+
71+
Use [Create Indexer (REST)](/rest/api/searchservice/create-Indexer) or [Update Indexer (REST)](/rest/api/searchservice/update-indexer), any API version.
72+
73+
This example adds entities and sentiment labels extracted from a blob's content property to fields in a search index.
74+
75+
```http
76+
PUT https://[service name].search.windows.net/indexers/myindexer?api-version=[api-version]
77+
Content-Type: application/json
78+
api-key: [admin key]
4579
{
4680
"name": "myIndexer",
4781
"dataSourceName": "myDataSource",
4882
"targetIndexName": "myIndex",
4983
"skillsetName": "myFirstSkillSet",
50-
"fieldMappings": [
51-
{
52-
"sourceFieldName": "metadata_storage_path",
53-
"targetFieldName": "id",
54-
"mappingFunction": {
55-
"name": "base64Encode"
56-
}
57-
}
58-
],
84+
"fieldMappings": [],
5985
"outputFieldMappings": [
6086
{
6187
"sourceFieldName": "/document/content/organizations/*/description",
@@ -76,72 +102,227 @@ The body of the request is structured as follows:
76102
}
77103
```
78104

79-
For each output field mapping, set the location of the data in the enriched document tree (sourceFieldName), and the name of the field as referenced in the index (targetFieldName). Assign any [mapping functions](search-indexer-field-mappings.md#field-mapping-functions-and-examples) that you require to transform the content of a field before it's stored in the index.
105+
For each output field mapping, set the location of the data in the enriched document tree (sourceFieldName), and the name of the field as referenced in the index (targetFieldName). Assign any [mapping functions](search-indexer-field-mappings.md#mappingFunctions) that you require to transform the content of a field before it's stored in the index.
80106

81-
## Flattening Information from Complex Types
107+
### [**.NET SDK (C#)**](#tab/csharp)
82108

83-
The path in a sourceFieldName can represent one element or multiple elements. In the example above, ```/document/content/sentiment``` represents a single numeric value, while ```/document/content/organizations/*/description``` represents several organization descriptions.
109+
In the Azure SDK for .NET, use the [FieldMapping](/dotnet/api/azure.search.documents.indexes.models.fieldmapping) class that provides "SourceFieldName" and "TargetFieldName" properties and an optional "MappingFunction" reference.
84110

85-
In cases where there are several elements, they are "flattened" into an array that contains each of the elements.
111+
Specify output field mappings when constructing the indexer, or later by directly setting [SearchIndexer.OutputFieldMappings](/dotnet/api/azure.search.documents.indexes.models.searchindexer.outputfieldmappings). The following C# example sets the output field mappings when constructing an indexer.
86112

87-
More concretely, for the ```/document/content/organizations/*/description``` example, the data in the *descriptions* field would look like a flat array of descriptions before it gets indexed:
113+
```csharp
114+
string indexerName = "cog-search-demo";
115+
SearchIndexer indexer = new SearchIndexer(
116+
indexerName,
117+
dataSourceConnectionName,
118+
indexName)
119+
{
120+
// Field mappings omitted for this example (assume default mappings)
121+
OutputFieldMappings =
122+
{
123+
new FieldMapping("/document/content/organizations") { TargetFieldName = "orgNames" },
124+
new FieldMapping("/document/content/sentiment") { TargetFieldName = "sentiment" }
125+
},
126+
SkillsetName = skillsetName
127+
};
88128

129+
await indexerClient.CreateIndexerAsync(indexer);
89130
```
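For comparison, the two mappings in the C# example above would look roughly like this in the JSON of an indexer definition. This is a sketch that assumes the SDK property names correspond one-to-one with the REST property names; the paths and target field names are copied from the C# example.

```json
{
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/content/organizations",
      "targetFieldName": "orgNames"
    },
    {
      "sourceFieldName": "/document/content/sentiment",
      "targetFieldName": "sentiment"
    }
  ]
}
```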
90-
["Microsoft is a company in Seattle","LinkedIn's office is in San Francisco"]
131+
-->
132+
133+
---
134+
135+
<a name="flattening-information-from-complex-types"></a>
136+
137+
## Flatten complex structures into a string collection
138+
139+
If your source data is composed of nested or hierarchical JSON, you can't use field mappings to set up the data paths. Instead, your search index must mirror the source data structure at each level for a full import.
140+
141+
This section walks you through an import process that produces a one-to-one reflection of a complex document on both the source and target sides. Next, it uses the same source document to illustrate the retrieval and flattening of individual nodes into string collections.
142+
143+
Here's an example of a document in Cosmos DB with nested JSON:
144+
145+
```json
146+
{
147+
"palette":"primary colors",
148+
"colors":[
149+
{
150+
"name":"blue",
151+
"medium":[
152+
"acrylic",
153+
"oil",
154+
"pastel"
155+
]
156+
},
157+
{
158+
"name":"red",
159+
"medium":[
160+
"acrylic",
161+
"pastel",
162+
"watercolor"
163+
]
164+
},
165+
{
166+
"name":"yellow",
167+
"medium":[
168+
"acrylic",
169+
"watercolor"
170+
]
171+
}
172+
]
173+
}
91174
```
92175

93-
This is an important principle, so we will provide another example. Imagine that you have an array of complex types as part of the enrichment tree. Let's say there is a member called customEntities that has an array of complex types like the one described below.
176+
If you wanted to fully index the above source document, you'd create an index definition where the field names, levels, and types are reflected as a complex type. Because field mappings aren't supported for complex types in the search index, your index definition must mirror the source document.
94177

95178
```json
96-
"document/customEntities":
97-
[
98-
{
99-
"name": "heart failure",
100-
"matches": [
101-
{
102-
"text": "heart failure",
103-
"offset": 10,
104-
"length": 12,
105-
"matchDistance": 0.0
106-
}
107-
]
108-
},
109-
{
110-
"name": "morquio",
111-
"matches": [
112-
{
113-
"text": "morquio",
114-
"offset": 25,
115-
"length": 7,
116-
"matchDistance": 0.0
117-
}
118-
]
179+
{
180+
"name": "my-test-index",
181+
"defaultScoringProfile": "",
182+
"fields": [
183+
{ "name": "id", "type": "Edm.String", "searchable": false, "retrievable": true, "key": true},
184+
{ "name": "palette", "type": "Edm.String", "searchable": true, "retrievable": true },
185+
{ "name": "colors", "type": "Collection(Edm.ComplexType)",
186+
"fields": [
187+
{
188+
"name": "name",
189+
"type": "Edm.String",
190+
"searchable": true,
191+
"retrievable": true
192+
},
193+
{
194+
"name": "medium",
195+
"type": "Collection(Edm.String)",
196+
"searchable": true,
197+
"retrievable": true
198+
}
199+
]
119200
}
120-
//...
121-
]
201+
]
202+
}
122203
```
123204

124-
Let's assume that your index has a field called 'diseases' of type Collection(Edm.String), where you would like to store each of the names of the entities.
205+
Here's a sample indexer definition that executes the import (notice there are no field mappings and no skillset).
125206

126-
This can be done easily by using the "\*" symbol, as follows:
207+
```json
208+
{
209+
"name": "my-test-indexer",
210+
"dataSourceName": "my-test-ds",
211+
"skillsetName": null,
212+
"targetIndexName": "my-test-index",
213+
214+
"fieldMappings": [],
215+
"outputFieldMappings": []
216+
}
217+
```
218+
219+
The result is the following sample search document, similar to the original in Cosmos DB.
127220

128221
```json
129-
"outputFieldMappings": [
222+
{
223+
"value": [
224+
{
225+
"@search.score": 1,
226+
"id": "240a98f5-90c9-406b-a8c8-f50ff86f116c",
227+
"palette": "primary colors",
228+
"colors": [
130229
{
131-
"sourceFieldName": "/document/customEntities/*/name",
132-
"targetFieldName": "diseases"
230+
"name": "blue",
231+
"medium": [
232+
"acrylic",
233+
"oil",
234+
"pastel"
235+
]
236+
},
237+
{
238+
"name": "red",
239+
"medium": [
240+
"acrylic",
241+
"pastel",
242+
"watercolor"
243+
]
244+
},
245+
{
246+
"name": "yellow",
247+
"medium": [
248+
"acrylic",
249+
"watercolor"
250+
]
133251
}
134-
]
252+
]
253+
}
254+
]
255+
}
135256
```
136257

137-
This operation will simply “flatten” each of the names of the customEntities elements into a single array of strings like this:
258+
An alternative is to flatten individual nodes in the source's nested structure into string collections in the search index.
259+
260+
To accomplish this task, you'll need an output field mapping that maps an in-memory node to a string collection in the index. Although output field mappings primarily apply to skill outputs, you can also use them to address nodes after ["document cracking"](search-indexer-overview.md#stage-1-document-cracking), where the indexer opens a source document and reads it into memory.
261+
262+
Below is a sample index definition in Cognitive Search, using string collections to receive flattened output:
138263

139264
```json
140-
"diseases" : ["heart failure","morquio"]
265+
{
266+
"name": "my-new-flattened-index",
267+
"defaultScoringProfile": "",
268+
"fields": [
269+
{ "name": "id", "type": "Edm.String", "searchable": false, "retrievable": true, "key": true },
270+
{ "name": "palette", "type": "Edm.String", "searchable": true, "retrievable": true },
271+
{ "name": "color_names", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true },
272+
{ "name": "color_mediums", "type": "Collection(Edm.String)", "searchable": true, "retrievable": true}
273+
]
274+
}
141275
```
142276

143-
## See also
277+
Here's the sample indexer definition, using `outputFieldMappings` to associate the nested JSON with the string collection fields. Notice that the source field uses the path syntax for enrichment nodes, even though there's no skillset. Enriched documents are created in the system during document cracking, which means you can access nodes in each document tree as long as those nodes exist when the document is cracked.
278+
279+
```json
280+
{
281+
"name": "my-test-indexer",
282+
"dataSourceName": "my-test-ds",
283+
"skillsetName": null,
284+
"targetIndexName": "my-new-flattened-index",
285+
"parameters": { },
286+
"fieldMappings": [ ],
287+
"outputFieldMappings": [
288+
{
289+
"sourceFieldName": "/document/colors/*/name",
290+
"targetFieldName": "color_names"
291+
},
292+
{
293+
"sourceFieldName": "/document/colors/*/medium",
294+
"targetFieldName": "color_mediums"
295+
}
296+
]
297+
}
298+
```
299+
300+
Results from the above definition are as follows. Simplifying the structure loses context in this case: there are no longer any associations between a given color and the mediums it's available in. However, depending on your scenario, a result similar to the one shown below might be exactly what you need.
301+
302+
```json
303+
{
304+
"value": [
305+
{
306+
"@search.score": 1,
307+
"id": "240a98f5-90c9-406b-a8c8-f50ff86f116c",
308+
"palette": "primary colors",
309+
"color_names": [
310+
"blue",
311+
"red",
312+
"yellow"
313+
],
314+
"color_mediums": [
315+
"[\"acrylic\",\"oil\",\"pastel\"]",
316+
"[\"acrylic\",\"pastel\",\"watercolor\"]",
317+
"[\"acrylic\",\"watercolor\"]"
318+
]
319+
}
320+
]
321+
}
322+
```
144323

145-
* [Search indexes in Azure Cognitive Search](search-what-is-an-index.md).
324+
## See also
146325

147-
* [Define field mappings in a search indexer](search-indexer-field-mappings.md).
326+
+ [Define field mappings in a search indexer](search-indexer-field-mappings.md)
327+
+ [AI enrichment overview](cognitive-search-concept-intro.md)
328+
+ [Skillset overview](cognitive-search-working-with-skillsets.md)

articles/search/search-howto-index-plaintext-blobs.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ author: HeidiSteen
88
ms.author: heidist
99
ms.service: cognitive-search
1010
ms.topic: conceptual
11-
ms.date: 02/01/2021
11+
ms.date: 09/13/2022
1212
---
1313

1414
# How to index plain text blobs and files in Azure Cognitive Search
