articles/search/cognitive-search-defining-skillset.md
Lines changed: 30 additions & 32 deletions
Original file line number
Diff line number
Diff line change
@@ -1,37 +1,35 @@
1
1
---
2
2
title: Create a skillset
3
3
titleSuffix: Azure Cognitive Search
4
-
description: Define data extraction, natural language processing, or image analysis stepsto enrich and extract structured information from your data for use in Azure Cognitive Search.
4
+
description: A skillset defines data extraction, natural language processing, and image analysis steps. A skillset is attached to an indexer. It's used to enrich and extract information from source data for use in Azure Cognitive Search.
A skillset defines the operations that extract and enrich data to make it searchable. A skillset executes after document cracking, when text and image content are extracted from source documents, and after any fields from the source document are (optionally) mapped to destination fields in an index or knowledge store.
17
+
A skillset defines the operations that extract and enrich data to make it searchable. It executes after text and images are extracted, and after [field mappings](search-indexer-field-mappings.md) are processed.
18
18
19
-
In this article, you'll learn the steps of creating a skillset. For reference, this article uses the [Create Skillset (REST API)](/rest/api/searchservice/create-skillset).
20
-
21
-
Some usage rules for skillsets include the following:
19
+
This article explains how to create a skillset with the [Create Skillset (REST API)](/rest/api/searchservice/create-skillset). Rules for skillset definition include:
22
20
23
21
+ A skillset is a top-level resource, which means it can be created once and referenced by many indexers.
24
22
+ A skillset must contain at least one skill.
25
23
+ A skillset can repeat skills of the same type (for example, multiple Shaper skills).
26
24
27
-
Recall that indexers drive skillset execution, which means that you will also need to create an [indexer](search-howto-create-indexers.md), [data source](search-data-sources-gallery.md), and [search index](search-what-is-an-index.md) before you can test your skillset.
25
+
An indexer drives skillset execution. You need an [indexer](search-howto-create-indexers.md), [data source](search-data-sources-gallery.md), and [search index](search-what-is-an-index.md) before you can test your skillset.
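As a rough sketch of how these objects relate, an indexer definition names the data source it reads from, the index it writes to, and the skillset it runs. All four object names below are hypothetical placeholders and don't come from this article.

```json
{
  "name": "example-indexer",
  "dataSourceName": "example-datasource",
  "targetIndexName": "example-index",
  "skillsetName": "example-skillset"
}
```

Because the skillset is referenced by name, the same skillset can be listed in the `skillsetName` property of more than one indexer.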
28
26
29
27
> [!TIP]
30
28
> Enable [enrichment caching](cognitive-search-incremental-indexing-conceptual.md) to reuse the content you've already processed and lower the cost of development.
31
29
32
30
## Skillset definition
33
31
34
-
Start with the basic structure. In the [REST API](/rest/api/searchservice/create-skillset), a skillset is authored in JSON and has the following sections:
32
+
Start with the basic structure. In the [Create Skillset REST API](/rest/api/searchservice/create-skillset), the body of the request is authored in JSON and has the following sections:
35
33
36
34
```json
37
35
{
@@ -61,17 +59,17 @@ Start with the basic structure. In the [REST API](/rest/api/searchservice/create
61
59
62
60
After the name and description, a skillset has four main properties:
63
61
64
-
+`skills` array, an unordered [collection of skills](cognitive-search-predefined-skills.md), for which the search service determines the sequence of execution based on the inputs required for each skill. If skills are independent, they will execute in parallel. Skills can be utilitarian (like splitting text), transformational (based on AI from Cognitive Services), or custom skills that you provide. An example of a skills array is provided in the following section.
62
+
+`skills` array, an unordered [collection of skills](cognitive-search-predefined-skills.md), for which the search service determines the sequence of execution based on the inputs required for each skill. If skills are independent, they execute in parallel. Skills can be utilitarian (like splitting text), transformational (based on AI from Cognitive Services), or custom skills that you provide. An example of a skills array is provided in the next section.
65
63
66
64
+`cognitiveServices` is used for [billable skills](cognitive-search-predefined-skills.md) that call Cognitive Services APIs. Remove this section if you aren't using billable skills or Custom Entity Lookup. [Attach a resource](cognitive-search-attach-cognitive-services.md) if you are.
67
65
68
-
+`knowledgeStore`, (optional) specifies an Azure Storage account and settings for projecting skillset output into tables, blobs, and files in Azure Storage. Remove this section if you don't need it, otherwise [specify a knowledge store](knowledge-store-create-rest.md).
66
+
+`knowledgeStore` (optional) specifies an Azure Storage account and settings for projecting skillset output into tables, blobs, and files in Azure Storage. Remove this section if you don't need it, otherwise [specify a knowledge store](knowledge-store-create-rest.md).
69
67
70
-
+`encryptionKey`, (optional) specifies an Azure Key Vault and [customer-managed keys](search-security-manage-encryption-keys.md) used to encrypt sensitive content in a skillset definition. Remove this property if you aren't using customer-managed encryption.
68
+
+`encryptionKey` (optional) specifies an Azure Key Vault and [customer-managed keys](search-security-manage-encryption-keys.md) used to encrypt sensitive content in a skillset definition. Remove this property if you aren't using customer-managed encryption.
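The following sketch shows one way the four properties might sit together in a request body. It's an illustration only, not the article's own example: the skillset name, description text, and every value in angle brackets are placeholders, and a single Text Split skill stands in for a real skills array so that the "at least one skill" rule is satisfied.

```json
{
  "name": "example-skillset",
  "description": "Hypothetical skillset that only illustrates the layout",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "context": "/document",
      "inputs": [
        { "name": "text", "source": "/document/content" }
      ],
      "outputs": [
        { "name": "textItems", "targetName": "pages" }
      ]
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
    "description": "Placeholder for a billable Cognitive Services resource",
    "key": "<your-cognitive-services-key>"
  },
  "knowledgeStore": {
    "storageConnectionString": "<your-storage-connection-string>",
    "projections": []
  },
  "encryptionKey": {
    "keyVaultKeyName": "<key-name>",
    "keyVaultKeyVersion": "<key-version>",
    "keyVaultUri": "<key-vault-uri>"
  }
}
```

In practice, you would remove `cognitiveServices`, `knowledgeStore`, or `encryptionKey` when the corresponding feature isn't used, as described in the list above.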
71
69
72
70
## Add a skills array
73
71
74
-
Within a skillset definition, the skills array specifies which skills to execute. The following example introduces you to its composition by showing you two unrelated, [built-in skills](cognitive-search-predefined-skills.md). Notice that each skill has a type, context, inputs, and outputs.
72
+
Within a skillset definition, the skills array specifies which skills to execute. The following example shows two unrelated, [built-in skills](cognitive-search-predefined-skills.md). Notice that each skill has a type, context, inputs, and outputs.
75
73
76
74
```json
77
75
"skills":[
@@ -113,11 +111,11 @@ Within a skillset definition, the skills array specifies which skills to execute
113
111
```
114
112
115
113
> [!NOTE]
116
-
> You can build complex skillsets with looping and branching, using the [Conditional skill](cognitive-search-skill-conditional.md) to create the expressions. The syntax is based on the [JSON Pointer](https://tools.ietf.org/html/rfc6901) path notation, with a few modifications to identify nodes in the enrichment tree. A `"/"` traverses a level lower in the tree and`"*"` acts as a for-each operator in the context. Numerous examples in this article illustrate the syntax.
114
+
> You can build complex skillsets with looping and branching using the [Conditional skill](cognitive-search-skill-conditional.md) to create the expressions. The syntax is based on the [JSON Pointer](https://tools.ietf.org/html/rfc6901) path notation, with a few modifications to identify nodes in the enrichment tree. A `"/"` traverses a level lower in the tree and `"*"` acts as a for-each operator in the context. Numerous examples in this article illustrate the syntax.
117
115
118
116
### How built-in skills are structured
119
117
120
-
Each skill is unique in terms of its input values and the parameters it takes. The [documentation for each skill](cognitive-search-predefined-skills.md) describes all of the parameters and properties of a given skill. Although there are differences, most skills share a common set and are similarly patterned. To illustrate several points, the [Entity Recognition skill](cognitive-search-skill-entity-recognition-v3.md) provides an example:
118
+
Each skill is unique in terms of its input values and the parameters that it takes. The [documentation for each skill](cognitive-search-predefined-skills.md) describes all of the parameters and properties of a given skill. Although there are differences, most skills share a common set and are similarly patterned. To illustrate several points, the [Entity Recognition skill](cognitive-search-skill-entity-recognition-v3.md) provides an example:
121
119
122
120
```json
123
121
{
@@ -144,17 +142,17 @@ Common parameters include "odata.type", "inputs", and "outputs". The other param
144
142
145
143
+**"odata.type"** uniquely identifies each skill. You can find the type in the [skill reference documentation](cognitive-search-predefined-skills.md).
146
144
147
-
+**"context"** is a node in an enrichment tree and it represents the level at which operations take place. All skills have this property. If the "context" field is not explicitly set, the default context is `"/document"`. In the example, the context is the whole document, meaning that the entity recognition skill is called once per document.
145
+
+**"context"** is a node in an enrichment tree and it represents the level at which operations take place. All skills have this property. If the "context" field is not explicitly set, the default context is `"/document"`. In the example, the context is the whole document, which means that the entity recognition skill is called once per document.
148
146
149
-
The context also determines where outputs are also produced in the enrichment tree. In this example, the skill returns a property called `"organizations"`, captured as `orgs`, which is added as a child node of `"/document"`. In downstream skills, the path to this newly-created enrichment node is `"/document/orgs"`. For a particular document, the value of `"/document/orgs"` is an array of organizations extracted from the text (for example: `["Microsoft", "LinkedIn"]`). For more information about path syntax, see [Referencing annotations in a skillset](cognitive-search-concept-annotations-syntax.md).
147
+
The context also determines where outputs are produced in the enrichment tree. In this example, the skill returns a property called `"organizations"`, captured as `orgs`, which is added as a child node of `"/document"`. In downstream skills, the path to this node is `"/document/orgs"`. For a particular document, the value of `"/document/orgs"` is an array of organizations extracted from the text (for example: `["Microsoft", "LinkedIn"]`). For more information about path syntax, see [How to reference annotations in a skillset](cognitive-search-concept-annotations-syntax.md).
150
148
151
-
+**"inputs"** specify the origin of the incoming data and how it will be used. In the case of Entity Recognition, one of the inputs is `"text"`, which is the content to be analyzed for entities. The content is sourced from the `"/document/content"` node in an enrichment tree. In an enrichment tree, `"/document"` is the root node. For documents retrieved using an Azure Blob indexer, the `content` field of each document is a standard field created by the indexer.
149
+
+**"inputs"** specify the origin of the incoming data and how it's used. In the [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md) skill, one of the inputs is `"text"`, which is the content to be analyzed for entities. The content is sourced from the `"/document/content"` node in an enrichment tree. In an enrichment tree, `"/document"` is the root node. For documents retrieved by an Azure Blob indexer, the `content` field of each document is a standard field created by the indexer.
152
150
153
-
+**"outputs"** represent the output of the skill. Each skill is designed to emit specific kinds of output, which are referenced by name in the skillset. In the case of Entity Recognition, `"organizations"` is one of the outputs it supports. The documentation for each skill describes the outputs it can produce.
151
+
+**"outputs"** represent the output of the skill. Each skill is designed to emit specific kinds of output, which are referenced by name in the skillset. In Entity Recognition, `"organizations"` is one of the outputs it supports. The documentation for each skill describes the outputs it can produce.
154
152
155
-
Outputs exist only during processing. To chain this output to a downstream skill's input, reference the output as `"/document/orgs"`. To send output to a field in a search index, [create an output field mapping](cognitive-search-output-field-mapping.md) in an indexer. To send output to a knowledge store, [create a projection](knowledge-store-projection-overview.md).
153
+
Outputs exist only during processing. To chain this output to the input of a downstream skill, reference the output as `"/document/orgs"`. To send output to a field in a search index, [create an output field mapping](cognitive-search-output-field-mapping.md) in an indexer. To send output to a knowledge store, [create a projection](knowledge-store-projection-overview.md).
156
154
157
-
Outputs from the one skill can conflict with outputs from a different skill. If you have multiple skills returning the same output, use the `"targetName"` for name disambiguation in enrichment node paths.
155
+
Outputs from one skill can conflict with outputs from a different skill. If you have multiple skills that return the same output, use the `"targetName"` for name disambiguation in enrichment node paths.
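A minimal sketch of that disambiguation, assuming two Entity Recognition skills that both emit `organizations`: the second input path (`/document/title`) and the target name `titleOrgs` are hypothetical and chosen only for illustration.

```json
{
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
      "context": "/document",
      "inputs": [
        { "name": "text", "source": "/document/content" }
      ],
      "outputs": [
        { "name": "organizations", "targetName": "orgs" }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
      "context": "/document",
      "inputs": [
        { "name": "text", "source": "/document/title" }
      ],
      "outputs": [
        { "name": "organizations", "targetName": "titleOrgs" }
      ]
    }
  ]
}
```

With distinct target names, the two results land at `/document/orgs` and `/document/titleOrgs` instead of competing for the same node.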
158
156
159
157
Some situations call for referencing each element of an array separately. For example, suppose you want to pass *each element* of `"/document/orgs"` separately to another skill. To do so, add an asterisk to the path: `"/document/orgs/*"`
160
158
@@ -178,13 +176,13 @@ The second skill for sentiment analysis follows the same pattern as the first en
178
176
}
179
177
```
180
178
181
-
## Adding a custom skill
179
+
## Add a custom skill
182
180
183
-
Below is an example of a [custom skill](cognitive-search-custom-skill-web-api.md). The URI points to an Azure Function, which in turn invokes the model or transformation that you provide. For more information, see [Define a custom interface](cognitive-search-custom-skill-interface.md).
181
+
This section includes an example of a [custom skill](cognitive-search-custom-skill-web-api.md). The URI points to an Azure Function, which in turn invokes the model or transformation that you provide. For more information, see [Define a custom interface](cognitive-search-custom-skill-interface.md).
184
182
185
183
Although the custom skill is executing code that is external to the pipeline, in a skills array, it's just another skill. Like the built-in skills, it has a type, context, inputs, and outputs. It also reads and writes to an enrichment tree, just as the built-in skills do. Notice that the "context" field is set to `"/document/orgs/*"` with an asterisk, meaning the enrichment step is called *for each* organization under `"/document/orgs"`.
186
184
187
-
Output, in this case a company description, is generated for each organization identified. When referring to the description in a downstream step (for example, in key phrase extraction), you would use the path `"/document/orgs/*/companyDescription"` to do so.
185
+
Output, such as the company description in this example, is generated for each organization that's identified. When referring to the node in a downstream step (for example, in key phrase extraction), you would use the path `"/document/orgs/*/companyDescription"` to do so.
188
186
189
187
```json
190
188
{
@@ -212,29 +210,29 @@ Output, in this case a company description, is generated for each organization i
212
210
213
211
## Send output to an index
214
212
215
-
As each skill executes, its output is added as nodes in a document's enrichment tree. Enriched documents exist in the pipeline as temporary data structures. To create a permanent data structure, and gain full visibility into what a skill is actually producing, you will need to send the output to a search index or a [knowledge store](knowledge-store-concept-intro.md).
213
+
As each skill executes, its output is added as nodes in a document's enrichment tree. Enriched documents exist in the pipeline as temporary data structures. To create a permanent data structure, and gain full visibility into what a skill is actually producing, send the output to a search index or a [knowledge store](knowledge-store-concept-intro.md).
216
214
217
-
In the early stages of skillset evaluation, you'll want to check preliminary results with minimal effort. We recommend the search index because it's simpler to set up. For each skill output, [define an output field mapping](cognitive-search-output-field-mapping.md) in the indexer, and a field in the index.
215
+
In the early stages of skillset evaluation, checking preliminary results is important. We recommend a search index over a knowledge store because it's simpler to set up. For each skill output, [define an output field mapping](cognitive-search-output-field-mapping.md) in the indexer, and a field in the search index.
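As a hedged illustration, an output field mapping in the indexer definition could route the `/document/orgs` node from the earlier example to an index field. The target field name `organizations` is an assumption made for this sketch.

```json
{
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/orgs",
      "targetFieldName": "organizations"
    }
  ]
}
```

For this mapping to work, the search index would also need a field with the same name as `targetFieldName`. Because `/document/orgs` holds an array of strings, a type of `Collection(Edm.String)` is the likely fit.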
218
216
219
-
:::image type="content" source="media/cognitive-search-defining-skillset/skillset-indexer-index-combo.png" alt-text="Object diagram showing how a persons entity is defined in skill output, indexer field mapping, and index field.":::
217
+
:::image type="content" source="media/cognitive-search-defining-skillset/skillset-indexer-index-combo.png" alt-text="Object diagram that shows the person entity as a skill output, indexer field mapping, and index field.":::
220
218
221
-
After running the indexer, you can use [Search Explorer](search-explorer.md) to return documents from the index and check the contents of each field to determine what the skillset detected or created.
219
+
After you run the indexer, use [Search Explorer](search-explorer.md) to return documents from the index and check the contents of each field to determine what the skillset detected or created.
222
220
223
-
The following example shows the results of an entity recognition skill that detected persons, locations, organizations, and other entities in a chunk of text. Viewing the results in Search Explorer can help you determine whether a skill adds value to your solution.
221
+
This screenshot shows the results of an entity recognition skill that detected persons, locations, organizations, and other entities in a chunk of text. You can view the results to decide whether a skill adds value to your solution.
224
222
225
223
:::image type="content" source="media/cognitive-search-defining-skillset/doc-in-search-explorer.png" alt-text="Screenshot of a document in Search Explorer.":::
226
224
227
225
## Tips for a first skillset
228
226
229
-
+ Assemble a representative sample of your content in Blob Storage or another supported indexer data source and run the **Import data** wizard to create the skillset, index, indexer, and data source object.
227
+
+ Assemble a representative sample of your content in Blob Storage or another supported data source and run the [**Import data** wizard](search-import-data-portal.md).
230
228
231
-
The wizard automates several steps that can be challenging the first time around, including defining the fields in an index, defining output filed mappings in an indexer, and projections in a knowledge store if you are using one. For some skills, such as OCR or image analysis, the wizard will add utility skills that merge image and text content that was separated during document cracking.
229
+
The wizard automates several steps that can be challenging the first time around. It defines fields in an index, field mappings in an indexer, and projections in a knowledge store if you are using one. For some skills, such as OCR or image analysis, the wizard adds utility skills that merge the image and text content that was separated during document cracking.
232
230
233
-
+ Alternatively, you can import skill Postman collections that provide full examples of all of the object definitions required to evaluate a skill, from skillset to an index that you can query to view the results of a transformation.
231
+
+ Alternatively, you can [import sample Postman collections](https://github.com/Azure-Samples/azure-search-postman-samples) that provide a full articulation of the object definitions required to evaluate a skill.
234
232
235
233
## Next steps
236
234
237
-
Context and input source fields are paths to nodes in an enrichment tree. As a next step, learn more about the syntax for setting up paths to nodes in an enrichment tree.
235
+
Context and input source fields are paths to nodes in an enrichment tree. As a next step, learn more about the path syntax for nodes in an enrichment tree.
238
236
239
237
> [!div class="nextstepaction"]
240
238
> [Referencing annotations in a skillset](cognitive-search-concept-annotations-syntax.md)