Skip to content

Commit 6f8fdd6

Browse files
authored
Merge pull request #186960 from HeidiSteen/heidist-fresh2
acrolinx score
2 parents e02e987 + 8082bae commit 6f8fdd6

File tree

1 file changed

+30
-32
lines changed

1 file changed

+30
-32
lines changed

articles/search/cognitive-search-defining-skillset.md

Lines changed: 30 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -1,37 +1,35 @@
11
---
22
title: Create a skillset
33
titleSuffix: Azure Cognitive Search
4-
description: Define data extraction, natural language processing, or image analysis steps to enrich and extract structured information from your data for use in Azure Cognitive Search.
4+
description: A skillset defines data extraction, natural language processing, and image analysis steps. A skillset is attached to indexer. It's used to enrich and extract information from source data for use in Azure Cognitive Search.
55

66
author: HeidiSteen
77
ms.author: heidist
88
ms.service: cognitive-search
99
ms.topic: conceptual
10-
ms.date: 08/15/2021
10+
ms.date: 01/31/2022
1111
---
1212

1313
# Create a skillset in Azure Cognitive Search
1414

1515
![indexer stages](media/cognitive-search-defining-skillset/indexer-stages-skillset.png "indexer stages")
1616

17-
A skillset defines the operations that extract and enrich data to make it searchable. A skillset executes after document cracking, when text and image content are extracted from source documents, and after any fields from the source document are (optionally) mapped to destination fields in an index or knowledge store.
17+
A skillset defines the operations that extract and enrich data to make it searchable. It executes after text and images are extracted, and after [field mappings](search-indexer-field-mappings.md) are processed.
1818

19-
In this article, you'll learn the steps of creating a skillset. For reference, this article uses the [Create Skillset (REST API)](/rest/api/searchservice/create-skillset).
20-
21-
Some usage rules for skillsets include the following:
19+
This article explains how to create a skillset with the [Create Skillset (REST API)](/rest/api/searchservice/create-skillset). Rules for skillset definition include:
2220

2321
+ A skillset is a top-level resource, which means it can be created once and referenced by many indexers.
2422
+ A skillset must contain at least one skill.
2523
+ A skillset can repeat skills of the same type (for example, multiple Shaper skills).
2624

27-
Recall that indexers drive skillset execution, which means that you will also need to create an [indexer](search-howto-create-indexers.md), [data source](search-data-sources-gallery.md), and [search index](search-what-is-an-index.md) before you can test your skillset.
25+
An indexer drives skillset execution. You need an [indexer](search-howto-create-indexers.md), [data source](search-data-sources-gallery.md), and [search index](search-what-is-an-index.md) before you can test your skillset.
2826

2927
> [!TIP]
3028
> Enable [enrichment caching](cognitive-search-incremental-indexing-conceptual.md) to reuse the content you've already processed and lower the cost of development.
3129
3230
## Skillset definition
3331

34-
Start with the basic structure. In the [REST API](/rest/api/searchservice/create-skillset), a skillset is authored in JSON and has the following sections:
32+
Start with the basic structure. In the [Create Skillset REST API](/rest/api/searchservice/create-skillset), the body of the request is authored in JSON and has the following sections:
3533

3634
```json
3735
{
@@ -61,17 +59,17 @@ Start with the basic structure. In the [REST API](/rest/api/searchservice/create
6159

6260
After the name and description, a skillset has four main properties:
6361

64-
+ `skills` array, an unordered [collection of skills](cognitive-search-predefined-skills.md), for which the search service determines the sequence of execution based on the inputs required for each skill. If skills are independent, they will execute in parallel. Skills can be utilitarian (like splitting text), transformational (based on AI from Cognitive Services), or custom skills that you provide. An example of a skills array is provided in the following section.
62+
+ `skills` array, an unordered [collection of skills](cognitive-search-predefined-skills.md), for which the search service determines the sequence of execution based on the inputs required for each skill. If skills are independent, they execute in parallel. Skills can be utilitarian (like splitting text), transformational (based on AI from Cognitive Services), or custom skills that you provide. An example of a skills array is provided in the next section.
6563

6664
+ `cognitiveServices` is used for [billable skills](cognitive-search-predefined-skills.md) that call Cognitive Services APIs. Remove this section if you aren't using billable skills or Custom Entity Lookup. [Attach a resource](cognitive-search-attach-cognitive-services.md) if you are.
6765

68-
+ `knowledgeStore`, (optional) specifies an Azure Storage account and settings for projecting skillset output into tables, blobs, and files in Azure Storage. Remove this section if you don't need it, otherwise [specify a knowledge store](knowledge-store-create-rest.md).
66+
+ `knowledgeStore` (optional) specifies an Azure Storage account and settings for projecting skillset output into tables, blobs, and files in Azure Storage. Remove this section if you don't need it, otherwise [specify a knowledge store](knowledge-store-create-rest.md).
6967

70-
+ `encryptionKey`, (optional) specifies an Azure Key Vault and [customer-managed keys](search-security-manage-encryption-keys.md) used to encrypt sensitive content in a skillset definition. Remove this property if you aren't using customer-managed encryption.
68+
+ `encryptionKey` (optional) specifies an Azure Key Vault and [customer-managed keys](search-security-manage-encryption-keys.md) used to encrypt sensitive content in a skillset definition. Remove this property if you aren't using customer-managed encryption.
7169

7270
## Add a skills array
7371

74-
Within a skillset definition, the skills array specifies which skills to execute. The following example introduces you to its composition by showing you two unrelated, [built-in skills](cognitive-search-predefined-skills.md). Notice that each skill has a type, context, inputs, and outputs.
72+
Within a skillset definition, the skills array specifies which skills to execute. The following example shows two unrelated, [built-in skills](cognitive-search-predefined-skills.md). Notice that each skill has a type, context, inputs, and outputs.
7573

7674
```json
7775
"skills":[
@@ -113,11 +111,11 @@ Within a skillset definition, the skills array specifies which skills to execute
113111
```
114112

115113
> [!NOTE]
116-
> You can build complex skillsets with looping and branching, using the [Conditional skill](cognitive-search-skill-conditional.md) to create the expressions. The syntax is based on the [JSON Pointer](https://tools.ietf.org/html/rfc6901) path notation, with a few modifications to identify nodes in the enrichment tree. A `"/"` traverses a level lower in the tree and `"*"` acts as a for-each operator in the context. Numerous examples in this article illustrate the syntax.
114+
> You can build complex skillsets with looping and branching using the [Conditional skill](cognitive-search-skill-conditional.md) to create the expressions. The syntax is based on the [JSON Pointer](https://tools.ietf.org/html/rfc6901) path notation, with a few modifications to identify nodes in the enrichment tree. A `"/"` traverses a level lower in the tree and `"*"` acts as a for-each operator in the context. Numerous examples in this article illustrate the syntax.
117115
118116
### How built-in skills are structured
119117

120-
Each skill is unique in terms of its input values and the parameters it takes. The [documentation for each skill](cognitive-search-predefined-skills.md) describes all of the parameters and properties of a given skill. Although there are differences, most skills share a common set and are similarly patterned. To illustrate several points, the [Entity Recognition skill](cognitive-search-skill-entity-recognition-v3.md) provides an example:
118+
Each skill is unique in terms of its input values and the parameters that it takes. The [documentation for each skill](cognitive-search-predefined-skills.md) describes all of the parameters and properties of a given skill. Although there are differences, most skills share a common set and are similarly patterned. To illustrate several points, the [Entity Recognition skill](cognitive-search-skill-entity-recognition-v3.md) provides an example:
121119

122120
```json
123121
{
@@ -144,17 +142,17 @@ Common parameters include "odata.type", "inputs", and "outputs". The other param
144142

145143
+ **"odata.type"** uniquely identifies each skill. You can find the type in the [skill reference documentation](cognitive-search-predefined-skills.md).
146144

147-
+ **"context"** is a node in an enrichment tree and it represents the level at which operations take place. All skills have this property. If the "context" field is not explicitly set, the default context is `"/document"`. In the example, the context is the whole document, meaning that the entity recognition skill is called once per document.
145+
+ **"context"** is a node in an enrichment tree and it represents the level at which operations take place. All skills have this property. If the "context" field is not explicitly set, the default context is `"/document"`. In the example, the context is the whole document, which means that the entity recognition skill is called once per document.
148146

149-
The context also determines where outputs are also produced in the enrichment tree. In this example, the skill returns a property called `"organizations"`, captured as `orgs`, which is added as a child node of `"/document"`. In downstream skills, the path to this newly-created enrichment node is `"/document/orgs"`. For a particular document, the value of `"/document/orgs"` is an array of organizations extracted from the text (for example: `["Microsoft", "LinkedIn"]`). For more information about path syntax, see [Referencing annotations in a skillset](cognitive-search-concept-annotations-syntax.md).
147+
The context also determines where outputs are produced in the enrichment tree. In this example, the skill returns a property called `"organizations"`, captured as `orgs`, which is added as a child node of `"/document"`. In downstream skills, the path to this node is `"/document/orgs"`. For a particular document, the value of `"/document/orgs"` is an array of organizations extracted from the text (for example: `["Microsoft", "LinkedIn"]`). For more information about path syntax, see [How to reference annotations in a skillset](cognitive-search-concept-annotations-syntax.md).
150148

151-
+ **"inputs"** specify the origin of the incoming data and how it will be used. In the case of Entity Recognition, one of the inputs is `"text"`, which is the content to be analyzed for entities. The content is sourced from the `"/document/content"` node in an enrichment tree. In an enrichment tree, `"/document"` is the root node. For documents retrieved using an Azure Blob indexer, the `content` field of each document is a standard field created by the indexer.
149+
+ **"inputs"** specify the origin of the incoming data and how it's used. In the [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md) skill, one of the inputs is `"text"`, which is the content to be analyzed for entities. The content is sourced from the `"/document/content"` node in an enrichment tree. In an enrichment tree, `"/document"` is the root node. For documents retrieved by an Azure Blob indexer, the `content` field of each document is a standard field created by the indexer.
152150

153-
+ **"outputs"** represent the output of the skill. Each skill is designed to emit specific kinds of output, which are referenced by name in the skillset. In the case of Entity Recognition, `"organizations"` is one of the outputs it supports. The documentation for each skill describes the outputs it can produce.
151+
+ **"outputs"** represent the output of the skill. Each skill is designed to emit specific kinds of output, which are referenced by name in the skillset. In Entity Recognition, `"organizations"` is one of the outputs it supports. The documentation for each skill describes the outputs it can produce.
154152

155-
Outputs exist only during processing. To chain this output to a downstream skill's input, reference the output as `"/document/orgs"`. To send output to a field in a search index, [create an output field mapping](cognitive-search-output-field-mapping.md) in an indexer. To send output to a knowledge store, [create a projection](knowledge-store-projection-overview.md).
153+
Outputs exist only during processing. To chain this output to the input of a downstream skill, reference the output as `"/document/orgs"`. To send output to a field in a search index, [create an output field mapping](cognitive-search-output-field-mapping.md) in an indexer. To send output to a knowledge store, [create a projection](knowledge-store-projection-overview.md).
156154

157-
Outputs from the one skill can conflict with outputs from a different skill. If you have multiple skills returning the same output, use the `"targetName"` for name disambiguation in enrichment node paths.
155+
Outputs from the one skill can conflict with outputs from a different skill. If you have multiple skills that return the same output, use the `"targetName"` for name disambiguation in enrichment node paths.
158156

159157
Some situations call for referencing each element of an array separately. For example, suppose you want to pass *each element* of `"/document/orgs"` separately to another skill. To do so, add an asterisk to the path: `"/document/orgs/*"`
160158

@@ -178,13 +176,13 @@ The second skill for sentiment analysis follows the same pattern as the first en
178176
}
179177
```
180178

181-
## Adding a custom skill
179+
## Add a custom skill
182180

183-
Below is an example of a [custom skill](cognitive-search-custom-skill-web-api.md). The URI points to an Azure Function, which in turn invokes the model or transformation that you provide. For more information, see [Define a custom interface](cognitive-search-custom-skill-interface.md).
181+
This section includes an example of a [custom skill](cognitive-search-custom-skill-web-api.md). The URI points to an Azure Function, which in turn invokes the model or transformation that you provide. For more information, see [Define a custom interface](cognitive-search-custom-skill-interface.md).
184182

185183
Although the custom skill is executing code that is external to the pipeline, in a skills array, it's just another skill. Like the built-in skills, it has a type, context, inputs, and outputs. It also reads and writes to an enrichment tree, just as the built-in skills do. Notice that the "context" field is set to `"/document/orgs/*"` with an asterisk, meaning the enrichment step is called *for each* organization under `"/document/orgs"`.
186184

187-
Output, in this case a company description, is generated for each organization identified. When referring to the description in a downstream step (for example, in key phrase extraction), you would use the path `"/document/orgs/*/companyDescription"` to do so.
185+
Output, such as the company description in this example, is generated for each organization that's identified. When referring to the node in a downstream step (for example, in key phrase extraction), you would use the path `"/document/orgs/*/companyDescription"` to do so.
188186

189187
```json
190188
{
@@ -212,29 +210,29 @@ Output, in this case a company description, is generated for each organization i
212210

213211
## Send output to an index
214212

215-
As each skill executes, its output is added as nodes in a document's enrichment tree. Enriched documents exist in the pipeline as temporary data structures. To create a permanent data structure, and gain full visibility into what a skill is actually producing, you will need to send the output to a search index or a [knowledge store](knowledge-store-concept-intro.md).
213+
As each skill executes, its output is added as nodes in a document's enrichment tree. Enriched documents exist in the pipeline as temporary data structures. To create a permanent data structure, and gain full visibility into what a skill is actually producing, send the output to a search index or a [knowledge store](knowledge-store-concept-intro.md).
216214

217-
In the early stages of skillset evaluation, you'll want to check preliminary results with minimal effort. We recommend the search index because it's simpler to set up. For each skill output, [define an output field mapping](cognitive-search-output-field-mapping.md) in the indexer, and a field in the index.
215+
In the early stages of skillset evaluation, checking preliminary results is important. We recommend a search index over a knowledge store because it's simpler to set up. For each skill output, [define an output field mapping](cognitive-search-output-field-mapping.md) in the indexer, and a field in the search index.
218216

219-
:::image type="content" source="media/cognitive-search-defining-skillset/skillset-indexer-index-combo.png" alt-text="Object diagram showing how a persons entity is defined in skill output, indexer field mapping, and index field.":::
217+
:::image type="content" source="media/cognitive-search-defining-skillset/skillset-indexer-index-combo.png" alt-text="Object diagram that shows the person entity as a skill output, indexer field mapping, and index field.":::
220218

221-
After running the indexer, you can use [Search Explorer](search-explorer.md) to return documents from the index and check the contents of each field to determine what the skillset detected or created.
219+
After you run the indexer, use [Search Explorer](search-explorer.md) to return documents from the index and check the contents of each field to determine what the skillset detected or created.
222220

223-
The following example shows the results of an entity recognition skill that detected persons, locations, organizations, and other entities in a chunk of text. Viewing the results in Search Explorer can help you determine whether a skill adds value to your solution.
221+
This screenshot shows the results of an entity recognition skill that detected persons, locations, organizations, and other entities in a chunk of text. You can view the results to decide whether a skill adds value to your solution.
224222

225223
:::image type="content" source="media/cognitive-search-defining-skillset/doc-in-search-explorer.png" alt-text="Screenshot of a document in Search Explorer.":::
226224

227225
## Tips for a first skillset
228226

229-
+ Assemble a representative sample of your content in Blob Storage or another supported indexer data source and run the **Import data** wizard to create the skillset, index, indexer, and data source object.
227+
+ Assemble a representative sample of your content in Blob Storage or another supported data source and run the [**Import data** wizard](search-import-data-portal.md).
230228

231-
The wizard automates several steps that can be challenging the first time around, including defining the fields in an index, defining output filed mappings in an indexer, and projections in a knowledge store if you are using one. For some skills, such as OCR or image analysis, the wizard will add utility skills that merge image and text content that was separated during document cracking.
229+
The wizard automates several steps that can be challenging the first time around. It defines fields in an index, field mappings in an indexer, and projections in a knowledge store if you are using one. For some skills, such as OCR or image analysis, the wizard adds utility skills that merge the image and text content that was separated during document cracking.
232230

233-
+ Alternatively, you can import skill Postman collections that provide full examples of all of the object definitions required to evaluate a skill, from skillset to an index that you can query to view the results of a transformation.
231+
+ Alternatively, you can [import sample Postman collections](https://github.com/Azure-Samples/azure-search-postman-samples) that provide a full articulation of the object definitions required to evaluate a skill.
234232

235233
## Next steps
236234

237-
Context and input source fields are paths to nodes in an enrichment tree. As a next step, learn more about the syntax for setting up paths to nodes in an enrichment tree.
235+
Context and input source fields are paths to nodes in an enrichment tree. As a next step, learn more about the path syntax for nodes in an enrichment tree.
238236

239237
> [!div class="nextstepaction"]
240238
> [Referencing annotations in a skillset](cognitive-search-concept-annotations-syntax.md)

0 commit comments

Comments
 (0)