## articles/search/cognitive-search-concept-intro.md

1 addition & 1 deletion
```diff
@@ -65,7 +65,7 @@ A [skillset](cognitive-search-defining-skillset.md) that's assembled using built
 + PDFs with combined image and text. Embedded text can be extracted without AI enrichment, but adding image and language skills can unlock more information than what could be obtained through standard text-based indexing.
-+ Unstructured or semi-structured documents containing content that has inherent meaning or context that is hidden in the larger document.
++ Unstructured or semi-structured documents containing content that has inherent meaning or organization that is hidden in the larger document.
 
 Blobs in particular often contain a large body of content that is packed into a single "field". By attaching image and natural language processing skills to an indexer, you can create information that is extant in the raw content, but not otherwise surfaced as distinct fields.
```
## articles/search/cognitive-search-defining-skillset.md

29 additions & 9 deletions
```diff
@@ -27,7 +27,7 @@ An indexer drives skillset execution. You need an [indexer](search-howto-create-
 > [!TIP]
 > Enable [enrichment caching](cognitive-search-incremental-indexing-conceptual.md) to reuse the content you've already processed and lower the cost of development.
 
-## Skillset definition
+## Add a skillset definition
 
 Start with the basic structure. In the [Create Skillset REST API](/rest/api/searchservice/create-skillset), the body of the request is authored in JSON and has the following sections:
@@ -36,7 +36,7 @@ Start with the basic structure. In the [Create Skillset REST API](/rest/api/sear
    "name":"skillset-template",
    "description":"A description makes the skillset self-documenting (comments aren't allowed in JSON itself)",
```
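For orientation, the overall shape of that request body, assembled from the sections this article names (`skills`, `cognitiveServices`, `knowledgeStore`, and the optional `encryptionKey`), might look like the following skeleton. Values are placeholders, not the article's full sample:

```json
{
  "name": "skillset-template",
  "description": "A description makes the skillset self-documenting",
  "skills": [ ],
  "cognitiveServices": { },
  "knowledgeStore": { },
  "encryptionKey": { }
}
```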
````diff
@@ -67,9 +67,9 @@ After the name and description, a skillset has four main properties:
 `encryptionKey` (optional) specifies an Azure Key Vault and [customer-managed keys](search-security-manage-encryption-keys.md) used to encrypt sensitive content in a skillset definition. Remove this property if you aren't using customer-managed encryption.
 
-## Add a skills array
+## Insert a skills array
 
-Within a skillset definition, the skills array specifies which skills to execute. The following example shows two unrelated, [built-in skills](cognitive-search-predefined-skills.md). Notice that each skill has a type, context, inputs, and outputs.
+Inside the skillset definition, the skills array specifies which skills to execute. The following example shows two unrelated, [built-in skills](cognitive-search-predefined-skills.md). Notice that each skill has a type, context, inputs, and outputs.
 
 ```json
 "skills":[
````
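As a sketch of what one entry in that array can look like, here's an entity recognition skill put together from the properties discussed in this article ("categories", "defaultLanguageCode", the `/document/content` source, and the `orgs` target name). The exact `@odata.type` string should be confirmed against the skill reference:

```json
{
  "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
  "context": "/document",
  "categories": [ "Organization" ],
  "defaultLanguageCode": "en",
  "inputs": [
    { "name": "text", "source": "/document/content" }
  ],
  "outputs": [
    { "name": "organizations", "targetName": "orgs" }
  ]
}
```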
```diff
@@ -140,9 +140,9 @@ Each skill is unique in terms of its input values and the parameters that it tak
 Common parameters include "odata.type", "inputs", and "outputs". The other parameters, namely "categories" and "defaultLanguageCode", are examples of parameters that are specific to Entity Recognition.
 
 **"odata.type"** uniquely identifies each skill. You can find the type in the [skill reference documentation](cognitive-search-predefined-skills.md).
 
-**"context"** is a node in an enrichment tree and it represents the level at which operations take place. All skills have this property. If the "context" field is not explicitly set, the default context is `"/document"`. In the example, the context is the whole document, which means that the entity recognition skill is called once per document.
+**"context"** is a node in an enrichment tree and it represents the level at which operations take place. All skills have this property. If the "context" field isn't explicitly set, the default context is `"/document"`. In the example, the context is the whole document, which means that the entity recognition skill is called once per document.
 
 The context also determines where outputs are produced in the enrichment tree. In this example, the skill returns a property called `"organizations"`, captured as `orgs`, which is added as a child node of `"/document"`. In downstream skills, the path to this node is `"/document/orgs"`. For a particular document, the value of `"/document/orgs"` is an array of organizations extracted from the text (for example: `["Microsoft", "LinkedIn"]`). For more information about path syntax, see [How to reference annotations in a skillset](cognitive-search-concept-annotations-syntax.md).
```
````diff
@@ -154,9 +154,9 @@ Outputs exist only during processing. To chain this output to the input of a dow
 Outputs from one skill can conflict with outputs from a different skill. If you have multiple skills that return the same output, use the `"targetName"` for name disambiguation in enrichment node paths.
 
-Some situations call for referencing each element of an array separately. For example, suppose you want to pass *each element* of `"/document/orgs"` separately to another skill. To do so, add an asterisk to the path: `"/document/orgs/*"`
+Some situations call for referencing each element of an array separately. For example, suppose you want to pass *each element* of `"/document/orgs"` separately to another skill. To do so, add an asterisk to the path: `"/document/orgs/*"`.
 
-The second skill for sentiment analysis follows the same pattern as the first enricher. It takes `"/document/content"` as input, and returns a sentiment score for each content instance. Since you did not set the "context" field explicitly, the output (mySentiment) is now a child of `"/document"`.
+The second skill for sentiment analysis follows the same pattern as the first enricher. It takes `"/document/content"` as input, and returns a sentiment score for each content instance. Since you didn't set the "context" field explicitly, the output (mySentiment) is now a child of `"/document"`.
 
 ```json
 {
@@ -176,6 +176,26 @@ The second skill for sentiment analysis follows the same pattern as the first en
 }
 ```
````
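The JSON body of that sentiment skill is elided in the diff above. A hedged reconstruction consistent with the surrounding description (input from `/document/content`, default context, output captured as `mySentiment`; confirm the exact `@odata.type` and output name against the skill reference) could look like:

```json
{
  "@odata.type": "#Microsoft.Skills.Text.SentimentSkill",
  "inputs": [
    { "name": "text", "source": "/document/content" }
  ],
  "outputs": [
    { "name": "score", "targetName": "mySentiment" }
  ]
}
```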
```diff
+## Set context and input source
+
+1. Set the skill's [context property](cognitive-search-working-with-skillsets.md#context). Context determines the level at which operations take place, and where outputs are produced in the enrichment tree. It's usually one of the following examples:
+
+   | Context example | Description |
+   |-----------------|-------------|
+   | "context": "/document" | (Default) Inputs and outputs are at the document level. |
+   | "context": "/document/pages/*" | Some skills like sentiment analysis perform better over smaller chunks of text. If you're splitting a large content field into pages or sentences, the context should be over each component part. |
+   | "context": "/document/normalized_images/*" | Inputs and outputs are one per image in the parent document. |
+
+1. Set the skill's input source to the node that provides the data to be processed. For text-based skills, it's a field in the document or row that provides text. For image-based skills, the node providing the input is normalized images.
+
+   | Source example | Description |
+   |----------------|-------------|
+   | "source": "/document/content" | For blobs, the source is usually the blob's content property. |
+   | "source": "/document/some-named-field" | For text-based skills, such as entity recognition or key phrase extraction, the origin should be a field that contains sufficient text to be analyzed, such as a "description" or "summary". |
+   | "source": "/document/normalized_images/*" | For image content, the source is an image that's been normalized during document cracking. |
+
+If the skill iterates over an array, both context and input source should include `/*` in the correct positions.
```
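Putting the two settings together: a skill that runs once per page might combine context and source like this (key phrase extraction is one of the text-based skills named above; the `/document/pages/*` node is assumed to exist from a prior text split step):

```json
{
  "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
  "context": "/document/pages/*",
  "inputs": [
    { "name": "text", "source": "/document/pages/*" }
  ],
  "outputs": [
    { "name": "keyPhrases" }
  ]
}
```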
```diff
 ## Add a custom skill
 
 This section includes an example of a [custom skill](cognitive-search-custom-skill-web-api.md). The URI points to an Azure Function, which in turn invokes the model or transformation that you provide. For more information, see [Define a custom interface](cognitive-search-custom-skill-interface.md).
```
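A minimal sketch of such a definition, with a placeholder function URL (the `uri`, input, and output names here are illustrative, not the article's actual sample):

```json
{
  "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
  "uri": "https://<your-function-app>.azurewebsites.net/api/<your-function>",
  "context": "/document",
  "inputs": [
    { "name": "text", "source": "/document/content" }
  ],
  "outputs": [
    { "name": "customResult" }
  ]
}
```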
```diff
@@ -226,7 +246,7 @@ This screenshot shows the results of an entity recognition skill that detected p
 + Assemble a representative sample of your content in Blob Storage or another supported data source and run the [**Import data** wizard](search-import-data-portal.md).
 
-The wizard automates several steps that can be challenging the first time around. It defines fields in an index, field mappings in an indexer, and projections in a knowledge store if you are using one. For some skills, such as OCR or image analysis, the wizard adds utility skills that merge the image and text content that was separated during document cracking.
+The wizard automates several steps that can be challenging the first time around. It defines fields in an index, field mappings in an indexer, and projections in a knowledge store if you're using one. For some skills, such as OCR or image analysis, the wizard adds utility skills that merge the image and text content that was separated during document cracking.
 
 + Alternatively, you can [import sample Postman collections](https://github.com/Azure-Samples/azure-search-postman-samples) that provide a full articulation of the object definitions required to evaluate a skill.
```
## articles/search/cognitive-search-working-with-skillsets.md

1 addition & 1 deletion
```diff
@@ -110,7 +110,7 @@ Because a skill's inputs and outputs are reading from and writing to enrichment
 ## Context
 
-Each skill has a context, which can be the entire document (`/document`) or a node lower in the tree (`/document/countries/`). A context determines:
+Each skill has a context, which can be the entire document (`/document`) or a node lower in the tree (`/document/countries/*`). A context determines:
 
 + The number of times the skill executes: once over a single value (per field, per document), or, for context values of type collection, once for each instance in the collection when the path includes `/*`.
```
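For instance, the starred context below runs the skill once per element of a hypothetical `countries` collection rather than once per document (the skill type and `description` node are illustrative):

```json
{
  "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
  "context": "/document/countries/*",
  "inputs": [
    { "name": "text", "source": "/document/countries/*/description" }
  ],
  "outputs": [
    { "name": "keyPhrases" }
  ]
}
```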
---

```diff
 Azure provides a global [role-based access control (RBAC) authorization system](../role-based-access-control/role-assignments-portal.md) for all services running on the platform. In Cognitive Search, you can:
 
-+ Use generally available roles for service administration
++ Use generally available roles for service administration.
 
-+ Use new preview roles for data requests, including creating, loading, and querying indexes
++ Use new preview roles for data requests, including creating, loading, and querying indexes.
 
 Per-user access over search results (sometimes referred to as row-level security or document-level security) is not supported. As a workaround, [create security filters](search-security-trimming-for-azure-search.md) that trim results by user identity, removing documents for which the requestor should not have access.
```
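As a sketch of that workaround, a query can filter on a field holding allowed group identities. The field name `group_ids` and the group values are hypothetical; the `search.in` filter function comes from the security trimming article linked above:

```json
{
  "search": "*",
  "filter": "group_ids/any(g: search.in(g, 'group_a, group_b'))"
}
```

At query time, the application substitutes the requesting user's group memberships into the filter before sending the search request.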
```diff
@@ -38,13 +38,15 @@ Built-in roles include generally available and preview roles.
 > [!NOTE]
 > Azure resources have the concept of [control plane and data plane](../azure-resource-manager/management/control-plane-and-data-plane.md) categories of operations. In Cognitive Search, "control plane" refers to any operation supported in the [Management REST API](/rest/api/searchmanagement/) or equivalent client libraries. The "data plane" refers to operations against the search service endpoint, such as indexing or queries, or any other operation specified in the [Search REST API](/rest/api/searchservice/) or equivalent client libraries. Most roles apply to just one plane. The exception is Search Service Contributor, which supports actions across both.
 
-## Preview limitations
-
-+ There are no regional, tier, or pricing restrictions for using the Azure RBAC preview, but your search service must be in the Azure public cloud. The preview isn't available in Azure Government, Azure Germany, or Azure China 21Vianet.
-
-+ If a subscription is migrated to a new tenant, the RBAC preview will need to be re-enabled.
+<a name="preview-limitations"></a>
+
+## Preview capabilities and limitations
+
+Role-based access control for data plane operations, such as creating an index or querying an index, is currently in public preview and available under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). Preview functionality shouldn't be rolled into a production environment.
+
+There are no regional, tier, or pricing restrictions for using the Azure RBAC preview, but your search service must be in the Azure public cloud. The preview isn't available in Azure Government, Azure Germany, or Azure China 21Vianet.
+
++ If you migrate your Azure subscription to a new tenant, the RBAC preview will need to be re-enabled.
+
++ Adoption of Azure RBAC might increase the latency of some requests. Each unique combination of service resource (index, indexer, etc.) and service principal used on a request will trigger an authorization check. These authorization checks can add up to 200 milliseconds of latency to a request.
```