Commit 08a439d

attach cog svc, reference annotation
1 parent aa2d928 commit 08a439d

5 files changed with 44 additions and 15 deletions

articles/search/cognitive-search-attach-cognitive-services.md

Lines changed: 5 additions & 5 deletions
@@ -7,7 +7,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 12/09/2021
+ms.date: 09/16/2022
 
 ---
 
@@ -28,7 +28,7 @@ A multi-service resource references "Cognitive Services" as the offering, rather
 
 You can use the Azure portal, REST API, or an Azure SDK to attach a billable resource to a skillset.
 
-If you leave the property unspecified, execution of billable skills will stop at 20 transactions per indexer invocation and a "Time Out" message will appear in indexer execution history.
+If you leave the property unspecified, your search service will attempt to use the free enrichments available to your indexer on a daily basis. Execution of billable skills will stop at 20 transactions per indexer invocation and a "Time Out" message will appear in indexer execution history.
 
 ### [**Azure portal**](#tab/portal)
 
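The "property" in the paragraph about leaving it unspecified is the skillset's `cognitiveServices` property. The following Python is a minimal sketch that assumes a key-based multi-service resource and builds a skillset body with that property set. The skillset name and key placeholder are hypothetical. The code only prepares the JSON and doesn't call the REST API.

```python
import json


def attach_billable_resource(skillset: dict, key: str) -> dict:
    """Return a copy of a skillset definition with a key-based Cognitive Services resource attached."""
    updated = dict(skillset)  # shallow copy so the caller's dict is left unchanged
    updated["cognitiveServices"] = {
        "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
        "description": "Multi-service resource used for billing",
        "key": key,
    }
    return updated


# Hypothetical skillset; a real definition would list its skills here.
skillset = {"name": "hotels-skillset", "skills": []}
body = attach_billable_resource(skillset, "<cognitive-services-key>")
print(json.dumps(body, indent=2))
```

A body without `cognitiveServices` corresponds to the unspecified case, where billable skills stop at 20 transactions per indexer invocation.
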
@@ -120,7 +120,7 @@ Key-based billing applies when API calls to Cognitive Services resources exceed
 
 The key is used for billing, but not connections. For connections, a search service [connects over the internal network](search-security-overview.md#internal-traffic) to a Cognitive Services resource that's co-located in the [same physical region](https://azure.microsoft.com/global-infrastructure/services/?products=search). Most regions that offer Cognitive Search also offer Cognitive Services.
 
-If you attempt AI enrichment in a region that doesn't have both services, you'll see this message: "Provided key is not a valid CognitiveServices type key for the region of your search service."
+If you attempt AI enrichment in a region that doesn't have both services, you'll see this message: "Provided key isn't a valid CognitiveServices type key for the region of your search service."
 
 > [!NOTE]
 > Some built-in skills are based on non-regional Cognitive Services (for example, the [Text Translation Skill](cognitive-search-skill-text-translation.md)). Using a non-regional skill means that your request might be serviced in a region other than the Azure Cognitive Search region. For more information on non-regional services, see the [Cognitive Services product by region](https://aka.ms/allinoneregioninfo) page.
@@ -135,9 +135,9 @@ AI enrichment offers a small quantity of free processing of billable enrichment
 
 Some enrichments are always free:
 
-+ Utility skills that do not call Cognitive Services (namely, [Conditional](cognitive-search-skill-conditional.md), [Document Extraction](cognitive-search-skill-document-extraction.md), [Shaper](cognitive-search-skill-shaper.md), [Text Merge](cognitive-search-skill-textmerger.md), and [Text Split skills](cognitive-search-skill-textsplit.md)) are not billable.
++ Utility skills that don't call Cognitive Services (namely, [Conditional](cognitive-search-skill-conditional.md), [Document Extraction](cognitive-search-skill-document-extraction.md), [Shaper](cognitive-search-skill-shaper.md), [Text Merge](cognitive-search-skill-textmerger.md), and [Text Split skills](cognitive-search-skill-textsplit.md)) aren't billable.
 
-+ Text extraction from PDF documents and other application files is non-billable. Text extraction occurs during the [document cracking](search-indexer-overview.md#document-cracking) phase and is not an enrichment per se, but it occurs during AI enrichment and is thus noted here.
++ Text extraction from PDF documents and other application files is non-billable. Text extraction occurs during the [document cracking](search-indexer-overview.md#document-cracking) phase and isn't an enrichment in itself, but it occurs during AI enrichment and is thus noted here.
 
 ## Billable enrichments
 
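As a rough check of which skills in a skillset belong to the always-free utility skills listed above, you can compare each skill's `@odata.type` against those five types. This is a sketch with several assumptions. The type strings are taken to be the built-in names for the Conditional, Document Extraction, Shaper, Text Merge, and Text Split skills. Any other skill is reported only as *possibly* billable, because the list doesn't cover skills that never call Cognitive Services, such as custom Web API skills. The helper name `split_by_billing` is hypothetical.

```python
# The five utility skills described as always free, keyed by @odata.type.
ALWAYS_FREE_SKILL_TYPES = {
    "#Microsoft.Skills.Util.ConditionalSkill",
    "#Microsoft.Skills.Util.DocumentExtractionSkill",
    "#Microsoft.Skills.Util.ShaperSkill",
    "#Microsoft.Skills.Text.MergeSkill",
    "#Microsoft.Skills.Text.SplitSkill",
}


def split_by_billing(skills: list[dict]) -> tuple[list[str], list[str]]:
    """Split skills into (always free, possibly billable) lists of @odata.type values."""
    free, possibly_billable = [], []
    for skill in skills:
        odata_type = skill["@odata.type"]
        if odata_type in ALWAYS_FREE_SKILL_TYPES:
            free.append(odata_type)
        else:
            possibly_billable.append(odata_type)
    return free, possibly_billable


skills = [
    {"@odata.type": "#Microsoft.Skills.Text.SplitSkill"},
    {"@odata.type": "#Microsoft.Skills.Text.TranslationSkill"},
    {"@odata.type": "#Microsoft.Skills.Util.ShaperSkill"},
]
free, possibly_billable = split_by_billing(skills)
print(free)
print(possibly_billable)
```

Text extraction during document cracking isn't a skill, so it never shows up in either list.
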
articles/search/cognitive-search-concept-annotations-syntax.md

Lines changed: 37 additions & 8 deletions
@@ -1,19 +1,27 @@
 ---
 title: Reference inputs and outputs in skillsets
 titleSuffix: Azure Cognitive Search
-description: Explains the annotation syntax and how to reference an annotation in the inputs and outputs of a skillset in an AI enrichment pipeline in Azure Cognitive Search.
+description: Explains the annotation syntax and how to reference inputs and outputs of a skillset in an AI enrichment pipeline in Azure Cognitive Search.
 
 author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 09/24/2021
+ms.date: 09/16/2022
 ---
-# Reference annotations in an Azure Cognitive Search skillset
+# Reference an annotation in an Azure Cognitive Search skillset
 
-In this article, you learn how to reference annotations in skill definitions, using examples to illustrate various scenarios. As the content of a document flows through a set of skills, it gets enriched with annotations. Annotations can be used as inputs for further downstream enrichment, or mapped to an output field in an index.
-
-Examples in this article are based on the *content* field generated automatically by [Azure Blob indexers](search-howto-indexing-azure-blob-storage.md) as part of the [document cracking](search-indexer-overview.md#document-cracking) phase. When referring to documents from a Blob container, use a format such as `"/document/content"`, where the *content* field is part of the *document*.
+In this article, you'll learn how to reference *annotations* (or an enrichment node) in skill definitions, using examples to illustrate various scenarios. Skills read inputs and write outputs to nodes in an [enriched document](cognitive-search-working-with-skillsets.md#enrichment-tree) tree, building the tree as the enrichments progress. Any node can be used as an input for further downstream enrichment, or mapped to an output field in an index. This article introduces the syntax and provides examples for specifying a path. For the full syntax, see [Skill context and input annotation language](cognitive-search-skill-annotation-language.md).
+
+Paths to an annotation are specified in the "context" and "source" properties:
+
+:::image type="content" source="media/cognitive-search-annotations-syntax/content-source-annotation-path.png" alt-text="Screenshot of a skillset definition with context and source elements highlighted.":::
+
+The example in the screenshot is for an item in a Cosmos DB collection.
+
++ "context" is `/document/HotelId` because the collection is partitioned into documents by the `/HotelId` field. For a document in a Cosmos DB collection, it's also the root node of the enriched document.
+
++ "source" is `/document/Description` because the skill is a translation skill, and the field that you'll want the skill to translate is the `Description` field in each document.
 
 ## Background concepts
 
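Every path sits either in the skill-level "context" or in an input's "source". Because of that, a small helper can list the paths a skill reads from, which is useful when you check a definition against the enriched document. This is a hypothetical Python sketch: `annotation_paths` isn't part of any SDK, and the sample skill reuses the context and source values described for the Cosmos DB screenshot. The target language and `targetName` are placeholders. The nested-`inputs` handling assumes skills such as Shaper, which allow inputs within inputs.

```python
def annotation_paths(skill: dict) -> dict:
    """List the context and every input "source" path used by one skill definition."""
    sources: list[str] = []

    def walk(inputs: list) -> None:
        for item in inputs:
            if "source" in item:
                sources.append(item["source"])
            # Some skills (for example, Shaper) nest inputs inside an input.
            walk(item.get("inputs", []))

    walk(skill.get("inputs", []))
    # When a skill omits "context", the enrichment context defaults to /document.
    return {"context": skill.get("context", "/document"), "sources": sources}


translation_skill = {
    "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
    "context": "/document/HotelId",
    "defaultToLanguageCode": "fr",
    "inputs": [{"name": "text", "source": "/document/Description"}],
    "outputs": [{"name": "translatedText", "targetName": "translatedDescription"}],
}
print(annotation_paths(translation_skill))
# {'context': '/document/HotelId', 'sources': ['/document/Description']}
```
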
@@ -25,7 +33,21 @@ Before reviewing the syntax, let's revisit a few important concepts to better un
 | "annotation" | Within an enriched document, a node that is created and populated by a skill, such as "text" and "layoutText" in the OCR skill, is called an annotation. An enriched document is populated with both annotations and unchanged field values or metadata copied from the source. |
 | "context" | The context in which the enrichment takes place, in terms of which element or component of the document is enriched. By default, the enrichment context is at the `"/document"` level, scoped to individual documents contained in the data source. When a skill runs, the outputs of that skill become [properties of the defined context](#example-2). |
 
+## Root nodes and context
+
+An enriched document is created in the "document cracking" stage of indexer execution, when the indexer opens a document or reads in a row from the data source. Initially, the only node in an enriched document is the [root node (`/document`)](cognitive-search-skill-annotation-language.md#document-root), and it's the node from which all other enrichments occur.
+
+The following list shows several well-known paths:
+
++ `/document` is the root node and indicates an entire blob in Azure Storage, or a row in a SQL table.
++ `/document/content` is the "content" property of a JSON blob.
++ `/document/pages/*` or `/document/sentences/*` become the context if you're breaking a large document into smaller chunks for processing.
++ `/document/normalized_images/*` is created during document cracking if the document contains images. All paths to images start with `normalized_images`.
+
+Examples in this article are based on the *content* field generated automatically by [Azure Blob indexers](search-howto-indexing-azure-blob-storage.md) as part of the [document cracking](search-indexer-overview.md#document-cracking) phase. When referring to documents from a Blob container, use a format such as `"/document/content"`, where the *content* field is part of the *document*.
+
 <a name="example-1"></a>
+
 ## Example 1: Simple annotation reference
 
 In Azure Blob Storage, suppose you have a variety of files containing references to people's names that you want to extract using entity recognition. In the skill definition below, `"/document/content"` is the textual representation of the entire document, and "people" is an extraction of full names for entities identified as persons.
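
The well-known paths can be pictured with a toy model of an enriched document built from nested Python dicts and lists. The sketch illustrates only how a path walks the tree and how `*` fans out over a collection. It doesn't reflect how the service stores enriched documents, and `resolve` and the sample content are hypothetical.

```python
def resolve(tree: dict, path: str) -> list:
    """Return every node value that matches an annotation path such as /document/pages/*."""
    nodes = [tree]
    for segment in path.strip("/").split("/"):
        next_nodes = []
        for node in nodes:
            if segment == "*":
                # "*" fans out over every item of a collection node.
                if isinstance(node, list):
                    next_nodes.extend(node)
            elif isinstance(node, dict) and segment in node:
                next_nodes.append(node[segment])
        nodes = next_nodes
    return nodes


# Toy enriched document; the image entries are placeholders, not real image nodes.
enriched = {
    "document": {
        "content": "Hotel description text",
        "pages": ["page one", "page two"],
        "normalized_images": [{"data": "<image 1>"}, {"data": "<image 2>"}],
    }
}

print(resolve(enriched, "/document/content"))  # ['Hotel description text']
print(resolve(enriched, "/document/pages/*"))  # ['page one', 'page two']
print(len(resolve(enriched, "/document/normalized_images/*")))  # 2
```
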
@@ -58,7 +80,7 @@ Because the default context is `"/document"`, the list of people can now be refe
 
 This example builds on the previous one, showing you how to invoke an enrichment step multiple times over the same document. Assume the previous example generated an array of strings with 10 people names from a single document. A reasonable next step might be a second enrichment that extracts the last name from a full name. Because there are 10 names, you want this step to be called 10 times in this document, once for each person.
 
-To invoke the right number of iterations, set the context as `"/document/people/*"`, where the asterisk (`"*"`) represents all the nodes in the enriched document as descendants of `"/document/people"`. Although this skill is only defined once in the skills array, it is called for each member within the document until all members are processed.
+To invoke the right number of iterations, set the context as `"/document/people/*"`, where the asterisk (`"*"`) represents all the nodes in the enriched document as descendants of `"/document/people"`. Although this skill is only defined once in the skills array, it's called for each member within the document until all members are processed.
 
 ```json
 {
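
The iteration rule in Example 2 says that a skill is called once per member of `/document/people/*`, and that each output is added under that member. The following Python is a toy sketch of that rule. The `Node` class, `invoke_per_item`, and `last_name` are hypothetical stand-ins for the enrichment tree and the last-name skill, and the sample uses three names instead of ten.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """A toy enrichment node: a value plus named children (a list holds a collection)."""
    value: object = None
    children: dict = field(default_factory=dict)


def invoke_per_item(document: Node, collection: str,
                    skill: Callable[[str], str], output_name: str) -> int:
    """Call `skill` once per item of /document/<collection>/* and attach each result to its item."""
    calls = 0
    for item in document.children.get(collection, []):
        # The context is the item itself, so the output lands at
        # /document/<collection>/*/<output_name>.
        item.children[output_name] = Node(skill(item.value))
        calls += 1
    return calls


def last_name(full_name: str) -> str:
    """Hypothetical stand-in for a skill that extracts a last name."""
    return full_name.split()[-1]


document = Node(children={"people": [Node("Ana Silva"), Node("Ben Okafor"), Node("Carmen Ruiz")]})
calls = invoke_per_item(document, "people", last_name, "lastname")
print(calls)  # 3
print([p.children["lastname"].value for p in document.children["people"]])  # ['Silva', 'Okafor', 'Ruiz']
```
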
@@ -90,7 +112,7 @@ When annotations are arrays or collections of strings, you might want to target
 
 Sometimes you need to group all annotations of a particular type to pass them to a particular skill. Consider a hypothetical custom skill that identifies the most common last name from all the last names extracted in Example 2. To provide just the last names to the custom skill, specify the context as `"/document"` and the input as `"/document/people/*/lastname"`.
 
-Notice that the cardinality of `"/document/people/*/lastname"` is larger than that of document. There may be 10 lastname nodes while there is only one document node for this document. In that case, the system will automatically create an array of `"/document/people/*/lastname"` containing all of the elements in the document.
+Notice that the cardinality of `"/document/people/*/lastname"` is larger than that of document. There may be 10 lastname nodes while there's only one document node for this document. In that case, the system will automatically create an array of `"/document/people/*/lastname"` containing all of the elements in the document.
 
 ```json
 {
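
The cardinality rule for Example 3 can be sketched with a toy tree of nested dicts and lists. For each instance of the context, the skill receives the values found under the source path. When the part of the source path below the context still contains `*`, those values arrive as one array. This sketch assumes the source is nested under the context and raises an error otherwise. Sources outside the context, which skillsets also allow, aren't handled here. `resolve`, `skill_inputs`, `most_common_last_name`, and the sample data are hypothetical, and each person node keeps only its `lastname` child for brevity.

```python
from collections import Counter


def resolve(node, segments: list[str]) -> list:
    """Walk path segments from `node`; "*" fans out over list items."""
    nodes = [node]
    for segment in segments:
        next_nodes = []
        for current in nodes:
            if segment == "*" and isinstance(current, list):
                next_nodes.extend(current)
            elif isinstance(current, dict) and segment in current:
                next_nodes.append(current[segment])
        nodes = next_nodes
    return nodes


def skill_inputs(tree: dict, context: str, source: str) -> list:
    """Return the input value a skill receives for each instance of its context."""
    ctx = context.strip("/").split("/")
    src = source.strip("/").split("/")
    if src[:len(ctx)] != ctx:
        raise ValueError("this sketch only handles sources nested under the context")
    rest = src[len(ctx):]
    inputs = []
    for instance in resolve(tree, ctx):
        values = resolve(instance, rest)
        # A "*" below the context means many values per instance, gathered into one array.
        inputs.append(values if "*" in rest else (values[0] if values else None))
    return inputs


def most_common_last_name(last_names: list[str]) -> str:
    """Hypothetical custom skill: the most frequent last name in the array."""
    return Counter(last_names).most_common(1)[0][0]


enriched = {"document": {"people": [
    {"lastname": "Silva"}, {"lastname": "Okafor"}, {"lastname": "Silva"},
]}}

# Context /document: one invocation whose input is an array of all last names.
per_document = skill_inputs(enriched, "/document", "/document/people/*/lastname")
print(per_document)  # [['Silva', 'Okafor', 'Silva']]
print(most_common_last_name(per_document[0]))  # Silva

# Context /document/people/*: one invocation per person, each with a single string.
print(skill_inputs(enriched, "/document/people/*", "/document/people/*/lastname"))  # ['Silva', 'Okafor', 'Silva']
```
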
@@ -113,9 +135,16 @@ Notice that the cardinality of `"/document/people/*/lastname"` is larger than th
 }
 ```
 
+## Tips for annotation path troubleshooting
 
+If you're having trouble with specifying skill inputs, these tips might help you move forward:
+
++ [Run the Import data wizard](search-import-data-portal.md) over your data to review the skillset definitions and field mappings that the wizard generates.
+
++ [Start a debug session](cognitive-search-how-to-debug-skillset.md) on a skillset to view the structure of an enriched document. You can edit the paths and other parts of the skill definition, and then run the skill to validate your changes.
 
 ## See also
+
 + [Skill context and input annotation language](cognitive-search-skill-annotation-language.md)
 + [How to integrate a custom skill into an enrichment pipeline](cognitive-search-custom-skill-interface.md)
 + [How to define a skillset](cognitive-search-defining-skillset.md)

articles/search/cognitive-search-working-with-skillsets.md

Lines changed: 1 addition & 1 deletion
@@ -129,7 +129,7 @@ An enriched document exists for the duration of skillset execution, but can be [
 
 Initially, an enriched document is simply the content extracted from a data source during [*document cracking*](search-indexer-overview.md#document-cracking), where text and images are extracted from the source and made available for language or image analysis.
 
-The initial content is metadata and the *root node* (`document\content`). The root node is usually a whole document or a normalized image that is extracted from a data source during document cracking. How it's articulated in an enrichment tree varies for each data source type. The following table shows the state of a document entering into the enrichment pipeline for several supported data sources:
+The initial content is metadata and the *root node* (`document/content`). The root node is usually a whole document or a normalized image that is extracted from a data source during document cracking. How it's articulated in an enrichment tree varies for each data source type. The following table shows the state of a document entering into the enrichment pipeline for several supported data sources:
 
 |Data Source\Parsing Mode|Default|JSON, JSON Lines & CSV|
 |---|---|---|

articles/search/search-modeling-multitenant-saas-applications.md

Lines changed: 1 addition & 1 deletion
@@ -84,7 +84,7 @@ In the case of a multitenant scenario, the application developer consumes one or
 
 ## Model 1: One index per tenant
 
-:::image type="content" source="media/search-modeling-multitenant-saas-applications/azure-search-index-per-tenant.png" alt-text="A portrayal of the index-per-tenant model" border="false":::
+:::image type="content" source="media/search-modeling-multitenant-saas-applications/azure-search-index-per-tenant.png" alt-text="A portrayal of the index-per-tenant model" border="false":::
 
 In an index-per-tenant model, multiple tenants occupy a single Azure Cognitive Search service where each tenant has their own index.
 