Skip to content

Commit 628a330

Browse files
authored
Merge pull request #211506 from HeidiSteen/heidist-fresh
[azure search] Freshness pass over attach cog-svc, reference annotation
2 parents b00f781 + 52a7e03 commit 628a330

File tree

4 files changed

+50
-17
lines changed

4 files changed

+50
-17
lines changed

articles/search/cognitive-search-attach-cognitive-services.md

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ author: HeidiSteen
77
ms.author: heidist
88
ms.service: cognitive-search
99
ms.topic: how-to
10-
ms.date: 12/09/2021
10+
ms.date: 09/16/2022
1111

1212
---
1313

@@ -28,7 +28,7 @@ A multi-service resource references "Cognitive Services" as the offering, rather
2828

2929
You can use the Azure portal, REST API, or an Azure SDK to attach a billable resource to a skillset.
3030

31-
If you leave the property unspecified, execution of billable skills will stop at 20 transactions per indexer invocation and a "Time Out" message will appear in indexer execution history.
31+
If you leave the property unspecified, your search service will attempt to use the free enrichments available to your indexer on a daily basis. Execution of billable skills will stop at 20 transactions per indexer invocation and a "Time Out" message will appear in indexer execution history.
3232

3333
### [**Azure portal**](#tab/portal)
3434

@@ -120,7 +120,7 @@ Key-based billing applies when API calls to Cognitive Services resources exceed
120120

121121
The key is used for billing, but not connections. For connections, a search service [connects over the internal network](search-security-overview.md#internal-traffic) to a Cognitive Services resource that's co-located in the [same physical region](https://azure.microsoft.com/global-infrastructure/services/?products=search). Most regions that offer Cognitive Search also offer Cognitive Services.
122122

123-
If you attempt AI enrichment in a region that doesn't have both services, you'll see this message: "Provided key is not a valid CognitiveServices type key for the region of your search service."
123+
If you attempt AI enrichment in a region that doesn't have both services, you'll see this message: "Provided key isn't a valid CognitiveServices type key for the region of your search service."
124124

125125
> [!NOTE]
126126
> Some built-in skills are based on non-regional Cognitive Services (for example, the [Text Translation Skill](cognitive-search-skill-text-translation.md)). Using a non-regional skill means that your request might be serviced in a region other than the Azure Cognitive Search region. For more information on non-regional services, see the [Cognitive Services product by region](https://aka.ms/allinoneregioninfo) page.
@@ -135,17 +135,17 @@ AI enrichment offers a small quantity of free processing of billable enrichment
135135

136136
Some enrichments are always free:
137137

138-
+ Utility skills that do not call Cognitive Services (namely, [Conditional](cognitive-search-skill-conditional.md), [Document Extraction](cognitive-search-skill-document-extraction.md), [Shaper](cognitive-search-skill-shaper.md), [Text Merge](cognitive-search-skill-textmerger.md), and [Text Split skills](cognitive-search-skill-textsplit.md)) are not billable.
138+
+ Utility skills that don't call Cognitive Services (namely, [Conditional](cognitive-search-skill-conditional.md), [Document Extraction](cognitive-search-skill-document-extraction.md), [Shaper](cognitive-search-skill-shaper.md), [Text Merge](cognitive-search-skill-textmerger.md), and [Text Split skills](cognitive-search-skill-textsplit.md)) aren't billable.
139139

140-
+ Text extraction from PDF documents and other application files is non-billable. Text extraction occurs during the [document cracking](search-indexer-overview.md#document-cracking) phase and is not an enrichment per se, but it occurs during AI enrichment and is thus noted here.
140+
+ Text extraction from PDF documents and other application files is non-billable. Text extraction occurs during the [document cracking](search-indexer-overview.md#document-cracking) phase and isn't technically an enrichment, but it occurs during AI enrichment and is thus noted here.
141141

142142
## Billable enrichments
143143

144144
During AI enrichment, Cognitive Search calls the Cognitive Services APIs for [built-in skills](cognitive-search-predefined-skills.md) that are based on Computer Vision, Translator, and Azure Cognitive Services for Language.
145145

146-
Billable built-in skills that make backend calls to Cognitive Services include [Entity Linking](cognitive-search-skill-entity-linking-v3.md), [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md), [Image Analysis](cognitive-search-skill-image-analysis.md), [Key Phrase Extraction](cognitive-search-skill-keyphrases.md), [Language Detection](cognitive-search-skill-language-detection.md), [OCR](cognitive-search-skill-ocr.md), [PII Detection](cognitive-search-skill-pii-detection.md), [Sentiment](cognitive-search-skill-sentiment-v3.md), and [Text Translation](cognitive-search-skill-text-translation.md).
146+
Billable built-in skills that make backend calls to Cognitive Services include [Entity Linking](cognitive-search-skill-entity-linking-v3.md), [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md), [Image Analysis](cognitive-search-skill-image-analysis.md), [Key Phrase Extraction](cognitive-search-skill-keyphrases.md), [Language Detection](cognitive-search-skill-language-detection.md), [OCR](cognitive-search-skill-ocr.md), [Personally Identifiable Information (PII) Detection](cognitive-search-skill-pii-detection.md), [Sentiment](cognitive-search-skill-sentiment-v3.md), and [Text Translation](cognitive-search-skill-text-translation.md).
147147

148-
Image extraction is an Azure Cognitive Search operation that occurs when documents are cracked prior to enrichment. Image extraction is billable on all tiers, with the exception of 20 free daily extractions on the free tier. Image extraction costs apply to image files inside blobs, embedded images in other files (PDF and other app files), and for images extracted using [Document Extraction](cognitive-search-skill-document-extraction.md). For image extraction pricing, see the [Azure Cognitive Search pricing page](https://azure.microsoft.com/pricing/details/search/).
148+
Image extraction is an Azure Cognitive Search operation that occurs when documents are cracked prior to enrichment. Image extraction is billable on all tiers, except for 20 free daily extractions on the free tier. Image extraction costs apply to image files inside blobs, embedded images in other files (PDF and other app files), and for images extracted using [Document Extraction](cognitive-search-skill-document-extraction.md). For image extraction pricing, see the [Azure Cognitive Search pricing page](https://azure.microsoft.com/pricing/details/search/).
149149

150150
> [!TIP]
151151
> To lower the cost of skillset processing, enable [incremental enrichment (preview)](cognitive-search-incremental-indexing-conceptual.md) to cache and reuse any enrichments that are unaffected by changes made to a skillset. Caching requires Azure Storage (see [pricing](https://azure.microsoft.com/pricing/details/storage/blobs/) but the cumulative cost of skillset execution is lower if existing enrichments can be reused, especially for skillsets that use image extraction and analysis.
@@ -167,7 +167,7 @@ The prices shown in this article are hypothetical. They're used to illustrate th
167167

168168
1. For OCR of 6,000 images in English, the OCR cognitive skill uses the best algorithm (DescribeText). Assuming a cost of $2.50 per 1,000 images to be analyzed, you would pay $15.00 for this step.
169169

170-
1. For entity extraction, you'd have a total of three text records per page. Each record is 1,000 characters. Three text records per page multiplied by 6,000 pages equals 18,000 text records. Assuming $2.00 per 1,000 text records, this step would cost $36.00.
170+
1. For entity extraction, you'd have a total of three text records per page. Each record is 1,000 characters. Three text records per page multiplied by 6,000 pages equal 18,000 text records. Assuming $2.00 per 1,000 text records, this step would cost $36.00.
171171

172172
Putting it all together, you'd pay about $57.00 to ingest 1,000 PDF documents of this type with the described skillset.
173173

articles/search/cognitive-search-concept-annotations-syntax.md

Lines changed: 41 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,29 @@
11
---
22
title: Reference inputs and outputs in skillsets
33
titleSuffix: Azure Cognitive Search
4-
description: Explains the annotation syntax and how to reference an annotation in the inputs and outputs of a skillset in an AI enrichment pipeline in Azure Cognitive Search.
4+
description: Explains the annotation syntax and how to reference inputs and outputs of a skillset in an AI enrichment pipeline in Azure Cognitive Search.
55

66
author: HeidiSteen
77
ms.author: heidist
88
ms.service: cognitive-search
99
ms.topic: conceptual
10-
ms.date: 09/24/2021
10+
ms.date: 09/16/2022
1111
---
12-
# Reference annotations in an Azure Cognitive Search skillset
12+
# Reference an annotation in an Azure Cognitive Search skillset
1313

14-
In this article, you learn how to reference annotations in skill definitions, using examples to illustrate various scenarios. As the content of a document flows through a set of skills, it gets enriched with annotations. Annotations can be used as inputs for further downstream enrichment, or mapped to an output field in an index.
15-
16-
Examples in this article are based on the *content* field generated automatically by [Azure Blob indexers](search-howto-indexing-azure-blob-storage.md) as part of the [document cracking](search-indexer-overview.md#document-cracking) phase. When referring to documents from a Blob container, use a format such as `"/document/content"`, where the *content* field is part of the *document*.
14+
In this article, you'll learn how to reference *annotations* (or an enrichment node) in skill definitions, using examples to illustrate various scenarios.
15+
16+
Skills read inputs and write outputs to nodes in an [enriched document](cognitive-search-working-with-skillsets.md#enrichment-tree) tree, building the tree as the enrichments progress. Any node can be referenced in an input for further downstream enrichment, or mapped to an output field in an index. This article introduces the syntax and provides examples for specifying a path to a node. For the full syntax, see [Skill context and input annotation language language](cognitive-search-skill-annotation-language.md).
17+
18+
Paths to an annotation are specified in the "context" and "source" properties of a skillset, and in [output field mappings](cognitive-search-output-field-mapping.md) in an indexer. Here's an example of paths in a skillset:
19+
20+
:::image type="content" source="media/cognitive-search-annotations-syntax/content-source-annotation-path.png" alt-text="Screenshot of a skillset definition with context and source elements highlighted.":::
21+
22+
The example in the screenshot illustrates the path for an item in a Cosmos DB collection.
23+
24+
+ "context" is `/document/HotelId` because the collection is partitioned into documents by the `/HotelId` field.
25+
26+
+ "source" is `/document/Description` because the skill is a translation skill, and the field that you'll want the skill to translate is the `Description` field in each document.
1727

1828
## Background concepts
1929

@@ -25,7 +35,23 @@ Before reviewing the syntax, let's revisit a few important concepts to better un
2535
| "annotation" | Within an enriched document, a node that is created and populated by a skill, such as "text" and "layoutText" in the OCR skill, is called an annotation. An enriched document is populated with both annotations and unchanged field values or metadata copied from the source. |
2636
| "context" | The context in which the enrichment takes place, in terms of which element or component of the document is enriched. By default, the enrichment context is at the `"/document"` level, scoped to individual documents contained in the data source. When a skill runs, the outputs of that skill become [properties of the defined context](#example-2). |
2737

38+
## Common examples
39+
40+
An enriched document is created in the "document cracking" stage of indexer execution, when the indexer opens a document or reads in a row from the data source. Initially, the only node in an enriched document is the [root node (`/document`)](cognitive-search-skill-annotation-language.md#document-root), and it's the node from which all other enrichments occur. All paths start with `/document`.
41+
42+
The following list identifies several well-known paths:
43+
44+
+ `/document` is the root node and indicates an entire blob in Azure Storage, or a row in a SQL table.
45+
+ `/document/{key}` is the syntax for a document or item in a Cosmos DB collection, where `{key}` is the actual key, such as "HotelId" in the previous example.
46+
+ `/document/content` is the "content" property of a JSON blob.
47+
+ `/document/{field}` is the syntax for an operation that's performed on a specific field, such as the "/document/Description" field in the previous example.
48+
+ `/document/pages/*` or `/document/sentences/*` become the context if you're breaking a large document into smaller chunks for processing.
49+
+ `/document/normalized_images/*` is created during document cracking if the document contains images. All paths to images start with normalized_images.
50+
51+
Examples in the remainder of this article are based on the "content" field generated automatically by [Azure Blob indexers](search-howto-indexing-azure-blob-storage.md) as part of the [document cracking](search-indexer-overview.md#document-cracking) phase. When referring to documents from a Blob container, use a format such as `"/document/content"`, where the "content" field is part of the "document".
52+
2853
<a name="example-1"></a>
54+
2955
## Example 1: Simple annotation reference
3056

3157
In Azure Blob Storage, suppose you have a variety of files containing references to people's names that you want to extract using entity recognition. In the skill definition below, `"/document/content"` is the textual representation of the entire document, and "people" is an extraction of full names for entities identified as persons.
@@ -58,7 +84,7 @@ Because the default context is `"/document"`, the list of people can now be refe
5884

5985
This example builds on the previous one, showing you how to invoke an enrichment step multiple times over the same document. Assume the previous example generated an array of strings with 10 people names from a single document. A reasonable next step might be a second enrichment that extracts the last name from a full name. Because there are 10 names, you want this step to be called 10 times in this document, once for each person.
6086

61-
To invoke the right number of iterations, set the context as `"/document/people/*"`, where the asterisk (`"*"`) represents all the nodes in the enriched document as descendants of `"/document/people"`. Although this skill is only defined once in the skills array, it is called for each member within the document until all members are processed.
87+
To invoke the right number of iterations, set the context as `"/document/people/*"`, where the asterisk (`"*"`) represents all the nodes in the enriched document as descendants of `"/document/people"`. Although this skill is only defined once in the skills array, it's called for each member within the document until all members are processed.
6288

6389
```json
6490
{
@@ -90,7 +116,7 @@ When annotations are arrays or collections of strings, you might want to target
90116

91117
Sometimes you need to group all annotations of a particular type to pass them to a particular skill. Consider a hypothetical custom skill that identifies the most common last name from all the last names extracted in Example 2. To provide just the last names to the custom skill, specify the context as `"/document"` and the input as `"/document/people/*/lastname"`.
92118

93-
Notice that the cardinality of `"/document/people/*/lastname"` is larger than that of document. There may be 10 lastname nodes while there is only one document node for this document. In that case, the system will automatically create an array of `"/document/people/*/lastname"` containing all of the elements in the document.
119+
Notice that the cardinality of `"/document/people/*/lastname"` is larger than that of document. There may be 10 lastname nodes while there's only one document node for this document. In that case, the system will automatically create an array of `"/document/people/*/lastname"` containing all of the elements in the document.
94120

95121
```json
96122
{
@@ -113,9 +139,16 @@ Notice that the cardinality of `"/document/people/*/lastname"` is larger than th
113139
}
114140
```
115141

142+
## Tips for annotation path troubleshooting
116143

144+
If you're having trouble with specifying skill inputs, these tips might help you move forward:
145+
146+
+ [Run the Import data wizard](search-import-data-portal.md) over your data to review the skillset definitions and field mappings that the wizard generates.
147+
148+
+ [Start a debug session](cognitive-search-how-to-debug-skillset.md) on a skillset to view the structure of an enriched document. You can edit the paths and other parts of the skill definition, and then run the skill to validate your changes.
117149

118150
## See also
151+
119152
+ [Skill context and input annotation language](cognitive-search-skill-annotation-language.md)
120153
+ [How to integrate a custom skill into an enrichment pipeline](cognitive-search-custom-skill-interface.md)
121154
+ [How to define a skillset](cognitive-search-defining-skillset.md)

articles/search/cognitive-search-working-with-skillsets.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -129,7 +129,7 @@ An enriched document exists for the duration of skillset execution, but can be [
129129

130130
Initially, an enriched document is simply the content extracted from a data source during [*document cracking*](search-indexer-overview.md#document-cracking), where text and images are extracted from the source and made available for language or image analysis.
131131

132-
The initial content is metadata and the *root node* (`document\content`). The root node is usually a whole document or a normalized image that is extracted from a data source during document cracking. How it's articulated in an enrichment tree varies for each data source type. The following table shows the state of a document entering into the enrichment pipeline for several supported data sources:
132+
The initial content is metadata and the *root node* (`document/content`). The root node is usually a whole document or a normalized image that is extracted from a data source during document cracking. How it's articulated in an enrichment tree varies for each data source type. The following table shows the state of a document entering into the enrichment pipeline for several supported data sources:
133133

134134
|Data Source\Parsing Mode|Default|JSON, JSON Lines & CSV|
135135
|---|---|---|
12.4 KB
Loading

0 commit comments

Comments
 (0)