+ Text extraction from PDF documents and other application files is non-billable. Text extraction occurs during the [document cracking](search-indexer-overview.md#document-cracking) phase and isn't an enrichment in itself, but it occurs during AI enrichment and is thus noted here.
++ Text extraction from PDF documents and other application files is non-billable. Text extraction occurs during the [document cracking](search-indexer-overview.md#document-cracking) phase and isn't technically an enrichment, but it occurs during AI enrichment and is thus noted here.
## Billable enrichments
During AI enrichment, Cognitive Search calls the Cognitive Services APIs for [built-in skills](cognitive-search-predefined-skills.md) that are based on Computer Vision, Translator, and Azure Cognitive Services for Language.
-Billable built-in skills that make backend calls to Cognitive Services include [Entity Linking](cognitive-search-skill-entity-linking-v3.md), [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md), [Image Analysis](cognitive-search-skill-image-analysis.md), [Key Phrase Extraction](cognitive-search-skill-keyphrases.md), [Language Detection](cognitive-search-skill-language-detection.md), [OCR](cognitive-search-skill-ocr.md), [PII Detection](cognitive-search-skill-pii-detection.md), [Sentiment](cognitive-search-skill-sentiment-v3.md), and [Text Translation](cognitive-search-skill-text-translation.md).
+Billable built-in skills that make backend calls to Cognitive Services include [Entity Linking](cognitive-search-skill-entity-linking-v3.md), [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md), [Image Analysis](cognitive-search-skill-image-analysis.md), [Key Phrase Extraction](cognitive-search-skill-keyphrases.md), [Language Detection](cognitive-search-skill-language-detection.md), [OCR](cognitive-search-skill-ocr.md), [Personally Identifiable Information (PII) Detection](cognitive-search-skill-pii-detection.md), [Sentiment](cognitive-search-skill-sentiment-v3.md), and [Text Translation](cognitive-search-skill-text-translation.md).
-Image extraction is an Azure Cognitive Search operation that occurs when documents are cracked prior to enrichment. Image extraction is billable on all tiers, with the exception of 20 free daily extractions on the free tier. Image extraction costs apply to image files inside blobs, embedded images in other files (PDF and other app files), and for images extracted using [Document Extraction](cognitive-search-skill-document-extraction.md). For image extraction pricing, see the [Azure Cognitive Search pricing page](https://azure.microsoft.com/pricing/details/search/).
+Image extraction is an Azure Cognitive Search operation that occurs when documents are cracked prior to enrichment. Image extraction is billable on all tiers, except for 20 free daily extractions on the free tier. Image extraction costs apply to image files inside blobs, embedded images in other files (PDF and other app files), and for images extracted using [Document Extraction](cognitive-search-skill-document-extraction.md). For image extraction pricing, see the [Azure Cognitive Search pricing page](https://azure.microsoft.com/pricing/details/search/).
> [!TIP]
> To lower the cost of skillset processing, enable [incremental enrichment (preview)](cognitive-search-incremental-indexing-conceptual.md) to cache and reuse any enrichments that are unaffected by changes made to a skillset. Caching requires Azure Storage (see [pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)), but the cumulative cost of skillset execution is lower if existing enrichments can be reused, especially for skillsets that use image extraction and analysis.
@@ -167,7 +167,7 @@ The prices shown in this article are hypothetical. They're used to illustrate th
1. For OCR of 6,000 images in English, the OCR cognitive skill uses the best algorithm (DescribeText). Assuming a cost of $2.50 per 1,000 images to be analyzed, you would pay $15.00 for this step.
-1. For entity extraction, you'd have a total of three text records per page. Each record is 1,000 characters. Three text records per page multiplied by 6,000 pages equals 18,000 text records. Assuming $2.00 per 1,000 text records, this step would cost $36.00.
+1. For entity extraction, you'd have a total of three text records per page. Each record is 1,000 characters. Three text records per page multiplied by 6,000 pages equal 18,000 text records. Assuming $2.00 per 1,000 text records, this step would cost $36.00.
Putting it all together, you'd pay about $57.00 to ingest 1,000 PDF documents of this type with the described skillset.
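
The arithmetic for the two steps visible in this hunk can be checked with a short sketch. The rates are the article's hypothetical prices, not real ones, and `step_cost` is an illustrative helper name. Only the OCR and entity extraction steps appear here, so the sketch reports the remainder implied by the $57.00 total without attributing it to a particular step.

```python
from decimal import Decimal


def step_cost(units: int, price_per_thousand: str) -> Decimal:
    """Cost of processing `units` items that are billed per 1,000 units."""
    return Decimal(units) * Decimal(price_per_thousand) / Decimal(1000)


# OCR: 6,000 images at a hypothetical $2.50 per 1,000 images.
ocr_cost = step_cost(6_000, "2.50")

# Entity extraction: three text records per page across 6,000 pages,
# at a hypothetical $2.00 per 1,000 text records.
text_records = 3 * 6_000
entity_cost = step_cost(text_records, "2.00")

visible_total = ocr_cost + entity_cost

# The article's stated total is about $57.00; whatever is left over comes
# from steps that are outside this hunk.
remainder = Decimal("57.00") - visible_total

print(f"OCR: ${ocr_cost:.2f}")
print(f"Entity extraction: ${entity_cost:.2f} for {text_records:,} text records")
print(f"Visible steps: ${visible_total:.2f}, remainder: ${remainder:.2f}")
```

Using `Decimal` instead of floats keeps the dollar amounts exact, which matters once per-1,000 rates stop dividing evenly.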
articles/search/cognitive-search-concept-annotations-syntax.md
5 additions & 4 deletions
@@ -17,9 +17,9 @@ Paths to an annotation are specified in the "context" and "source" properties:
:::image type="content" source="media/cognitive-search-annotations-syntax/content-source-annotation-path.png" alt-text="Screenshot of a skillset definition with context and source elements highlighted.":::
-The example in the screenshot is for an item in a Cosmos DB collection.
+The example in the screenshot illustrates annotation syntax for an item in a Cosmos DB collection.
-+ "context" is `/document/HotelId` because the collection is partitioned into documents by the `/HotelId` field. For a document in a Cosmos DB collection, it's also the root node of the enrichment document.
++ "context" is `/document/HotelId` because the collection is partitioned into documents by the `/HotelId` field.
+ "source" is `/document/Description` because the skill is a translation skill, and the field that you'll want the skill to translate is the `Description` field in each document.
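
As a rough illustration of where the two paths go, the sketch below builds a skill definition as a Python dictionary and prints it as JSON. The "context" and "source" values come from the text above. The `@odata.type`, the `text` input, and the `translatedText` output follow my reading of the Text Translation skill reference and should be checked against it; the target language `fr` and the `targetName` are hypothetical.

```python
import json

# Hypothetical skill definition: only "context" and "source" are taken from
# the surrounding text; every other value is an assumption for illustration.
translation_skill = {
    "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
    "context": "/document/HotelId",
    "defaultToLanguageCode": "fr",  # hypothetical target language
    "inputs": [
        # "source" points at the field the skill should translate.
        {"name": "text", "source": "/document/Description"}
    ],
    "outputs": [
        # hypothetical name for the node that holds the translation
        {"name": "translatedText", "targetName": "translated_description"}
    ],
}

print(json.dumps(translation_skill, indent=2))
```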
@@ -37,9 +37,10 @@ Before reviewing the syntax, let's revisit a few important concepts to better un
An enriched document is created in the "document cracking" stage of indexer execution, when the indexer opens a document or reads in a row from the data source. Initially, the only node in an enriched document is the [root node (`/document`)](cognitive-search-skill-annotation-language.md#document-root), and it's the node from which all other enrichments occur.
-The following tables shows several well-known paths:
+The following list identifies several well-known paths:
-+ `/document` is the root node and indicates an entire blob in Azure Storage, or a row in SQL table.
++ `/document` is the root node and indicates an entire blob in Azure Storage, or a row in a SQL table.
++ `/document/{key}` is the syntax for a document or item in a Cosmos DB collection.
+ `/document/content` is the "content" property of a JSON blob.
+ `/document/pages/*` or `/document/sentences/*` become the context if you're breaking a large document into smaller chunks for processing.
+ `/document/normalized_images/*` is created during document cracking if the document contains images. All paths to images start with `normalized_images`.
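
To make the paths in the list above concrete, here is a toy model of an enriched document as nested Python data, with a small resolver that walks a path and fans out on `*`. This is not how the service evaluates paths; the `resolve` helper and the sample tree are hypothetical and only show how each path selects one node or every item under a node.

```python
from typing import Any, List


def resolve(tree: Any, path: str) -> List[Any]:
    """Return every node matched by a slash-separated path.

    A '*' segment selects each item of a list node; any other segment
    selects the child with that name from a dictionary node.
    """
    parts = [p for p in path.split("/") if p]
    nodes = [tree]
    for part in parts:
        matched = []
        for node in nodes:
            if part == "*" and isinstance(node, list):
                matched.extend(node)
            elif isinstance(node, dict) and part in node:
                matched.append(node[part])
        nodes = matched
    return nodes


# Hypothetical enriched document with content, pages, and extracted images.
enriched = {
    "document": {
        "content": "full text of the blob",
        "pages": ["page one", "page two"],
        "normalized_images": [{"width": 640}, {"width": 800}],
    }
}

for path in ("/document/content", "/document/pages/*", "/document/normalized_images/*"):
    print(path, "->", resolve(enriched, path))
```

A skill whose context is `/document/pages/*` would, in this model, run once per item in the returned list, which is the sense in which the chunk paths "become the context".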