articles/search/cognitive-search-concept-annotations-syntax.md
Lines changed: 14 additions & 12 deletions
Skills read inputs and write outputs to nodes in an [enriched document](cognitive-search-working-with-skillsets.md#enrichment-tree) tree, building the tree as the enrichments progress. Any node can be referenced in an input for further downstream enrichment, or mapped to an output field in an index. This article introduces the syntax and provides examples for specifying a path to a node. For the full syntax, see [Skill context and input annotation language](cognitive-search-skill-annotation-language.md).
Paths to an annotation are specified in the "context" and "source" properties of a skillset, and in [output field mappings](cognitive-search-output-field-mapping.md) in an indexer. Here's an example of what paths might look like in a skillset:
:::image type="content" source="media/cognitive-search-annotations-syntax/content-source-annotation-path.png" alt-text="Screenshot of a skillset definition with context and source elements highlighted.":::
The example in the screenshot illustrates the path for an item in a Cosmos DB collection.
+ "context" path is `/document/HotelId` because the collection is partitioned into documents by the `/HotelId` field.
+ "source" path is `/document/Description` because the skill is a translation skill, and the field you want the skill to translate is the `Description` field in each document.
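As a rough illustration, the skill in the screenshot might be expressed as the following definition. It is built here as a Python dictionary and serialized to JSON. The `context` and `source` values come from the bullets above. The target language (`es`) and the output name `TranslatedDescription` are hypothetical placeholders. Check the property names against the Translation skill reference before relying on them.

```python
import json

# Sketch of a Translation skill definition. "context" and the input "source"
# reuse the paths described above; "defaultToLanguageCode" and the output
# "targetName" are hypothetical example values.
translation_skill = {
    "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
    "context": "/document/HotelId",
    "defaultToLanguageCode": "es",
    "inputs": [
        # The field to translate: the Description field of each document.
        {"name": "text", "source": "/document/Description"}
    ],
    "outputs": [
        # The translated text is written to the enriched document as a child of the context node.
        {"name": "translatedText", "targetName": "TranslatedDescription"}
    ],
}

print(json.dumps(translation_skill, indent=2))
```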
## Background concepts
| Term | Description |
|------|-------------|
| "enriched document" | An enriched document is an internal structure that collects skill output as it's created and holds all annotations related to a document. Think of an enriched document as a tree of annotations. Generally, an annotation created from a previous annotation becomes its child.<br/><br/>Enriched documents exist only for the duration of skillset execution. Once content is mapped to the search index, the enriched document is no longer needed. Although you don't interact with enriched documents directly, it's useful to have a mental model of them when creating a skillset. |
| "annotation" | Within an enriched document, a node that is created and populated by a skill, such as "text" and "layoutText" in the OCR skill, is called an annotation. An enriched document is populated with both annotations and unchanged field values or metadata copied from the source. |
| "context" | The scope of enrichment: the entire document, a portion of a document, or, if you're working with images, the images extracted from a document. By default, the enrichment context is at the `"/document"` level, scoped to individual documents contained in the data source. When a skill runs, the outputs of that skill become [properties of the defined context](#example-2). |
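To make the table concrete, the following Python sketch models an enriched document as a small tree. It's a toy for building intuition, not the service's internal representation. The `Node` class, the `node_at` and `run_skill` helpers, and the word-count "skill" are all hypothetical. The sketch shows three things:

* the root node created at document cracking
* a field value copied from the source
* a skill output attached as a property of the `/document` context

```python
# Toy model of an enriched document, for intuition only: this is not how the
# service stores enrichments. Each node holds an optional value and named
# children; a skill's output becomes a child of its context node.
class Node:
    def __init__(self, value=None):
        self.value = value
        self.children = {}


def node_at(root, path):
    """Walk a path such as /document/content down from the root node."""
    parts = path.strip("/").split("/")
    if parts[0] != "document":
        raise ValueError("all paths start with /document")
    node = root
    for part in parts[1:]:
        node = node.children[part]
    return node


def run_skill(root, context, source, output_name, skill):
    """Apply `skill` to the source node's value and store the result under the context."""
    result = skill(node_at(root, source).value)
    node_at(root, context).children[output_name] = Node(result)
    return result


# Document cracking: the root node, plus a value copied from the data source.
document = Node()
document.children["content"] = Node("Hotels near the beach. Free parking.")

# A hypothetical "skill" that counts words, run at the default /document context.
run_skill(document, "/document", "/document/content", "wordCount",
          lambda text: len(text.split()))

print(node_at(document, "/document/wordCount").value)  # 6
```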
## Paths for different scenarios
Paths are specified in the "context" and "source" properties of a skillset, and in the [output field mappings](cognitive-search-output-field-mapping.md) in an indexer.
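For the indexer side, a minimal sketch of `outputFieldMappings` might look like the following. It's again written as a Python dictionary serialized to JSON. The enriched-node paths assume two hypothetical upstream skills: one that writes `language` under `/document`, and one that writes `keyPhrases` under each item of `/document/pages`. The target index field names are hypothetical as well.

```python
import json

# Fragment of an indexer definition showing only "outputFieldMappings".
# Source paths assume hypothetical upstream skills; the target index field
# names are hypothetical placeholders.
output_field_mappings = [
    # A single node under the document root maps to a single index field.
    {"sourceFieldName": "/document/language", "targetFieldName": "language"},
    # The /* segments collect the key phrases from every page,
    # typically into a collection field in the index.
    {"sourceFieldName": "/document/pages/*/keyPhrases/*", "targetFieldName": "keyphrases"},
]

indexer_fragment = {"outputFieldMappings": output_field_mappings}
print(json.dumps(indexer_fragment, indent=2))
```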
All paths start with `/document`. An enriched document is created in the "document cracking" stage of indexer execution, when the indexer opens a document or reads in a row from the data source. Initially, the only node in an enriched document is the [root node (`/document`)](cognitive-search-skill-annotation-language.md#document-root), and it's the node from which all other enrichments occur.
The following list includes several common examples:
+ `/document` is the root node and indicates an entire blob in Azure Storage or a row in a SQL table.
+ `/document/{key}` is the syntax for a document or item in a Cosmos DB collection, where `{key}` is the actual key, such as `HotelId` in `/document/HotelId` from the previous example.
+ `/document/content` specifies the "content" property of a JSON blob.
+ `/document/{field}` is the syntax for an operation performed on a specific field, such as translating the `/document/Description` field in the previous example.
+ `/document/pages/*` or `/document/sentences/*` becomes the context if you're breaking a large document into smaller chunks for processing. If "context" is `/document/pages/*`, the skill executes once for each page in the document. Because there might be more than one page or sentence, append `/*` to include all of them.
+ `/document/normalized_images/*` is created during document cracking if the document contains images. All paths to images start with `/document/normalized_images`. Because a document often contains multiple embedded images, append `/*`.
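The effect of the trailing `/*` can be sketched with a small path expander over a toy enriched document. Several pieces are assumptions made for illustration:

* the dictionary layout
* the `expand` function
* the numeric suffixes in the printed paths

The service doesn't necessarily address collection items this way. The point is that a context of `/document/pages/*` matches each page, so a skill using that context would execute once per match.

```python
# Toy enriched document as nested dicts and lists (illustration only, not the
# service's internal format). "pages" stands in for the output of a chunking
# step; "normalized_images" for images found during document cracking.
enriched = {
    "content": "Page one text. Page two text. Page three text.",
    "pages": ["Page one text.", "Page two text.", "Page three text."],
    "normalized_images": [{"width": 640}, {"width": 800}],
}


def expand(tree, path):
    """Return (concrete_path, value) pairs matched by a path that may contain /*."""
    parts = path.strip("/").split("/")
    if parts[0] != "document":
        raise ValueError("all paths start with /document")
    matches = [("/document", tree)]
    for part in parts[1:]:
        expanded = []
        for prefix, node in matches:
            if part == "*":
                # One match per item of a collection node.
                expanded.extend((f"{prefix}/{i}", item) for i, item in enumerate(node))
            else:
                expanded.append((f"{prefix}/{part}", node[part]))
        matches = expanded
    return matches


# With "context" set to /document/pages/*, a skill would run once per page.
for concrete_path, page in expand(enriched, "/document/pages/*"):
    print(concrete_path, "->", page)
```

Running it prints one line per page, from `/document/pages/0 -> Page one text.` through `/document/pages/2 -> Page three text.` Expanding `/document/normalized_images/*` the same way yields one match per image.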
Examples in the remainder of this article are based on the "content" field generated automatically by [Azure Blob indexers](search-howto-indexing-azure-blob-storage.md) as part of the [document cracking](search-indexer-overview.md#document-cracking) phase. When referring to documents from a Blob container, use a format such as `"/document/content"`, where the "content" field is part of the "document".
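Building on the blob "content" field, the following sketch shows how these paths might chain in a skillset. A Split skill reads `/document/content` and writes chunks to `/document/pages`. A Key Phrase Extraction skill then runs with `/document/pages/*` as its context. The definition is a Python dictionary serialized to JSON. The skill type names and input/output names are given as commonly documented but should be checked against the skill reference. `maximumPageLength` is an arbitrary example value.

```python
import json

# Sketch of a two-skill skillset over blob content. Verify type names and
# input/output names against the skill reference; 2000 is an arbitrary value.
skills = [
    {
        "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
        "context": "/document",
        "textSplitMode": "pages",
        "maximumPageLength": 2000,
        "inputs": [{"name": "text", "source": "/document/content"}],
        # Each chunk becomes an item of /document/pages.
        "outputs": [{"name": "textItems", "targetName": "pages"}],
    },
    {
        "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
        # Runs once per page produced by the split skill.
        "context": "/document/pages/*",
        "inputs": [{"name": "text", "source": "/document/pages/*"}],
        # Written under each page, as /document/pages/*/keyPhrases.
        "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
    },
]

skillset_fragment = {"skills": skills}
print(json.dumps(skillset_fragment, indent=2))
```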