Commit 43d6d4a

Merge pull request #186738 from bleroy/beleroy/cognitive-search-enrichment-language
Add a topic on Azure Cognitive Search input annotation language
2 parents dfd94e4 + cd26f2f commit 43d6d4a

4 files changed

+274
-1
lines changed

articles/search/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -568,6 +568,8 @@
568568
href: search-more-like-this.md
569569
- name: Skills reference
570570
items:
571+
- name: Annotation reference language
572+
href: cognitive-search-skill-annotation-language.md
571573
- name: Built-in skills
572574
items:
573575
- name: Overview

articles/search/cognitive-search-concept-annotations-syntax.md

Lines changed: 1 addition & 0 deletions
@@ -115,6 +115,7 @@ Notice that the cardinality of `"/document/people/*/lastname"` is larger than th
115115

116116

117117
## See also
118+
+ [Skill context and input annotation language](cognitive-search-skill-annotation-language.md)
118119
+ [How to integrate a custom skill into an enrichment pipeline](cognitive-search-custom-skill-interface.md)
119120
+ [How to define a skillset](cognitive-search-defining-skillset.md)
120121
+ [Create Skillset (REST)](/rest/api/searchservice/create-skillset)

articles/search/cognitive-search-defining-skillset.md

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ Inside the skillset definition, the skills array specifies which skills to execu
111111
```
112112

113113
> [!NOTE]
114-
> You can build complex skillsets with looping and branching using the [Conditional skill](cognitive-search-skill-conditional.md) to create the expressions. The syntax is based on the [JSON Pointer](https://tools.ietf.org/html/rfc6901) path notation, with a few modifications to identify nodes in the enrichment tree. A `"/"` traverses a level lower in the tree and `"*"` acts as a for-each operator in the context. Numerous examples in this article illustrate the syntax.
114+
> You can build complex skillsets with looping and branching using the [Conditional skill](cognitive-search-skill-conditional.md) to create the expressions. The syntax is based on the [JSON Pointer](https://tools.ietf.org/html/rfc6901) path notation, with a few modifications to identify nodes in the enrichment tree. A `"/"` traverses a level lower in the tree and `"*"` acts as a for-each operator in the context. Numerous examples in this article illustrate [the syntax](cognitive-search-skill-annotation-language.md).
115115
116116
### How built-in skills are structured
117117

Lines changed: 270 additions & 0 deletions
@@ -0,0 +1,270 @@
1+
---
2+
title: Skill context and input annotation reference language
3+
titleSuffix: Azure Cognitive Search
4+
description: Syntax reference for annotations in the context, inputs, and outputs of a skillset in an AI enrichment pipeline in Azure Cognitive Search.
5+
6+
author: BertrandLeRoy
7+
ms.author: beleroy
8+
ms.service: cognitive-search
9+
ms.topic: reference
10+
ms.date: 01/27/2022
11+
---
12+
# Skill context and input annotation language
13+
14+
This article is the reference documentation for skill context and input syntax. It's a full description of the expression language used to construct paths to nodes in an enriched document.
15+
16+
Azure Cognitive Search skills can use and [enrich the data coming from the data source and from the output of other skills](cognitive-search-defining-skillset.md).
17+
The data working set that represents the current state of the indexer work for the current document starts from the raw data coming from the data source and is
18+
progressively enriched with each skill iteration's output data.
19+
That data is internally organized in a tree-like structure that can be queried to be used as skill inputs or to be added to the index.
20+
The nodes in the tree can be simple values such as strings and numbers, arrays, or complex objects and even binary files.
21+
Even simple values can be enriched with additional structured information.
22+
For example, a string can be annotated with additional information that is stored beneath it in the enrichment tree.
23+
The expressions used to query that internal structure use a rich syntax that is detailed in this article.
24+
The enriched data structure can be [inspected from debug sessions](cognitive-search-debug-session.md#ai-enrichments-tab--enriched-data-structure).
25+
Expressions querying the structure can also be [tested from debug sessions](cognitive-search-debug-session.md#expression-evaluator).
26+
27+
Throughout the article, we'll use the following enriched data as an example.
28+
This data is typical of the kind of structure you would get when enriching a document using a skillset with [OCR](cognitive-search-skill-ocr.md), [key phrase extraction](cognitive-search-skill-keyphrases.md), [text translation](cognitive-search-skill-text-translation.md), [language detection](cognitive-search-skill-language-detection.md), [entity recognition](cognitive-search-skill-entity-recognition-v3.md) skills and a custom tokenizer skill.
29+
30+
|Path|Value|
31+
|---|---|
32+
|`document`||
33+
| `merged_content`|"Study of BMN 110 in Pediatric Patients"...|
34+
|  `keyphrases`||
35+
|   `[0]`|"Study of BMN"|
36+
|   `[1]`|"Syndrome"|
37+
|   `[2]`|"Pediatric Patients"|
38+
|   ...||
39+
|  `locations`||
40+
|   `[0]`|"IVA"|
41+
|  `translated_text`|"Étude de BMN 110 chez les patients pédiatriques"...|
42+
|  `entities`||
43+
|   `[0]`||
44+
|    `category`|"Organization"|
45+
|    `subcategory`|`null`|
46+
|    `confidenceScore`|0.72|
47+
|    `length`|3|
48+
|    `offset`|9|
49+
|    `text`|"BMN"|
50+
|   ...||
51+
|  `organizations`||
52+
|   `[0]`|"BMN"|
53+
|  `language`|"en"|
54+
| `normalized_images`||
55+
|  `[0]`||
56+
|   `layoutText`|...|
57+
|   `text`||
58+
|    `words`||
59+
|     `[0]`|"Study"|
60+
|     `[1]`|"of"|
61+
|     `[2]`|"BMN"|
62+
|     `[3]`|"110"|
63+
|     ...||
64+
|  `[1]`||
65+
|   `layoutText`|...|
66+
|   `text`||
67+
|    `words`||
68+
|     `[0]`|"it"|
69+
|     `[1]`|"is"|
70+
|     `[2]`|"certainly"|
71+
|     ...||
72+
|    ...||
73+
|  ...||
74+
75+
## Document root
76+
77+
All the data is under one root element, for which the path is `"/document"`. The root element is the default context for skills.
78+
79+
## Simple paths
80+
81+
Simple paths through the internal enriched document can be expressed with simple tokens separated by slashes.
82+
This syntax is similar to [the JSON Pointer specification](https://datatracker.ietf.org/doc/html/rfc6901.html).
83+
84+
### Object properties
85+
86+
The properties of nodes that represent objects add their values to the tree under the property's name.
87+
Those values can be obtained by appending the property name as a token separated by a slash:
88+
89+
|Expression|Value|
90+
|---|---|
91+
|`/document/merged_content/language`|`"en"`|
92+
93+
Property name tokens are case-sensitive.
94+
95+
### Array item index
96+
97+
Specific elements of an array can be referenced by using their numeric index like a property name:
98+
99+
|Expression|Value|
100+
|---|---|
101+
|`/document/merged_content/keyphrases/1`|`"Syndrome"`|
102+
|`/document/merged_content/entities/0/text`|`"BMN"`|
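
The lookups in the two tables above can be pictured with a minimal Python sketch. It is only an illustration, not the service's implementation: the `enriched` dictionary and the `resolve` helper are hypothetical, and the sketch simplifies the tree by modeling `merged_content` as a plain object, dropping the string value that the node itself carries.

```python
# Hypothetical, simplified stand-in for part of the example enrichment tree.
enriched = {
    "document": {
        "merged_content": {
            "keyphrases": ["Study of BMN", "Syndrome", "Pediatric Patients"],
            "entities": [{"category": "Organization", "text": "BMN", "offset": 9}],
            "language": "en",
        }
    }
}

def resolve(tree, path):
    """Walk a simple slash-separated path; numeric tokens index into lists."""
    node = tree
    for token in path.strip("/").split("/"):
        if isinstance(node, list):
            node = node[int(token)]   # array item index
        else:
            node = node[token]        # property name, matched case-sensitively
    return node

print(resolve(enriched, "/document/merged_content/language"))         # en
print(resolve(enriched, "/document/merged_content/keyphrases/1"))     # Syndrome
print(resolve(enriched, "/document/merged_content/entities/0/text"))  # BMN
```

Because dictionary keys are matched exactly, a path such as `/document/merged_content/Language` raises a `KeyError` in this sketch, which mirrors the case sensitivity of property name tokens.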
103+
104+
### Escape sequences
105+
106+
Two characters have special meaning and must be escaped when they appear in an expression and should be interpreted literally: `'~'` and `'/'`.
107+
Following the JSON Pointer convention, those characters must be escaped respectively as `'~0'` and `'~1'`.
108+
109+
## Array enumeration
110+
111+
An array of values can be obtained using the `'*'` token:
112+
113+
|Expression|Value|
114+
|---|---|
115+
|`/document/normalized_images/0/text/words/*`|`["Study", "of", "BMN", "110" ...]`|
116+
117+
The `'*'` token doesn't have to be at the end of the path. It's possible to enumerate all nodes matching a path with a star in the middle or with multiple stars:
118+
119+
|Expression|Value|
120+
|---|---|
121+
|`/document/normalized_images/*/text/words/*`|`["Study", "of", "BMN", "110" ... "it", "is", "certainly" ...]`|
122+
123+
This example returns a flat list of all matching nodes.
124+
125+
It's possible to maintain more structure and get a separate array for the words of each page by using a `'#'` token instead of the second `'*'` token:
126+
127+
|Expression|Value|
128+
|---|---|
129+
|`/document/normalized_images/*/text/words/#`|`[["Study", "of", "BMN", "110" ...], ["it", "is", "certainly" ...] ...]`|
130+
131+
The `'#'` token expresses that the array should be treated as a single value instead of being enumerated.
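
The difference between `'*'` and `'#'` can be sketched in Python as follows. This is a hedged illustration of the behavior described above, not the actual evaluator: `images`, `evaluate`, and `query` are hypothetical names, the sample data is a trimmed copy of the example tree, and the sketch assumes `'#'` only appears as the last token.

```python
# Hypothetical, trimmed copy of the example tree (two pages of OCR words).
images = {
    "document": {
        "normalized_images": [
            {"text": {"words": ["Study", "of", "BMN", "110"]}},
            {"text": {"words": ["it", "is", "certainly"]}},
        ]
    }
}

def evaluate(node, tokens):
    """Return the list of values matched by tokens, starting at node."""
    if not tokens:
        return [node]
    head, rest = tokens[0], tokens[1:]
    if head == "*":
        # Enumerate every element and flatten all matches into one list.
        return [match for item in node for match in evaluate(item, rest)]
    if head == "#":
        # Keep the array as a single value (assumed to be the final token).
        return [node]
    child = node[int(head)] if isinstance(node, list) else node[head]
    return evaluate(child, rest)

def query(tree, path):
    return evaluate(tree, path.strip("/").split("/"))

print(query(images, "/document/normalized_images/*/text/words/*"))
# ['Study', 'of', 'BMN', '110', 'it', 'is', 'certainly']
print(query(images, "/document/normalized_images/*/text/words/#"))
# [['Study', 'of', 'BMN', '110'], ['it', 'is', 'certainly']]
```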
132+
133+
### Enumerating arrays in context
134+
135+
It is often useful to process each element of an array in isolation and have a different set of skill inputs and outputs for each.
136+
This can be done by setting the context of the skill to an enumeration instead of the default `"/document"`.
137+
138+
In the following example, we use one of the input expressions we used before, but with a different context that changes the resulting value.
139+
140+
|Context|Expression|Values|
141+
|---|---|---|
142+
|`/document/normalized_images/*`|`/document/normalized_images/*/text/words/*`|`["Study", "of", "BMN", "110" ...]`<br/>`["it", "is", "certainly" ...]`<br>...|
143+
144+
For this combination of context and input, the skill will get executed once for each normalized image: once for `"/document/normalized_images/0"` and once for `"/document/normalized_images/1"`. The two input values corresponding to each skill execution are detailed in the values column.
145+
146+
When enumerating an array in context, any outputs the skill produces will also be added to the document as enrichments of the context.
147+
In the above example, an output named `"out"` will have its values for each execution added to the document respectively under `"/document/normalized_images/0/out"` and `"/document/normalized_images/1/out"`.
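
As a rough mental model, a context of `/document/normalized_images/*` behaves like a loop over the images. The Python sketch below assumes a hypothetical `count_words` function standing in for a skill and a trimmed copy of the example data; only the output name `out` comes from the text above.

```python
# Hypothetical, trimmed copy of the content under "/document".
document = {
    "normalized_images": [
        {"text": {"words": ["Study", "of", "BMN", "110"]}},
        {"text": {"words": ["it", "is", "certainly"]}},
    ]
}

def count_words(words):
    """Hypothetical stand-in for a skill: returns the number of words."""
    return len(words)

# Context "/document/normalized_images/*": one execution per image.
for image in document["normalized_images"]:
    # Input "/document/normalized_images/*/text/words/*" evaluated in this context.
    words = image["text"]["words"]
    # The output named "out" is added as an enrichment of the context node.
    image["out"] = count_words(words)

print(document["normalized_images"][0]["out"])  # 4
print(document["normalized_images"][1]["out"])  # 3
```

Each iteration sees only the words of its own image, and each result lands under that image's node rather than under `/document`.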
148+
149+
## Literal values
150+
151+
Skill inputs can take literal values instead of dynamic values queried from the existing document. This can be achieved by prefixing the value with an equal sign. Values can be numbers, strings, or Booleans.
152+
String values can be enclosed in single `'` or double `"` quotes.
153+
154+
|Expression|Value|
155+
|---|---|
156+
|`=42`|`42`|
157+
|`=2.45E-4`|`0.000245`|
158+
|`="some string"`|`"some string"`|
159+
|`='some other string'`|`"some other string"`|
160+
|`="unicod\u0065"`|`"unicode"`|
161+
|`=false`|`false`|
162+
163+
## Composite expressions
164+
165+
It's possible to combine values together using unary, binary and ternary operators.
166+
Operators can combine literal values and values resulting from path evaluation.
167+
When used inside an expression, paths should be enclosed between `"$("` and `")"`.
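
One way to picture the `$(...)` notation is as a placeholder that is replaced by the value the path resolves to before the operators apply. The Python sketch below illustrates that mental model only. It is not how the service is implemented, and `tree`, `resolve`, and `interpolate` are hypothetical names.

```python
import json
import re

# Hypothetical, trimmed copy of the example tree.
tree = {"document": {"merged_content": {"entities": [{"text": "BMN", "offset": 9}]}}}

def resolve(node, path):
    """Walk a simple slash-separated path through dicts and lists."""
    for token in path.strip("/").split("/"):
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

def interpolate(expression, root):
    """Replace each $(path) with the JSON form of the value it resolves to."""
    return re.sub(
        r"\$\(([^)]*)\)",
        lambda match: json.dumps(resolve(root, match.group(1))),
        expression,
    )

print(interpolate("=2+$(/document/merged_content/entities/0/offset)", tree))  # =2+9
```

With the path replaced by `9`, the remaining expression `=2+9` contains only literals and operators.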
168+
169+
### Boolean not `'!'`
170+
171+
|Expression|Value|
172+
|---|---|
173+
|`=!false`|`true`|
174+
175+
### Negative `'-'`
176+
177+
|Expression|Value|
178+
|---|---|
179+
|`=-42`|`-42`|
180+
|`=-$(/document/merged_content/entities/0/offset)`|`-9`|
181+
182+
### Addition `'+'`
183+
184+
|Expression|Value|
185+
|---|---|
186+
|`=2+2`|`4`|
187+
|`=2+$(/document/merged_content/entities/0/offset)`|`11`|
188+
189+
### Subtraction `'-'`
190+
191+
|Expression|Value|
192+
|---|---|
193+
|`=2-1`|`1`|
194+
|`=$(/document/merged_content/entities/0/offset)-2`|`7`|
195+
196+
### Multiplication `'*'`
197+
198+
|Expression|Value|
199+
|---|---|
200+
|`=2*3`|`6`|
201+
|`=$(/document/merged_content/entities/0/offset)*2`|`18`|
202+
203+
### Division `'/'`
204+
205+
|Expression|Value|
206+
|---|---|
207+
|`=3/2`|`1.5`|
208+
|`=$(/document/merged_content/entities/0/offset)/3`|`3`|
209+
210+
### Modulo `'%'`
211+
212+
|Expression|Value|
213+
|---|---|
214+
|`=15%4`|`3`|
215+
|`=$(/document/merged_content/entities/0/offset)%2`|`1`|
216+
217+
### Less than, less than or equal, greater than and greater than or equal `'<'` `'<='` `'>'` `'>='`
218+
219+
|Expression|Value|
220+
|---|---|
221+
|`=15<4`|`false`|
222+
|`=4<=4`|`true`|
223+
|`=15>4`|`true`|
224+
|`=1>=2`|`false`|
225+
226+
### Equality and non-equality `'=='` `'!='`
227+
228+
|Expression|Value|
229+
|---|---|
230+
|`=15==4`|`false`|
231+
|`=4==4`|`true`|
232+
|`=15!=4`|`true`|
233+
|`=1!=1`|`false`|
234+
235+
### Logical operations and, or and exclusive or `'&&'` `'||'` `'^'`
236+
237+
|Expression|Value|
238+
|---|---|
239+
|`=true&&true`|`true`|
240+
|`=true&&false`|`false`|
241+
|`=true||true`|`true`|
242+
|`=true||false`|`true`|
243+
|`=false||false`|`false`|
244+
|`=true^false`|`true`|
245+
|`=true^true`|`false`|
246+
247+
### Ternary operator `'?:'`
248+
249+
It is possible to give an input different values based on the evaluation of a Boolean expression using the ternary operator.
250+
251+
|Expression|Value|
252+
|---|---|
253+
|`=true?"true":"false"`|`"true"`|
254+
|`=$(/document/merged_content/entities/0/offset)==9?"nine":"not nine"`|`"nine"`|
255+
256+
### Parentheses and operator priority
257+
258+
Operators are evaluated with priorities that match usual conventions: unary operators, then multiplication, division and modulo, then addition and subtraction, then comparison, then equality, and then logical operators.
259+
Usual associativity rules also apply.
260+
261+
Parentheses can be used to change or disambiguate evaluation order.
262+
263+
|Expression|Value|
264+
|---|---|
265+
|`=3*2+5`|`11`|
266+
|`=3*(2+5)`|`21`|
267+
268+
## See also
269+
+ [Create a skillset in Azure Cognitive Search](cognitive-search-defining-skillset.md)
270+
+ [Reference annotations in an Azure Cognitive Search skillset](cognitive-search-concept-annotations-syntax.md)
