articles/search/cognitive-search-aml-skill.md (5 additions, 2 deletions)
@@ -9,7 +9,7 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: reference
-ms.date: 12/01/2022
+ms.date: 05/08/2024
 ---

 # AML skill in an Azure AI Search enrichment pipeline
@@ -19,7 +19,9 @@ ms.date: 12/01/2022
 The **AML** skill allows you to extend AI enrichment with a custom [Azure Machine Learning](../machine-learning/overview-what-is-azure-machine-learning.md) (AML) model. Once an AML model is [trained and deployed](../machine-learning/concept-azure-machine-learning-architecture.md#workspace), an **AML** skill integrates it into AI enrichment.

-Like built-in skills, an **AML** skill has inputs and outputs. The inputs are sent to your deployed AML online endpoint as a JSON object, which outputs a JSON payload as a response along with a success status code. The response is expected to have the outputs specified by your **AML** skill. Any other response is considered an error and no enrichments are performed.
+Like other built-in skills, an **AML** skill has inputs and outputs. The inputs are sent to your deployed AML online endpoint as a JSON object, which outputs a JSON payload as a response along with a success status code. The response is expected to have the outputs specified by your **AML** skill. Any other response is considered an error and no enrichments are performed.
+
+If you're using the [Azure AI Studio model catalog vectorizer (preview)](vector-search-vectorizer-azure-machine-learning-ai-studio-catalog.md) for integrated vectorization at query time, you should also use the **AML** skill for integrated vectorization during indexing. See [How to implement integrated vectorization using models from Azure AI Studio](vector-search-integrated-vectorization-ai-studio.md) for instructions. This scenario is supported through the 2024-05-01-preview REST API and the Azure portal.

 > [!NOTE]
 > The indexer will retry twice for certain standard HTTP status codes returned from the AML online endpoint. These HTTP status codes are:
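To make the input/output contract described in this hunk concrete, here's a minimal sketch of what an **AML** skill definition could look like inside a skillset. The endpoint URI, key placeholder, and the `text`/`embedding` input and output names are hypothetical, not values taken from the article:

```json
{
  "@odata.type": "#Microsoft.Skills.Custom.AmlSkill",
  "description": "Calls a deployed AML online endpoint (sketch; URI and names are placeholders)",
  "context": "/document",
  "uri": "https://contoso.westus2.inference.ml.azure.com/score",
  "key": "<aml-endpoint-key>",
  "inputs": [
    { "name": "text", "source": "/document/content" }
  ],
  "outputs": [
    { "name": "embedding", "targetName": "embedding" }
  ]
}
```

With a definition like this, the JSON object sent to the endpoint carries a `text` property per document, and the endpoint's JSON response is expected to contain an `embedding` property matching the declared output; any other response shape is treated as an error, as the text above notes.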
@@ -165,3 +167,4 @@ For cases when the AML online endpoint is unavailable or returns an HTTP error,
+[How to define a skillset](cognitive-search-defining-skillset.md)
articles/search/cognitive-search-predefined-skills.md (1 addition, 0 deletions)
@@ -41,6 +41,7 @@ Skills that call the Azure AI are billed at the pay-as-you-go rate when you [att
 |[Microsoft.Skills.Text.TranslationSkill](cognitive-search-skill-text-translation.md)| This skill uses a pretrained model to translate the input text into various languages for normalization or localization use cases. | Azure AI services ([pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)) |
 |[Microsoft.Skills.Vision.ImageAnalysisSkill](cognitive-search-skill-image-analysis.md)| This skill uses an image detection algorithm to identify the content of an image and generate a text description. | Azure AI services ([pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)) |
 |[Microsoft.Skills.Vision.OcrSkill](cognitive-search-skill-ocr.md)| Optical character recognition. | Azure AI services ([pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)) |
+|[Microsoft.Skills.Vision.VectorizeSkill](cognitive-search-skill-vision-vectorize.md)| Multimodal image and text vectorization. | Azure AI services ([pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)) |
articles/search/cognitive-search-skill-annotation-language.md (24 additions, 8 deletions)
@@ -1,7 +1,7 @@
 ---
 title: Skill context and input annotation reference language
 titleSuffix: Azure AI Search
-description: Annotation syntax reference for annotation in the context, inputs and outputs of a skillset in an AI enrichment pipeline in Azure AI Search.
+description: Annotation syntax reference for annotation in the context, inputs, and outputs of a skillset in an AI enrichment pipeline in Azure AI Search.
 author: BertrandLeRoy
 ms.author: beleroy
@@ -27,7 +27,7 @@ The enriched data structure can be [inspected from debug sessions](cognitive-sea
 Expressions querying the structure can also be [tested from debug sessions](cognitive-search-debug-session.md#expression-evaluator).

 Throughout the article, we'll use the following enriched data as an example.
-This data is typical of the kind of structure you would get when enriching a document using a skillset with [OCR](cognitive-search-skill-ocr.md), [key phrase extraction](cognitive-search-skill-keyphrases.md), [text translation](cognitive-search-skill-text-translation.md), [language detection](cognitive-search-skill-language-detection.md), [entity recognition](cognitive-search-skill-entity-recognition-v3.md) skills and a custom tokenizer skill.
+This data is typical of the kind of structure you would get when enriching a document using a skillset with [OCR](cognitive-search-skill-ocr.md), [key phrase extraction](cognitive-search-skill-keyphrases.md), [text translation](cognitive-search-skill-text-translation.md), [language detection](cognitive-search-skill-language-detection.md), and [entity recognition](cognitive-search-skill-entity-recognition-v3.md) skills, as well as a custom tokenizer skill.

 |Path|Value|
 |---|---|
@@ -134,7 +134,7 @@ The `'#'` token expresses that the array should be treated as a single value ins

 ### Enumerating arrays in context

-It is often useful to process each element of an array in isolation and have a different set of skill inputs and outputs for each.
+It's often useful to process each element of an array in isolation and have a different set of skill inputs and outputs for each.
 This can be done by setting the context of the skill to an enumeration instead of the default `"/document"`.

 In the following example, we use one of the input expressions we used before, but with a different context that changes the resulting value.
@@ -143,10 +143,10 @@ In the following example, we use one of the input expressions we used before, bu
-For this combination of context and input, the skill will get executed once for each normalized image: once for `"/document/normalized_images/0"` and once for `"/document/normalized_images/1"`. The two input values corresponding to each skill execution are detailed in the values column.
+For this combination of context and input, the skill gets executed once for each normalized image: once for `"/document/normalized_images/0"` and once for `"/document/normalized_images/1"`. The two input values corresponding to each skill execution are detailed in the values column.

 When enumerating an array in context, any outputs the skill produces will also be added to the document as enrichments of the context.
-In the above example, an output named `"out"` will have its values for each execution added to the document respectively under `"/document/normalized_images/0/out"` and `"/document/normalized_images/1/out"`.
+In the above example, an output named `"out"` has its values for each execution added to the document respectively under `"/document/normalized_images/0/out"` and `"/document/normalized_images/1/out"`.

 ## Literal values
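As a sketch of the enumeration behavior this hunk describes, a custom skill scoped to each normalized image could be declared as below. The web API URI is a hypothetical placeholder; the `out` output name matches the example in the surrounding text:

```json
{
  "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
  "uri": "https://contoso.example.com/api/process-image",
  "context": "/document/normalized_images/*",
  "inputs": [
    { "name": "image", "source": "/document/normalized_images/*" }
  ],
  "outputs": [
    { "name": "out", "targetName": "out" }
  ]
}
```

With this context, the skill executes once per image, and each execution's `out` value is added under that image's node, for example `"/document/normalized_images/0/out"`.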
@@ -162,9 +162,25 @@ String values can be enclosed in single `'` or double `"` quotes.
 |`="unicod\u0065"`|`"unicode"`|
 |`=false`|`false`|

+### Inline arrays
+
+If a certain skill input requires an array of data, but the data is currently represented as a single value, or you need to combine multiple single values into one array field, you can create an array value inline as part of a skill input expression by wrapping a comma-separated list of expressions in brackets (`[` and `]`). The array value can be a combination of expression paths or literal values as needed. You can also create nested arrays within arrays this way.
+If the skill's context enumerates an array (for example, `"context": "/document/pages/*"` means the skill runs once per "page" in `pages`), then an expression over that array used inside an inline array contributes one of those values per execution.
+
+For an example with our sample enriched data: if your skill's `context` is `/document/merged_content/keyphrases/*` and you create the inline array `=['key phrase', $(/document/merged_content/keyphrases/*)]` on an input of that skill, then the skill is executed three times: once with a value of `["key phrase", "Study of BMN"]`, once with `["key phrase", "Syndrome"]`, and once with `["key phrase", "Pediatric Patients"]`. The literal "key phrase" value stays the same each time, but the value of the expression path changes with each skill execution.
+
 ## Composite expressions

-It's possible to combine values together using unary, binary and ternary operators.
+It's possible to combine values together using unary, binary, and ternary operators.
 Operators can combine literal values and values resulting from path evaluation.
 When used inside an expression, paths should be enclosed between `"$("` and `")"`.
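The inline arrays described earlier combine naturally with such `$()` path expressions. As a hypothetical sketch (the input name `combined` is illustrative, not from the article):

```json
{
  "context": "/document/merged_content/keyphrases/*",
  "inputs": [
    {
      "name": "combined",
      "source": "=['key phrase', $(/document/merged_content/keyphrases/*)]"
    }
  ]
}
```

Against the sample enriched data, this input would yield one two-element array per key phrase, with the literal first element held constant across executions.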
@@ -225,7 +241,7 @@ When used inside an expression, paths should be enclosed between `"$("` and `")"
 |`=15>4`|`true`|
 |`=1>=2`|`false`|

-### Equality and non-equality `'=='` `'!='`
+### Equality and nonequality `'=='` `'!='`

 |Expression|Value|
 |---|---|
@@ -248,7 +264,7 @@ When used inside an expression, paths should be enclosed between `"$("` and `")"

 ### Ternary operator `'?:'`

-It is possible to give an input different values based on the evaluation of a Boolean expression using the ternary operator.
+It's possible to give an input different values based on the evaluation of a Boolean expression using the ternary operator.
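A sketch of the ternary operator used in a skill input, following the `$()` path convention from the composite-expressions section. The input name and both paths are illustrative assumptions, not paths guaranteed to exist in the sample enriched data:

```json
{
  "name": "textToProcess",
  "source": "=$(/document/language) == 'en' ? $(/document/content) : $(/document/merged_content)"
}
```

Here the Boolean expression before `?` selects which of the two path values the input receives for each document.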
articles/search/cognitive-search-skill-azure-openai-embedding.md (27 additions, 4 deletions)
@@ -8,17 +8,17 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: reference
-ms.date: 03/28/2024
+ms.date: 05/08/2024
 ---

 # Azure OpenAI Embedding skill

 > [!IMPORTANT]
-> This feature is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [2023-10-01-Preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) supports this feature.
+> This feature is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [2023-10-01-preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) supports the first iteration of this feature. The [2024-05-01-preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2024-05-01-preview&preserve-view=true) adds more properties and supports more text embedding models on Azure OpenAI.

-The **Azure OpenAI Embedding** skill connects to a deployed embedding model on your [Azure OpenAI](/azure/ai-services/openai/overview) resource to generate embeddings.
+The **Azure OpenAI Embedding** skill connects to a deployed embedding model on your [Azure OpenAI](/azure/ai-services/openai/overview) resource to generate embeddings during indexing.

-The [Import and vectorize data](search-get-started-portal-import-vectors.md) uses the **Azure OpenAI Embedding** skill to vectorize content. You can run the wizard and review the generated skillset to see how the wizard builds it.
+The [Import and vectorize data wizard](search-get-started-portal-import-vectors.md) in the Azure portal uses the **Azure OpenAI Embedding** skill to vectorize content. You can run the wizard and review the generated skillset to see how the wizard builds the skill for the text-embedding-ada-002 model.

 > [!NOTE]
 > This skill is bound to Azure OpenAI and is charged at the existing [Azure OpenAI pay-as-you go price](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/#pricing).
@@ -42,6 +42,18 @@ Parameters are case-sensitive.
 |`apiKey`| The secret key used to access the model. If you provide a key, leave `authIdentity` empty. If you set both the `apiKey` and `authIdentity`, the `apiKey` is used on the connection. |
 |`deploymentId`| The name of the deployed Azure OpenAI embedding model. The model should be an embedding model, such as text-embedding-ada-002. See the [List of Azure OpenAI models](/azure/ai-services/openai/concepts/models) for supported models.|
 |`authIdentity`| A user-managed identity used by the search service for connecting to Azure OpenAI. You can use either a [system or user managed identity](search-howto-managed-identities-data-sources.md). To use a system managed identity, leave `apiKey` and `authIdentity` blank. The system-managed identity is used automatically. A managed identity must have [Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) permissions to send text to Azure OpenAI. |
+|`modelName`| This property is required if your skillset is created using the 2024-05-01-preview REST API. Set this property to the name of the Azure OpenAI embedding model deployed on the provider specified through `resourceUri` and identified through `deploymentId`. Currently, the supported values are `text-embedding-ada-002`, `text-embedding-3-large`, and `text-embedding-3-small`. |
+|`dimensions`| (Optional, introduced in the 2024-05-01-preview REST API.) The dimensions of the embeddings you would like to generate, if the model supports reducing the embedding dimensions. Supported ranges are listed below. Defaults to the maximum dimensions for each model if not specified. For skillsets created using the 2023-10-01-preview REST API, dimensions are fixed at 1536. |
+
+## Supported dimensions by `modelName`
+
+The supported dimensions for an Azure OpenAI Embedding skill depend on the `modelName` that is configured.
+
+|`modelName`| Minimum dimensions | Maximum dimensions |
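Pulling the parameters above together, a skillset created with the 2024-05-01-preview REST API might define the skill as in the following sketch. The resource URI, deployment name, dimensions value, and output field name are placeholders, not values from this article:

```json
{
  "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
  "context": "/document",
  "resourceUri": "https://contoso.openai.azure.com",
  "deploymentId": "my-text-embedding-3-large",
  "modelName": "text-embedding-3-large",
  "dimensions": 1024,
  "inputs": [
    { "name": "text", "source": "/document/content" }
  ],
  "outputs": [
    { "name": "embedding", "targetName": "content_vector" }
  ]
}
```

Setting `dimensions` below the model's maximum only makes sense for models that support dimension reduction; omit it to get the model's default maximum, per the parameter table above.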
@@ -116,11 +130,20 @@ The output resides in memory. To send this output to a field in the search index
 ## Best practices

 The following are some best practices to consider when using this skill:
+
 - If you're hitting your Azure OpenAI TPM (tokens per minute) limit, review the [quota limits advisory](../ai-services/openai/quotas-limits.md) so you can address it accordingly. Refer to the [Azure OpenAI monitoring](../ai-services/openai/how-to/monitoring.md) documentation for more information about your Azure OpenAI instance's performance.
+
 - The Azure OpenAI embeddings model deployment you use for this skill should ideally be separate from the deployment used for other use cases, including the [query vectorizer](vector-search-how-to-configure-vectorizer.md). This lets each deployment be tailored to its specific use case, which optimizes performance and makes it easy to distinguish indexer traffic from index embedding calls.
+
 - Your Azure OpenAI instance should be in the same region as, or at least geographically close to, the region where your AI Search service is hosted. This reduces latency and improves the speed of data transfer between the services.
+
 - If you have a larger than default Azure OpenAI TPM (tokens per minute) limit, as published in the [quotas and limits](../ai-services/openai/quotas-limits.md) documentation, open a [support case](../azure-portal/supportability/how-to-create-azure-support-request.md) with the Azure AI Search team so it can be adjusted accordingly. This keeps your indexing process from being unnecessarily slowed by the documented default TPM limit when you have higher limits.
+
+- For examples and working code samples using this skill, see the following links: