
Commit fb8766d

Merge pull request #275211 from MicrosoftDocs/release-build-azure-search
Release build Azure AI search
2 parents df89aa3 + d97f040 commit fb8766d

36 files changed: +1701 -182 lines changed

articles/search/TOC.yml

Lines changed: 20 additions & 2 deletions
```diff
@@ -25,6 +25,8 @@
 items:
 - name: Create an index
   href: search-get-started-portal.md
+- name: Data chunking and vectorization in Azure portal (preview)
+  href: search-get-started-portal-import-vectors.md
 - name: Create a demo app
   href: search-create-app-portal.md
 - name: Create a skillset
@@ -286,6 +288,8 @@
   href: search-howto-connecting-azure-sql-mi-to-azure-search-using-indexers.md
 - name: Azure SQL Server VMs
   href: search-howto-connecting-azure-sql-iaas-to-azure-search-using-indexers.md
+- name: OneLake files
+  href: search-how-to-index-onelake-files.md
 - name: SharePoint in Microsoft 365
   href: search-howto-index-sharepoint-online.md
 - name: Skillsets
@@ -322,6 +326,8 @@
 items:
 - name: Create a vector index
   href: vector-search-how-to-create-index.md
+- name: Index binary data for vector search
+  href: vector-search-how-to-index-binary-data.md
 - name: Query vectors
   href: vector-search-how-to-query.md
 - name: Filter vectors
@@ -336,8 +342,8 @@
   href: vector-search-how-to-chunk-documents.md
 - name: Generate embeddings
   href: vector-search-how-to-generate-embeddings.md
-- name: Chunk and embed in Azure portal (preview)
-  href: search-get-started-portal-import-vectors.md
+- name: Integrated vectorization with Azure AI Studio models
+  href: vector-search-integrated-vectorization-ai-studio.md
 - name: Keyword search
   items:
   - name: Full text query
@@ -623,6 +629,8 @@
   href: cognitive-search-skill-sentiment-v3.md
 - name: Text Translation
   href: cognitive-search-skill-text-translation.md
+- name: AI Vision multimodal embeddings
+  href: cognitive-search-skill-vision-vectorize.md
 - name: Azure AI Search utility skills (nonbillable)
   items:
   - name: Conditional
@@ -654,6 +662,16 @@
   href: cognitive-search-skill-entity-recognition.md
 - name: Sentiment (v2)
   href: cognitive-search-skill-sentiment.md
+- name: Vectorizers reference
+  items:
+  - name: Azure OpenAI
+    href: vector-search-vectorizer-azure-open-ai.md
+  - name: Azure AI Vision
+    href: vector-search-vectorizer-ai-services-vision.md
+  - name: Azure AI Studio model catalog
+    href: vector-search-vectorizer-azure-machine-learning-ai-studio-catalog.md
+  - name: Custom Web API
+    href: vector-search-vectorizer-custom-web-api.md
 - name: Resources
   items:
   - name: Stack Overflow
```

articles/search/cognitive-search-aml-skill.md

Lines changed: 5 additions & 2 deletions
```diff
@@ -9,7 +9,7 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: reference
-ms.date: 12/01/2022
+ms.date: 05/08/2024
 ---
 
 # AML skill in an Azure AI Search enrichment pipeline
@@ -19,7 +19,9 @@ ms.date: 12/01/2022
 
 The **AML** skill allows you to extend AI enrichment with a custom [Azure Machine Learning](../machine-learning/overview-what-is-azure-machine-learning.md) (AML) model. Once an AML model is [trained and deployed](../machine-learning/concept-azure-machine-learning-architecture.md#workspace), an **AML** skill integrates it into AI enrichment.
 
-Like built-in skills, an **AML** skill has inputs and outputs. The inputs are sent to your deployed AML online endpoint as a JSON object, which outputs a JSON payload as a response along with a success status code. The response is expected to have the outputs specified by your **AML** skill. Any other response is considered an error and no enrichments are performed.
+Like other built-in skills, an **AML** skill has inputs and outputs. The inputs are sent to your deployed AML online endpoint as a JSON object, which outputs a JSON payload as a response along with a success status code. The response is expected to have the outputs specified by your **AML** skill. Any other response is considered an error and no enrichments are performed.
+
+If you're using the [Azure AI Studio model catalog vectorizer (preview)](vector-search-vectorizer-azure-machine-learning-ai-studio-catalog.md) for integrated vectorization at query time, you should also use the **AML** skill for integrated vectorization during indexing. See [How to implement integrated vectorization using models from Azure AI Studio](vector-search-integrated-vectorization-ai-studio.md) for instructions. This scenario is supported through the 2024-05-01-preview REST API and the Azure portal.
 
 > [!NOTE]
 > The indexer will retry twice for certain standard HTTP status codes returned from the AML online endpoint. These HTTP status codes are:
@@ -165,3 +167,4 @@ For cases when the AML online endpoint is unavailable or returns an HTTP error,
 
 + [How to define a skillset](cognitive-search-defining-skillset.md)
 + [AML online endpoint troubleshooting](../machine-learning/how-to-troubleshoot-online-endpoints.md)
++ [Integrated vectorization with models from Azure AI Studio](vector-search-integrated-vectorization-ai-studio.md)
```
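
For orientation, a minimal **AML** skill definition that matches the request/response contract described above might look like the following sketch. This is not taken from the commit: the endpoint URI, key, and the input/output names are hypothetical and must match whatever JSON schema your deployed model actually accepts and returns.

```json
{
  "@odata.type": "#Microsoft.Skills.Custom.AmlSkill",
  "description": "Sketch: calls a deployed AML online endpoint; names are hypothetical.",
  "context": "/document",
  "uri": "https://my-aml-endpoint.eastus.inference.ml.azure.com/score",
  "key": "<your-AML-endpoint-key>",
  "timeout": "PT30S",
  "inputs": [
    { "name": "text", "source": "/document/content" }
  ],
  "outputs": [
    { "name": "embedding", "targetName": "aml_vector" }
  ]
}
```

The indexer serializes the `inputs` into the JSON request body, and only a success status code accompanied by the response fields named in `outputs` counts as a successful enrichment, per the behavior described in the diff above.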

articles/search/cognitive-search-predefined-skills.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -41,6 +41,7 @@ Skills that call the Azure AI are billed at the pay-as-you-go rate when you [att
 | [Microsoft.Skills.Text.TranslationSkill](cognitive-search-skill-text-translation.md) | This skill uses a pretrained model to translate the input text into various languages for normalization or localization use cases. | Azure AI services ([pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)) |
 | [Microsoft.Skills.Vision.ImageAnalysisSkill](cognitive-search-skill-image-analysis.md) | This skill uses an image detection algorithm to identify the content of an image and generate a text description. | Azure AI services ([pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)) |
 | [Microsoft.Skills.Vision.OcrSkill](cognitive-search-skill-ocr.md) | Optical character recognition. | Azure AI services ([pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)) |
+| [Microsoft.Skills.Vision.VectorizeSkill](cognitive-search-skill-vision-vectorize.md) | Multimodal image and text vectorization. | Azure AI services ([pricing](https://azure.microsoft.com/pricing/details/cognitive-services/)) |
 
 ## Azure OpenAI skills
```
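
Going by the naming in the new table row, a definition for this skill might look like the sketch below. The `modelVersion` value and the `image`/`vector` input and output names are assumptions drawn from the Azure AI Vision multimodal embeddings API, not details confirmed by this commit; the cognitive-search-skill-vision-vectorize.md article added in this release is the authoritative reference.

```json
{
  "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
  "description": "Sketch: embeds each normalized image; property values are assumptions.",
  "context": "/document/normalized_images/*",
  "modelVersion": "2023-04-15",
  "inputs": [
    { "name": "image", "source": "/document/normalized_images/*" }
  ],
  "outputs": [
    { "name": "vector" }
  ]
}
```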

articles/search/cognitive-search-skill-annotation-language.md

Lines changed: 24 additions & 8 deletions
```diff
@@ -1,7 +1,7 @@
 ---
 title: Skill context and input annotation reference language
 titleSuffix: Azure AI Search
-description: Annotation syntax reference for annotation in the context, inputs and outputs of a skillset in an AI enrichment pipeline in Azure AI Search.
+description: Annotation syntax reference for annotation in the context, inputs, and outputs of a skillset in an AI enrichment pipeline in Azure AI Search.
 
 author: BertrandLeRoy
 ms.author: beleroy
@@ -27,7 +27,7 @@ The enriched data structure can be [inspected from debug sessions](cognitive-sea
 Expressions querying the structure can also be [tested from debug sessions](cognitive-search-debug-session.md#expression-evaluator).
 
 Throughout the article, we'll use the following enriched data as an example.
-This data is typical of the kind of structure you would get when enriching a document using a skillset with [OCR](cognitive-search-skill-ocr.md), [key phrase extraction](cognitive-search-skill-keyphrases.md), [text translation](cognitive-search-skill-text-translation.md), [language detection](cognitive-search-skill-language-detection.md), [entity recognition](cognitive-search-skill-entity-recognition-v3.md) skills and a custom tokenizer skill.
+This data is typical of the kind of structure you would get when enriching a document using a skillset with [OCR](cognitive-search-skill-ocr.md), [key phrase extraction](cognitive-search-skill-keyphrases.md), [text translation](cognitive-search-skill-text-translation.md), [language detection](cognitive-search-skill-language-detection.md), and [entity recognition](cognitive-search-skill-entity-recognition-v3.md) skills, as well as a custom tokenizer skill.
 
 |Path|Value|
 |---|---|
@@ -134,7 +134,7 @@ The `'#'` token expresses that the array should be treated as a single value ins
 
 ### Enumerating arrays in context
 
-It is often useful to process each element of an array in isolation and have a different set of skill inputs and outputs for each.
+It's often useful to process each element of an array in isolation and have a different set of skill inputs and outputs for each.
 This can be done by setting the context of the skill to an enumeration instead of the default `"/document"`.
 
 In the following example, we use one of the input expressions we used before, but with a different context that changes the resulting value.
@@ -143,10 +143,10 @@ In the following example, we use one of the input expressions we used before, bu
 |---|---|---|
 |`/document/normalized_images/*`|`/document/normalized_images/*/text/words/*`|`["Study", "of", "BMN", "110" ...]`<br/>`["it", "is", "certainly" ...]`<br>...|
 
-For this combination of context and input, the skill will get executed once for each normalized image: once for `"/document/normalized_images/0"` and once for `"/document/normalized_images/1"`. The two input values corresponding to each skill execution are detailed in the values column.
+For this combination of context and input, the skill gets executed once for each normalized image: once for `"/document/normalized_images/0"` and once for `"/document/normalized_images/1"`. The two input values corresponding to each skill execution are detailed in the values column.
 
 When enumerating an array in context, any outputs the skill produces will also be added to the document as enrichments of the context.
-In the above example, an output named `"out"` will have its values for each execution added to the document respectively under `"/document/normalized_images/0/out"` and `"/document/normalized_images/1/out"`.
+In the above example, an output named `"out"` has its values for each execution added to the document respectively under `"/document/normalized_images/0/out"` and `"/document/normalized_images/1/out"`.
 
 ## Literal values
@@ -162,9 +162,25 @@ String values can be enclosed in single `'` or double `"` quotes.
 |`="unicod\u0065"`|`"unicode"`|
 |`=false`|`false`|
 
+### Inline arrays
+
+If a certain skill input requires an array of data, but the data is currently represented as a single value, or you need to combine multiple single values into an array field, you can create an array value inline as part of a skill input expression by wrapping a comma-separated list of expressions in brackets (`[` and `]`). The array value can be a combination of expression paths and literal values as needed. You can also create nested arrays within arrays this way.
+
+|Expression|Value|
+|---|---|
+|`=['item']`|["item"]|
+|`=[$(/document/merged_content/entities/0/text), 'item']`|["BMN", "item"]|
+|`=[1, 3, 5]`|[1, 3, 5]|
+|`=[true, true, false]`|[true, true, false]|
+|`=[[$(/document/merged_content/entities/0/text), 'item'],['item2', $(/document/merged_content/keyphrases/1)]]`|[["BMN", "item"], ["item2", "Syndrome"]]|
+
+If the skill has a context that runs it once per element of an array input (for example, `"context": "/document/pages/*"` means the skill runs once per "page" in `pages`), then passing that enumeration expression inside an inline array uses one of those values at a time.
+
+For an example with our sample enriched data, if your skill's `context` is `/document/merged_content/keyphrases/*` and you create the inline array `=['key phrase', $(/document/merged_content/keyphrases/*)]` on an input of that skill, then the skill is executed three times: once with a value of ["key phrase", "Study of BMN"], once with ["key phrase", "Syndrome"], and finally with ["key phrase", "Pediatric Patients"]. The literal "key phrase" value stays the same each time, but the value of the expression path changes with each skill execution.
+
 ## Composite expressions
 
-It's possible to combine values together using unary, binary and ternary operators.
+It's possible to combine values together using unary, binary, and ternary operators.
 Operators can combine literal values and values resulting from path evaluation.
 When used inside an expression, paths should be enclosed between `"$("` and `")"`.
@@ -225,7 +241,7 @@ When used inside an expression, paths should be enclosed between `"$("` and `")"
 |`=15>4`|`true`|
 |`=1>=2`|`false`|
 
-### Equality and non-equality `'=='` `'!='`
+### Equality and nonequality `'=='` `'!='`
 
 |Expression|Value|
 |---|---|
@@ -248,7 +264,7 @@ When used inside an expression, paths should be enclosed between `"$("` and `")"
 
 ### Ternary operator `'?:'`
 
-It is possible to give an input different values based on the evaluation of a Boolean expression using the ternary operator.
+It's possible to give an input different values based on the evaluation of a Boolean expression using the ternary operator.
 
 |Expression|Value|
 |---|---|
```
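
To see where an inline-array expression would actually live, here's a sketch of a skill input using the last example above. The custom Web API skill, its URI, and the `labeledPhrase` input name are hypothetical; only the `context` and `source` expressions come from the new section.

```json
{
  "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
  "description": "Hypothetical skill: runs once per key phrase via the enumerated context.",
  "context": "/document/merged_content/keyphrases/*",
  "uri": "https://example.com/api/label",
  "inputs": [
    {
      "name": "labeledPhrase",
      "source": "=['key phrase', $(/document/merged_content/keyphrases/*)]"
    }
  ],
  "outputs": [
    { "name": "out" }
  ]
}
```

Against the sample enriched data, this skill executes three times, receiving `["key phrase", "Study of BMN"]`, `["key phrase", "Syndrome"]`, and `["key phrase", "Pediatric Patients"]` in turn, and each execution's `out` value is added under the corresponding `/document/merged_content/keyphrases/<n>/out` path.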

articles/search/cognitive-search-skill-azure-openai-embedding.md

Lines changed: 27 additions & 4 deletions
```diff
@@ -8,17 +8,17 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: reference
-ms.date: 03/28/2024
+ms.date: 05/08/2024
 ---
 
 # Azure OpenAI Embedding skill
 
 > [!IMPORTANT]
-> This feature is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [2023-10-01-Preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) supports this feature.
+> This feature is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [2023-10-01-preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) supports the first iteration of this feature. The [2024-05-01-preview REST API](/rest/api/searchservice/skillsets/create-or-update?view=rest-searchservice-2024-05-01-preview&preserve-view=true) adds more properties and supports more text embedding models on Azure OpenAI.
 
-The **Azure OpenAI Embedding** skill connects to a deployed embedding model on your [Azure OpenAI](/azure/ai-services/openai/overview) resource to generate embeddings.
+The **Azure OpenAI Embedding** skill connects to a deployed embedding model on your [Azure OpenAI](/azure/ai-services/openai/overview) resource to generate embeddings during indexing.
 
-The [Import and vectorize data](search-get-started-portal-import-vectors.md) uses the **Azure OpenAI Embedding** skill to vectorize content. You can run the wizard and review the generated skillset to see how the wizard builds it.
+The [Import and vectorize data wizard](search-get-started-portal-import-vectors.md) in the Azure portal uses the **Azure OpenAI Embedding** skill to vectorize content. You can run the wizard and review the generated skillset to see how the wizard builds the skill for the text-embedding-ada-002 model.
 
 > [!NOTE]
 > This skill is bound to Azure OpenAI and is charged at the existing [Azure OpenAI pay-as-you-go price](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/#pricing).
@@ -42,6 +42,18 @@ Parameters are case-sensitive.
 | `apiKey` | The secret key used to access the model. If you provide a key, leave `authIdentity` empty. If you set both the `apiKey` and `authIdentity`, the `apiKey` is used on the connection. |
 | `deploymentId` | The name of the deployed Azure OpenAI embedding model. The model should be an embedding model, such as text-embedding-ada-002. See the [List of Azure OpenAI models](/azure/ai-services/openai/concepts/models) for supported models.|
 | `authIdentity` | A user-managed identity used by the search service for connecting to Azure OpenAI. You can use either a [system or user managed identity](search-howto-managed-identities-data-sources.md). To use a system managed identity, leave `apiKey` and `authIdentity` blank. The system-managed identity is used automatically. A managed identity must have [Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) permissions to send text to Azure OpenAI. |
+| `modelName` | This property is required if your skillset is created using the 2024-05-01-preview REST API. Set this property to the deployment name of an Azure OpenAI embedding model deployed on the provider specified through `resourceUri` and identified through `deploymentId`. Currently, the supported values are `text-embedding-ada-002`, `text-embedding-3-large`, and `text-embedding-3-small`. |
+| `dimensions` | (Optional, introduced in the 2024-05-01-preview REST API). The dimensions of embeddings that you would like to generate if the model supports reducing the embedding dimensions. Supported ranges are listed below. Defaults to the maximum dimensions for each model if not specified. For skillsets created using the 2023-10-01-preview, dimensions are fixed at 1536. |
+
+## Supported dimensions by `modelName`
+
+The supported dimensions for an Azure OpenAI Embedding skill depend on the `modelName` that is configured.
+
+| `modelName` | Minimum dimensions | Maximum dimensions |
+|--------------------|-------------|-------------|
+| text-embedding-ada-002 | 1536 | 1536 |
+| text-embedding-3-large | 1 | 3072 |
+| text-embedding-3-small | 1 | 1536 |
 
 ## Skill inputs
 
@@ -73,6 +85,8 @@ Then your skill definition might look like this:
     "description": "Connects a deployed embedding model.",
     "resourceUri": "https://my-demo-openai-eastus.openai.azure.com/",
     "deploymentId": "my-text-embedding-ada-002-model",
+    "modelName": "text-embedding-ada-002",
+    "dimensions": 1536,
     "inputs": [
       {
         "name": "text",
@@ -116,11 +130,20 @@ The output resides in memory. To send this output to a field in the search index
 ## Best practices
 
 The following are some best practices you need to consider when utilizing this skill:
+
 - If you are hitting your Azure OpenAI TPM (Tokens per minute) limit, consider the [quota limits advisory](../ai-services/openai/quotas-limits.md) so you can address accordingly. Refer to the [Azure OpenAI monitoring](../ai-services/openai/how-to/monitoring.md) documentation for more information about your Azure OpenAI instance performance.
+
 - The Azure OpenAI embeddings model deployment you use for this skill should be ideally separate from the deployment used for other use cases, including the [query vectorizer](vector-search-how-to-configure-vectorizer.md). This helps each deployment to be tailored to its specific use case, leading to optimized performance and identifying traffic from the indexer and the index embedding calls easily.
+
 - Your Azure OpenAI instance should be in the same region or at least geographically close to the region where your AI Search service is hosted. This reduces latency and improves the speed of data transfer between the services.
+
 - If you have a larger than default Azure OpenAI TPM (Tokens per minute) limit as published in [quotas and limits](../ai-services/openai/quotas-limits.md) documentation, open a [support case](../azure-portal/supportability/how-to-create-azure-support-request.md) with the Azure AI Search team, so this can be adjusted accordingly. This helps your indexing process not being unnecessarily slowed down by the documented default TPM limit, if you have higher limits.
 
+- For examples and working code samples using this skill, see the following links:
+
+  - [Integrated vectorization (Python)](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/integrated-vectorization/readme.md)
+  - [Integrated vectorization (C#)](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-dotnet/DotNetIntegratedVectorizationDemo/readme.md)
+  - [Integrated vectorization (Java)](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-java/demo-integrated-vectorization/readme.md)
 
 ## Errors and warnings
```