articles/search/search-relevance-overview.md
Lines changed: 10 additions & 5 deletions
@@ -13,14 +13,19 @@ ms.date: 07/23/2025
13
13
14
14
# Relevance in Azure AI Search
15
15
16
-
In a query operation, the relevance of any given result is determined by a ranking algorithm that evaluates the strength of a match based on how closely the indexed content and the query align. An algorithm assigns a score, and results are ranked by that score and returned in the response.
16
+
In a query operation, the relevance of any given result is determined by a ranking algorithm that evaluates the strength of a match based on how closely the query corresponds to an indexed document. When a match is found, the algorithm assigns a score. Results are ranked by that score, and the topmost results are returned in the response.
17
17
18
18
Ranking occurs whenever the query request includes full text or vector queries. It doesn't occur if the query invokes strict pattern matching, such as a filter-only query or a specialized query form like autocomplete, suggestions, geospatial search, fuzzy search, or regular expression search. A uniform search score of 1.0 indicates the absence of a ranking algorithm.
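
As a rough illustration of the uniform score, the following Python sketch checks a parsed query response for it. The `value` array and `@search.score` property follow the shape of a Search REST API response. The function name and sample documents are hypothetical, and a ranked query could in principle also produce identical scores, so treat this as a heuristic rather than a definitive test.

```python
def is_unranked(response: dict) -> bool:
    """Return True when every result carries the uniform score of 1.0."""
    results = response.get("value", [])
    # An empty result set tells us nothing, so report False in that case.
    return bool(results) and all(r.get("@search.score") == 1.0 for r in results)


# Hypothetical responses: a filter-only query and a full text query.
filter_only = {"value": [{"@search.score": 1.0, "id": "1"}, {"@search.score": 1.0, "id": "2"}]}
full_text = {"value": [{"@search.score": 3.21, "id": "7"}, {"@search.score": 1.0, "id": "2"}]}

print(is_unranked(filter_only))  # True
print(is_unranked(full_text))    # False
```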
19
19
20
-
***Relevance tuning*** can be used to boost search scores based on extra criteria such as freshness or proximity. In Azure AI Search, relevance tuning is primarily directed at textual and numeric (nonvector) content when you apply a [scoring profile](#custom-boosting-logic-using-scoring-profiles) or invoke the [semantic ranker](semantic-search-overview.md).
20
+
## Relevance tuning
21
21
22
-
> [!NOTE]
23
-
> In Azure AI Search, there's no explicit relevance tuning capabilities for vector content, but you can experiment between Hierarchical Navigable Small World (HNSW) and exhaustive K-nearest neighbors (KNN) to see if one algorithm outperforms the other for your scenario. HNSW graphing with an exhaustive KNN override at query time is the most flexible approach for comparison testing. You can also experiment with various embedding models to see which ones produce higher quality results.
22
+
***Relevance tuning*** is a technique for boosting search scores based on extra criteria such as weighted fields, freshness, or proximity. In Azure AI Search, relevance tuning options vary based on query type:
23
+
24
+
+ For textual and numeric (nonvector) content in keyword or hybrid search, you can tune relevance by applying [scoring profiles](#custom-boosting-logic-using-scoring-profiles) or invoking the [semantic ranker](semantic-search-overview.md).
25
+
26
+
+ For vector content in a hybrid query, you can [weight a vector field](hybrid-search-ranking.md#weighted-scores) to boost the importance of the vector component relative to the text component of the hybrid query.
27
+
28
+
+ For pure vector queries, you can compare Hierarchical Navigable Small World (HNSW) and exhaustive K-nearest neighbors (KNN) to see if one algorithm outperforms the other for your scenario. HNSW graphing with an exhaustive KNN override at query time is the most flexible approach for comparison testing. You can also experiment with various embedding models to see which ones produce higher quality results. Finally, remember that a hybrid query or a vector query on documents that include nonvector fields is in scope for relevance tuning, so it's just the vector fields themselves that can't participate in a relevance tuning effort.
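
To make the vector options above concrete, here is a minimal Python sketch that builds a hybrid query request body. It assumes a REST API version that supports `weight` and `exhaustive` on vector queries, an index with a vectorizer (needed for `kind: "text"`), and a hypothetical vector field named `content_embedding`. Adjust these to match your own index.

```python
import json


def build_hybrid_query(search_text, vector_field, weight=1.0, exhaustive=False, k=50):
    """Build a hybrid (keyword + vector) query body for a search request."""
    return {
        "search": search_text,  # keyword component of the hybrid query
        "vectorQueries": [
            {
                "kind": "text",            # assumes the index defines a vectorizer
                "text": search_text,
                "fields": vector_field,    # hypothetical field name supplied by the caller
                "k": k,
                "weight": weight,          # values above 1.0 boost the vector component
                "exhaustive": exhaustive,  # True requests exhaustive KNN instead of HNSW
            }
        ],
    }


baseline = build_hybrid_query("historic hotels", "content_embedding")
boosted = build_hybrid_query("historic hotels", "content_embedding", weight=2.0, exhaustive=True)
print(json.dumps(boosted, indent=2))
```

Sending the same query once with `exhaustive` set to `False` and once with `True` gives a side-by-side comparison of HNSW and exhaustive KNN results.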
24
29
25
30
## Levels of ranking
26
31
@@ -42,7 +47,7 @@ Scoring logic applies to text and numeric nonvector content. You can use scoring
For standalone text queries, scoring profiles identify the top 1,000 matches in a [BM25-ranked search](index-similarity-and-scoring.md), with the top 50 matches returned in the response.
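
A scoring profile of the kind described here could look like the following Python sketch, which expresses the JSON definition as a dictionary and references it from a query. The profile name and the field names (`title`, `description`, `lastRenovationDate`) are hypothetical, and the freshness function assumes a date field that is filterable in your index.

```python
import json

# Hypothetical scoring profile: weight matches in "title" above "description"
# and boost recently renovated items.
scoring_profile = {
    "name": "titleAndFreshness",
    "text": {"weights": {"title": 3, "description": 1}},
    "functions": [
        {
            "type": "freshness",
            "fieldName": "lastRenovationDate",
            "boost": 2,
            "interpolation": "linear",
            "freshness": {"boostingDuration": "P365D"},
        }
    ],
    "functionAggregation": "sum",
}

# Query body that applies the profile by name; 50 is also the default for "top".
query = {
    "search": "historic hotels",
    "scoringProfile": scoring_profile["name"],
    "top": 50,
}

print(json.dumps(query, indent=2))
```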
articles/search/tutorial-document-extraction-image-verbalization.md
Lines changed: 15 additions & 18 deletions
@@ -16,14 +16,16 @@ ms.date: 05/29/2025
16
16
17
17
# Tutorial: Verbalize images using generative AI
18
18
19
-
Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows you how to build a multimodal indexing pipeline by describing visual content in natural language and embedding it alongside document text.
19
+
Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows you how to build a multimodal indexing pipeline that includes steps for describing visual content in natural language and using the generated descriptions in your searchable index.
20
20
21
-
From the source document, each image is passed to the [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities: text and verbalized images.
21
+
From the source document, each image is passed to the [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) that calls a chat completion model to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities: text and verbalized images.
22
22
23
23
In this tutorial, you use:
24
24
25
25
+ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
26
26
27
+
+ An indexer and skillset to create an indexing pipeline that includes AI enrichment through skills.
28
+
27
29
+ The [Document Extraction skill](cognitive-search-skill-document-extraction.md) for extracting normalized images and text.
28
30
29
31
+ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) that calls a chat completion model to create descriptions of visual content.
@@ -35,23 +37,15 @@ This tutorial demonstrates a lower-cost approach for indexing multimodal content
35
37
> [!NOTE]
36
38
> Setting `imageAction` to `generateNormalizedImages` results in image extraction, which is an extra charge. For more information, see [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/) for image extraction.
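
For orientation, the following Python sketch shows where `imageAction` might appear in a Document Extraction skill definition. It is an assumption-laden fragment rather than the tutorial's full skillset: the skill name, target names, and image size limits are illustrative, and the same setting can alternatively be placed in an indexer's configuration.

```python
import json

# Illustrative Document Extraction skill definition (names and sizes are hypothetical).
document_extraction_skill = {
    "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
    "name": "document-extraction-skill",
    "context": "/document",
    "parsingMode": "default",
    "dataToExtract": "contentAndMetadata",
    "configuration": {
        "imageAction": "generateNormalizedImages",  # enables billable image extraction
        "normalizedImageMaxWidth": 2000,
        "normalizedImageMaxHeight": 2000,
    },
    "inputs": [{"name": "file_data", "source": "/document/file_data"}],
    "outputs": [
        {"name": "content", "targetName": "extracted_content"},
        {"name": "normalized_images", "targetName": "normalized_images"},
    ],
}

print(json.dumps(document_extraction_skill["configuration"], indent=2))
```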
37
39
38
-
<!-- Using a REST client and the [Search REST APIs](/rest/api/searchservice/) you will:
39
-
40
-
> [!div class="checklist"]
41
-
> + Set up sample data and configure an `azureblob` data source
42
-
> + Create an index with support for text and image embeddings
43
-
> + Define a skillset with extraction, captioning, and embedding steps
44
-
> + Create and run an indexer to process and index content
45
-
> + Search the index you just created
46
-
-->
47
-
48
40
## Prerequisites
49
41
50
-
+An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).
42
+
+ [Azure AI Search](search-create-service-portal.md). [Configure your search service](search-manage.md) for role-based access control and a managed identity. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier. Your service must also be in the same region as your multi-service account.
51
43
52
-
+[Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data.
44
+
+ [Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data and for creating a [knowledge store](knowledge-store-concept-intro.md).
53
45
54
-
+[Azure AI Search](search-what-is-azure-search.md), Basic pricing tier or higher, with a managed identity. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription.
46
+
+ A chat completion model hosted in Azure AI Foundry or another source. The model is used to verbalize image content. You provide the URI to the hosted model in the GenAI Prompt skill definition.
47
+
48
+
+ A text embedding model deployed in Azure AI Foundry. The model is used to vectorize text content pulled from source documents and the image descriptions generated by the chat completion model. For integrated vectorization, the embedding model must be located in Azure AI Foundry, and it must be either text-embedding-ada-002, text-embedding-3-large, or text-embedding-3-small. If you want to use an external embedding model, use a custom skill instead of the Azure OpenAI embedding skill.
55
49
56
50
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
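
The embedding model constraint in the prerequisites can be sketched in Python as a small helper that builds an Azure OpenAI Embedding skill definition and rejects unsupported model names. The helper name, the `text_vector` target name, and the default `/document/pages/*` path are assumptions made for illustration, not values taken from this tutorial.

```python
# Models named in the prerequisites as supported for integrated vectorization.
SUPPORTED_MODELS = {
    "text-embedding-ada-002",
    "text-embedding-3-large",
    "text-embedding-3-small",
}


def build_embedding_skill(resource_uri, deployment_id, model_name, source="/document/pages/*"):
    """Return an Azure OpenAI Embedding skill definition, rejecting unsupported models."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"{model_name!r} isn't supported for integrated vectorization; use a custom skill instead."
        )
    return {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "context": source,  # hypothetical enrichment path for text chunks
        "resourceUri": resource_uri,
        "deploymentId": deployment_id,
        "modelName": model_name,
        "inputs": [{"name": "text", "source": source}],
        "outputs": [{"name": "embedding", "targetName": "text_vector"}],
    }


skill = build_embedding_skill(
    "https://example.openai.azure.com",  # placeholder endpoint
    "my-embedding-deployment",           # placeholder deployment name
    "text-embedding-3-large",
)
print(skill["modelName"])  # text-embedding-3-large
```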
57
51
@@ -76,6 +70,7 @@ Download the following sample PDF:
1. For connections made using a user-assigned managed identity, provide a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. Provide an identity using the syntax shown in the following example. Set userAssignedIdentity to the user-assigned managed identity. The connection string is similar to the following example:
80
75
81
76
```json
@@ -338,7 +333,9 @@ Key points:
338
333
339
334
## Create a skillset
340
335
341
-
[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a search index on your search service. An index specifies all the parameters and their attributes.
336
+
[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a skillset on your search service. A skillset defines the operations that chunk and embed content prior to indexing. This skillset uses the built-in Document Extraction skill to extract text and images, the Text Split skill to chunk large text, and the Azure OpenAI Embedding skill to vectorize text content.
337
+
338
+
The skillset also performs actions specific to images. It uses the GenAI Prompt skill to generate image descriptions. It also creates a knowledge store that stores intact images so that you can return them in a query.
342
339
343
340
```http
344
341
### Create a skillset
@@ -353,7 +350,7 @@ POST {{baseUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
articles/search/tutorial-document-extraction-multimodal-embeddings.md
Lines changed: 7 additions & 15 deletions
@@ -22,6 +22,8 @@ In this tutorial, you use:
22
22
23
23
+ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
24
24
25
+
+ An indexer and skillset to create an indexing pipeline that includes AI enrichment through skills.
26
+
25
27
+ The [Document Extraction skill](cognitive-search-skill-document-extraction.md) for extracting normalized images and text.
26
28
27
29
+ The [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md) to vectorize text and images.
@@ -33,23 +35,13 @@ This tutorial demonstrates a lower-cost approach for indexing multimodal content
33
35
> [!NOTE]
34
36
> Setting `imageAction` to `generateNormalizedImages` results in image extraction, which is an extra charge. For more information, see [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/) for image extraction.
35
37
36
-
<!-- Using a REST client and the [Search REST APIs](/rest/api/searchservice/) you will:
37
-
38
-
> [!div class="checklist"]
39
-
> + Set up sample data and configure an `azureblob` data source
40
-
> + Create an index with support for text and image embeddings
41
-
> + Define a skillset with extraction and embedding steps
42
-
> + Create and run an indexer to process and index content
43
-
> + Search the index you just created
44
-
-->
45
-
46
38
## Prerequisites
47
39
48
-
+[Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data.
40
+
+ [Azure AI Search](search-create-service-portal.md). [Configure your search service](search-manage.md) for role-based access control and a managed identity. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier. Your service must also be in the same region as your multi-service account.
41
+
42
+
+ [Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data and for creating a [knowledge store](knowledge-store-concept-intro.md).
49
43
50
44
+ An [Azure AI services multi-service account](/azure/ai-services/multi-service-resource#azure-ai-services-resource-for-azure-ai-search-skills) that provides Azure AI Vision for multimodal embeddings. You must use an Azure AI multi-service account for this task. For an updated list of regions that provide multimodal embeddings, see the [Azure AI Vision documentation](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
51
-
+[Azure AI Search](search-create-service-portal.md), with a managed identity. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription.
52
-
> Your service must be on the Basic tier or higher—this tutorial isn't supported on the Free tier. It must also be in the same region as your multi-service account.
53
45
54
46
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
55
47
@@ -335,7 +327,7 @@ Key points:
335
327
336
328
## Create a skillset
337
329
338
-
[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a search index on your search service. An index specifies all the parameters and their attributes.
330
+
[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a skillset on your search service. A skillset defines the operations that chunk and embed content prior to indexing. This skillset uses the built-in Document Extraction skill to extract text and images, the Text Split skill to chunk large text, and the Azure AI Vision multimodal embeddings skill to vectorize image and text content.
339
331
340
332
```http
341
333
### Create a skillset
@@ -350,7 +342,7 @@ POST {{baseUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
articles/search/tutorial-document-layout-image-verbalization.md
Lines changed: 11 additions & 11 deletions
@@ -24,26 +24,23 @@ In this tutorial, you use:
24
24
25
25
+ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
26
26
27
+
+ An indexer and skillset to create an indexing pipeline that includes AI enrichment through skills.
28
+
27
29
+ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images from various documents, along with their `locationMetadata`, such as page numbers or bounding regions.
28
30
29
31
+ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) that calls a chat completion model to create descriptions of visual content.
30
32
31
33
+ A search index configured to store extracted text and image verbalizations. Some content is vectorized for vector-based similarity search.
32
34
33
-
<!-- Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you will:
35
+
## Prerequisites
34
36
35
-
> [!div class="checklist"]
36
-
> + Set up sample data and configure an `azureblob` data source
37
-
> + Create an index with support for text and image embeddings
38
-
> + Define a skillset with extraction, captioning, embedding and knowledge store file projection steps
39
-
> + Create and run an indexer to process and index content
40
-
> + Search the index you just created -->
37
+
+ [Azure AI Search](search-create-service-portal.md). [Configure your search service](search-manage.md) for role-based access control and a managed identity. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier. Your service must also be in the same region as your multi-service account.
41
38
42
-
## Prerequisites
39
+
+ [Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data and for creating a [knowledge store](knowledge-store-concept-intro.md).
43
40
44
-
+[Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data.
41
+
+ A chat completion model hosted in Azure AI Foundry or another source. The model is used to verbalize image content. You provide the URI to the hosted model in the GenAI Prompt skill definition.
45
42
46
-
+[Azure AI Search](search-what-is-azure-search.md). [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier.
43
+
+ A text embedding model deployed in Azure AI Foundry. The model is used to vectorize text content pulled from source documents and the image descriptions generated by the chat completion model. For integrated vectorization, the embedding model must be located in Azure AI Foundry, and it must be either text-embedding-ada-002, text-embedding-3-large, or text-embedding-3-small. If you want to use an external embedding model, use a custom skill instead of the Azure OpenAI embedding skill.
47
44
48
45
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
49
46
@@ -299,7 +296,10 @@ Key points:
299
296
300
297
## Create a skillset
301
298
302
-
[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a search index on your search service. An index specifies all the parameters and their attributes.
299
+
[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a skillset on your search service. A skillset defines the operations that chunk and embed content prior to indexing. This skillset uses the Document Layout skill to extract text and images, preserving location metadata, which is useful for citations in RAG applications. It uses the Azure OpenAI Embedding skill to vectorize text content.
300
+
301
+
The skillset also performs actions specific to images. It uses the GenAI Prompt skill to generate image descriptions. It also creates a knowledge store that stores intact images so that you can return them in a query.