articles/search/index-similarity-and-scoring.md
Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ ms.topic: conceptual
 ms.date: 09/27/2023
 ---

-# Relevance scoring for full text search (BM25)
+# Relevance in keyword search (BM25 scoring)

 This article explains the BM25 relevance scoring algorithm used to compute search scores for [full text search](search-lucene-query-architecture.md). BM25 relevance is exclusive to full text search. Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries aren't scored or ranked for relevance.
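The BM25 scoring that this article describes can be illustrated with a small Python sketch of classic Okapi BM25 over pre-tokenized documents. It is not Azure AI Search's implementation: the function name `bm25_scores`, the sample documents, and the parameter values `k1 = 1.2` and `b = 0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Assumed BM25 parameters (illustrative): k1 controls how quickly repeated
# terms stop adding score; b controls document-length normalization.
K1 = 1.2
B = 0.75


def bm25_scores(query_terms, docs, k1=K1, b=B):
    """Score each tokenized document in `docs` against `query_terms`.

    Uses the Lucene-style IDF, log(1 + (N - n + 0.5) / (n + 0.5)),
    which stays positive even for very common terms.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # Terms absent from the document contribute nothing.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical, already-analyzed documents.
docs = [
    ["hotel", "beach", "san", "diego"],
    ["hotel", "downtown"],
    ["beach", "resort", "beach"],
    ["museum", "tour"],
]
scores = bm25_scores(["beach", "hotel"], docs)
print(scores)
```

A document containing none of the query terms scores zero, which mirrors the point that only matching documents are ranked; repeated terms raise the score with diminishing returns because of the `k1` saturation term.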
articles/search/search-lucene-query-architecture.md
Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ ms.date: 10/09/2023

 # Full text search in Azure AI Search

-Full text search is an approach in information retrieval that matches on plain text content stored in an index. For example, given a query string "hotels in San Diego on the beach", the search engine looks for content containing those terms. To make scans more efficient, query strings undergo lexical analysis: lower-casing all terms, removing stop words like "the", and reducing terms to primitive root forms. When matching terms are found, the search engine retrieves documents, ranks them in order of relevance, and returns the top results.
+Full text search is an approach in information retrieval that matches on plain text stored in an index. For example, given a query string "hotels in San Diego on the beach", the search engine looks for tokenized strings based on those terms. To make scans more efficient, query strings undergo lexical analysis: lower-casing all terms, removing stop words like "the", and reducing terms to primitive root forms. When matching terms are found, the search engine retrieves documents, ranks them in order of relevance, and returns the top results.

 Query execution can be complex. This article is for developers who need a deeper understanding of how full text search works in Azure AI Search. For text queries, Azure AI Search seamlessly delivers expected results in most scenarios, but occasionally you might get a result that seems "off" somehow. In these situations, having a background in the four stages of Lucene query execution (query parsing, lexical analysis, document matching, scoring) can help you identify specific changes to query parameters or index configuration that produce the desired outcome.
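The lexical analysis step in this hunk (lower-casing, removing stop words, reducing terms to root forms) can be pictured with a toy analyzer. This Python sketch is a deliberately simplified stand-in for the Lucene analyzers Azure AI Search uses; the `analyze` and `naive_stem` functions, the stop-word list, and the suffix-stripping rule are all hypothetical.

```python
import re

# Hypothetical, tiny stop-word list; real analyzers use larger,
# language-specific lists.
STOP_WORDS = {"the", "in", "on", "a", "an", "of", "and"}

# Hypothetical suffixes for a naive stemmer. Real analyzers use algorithms
# such as Porter stemming or dictionary-based lemmatization.
SUFFIXES = ("ing", "es", "s")


def naive_stem(term):
    """Strip the first matching suffix if a root of 3+ characters remains."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def analyze(text):
    """Lower-case, split on non-alphanumerics, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("hotels in San Diego on the beach"))
print(analyze("The beaches"))
```

Because queries and indexed content pass through the same analysis, "beaches" in a document and "beach" in a query reduce to the same token and can match.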
articles/search/vector-search-how-to-query.md
Lines changed: 12 additions & 6 deletions
@@ -44,7 +44,7 @@ All results are returned in plain text, including vectors in fields marked as `r

 If you aren't sure whether your search index already has vector fields, look for:

-+ A non-empty `vectorSearch` property containing algorithms and other vector-related configurations embedded in the index schema.
++ A nonempty `vectorSearch` property containing algorithms and other vector-related configurations embedded in the index schema.

 + In the fields collection, look for fields of type `Collection(Edm.Single)` with a `dimensions` attribute, and a `vectorSearch` section in the index.

@@ -398,7 +398,7 @@ REST API version [**2023-07-01-Preview**](/rest/api/searchservice/index-preview)

 In the following example, the vector is a representation of this query string: "what Azure services support full text search". The query targets the "contentVector" field. The actual vector has 1536 embeddings, so it's trimmed in this example for readability.

-In this API version, there's no pre-filter support or `vectorFilterMode` parameter. The filter criteria are applied after the search engine executes the vector query. The set of `"k"` nearest neighbors is retrieved, and then combined with the set of filtered results. As such, the value of `"k"` predetermines the surface over which the filter is applied. For `"k": 10`, the filter is applied to 10 most similar documents. For `"k": 100`, the filter iterates over 100 documents (assuming the index contains 100 documents that are sufficiently similar to the query).
+In this API version, there's no prefilter support or `vectorFilterMode` parameter. The filter criteria are applied after the search engine executes the vector query. The set of `"k"` nearest neighbors is retrieved, and then combined with the set of filtered results. As such, the value of `"k"` predetermines the surface over which the filter is applied. For `"k": 10`, the filter is applied to 10 most similar documents. For `"k": 100`, the filter iterates over 100 documents (assuming the index contains 100 documents that are sufficiently similar to the query).

 ```http
 POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version=2023-07-01-Preview
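The postfiltering behavior in the hunk above (retrieve the `k` nearest neighbors first, then apply the filter) can be sketched with a brute-force search. This Python illustration assumes an exhaustive cosine-similarity scan and is not the service's implementation; `knn_then_filter`, `cosine`, and the sample documents and metadata are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_then_filter(query_vec, docs, k, predicate):
    """Postfiltering: take the k nearest neighbors first, then filter them.

    `docs` is a list of (doc_id, vector, metadata) tuples. A document that
    satisfies `predicate` but falls outside the top k is never returned.
    """
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    top_k = ranked[:k]
    return [doc_id for doc_id, _, meta in top_k if predicate(meta)]


# Hypothetical two-dimensional vectors with a "lang" metadata field.
docs = [
    ("a", [1.0, 0.0], {"lang": "en"}),
    ("b", [0.9, 0.1], {"lang": "fr"}),
    ("c", [0.1, 0.9], {"lang": "en"}),
]
query = [1.0, 0.0]
is_french = lambda meta: meta["lang"] == "fr"

print(knn_then_filter(query, docs, 1, is_french))
print(knn_then_filter(query, docs, 2, is_french))
```

With `k=1` the single nearest neighbor is an English document, so the French filter returns nothing even though a French document exists; raising `k` to 2 brings it into the surface the filter sees.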
@@ -577,14 +577,20 @@ Search results are composed of "retrievable" fields from your search index. A re
 + All "retrievable" fields (a REST API default).
 + Fields explicitly listed in a "select" parameter on the query.

-The examples in this article used a "select" statement to specify text (non-vector) fields in the response.
+The examples in this article used a "select" statement to specify text (nonvector) fields in the response.

 > [!NOTE]
 > Vectors aren't designed for readability, so avoid returning them in the response. Instead, choose non-vector fields that are representative of the search document. For example, if the query targets a "descriptionVector" field, return an equivalent text field if you have one ("description") in the response.

-### Number of results
+### Number of ranked results in a vector query response

-A query might match to any number of documents, as many as all of them if the search criteria are weak (for example "search=*" for a null query). Because it's seldom practical to return unbounded results, you should specify a maximum for the response:
+A vector query specifies the `k` parameter, which determines how many matches are returned in the results. The search engine always returns `k` number of matches. If `k` is larger than the number of documents in the index, then the number of documents determines the upper limit of what can be returned.
+
+If you're familiar with full text search, you know to expect zero results if the index doesn't contain a term or phrase. However, in vector search, the search operation is identifying nearest neighbors, and it will always return `k` results even if the nearest neighbors aren't that similar. So, it's possible to get results for nonsensical or off-topic queries, especially if you aren't using prompts to set boundaries. Less relevant results have a worse similarity score, but they're still the "nearest" vectors if there isn't anything closer. As such, a response with no meaningful results can still return `k` results, but each result's similarity score would be low.
+
+A [hybrid approach](hybrid-search-overview.md) that includes full text search can mitigate this problem. Another mitigation is to set a minimum threshold on the search score, but only if the query is a pure single vector query. Hybrid queries aren't conducive to minimum thresholds because the ranges are so much smaller and volatile.
+
+Query parameters affecting result count include:

 + `"k": n` results for vector-only queries
 + `"top": n` results for hybrid queries that include a "search" parameter
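The behavior described in this hunk, where a vector query always returns `k` results (capped by the number of documents) and a minimum score threshold can trim weak matches for a pure single-vector query, can be illustrated with a brute-force nearest-neighbor search. In this Python sketch, `vector_query`, its `min_score` parameter, and the toy three-dimensional vectors are hypothetical; the threshold is applied client-side purely for illustration and is not presented as a documented query parameter.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_query(query_vec, docs, k, min_score=None):
    """Return up to k (doc_id, score) pairs, most similar first.

    Nearest-neighbor search returns min(k, len(docs)) results no matter
    how dissimilar they are. `min_score` is a hypothetical threshold
    applied afterwards to drop low-similarity matches.
    """
    scored = sorted(
        ((doc_id, cosine(query_vec, vec)) for doc_id, vec in docs),
        key=lambda pair: pair[1],
        reverse=True,
    )
    results = scored[:k]
    if min_score is not None:
        results = [(doc_id, s) for doc_id, s in results if s >= min_score]
    return results


# Hypothetical embeddings: two animal documents.
docs = [("dogs", [1.0, 0.0, 0.0]), ("cats", [0.8, 0.6, 0.0])]
off_topic = [0.0, 0.0, 1.0]  # orthogonal to both documents

print(vector_query(off_topic, docs, k=5))                 # still returns both docs
print(vector_query(off_topic, docs, k=5, min_score=0.5))  # threshold removes them
```

The off-topic query still gets two results because nothing closer exists; only the threshold turns that into an empty response.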
@@ -595,7 +601,7 @@ Both "k" and "top" are optional. Unspecified, the default number of results in a

 Ranking of results is computed by either:

-+ The similarity metric specified in the index `vectorSearch` section for a vector-only query. Valid values are `cosine`, `euclidean`, and `dotProduct`.
++ The similarity metric specified in the index `vectorSearch` section for a vector-only query. Valid values are `cosine`, `euclidean`, and `dotProduct`.
 + Reciprocal Rank Fusion (RRF) if there are multiple sets of search results.

 Azure OpenAI embedding models use cosine similarity, so if you're using Azure OpenAI embedding models, `cosine` is the recommended metric. Other supported ranking metrics include `euclidean` and `dotProduct`.
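The two ranking paths in this hunk can be sketched in Python: the three similarity metrics named for vector-only queries, and Reciprocal Rank Fusion for merging multiple result sets. This is illustrative only; the function names are hypothetical, and the RRF constant `k=60` comes from the original RRF paper rather than from this document, so treat it as an assumption about the service's behavior.

```python
import math
from collections import defaultdict


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Dot product normalized by both vector lengths; higher is more similar.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean(a, b):
    # Straight-line distance; lower is more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rrf(ranked_lists, k=60):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in,
    with ranks starting at 1. This `k` is a smoothing constant, unrelated
    to the `k` nearest-neighbors parameter of a vector query.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from a text query and a vector query.
text_results = ["a", "b", "c"]
vector_results = ["c", "a", "d"]
print(rrf([text_results, vector_results]))
```

For unit-length vectors, cosine similarity equals the dot product and Euclidean distance decreases as cosine increases, so all three metrics produce the same ranking on normalized embeddings. RRF rewards documents that rank well in several lists, which is why "a" and "c" outrank "b" and "d" here.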
articles/search/vector-search-overview.md
Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 ---
 title: Vector search
 titleSuffix: Azure AI Search
-description: Describes concepts, scenarios, and availability of the vector search feature in Azure AI Search.
+description: Describes concepts, scenarios, and availability of vector capabilities in Azure AI Search.

 author: robertklee
 ms.author: robertlee
@@ -12,11 +12,11 @@ ms.topic: conceptual
 ms.date: 01/29/2024
 ---

-# Vector stores and vector search in Azure AI Search
+# Vectors in Azure AI Search

 Vector search is an approach in information retrieval that stores numeric representations of content for search scenarios. Because the content is numeric rather than plain text, the search engine matches on vectors that are the most similar to the query, with no requirement for matching on exact terms.

-This article is a high-level introduction to vector support in Azure AI Search. It also explains integration with other Azure services and covers [terminology and concepts](#vector-search-concepts) related to vector search development.
+This article is a high-level introduction to vectors in Azure AI Search. It also explains integration with other Azure services and covers [terminology and concepts](#vector-search-concepts) related to vector search development.

 We recommend this article for background, but if you'd rather get started, follow these steps:

@@ -110,7 +110,7 @@ In order to create effective embeddings for vector search, it's important to tak

 ### What is the embedding space?

-*Embedding space* is the corpus for vector queries. Within a search index, it's all of the vector fields populated with embeddings from the same embedding model. Machine learning models create the embedding space by mapping individual words, phrases, or documents (for natural language processing), images, or other forms of data into a representation comprised of a vector of real numbers representing a coordinate in a high-dimensional space. In this embedding space, similar items are located close together, and dissimilar items are located farther apart.
+*Embedding space* is the corpus for vector queries. Within a search index, an embedding space is all of the vector fields populated with embeddings from the same embedding model. Machine learning models create the embedding space by mapping individual words, phrases, or documents (for natural language processing), images, or other forms of data into a representation comprised of a vector of real numbers representing a coordinate in a high-dimensional space. In this embedding space, similar items are located close together, and dissimilar items are located farther apart.

 For example, documents that talk about different species of dogs would be clustered close together in the embedding space. Documents about cats would be close together, but farther from the dogs cluster while still being in the neighborhood for animals. Dissimilar concepts such as cloud computing would be much farther away. In practice, these embedding spaces are abstract and don't have well-defined, human-interpretable meanings, but the core idea stays the same.