articles/search/index-ranking-similarity.md (7 additions, 2 deletions)
@@ -6,14 +6,19 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 09/07/2023
+ms.date: 09/25/2023
 ---
 
 # Configure BM25 relevance scoring
 
 In this article, learn how to configure the [BM25 relevance scoring algorithm](https://en.wikipedia.org/wiki/Okapi_BM25) used by Azure Cognitive Search for full text search queries. It also explains how to enable BM25 on older search services.
 
-BM25 applies to strings (text) on fields having a "searchable" attribution. At query time, the search engine uses BM25 to calculate a **@searchScore** for each match in a given query. Matching documents are ranked by their search score, with the top results returned in the query response.
+BM25 applies to:
+
++ Queries that use the `search` parameter for full text search, on text fields having a `searchable` attribution.
++ Scoring is scoped to `searchFields`, or to all `searchable` fields if `searchFields` is null.
+
+The search engine uses BM25 to calculate a **@searchScore** for each match in a given query. Matching documents are ranked by their search score, with the top results returned in the query response. It's possible to get some [score variation](index-similarity-and-scoring.md#score-variation) in results, even from the same query executing over the same search index, but usually these variations are small and don't change the overall ranking of results.
 
 BM25 has defaults for weighting term frequency and document length. You can customize these properties if the defaults aren't suited to your content. Configuration changes are scoped to individual indexes, which means you can adjust relevance scoring based on the characteristics of each index.
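BM25 weighs how often a query term occurs in a field against how long that field is. The following is a minimal, self-contained illustration of the classic Okapi BM25 formula, not Azure Cognitive Search's internal code. It assumes pre-tokenized documents, a Lucene-style IDF term, and the conventional parameter names `k1` (term-frequency saturation) and `b` (length normalization) with the commonly cited defaults 1.2 and 0.75. The function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25.

    docs: list of token lists. k1 controls how quickly repeated terms stop
    adding to the score; b controls how strongly long documents are penalized.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # a term absent from this document contributes nothing
            # Lucene-style IDF: rarer terms across the corpus weigh more; always > 0.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: docs longer than average get a larger denominator.
            norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    ["hotel", "pool", "hotel"],
    ["hotel", "spa", "downtown", "parking", "pool"],
    ["airport", "shuttle"],
]
scores = bm25_scores(["hotel"], docs)
```

Setting `b=0.0` turns off length normalization, so two documents with the same term frequency score identically regardless of length. Raising `k1` lets repeated occurrences of a term keep adding to the score for longer before it saturates.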
articles/search/index-similarity-and-scoring.md (19 additions, 19 deletions)
@@ -1,39 +1,36 @@
 ---
-title: Relevance and scoring
+title: BM25 relevance scoring
 titleSuffix: Azure Cognitive Search
-description: Explains the concepts of relevance and scoring in Azure Cognitive Search, and what a developer can do to customize the scoring result.
+description: Explains the concepts of BM25 relevance and scoring in Azure Cognitive Search, and what a developer can do to customize the scoring result.
 author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 08/31/2023
+ms.date: 09/25/2023
 ---
 
-# Relevance and scoring in Azure Cognitive Search
+# BM25 relevance and scoring for full text search
 
-This article explains the relevance and the scoring algorithms used to compute search scores in Azure Cognitive Search. A relevance score is computed for each match found in a [full text search](search-lucene-query-architecture.md), where the strongest matches are assigned higher search scores.
+This article explains the BM25 relevance scoring algorithm used to compute search scores for [full text search](search-lucene-query-architecture.md). BM25 relevance is exclusive to full text search. Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries aren't scored or ranked for relevance.
 
-Relevance applies to full text search only. Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries aren't scored or ranked for relevance.
-
-In Azure Cognitive Search, you can tune search relevance and boost search scores through these mechanisms:
+In Azure Cognitive Search, you can configure algorithm parameters, and tune search relevance and boost search scores through these mechanisms:
 
 + Scoring algorithm configuration
-+ Semantic ranking (in preview, described in [this article](semantic-search-overview.md))
 + Scoring profiles
++[Semantic ranking](semantic-search-overview.md)
 + Custom scoring logic enabled through the *featuresMode* parameter
 
-> [!NOTE]
-> Matches are scored and ranked from high to low. The score is returned as "@search.score". By default, the top 50 are returned in the response, but you can use the **$top** parameter to return a smaller or larger number of items (up to 1000 in a single response), and **$skip** to get the next set of results.
-
 ## Relevance scoring
 
-Relevance scoring refers to the computation of a search score that serves as an indicator of an item's relevance in the context of the current query. The higher the score, the more relevant the item.
+Relevance scoring refers to the computation of a search score (**@search.score**) that serves as an indicator of an item's relevance in the context of the current query. The range is unbounded. However, the higher the score, the more relevant the item.
+
+By default, the top 50 highest scoring matches are returned in the response, but you can use the **$top** parameter to return a smaller or larger number of items (up to 1000 in a single response), and **$skip** to get the next set of results.
 
 The search score is computed based on statistical properties of the string input and the query itself. Azure Cognitive Search finds documents that match on search terms (some or all, depending on [searchMode](/rest/api/searchservice/search-documents#query-parameters)), favoring documents that contain many instances of the search term. The search score goes up even higher if the term is rare across the data index, but common within the document. The basis for this approach to computing relevance is known as *TF-IDF or* term frequency-inverse document frequency.
 
-Search scores can be repeated throughout a result set. When multiple hits have the same search score, the ordering of the same scored items is undefined and not stable. Run the query again, and you might see items shift position, especially if you are using the free service or a billable service with multiple replicas. Given two items with an identical score, there's no guarantee that one appears first.
+Search scores can be repeated throughout a result set. When multiple hits have the same search score, the ordering of the same scored items is undefined and not stable. Run the query again, and you might see items shift position, especially if you're using the free service or a billable service with multiple replicas. Given two items with an identical score, there's no guarantee that one appears first.
 
-If you want to break the tie among repeating scores, you can add an **$orderby** clause to first order by score, then order by another sortable field (for example, `$orderby=search.score() desc,Rating desc`). For more information, see [$orderby](search-query-odata-orderby.md).
+To break the tie among repeating scores, you can add an **$orderby** clause to first order by score, then order by another sortable field (for example, `$orderby=search.score() desc,Rating desc`). For more information, see [$orderby](search-query-odata-orderby.md).
 
 > [!NOTE]
 > A `@search.score = 1` indicates an un-scored or un-ranked result set. The score is uniform across all results. Un-scored results occur when the query form is fuzzy search, wildcard or regex queries, or an empty search (`search=*`, sometimes paired with filters, where the filter is the primary means for returning a match).
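The tie-breaking and paging behavior described in this hunk can be mimicked locally to see how `$orderby=search.score() desc,Rating desc`, `$top`, and `$skip` interact. This is only a sketch: the service applies these server-side, and the `score`/`Rating` dictionary keys and the `order_and_page` helper are hypothetical stand-ins for `@search.score` and a sortable `Rating` field.

```python
MAX_RESULTS_PER_RESPONSE = 1000  # per the text: up to 1000 items in a single response


def order_and_page(results, top=50, skip=0):
    """Mimic `$orderby=search.score() desc,Rating desc` combined with $top/$skip.

    results: list of dicts with hypothetical "score" and "Rating" keys.
    top defaults to 50, matching the default number of results returned.
    """
    if top < 0 or skip < 0:
        raise ValueError("$top and $skip must be non-negative")
    # Primary key: score, descending. Secondary key: Rating, descending,
    # so hits with equal scores come back in a deterministic order.
    ordered = sorted(results, key=lambda r: (r["score"], r["Rating"]), reverse=True)
    # $skip drops leading results; the page is capped at one response's maximum.
    page_size = min(top, MAX_RESULTS_PER_RESPONSE)
    return ordered[skip:skip + page_size]


hits = [
    {"id": "a", "score": 2.5, "Rating": 3},
    {"id": "b", "score": 2.5, "Rating": 5},
    {"id": "c", "score": 4.0, "Rating": 1},
]
first_page = order_and_page(hits, top=2)           # c, then b (tie with a broken by Rating)
second_page = order_and_page(hits, top=2, skip=2)  # a
```

Without the secondary `Rating` key, hits `a` and `b` share a score, and on the service their relative order would be undefined; the `$orderby` clause removes that instability.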
@@ -76,7 +73,7 @@ For scalability, Azure Cognitive Search distributes each index horizontally thro
 
 By default, the score of a document is calculated based on statistical properties of the data *within a shard*. This approach is generally not a problem for a large corpus of data, and it provides better performance than having to calculate the score based on information across all shards. That said, using this performance optimization could cause two very similar documents (or even identical documents) to end up with different relevance scores if they end up in different shards.
 
-If you prefer to compute the score based on the statistical properties across all shards, you can do so by adding *scoringStatistics=global* as a [query parameter](/rest/api/searchservice/search-documents) (or add *"scoringStatistics": "global"* as a body parameter of the [query request](/rest/api/searchservice/search-documents)).
+If you prefer to compute the score based on the statistical properties across all shards, you can do so by adding `scoringStatistics=global` as a [query parameter](/rest/api/searchservice/search-documents) (or add `"scoringStatistics": "global"` as a body parameter of the [query request](/rest/api/searchservice/search-documents)).
 
 ```http
 POST https://[service name].search.windows.net/indexes/hotels/docs/search?api-version=2020-06-30
@@ -86,7 +83,7 @@ POST https://[service name].search.windows.net/indexes/hotels/docs/search?api-ve
 }
 ```
 
-Using scoringStatistics will ensure that all shards in the same replica provide the same results. That said, different replicas may be slightly different from one another as they are always getting updated with the latest changes to your index. In some scenarios, you may want your users to get more consistent results during a "query session". In such scenarios, you can provide a `sessionId` as part of your queries. The `sessionId` is a unique string that you create to refer to a unique user session.
+Using `scoringStatistics` will ensure that all shards in the same replica provide the same results. That said, different replicas may be slightly different from one another as they're always getting updated with the latest changes to your index. In some scenarios, you may want your users to get more consistent results during a "query session". In such scenarios, you can provide a `sessionId` as part of your queries. The `sessionId` is a unique string that you create to refer to a unique user session.
 
 ```http
 POST https://[service name].search.windows.net/indexes/hotels/docs/search?api-version=2020-06-30
@@ -96,7 +93,7 @@ POST https://[service name].search.windows.net/indexes/hotels/docs/search?api-ve
 }
 ```
 
-As long as the same `sessionId` is used, a best-effort attempt will be made to target the same replica, increasing the consistency of results your users will see.
+As long as the same `sessionId` is used, a best-effort attempt is made to target the same replica, increasing the consistency of results your users will see.
 
 > [!NOTE]
 > Reusing the same `sessionId` values repeatedly can interfere with the load balancing of the requests across replicas and adversely affect the performance of the search service. The value used as sessionId cannot start with a '_' character.
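The `scoringStatistics` and `sessionId` guidance in these hunks comes down to two extra properties in the request body. A minimal sketch, assuming only Python's standard library and the request URL shape shown in the article's `POST` examples; `build_search_request` and the service name are hypothetical, and the sketch only builds the URL and JSON body without sending the request or handling authentication.

```python
import json
import uuid


def build_search_request(service, index, search_text, session_id=None,
                         api_version="2020-06-30"):
    """Return (url, json_body) for a Search Documents POST request.

    The body asks for scores computed from statistics across all shards and,
    optionally, carries a sessionId so the service makes a best-effort
    attempt to route repeated queries to the same replica.
    """
    if session_id is not None and session_id.startswith("_"):
        raise ValueError("sessionId cannot start with '_'")
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version={api_version}")
    body = {"search": search_text, "scoringStatistics": "global"}
    if session_id is not None:
        body["sessionId"] = session_id
    return url, json.dumps(body)


# Generate one sessionId per user session rather than reusing one value for
# all traffic, which the note warns can interfere with load balancing.
# A hex UUID never starts with '_'.
user_session = uuid.uuid4().hex
url, body = build_search_request("my-svc", "hotels", "pool", session_id=user_session)
```

Keeping the session ID scoped to a single user's session is the design choice here: it buys consistency for that user while still letting different users spread across replicas.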
@@ -111,7 +108,7 @@ A scoring profile is part of the index definition, composed of weighted fields,
 
 ## featuresMode parameter (preview)
 
-[Search Documents](/rest/api/searchservice/preview-api/search-documents) requests have a new [featuresMode](/rest/api/searchservice/preview-api/search-documents#featuresmode) parameter that can provide additional detail about relevance at the field level. Whereas the `@searchScore` is calculated for the document all-up (how relevant is this document in the context of this query), through featuresMode you can get information about individual fields, as expressed in a `@search.features` structure. The structure contains all fields used in the query (either specific fields through **searchFields** in a query, or all fields attributed as **searchable** in an index). For each field, you get the following values:
+[Search Documents](/rest/api/searchservice/preview-api/search-documents) requests have a new [featuresMode](/rest/api/searchservice/preview-api/search-documents#featuresmode) parameter that can provide more detail about relevance at the field level. Whereas the `@searchScore` is calculated for the document all-up (how relevant is this document in the context of this query), through featuresMode you can get information about individual fields, as expressed in a `@search.features` structure. The structure contains all fields used in the query (either specific fields through **searchFields** in a query, or all fields attributed as **searchable** in an index). For each field, you get the following values:
 
 + Number of unique tokens found in the field
 + Similarity score, or a measure of how similar the content of the field is, relative to the query term
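To use `featuresMode` output for debugging relevance, you can inspect the per-field values in `@search.features`. A minimal sketch, assuming each result carries a `@search.features` object that maps searched field names to objects containing `similarityScore` and `termFrequency` (the two keys visible in the response fragment in this diff); the helper name `strongest_field` and the sample values are made up for illustration.

```python
def strongest_field(document):
    """Return the searched field with the highest similarityScore, or None.

    Assumes document["@search.features"] maps field names to dicts with at
    least "similarityScore" and "termFrequency" keys.
    """
    features = document.get("@search.features", {})
    if not features:
        return None  # featuresMode was off, or no fields were reported
    return max(features, key=lambda field: features[field]["similarityScore"])


def total_term_frequency(document):
    """Sum termFrequency across all reported fields of one result."""
    features = document.get("@search.features", {})
    return sum(values["termFrequency"] for values in features.values())


# Hypothetical result; the numbers are invented for this example.
result = {
    "@search.score": 2.1,
    "@search.features": {
        "description": {"similarityScore": 1.2, "termFrequency": 6},
        "title": {"similarityScore": 0.4, "termFrequency": 1},
    },
}
```

Comparing fields this way shows which field drove a document's overall score, which is the kind of signal a custom ranking model or a relevance investigation would start from.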
@@ -134,6 +131,9 @@ For a query that targets the "description" and "title" fields, a response that i
 "similarityScore": 1.75451557,
 "termFrequency" : 6
 }
+}
+}
+]
 ```
 
 You can consume these data points in [custom scoring solutions](https://github.com/Azure-Samples/search-ranking-tutorial) or use the information to debug search relevance problems.