Commit db0f1c7

Merge pull request #234905 from HeidiSteen/heidist-refresh
[azure search] GH issue about search score range
2 parents 1a7f458 + 7c7b8aa commit db0f1c7

4 files changed, +45 -29 lines changed

articles/search/index-ranking-similarity.md

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 10/14/2022
+ms.date: 04/18/2023
 ---

 # Configure relevance scoring
@@ -18,10 +18,10 @@ Configuration changes are scoped to individual indexes, which means you can adju

 ## Default scoring algorithm

-Depending on the age of your search service, Azure Cognitive Search supports two [similarity scoring algorithms](index-similarity-and-scoring.md) for assigning relevance to results in a full text search query:
+Depending on the age of your search service, Azure Cognitive Search supports two [similarity scoring algorithms](index-similarity-and-scoring.md) for a full text search query:

-+ An *Okapi BM25* algorithm, used in all search services created after July 15, 2020
-+ A *classic similarity* algorithm, used by all search services created before July 15, 2020
++ Okapi BM25 algorithm (after July 15, 2020)
++ Classic similarity algorithm (before July 15, 2020)

 BM25 ranking is the default because it tends to produce search rankings that align better with user expectations. It includes [parameters](#set-bm25-parameters) for tuning results based on factors such as document size. For search services created after July 2020, BM25 is the only scoring algorithm. If you try to set "similarity" to ClassicSimilarity on a new service, an HTTP 400 error will be returned because that algorithm is not supported by the service.

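BM25's tuning for "factors such as document size" refers to its `k1` and `b` parameters. The following Python sketch uses the textbook Okapi BM25 term weight with assumed defaults `k1=1.2` and `b=0.75`; it may differ in detail from what the service computes, and all statistics are made up. It shows a shorter document outscoring a longer one at the same term frequency, that `b=0` switches length normalization off, and that a single term's contribution can already exceed 1.0.

```python
import math


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Textbook BM25 IDF; the +1 inside the log keeps it non-negative."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    num_docs: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Contribution of one query term to one document's score.

    k1 controls how quickly repeated occurrences stop adding score;
    b controls how strongly document length (relative to the average) is penalized.
    """
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    tf_part = tf * (k1 + 1) / (tf + k1 * length_norm)
    return bm25_idf(num_docs, doc_freq) * tf_part


# Same term frequency in a short and a long document (hypothetical statistics).
short_doc = bm25_term_score(tf=3, doc_len=50, avg_doc_len=200, num_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=3, doc_len=800, avg_doc_len=200, num_docs=1000, doc_freq=10)
print(f"short: {short_doc:.3f}  long: {long_doc:.3f}")

# With b=0, document length no longer affects the score.
flat_short = bm25_term_score(tf=3, doc_len=50, avg_doc_len=200, num_docs=1000, doc_freq=10, b=0.0)
flat_long = bm25_term_score(tf=3, doc_len=800, avg_doc_len=200, num_docs=1000, doc_freq=10, b=0.0)
print(flat_short == flat_long)
```

Lowering `b` toward 0 trades away the length penalty. That can help when long documents are legitimately more relevant, but this sketch says nothing about which setting suits a particular index.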
articles/search/index-similarity-and-scoring.md

Lines changed: 17 additions & 2 deletions
@@ -7,7 +7,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 10/14/2022
+ms.date: 04/18/2023
 ---

 # Relevance and scoring in Azure Cognitive Search
@@ -41,7 +41,12 @@ If you want to break the tie among repeating scores, you can add an **$orderby**

 ## Scoring algorithms in Search

-Azure Cognitive Search provides the `BM25Similarity` ranking algorithm. On older search services, you might be using `ClassicSimilarity`.
+Azure Cognitive Search provides the following scoring algorithms:
+
+| Algorithm | Usage | Range |
+|-----------|-------------|-------|
+| BM25Similarity | Built-in algorithm on all search services created after July 2020. You can tune relevance ranking, but on newer services, changing the algorithm isn't supported. | Unbounded range |
+|ClassicSimilarity | Used on older search services. You can [opt-in for BM25](index-ranking-similarity.md). | 0 < 1.00 |

 Both BM25 and Classic are TF-IDF-like retrieval functions that use the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate relevance scores for each document-query pair, which is then used for ranking results. While conceptually similar to classic, BM25 is rooted in probabilistic information retrieval that produces more intuitive matches, as measured by user research.

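To make "TF-IDF-like" concrete, this Python sketch compares a generic textbook TF-IDF weight (raw term frequency times log IDF; not the exact `ClassicSimilarity` formula) with textbook BM25 for a document of average length as term frequency grows. TF-IDF grows linearly, while BM25 saturates toward `(k1 + 1) × IDF`. Neither function models whatever normalization keeps classic scores below 1.00 on older services; `k1=1.2` and the document counts are assumptions.

```python
import math


def tfidf_term_score(tf: int, num_docs: int, doc_freq: int) -> float:
    """Generic textbook TF-IDF: raw term frequency times log inverse document frequency."""
    return tf * math.log(num_docs / doc_freq)


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Textbook BM25 IDF."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf: int, num_docs: int, doc_freq: int, k1: float = 1.2) -> float:
    """Textbook BM25 for a document of exactly average length (length norm = 1)."""
    return bm25_idf(num_docs, doc_freq) * tf * (k1 + 1) / (tf + k1)


for tf in (1, 2, 5, 10, 50):
    print(f"tf={tf:>2}  tf-idf={tfidf_term_score(tf, 1000, 10):7.2f}  "
          f"bm25={bm25_term_score(tf, 1000, 10):5.2f}")

# BM25's term-frequency factor approaches, but never reaches, (k1 + 1).
bm25_ceiling = (1.2 + 1) * bm25_idf(1000, 10)
```

The saturation is one intuition for why BM25 rankings are often described as more intuitive: repeating a term many times stops paying off.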
@@ -54,6 +59,16 @@ The following video segment fast-forwards to an explanation of the generally ava

 > [!VIDEO https://www.youtube.com/embed/Y_X6USgvB1g?version=3&start=322&end=643]

+## Score variation
+
+Search scores convey general sense of relevance, reflecting the strength of match relative to other documents in the same result set. But scores aren't always consistent from one query to the next, so as you work with queries, you might notice small discrepancies in how search documents are ordered. There are several explanations for why this might occur.
+
+| Cause | Description |
+|-----------|-------------|
+| Data volatility | Index content varies as you add, modify, or delete documents. Term frequencies will change as index updates are processed over time, affecting the search scores of matching documents. |
+| Multiple replicas | For services using multiple replicas, queries are issued against each replica in parallel. The index statistics used to calculate a search score are calculated on a per-replica basis, with results merged and ordered in the query response. Replicas are mostly mirrors of each other, but statistics can differ due to small differences in state. For example, one replica might have deleted documents contributing to their statistics, which were merged out of other replicas. Typically, differences in per-replica statistics are more noticeable in smaller indexes. For more information about this condition, see [Concepts: search units, replicas, partitions, shards](search-capacity-planning.md#concepts-search-units-replicas-partitions-shards) in the capacity planning documentation. |
+| Identical scores | If multiple documents have the same score, any one of them might appear first. |
+
 <a name="scoring-statistics"></a>

 ## Scoring statistics and sticky sessions
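The "Multiple replicas" row can be illustrated with IDF alone. In this Python sketch (textbook BM25 IDF; every document count is hypothetical), two replicas disagree by a single deleted document that one replica has merged out and the other still counts. The same one-document discrepancy shifts IDF, and therefore the score of every document matching the term, far more in a small index than in a large one, which is consistent with the note about smaller indexes.

```python
import math


def bm25_idf(num_docs: int, doc_freq: int) -> float:
    """Textbook BM25 IDF, computed from one replica's own statistics."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def relative_gap(x: float, y: float) -> float:
    """Difference between two positive values, relative to the larger one."""
    return abs(x - y) / max(x, y)


# Hypothetical small index: replica B still counts one deleted document
# (containing the term) that replica A has already merged out.
small_a = bm25_idf(num_docs=20, doc_freq=3)
small_b = bm25_idf(num_docs=21, doc_freq=4)

# The same one-document discrepancy in a hypothetical large index.
large_a = bm25_idf(num_docs=200_000, doc_freq=3_000)
large_b = bm25_idf(num_docs=200_001, doc_freq=3_001)

print(f"small index: {relative_gap(small_a, small_b):.2%} IDF difference between replicas")
print(f"large index: {relative_gap(large_a, large_b):.4%} IDF difference between replicas")
```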

articles/search/search-pagination-page-layout.md

Lines changed: 18 additions & 20 deletions
@@ -8,7 +8,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 11/02/2022
+ms.date: 04/18/2023
 ---

 # How to work with search results in Azure Cognitive Search
@@ -27,9 +27,9 @@ Parameters on the query determine:

 Results are tabular, composed of fields of either all "retrievable" fields, or limited to just those fields specified in the **`$select`** parameters. Rows are the matching documents.

-While a search document might consist of a large number of fields, typically only a few are needed to represent each document in the result set. On a query request, append `$select=<field list>` to specify which fields include in the response. A field must be attributed as "retrievable" in the index to be included in a result.
+You can choose which fields are in search results. While a search document might have a large number of fields, typically only a few are needed to represent each document in results. On a query request, append `$select=<field list>` to specify which "retrievable" fields should appear in the response.

-Fields that work best include those that contrast and differentiate among documents, providing sufficient information to invite a click-through response on the part of the user. On an e-commerce site, it might be a product name, description, brand, color, size, price, and rating. For the built-in hotels-sample index, it might be the "select" fields in the following example:
+Pick fields that offer contrast and differentiation among documents, providing sufficient information to invite a click-through response on the part of the user. On an e-commerce site, it might be a product name, description, brand, color, size, price, and rating. For the built-in hotels-sample index, it might be the "select" fields in the following example:

 ```http
 POST /indexes/hotels-sample-index/docs/search?api-version=2020-06-30
@@ -41,7 +41,7 @@ POST /indexes/hotels-sample-index/docs/search?api-version=2020-06-30
 ```

 > [!NOTE]
-> If want to include image files in a result, such as a product photo or logo, store them outside of Azure Cognitive Search, but include a field in your index to reference the image URL in the search document. Sample indexes that support images in the results include the **realestate-sample-us** demo (a built-in sample dataset that you can build easily in the Import Data wizard), and the [New York City Jobs demo app](https://aka.ms/azjobsdemo).
+> For images in results, such as a product photo or logo, store them outside of Azure Cognitive Search, but add a field in your index to reference the image URL in the search document. Sample indexes that demonstrate images in the results include the **realestate-sample-us** demo (a built-in sample dataset that you can build easily in the Import Data wizard), and the [New York City Jobs demo app](https://aka.ms/azjobsdemo).


 ### Tips for unexpected results

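A minimal Python sketch of how `$select` narrows the response. It builds a request body for the POST form of the search API (the `search`, `select`, and `count` body properties) and mimics, client-side, the rule that only "retrievable" fields come back. The schema, the field values, the `InternalNotes` field, and the choice to reject non-retrievable fields locally are illustrative assumptions, not service behavior.

```python
# Hypothetical subset of a hotels-style schema: field name -> "retrievable" attribute.
SCHEMA = {
    "HotelId": True,
    "HotelName": True,
    "Description": True,
    "Rating": True,
    "Tags": True,
    "InternalNotes": False,  # hypothetical non-retrievable field
}


def build_search_body(search: str, select: list[str]) -> dict:
    """Build a docs/search POST body; 'select' is a comma-separated field list."""
    for field in select:
        if field not in SCHEMA:
            raise ValueError(f"unknown field: {field}")
        if not SCHEMA[field]:
            raise ValueError(f"field is not retrievable: {field}")
    return {"search": search, "select": ", ".join(select), "count": True}


def project(doc: dict, select: list[str]) -> dict:
    """Mimic the response shape: only selected, retrievable fields are returned."""
    return {f: doc[f] for f in select if SCHEMA.get(f, False) and f in doc}


body = build_search_body("beach access", ["HotelName", "Description", "Rating"])
print(body)

stored = {"HotelId": "1", "HotelName": "Ocean Breeze", "Description": "Steps from the sand.",
          "Rating": 4.5, "Tags": ["pool"], "InternalNotes": "renovate lobby"}
print(project(stored, ["HotelName", "Rating"]))
```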
@@ -66,7 +66,7 @@ Count won't be affected by routine maintenance or other workloads on the search

 ## Paging results

-By default, the search engine returns up to the first 50 matches. The top 50 are determined by search score, assuming the query is full text search or semantic search. Otherwise, the top 50 are an arbitrary order for exact match queries (where "@searchScore=1.0").
+By default, the search engine returns up to the first 50 matches. The top 50 are determined by search score, assuming the query is full text search or semantic search. Otherwise, the top 50 are an arbitrary order for exact match queries (where uniform "@searchScore=1.0" indicates arbitrary ranking).

 To control the paging of all documents returned in a result set, add `$top` and `$skip` parameters to the query request. The following list explains the logic.

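`$top` and `$skip` amount to slicing a ranked list, which also explains the duplicate-fetch scenario the article describes ("document 2 is fetched twice") when the index changes between page requests. This Python sketch simulates that locally; `get_page`, the document IDs, and the ranking are hypothetical, and the default of 50 mirrors the text above.

```python
def get_page(ranked_ids: list[str], top: int = 50, skip: int = 0) -> list[str]:
    """Slice a ranked result set the way $top/$skip page through it (default top=50)."""
    if top < 0 or skip < 0:
        raise ValueError("top and skip must be non-negative")
    return ranked_ids[skip:skip + top]


# Hypothetical ranked result set for a query, highest score first.
ranked = ["1", "2", "3", "4"]
first_page = get_page(ranked, top=2, skip=0)    # ['1', '2']

# Before the next request, a new document "5" is indexed and outranks the rest.
ranked = ["5"] + ranked
second_page = get_page(ranked, top=2, skip=2)   # ['2', '3']

print(first_page, second_page, set(first_page) & set(second_page))
```

Because each request is evaluated against the index as it exists at that moment, a client that needs stable pages might track IDs it has already shown; that mitigation is a suggestion, not something this article prescribes.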
@@ -103,33 +103,31 @@ Notice that document 2 is fetched twice. This is because the new document 5 has

 ## Ordering results

-In a full text search query, results can be ranked by a search score, a semantic reranker score (if using [semantic search](semantic-search-overview.md)), or by an **`$orderby`** expression in the query request that specifies an explicit sort order.
+In a full text search query, results can be ranked by:

-Sorting methodologies aren't designed to be used together. For example, if you're sorting with **`$orderby`** for primary sorting, you can't apply a secondary sort based on search score (because the search score will be uniform).
++ a search score
++ a semantic reranker score
++ a sort order on a "sortable" field

-### Ordering by search score
+You can also boost any matches found in specific fields by adding a scoring profile.

-For full text search queries, results are automatically ranked by a search score, calculated based on term frequency and proximity in a document (derived from [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)), with higher scores going to documents having more or stronger matches on a search term.
+### Order by search score

-The "@search.score" range is 0 up to (but not including) 1.00. A "@search.score" equal to 1.00 indicates an unscored or unranked result set, where the 1.0 score is uniform across all results. Unscored results occur when the query form is fuzzy search, wildcard or regex queries, or an empty search (`search=*`). If you need to impose a ranking structure over unscored results, an **`$orderby`** expression will help you achieve that objective.
+For full text search queries, results are automatically [ranked by a search score](index-similarity-and-scoring.md), calculated based on term frequency and proximity in a document (derived from [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)), with higher scores going to documents having more or stronger matches on a search term.

-Search scores convey general sense of relevance, reflecting the strength of match relative to other documents in the same result set. But scores aren't always consistent from one query to the next, so as you work with queries, you might notice small discrepancies in how search documents are ordered. There are several explanations for why this might occur.
+The "@search.score" range is either unbounded, or 0 up to (but not including) 1.00 on older services.

-| Cause | Description |
-|-----------|-------------|
-| Data volatility | Index content varies as you add, modify, or delete documents. Term frequencies will change as index updates are processed over time, affecting the search scores of matching documents. |
-| Multiple replicas | For services using multiple replicas, queries are issued against each replica in parallel. The index statistics used to calculate a search score are calculated on a per-replica basis, with results merged and ordered in the query response. Replicas are mostly mirrors of each other, but statistics can differ due to small differences in state. For example, one replica might have deleted documents contributing to their statistics, which were merged out of other replicas. Typically, differences in per-replica statistics are more noticeable in smaller indexes. For more information about this condition, see [Concepts: search units, replicas, partitions, shards](search-capacity-planning.md#concepts-search-units-replicas-partitions-shards) in the capacity planning documentation. |
-| Identical scores | If multiple documents have the same score, any one of them might appear first. |
+For either algorithm, a "@search.score" equal to 1.00 indicates an unscored or unranked result set, where the 1.0 score is uniform across all results. Unscored results occur when the query form is fuzzy search, wildcard or regex queries, or an empty search (`search=*`). If you need to impose a ranking structure over unscored results, consider an **`$orderby`** expression to achieve that objective.

-### Ordering by the semantic reranker
+### Order by the semantic reranker

 If you're using [semantic search](semantic-search-overview.md), the "@search.rerankerScore" determines the sort order of your results.

 The "@search.rerankerScore" range is 1 to 4.00, where a higher score indicates a stronger semantic match.

-### Ordering with $orderby
+### Order with $orderby

-If consistent ordering is an application requirement, you can explicitly define an [**`$orderby`** expression](query-odata-filter-orderby-syntax.md) on a field. Only fields that are indexed as "sortable" can be used to order results.
+If consistent ordering is an application requirement, you can define an [**`$orderby`** expression](query-odata-filter-orderby-syntax.md) on a field. Only fields that are indexed as "sortable" can be used to order results.

 Fields commonly used in an **`$orderby`** include rating, date, and location. Filtering by location requires that the filter expression calls the [**`geo.distance()` function**](search-query-odata-geo-spatial-functions.md?#order-by-examples), in addition to the field name.

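To impose order on an unscored result set, the article points to `$orderby`, which the service applies server-side on a "sortable" field. The Python sketch below only mimics that on a local copy of results so the effect is visible; the helper functions, field names, and values are hypothetical. In a real request you would send something like `$orderby=Rating desc` rather than sorting client-side, because a client-side sort only reorders the page already received.

```python
def is_unscored(results: list[dict]) -> bool:
    """True when every result carries the uniform 1.0 score of an unscored query."""
    return bool(results) and all(r["@search.score"] == 1.0 for r in results)


def order_by(results: list[dict], field: str, descending: bool = True) -> list[dict]:
    """Client-side stand-in for an $orderby on one "sortable" field; ties keep input order."""
    return sorted(results, key=lambda r: r[field], reverse=descending)


# Hypothetical results of a wildcard query such as search=*: every score is 1.0,
# so the order the results arrive in carries no relevance information.
results = [
    {"HotelId": "10", "@search.score": 1.0, "Rating": 3.6},
    {"HotelId": "11", "@search.score": 1.0, "Rating": 4.8},
    {"HotelId": "12", "@search.score": 1.0, "Rating": 2.9},
]

if is_unscored(results):
    results = order_by(results, "Rating")
print([r["HotelId"] for r in results])   # ['11', '10', '12']
```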
@@ -143,7 +141,7 @@ String fields (Edm.String, Edm.ComplexType subfields) are sorted in either [ASCI

 + Strings that lead with diacritics appear last (Äpfel, Öffnen, Üben)

-### Use a scoring profile to influence relevance
+### Boost relevance using a scoring profile

 Another approach that promotes order consistency is using a [custom scoring profile](index-add-scoring-profiles.md). Scoring profiles give you more control over the ranking of items in search results, with the ability to boost matches found in specific fields. The extra scoring logic can help override minor differences among replicas because the search scores for each document are farther apart. We recommend the [ranking algorithm](index-ranking-similarity.md) for this approach.

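The claim that a scoring profile helps override minor replica differences by pushing scores farther apart can be shown with a deliberately simplified model. This Python sketch treats a profile as per-field weights multiplied into per-field match scores. That model, the `HotelName` weight, the scores, and the ±0.1 "drift" are assumptions for illustration only, not the service's scoring-profile formula.

```python
def weighted_score(field_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Simplified model: each field's match score times its weight (default weight 1.0)."""
    return sum(score * weights.get(field, 1.0) for field, score in field_scores.items())


# Hypothetical per-field match scores for two documents.
doc_a = {"HotelName": 0.6, "Description": 1.4}   # query term matched in the name
doc_b = {"HotelName": 0.0, "Description": 2.05}  # matched only in the description


def rank(weights: dict[str, float], drift_on_b: float = 0.0) -> list[str]:
    """Order the two documents; drift_on_b simulates a small per-replica score difference."""
    a = weighted_score(doc_a, weights)
    b = weighted_score(doc_b, weights) + drift_on_b
    return ["a", "b"] if a > b else ["b", "a"]


no_profile: dict[str, float] = {}
name_boost = {"HotelName": 5.0}   # hypothetical weight for the HotelName field

# Without a profile, a +/-0.1 drift flips the order; with the boost it does not.
print(rank(no_profile, +0.1), rank(no_profile, -0.1))
print(rank(name_boost, +0.1), rank(name_boost, -0.1))
```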
articles/search/search-query-odata-search-score-function.md

Lines changed: 6 additions & 3 deletions
@@ -8,7 +8,7 @@ author: bevloh
 ms.author: beloh
 ms.service: cognitive-search
 ms.topic: reference
-ms.date: 09/16/2021
+ms.date: 04/18/2023
 translation.priority.mt:
 - "de-de"
 - "es-es"
@@ -23,11 +23,14 @@ translation.priority.mt:
 ---
 # OData `search.score` function in Azure Cognitive Search

-When you send a query to Azure Cognitive Search without the [**$orderby** parameter](search-query-odata-orderby.md), the results that come back will be sorted in descending order by relevance score. Even when you do use **$orderby**, the relevance score will be used to break ties by default. However, sometimes it is useful to use the relevance score as an initial sort criteria, and some other criteria as the tie-breaker. The `search.score` function allows you to do this.
+When you send a query to Azure Cognitive Search without the [**$orderby** parameter](search-query-odata-orderby.md), the results that come back will be sorted in descending order by relevance score. Even when you do use **$orderby**, the relevance score is used to break ties by default. However, sometimes it's useful to use the relevance score as an initial sort criteria, and some other criteria as the tie-breaker. The example in this article demonstrates using the `search.score` function for sorting.
+
+> [!NOTE]
+> The relevance score is computed by the similarity ranking algorithm, and the range varies depending on which algorithm you use. For more information, see [Relevance and scoring in Azure Cognitive Search](index-similarity-and-scoring.md).

 ## Syntax

-The syntax for `search.score` in **$orderby** is `search.score()`. The function `search.score` does not take any parameters. It can be used with the `asc` or `desc` sort-order specifier, just like any other clause in the **$orderby** parameter. It can appear anywhere in the list of sort criteria.
+The syntax for `search.score` in **$orderby** is `search.score()`. The function `search.score` doesn't take any parameters. It can be used with the `asc` or `desc` sort-order specifier, just like any other clause in the **$orderby** parameter. It can appear anywhere in the list of sort criteria.

 ## Example

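The difference between default tie-breaking and an explicit `search.score()` clause can be sketched locally. This Python example mimics two sort lists, `search.score() desc, Rating desc` and `Rating desc, search.score() desc` (the second approximating the default "relevance breaks ties" behavior), over a hypothetical in-memory result set. The `Rating` field, the scores, and the `sort_results` helper are assumptions; the service performs this ordering server-side.

```python
def sort_results(docs: list[dict], orderby: list[tuple[str, bool]]) -> list[dict]:
    """Apply (key, descending) sort clauses like an $orderby list; the first clause wins.

    The key "search.score()" refers to each document's @search.score; any other
    key is treated as a field name. Sorting by the last clause first and relying
    on Python's stable sort gives the same result as a multi-key comparison.
    """
    ordered = list(docs)
    for key, descending in reversed(orderby):
        field = "@search.score" if key == "search.score()" else key
        ordered.sort(key=lambda d: d[field], reverse=descending)
    return ordered


def ids(rows: list[dict]) -> list[str]:
    return [r["HotelId"] for r in rows]


docs = [
    {"HotelId": "a", "@search.score": 2.5, "Rating": 3},
    {"HotelId": "b", "@search.score": 2.5, "Rating": 5},
    {"HotelId": "c", "@search.score": 1.0, "Rating": 4},
    {"HotelId": "d", "@search.score": 3.0, "Rating": 4},
]

# search.score() desc, Rating desc: relevance first, Rating breaks ties.
print(ids(sort_results(docs, [("search.score()", True), ("Rating", True)])))  # ['d', 'b', 'a', 'c']

# Rating desc, search.score() desc: Rating first, relevance breaks ties.
print(ids(sort_results(docs, [("Rating", True), ("search.score()", True)])))  # ['b', 'd', 'c', 'a']
```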
0 commit comments
