Commit ef41c8f

Updated relevance ranking doc for upgrade scenario
1 parent 56cc6a7 commit ef41c8f

3 files changed

+67
-62
lines changed

articles/search/index-ranking-similarity.md

Lines changed: 38 additions & 40 deletions
Original file line number | Diff line number | Diff line change
@@ -1,36 +1,57 @@
11
---
2-
title: Configure the similarity algorithm
2+
title: Configure BM25 similarity algorithm
33
titleSuffix: Azure Cognitive Search
4-
description: Learn how to enable BM25 on older search services, and how BM25 parameters can be modified to better accommodate the content of your indexes.
4+
description: Enable Okapi BM25 ranking to upgrade the search ranking and relevance behavior on older Azure Search services.
55

6-
author: nitinme
7-
ms.author: nitinme
6+
author: HeidiSteen
7+
ms.author: heidist
88
ms.service: cognitive-search
9-
ms.topic: conceptual
10-
ms.date: 03/12/2021
9+
ms.topic: how-to
10+
ms.date: 06/22/2022
1111
---
1212

1313
# Configure the similarity ranking algorithm in Azure Cognitive Search
1414

15-
Azure Cognitive Search supports two similarity ranking algorithms:
15+
Depending on the age of your search service, Azure Cognitive Search supports two [similarity ranking algorithms](index-similarity-and-scoring.md) for scoring relevance on full text search results:
1616

17-
+ A *classic similarity* algorithm, used by all search services up until July 15, 2020.
18-
+ An implementation of the *Okapi BM25* algorithm, used in all search services created after July 15.
17+
+ An *Okapi BM25* algorithm, used in all search services created after July 15, 2020
18+
+ A *classic similarity* algorithm, used by all search services created before July 15, 2020
1919

20-
BM25 ranking is the new default because it tends to produce search rankings that align better with user expectations. It comes with [parameters](#set-bm25-parameters) for tuning results based on factors such as document size.
20+
BM25 ranking is the default because it tends to produce search rankings that align better with user expectations. It includes [parameters](#set-bm25-parameters) for tuning results based on factors such as document size.
2121

22-
For new services created after July 15, 2020, BM25 is used automatically and is the sole similarity algorithm. If you try to set similarity to ClassicSimilarity on a new service, an HTTP 400 error will be returned because that algorithm is not supported by the service.
22+
For search services created after July 2020, BM25 is the sole similarity algorithm. If you try to set similarity to ClassicSimilarity on a new service, an HTTP 400 error will be returned because that algorithm is not supported by the service.
2323

24-
For older services created before July 15, 2020, classic similarity remains the default algorithm. Older services can upgrade to BM25 on a per-index basis, as explained below. If you are switching from classic to BM25, you can expect to see some differences how search results are ordered.
24+
For older services, classic similarity remains the default algorithm. Older services can [upgrade to BM25](#enable-bm25-scoring-on-older-services) on a per-index basis. When switching from classic to BM25, you can expect to see some differences in how search results are ordered.
2525

26-
> [!NOTE]
27-
> Semantic ranking, currently in preview for standard services in selected regions, is an additional step forward in producing more relevant results. Unlike the other algorithms, it is an add-on feature that iterates over an existing result set. For more information, see [Semantic search overview](semantic-search-overview.md) and [Semantic ranking](semantic-ranking.md).
26+
## Set BM25 parameters
27+
28+
BM25 similarity adds two parameters to control the relevance score calculation. To set "similarity" parameters, issue a [Create or Update Index](/rest/api/searchservice/create-index) request as illustrated by the following example.
29+
30+
Because Cognitive Search won't allow updates to a live index, you'll need to take the index offline so that the parameters can be added. Indexing and query requests will fail while the index is offline. The duration of the outage is the amount of time it takes to update the index, usually no more than several seconds. When the update is complete, the index comes back automatically. To take the index offline, append the "allowIndexDowntime=true" URI parameter on the request that sets the "similarity" property:
31+
32+
```http
33+
PUT https://[search service name].search.windows.net/indexes/[index name]?api-version=2020-06-30&allowIndexDowntime=true
34+
{
35+
"similarity": {
36+
"@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
37+
"b" : 0.5,
38+
"k1" : 1.3
39+
}
40+
}
41+
```
42+
43+
### BM25 property reference
44+
45+
| Property | Type | Description |
46+
|----------|------|-------------|
47+
| k1 | number | Controls the scaling function between the term frequency of each matching term and the final relevance score of a document-query pair. Values are usually 0.0 to 3.0, with 1.2 as the default. </br></br>A value of 0.0 represents a "binary model", where the contribution of a single matching term is the same for all matching documents, regardless of how many times that term appears in the text, while a larger k1 value allows the score to continue to increase as more instances of the same term are found in the document. </br></br>Using a higher k1 value can be important in cases where we expect multiple terms to be part of a search query. In those cases, we might want to favor documents that match many of the different query terms being searched over documents that only match a single one, multiple times. For example, when querying the index for documents containing the terms "Apollo Spaceflight", we might want to lower the score of an article about Greek Mythology that contains the term "Apollo" a few dozen times, without mentions of "Spaceflight", compared to another article that explicitly mentions both "Apollo" and "Spaceflight" a handful of times only. |
48+
| b | number | Controls how the length of a document affects the relevance score. Values are between 0 and 1, with 0.75 as the default. </br></br>A value of 0.0 means the length of the document will not influence the score, while a value of 1.0 means the impact of term frequency on relevance score will be normalized by the document's length. </br></br>Normalizing the term frequency by the document's length is useful in cases where we want to penalize longer documents. In some cases, longer documents (such as a complete novel), are more likely to contain many irrelevant terms, compared to much shorter documents. |
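
The effect of `k1` and `b` described in the table can be illustrated with a minimal Python sketch. It assumes the textbook Okapi BM25 term-frequency component (with the IDF factor left out); the function name `bm25_term_weight` and the sample numbers are hypothetical, and the exact formula used inside Azure Cognitive Search may differ.

```python
def bm25_term_weight(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency component for one term in one document.

    Assumption: this is the classic Okapi BM25 formula without the IDF factor,
    not necessarily the exact computation performed by the search service.
    """
    if tf <= 0:
        return 0.0
    # b controls how strongly document length rescales the term frequency:
    # b = 0 ignores length entirely, b = 1 applies full length normalization.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls saturation: with k1 = 0 the result is 1.0 for any tf > 0
    # (the "binary model"); larger k1 lets repeated terms keep adding weight.
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Compare how the weight grows with term frequency for several k1 values.
    for k1 in (0.0, 1.2, 3.0):
        weights = [round(bm25_term_weight(tf, 100, 100, k1=k1), 3) for tf in (1, 5, 50)]
        print(f"k1={k1}: {weights}")

    # Compare a short and a long document for several b values.
    for b in (0.0, 0.75, 1.0):
        short_doc = bm25_term_weight(3, 100, 100, b=b)
        long_doc = bm25_term_weight(3, 1000, 100, b=b)
        print(f"b={b}: short={short_doc:.3f} long={long_doc:.3f}")
```

Running the script prints the weights side by side, which makes it easier to see why a value of 0.0 for `k1` behaves like a binary match and why a value of 0.0 for `b` removes the document-length penalty.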
2849

2950
## Enable BM25 scoring on older services
3051

31-
If you are running a search service that was created prior to July 15, 2020, you can enable BM25 by setting a Similarity property on new indexes. The property is only exposed on new indexes, so if want BM25 on an existing index, you must drop and [rebuild the index](search-howto-reindex.md) with a new Similarity property set to "Microsoft.Azure.Search.BM25Similarity".
52+
If you are running a search service that was created from March 2014 through July 15, 2020, you can enable BM25 by setting a "similarity" property on new indexes. The property is only exposed on new indexes, so if you want BM25 on an existing index, you must drop and [rebuild the index](search-howto-reindex.md) with a "similarity" property set to "Microsoft.Azure.Search.BM25Similarity".
3253

33-
Once an index exists with a Similarity property, you can switch between BM25Similarity or ClassicSimilarity.
54+
Once an index exists with a "similarity" property, you can switch between `BM25Similarity` and `ClassicSimilarity`.
3455

3556
The following links describe the Similarity property in the Azure SDKs.
3657

@@ -69,32 +90,9 @@ PUT https://[search service name].search.windows.net/indexes/[index name]?api-ve
6990
}
7091
```
7192

72-
## Set BM25 parameters
73-
74-
BM25 similarity adds two user customizable parameters to control the calculated relevance score. You can set BM25 parameters during index creation, or as an index update if the BM25 algorithm was specified during index creation.
75-
76-
| Property | Type | Description |
77-
|----------|------|-------------|
78-
| k1 | number | Controls the scaling function between the term frequency of each matching terms to the final relevance score of a document-query pair. Values are usually 0.0 to 3.0, with 1.2 as the default. </br></br>A value of 0.0 represents a "binary model", where the contribution of a single matching term is the same for all matching documents, regardless of how many times that term appears in the text, while a larger k1 value allows the score to continue to increase as more instances of the same term is found in the document. </br></br>Using a higher k1 value can be important in cases where we expect multiple terms to be part of a search query. In those cases, we might want to favor documents that match many of the different query terms being searched over documents that only match a single one, multiple times. For example, when querying the index for documents containing the terms "Apollo Spaceflight", we might want to lower the score of an article about Greek Mythology that contains the term "Apollo" a few dozen times, without mentions of "Spaceflight", compared to another article that explicitly mentions both "Apollo" and "Spaceflight" a handful of times only. |
79-
| b | number | Controls how the length of a document affects the relevance score. Values are between 0 and 1, with 0.75 as the default. </br></br>A value of 0.0 means the length of the document will not influence the score, while a value of 1.0 means the impact of term frequency on relevance score will be normalized by the document's length. </br></br>Normalizing the term frequency by the document's length is useful in cases where we want to penalize longer documents. In some cases, longer documents (such as a complete novel), are more likely to contain many irrelevant terms, compared to much shorter documents. |
80-
81-
### Setting k1 and b parameters
82-
83-
To set or modify b or k1 values, add them to the BM25 similarity object. Setting or changing these values on an existing index will take the index offline for at least a few seconds, causing active indexing and query requests to fail. Consequently, you should set the "allowIndexDowntime=true" parameter of the update request:
84-
85-
```http
86-
PUT https://[search service name].search.windows.net/indexes/[index name]?api-version=2020-06-30&allowIndexDowntime=true
87-
{
88-
"similarity": {
89-
"@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
90-
"b" : 0.5,
91-
"k1" : 1.3
92-
}
93-
}
94-
```
95-
9693
## See also
9794

95+
+ [Similarity and scoring in Azure Cognitive Search](index-similarity-and-scoring.md)
9896
+ [REST API Reference](/rest/api/searchservice/)
9997
+ [Add scoring profiles to your index](index-add-scoring-profiles.md)
10098
+ [Create Index API](/rest/api/searchservice/create-index)

articles/search/index-similarity-and-scoring.md

Lines changed: 27 additions & 22 deletions
@@ -1,41 +1,31 @@
11
---
2-
title: Similarity and scoring overview
2+
title: Similarity and scoring
33
titleSuffix: Azure Cognitive Search
4-
description: Explains the concepts of similarity and scoring, and what a developer can do to customize the scoring result.
4+
description: Explains the concepts of similarity and scoring in Azure Cognitive Search, and what a developer can do to customize the scoring result.
55

66
author: HeidiSteen
77
ms.author: heidist
88
ms.service: cognitive-search
99
ms.topic: conceptual
10-
ms.date: 11/30/2021
10+
ms.date: 06/22/2022
1111
---
12-
# Similarity and scoring in Azure Cognitive Search
13-
14-
This article describes the similarity ranking algorithms used by Azure Cognitive Search to determine which matching documents are the most relevant in a [full text search query](search-lucene-query-architecture.md). This article also introduces two related features: *scoring profiles* (criteria for boosting the relevance of a specific match) and the *featuresMode* parameter (unpacks a search score to show more detail).
15-
16-
> [!NOTE]
17-
> A third [semantic re-ranking algorithm](semantic-ranking.md) is currently in public preview. For more information, start with [Semantic search overview](semantic-search-overview.md).
18-
19-
## Similarity ranking algorithms
2012

21-
Azure Cognitive Search supports two similarity ranking algorithms.
13+
# Similarity and scoring in Azure Cognitive Search
2214

23-
| Algorithm | Score | Availability |
24-
|-----------|-------|--------------|
25-
| BM25Similarity | @search.score | Used by all search services created after July 15, 2020. |
26-
| ClassicSimilarity | @search.score | Used by all search services created from March 2014 through July 15, 2020. Older services that use classic by default can [opt in to BM25](index-ranking-similarity.md). |
15+
This article describes relevance scoring and the similarity ranking algorithms used to rank search results in Azure Cognitive Search. A relevance score applies to matches returned in a [full text search query](search-lucene-query-architecture.md). Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries are not scored or ranked.
2716

28-
Both BM25 and Classic are TF-IDF-like retrieval functions that use the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate relevance scores for each document-query pair, which is then used for ranking. While conceptually similar to classic, BM25 is rooted in probabilistic information retrieval that produces more intuitive matches, as measured by user research. BM25 also offers advanced customization options, such as allowing the user to decide how the relevance score scales with the term frequency of matched terms.
17+
In Azure Cognitive Search, you can tune search relevance and boost search scores through these mechanisms:
2918

30-
The following video segment fast-forwards to an explanation of the generally available ranking algorithms used in Azure Cognitive Search. You can watch the full video for more background.
31-
32-
> [!VIDEO https://www.youtube.com/embed/Y_X6USgvB1g?version=3&start=322&end=643]
19+
+ Similarity ranking configuration
20+
+ Semantic ranking (in preview, described in [this article](semantic-ranking.md))
21+
+ Scoring profiles
22+
+ Custom scoring logic enabled through the *featuresMode* parameter
3323

3424
## Relevance scoring
3525

36-
Scoring refers to the computation of a search score for every item returned in search results for full text search queries. The score is an indicator of an item's relevance in the context of the current query. The higher the score, the more relevant the item. In search results, items are rank ordered from high to low, based on the search scores calculated for each item. The score is returned in the response as "@search.score" on every document.
26+
Relevance scoring refers to the computation of a search score for every item returned in search results for full text search queries. The score is an indicator of an item's relevance in the context of the current query. The higher the score, the more relevant the item.
3727

38-
By default, the top 50 are returned in the response, but you can use the **$top** parameter to return a smaller or larger number of items (up to 1000 in a single response), and **$skip** to get the next set of results.
28+
In search results, items are rank ordered from high to low, based on the search scores calculated for each item. The score is returned in the response as "@search.score" on every document. By default, the top 50 are returned in the response, but you can use the **$top** parameter to return a smaller or larger number of items (up to 1000 in a single response), and **$skip** to get the next set of results.
3929

4030
The search score is computed based on statistical properties of the data and the query. Azure Cognitive Search finds documents that match on search terms (some or all, depending on [searchMode](/rest/api/searchservice/search-documents#query-parameters)), favoring documents that contain many instances of the search term. The search score goes up even higher if the term is rare across the data index, but common within the document. The basis for this approach to computing relevance is known as *TF-IDF*, or term frequency-inverse document frequency.
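
As a rough illustration of the TF-IDF idea in the paragraph above, the following Python sketch scores a term higher when it is rare across a small corpus. It assumes one common TF-IDF variant (raw term count multiplied by `log(N / df) + 1`); the function name `tf_idf` and the toy corpus are hypothetical, and this is not the formula used by the search service.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """Toy TF-IDF: raw term count times a smoothed inverse document frequency.

    Assumption: a simple textbook variant, chosen only to show that rare terms
    contribute more than terms that appear in every document.
    """
    tf = doc_tokens.count(term)                                # occurrences in this document
    df = sum(1 for tokens in corpus if term in tokens)         # documents containing the term
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log(len(corpus) / df) + 1.0
    return tf * idf


if __name__ == "__main__":
    # Hypothetical three-document corpus, already tokenized.
    corpus = [
        ["apollo", "spaceflight", "mission"],
        ["apollo", "myth"],
        ["apollo", "temple"],
    ]
    first_doc = corpus[0]
    # "spaceflight" appears in one document, "apollo" in all three.
    print("spaceflight:", tf_idf("spaceflight", first_doc, corpus))
    print("apollo:", tf_idf("apollo", first_doc, corpus))
```

In this toy corpus, "spaceflight" receives the larger weight in the first document because it occurs in only one of the three documents, while "apollo" occurs in all of them.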
4131

@@ -46,6 +36,21 @@ If you want to break the tie among repeating scores, you can add an **$orderby**
4636
> [!NOTE]
4737
> A `@search.score = 1` indicates an un-scored or un-ranked result set. The score is uniform across all results. Un-scored results occur when the query form is fuzzy search, wildcard or regex queries, or an empty search (`search=*`, sometimes paired with filters, where the filter is the primary means for returning a match).
4838
39+
## Similarity ranking algorithms
40+
41+
Azure Cognitive Search provides the `BM25Similarity` ranking algorithm. On older search services, you might be using `ClassicSimilarity`.
42+
43+
Both BM25 and Classic are TF-IDF-like retrieval functions that use the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate relevance scores for each document-query pair, which are then used for ranking results. While conceptually similar to classic, BM25 is rooted in probabilistic information retrieval that produces more intuitive matches, as measured by user research.
44+
45+
BM25 offers advanced customization options, such as allowing the user to decide how the relevance score scales with the term frequency of matched terms. For more information, see [Configure the similarity ranking algorithm](index-ranking-similarity.md).
46+
47+
> [!NOTE]
48+
> If you're using a search service that was created before July 2020, the similarity algorithm is most likely the previous default, `ClassicSimilarity`, which you can upgrade on a per-index basis. See [Enable BM25 scoring on older services](index-ranking-similarity.md#enable-bm25-scoring-on-older-services) for details.
49+
50+
The following video segment fast-forwards to an explanation of the generally available ranking algorithms used in Azure Cognitive Search. You can watch the full video for more background.
51+
52+
> [!VIDEO https://www.youtube.com/embed/Y_X6USgvB1g?version=3&start=322&end=643]
53+
4954
<a name="scoring-statistics"></a>
5055

5156
## Scoring statistics and sticky sessions

articles/search/search-dotnet-sdk-migration-version-11.md

Lines changed: 2 additions & 0 deletions
@@ -268,6 +268,8 @@ In terms of service version updates, where code changes in version 11 relate to
268268

269269
+ [Ordered results](search-query-odata-orderby.md) for null values have changed in this version, with null values appearing first if the sort is `asc` and last if the sort is `desc`. If you wrote code to handle how null values are sorted, you should review and potentially remove that code if it's no longer necessary.
270270

271+
Due to these behavior changes, it's likely that you'll see slight variations in ranked results.
272+
271273
## Next steps
272274

273275
+ [How to use Azure.Search.Documents in a C# .NET Application](search-howto-dotnet-sdk.md)
