Updated relevance ranking doc for upgrade scenario

HeidiSteen · HeidiSteen · commit ef41c8f36944 · 2022-06-22T11:52:56.000-07:00
diff --git a/articles/search/index-ranking-similarity.md b/articles/search/index-ranking-similarity.md
@@ -1,36 +1,57 @@
 ---
-title: Configure the similarity algorithm
+title: Configure BM25 similarity algorithm
 titleSuffix: Azure Cognitive Search
-description: Learn how to enable BM25 on older search services, and how BM25 parameters can be modified to better accommodate the content of your indexes.
+description: Enable Okapi BM25 ranking to upgrade the search ranking and relevance behavior on older Azure Search services.
 
-author: nitinme
-ms.author: nitinme
+author: HeidiSteen
+ms.author: heidist
 ms.service: cognitive-search
-ms.topic: conceptual
-ms.date: 03/12/2021
+ms.topic: how-to
+ms.date: 06/22/2022
 ---
 
 # Configure the similarity ranking algorithm in Azure Cognitive Search
 
-Azure Cognitive Search supports two similarity ranking algorithms:
+Depending on the age of your search service, Azure Cognitive Search supports two [similarity ranking algorithms](index-similarity-and-scoring.md) for scoring relevance on full text search results:
 
-+ A *classic similarity* algorithm, used by all search services up until July 15, 2020.
-+ An implementation of the *Okapi BM25* algorithm, used in all search services created after July 15.
++ An *Okapi BM25* algorithm, used in all search services created after July 15, 2020
++ A *classic similarity* algorithm, used by all search services created before July 15, 2020
 
-BM25 ranking is the new default because it tends to produce search rankings that align better with user expectations. It comes with [parameters](#set-bm25-parameters) for tuning results based on factors such as document size. 
+BM25 ranking is the default because it tends to produce search rankings that align better with user expectations. It includes [parameters](#set-bm25-parameters) for tuning results based on factors such as document size. 
 
-For new services created after July 15, 2020, BM25 is used automatically and is the sole similarity algorithm. If you try to set similarity to ClassicSimilarity on a new service, an HTTP 400 error will be returned because that algorithm is not supported by the service.
+For search services created after July 2020, BM25 is the sole similarity algorithm. If you try to set similarity to ClassicSimilarity on a new service, an HTTP 400 error will be returned because that algorithm is not supported by the service.
 
-For older services created before July 15, 2020, classic similarity remains the default algorithm. Older services can upgrade to BM25 on a per-index basis, as explained below. If you are switching from classic to BM25, you can expect to see some differences how search results are ordered.
+For older services, classic similarity remains the default algorithm. Older services can [upgrade to BM25](#enable-bm25-scoring-on-older-services) on a per-index basis. When switching from classic to BM25, you can expect to see some differences in how search results are ordered.
 
-> [!NOTE]
-> Semantic ranking, currently in preview for standard services in selected regions, is an additional step forward in producing more relevant results. Unlike the other algorithms, it is an add-on feature that iterates over an existing result set. For more information, see [Semantic search overview](semantic-search-overview.md) and [Semantic ranking](semantic-ranking.md).
+## Set BM25 parameters
+
+BM25 similarity adds two parameters to control the relevance score calculation. To set "similarity" parameters, issue a [Create or Update Index](/rest/api/searchservice/create-index) request as illustrated by the following example.
+
+Because Cognitive Search won't allow updates to a live index, you'll need to take the index offline so that the parameters can be added. Indexing and query requests will fail while the index is offline. The duration of the outage is the amount of time it takes to update the index, usually no more than several seconds. When the update is complete, the index comes back automatically. To take the index offline, append the "allowIndexDowntime=true" URI parameter on the request that sets the "similarity" property:
+
+```http
+PUT https://[search service name].search.windows.net/indexes/[index name]?api-version=2020-06-30&allowIndexDowntime=true
+{
+    "similarity": {
+        "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
+        "b" : 0.5,
+        "k1" : 1.3
+    }
+}
+```
+
+### BM25 property reference
+
+| Property | Type | Description |
+|----------|------|-------------|
+| k1 | number | Controls the scaling function between the term frequency of each matching term and the final relevance score of a document-query pair. Values are usually 0.0 to 3.0, with 1.2 as the default. </br></br>A value of 0.0 represents a "binary model", where the contribution of a single matching term is the same for all matching documents, regardless of how many times that term appears in the text, while a larger k1 value allows the score to continue to increase as more instances of the same term are found in the document. </br></br>Using a higher k1 value can be important in cases where we expect multiple terms to be part of a search query. In those cases, we might want to favor documents that match many of the different query terms being searched over documents that only match a single one, multiple times. For example, when querying the index for documents containing the terms "Apollo Spaceflight", we might want to lower the score of an article about Greek Mythology that contains the term "Apollo" a few dozen times, without mentions of "Spaceflight", compared to another article that explicitly mentions both "Apollo" and "Spaceflight" only a handful of times. |
+| b | number | Controls how the length of a document affects the relevance score. Values are between 0 and 1, with 0.75 as the default. </br></br>A value of 0.0 means the length of the document will not influence the score, while a value of 1.0 means the impact of term frequency on relevance score will be normalized by the document's length. </br></br>Normalizing the term frequency by the document's length is useful in cases where we want to penalize longer documents. In some cases, longer documents (such as a complete novel) are more likely to contain many irrelevant terms than much shorter documents. |
 
 ## Enable BM25 scoring on older services
 
-If you are running a search service that was created prior to July 15, 2020, you can enable BM25 by setting a Similarity property on new indexes. The property is only exposed on new indexes, so if want BM25 on an existing index, you must drop and [rebuild the index](search-howto-reindex.md) with a new Similarity property set to "Microsoft.Azure.Search.BM25Similarity".
+If you are running a search service that was created from March 2014 through July 15, 2020, you can enable BM25 by setting a "similarity" property on new indexes. The property is only exposed on new indexes, so if you want BM25 on an existing index, you must drop and [rebuild the index](search-howto-reindex.md) with a "similarity" property set to "Microsoft.Azure.Search.BM25Similarity".
 
-Once an index exists with a Similarity property, you can switch between BM25Similarity or ClassicSimilarity. 
+Once an index exists with a "similarity" property, you can switch between `BM25Similarity` and `ClassicSimilarity`. 
 
 The following links describe the Similarity property in the Azure SDKs. 
 
@@ -69,32 +90,9 @@ PUT https://[search service name].search.windows.net/indexes/[index name]?api-ve
 }
 ```
 
-## Set BM25 parameters
-
-BM25 similarity adds two user customizable parameters to control the calculated relevance score. You can set BM25 parameters during index creation, or as an index update if the BM25 algorithm was specified during index creation.
-
-| Property | Type | Description |
-|----------|------|-------------|
-| k1 | number | Controls the scaling function between the term frequency of each matching terms to the final relevance score of a document-query pair. Values are usually 0.0 to 3.0, with 1.2 as the default. </br></br>A value of 0.0 represents a "binary model", where the contribution of a single matching term is the same for all matching documents, regardless of how many times that term appears in the text, while a larger k1 value allows the score to continue to increase as more instances of the same term is found in the document. </br></br>Using a higher k1 value can be important in cases where we expect multiple terms to be part of a search query. In those cases, we might want to favor documents that match many of the different query terms being searched over documents that only match a single one, multiple times. For example, when querying the index for documents containing the terms "Apollo Spaceflight", we might want to lower the score of an article about Greek Mythology that contains the term "Apollo" a few dozen times, without mentions of "Spaceflight", compared to another article that explicitly mentions both "Apollo" and "Spaceflight" a handful of times only. |
-| b | number | Controls how the length of a document affects the relevance score. Values are between 0 and 1, with 0.75 as the default. </br></br>A value of 0.0 means the length of the document will not influence the score, while a value of 1.0 means the impact of term frequency on relevance score will be normalized by the document's length. </br></br>Normalizing the term frequency by the document's length is useful in cases where we want to penalize longer documents. In some cases, longer documents (such as a complete novel), are more likely to contain many irrelevant terms, compared to much shorter documents. |
-
-### Setting k1 and b parameters
-
-To set or modify b or k1 values, add them to the BM25 similarity object. Setting or changing these values on an existing index will take the index offline for at least a few seconds, causing active indexing and query requests to fail. Consequently, you should set the "allowIndexDowntime=true" parameter of the update request:
-
-```http
-PUT https://[search service name].search.windows.net/indexes/[index name]?api-version=2020-06-30&allowIndexDowntime=true
-{
-    "similarity": {
-        "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
-        "b" : 0.5,
-        "k1" : 1.3
-    }
-}
-```
-
 ## See also  
 
++ [Similarity and scoring in Azure Cognitive Search](index-similarity-and-scoring.md)
 + [REST API Reference](/rest/api/searchservice/)
 + [Add scoring profiles to your index](index-add-scoring-profiles.md)
 + [Create Index API](/rest/api/searchservice/create-index)
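
To make the `k1` and `b` descriptions in the property reference above concrete, here is a minimal sketch of the textbook Okapi BM25 term-frequency factor. It is an illustration only: the function name `bm25_tf_component` and its arguments are hypothetical, the IDF factor is omitted, and nothing in this change confirms that Azure Cognitive Search computes scores with exactly this formula.

```python
def bm25_tf_component(tf: float, doc_len: float, avg_doc_len: float,
                      k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 term-frequency factor for one term in one document.

    Assumption: standard Okapi BM25 formula, not the service's confirmed
    implementation. The IDF multiplier is left out on purpose.
    """
    if tf <= 0:
        # A term that does not occur contributes nothing.
        return 0.0
    # b blends between no length normalization (b=0) and full
    # normalization by document length relative to the average (b=1).
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences stop adding score.
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # k1 = 0: "binary model", repeats of a term add nothing.
    print(bm25_tf_component(1, 100, 100, k1=0.0))
    print(bm25_tf_component(30, 100, 100, k1=0.0))
    # b = 1: a document twice the average length is penalized.
    print(bm25_tf_component(3, 100, 100, b=1.0))
    print(bm25_tf_component(3, 200, 100, b=1.0))
```

Under this formula, setting `k1` to 0 makes one occurrence and thirty occurrences score the same, and setting `b` to 0 removes any effect of document length, which matches the behavior described in the table.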
diff --git a/articles/search/index-similarity-and-scoring.md b/articles/search/index-similarity-and-scoring.md
@@ -1,41 +1,31 @@
 ---
-title: Similarity and scoring overview
+title: Similarity and scoring
 titleSuffix: Azure Cognitive Search
-description: Explains the concepts of similarity and scoring, and what a developer can do to customize the scoring result.
+description: Explains the concepts of similarity and scoring in Azure Cognitive Search, and what a developer can do to customize the scoring result.
 
 author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 11/30/2021
+ms.date: 06/22/2022
 ---
-# Similarity and scoring in Azure Cognitive Search
-
-This article describes the similarity ranking algorithms used by Azure Cognitive Search to determine which matching documents are the most relevant in a [full text search query](search-lucene-query-architecture.md). This article also introduces two related features: *scoring profiles* (criteria for boosting the relevance of a specific match) and the *featuresMode* parameter (unpacks a search score to show more detail).
-
-> [!NOTE]
-> A third [semantic re-ranking algorithm](semantic-ranking.md) is currently in public preview. For more information, start with [Semantic search overview](semantic-search-overview.md).
-
-## Similarity ranking algorithms
 
-Azure Cognitive Search supports two similarity ranking algorithms.
+# Similarity and scoring in Azure Cognitive Search
 
-| Algorithm | Score | Availability |
-|-----------|-------|--------------|
-| BM25Similarity | @search.score | Used by all search services created after July 15, 2020. |
-| ClassicSimilarity | @search.score | Used by all search services created from March 2014 through July 15, 2020. Older services that use classic by default can [opt in to BM25](index-ranking-similarity.md). |
+This article describes relevance scoring and the similarity ranking algorithms used to rank search results in Azure Cognitive Search. A relevance score applies to matches returned in a [full text search query](search-lucene-query-architecture.md). Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries are not scored or ranked.
 
-Both BM25 and Classic are TF-IDF-like retrieval functions that use the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate relevance scores for each document-query pair, which is then used for ranking. While conceptually similar to classic, BM25 is rooted in probabilistic information retrieval that produces more intuitive matches, as measured by user research. BM25 also offers advanced customization options, such as allowing the user to decide how the relevance score scales with the term frequency of matched terms.
+In Azure Cognitive Search, you can tune search relevance and boost search scores through these mechanisms:
 
-The following video segment fast-forwards to an explanation of the generally available ranking algorithms used in Azure Cognitive Search. You can watch the full video for more background.
-
-> [!VIDEO https://www.youtube.com/embed/Y_X6USgvB1g?version=3&start=322&end=643]
++ Similarity ranking configuration
++ Semantic ranking (in preview, described in [this article](semantic-ranking.md))
++ Scoring profiles
++ Custom scoring logic enabled through the *featuresMode* parameter
 
 ## Relevance scoring
 
-Scoring refers to the computation of a search score for every item returned in search results for full text search queries. The score is an indicator of an item's relevance in the context of the current query. The higher the score, the more relevant the item. In search results, items are rank ordered from high to low, based on the search scores calculated for each item. The score is returned in the response as "@search.score" on every document.
+Relevance scoring refers to the computation of a search score for every item returned in search results for full text search queries. The score is an indicator of an item's relevance in the context of the current query. The higher the score, the more relevant the item. 
 
-By default, the top 50 are returned in the response, but you can use the **$top** parameter to return a smaller or larger number of items (up to 1000 in a single response), and **$skip** to get the next set of results.
+In search results, items are rank ordered from high to low, based on the search scores calculated for each item. The score is returned in the response as "@search.score" on every document. By default, the top 50 are returned in the response, but you can use the **$top** parameter to return a smaller or larger number of items (up to 1000 in a single response), and **$skip** to get the next set of results.
 
 The search score is computed based on statistical properties of the data and the query. Azure Cognitive Search finds documents that match on search terms (some or all, depending on [searchMode](/rest/api/searchservice/search-documents#query-parameters)), favoring documents that contain many instances of the search term. The search score goes up even higher if the term is rare across the data index, but common within the document. The basis for this approach to computing relevance is known as *TF-IDF or* term frequency-inverse document frequency.
 
@@ -46,6 +36,21 @@ If you want to break the tie among repeating scores, you can add an **$orderby**
 > [!NOTE]
 > A `@search.score = 1` indicates an un-scored or un-ranked result set. The score is uniform across all results. Un-scored results occur when the query form is fuzzy search, wildcard or regex queries, or an empty search (`search=*`, sometimes paired with filters, where the filter is the primary means for returning a match).
 
+## Similarity ranking algorithms
+
+Azure Cognitive Search provides the `BM25Similarity` ranking algorithm. On older search services, you might be using `ClassicSimilarity`.
+
+Both BM25 and Classic are TF-IDF-like retrieval functions that use the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate relevance scores for each document-query pair, which are then used for ranking results. While conceptually similar to classic, BM25 is rooted in probabilistic information retrieval that produces more intuitive matches, as measured by user research. 
+
+BM25 offers advanced customization options, such as allowing the user to decide how the relevance score scales with the term frequency of matched terms. For more information, see [Configure the similarity ranking algorithm](index-ranking-similarity.md).
+
+> [!NOTE]
+> If you're using a search service that was created before July 2020, the similarity algorithm is most likely the previous default, `ClassicSimilarity`, which you can upgrade on a per-index basis. See [Enable BM25 scoring on older services](index-ranking-similarity.md#enable-bm25-scoring-on-older-services) for details.
+
+The following video segment fast-forwards to an explanation of the generally available ranking algorithms used in Azure Cognitive Search. You can watch the full video for more background.
+
+> [!VIDEO https://www.youtube.com/embed/Y_X6USgvB1g?version=3&start=322&end=643]
+
 <a name="scoring-statistics"></a>
 
 ## Scoring statistics and sticky sessions
diff --git a/articles/search/search-dotnet-sdk-migration-version-11.md b/articles/search/search-dotnet-sdk-migration-version-11.md
@@ -268,6 +268,8 @@ In terms of service version updates, where code changes in version 11 relate to
 
 + [Ordered results](search-query-odata-orderby.md) for null values have changed in this version, with null values appearing first if the sort is `asc` and last if the sort is `desc`. If you wrote code to handle how null values are sorted, you should review and potentially remove that code if it's no longer necessary.
 
+Due to these behavior changes, it's likely that you'll see slight variations in ranked results.
+
 ## Next steps
 
 + [How to use Azure.Search.Documents in a C# .NET Application](search-howto-dotnet-sdk.md)