Merge pull request #234905 from HeidiSteen/heidist-refresh

prmerger-automator[bot] · web-flow · commit db0f1c73060b · 2023-04-18T20:56:31.000Z
[azure search] GH issue about search score range
diff --git a/articles/search/index-ranking-similarity.md b/articles/search/index-ranking-similarity.md
@@ -7,7 +7,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 10/14/2022
+ms.date: 04/18/2023
 ---
 
 # Configure relevance scoring
@@ -18,10 +18,10 @@ Configuration changes are scoped to individual indexes, which means you can adju
 
 ## Default scoring algorithm
 
-Depending on the age of your search service, Azure Cognitive Search supports two [similarity scoring algorithms](index-similarity-and-scoring.md) for assigning relevance to results in a full text search query:
+Depending on the age of your search service, Azure Cognitive Search supports two [similarity scoring algorithms](index-similarity-and-scoring.md) for a full text search query:
 
-+ An *Okapi BM25* algorithm, used in all search services created after July 15, 2020
-+ A *classic similarity* algorithm, used by all search services created before July 15, 2020
++ Okapi BM25 algorithm (after July 15, 2020)
++ Classic similarity algorithm (before July 15, 2020)
 
 BM25 ranking is the default because it tends to produce search rankings that align better with user expectations. It includes [parameters](#set-bm25-parameters) for tuning results based on factors such as document size. For search services created after July 2020, BM25 is the only scoring algorithm. If you try to set "similarity" to ClassicSimilarity on a new service, an HTTP 400 error will be returned because that algorithm is not supported by the service.
 
diff --git a/articles/search/index-similarity-and-scoring.md b/articles/search/index-similarity-and-scoring.md
@@ -7,7 +7,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 10/14/2022
+ms.date: 04/18/2023
 ---
 
 # Relevance and scoring in Azure Cognitive Search
@@ -41,7 +41,12 @@ If you want to break the tie among repeating scores, you can add an **$orderby**
 
 ## Scoring algorithms in Search
 
-Azure Cognitive Search provides the `BM25Similarity` ranking algorithm. On older search services, you might be using `ClassicSimilarity`.
+Azure Cognitive Search provides the following scoring algorithms:
+
+| Algorithm | Usage | Range |
+|-----------|-------------|-------|
+| BM25Similarity | Built-in algorithm on all search services created after July 2020. You can tune relevance ranking, but on newer services, changing the algorithm isn't supported. | Unbounded range |
+|ClassicSimilarity | Used on older search services. You can [opt-in for BM25](index-ranking-similarity.md). | 0 < 1.00 |
 
 Both BM25 and Classic are TF-IDF-like retrieval functions that use the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate relevance scores for each document-query pair, which is then used for ranking results. While conceptually similar to classic, BM25 is rooted in probabilistic information retrieval that produces more intuitive matches, as measured by user research. 
 
@@ -54,6 +59,16 @@ The following video segment fast-forwards to an explanation of the generally ava
 
 > [!VIDEO https://www.youtube.com/embed/Y_X6USgvB1g?version=3&start=322&end=643]
 
+## Score variation
+
+Search scores convey general sense of relevance, reflecting the strength of match relative to other documents in the same result set. But scores aren't always consistent from one query to the next, so as you work with queries, you might notice small discrepancies in how search documents are ordered. There are several explanations for why this might occur.
+
+| Cause | Description |
+|-----------|-------------|
+| Data volatility | Index content varies as you add, modify, or delete documents. Term frequencies will change as index updates are processed over time, affecting the search scores of matching documents. |
+| Multiple replicas | For services using multiple replicas, queries are issued against each replica in parallel. The index statistics used to calculate a search score are calculated on a per-replica basis, with results merged and ordered in the query response. Replicas are mostly mirrors of each other, but statistics can differ due to small differences in state. For example, one replica might have deleted documents contributing to their statistics, which were merged out of other replicas. Typically, differences in per-replica statistics are more noticeable in smaller indexes. For more information about this condition, see [Concepts: search units, replicas, partitions, shards](search-capacity-planning.md#concepts-search-units-replicas-partitions-shards) in the capacity planning documentation. |
+| Identical scores | If multiple documents have the same score, any one of them might appear first.  |
+
 <a name="scoring-statistics"></a>
 
 ## Scoring statistics and sticky sessions
diff --git a/articles/search/search-pagination-page-layout.md b/articles/search/search-pagination-page-layout.md
@@ -8,7 +8,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 11/02/2022
+ms.date: 04/18/2023
 ---
 
 # How to work with search results in Azure Cognitive Search
@@ -27,9 +27,9 @@ Parameters on the query determine:
 
 Results are tabular, composed of fields of either all "retrievable" fields, or limited to just those fields specified in the **`$select`** parameters. Rows are the matching documents.
 
-While a search document might consist of a large number of fields, typically only a few are needed to represent each document in the result set. On a query request, append `$select=<field list>` to specify which fields include in the response. A field must be attributed as "retrievable" in the index to be included in a result. 
+You can choose which fields are in search results. While a search document might have a large number of fields, typically only a few are needed to represent each document in results. On a query request, append `$select=<field list>` to specify which "retrievable" fields should appear in the response.
 
-Fields that work best include those that contrast and differentiate among documents, providing sufficient information to invite a click-through response on the part of the user. On an e-commerce site, it might be a product name, description, brand, color, size, price, and rating. For the built-in hotels-sample index, it might be the "select" fields in the following example:
+Pick fields that offer contrast and differentiation among documents, providing sufficient information to invite a click-through response on the part of the user. On an e-commerce site, it might be a product name, description, brand, color, size, price, and rating. For the built-in hotels-sample index, it might be the "select" fields in the following example:
 
 ```http
 POST /indexes/hotels-sample-index/docs/search?api-version=2020-06-30 
@@ -41,7 +41,7 @@ POST /indexes/hotels-sample-index/docs/search?api-version=2020-06-30
 ```
 
 > [!NOTE]
-> If want to include image files in a result, such as a product photo or logo, store them outside of Azure Cognitive Search, but include a field in your index to reference the image URL in the search document. Sample indexes that support images in the results include the **realestate-sample-us** demo (a built-in sample dataset that you can build easily in the Import Data wizard), and the [New York City Jobs demo app](https://aka.ms/azjobsdemo).
+> For images in results, such as a product photo or logo, store them outside of Azure Cognitive Search, but add a field in your index to reference the image URL in the search document. Sample indexes that demonstrate images in the results include the **realestate-sample-us** demo (a built-in sample dataset that you can build easily in the Import Data wizard), and the [New York City Jobs demo app](https://aka.ms/azjobsdemo).
 
 ### Tips for unexpected results
 
@@ -66,7 +66,7 @@ Count won't be affected by routine maintenance or other workloads on the search
 
 ## Paging results
 
-By default, the search engine returns up to the first 50 matches. The top 50 are determined by search score, assuming the query is full text search or semantic search. Otherwise, the top 50 are an arbitrary order for exact match queries (where "@searchScore=1.0").
+By default, the search engine returns up to the first 50 matches. The top 50 are determined by search score, assuming the query is full text search or semantic search. Otherwise, the top 50 are an arbitrary order for exact match queries (where uniform "@searchScore=1.0" indicates arbitrary ranking).
 
 To control the paging of all documents returned in a result set, add `$top` and `$skip` parameters to the query request. The following list explains the logic.
 
@@ -103,33 +103,31 @@ Notice that document 2 is fetched twice. This is because the new document 5 has
 
 ## Ordering results
 
-In a full text search query, results can be ranked by a search score, a semantic reranker score (if using [semantic search](semantic-search-overview.md)), or by an **`$orderby`** expression in the query request that specifies an explicit sort order.
+In a full text search query, results can be ranked by:
 
-Sorting methodologies aren't designed to be used together. For example, if you're sorting with **`$orderby`** for primary sorting, you can't apply a secondary sort based on search score (because the search score will be uniform).
++ a search score
++ a semantic reranker score
++ a sort order on a "sortable" field
 
-### Ordering by search score
+You can also boost any matches found in specific fields by adding a scoring profile.
 
-For full text search queries, results are automatically ranked by a search score, calculated based on term frequency and proximity in a document (derived from [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)), with higher scores going to documents having more or stronger matches on a search term.
+### Order by search score
 
-The "@search.score" range is 0 up to (but not including) 1.00. A "@search.score" equal to 1.00 indicates an unscored or unranked result set, where the 1.0 score is uniform across all results. Unscored results occur when the query form is fuzzy search, wildcard or regex queries, or an empty search (`search=*`). If you need to impose a ranking structure over unscored results, an **`$orderby`** expression will help you achieve that objective.
+For full text search queries, results are automatically [ranked by a search score](index-similarity-and-scoring.md), calculated based on term frequency and proximity in a document (derived from [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)), with higher scores going to documents having more or stronger matches on a search term.
 
-Search scores convey general sense of relevance, reflecting the strength of match relative to other documents in the same result set. But scores aren't always consistent from one query to the next, so as you work with queries, you might notice small discrepancies in how search documents are ordered. There are several explanations for why this might occur.
+The "@search.score" range is either unbounded, or 0 up to (but not including) 1.00 on older services. 
 
-| Cause | Description |
-|-----------|-------------|
-| Data volatility | Index content varies as you add, modify, or delete documents. Term frequencies will change as index updates are processed over time, affecting the search scores of matching documents. |
-| Multiple replicas | For services using multiple replicas, queries are issued against each replica in parallel. The index statistics used to calculate a search score are calculated on a per-replica basis, with results merged and ordered in the query response. Replicas are mostly mirrors of each other, but statistics can differ due to small differences in state. For example, one replica might have deleted documents contributing to their statistics, which were merged out of other replicas. Typically, differences in per-replica statistics are more noticeable in smaller indexes. For more information about this condition, see [Concepts: search units, replicas, partitions, shards](search-capacity-planning.md#concepts-search-units-replicas-partitions-shards) in the capacity planning documentation. |
-| Identical scores | If multiple documents have the same score, any one of them might appear first.  |
+For either algorithm, a "@search.score" equal to 1.00 indicates an unscored or unranked result set, where the 1.0 score is uniform across all results. Unscored results occur when the query form is fuzzy search, wildcard or regex queries, or an empty search (`search=*`). If you need to impose a ranking structure over unscored results, consider an **`$orderby`** expression to achieve that objective.
 
-### Ordering by the semantic reranker
+### Order by the semantic reranker
 
 If you're using [semantic search](semantic-search-overview.md), the "@search.rerankerScore" determines the sort order of your results. 
 
 The "@search.rerankerScore" range is 1 to 4.00, where a higher score indicates a stronger semantic match.
 
-### Ordering with $orderby
+### Order with $orderby
 
-If consistent ordering is an application requirement, you can explicitly define an [**`$orderby`** expression](query-odata-filter-orderby-syntax.md) on a field. Only fields that are indexed as "sortable" can be used to order results.
+If consistent ordering is an application requirement, you can define an [**`$orderby`** expression](query-odata-filter-orderby-syntax.md) on a field. Only fields that are indexed as "sortable" can be used to order results.
 
 Fields commonly used in an **`$orderby`** include rating, date, and location. Filtering by location requires that the filter expression calls the [**`geo.distance()` function**](search-query-odata-geo-spatial-functions.md?#order-by-examples), in addition to the field name.
 
@@ -143,7 +141,7 @@ String fields (Edm.String, Edm.ComplexType subfields) are sorted in either [ASCI
 
 + Strings that lead with diacritics appear last (Äpfel, Öffnen, Üben)
 
-### Use a scoring profile to influence relevance
+### Boost relevance using a scoring profile
 
 Another approach that promotes order consistency is using a [custom scoring profile](index-add-scoring-profiles.md). Scoring profiles give you more control over the ranking of items in search results, with the ability to boost matches found in specific fields. The extra scoring logic can help override minor differences among replicas because the search scores for each document are farther apart. We recommend the [ranking algorithm](index-ranking-similarity.md) for this approach.
 
diff --git a/articles/search/search-query-odata-search-score-function.md b/articles/search/search-query-odata-search-score-function.md
@@ -8,7 +8,7 @@ author: bevloh
 ms.author: beloh
 ms.service: cognitive-search
 ms.topic: reference
-ms.date: 09/16/2021
+ms.date: 04/18/2023
 translation.priority.mt:
   - "de-de"
   - "es-es"
@@ -23,11 +23,14 @@ translation.priority.mt:
 ---
 # OData `search.score` function in Azure Cognitive Search
 
-When you send a query to Azure Cognitive Search without the [**$orderby** parameter](search-query-odata-orderby.md), the results that come back will be sorted in descending order by relevance score. Even when you do use **$orderby**, the relevance score will be used to break ties by default. However, sometimes it is useful to use the relevance score as an initial sort criteria, and some other criteria as the tie-breaker. The `search.score` function allows you to do this.
+When you send a query to Azure Cognitive Search without the [**$orderby** parameter](search-query-odata-orderby.md), the results that come back will be sorted in descending order by relevance score. Even when you do use **$orderby**, the relevance score is used to break ties by default. However, sometimes it's useful to use the relevance score as an initial sort criteria, and some other criteria as the tie-breaker. The example in this article demonstrates using the `search.score` function for sorting.
+
+> [!NOTE]
+> The relevance score is computed by the similarity ranking algorithm, and the range varies depending on which algorithm you use. For more information, see [Relevance and scoring in Azure Cognitive Search](index-similarity-and-scoring.md).
 
 ## Syntax
 
-The syntax for `search.score` in **$orderby** is `search.score()`. The function `search.score` does not take any parameters. It can be used with the `asc` or `desc` sort-order specifier, just like any other clause in the **$orderby** parameter. It can appear anywhere in the list of sort criteria.
+The syntax for `search.score` in **$orderby** is `search.score()`. The function `search.score` doesn't take any parameters. It can be used with the `asc` or `desc` sort-order specifier, just like any other clause in the **$orderby** parameter. It can appear anywhere in the list of sort criteria.
 
 ## Example