Merge pull request #112949 from HeidiSteen/heidist-master

garycentric · web-flow · commit ae2148c16d06 · 2020-04-28T10:31:03.000-07:00
[azure search]  Relevance scoring overview doc
diff --git a/articles/search/TOC.yml b/articles/search/TOC.yml
@@ -262,9 +262,11 @@
     items:
     - name: Work with results
       href: search-pagination-page-layout.md
-    - name: Relevance tuning (scoring profiles)    
+    - name: Relevance scoring
+      href: index-similarity-and-scoring.md
+    - name: Scoring profiles 
       href: index-add-scoring-profiles.md
-    - name: Relevance tuning (similarity algorithm)  
+    - name: Similarity algorithm
       href: index-ranking-similarity.md
   - name: Plan
     items:
diff --git a/articles/search/index-add-scoring-profiles.md b/articles/search/index-add-scoring-profiles.md
@@ -23,7 +23,7 @@ translation.priority.mt:
 ---
 # Add scoring profiles to an Azure Cognitive Search index
 
-  Scoring refers to the computation of a *search score* for every item returned in search results. The score is an indicator of an item's relevance in the context of the current search operation. The higher the score, the more relevant the item. In search results, items are rank ordered from high to low, based on the search scores calculated for each item.  
+*Scoring* computes a search score for every item in a search result set. Items are then ranked by score, from highest to lowest.
 
  Azure Cognitive Search uses default scoring to compute an initial score, but you can customize the calculation through a *scoring profile*. Scoring profiles give you greater control over the ranking of items in search results. For example, you might want to boost items based on their revenue potential, promote newer items, or perhaps boost items that have been in inventory too long.  
 
@@ -280,6 +280,7 @@ The search score is computed based on statistical properties of the data and the
  For more examples, see [XML Schema: Datatypes (W3.org web site)](https://www.w3.org/TR/xmlschema11-2/#dayTimeDuration).  
 
 ## See also  
- [Azure Cognitive Search REST](https://docs.microsoft.com/rest/api/searchservice/)   
- [Create Index &#40;Azure Cognitive Search REST API&#41;](https://docs.microsoft.com/rest/api/searchservice/create-index)   
- [Azure Cognitive Search .NET SDK](https://docs.microsoft.com/dotnet/api/overview/azure/search?view=azure-dotnet)  
+
++ [REST API Reference](https://docs.microsoft.com/rest/api/searchservice/)   
++ [Create Index API](https://docs.microsoft.com/rest/api/searchservice/create-index)   
++ [Azure Cognitive Search .NET SDK](https://docs.microsoft.com/dotnet/api/overview/azure/search?view=azure-dotnet)  
diff --git a/articles/search/index-ranking-similarity.md b/articles/search/index-ranking-similarity.md
@@ -14,17 +14,19 @@ ms.date: 03/13/2020
 # Ranking algorithm in Azure Cognitive Search
 
 > [!IMPORTANT]
-> Starting July 15, 2020, newly created search services will use the BM25 ranking function, which has proven in most cases to provide search rankings that align better with user expectations than the current default ranking.  Beyond superior ranking, BM25 also enables configuration options for tuning results based on factors such as document size.  
+> Starting July 15, 2020, newly created search services will use the BM25 ranking function automatically, which has proven in most cases to provide search rankings that align better with user expectations than the current default ranking. Beyond superior ranking, BM25 also enables configuration options for tuning results based on factors such as document size.  
 >
-> With this change, you will most likely see slight changes in the ordering of your search results.   For those that want to test the impact of this change, we have made available in the 2019-05-06-Preview API an ability to enable BM25 scoring on new indexes.  
+> With this change, you will most likely see slight changes in the ordering of your search results. To test the impact of this change, use the BM25 algorithm available in api-version 2019-05-06-Preview.  
 
-This article describes how you can update a service created before July 15, 2020 to to use the new BM25 ranking algorithm.
+This article describes how you can use the new BM25 ranking algorithm on existing search services for new indexes created and queried using the preview API.
 
-Azure Cognitive Search will be using the official Lucene implementation of the Okapi BM25 algorithm, *BM25Similarity*, which will replace the previously used *ClassicSimilarity* implementation. Like the older ClassicSimilarity algorithm, BM25Similarity is a TF-IDF-like retrieval function which uses the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate relevance scores for each document-query pair, which is then used for ranking. While conceptually similar to the older Classic Similarity algorithm, BM25 takes its root in probabilistic information retrieval to improve upon it. BM25 also offers advanced customization options, such as allowing the user to decide how the relevance score scales with the term frequency of matched terms.
+Azure Cognitive Search is in the process of adopting the official Lucene implementation of the Okapi BM25 algorithm, *BM25Similarity*, which will replace the previously used *ClassicSimilarity* implementation. Like the older ClassicSimilarity algorithm, BM25Similarity is a TF-IDF-like retrieval function that uses the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate a relevance score for each document-query pair. These scores are then used for ranking.
+
+While conceptually similar to the older ClassicSimilarity algorithm, BM25 is rooted in probabilistic information retrieval, which it uses to improve on the classic approach. BM25 also offers advanced customization options, such as allowing the user to decide how the relevance score scales with the term frequency of matched terms.
 
 ## How to test BM25 today
 
-When you create a new index, you can set a "similarity" property. You will need to use the *2019-05-06-Preview* version, as shown below.
+When you create a new index, you can set a **similarity** property to specify the algorithm. You will need to use the `api-version=2019-05-06-Preview`, as shown below.
 
 ```
 PUT https://[search service name].search.windows.net/indexes/[index name]?api-version=2019-05-06-Preview
@@ -53,16 +55,19 @@ PUT https://[search service name].search.windows.net/indexes/[index name]?api-ve
 }
 ```
 
-For services created before July 15, 2020: If the similarity is omitted or set to null, the index will use the old classic similarity algorithm.
+During this interim period, both algorithms are available on existing services only, and the **similarity** property lets you choose between them.
 
-For services created after July 15, 2020: If the similarity is omitted or set to null, the index will use the new BM25 similarity algorithm.
+| Property | Description |
+|----------|-------------|
+| similarity | Optional. Valid values include *"#Microsoft.Azure.Search.ClassicSimilarity"* or *"#Microsoft.Azure.Search.BM25Similarity"*. <br/> Requires `api-version=2019-05-06-Preview` or later on a search service created prior to July 15, 2020. |
 
-You can also explicitly set the similarity value to be one of the following two values:   *"#Microsoft.Azure.Search.ClassicSimilarity"* or *"#Microsoft.Azure.Search.BM25Similarity"*.
+For new services created after July 15, 2020, BM25 is used automatically and is the sole similarity algorithm. If you try to set **similarity** to `ClassicSimilarity` on a new service, a 400 error will be returned because that algorithm is not supported on a new service.
 
+For existing services created before July 15, 2020, the Classic similarity remains the default algorithm. If the **similarity** property is omitted or set to null, the index uses the Classic algorithm. If you want to use the new algorithm, you will need to set **similarity** as described above.
 
 ## BM25 similarity parameters
 
-BM25 similarity adds two user customizable parameters to control the calculated relevance score:
+BM25 similarity adds two user-customizable parameters to control the calculated relevance score.
 
 ### k1
 
@@ -94,10 +99,9 @@ The similarity algorithm can only be set at index creation time. This means the
 PUT https://[search service name].search.windows.net/indexes/[index name]?api-version=[api-version]&allowIndexDowntime=true
 ```
 
-
 ## See also  
 
- [Azure Cognitive Search REST](https://docs.microsoft.com/rest/api/searchservice/)   
- [Add scoring profiles to your index](index-add-scoring-profiles.md)    
- [Create Index &#40;Azure Cognitive Search REST API&#41;](https://docs.microsoft.com/rest/api/searchservice/create-index)   
-  [Azure Cognitive Search .NET SDK](https://docs.microsoft.com/dotnet/api/overview/azure/search?view=azure-dotnet)  
++ [REST API Reference](https://docs.microsoft.com/rest/api/searchservice/)   
++ [Add scoring profiles to your index](index-add-scoring-profiles.md)    
++ [Create Index API](https://docs.microsoft.com/rest/api/searchservice/create-index)   
++ [Azure Cognitive Search .NET SDK](https://docs.microsoft.com/dotnet/api/overview/azure/search?view=azure-dotnet)  
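
The index-ranking-similarity.md change above says that BM25 lets you control how the relevance score scales with term frequency and with document size. The following is a minimal Python sketch of the textbook Okapi BM25 per-term score, not the service's or Lucene's exact implementation. The function names, the IDF variant, and the default values `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math


def bm25_idf(doc_count: int, docs_with_term: int) -> float:
    """Inverse document frequency: rarer terms get a larger weight.

    Uses a commonly cited non-negative IDF variant (an assumption here).
    """
    return math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))


def bm25_term_score(
    tf: int,
    doc_len: float,
    avg_doc_len: float,
    doc_count: int,
    docs_with_term: int,
    k1: float = 1.2,   # assumed default: controls term-frequency saturation
    b: float = 0.75,   # assumed default: controls document-length normalization
) -> float:
    """Score contribution of one query term for one document."""
    if tf == 0:
        return 0.0
    # b = 0 ignores document length; b = 1 normalizes fully by length.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # As tf grows, tf / (tf + k1 * length_norm) approaches 1, so gains flatten.
    saturation = tf * (k1 + 1) / (tf + k1 * length_norm)
    return bm25_idf(doc_count, docs_with_term) * saturation
```

In this sketch, raising `k1` lets repeated occurrences of a term keep adding to the score for longer, and lowering `b` toward 0 reduces the penalty applied to long documents.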
diff --git a/articles/search/index-similarity-and-scoring.md b/articles/search/index-similarity-and-scoring.md
@@ -0,0 +1,73 @@
+---
+title: Similarity and scoring overview
+titleSuffix: Azure Cognitive Search
+description: Explains the concepts of similarity and scoring, and what a developer can do to customize the scoring result.
+
+manager: nitinme
+author: luiscabrer
+ms.author: luisca
+ms.service: cognitive-search
+ms.topic: conceptual
+ms.date: 04/27/2020
+---
+# Similarity and scoring in Azure Cognitive Search
+
+Scoring refers to the computation of a search score for every item returned in search results for full text search queries. The score is an indicator of an item's relevance in the context of the current search operation. The higher the score, the more relevant the item. In search results, items are rank ordered from high to low, based on the search scores calculated for each item. 
+
+By default, the top 50 items are returned in the response, but you can use the **$top** parameter to return a smaller or larger number of items (up to 1000 in a single response), and **$skip** to get the next set of results.
+
+The search score is computed based on statistical properties of the data and the query. Azure Cognitive Search finds documents that match on search terms (some or all, depending on [searchMode](https://docs.microsoft.com/rest/api/searchservice/search-documents#searchmodeany--all-optional)), favoring documents that contain many instances of the search term. The search score goes up even higher if the term is rare across the data index, but common within the document. The basis for this approach to computing relevance is known as *TF-IDF*, or term frequency-inverse document frequency.
+
+Search score values can be repeated throughout a result set. When multiple hits have the same search score, the ordering of the same scored items is not defined, and is not stable. Run the query again, and you might see items shift position, especially if you are using the free service or a billable service with multiple replicas. Given two items with an identical score, there is no guarantee which one appears first.
+
+If you want to break the tie among repeating scores, you can add an **$orderby** clause to first order by score, then order by another sortable field (for example, `$orderby=search.score() desc,Rating desc`). For more information, see [$orderby](https://docs.microsoft.com/azure/search/search-query-odata-orderby).
+
+> [!NOTE]
+> A `@search.score = 1.00` indicates an un-scored or un-ranked result set. The score is uniform across all results. Un-scored results occur when the query is a fuzzy search, a wildcard or regex query, or a **$filter** expression.
+
+## Scoring profiles
+
+You can customize the way different fields are ranked by defining a custom *scoring profile*. Scoring profiles give you greater control over the ranking of items in search results. For example, you might want to boost items based on their revenue potential, promote newer items, or perhaps boost items that have been in inventory too long. 
+
+A scoring profile is part of the index definition, composed of weighted fields, functions, and parameters. For more information about defining one, see [Scoring Profiles](index-add-scoring-profiles.md).
+
+## Scoring statistics
+
+For scalability, Azure Cognitive Search distributes each index horizontally through a sharding process, which means that portions of an index are physically separate.
+
+By default, the score of a document is calculated based on statistical properties of the data *within a shard*. This approach is generally not a problem for a large corpus of data, and it provides better performance than having to calculate the score based on information across all shards. That said, using this performance optimization could cause two very similar documents (or even identical documents) to end up with different relevance scores if they end up in different shards.
+
+If you prefer to compute the score based on the statistical properties across all shards, you can do so by adding *scoringStatistics=global* as a [query parameter](https://docs.microsoft.com/rest/api/searchservice/search-documents) (or add *"scoringStatistics": "global"* as a body parameter of the [query request](https://docs.microsoft.com/rest/api/searchservice/search-documents)).
+
+```http
+GET https://[service name].search.windows.net/indexes/[index name]/docs?scoringStatistics=global&api-version=[api-version]
+  Content-Type: application/json
+  api-key: [admin key]  
+```
+
+> [!NOTE]
+> An admin api-key is required for the `scoringStatistics` parameter.
+
+## Similarity ranking algorithms
+
+Azure Cognitive Search supports two similarity ranking algorithms: a *classic similarity* algorithm and the official implementation of the *Okapi BM25* algorithm (currently in preview). The classic similarity algorithm is the current default, but services created after July 15, 2020 will use the new BM25 algorithm. It will be the only algorithm available on new services.
+
+For now, you can specify which similarity ranking algorithm you would like to use. For more information, see [Ranking algorithm](index-ranking-similarity.md).
+
+## Watch this video
+
+In this 16-minute video, software engineer Raouf Merouche explains the processes of indexing and querying, and how to create scoring profiles. It gives you a good idea of what is going on under the hood as your documents are being indexed and retrieved.
+
+>[!VIDEO https://channel9.msdn.com/Shows/AI-Show/Similarity-and-Scoring-in-Azure-Cognitive-Search/player]
+
++ 2 - 3 minutes cover indexing: text processing and lexical analysis.
++ 3 - 4 minutes cover indexing: inverted indexes.
++ 4 - 6 minutes cover querying: retrieval and ranking.
++ 7 - 16 minutes cover scoring profiles.
+
+## See also
+
++ [Scoring Profiles](index-add-scoring-profiles.md)
++ [REST API Reference](https://docs.microsoft.com/rest/api/searchservice/)
++ [Search Documents API](https://docs.microsoft.com/rest/api/searchservice/search-documents)
++ [Azure Cognitive Search .NET SDK](https://docs.microsoft.com/dotnet/api/overview/azure/search?view=azure-dotnet)
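
The query options described in index-similarity-and-scoring.md (**$top**, **$skip**, **$orderby** for breaking ties, and `scoringStatistics=global`) can be combined in a single request. The following is a minimal Python sketch that only builds the request URL. The helper name `build_search_url`, the service and index names, and the choice of API version are hypothetical, and the `api-key` header still has to be sent separately with the request.

```python
from urllib.parse import quote, urlencode


def build_search_url(service, index, search, api_version, *,
                     top=None, skip=None, order_by=None,
                     scoring_statistics=None):
    """Build a Search Documents GET URL (hypothetical helper, sketch only)."""
    # The article states that a single response returns at most 1000 items.
    if top is not None and not 0 <= top <= 1000:
        raise ValueError("$top must be between 0 and 1000")

    params = {"search": search, "api-version": api_version}
    if top is not None:
        params["$top"] = top
    if skip is not None:
        params["$skip"] = skip
    if order_by:
        # Example from the article: "search.score() desc,Rating desc"
        params["$orderby"] = order_by
    if scoring_statistics:
        # "global" scores against statistics from all shards.
        params["scoringStatistics"] = scoring_statistics

    # Keep "$", "(", ")" and "," readable; encode spaces as %20.
    query = urlencode(params, safe="$(),", quote_via=quote)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


# Hypothetical service and index names; the API version is an assumption.
url = build_search_url(
    "myservice", "hotels", "beach", "2019-05-06-Preview",
    top=10,
    order_by="search.score() desc,Rating desc",
    scoring_statistics="global",
)
print(url)
```

Ordering by `search.score() desc` first and a sortable field second keeps the relevance ranking while giving items with identical scores a stable order.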