Merge pull request #278043 from HeidiSteen/heidist-june12

JamesJBarnett · web-flow · commit a76c74215926 · 2024-06-13T14:42:17.000-07:00
[azure search] maxTextRecallSize updates
diff --git a/articles/search/hybrid-search-how-to-query.md b/articles/search/hybrid-search-how-to-query.md
@@ -9,16 +9,20 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: how-to
-ms.date: 04/23/2024
+ms.date: 06/12/2024
 ---
 
 # Create a hybrid query in Azure AI Search
 
-[Hybrid search](hybrid-search-overview.md) combines one or more keyword queries with one or more vector queries in a single search request. The queries execute in parallel. The results are merged and reordered by new search scores, using [Reciprocal Rank Fusion (RRF)](hybrid-search-ranking.md) to return a single ranked result set.
+[Hybrid search](hybrid-search-overview.md) combines one or more text (keyword) queries with one or more vector queries in a single search request. The queries execute in parallel. The results are merged and reordered by new search scores, using [Reciprocal Rank Fusion (RRF)](hybrid-search-ranking.md) to return a unified result set.
 
-In most cases, [per benchmark tests](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid/ba-p/3929167), hybrid queries with semantic ranking return the most relevant results.
+In many cases, [per benchmark tests](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid/ba-p/3929167), hybrid queries with semantic ranking return the most relevant results.
 
-To define a hybrid query, use REST API [**2023-11-01**](/rest/api/searchservice/documents/search-post), [**2023-10-01-preview**](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2023-10-01-preview&preserve-view=true), [**2024-03-01-preview**](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2024-03-01-preview&preserve-view=true), Search Explorer in the Azure portal, or newer versions of the Azure SDKs. 
+To improve relevance:
+
++ New parameters [maxTextRecallSize and countAndFacetMode(preview)](#set-maxtextrecallsize-and-countandfacetmode-preview) give you more control over text inputs into a hybrid query. 
+
++ New [vector weighting](vector-search-how-to-query.md#vector-weighting-preview) lets you set the relative weight of the vector query. This feature is particularly useful in complex queries where two or more distinct result sets need to be combined, as is the case for hybrid search. 
 
 ## Prerequisites
 
@@ -28,6 +32,15 @@ To define a hybrid query, use REST API [**2023-11-01**](/rest/api/searchservice/
 
 + (Optional) If you want text-to-vector conversion of a query string (currently in preview), [create and assign a vectorizer](vector-search-how-to-configure-vectorizer.md) to vector fields in the search index.
 
+## Choose an API or tool
+
++ [**2023-11-01**](/rest/api/searchservice/documents/search-post) stable version
++ [**2023-10-01-preview**](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2023-10-01-preview&preserve-view=true), adds integrated vectorization to the vector side of a hybrid query
++ [**2024-03-01-preview**](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2024-03-01-preview&preserve-view=true), adds narrow data types and scalar quantization to the vector side of a hybrid query
++ [**2024-05-01-preview**](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2024-05-01-preview&preserve-view=true) adds `maxTextRecallSize` and `countAndFacetMode` specifially for hybrid search
++ Search Explorer in the Azure portal (targets 2024-05-01-preview behaviors)
++ Newer stable or beta packages of the Azure SDKs (see change logs for SDK feature support)
+
 ## Run a hybrid query in Search Explorer
 
 1. In [Search Explorer](search-explorer.md), make sure the API version is **2023-10-01-preview** or later.
@@ -73,22 +86,37 @@ POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/d
 Content-Type: application/json
 api-key: {{admin-api-key}}
 {
-    "vectorQueries": [{
-        "vector": [
-            -0.009154141,
-            0.018708462,
-            . . . 
-            -0.02178128,
-            -0.00086512347
-        ],
-        "fields": "DescriptionVector",
-        "kind": "vector",
-        "exhaustive": true,
-        "k": 10
-    }],
+    "vectorQueries": [
+        {
+            "vector": [
+                -0.009154141,
+                0.018708462,
+                . . . 
+                -0.02178128,
+                -0.00086512347
+            ],
+            "fields": "DescriptionVector",
+            "kind": "vector",
+            "exhaustive": true,
+            "k": 10
+        },
+        {
+            "vector": [
+                -0.009154141,
+                0.018708462,
+                . . . 
+                -0.02178128,
+                -0.00086512347
+            ],
+            "fields": "DescriptionVector",
+            "kind": "vector",
+            "exhaustive": true,
+            "k": 10
+        }
+    ],
     "search": "historic hotel walk to restaurants and shopping",
     "select": "HotelName, Description, Address/City",
-    "top": "10"
+    "top": 10
 }
 ```
 
@@ -222,7 +250,75 @@ api-key: {{admin-api-key}}
 
 + Prefiltering is applied before query execution. If prefilter reduces the search area to 100 documents, the vector query executes over the "DescriptionVector" field for those 100 documents, returning the k=50 best matches. Those 50 matching documents then pass to RRF for merged results, and then to semantic ranker.
 
-+ Postfilter is applied after query execution. If k=50 returns 50 matches on the vector query side, then the post-filter is applied to the 50 matches, reducing results that meet filter criteria, leaving you with fewer than 50 documents to pass to semantic ranker
++ Postfilter is applied after query execution. If k=50 returns 50 matches on the vector query side, then the post-filter is applied to the 50 matches, reducing results that meet filter criteria, leaving you with fewer than 50 documents to pass to semantic ranker.
+
+## Set maxTextRecallSize and countAndFacetMode (preview)
+
+This section explains how to adjust the inputs to a hybrid query by controlling the amount BM25-ranked results that flow to the hybrid ranking model. Controlling over the BM25-ranked input gives you more options for relevance tuning in hybrid scenarios. 
+
+> [!TIP]
+> Another option to consider is a supplemental or replacement technique, is [vector weighting](vector-search-how-to-query.md#vector-weighting-preview), which increases the importance of vector queries in the request.
+
+1. Use [Search - POST](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2024-05-01-preview&preserve-view=true) or [Search - GET](/rest/api/searchservice/documents/search-get?view=rest-searchservice-2024-05-01-preview&preserve-view=true) in 2024-05-01-preview to specify these parameters.
+
+1. Add a `hybridSearch` query parameter object to set the maximum number of documents recalled through the BM25-ranked results of a hybrid query. It has two properties:
+
+   + `maxTextRecallSize` specifies the number of BM25-ranked results to provide to the Reciprocal Rank Fusion (RRF) ranker used in hybrid queries. The default is 1,000. The maximum is 10,000.
+
+   + `countAndFacetMode` reports the counts for the BM25-ranked results (and for facets if you're using them). The default is all documents that match the query. Optionally, you can scope "count" to the `maxTextRecallSize`.
+
+1. Reduce `maxTextRecallSize` if vector similarity search is generally outperforming the text-side of the the hybrid query.
+
+1. Raise `maxTextRecallSize` if you have a large index, and the default isn't capturing a sufficient number of results. With a larger BM25-ranked result set, you can also set `top`, `skip`, and `next` to retrieve portions of those results.
+
+The following REST examples show two use-cases for setting `maxTextRecallSize`. 
+
+The first example reduces `maxTextRecallSize` to 100, limiting the text side of the hybrid query to just 100 document. It also sets `countAndFacetMode` to include only those results from `maxTextRecallSize`.
+
+```http
+POST https://[service-name].search.windows.net/indexes/[index-name]/docs/search?api-version=2024-05-01-Preview 
+
+    { 
+      "vectorQueries": [ 
+        { 
+          "kind": "vector", 
+          "vector": [1.0, 2.0, 3.0], 
+          "fields": "my_vector_field", 
+          "k": 10 
+        } 
+      ], 
+      "search": "hello world", 
+      "hybridSearch": { 
+        "maxTextRecallSize": 100, 
+        "countAndFacetMode": "countRetrievableResults" 
+      } 
+    } 
+```
+
+The second example raises `maxTextRecallSize` to 5,000. It also uses top, skip, and next to pull results from large result sets. In this case, the request pulls in BM25-ranked results starting at position 1,500 through 2,000 as the text query contribution to the RRF composite result set.
+
+```http
+POST https://[service-name].search.windows.net/indexes/[index-name]/docs/search?api-version=2024-05-01-Preview 
+
+    { 
+      "vectorQueries": [ 
+        { 
+          "kind": "vector", 
+          "vector": [1.0, 2.0, 3.0], 
+          "fields": "my_vector_field", 
+          "k": 10 
+        } 
+      ], 
+      "search": "hello world",
+      "top": 500,
+      "skip": 1500,
+      "next": 500,
+      "hybridSearch": { 
+        "maxTextRecallSize": 5000, 
+        "countAndFacetMode": "countRetrievableResults" 
+      } 
+    } 
+```
 
 ## Configure a query response
 
diff --git a/articles/search/hybrid-search-ranking.md b/articles/search/hybrid-search-ranking.md
@@ -9,7 +9,7 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: conceptual
-ms.date: 11/01/2023
+ms.date: 06/12/2024
 ---
 
 # Relevance scoring in hybrid search using Reciprocal Rank Fusion (RRF)
@@ -57,13 +57,31 @@ The following chart identifies the scoring property returned on each match, algo
 
 Semantic ranking doesn't participate in RRF. Its score (`@search.rerankerScore`) is always reported separately in the query response. Semantic ranking can rerank full text and hybrid search results, assuming those results include fields having semantically rich content.
 
+## Weighted scores
+
+Starting in 2024-05-01-preview, you can [weight vector queries](vector-search-how-to-query.md#vector-weighting-preview) to increase or decrease their importance in a hybrid query.
+
+Recall that when computing RRF for a certain document, the search engine looks at the rank of that document for each result set where it shows up. Assume a document shows up in three separate search results, where the results are from two vector queries and one text BM25-ranked query. The position of the document varies in each result.
+
+| Match found | Position in results | @search.score | weight multiplier | @search.score (weighted) |
+|--|--|--|--|--|--|
+| vector results one | position 1 | 0.8383955 | 0.5 | 0.41919775 |
+| vector results two | position 5 | 0.81514114 | 2.0 | 1.63028228 |
+| BM25 results | position 10  | 0.8577363 | NA | 0.8577363 |
+
+The document's position in each result set corresponds to an initial score, which are added up to create the final RRF score for that document. 
+
+If you add vector weighting, the initial scores are subect to a weighting multiplier that increases or decreases the score. The default is 1.0, which means no weighting and the initial score is used as-is in RRF scoring. However, if you add a weight of 0.5, the score is reduced and that result becomes less important in the combined ranking. Conversely, if you add a weight of 2.0, the score becomes a larger factor in the overall RRF score.
+
+In this example, the @search.score (weighted) values are passed to the RRF ranking model.
+
 ## Number of ranked results in a hybrid query response
 
 By default, if you aren't using pagination, the search engine returns the top 50 highest ranking matches for full text search, and the most similar `k` matches for vector search. In a hybrid query, `top` determines the number of results in the response. Based on defaults, the top 50 highest ranked matches of the unified result set are returned. 
 
-Often, the search engine finds more results than `top` and `k`. To return more results, use the paging parameters `top`, `skip`, and `next`. Paging is how you determine the number of results on each logical page and navigate through the full payload. 
+Often, the search engine finds more results than `top` and `k`. To return more results, use the paging parameters `top`, `skip`, and `next`. Paging is how you determine the number of results on each logical page and navigate through the full payload. You can set `maxTextRecallSize` to larger values (the default is 1,000) to return more results from the text side of hybrid query.
 
-Full text search is subject to a maximum limit of 1,000 matches (see [API response limits](search-limits-quotas-capacity.md#api-response-limits)). Once 1,000 matches are found, the search engine no longer looks for more.
+By default, full text search is subject to a maximum limit of 1,000 matches (see [API response limits](search-limits-quotas-capacity.md#api-response-limits)). Once 1,000 matches are found, the search engine no longer looks for more.
 
 For more information, see [How to work with search results](search-pagination-page-layout.md).
 
diff --git a/articles/search/search-pagination-page-layout.md b/articles/search/search-pagination-page-layout.md
@@ -10,7 +10,7 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: how-to
-ms.date: 08/31/2023
+ms.date: 06/12/2024
 ---
 
 # How to shape results in Azure AI Search
@@ -68,6 +68,8 @@ Count won't be affected by routine maintenance or other workloads on the search
 
 By default, the search engine returns up to the first 50 matches. The top 50 are determined by search score, assuming the query is full text search or semantic. Otherwise, the top 50 are an arbitrary order for exact match queries (where uniform "@searchScore=1.0" indicates arbitrary ranking).
 
+The upper limit is 1,000 documents returned per page of search results, so you can set top to return up to 1000 document in the first result. In newer preview APIs, if you're using a hybrid query, you can [specify maxTextRecallSize](hybrid-search-how-to-query.md#set-maxtextrecallsize-and-countandfacetmode-preview) to return up to 10,000 documents.
+
 To control the paging of all documents returned in a result set, add `$top` and `$skip` parameters to the GET query request, or `top` and `skip` to the POST query request. The following list explains the logic.
 
 + Return the first set of 15 matching documents plus a count of total matches: `GET /indexes/<INDEX-NAME>/docs?search=<QUERY STRING>&$top=15&$skip=0&$count=true`
diff --git a/articles/search/vector-search-how-to-query.md b/articles/search/vector-search-how-to-query.md
@@ -9,7 +9,7 @@ ms.service: cognitive-search
 ms.custom:
   - build-2024
 ms.topic: how-to
-ms.date: 05/30/2024
+ms.date: 06/12/2024
 ---
 
 # Create a vector query in Azure AI Search
@@ -22,7 +22,6 @@ In Azure AI Search, if you have a [vector index](vector-search-how-to-create-ind
 > + [Query multiple vector fields at once](#multiple-vector-fields)
 > + [Query with integrated vectorization (preview)](#query-with-integrated-vectorization-preview)
 > + [Set thresholds to exclude low-scoring results (preview)](#set-thresholds-to-exclude-low-scoring-results-preview)
-> + [Set MaxTextSizeRecall to control the number of results (preview)](#maxtextsizerecall-for-hybrid-search-preview)
 > + [Set vector weights (preview)](#vector-weighting-preview)
 
 This article uses REST for illustration. For code samples in other languages, see the [azure-search-vector-samples](https://github.com/Azure/azure-search-vector-samples) GitHub repository for end-to-end solutions that include vector queries. 
@@ -598,51 +597,45 @@ POST https://[service-name].search.windows.net/indexes/[index-name]/docs/search?
 
 Filtering occurs before [fusing results](hybrid-search-ranking.md) from different recall sets. 
 
+ <!-- Keep H2 as-is. Direct link from a blog post. Bulk of maxtextsizerecall has moved to hybrid query doc-->
 ## MaxTextSizeRecall for hybrid search (preview)
 
-Add a `hybridSearch` query parameter object to specify the maximum number of documents recalled using text queries in hybrid (text and vector) search. The default is 1,000 documents. With this parameter, you can increase or decrease the number of results returned in hybrid queries.
+Vector queries are often used in hybrid constructs that include nonvector fields. If you discover that BM25-ranked results are over or under represented in a hybrid query results, you can [set `maxTextRecallSize`](hybrid-search-how-to-query.md#set-maxtextrecallsize-and-countandfacetmode-preview) to increase or decrease the BM25-ranked results provided for hybrid ranking.
 
-```http
-POST https://[service-name].search.windows.net/indexes/[index-name]/docs/search?api-version=2024-05-01-Preview 
-    Content-Type: application/json 
-    api-key: [admin key] 
+You can only set this property in hybrid requests that include both "search" and "vectorQueries" components.
 
-    { 
-      "vectorQueries": [ 
-        { 
-          "kind": "vector", 
-          "vector": [1.0, 2.0, 3.0], 
-          "fields": "my_vector_field", 
-          "k": 10 
-        } 
-      ], 
-      "search": "hello world", 
-      "hybridSearch": { 
-        "maxTextRecallSize": 100, 
-        "countAndFacetMode": "countAllResults" 
-      } 
-    } 
-```
+For more information, see [Set maxTextRecallSize - Create a hybrid query](hybrid-search-how-to-query.md#set-maxtextrecallsize-and-countandfacetmode-preview).
 
 ## Vector weighting (preview)
 
-Add a `weight` query parameter to specify the relative weight of each vector included in search operations. This feature is particularly useful in complex queries where two or more distinct result sets need to be combined, such as in hybrid search or multivector requests. 
+Add a `weight` query parameter to specify the relative weight of each vector included in search operations. This value is used when combining the results of multiple ranking lists produced by two or more vector queries in the same request, or from the vector portion of a hybrid query.
+
+The default is 1.0 and the value must be a positive number larger than zero.
+
+Weights are used when calculating the [reciprocal rank fusion](hybrid-search-ranking.md#weighted-scores) scores of each document. The calculation is multiplier of the `weight` value against the rank score of the document within its respective result set.
 
-Weights are used when calculating the [reciprocal rank fusion](hybrid-search-ranking.md) scores of each document. The calculation is multiplier of the `weight` value against the rank score of the document within its respective result set. 
+The following example is a hybrid query with two vector query strings and one text string. Weights are assigned to the vector queries. The first query is 0.5 or half the weight, reducing its importance in the request. The second vectory query is twice as important. 
+
+Text queries have no weight parameters, but you can increase or decrease their importance by setting [maxTextRecallSize](hybrid-search-how-to-query.md#set-maxtextrecallsize-and-countandfacetmode-preview).
 
 ```http
 POST https://[service-name].search.windows.net/indexes/[index-name]/docs/search?api-version=2024-05-01-Preview 
-Content-Type: application/json 
-api-key: [admin key] 
 
     { 
       "vectorQueries": [ 
         { 
           "kind": "vector", 
           "vector": [1.0, 2.0, 3.0], 
-          "fields": "my_vector_field", 
+          "fields": "my_first_vector_field", 
           "k": 10, 
           "weight": 0.5 
+        },
+        { 
+          "kind": "vector", 
+          "vector": [4.0, 5.0, 6.0], 
+          "fields": "my_second_vector_field", 
+          "k": 10, 
+          "weight": 2.0
         } 
       ], 
       "search": "hello world" 
diff --git a/articles/search/whats-new.md b/articles/search/whats-new.md