Commit 05db21f

Author: Jill Grant

Merge pull request #250071 from HeidiSteen/heidist-fix

[azure search] GH issues for ranking, paging, and metadata

2 parents 7cdc7c0 + 82fd856, commit 05db21f

7 files changed: +135 −61 lines changed

articles/search/index-ranking-similarity.md

Lines changed: 2 additions & 2 deletions

@@ -6,7 +6,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 04/18/2023
+ms.date: 08/31/2023
 ---
 
 # Configure relevance scoring

@@ -60,7 +60,7 @@ BM25 similarity adds two parameters to control the relevance score calculation.
 
 ## Enable BM25 scoring on older services
 
-If you're running a search service that was created from March 2014 through July 15, 2020, you can enable BM25 by setting a "similarity" property on new indexes. The property is only exposed on new indexes, so if want BM25 on an existing index, you must drop and [rebuild the index](search-howto-reindex.md) with a "similarity" property set to "Microsoft.Azure.Search.BM25Similarity".
+If you're running a search service that was created from March 2014 through July 15, 2020, you can enable BM25 by setting a "similarity" property on new indexes. The property is only exposed on new indexes, so if you want BM25 on an existing index, you must drop and [rebuild the index](search-howto-reindex.md) with a "similarity" property set to "Microsoft.Azure.Search.BM25Similarity".
 
 Once an index exists with a "similarity" property, you can switch between `BM25Similarity` or `ClassicSimilarity`.

articles/search/index-similarity-and-scoring.md

Lines changed: 5 additions & 5 deletions

@@ -6,14 +6,14 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 04/18/2023
+ms.date: 08/31/2023
 ---
 
 # Relevance and scoring in Azure Cognitive Search
 
 This article explains the relevance and the scoring algorithms used to compute search scores in Azure Cognitive Search. A relevance score is computed for each match found in a [full text search](search-lucene-query-architecture.md), where the strongest matches are assigned higher search scores.
 
-Relevance applies to full text search only. Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries are not scored or ranked for relevance.
+Relevance applies to full text search only. Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries aren't scored or ranked for relevance.
 
 In Azure Cognitive Search, you can tune search relevance and boost search scores through these mechanisms:
 
@@ -31,7 +31,7 @@ Relevance scoring refers to the computation of a search score that serves as an
 
 The search score is computed based on statistical properties of the string input and the query itself. Azure Cognitive Search finds documents that match on search terms (some or all, depending on [searchMode](/rest/api/searchservice/search-documents#query-parameters)), favoring documents that contain many instances of the search term. The search score goes up even higher if the term is rare across the data index, but common within the document. The basis for this approach to computing relevance is known as *TF-IDF*, or term frequency-inverse document frequency.
 
-Search scores can be repeated throughout a result set. When multiple hits have the same search score, the ordering of the same scored items is undefined and not stable. Run the query again, and you might see items shift position, especially if you are using the free service or a billable service with multiple replicas. Given two items with an identical score, there is no guarantee which one appears first.
+Search scores can be repeated throughout a result set. When multiple hits have the same search score, the ordering of the same scored items is undefined and not stable. Run the query again, and you might see items shift position, especially if you are using the free service or a billable service with multiple replicas. Given two items with an identical score, there's no guarantee which one appears first.
 
 If you want to break the tie among repeating scores, you can add an **$orderby** clause to first order by score, then order by another sortable field (for example, `$orderby=search.score() desc,Rating desc`). For more information, see [$orderby](search-query-odata-orderby.md).
 
@@ -44,8 +44,8 @@ Azure Cognitive Search provides the following scoring algorithms:
 
 | Algorithm | Usage | Range |
 |-----------|-------------|-------|
-| BM25Similarity | Fixed algorithm on all search services created after July 2020. You can configure this algorithm, but you can't switch to an older one (classic). | Unbounded. |
-| ClassicSimilarity | Present on older search services. You can [opt in for BM25](index-ranking-similarity.md) and choose an algorithm on a per-index basis. | 0 < 1.00 |
+| `BM25Similarity` | Fixed algorithm on all search services created after July 2020. You can configure this algorithm, but you can't switch to an older one (classic). | Unbounded. |
+| `ClassicSimilarity` | Present on older search services. You can [opt in for BM25](index-ranking-similarity.md) and choose an algorithm on a per-index basis. | 0 < 1.00 |
 
 Both BM25 and Classic are TF-IDF-like retrieval functions that use the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate relevance scores for each document-query pair, which is then used for ranking results. While conceptually similar to Classic, BM25 is rooted in probabilistic information retrieval that produces more intuitive matches, as measured by user research.
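For readers comparing the two algorithms, the BM25 term-scoring formula is compact enough to sketch. The Python below is an illustration only, not the service's implementation; the tuning parameters are named `k1` and `b` with assumed defaults of 1.2 and 0.75, and the IDF variant shown is one common formulation.

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, total_docs, docs_with_term, k1=1.2, b=0.75):
    """Illustrative BM25 score for one term: grows with term frequency (tf)
    in the document, saturates via k1, is normalized by document length via b,
    and is weighted by rarity of the term across the index (idf)."""
    idf = math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))

# A term that is rare across the index outscores a common one, all else equal.
rare = bm25_score(tf=3, doc_len=100, avg_doc_len=120, total_docs=1000, docs_with_term=5)
common = bm25_score(tf=3, doc_len=100, avg_doc_len=120, total_docs=1000, docs_with_term=900)
print(rare > common)  # True
```

Note how the `tf` term saturates: repeating a term in a document raises the score, but with diminishing returns, which matches the "rare across the index, common within the document" intuition described above.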

articles/search/search-pagination-page-layout.md

Lines changed: 44 additions & 34 deletions

@@ -8,7 +8,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 04/18/2023
+ms.date: 08/31/2023
 ---
 
 # How to work with search results in Azure Cognitive Search

@@ -17,8 +17,9 @@ This article explains how to work with a query response in Azure Cognitive Searc
 
 Parameters on the query determine:
 
-+ Selection of fields within results
++ Field selection
 + Count of matches found in the index for the query
++ Paging results
 + Number of results in the response (up to 50, by default)
 + Sort order
 + Highlighting of terms within a result, matching on either the whole or partial term in the body

@@ -68,7 +69,7 @@ Count won't be affected by routine maintenance or other workloads on the search
 
 By default, the search engine returns up to the first 50 matches. The top 50 are determined by search score, assuming the query is full text search or semantic search. Otherwise, the top 50 are an arbitrary order for exact match queries (where uniform "@searchScore=1.0" indicates arbitrary ranking).
 
-To control the paging of all documents returned in a result set, add `$top` and `$skip` parameters to the GET query request or `top` and `skip` to the POST query request. The following list explains the logic.
+To control the paging of all documents returned in a result set, add `$top` and `$skip` parameters to the GET query request, or `top` and `skip` to the POST query request. The following list explains the logic.
 
 + Return the first set of 15 matching documents plus a count of total matches: `GET /indexes/<INDEX-NAME>/docs?search=<QUERY STRING>&$top=15&$skip=0&$count=true`
@@ -78,7 +79,7 @@ The results of paginated queries aren't guaranteed to be stable if the underlyin
 
 Following is an example of how you might get duplicates. Assume an index with four documents:
 
-```text
+```json
 { "id": "1", "rating": 5 }
 { "id": "2", "rating": 3 }
 { "id": "3", "rating": 2 }
@@ -87,50 +88,58 @@ Following is an example of how you might get duplicates. Assume an index with fo
 
 Now assume you want results returned two at a time, ordered by rating. You would execute this query to get the first page of results: `$top=2&$skip=0&$orderby=rating desc`, producing the following results:
 
-```text
+```json
 { "id": "1", "rating": 5 }
 { "id": "2", "rating": 3 }
 ```
 
 On the service, assume a fifth document is added to the index in between query calls: `{ "id": "5", "rating": 4 }`. Shortly thereafter, you execute a query to fetch the second page: `$top=2&$skip=2&$orderby=rating desc`, and get these results:
 
-```text
+```json
 { "id": "2", "rating": 3 }
 { "id": "3", "rating": 2 }
 ```
 
 Notice that document 2 is fetched twice. This is because the new document 5 has a greater value for rating, so it sorts before document 2 and lands on the first page. While this behavior might be unexpected, it's typical of how a search engine behaves.
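The shift described above is easy to reproduce with a local simulation of `$top`/`$skip` semantics. This is plain Python over an in-memory list, not the service; it only mirrors the sort-then-slice behavior.

```python
def page(docs, top, skip):
    """Simulate $top/$skip over results ordered by rating (descending)."""
    ordered = sorted(docs, key=lambda d: d["rating"], reverse=True)
    return ordered[skip:skip + top]

index = [
    {"id": "1", "rating": 5},
    {"id": "2", "rating": 3},
    {"id": "3", "rating": 2},
    {"id": "4", "rating": 1},
]

first = page(index, top=2, skip=0)        # ids "1", "2"
index.append({"id": "5", "rating": 4})     # document added between query calls
second = page(index, top=2, skip=2)        # ids "2", "3"; "2" repeats

print([d["id"] for d in first], [d["id"] for d in second])
# ['1', '2'] ['2', '3']
```

Because document 5 sorts between documents 1 and 2, the second slice begins one position earlier than the first query's results implied, and document 2 appears on both pages.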

-### Paging through large numbers of results
+### Paging through a large number of results
 
-Using `$top` and `$skip` allows a search query to page through 100,000 results. A value greater than 100,000 may not be used for `$skip`. It's possible to work around this limitation if a field has the ["filterable"](./search-filters.md) and ["sortable"] attributes.
+Using `$top` and `$skip` allows a search query to page through 100,000 results, but what if results are larger than 100,000? To page through a response this large, use a [sort order](search-query-odata-orderby.md) and [range filter](search-query-odata-comparison-operators.md) as a workaround for `$skip`.
+
+In this workaround, sort and filter are applied to a document ID field or another field that is unique for each document. The unique field must have `filterable` and `sortable` attribution in the search index.
 
 1. Issue a query to return a full page of sorted results.
-   ```http
-   POST /indexes/good-books/docs/search?api-version=2020-06-30
-     {
-       "search": "divine secrets",
-       "top": 50,
-       "orderby": "id asc"
-     }
-   ```
-2. Choose the last result returned by the search query. An example result with only an "id" value is shown here.
-   ```json
-     {
-       "id": "50"
-     }
-   ```
-3. Use that "id" value in a range query to fetch the next page of results. This "id" field should have unique values, otherwise pagination may include duplicate results.
-   ```http
-   POST /indexes/good-books/docs/search?api-version=2020-06-30
-     {
-       "search": "divine secrets",
-       "top": 50,
-       "orderby": "id asc",
-       "filter": "id ge 50"
-     }
-   ```
-4. Pagination ends when the query returns 0 results.
+
+   ```http
+   POST /indexes/good-books/docs/search?api-version=2020-06-30
+     {
+       "search": "divine secrets",
+       "top": 50,
+       "orderby": "id asc"
+     }
+   ```
+
+1. Choose the last result returned by the search query. An example result with only an "id" value is shown here.
+
+   ```json
+     {
+       "id": "50"
+     }
+   ```
+
+1. Use that "id" value in a range query to fetch the next page of results. This "id" field should have unique values, otherwise pagination may include duplicate results.
+
+   ```http
+   POST /indexes/good-books/docs/search?api-version=2020-06-30
+     {
+       "search": "divine secrets",
+       "top": 50,
+       "orderby": "id asc",
+       "filter": "id ge 50"
+     }
+   ```
+
+1. Pagination ends when the query returns zero results.
 
 > [!NOTE]
 > The "filterable" and "sortable" attributes can only be enabled when a field is first added to an index; they can't be enabled on an existing field.
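The loop implied by these steps can be sketched locally. The Python below simulates range-filter paging over integer document keys (a simplification; the example above uses string IDs). It also uses a strictly-greater comparison so the pivot document isn't re-fetched, whereas with the `ge` filter shown above, the last document of each page reappears as the first document of the next.

```python
def fetch_page(index, last_id, top):
    """Simulate one range-filtered page: apply the range filter, sort by id
    ascending, and take the top N results. Local sketch, not the service."""
    matches = sorted((d for d in index if last_id is None or d["id"] > last_id),
                     key=lambda d: d["id"])
    return matches[:top]

index = [{"id": i} for i in range(1, 121)]     # 120 hypothetical documents

pages, last_id = [], None
while True:
    page = fetch_page(index, last_id, top=50)
    if not page:                                # step 4: stop at zero results
        break
    pages.append([d["id"] for d in page])
    last_id = page[-1]["id"]                    # step 2: pivot for next filter

print(len(pages), [len(p) for p in pages])      # 3 [50, 50, 20]
```

The pivot-and-filter pattern never skips past an offset, which is why it avoids the 100,000 limit on `$skip`; the cost is that pages must be fetched sequentially.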
@@ -189,7 +198,7 @@ Hit highlighting instructions are provided on the [query request](/rest/api/sear
 
 ### Requirements for hit highlighting
 
-+ Fields must be Edm.String or Collection(Edm.String)
++ Fields must be `Edm.String` or `Collection(Edm.String)`
 + Fields must be attributed as **searchable**
 
 ### Specify highlighting in the request
@@ -264,6 +273,7 @@ Within a highlighted field, formatting is applied to whole terms. For example, o
       "original_title": "Grave Secrets",
       "title": "Grave Secrets (Temperance Brennan, #5)"
     }
+  ]
 ```
 
 ### Phrase search highlighting
### Phrase search highlighting
@@ -282,7 +292,7 @@ POST /indexes/good-books/docs/search?api-version=2020-06-30
282292
}
283293
```
284294

285-
Because the criteria now specifies both terms, only one match is found in the search index. The response to the above query looks like this:
295+
Because the criteria now has both terms, only one match is found in the search index. The response to the above query looks like this:
286296

287297
```json
288298
{

articles/search/search-performance-analysis.md

Lines changed: 2 additions & 2 deletions

@@ -1,12 +1,12 @@
 ---
 title: Analyze performance
 titleSuffix: Azure Cognitive Search
-description: TBD
+description: Learn about the tools, behaviors, and approaches for analyzing query and indexing performance in Cognitive Search.
 author: LiamCavanagh
 ms.author: liamca
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 01/30/2023
+ms.date: 08/31/2023
 ---
 
 # Analyze performance in Azure Cognitive Search

articles/search/semantic-ranking.md

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 08/14/2023
+ms.date: 08/31/2023
 ---
 
 # Semantic ranking in Azure Cognitive Search

articles/search/vector-search-how-to-query.md

Lines changed: 52 additions & 1 deletion

@@ -148,7 +148,58 @@ api-key: {{admin-api-key}}
 
 The response includes 5 matches, and each result provides a search score, title, content, and category. In a similarity search, the response always includes "k" matches, even if the similarity is weak. For indexes that have fewer than "k" documents, only that number of documents is returned.
 
-Notice that "select" returns textual fields from the index. Although the vector field is "retrievable" in this example, its content isn't usable as a search result, so it's not included in the results.
+Notice that "select" returns textual fields from the index. Although the vector field is "retrievable" in this example, its content isn't usable as a search result, so it's often excluded from the results.
+
+### Vector query response
+
+Here's a modified example so that you can see the basic structure of a response from a pure vector query.
+
+```json
+{
+  "@odata.count": 3,
+  "value": [
+    {
+      "@search.score": 0.80025613,
+      "title": "Azure Search",
+      "category": "AI + Machine Learning",
+      "contentVector": [
+        -0.0018343845,
+        0.017952163,
+        0.0025753193,
+        ...
+      ]
+    },
+    {
+      "@search.score": 0.78856903,
+      "title": "Azure Application Insights",
+      "category": "Management + Governance",
+      "contentVector": [
+        -0.016821077,
+        0.0037742127,
+        0.016136652,
+        ...
+      ]
+    },
+    {
+      "@search.score": 0.78650564,
+      "title": "Azure Media Services",
+      "category": "Media",
+      "contentVector": [
+        -0.025449317,
+        0.0038463024,
+        -0.02488436,
+        ...
+      ]
+    }
+  ]
+}
+```
+
+**Key points:**
+
++ "k" is set to 3, so the response is reduced to three matches.
++ The **`@search.score`** is determined by the HNSW algorithm and a `cosine` similarity metric.
++ Fields include text and vector values. The content vector field consists of 1536 dimensions for each match, so it's truncated for brevity (normally, you might exclude vector fields from results). The text fields used in the response (`"select": "title, category"`) aren't used during query execution; the match is made on vector data alone. However, a response can include any "retrievable" field in an index. As such, the inclusion of text fields is helpful because their values are easily recognized by users.
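For orientation, the `cosine` metric named above can be sketched in a few lines of Python. This shows the raw metric only; the service computes similarity internally during HNSW search, and the exact transformation it applies to produce `@search.score` may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: the dot product divided by the
    product of the vectors' Euclidean norms. Illustrative only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real embeddings in this example have 1536 dimensions.
query = [0.1, 0.9, 0.2]
doc = [0.12, 0.85, 0.25]
print(cosine_similarity(query, doc) > 0.99)  # True: the vectors point nearly the same way
```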

 ### [**.NET**](#tab/dotnet-vector-query)
