Commit 05db21f

Author: Jill Grant

Merge pull request #250071 from HeidiSteen/heidist-fix

[azure search] GH issues for ranking, paging, and metadata

2 parents 7cdc7c0 + 82fd856, commit 05db21f

7 files changed: +135 −61 lines changed

articles/search/index-ranking-similarity.md

Lines changed: 2 additions & 2 deletions

@@ -6,7 +6,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 04/18/2023
+ms.date: 08/31/2023
 ---
 
 # Configure relevance scoring

@@ -60,7 +60,7 @@ BM25 similarity adds two parameters to control the relevance score calculation.
 
 ## Enable BM25 scoring on older services
 
-If you're running a search service that was created from March 2014 through July 15, 2020, you can enable BM25 by setting a "similarity" property on new indexes. The property is only exposed on new indexes, so if want BM25 on an existing index, you must drop and [rebuild the index](search-howto-reindex.md) with a "similarity" property set to "Microsoft.Azure.Search.BM25Similarity".
+If you're running a search service that was created from March 2014 through July 15, 2020, you can enable BM25 by setting a "similarity" property on new indexes. The property is only exposed on new indexes, so if you want BM25 on an existing index, you must drop and [rebuild the index](search-howto-reindex.md) with a "similarity" property set to "Microsoft.Azure.Search.BM25Similarity".
 
 Once an index exists with a "similarity" property, you can switch between `BM25Similarity` or `ClassicSimilarity`.

articles/search/index-similarity-and-scoring.md

Lines changed: 5 additions & 5 deletions

@@ -6,14 +6,14 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 04/18/2023
+ms.date: 08/31/2023
 ---
 
 # Relevance and scoring in Azure Cognitive Search
 
 This article explains the relevance and the scoring algorithms used to compute search scores in Azure Cognitive Search. A relevance score is computed for each match found in a [full text search](search-lucene-query-architecture.md), where the strongest matches are assigned higher search scores.
 
-Relevance applies to full text search only. Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries are not scored or ranked for relevance.
+Relevance applies to full text search only. Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries aren't scored or ranked for relevance.
 
 In Azure Cognitive Search, you can tune search relevance and boost search scores through these mechanisms:
 
@@ -31,7 +31,7 @@ Relevance scoring refers to the computation of a search score that serves as an
 
 The search score is computed based on statistical properties of the string input and the query itself. Azure Cognitive Search finds documents that match on search terms (some or all, depending on [searchMode](/rest/api/searchservice/search-documents#query-parameters)), favoring documents that contain many instances of the search term. The search score goes up even higher if the term is rare across the data index, but common within the document. The basis for this approach to computing relevance is known as *TF-IDF*, or term frequency-inverse document frequency.
 
-Search scores can be repeated throughout a result set. When multiple hits have the same search score, the ordering of the same scored items is undefined and not stable. Run the query again, and you might see items shift position, especially if you are using the free service or a billable service with multiple replicas. Given two items with an identical score, there is no guarantee which one appears first.
+Search scores can be repeated throughout a result set. When multiple hits have the same search score, the ordering of the same scored items is undefined and not stable. Run the query again, and you might see items shift position, especially if you are using the free service or a billable service with multiple replicas. Given two items with an identical score, there's no guarantee which one appears first.
 
 If you want to break the tie among repeating scores, you can add an **$orderby** clause to first order by score, then order by another sortable field (for example, `$orderby=search.score() desc,Rating desc`). For more information, see [$orderby](search-query-odata-orderby.md).
 
@@ -44,8 +44,8 @@ Azure Cognitive Search provides the following scoring algorithms:
 
 | Algorithm | Usage | Range |
 |-----------|-------------|-------|
-| BM25Similarity | Fixed algorithm on all search services created after July 2020. You can configure this algorithm, but you can't switch to an older one (classic). | Unbounded. |
-| ClassicSimilarity | Present on older search services. You can [opt in for BM25](index-ranking-similarity.md) and choose an algorithm on a per-index basis. | 0 < 1.00 |
+| `BM25Similarity` | Fixed algorithm on all search services created after July 2020. You can configure this algorithm, but you can't switch to an older one (classic). | Unbounded. |
+| `ClassicSimilarity` | Present on older search services. You can [opt in for BM25](index-ranking-similarity.md) and choose an algorithm on a per-index basis. | 0 < 1.00 |
 
 Both BM25 and Classic are TF-IDF-like retrieval functions that use the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate relevance scores for each document-query pair, which is then used for ranking results. While conceptually similar to Classic, BM25 is rooted in probabilistic information retrieval that produces more intuitive matches, as measured by user research.
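For readers comparing the two algorithms, the BM25 term-scoring formula is compact enough to sketch. The Python below is an illustration only, not the service's implementation; the tuning parameters are named `k1` and `b` with assumed defaults of 1.2 and 0.75, and the IDF variant shown is one common formulation.

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, total_docs, docs_with_term, k1=1.2, b=0.75):
    """Illustrative BM25 score for one term: grows with term frequency (tf)
    in the document, saturates via k1, is normalized by document length via b,
    and is weighted by rarity of the term across the index (idf)."""
    idf = math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))

# A term that is rare across the index outscores a common one, all else equal.
rare = bm25_score(tf=3, doc_len=100, avg_doc_len=120, total_docs=1000, docs_with_term=5)
common = bm25_score(tf=3, doc_len=100, avg_doc_len=120, total_docs=1000, docs_with_term=900)
print(rare > common)  # True
```

Note how the `tf` term saturates: repeating a term in a document raises the score, but with diminishing returns, which matches the "rare across the index, common within the document" intuition described above.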

articles/search/search-pagination-page-layout.md

Lines changed: 44 additions & 34 deletions

@@ -8,7 +8,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 04/18/2023
+ms.date: 08/31/2023
 ---
 
 # How to work with search results in Azure Cognitive Search

@@ -17,8 +17,9 @@ This article explains how to work with a query response in Azure Cognitive Searc
 
 Parameters on the query determine:
 
-+ Selection of fields within results
++ Field selection
 + Count of matches found in the index for the query
++ Paging results
 + Number of results in the response (up to 50, by default)
 + Sort order
 + Highlighting of terms within a result, matching on either the whole or partial term in the body

@@ -68,7 +69,7 @@ Count won't be affected by routine maintenance or other workloads on the search
 
 By default, the search engine returns up to the first 50 matches. The top 50 are determined by search score, assuming the query is full text search or semantic search. Otherwise, the top 50 are an arbitrary order for exact match queries (where uniform "@searchScore=1.0" indicates arbitrary ranking).
 
-To control the paging of all documents returned in a result set, add `$top` and `$skip` parameters to the GET query request or `top` and `skip` to the POST query request. The following list explains the logic.
+To control the paging of all documents returned in a result set, add `$top` and `$skip` parameters to the GET query request, or `top` and `skip` to the POST query request. The following list explains the logic.
 
 + Return the first set of 15 matching documents plus a count of total matches: `GET /indexes/<INDEX-NAME>/docs?search=<QUERY STRING>&$top=15&$skip=0&$count=true`
@@ -78,7 +79,7 @@ The results of paginated queries aren't guaranteed to be stable if the underlyin
 
 Following is an example of how you might get duplicates. Assume an index with four documents:
 
-```text
+```json
 { "id": "1", "rating": 5 }
 { "id": "2", "rating": 3 }
 { "id": "3", "rating": 2 }
@@ -87,50 +88,58 @@ Following is an example of how you might get duplicates. Assume an index with fo
 
 Now assume you want results returned two at a time, ordered by rating. You would execute this query to get the first page of results: `$top=2&$skip=0&$orderby=rating desc`, producing the following results:
 
-```text
+```json
 { "id": "1", "rating": 5 }
 { "id": "2", "rating": 3 }
 ```
 
 On the service, assume a fifth document is added to the index in between query calls: `{ "id": "5", "rating": 4 }`. Shortly thereafter, you execute a query to fetch the second page: `$top=2&$skip=2&$orderby=rating desc`, and get these results:
 
-```text
+```json
 { "id": "2", "rating": 3 }
 { "id": "3", "rating": 2 }
 ```
 
 Notice that document 2 is fetched twice. This is because the new document 5 has a greater value for rating, so it sorts before document 2 and lands on the first page. While this behavior might be unexpected, it's typical of how a search engine behaves.
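The shift described above is easy to reproduce with a local simulation of `$top`/`$skip` semantics. This is plain Python over an in-memory list, not the service; it only mirrors the sort-then-slice behavior.

```python
def page(docs, top, skip):
    """Simulate $top/$skip over results ordered by rating (descending)."""
    ordered = sorted(docs, key=lambda d: d["rating"], reverse=True)
    return ordered[skip:skip + top]

index = [
    {"id": "1", "rating": 5},
    {"id": "2", "rating": 3},
    {"id": "3", "rating": 2},
    {"id": "4", "rating": 1},
]

first = page(index, top=2, skip=0)        # ids "1", "2"
index.append({"id": "5", "rating": 4})     # document added between query calls
second = page(index, top=2, skip=2)        # ids "2", "3"; "2" repeats

print([d["id"] for d in first], [d["id"] for d in second])
# ['1', '2'] ['2', '3']
```

Because document 5 sorts between documents 1 and 2, the second slice begins one position earlier than the first query's results implied, and document 2 appears on both pages.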

-### Paging through large numbers of results
+### Paging through a large number of results
 
-Using `$top` and `$skip` allows a search query to page through 100,000 results. A value greater than 100,000 may not be used for `$skip`. It's possible to work around this limitation if a field has the ["filterable"](./search-filters.md) and ["sortable"] attributes.
+Using `$top` and `$skip` allows a search query to page through 100,000 results, but what if results are larger than 100,000? To page through a response this large, use a [sort order](search-query-odata-orderby.md) and [range filter](search-query-odata-comparison-operators.md) as a workaround for `$skip`.
+
+In this workaround, sort and filter are applied to a document ID field or another field that is unique for each document. The unique field must have `filterable` and `sortable` attribution in the search index.
 
 1. Issue a query to return a full page of sorted results.
-   ```http
-   POST /indexes/good-books/docs/search?api-version=2020-06-30
-     {
-       "search": "divine secrets",
-       "top": 50,
-       "orderby": "id asc"
-     }
-   ```
-2. Choose the last result returned by the search query. An example result with only an "id" value is shown here.
-   ```json
-     {
-       "id": "50"
-     }
-   ```
-3. Use that "id" value in a range query to fetch the next page of results. This "id" field should have unique values, otherwise pagination may include duplicate results.
-   ```http
-   POST /indexes/good-books/docs/search?api-version=2020-06-30
-     {
-       "search": "divine secrets",
-       "top": 50,
-       "orderby": "id asc",
-       "filter": "id ge 50"
-     }
-   ```
-4. Pagination ends when the query returns 0 results.
+
+   ```http
+   POST /indexes/good-books/docs/search?api-version=2020-06-30
+     {
+       "search": "divine secrets",
+       "top": 50,
+       "orderby": "id asc"
+     }
+   ```
+
+1. Choose the last result returned by the search query. An example result with only an "id" value is shown here.
+
+   ```json
+     {
+       "id": "50"
+     }
+   ```
+
+1. Use that "id" value in a range query to fetch the next page of results. This "id" field should have unique values, otherwise pagination may include duplicate results.
+
+   ```http
+   POST /indexes/good-books/docs/search?api-version=2020-06-30
+     {
+       "search": "divine secrets",
+       "top": 50,
+       "orderby": "id asc",
+       "filter": "id ge 50"
+     }
+   ```
+
+1. Pagination ends when the query returns zero results.
 
 > [!NOTE]
 > The "filterable" and "sortable" attributes can only be enabled when a field is first added to an index; they can't be enabled on an existing field.
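The loop implied by these steps can be sketched locally. The Python below simulates range-filter paging over integer document keys (a simplification; the example above uses string IDs). It also uses a strictly-greater comparison so the pivot document isn't re-fetched, whereas with the `ge` filter shown above, the last document of each page reappears as the first document of the next.

```python
def fetch_page(index, last_id, top):
    """Simulate one range-filtered page: apply the range filter, sort by id
    ascending, and take the top N results. Local sketch, not the service."""
    matches = sorted((d for d in index if last_id is None or d["id"] > last_id),
                     key=lambda d: d["id"])
    return matches[:top]

index = [{"id": i} for i in range(1, 121)]     # 120 hypothetical documents

pages, last_id = [], None
while True:
    page = fetch_page(index, last_id, top=50)
    if not page:                                # step 4: stop at zero results
        break
    pages.append([d["id"] for d in page])
    last_id = page[-1]["id"]                    # step 2: pivot for next filter

print(len(pages), [len(p) for p in pages])      # 3 [50, 50, 20]
```

The pivot-and-filter pattern never skips past an offset, which is why it avoids the 100,000 limit on `$skip`; the cost is that pages must be fetched sequentially.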
@@ -189,7 +198,7 @@ Hit highlighting instructions are provided on the [query request](/rest/api/sear
 
 ### Requirements for hit highlighting
 
-+ Fields must be Edm.String or Collection(Edm.String)
++ Fields must be `Edm.String` or `Collection(Edm.String)`
 + Fields must be attributed as **searchable**
 
 ### Specify highlighting in the request
@@ -264,6 +273,7 @@ Within a highlighted field, formatting is applied to whole terms. For example, o
       "original_title": "Grave Secrets",
       "title": "Grave Secrets (Temperance Brennan, #5)"
     }
+  ]
 ```
 
 ### Phrase search highlighting
### Phrase search highlighting
@@ -282,7 +292,7 @@ POST /indexes/good-books/docs/search?api-version=2020-06-30
282292
}
283293
```
284294

285-
Because the criteria now specifies both terms, only one match is found in the search index. The response to the above query looks like this:
295+
Because the criteria now has both terms, only one match is found in the search index. The response to the above query looks like this:
286296

287297
```json
288298
{

articles/search/search-performance-analysis.md

Lines changed: 2 additions & 2 deletions

@@ -1,12 +1,12 @@
 ---
 title: Analyze performance
 titleSuffix: Azure Cognitive Search
-description: TBD
+description: Learn about the tools, behaviors, and approaches for analyzing query and indexing performance in Cognitive Search.
 author: LiamCavanagh
 ms.author: liamca
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 01/30/2023
+ms.date: 08/31/2023
 ---
 
 # Analyze performance in Azure Cognitive Search

articles/search/semantic-ranking.md

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 08/14/2023
+ms.date: 08/31/2023
 ---
 
 # Semantic ranking in Azure Cognitive Search

articles/search/vector-search-how-to-query.md

Lines changed: 52 additions & 1 deletion

@@ -148,7 +148,58 @@ api-key: {{admin-api-key}}
 
 The response includes 5 matches, and each result provides a search score, title, content, and category. In a similarity search, the response always includes "k" matches, even if the similarity is weak. For indexes that have fewer than "k" documents, only that number of documents is returned.
 
-Notice that "select" returns textual fields from the index. Although the vector field is "retrievable" in this example, its content isn't usable as a search result, so it's not included in the results.
+Notice that "select" returns textual fields from the index. Although the vector field is "retrievable" in this example, its content isn't usable as a search result, so it's often excluded from the results.
+
+### Vector query response
+
+Here's a modified example so that you can see the basic structure of a response from a pure vector query.
+
+```json
+{
+  "@odata.count": 3,
+  "value": [
+    {
+      "@search.score": 0.80025613,
+      "title": "Azure Search",
+      "category": "AI + Machine Learning",
+      "contentVector": [
+        -0.0018343845,
+        0.017952163,
+        0.0025753193,
+        ...
+      ]
+    },
+    {
+      "@search.score": 0.78856903,
+      "title": "Azure Application Insights",
+      "category": "Management + Governance",
+      "contentVector": [
+        -0.016821077,
+        0.0037742127,
+        0.016136652,
+        ...
+      ]
+    },
+    {
+      "@search.score": 0.78650564,
+      "title": "Azure Media Services",
+      "category": "Media",
+      "contentVector": [
+        -0.025449317,
+        0.0038463024,
+        -0.02488436,
+        ...
+      ]
+    }
+  ]
+}
+```
+
+**Key points:**
+
++ "k" is set to 3, so the response is reduced to three matches.
++ The **`@search.score`** is determined by the HNSW algorithm and a `cosine` similarity metric.
++ Fields include text and vector values. The content vector field consists of 1536 dimensions for each match, so it's truncated for brevity (normally, you might exclude vector fields from results). The text fields used in the response (`"select": "title, category"`) aren't used during query execution; the match is made on vector data alone. However, a response can include any "retrievable" field in an index. As such, the inclusion of text fields is helpful because their values are easily recognized by users.
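For orientation, the `cosine` metric named above can be sketched in a few lines of Python. This shows the raw metric only; the service computes similarity internally during HNSW search, and the exact transformation it applies to produce `@search.score` may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: the dot product divided by the
    product of the vectors' Euclidean norms. Illustrative only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real embeddings in this example have 1536 dimensions.
query = [0.1, 0.9, 0.2]
doc = [0.12, 0.85, 0.25]
print(cosine_similarity(query, doc) > 0.99)  # True: the vectors point nearly the same way
```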

 ### [**.NET**](#tab/dotnet-vector-query)
