articles/search/index-ranking-similarity.md (2 additions & 2 deletions)
@@ -6,7 +6,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 04/18/2023
+ms.date: 08/31/2023
 ---

 # Configure relevance scoring
@@ -60,7 +60,7 @@ BM25 similarity adds two parameters to control the relevance score calculation.

 ## Enable BM25 scoring on older services

-If you're running a search service that was created from March 2014 through July 15, 2020, you can enable BM25 by setting a "similarity" property on new indexes. The property is only exposed on new indexes, so if want BM25 on an existing index, you must drop and [rebuild the index](search-howto-reindex.md) with a "similarity" property set to "Microsoft.Azure.Search.BM25Similarity".
+If you're running a search service that was created from March 2014 through July 15, 2020, you can enable BM25 by setting a "similarity" property on new indexes. The property is only exposed on new indexes, so if you want BM25 on an existing index, you must drop and [rebuild the index](search-howto-reindex.md) with a "similarity" property set to "Microsoft.Azure.Search.BM25Similarity".

 Once an index exists with a "similarity" property, you can switch between `BM25Similarity` or `ClassicSimilarity`.
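
The changed paragraph above references a "similarity" property without showing its shape. As a hedged illustration (the index name and fields are invented here, and the `k1` and `b` values are arbitrary examples of the two BM25 tuning parameters), a new-index definition that opts in to BM25 might look like:

```json
{
  "name": "my-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    { "name": "content", "type": "Edm.String", "searchable": true }
  ],
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
    "k1": 1.3,
    "b": 0.5
  }
}
```

Omitting `k1` and `b` keeps the service defaults; setting the property at index creation is what makes the algorithm switchable later.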
articles/search/index-similarity-and-scoring.md (5 additions & 5 deletions)
@@ -6,14 +6,14 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 04/18/2023
+ms.date: 08/31/2023
 ---

 # Relevance and scoring in Azure Cognitive Search

 This article explains the relevance and the scoring algorithms used to compute search scores in Azure Cognitive Search. A relevance score is computed for each match found in a [full text search](search-lucene-query-architecture.md), where the strongest matches are assigned higher search scores.

-Relevance applies to full text search only. Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries are not scored or ranked for relevance.
+Relevance applies to full text search only. Filter queries, autocomplete and suggested queries, wildcard search or fuzzy search queries aren't scored or ranked for relevance.

 In Azure Cognitive Search, you can tune search relevance and boost search scores through these mechanisms:
@@ -31,7 +31,7 @@ Relevance scoring refers to the computation of a search score that serves as an

 The search score is computed based on statistical properties of the string input and the query itself. Azure Cognitive Search finds documents that match on search terms (some or all, depending on [searchMode](/rest/api/searchservice/search-documents#query-parameters)), favoring documents that contain many instances of the search term. The search score goes up even higher if the term is rare across the data index, but common within the document. The basis for this approach to computing relevance is known as *TF-IDF* or term frequency-inverse document frequency.

-Search scores can be repeated throughout a result set. When multiple hits have the same search score, the ordering of the same scored items is undefined and not stable. Run the query again, and you might see items shift position, especially if you are using the free service or a billable service with multiple replicas. Given two items with an identical score, there is no guarantee which one appears first.
+Search scores can be repeated throughout a result set. When multiple hits have the same search score, the ordering of the same scored items is undefined and not stable. Run the query again, and you might see items shift position, especially if you are using the free service or a billable service with multiple replicas. Given two items with an identical score, there's no guarantee that one appears first.

 If you want to break the tie among repeating scores, you can add an **$orderby** clause to first order by score, then order by another sortable field (for example, `$orderby=search.score() desc,Rating desc`). For more information, see [$orderby](search-query-odata-orderby.md).
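
The tie-breaking `$orderby` described above can be sketched client-side. A minimal Python illustration (the documents, scores, and `rating` field are hypothetical) of ordering by score descending, then by a secondary sortable field descending:

```python
# Illustrative only: simulate `$orderby=search.score() desc,Rating desc`
# to make the ordering of identically scored hits deterministic.
hits = [
    {"id": "1", "score": 0.85, "rating": 3},
    {"id": "2", "score": 0.85, "rating": 5},
    {"id": "3", "score": 0.60, "rating": 4},
]

# Primary key: score descending; secondary key: rating descending.
ordered = sorted(hits, key=lambda h: (-h["score"], -h["rating"]))

print([h["id"] for h in ordered])  # ['2', '1', '3']
```

Without the secondary key, documents 1 and 2 (identical scores) could swap positions between runs, which is exactly the instability the paragraph describes.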
@@ -44,8 +44,8 @@ Azure Cognitive Search provides the following scoring algorithms:

 | Algorithm | Usage | Range |
 |-----------|-------------|-------|
-| BM25Similarity | Fixed algorithm on all search services created after July 2020. You can configure this algorithm, but you can't switch to an older one (classic). | Unbounded. |
-| ClassicSimilarity | Present on older search services. You can [opt-in for BM25](index-ranking-similarity.md) and choose an algorithm on a per-index basis. | 0 < 1.00 |
+| `BM25Similarity` | Fixed algorithm on all search services created after July 2020. You can configure this algorithm, but you can't switch to an older one (classic). | Unbounded. |
+| `ClassicSimilarity` | Present on older search services. You can [opt-in for BM25](index-ranking-similarity.md) and choose an algorithm on a per-index basis. | 0 < 1.00 |

 Both BM25 and Classic are TF-IDF-like retrieval functions that use the term frequency (TF) and the inverse document frequency (IDF) as variables to calculate relevance scores for each document-query pair, which is then used for ranking results. While conceptually similar to classic, BM25 is rooted in probabilistic information retrieval that produces more intuitive matches, as measured by user research.
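
The BM25-versus-classic comparison above turns on how term frequency is weighted. A hedged sketch of the textbook BM25 term-scoring formula (not Azure Cognitive Search's exact internals; all numbers are invented) showing how the `k1` parameter saturates the contribution of repeated terms, unlike raw TF:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 weight for one term in one document (illustrative)."""
    # IDF: rarer terms across the index contribute more.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # TF saturation: k1 bounds the benefit of repeating a term;
    # b normalizes by document length relative to the average.
    return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))

# 100 occurrences score only modestly higher than 10 occurrences,
# whereas a linear TF weight would scale 10x.
low = bm25_term_score(tf=10, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=50)
high = bm25_term_score(tf=100, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=50)
print(round(high / low, 2))  # ≈ 1.11, far below the 10x a linear TF would give
```

This saturation is one reason BM25 produces "more intuitive matches" than classic TF-IDF: keyword stuffing stops paying off quickly.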
articles/search/search-pagination-page-layout.md (44 additions & 34 deletions)
@@ -8,7 +8,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: how-to
-ms.date: 04/18/2023
+ms.date: 08/31/2023
 ---

 # How to work with search results in Azure Cognitive Search
@@ -17,8 +17,9 @@ This article explains how to work with a query response in Azure Cognitive Searc

 Parameters on the query determine:

-+ Selection of fields within results
++ Field selection
 + Count of matches found in the index for the query
++ Paging results
 + Number of results in the response (up to 50, by default)
 + Sort order
 + Highlighting of terms within a result, matching on either the whole or partial term in the body
@@ -68,7 +69,7 @@ Count won't be affected by routine maintenance or other workloads on the search

 By default, the search engine returns up to the first 50 matches. The top 50 are determined by search score, assuming the query is full text search or semantic search. Otherwise, the top 50 are an arbitrary order for exact match queries (where uniform "@searchScore=1.0" indicates arbitrary ranking).

-To control the paging of all documents returned in a result set, add `$top` and `$skip` parameters to the GET query request or `top` and `skip` to the POST query request. The following list explains the logic.
+To control the paging of all documents returned in a result set, add `$top` and `$skip` parameters to the GET query request, or `top` and `skip` to the POST query request. The following list explains the logic.

 + Return the first set of 15 matching documents plus a count of total matches: `GET /indexes/<INDEX-NAME>/docs?search=<QUERY STRING>&$top=15&$skip=0&$count=true`
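
The `$top`/`$skip` logic above amounts to simple arithmetic. A minimal Python sketch (the helper name is invented) of computing the pair for a given page:

```python
def paging_params(page, page_size=15):
    """Return ($top, $skip) for a zero-based page number (illustrative)."""
    return page_size, page * page_size

print(paging_params(0))  # (15, 0)  -> first 15 documents
print(paging_params(2))  # (15, 30) -> documents 31 through 45
```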
@@ -78,7 +79,7 @@ The results of paginated queries aren't guaranteed to be stable if the underlyin

 Following is an example of how you might get duplicates. Assume an index with four documents:

-```text
+```json
 { "id": "1", "rating": 5 }
 { "id": "2", "rating": 3 }
 { "id": "3", "rating": 2 }
@@ -87,50 +88,58 @@ Following is an example of how you might get duplicates. Assume an index with fo

 Now assume you want results returned two at a time, ordered by rating. You would execute this query to get the first page of results: `$top=2&$skip=0&$orderby=rating desc`, producing the following results:

-```text
+```json
 { "id": "1", "rating": 5 }
 { "id": "2", "rating": 3 }
 ```

 On the service, assume a fifth document is added to the index in between query calls: `{ "id": "5", "rating": 4 }`. Shortly thereafter, you execute a query to fetch the second page: `$top=2&$skip=2&$orderby=rating desc`, and get these results:

-```text
+```json
 { "id": "2", "rating": 3 }
 { "id": "3", "rating": 2 }
 ```

 Notice that document 2 is fetched twice. This is because the new document 5 has a greater value for rating, so it sorts before document 2 and lands on the first page. While this behavior might be unexpected, it's typical of how a search engine behaves.

-### Paging through large numbers of results
+### Paging through a large number of results

-Using `$top` and `$skip` allows a search query to page through 100,000 results. A value greater than 100,000 may not be used for `$skip`. It's possible to work around this limitation if a field has the ["filterable"](./search-filters.md) and ["sortable"] attributes.
+Using `$top` and `$skip` allows a search query to page through 100,000 results, but what if results are larger than 100,000? To page through a response this large, use a [sort order](search-query-odata-orderby.md) and [range filter](search-query-odata-comparison-operators.md) as a workaround for `$skip`.
+
+In this workaround, sort and filter are applied to a document ID field or another field that is unique for each document. The unique field must have `filterable` and `sortable` attribution in the search index.

 1. Issue a query to return a full page of sorted results.
-```http
-POST /indexes/good-books/docs/search?api-version=2020-06-30
-{
-"search": "divine secrets",
-"top": 50,
-"orderby": "id asc"
-}
-```
-2. Choose the last result returned by the search query. An example result with only an "id" value is shown here.
-```json
-{
-"id": "50"
-}
-```
-3. Use that "id" value in a range query to fetch the next page of results. This "id" field should have unique values, otherwise pagination may include duplicate results.
-```http
-POST /indexes/good-books/docs/search?api-version=2020-06-30
-{
-"search": "divine secrets",
-"top": 50,
-"orderby": "id asc",
-"filter": "id ge 50"
+
+```http
+POST /indexes/good-books/docs/search?api-version=2020-06-30
+{
+"search": "divine secrets",
+"top": 50,
+"orderby": "id asc"
+}
+```
+
+1. Choose the last result returned by the search query. An example result with only an "id" value is shown here.
+
+```json
+{
+"id": "50"
 }
-```
-4. Pagination ends when the query returns 0 results.
+```
+
+1. Use that "id" value in a range query to fetch the next page of results. This "id" field should have unique values, otherwise pagination may include duplicate results.
+
+```http
+POST /indexes/good-books/docs/search?api-version=2020-06-30
+{
+"search": "divine secrets",
+"top": 50,
+"orderby": "id asc",
+"filter": "id ge 50"
+}
+```
+
+1. Pagination ends when the query returns zero results.

 > [!NOTE]
 > The "filterable" and "sortable" attributes can only be enabled when a field is first added to an index, they cannot be enabled on an existing field.
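
The duplicate-result scenario described above is easy to reproduce offline. A Python simulation (mirroring the article's four-document example, with the sort and skip done client-side) of a document arriving between two paged queries:

```python
# Illustrative only: reproduce how a document added between paged queries
# can shift the sort order and make an earlier hit reappear.
index = [
    {"id": "1", "rating": 5},
    {"id": "2", "rating": 3},
    {"id": "3", "rating": 2},
    {"id": "4", "rating": 1},
]

def page(docs, top, skip):
    # Simulates `$top=<top>&$skip=<skip>&$orderby=rating desc`.
    return sorted(docs, key=lambda d: -d["rating"])[skip:skip + top]

first = page(index, top=2, skip=0)       # ids 1 and 2
index.append({"id": "5", "rating": 4})   # document added between query calls
second = page(index, top=2, skip=2)      # ids 2 and 3 -> document 2 repeats

print([d["id"] for d in first], [d["id"] for d in second])  # ['1', '2'] ['2', '3']
```

The range-filter workaround (`filter: "id ge 50"`) avoids this because the cursor is anchored to a unique field value rather than an absolute offset.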
@@ -189,7 +198,7 @@ Hit highlighting instructions are provided on the [query request](/rest/api/sear

 ### Requirements for hit highlighting

-+ Fields must be Edm.String or Collection(Edm.String)
++ Fields must be `Edm.String` or `Collection(Edm.String)`
 + Fields must be attributed at **searchable**

 ### Specify highlighting in the request
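
As a hedged illustration of such a request (the query text is invented; the `good-books` index follows the article's own examples, and `highlight`, `highlightPreTag`, and `highlightPostTag` are the REST API's highlighting parameters):

```http
POST /indexes/good-books/docs/search?api-version=2020-06-30
{
    "search": "divine secrets",
    "highlight": "title",
    "highlightPreTag": "<em>",
    "highlightPostTag": "</em>"
}
```

Matching terms in the `title` field come back wrapped in the specified tags in a `@search.highlights` section of each result.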
@@ -264,6 +273,7 @@ Within a highlighted field, formatting is applied to whole terms. For example, o
     "original_title": "Grave Secrets",
     "title": "Grave Secrets (Temperance Brennan, #5)"
 }
+]
 ```

 ### Phrase search highlighting
@@ -282,7 +292,7 @@ POST /indexes/good-books/docs/search?api-version=2020-06-30
 }
 ```

-Because the criteria now specifies both terms, only one match is found in the search index. The response to the above query looks like this:
+Because the criteria now has both terms, only one match is found in the search index. The response to the above query looks like this:
articles/search/vector-search-how-to-query.md (52 additions & 1 deletion)
@@ -148,7 +148,58 @@ api-key: {{admin-api-key}}

 The response includes 5 matches, and each result provides a search score, title, content, and category. In a similarity search, the response always includes "k" matches, even if the similarity is weak. For indexes that have fewer than "k" documents, only those number of documents will be returned.

-Notice that "select" returns textual fields from the index. Although the vector field is "retrievable" in this example, its content isn't usable as a search result, so it's not included in the results.
+Notice that "select" returns textual fields from the index. Although the vector field is "retrievable" in this example, its content isn't usable as a search result, so it's often excluded in the results.
+
+### Vector query response
+
+Here's a modified example so that you can see the basic structure of a response from a pure vector query.
+
+```json
+{
+    "@odata.count": 3,
+    "value": [
+        {
+            "@search.score": 0.80025613,
+            "title": "Azure Search",
+            "category": "AI + Machine Learning",
+            "contentVector": [
+                -0.0018343845,
+                0.017952163,
+                0.0025753193,
+                ...
+            ]
+        },
+        {
+            "@search.score": 0.78856903,
+            "title": "Azure Application Insights",
+            "category": "Management + Governance",
+            "contentVector": [
+                -0.016821077,
+                0.0037742127,
+                0.016136652,
+                ...
+            ]
+        },
+        {
+            "@search.score": 0.78650564,
+            "title": "Azure Media Services",
+            "category": "Media",
+            "contentVector": [
+                -0.025449317,
+                0.0038463024,
+                -0.02488436,
+                ...
+            ]
+        }
+    ]
+}
+```
+
+**Key points:**
+
++ It's reduced to 3 "k" matches.
++ It shows a **`@search.score`** that's determined by the HNSW algorithm and a `cosine` similarity metric.
++ Fields include text and vector values. The content vector field consists of 1536 dimensions for each match, so it's truncated for brevity (normally, you might exclude vector fields from results). The text fields used in the response (`"select": "title, category"`) aren't used during query execution. The match is made on vector data alone. However, a response can include any "retrievable" field in an index. As such, the inclusion of text fields is helpful because its values are easily recognized by users.
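
The `cosine` metric named in the key points can be sketched directly. A minimal Python implementation over short hypothetical vectors (real `contentVector` values have 1536 dimensions; this shows the metric itself, not how the service scales it into `@search.score`):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings for a query and a document.
query = [0.1, 0.3, -0.2]
doc = [0.1, 0.25, -0.15]
score = cosine_similarity(query, doc)
print(round(score, 4))  # close to 1.0: the vectors point in nearly the same direction
```

Values near 1.0 indicate near-identical direction (strong semantic similarity); a vector compared with itself scores exactly 1.0.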
0 commit comments