Commit ba7410c

Merge pull request #273105 from HeidiSteen/heidist-vectors
[azure search] Hybrid query, Search Explorer example
2 parents 95ae8d8 + 692f54b commit ba7410c

1 file changed

+84
-46
lines changed

articles/search/hybrid-search-how-to-query.md

Lines changed: 84 additions & 46 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
---
2-
title: Hybrid query how-to
2+
title: Hybrid query
33
titleSuffix: Azure AI Search
44
description: Learn how to build queries for hybrid search.
55

@@ -14,27 +14,57 @@ ms.date: 04/23/2024
1414

1515
# Create a hybrid query in Azure AI Search
1616

17-
Hybrid search combines one or more keyword queries with one or more vector queries in a single search request. The queries execute in parallel. The results are merged and reordered by a new search score, using [Reciprocal Rank Fusion (RRF)](hybrid-search-ranking.md) to return a single ranked result set.
17+
[Hybrid search](hybrid-search-overview.md) combines one or more keyword queries with one or more vector queries in a single search request. The queries execute in parallel. The results are merged and reordered by new search scores, using [Reciprocal Rank Fusion (RRF)](hybrid-search-ranking.md) to return a single ranked result set.
1818

19-
In [benchmark tests](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid/ba-p/3929167), hybrid queries return the most relevant results.
19+
In most cases, [per benchmark tests](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid/ba-p/3929167), hybrid queries with semantic ranking return the most relevant results.
2020

21-
To define a hybrid query, use [**Search Post REST API version 2023-11-01**](/rest/api/searchservice/documents/search-post), **2023-10-01-preview** or higher, Search Explorer in the Azure portal, or newer versions of the Azure SDKs.
21+
To define a hybrid query, use REST API [**2023-11-01**](/rest/api/searchservice/documents/search-post), [**2023-10-01-preview**](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2023-10-01-preview&preserve-view=true), [**2024-03-01-preview**](/rest/api/searchservice/documents/search-post?view=rest-searchservice-2024-03-01-preview&preserve-view=true), Search Explorer in the Azure portal, or newer versions of the Azure SDKs.
2222

2323
## Prerequisites
2424

2525
+ A search index containing `searchable` vector and nonvector fields. See [Create an index](search-how-to-create-search-index.md) and [Add vector fields to a search index](vector-search-how-to-create-index.md).
2626

27-
+ (Optional) If you want [semantic ranking](semantic-search-overview.md), your search service must be Basic tier or higher, with [semantic ranking enabled](semantic-how-to-enable-disable.md).
27+
+ (Optional) If you want [semantic ranking](semantic-how-to-configure.md), your search service must be Basic tier or higher, with [semantic ranking enabled](semantic-how-to-enable-disable.md).
2828

2929
+ (Optional) If you want text-to-vector conversion of a query string (currently in preview), [create and assign a vectorizer](vector-search-how-to-configure-vectorizer.md) to vector fields in the search index.
3030

31-
## Hybrid query request (REST API)
31+
## Run a hybrid query in Search Explorer
32+
33+
1. In [Search Explorer](search-explorer.md), make sure the API version is **2023-10-01-preview** or later.
34+
35+
1. Under **View**, select **JSON view**.
3236

33-
A hybrid query combines text search and vector search, where the `"search"` parameter takes a query string and `"vectors.value"` takes the vector query. The search engine runs full text and vector queries in parallel. All matches are evaluated for relevance using Reciprocal Rank Fusion (RRF) and a single result set is returned in the response.
37+
1. Replace the default query template with a hybrid query, such as the one starting on line 539 for the [vector quickstart example](vector-search-how-to-configure-vectorizer.md#try-a-vectorizer-with-sample-data). For brevity, the vector is truncated in this article.
3438

35-
All results are returned in plain text, including vectors in fields marked as `retrievable`. Because numeric vectors aren't useful in search results, choose other fields in the index as a proxy for the vector match. For example, if an index has "descriptionVector" and "descriptionText" fields, the query can match on "descriptionVector" but the search result can show "descriptionText". Use the `select` parameter to specify only human-readable fields in the results.
39+
A hybrid query has a text query specified in `search`, and a vector query specified under `vectorQueries.vector`.
3640

37-
Hybrid queries are useful because they add support for all query capabilities, including orderby and [semantic ranking](semantic-how-to-query-request.md). For example, in addition to the vector query, you could search over people or product names or titles, scenarios for which similarity search isn't a good fit.
41+
The text query and vector query should be equivalent or at least not conflict. If the queries are different, you don't get the benefit of hybrid.
42+
43+
```json
44+
{
45+
"count": true,
46+
"search": "historic hotel walk to restaurants and shopping",
47+
"select": "HotelId, HotelName, Category, Tags, Description",
48+
"top": 7,
49+
"vectorQueries": [
50+
{
51+
"vector": [0.01944167, 0.0040178085, -0.007816401 ... <remaining values omitted> ],
52+
"k": 7,
53+
"fields": "DescriptionVector",
54+
"kind": "vector",
55+
"exhaustive": true
56+
}
57+
]
58+
}
59+
```
60+
61+
1. Select **Search**.
62+
63+
## Hybrid query request (REST API)
64+
65+
A hybrid query combines text search and vector search, where the `search` parameter takes a query string and `vectorQueries.vector` takes the vector query. The search engine runs full text and vector queries in parallel. The union of all matches is evaluated for relevance using Reciprocal Rank Fusion (RRF) and a single result set is returned in the response.
66+
67+
Results are returned in plain text, including vectors in fields marked as `retrievable`. Because numeric vectors aren't useful in search results, choose other fields in the index as a proxy for the vector match. For example, if an index has "descriptionVector" and "descriptionText" fields, the query can match on "descriptionVector" but the search result can show "descriptionText". Use the `select` parameter to specify only human-readable fields in the results.
3868

3969
The following example shows a hybrid query configuration.
4070

@@ -51,30 +81,30 @@ api-key: {{admin-api-key}}
5181
-0.02178128,
5282
-0.00086512347
5383
],
54-
"fields": "contentVector",
84+
"fields": "DescriptionVector",
5585
"kind": "vector",
5686
"exhaustive": true,
5787
"k": 10
5888
}],
59-
"search": "what azure services support full text search",
60-
"select": "title, content, category",
89+
"search": "historic hotel walk to restaurants and shopping",
90+
"select": "HotelName, Description, Address/City",
6191
"top": "10"
6292
}
6393
```
6494

6595
**Key points:**
6696

67-
+ The vector query string is specified through the vector "vector.value" property. The query executes against the "contentVector" field. Set "kind" to "vector" to indicate the query type. Optionally, set "exhaustive" to true to query the full contents of the vector field.
97+
+ The vector query is specified through the `vectorQueries.vector` property. The query executes against the "DescriptionVector" field. Set `kind` to "vector" to indicate the query type. Optionally, set `exhaustive` to true to query the full contents of the vector field.
6898

69-
+ Keyword search is specified through "search" property. It executes in parallel with the vector query.
99+
+ Keyword search is specified through the `search` property. It executes in parallel with the vector query.
70100

71-
+ "k" determines how many nearest neighbor matches are returned from the vector query and provided to the RRF ranker.
101+
+ `k` determines how many nearest neighbor matches are returned from the vector query and provided to the RRF ranker.
72102

73-
+ "top" determines how many matches are returned in the response all-up. In this example, the response includes 10 results, assuming there are at least 10 matches in the merged results.
103+
+ `top` determines how many matches are returned in the response all-up. In this example, the response includes 10 results, assuming there are at least 10 matches in the merged results.
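
The key points above can be sketched as a small request-body builder. This is a minimal illustration, not part of any Azure SDK: the helper name `build_hybrid_body` and its defaults are hypothetical, the three-element vector is a truncated placeholder (a real query vector must have the same number of dimensions as the indexed field), and the snippet only assembles JSON without sending it to a search service.

```python
import json


def build_hybrid_body(text, vector, vector_field, k=10, top=10,
                      select=None, exhaustive=False):
    """Assemble a hybrid query body (hypothetical helper, not an SDK call)."""
    if not text:
        raise ValueError("a hybrid query needs a non-empty text query")
    if not vector:
        raise ValueError("a hybrid query needs a non-empty query vector")
    body = {
        "search": text,                  # keyword side of the hybrid query
        "vectorQueries": [
            {
                "kind": "vector",        # raw vector supplied by the caller
                "vector": list(vector),
                "fields": vector_field,  # vector field to search
                "k": k,                  # nearest neighbors handed to RRF
                "exhaustive": exhaustive,
            }
        ],
        "top": top,                      # results returned after merging
    }
    if select:
        body["select"] = ", ".join(select)
    return body


body = build_hybrid_body(
    text="historic hotel walk to restaurants and shopping",
    vector=[0.01944167, 0.0040178085, -0.007816401],  # truncated placeholder
    vector_field="DescriptionVector",
    k=10,
    top=10,
    select=["HotelName", "Description", "Address/City"],
    exhaustive=True,
)
print(json.dumps(body, indent=2))
```

The printed JSON could then be pasted into Search Explorer's JSON view or sent as the body of a Search POST request.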
74104

75105
## Hybrid search with filter
76106

77-
This example adds a filter, which is applied to the "filterable" nonvector fields of the search index.
107+
This example adds a filter, which is applied to the `filterable` nonvector fields of the search index.
78108

79109
```http
80110
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version=2023-11-01
@@ -90,29 +120,29 @@ api-key: {{admin-api-key}}
90120
-0.02178128,
91121
-0.00086512347
92122
],
93-
"fields": "contentVector",
123+
"fields": "DescriptionVector",
94124
"kind": "vector",
95125
"k": 10
96126
}
97127
],
98-
"search": "what azure services support full text search",
128+
"search": "historic hotel walk to restaurants and shopping",
99129
"vectorFilterMode": "postFilter",
100-
"filter": "category eq 'Databases'",
130+
"filter": "ParkingIncluded",
101131
"top": "10"
102132
}
103133
```
104134

105135
**Key points:**
106136

107-
+ Filters are applied to the content of filterable fields. In this example, the category field is marked as filterable in the index schema.
137+
+ Filters are applied to the content of filterable fields. In this example, the ParkingIncluded field is a boolean and it's marked as `filterable` in the index schema.
108138

109-
+ In hybrid queries, filters can be applied before query execution to reduce the query surface, or after query execution to trim results. `"preFilter"` is the default. To use `postFilter`, set the [filter processing mode](vector-search-filters.md).
139+
+ In hybrid queries, filters can be applied before query execution to reduce the query surface, or after query execution to trim results. `"preFilter"` is the default. To use `postFilter`, set the [filter processing mode](vector-search-filters.md) as shown in this example.
110140

111141
+ When you postfilter query results, the number of results might be less than top-n.
112142

113143
## Semantic hybrid search
114144

115-
Assuming that you [enabled semantic ranking](semantic-how-to-enable-disable.md) and your index definition includes a [semantic configuration](semantic-how-to-query-request.md), you can formulate a query that includes vector search, plus keyword search. Semantic ranking occurs over the merged result set, adding captions and answers.
145+
Assuming that you [enabled semantic ranking](semantic-how-to-enable-disable.md) and your index definition includes a [semantic configuration](semantic-how-to-query-request.md), you can formulate a query that includes vector search and keyword search, with semantic ranking over the merged result set. Optionally, you can add captions and answers.
116146

117147
```http
118148
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version=2023-11-01
@@ -128,13 +158,13 @@ api-key: {{admin-api-key}}
128158
-0.02178128,
129159
-0.00086512347
130160
],
131-
"fields": "contentVector",
161+
"fields": "DescriptionVector",
132162
"kind": "vector",
133163
"k": 50
134164
}
135165
],
136-
"search": "what azure services support full text search",
137-
"select": "title, content, category",
166+
"search": "historic hotel walk to restaurants and shopping",
167+
"select": "HotelName, Description, Tags",
138168
"queryType": "semantic",
139169
"semanticConfiguration": "my-semantic-config",
140170
"captions": "extractive",
@@ -169,18 +199,18 @@ api-key: {{admin-api-key}}
169199
-0.02178128,
170200
-0.00086512347
171201
],
172-
"fields": "contentVector",
202+
"fields": "DescriptionVector",
173203
"kind": "vector",
174204
"k": 50
175205
}
176206
],
177-
"search": "what azure services support full text search",
178-
"select": "title, content, category",
207+
"search": "historic hotel walk to restaurants and shopping",
208+
"select": "HotelName, Description, Tags",
179209
"queryType": "semantic",
180210
"semanticConfiguration": "my-semantic-config",
181211
"captions": "extractive",
182212
"answers": "extractive",
183-
"filter": "category eq 'Databases'",
213+
"filter": "ParkingIncluded",
184214
"vectorFilterMode": "postFilter",
185215
"top": "50"
186216
}
@@ -190,7 +220,7 @@ api-key: {{admin-api-key}}
190220

191221
+ The filter mode can affect the number of results available to the semantic reranker. As a best practice, it's smart to give the semantic ranker the maximum number of documents (50). If prefilters or postfilters are too selective, you might be underserving the semantic ranker by giving it fewer than 50 documents to work with.
192222

193-
+ Prefiltering is applied before query execution. If prefilter reduces the search area to 100 documents, the vector query executes over the "contentVector" field for those 100 documents, returning the k=50 best matches. Those 50 matching documents then pass to RRF for merged results, and then to semantic ranker.
223+
+ Prefiltering is applied before query execution. If prefilter reduces the search area to 100 documents, the vector query executes over the "DescriptionVector" field for those 100 documents, returning the k=50 best matches. Those 50 matching documents then pass to RRF for merged results, and then to semantic ranker.
194224

195225
+ Postfilter is applied after query execution. If k=50 returns 50 matches on the vector query side, the postfilter is then applied to those 50 matches, removing any that don't meet the filter criteria and leaving you with fewer than 50 documents to pass to the semantic ranker.
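
The difference between the two filter modes can be illustrated with a toy simulation. This sketch makes several assumptions: the corpus is synthetic (1,000 documents, of which every tenth has parking), the "vector query" is an exact sort on a made-up `similarity` value rather than an HNSW search, and the function names are hypothetical. It only shows how many documents survive on the vector side in each mode.

```python
def vector_top_k(docs, k):
    """Stand-in for a vector query: the k documents with highest similarity."""
    return sorted(docs, key=lambda d: d["similarity"], reverse=True)[:k]


def prefilter(docs, predicate, k):
    """Filter first, then take the k best matches from what remains."""
    return vector_top_k([d for d in docs if predicate(d)], k)


def postfilter(docs, predicate, k):
    """Take the k best matches first, then drop those failing the filter."""
    return [d for d in vector_top_k(docs, k) if predicate(d)]


# Synthetic corpus: document 0 is the closest match, document 999 the farthest.
docs = [
    {"id": i, "similarity": 1.0 - i / 1000, "ParkingIncluded": i % 10 == 0}
    for i in range(1000)
]


def has_parking(doc):
    return doc["ParkingIncluded"]


pre = prefilter(docs, has_parking, k=50)
post = postfilter(docs, has_parking, k=50)
print(len(pre), len(post))  # prints "50 5"
```

In this toy data, prefiltering narrows the search area to 100 documents and still returns 50 matches, while postfiltering keeps only the 5 filtered documents that happened to rank in the original top 50.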
196226

@@ -200,15 +230,15 @@ When you're setting up the hybrid query, think about the response structure. The
200230

201231
### Fields in a response
202232

203-
Search results are composed of "retrievable" fields from your search index. A result is either:
233+
Search results are composed of `retrievable` fields from your search index. A result is either:
204234

205-
+ All "retrievable" fields (a REST API default).
235+
+ All `retrievable` fields (a REST API default).
206236
+ Fields explicitly listed in a "select" parameter on the query.
207237

208238
The examples in this article used a "select" statement to specify text (nonvector) fields in the response.
209239

210240
> [!NOTE]
211-
> Vectors aren't designed for readability, so avoid returning them in the response. Instead, choose non-vector fields that are representative of the search document. For example, if the query targets a "descriptionVector" field, return an equivalent text field if you have one ("description") in the response.
241+
> Vectors aren't reverse engineered into human-readable text, so avoid returning them in the response. Instead, choose nonvector fields that are representative of the search document. For example, if the query targets a "DescriptionVector" field, return an equivalent text field if you have one ("Description") in the response.
212242
213243
### Number of results
214244

@@ -227,26 +257,34 @@ Multiple sets are created for hybrid queries, with or without the optional [sema
227257

228258
In this section, compare the responses between single vector search and simple hybrid search for the top result. The different ranking algorithms, HNSW's similarity metric and RRF in this case, produce scores that have different magnitudes. This behavior is by design. RRF scores can appear quite low, even with a high similarity match. Lower scores are a characteristic of the RRF algorithm. In a hybrid query with RRF, more of the reciprocal of the ranked documents are included in the results, given the relatively smaller score of the RRF ranked documents, as opposed to pure vector search.
229259

230-
**Single Vector Search**: Results ordered by cosine similarity (default vector similarity distance function).
260+
**Single Vector Search**: @search.score for results ordered by cosine similarity (default vector similarity distance function).
231261

232262
```json
233263
{
234-
"@search.score": 0.8851871,
235-
"title": "Azure AI Search",
236-
"content": "Azure AI Search is a fully managed search-as-a-service that enables you to build rich search experiences for your applications. It provides features like full-text search, faceted navigation, and filters. Azure AI Search supports various data sources, such as Azure SQL Database, Azure Blob Storage, and Azure Cosmos DB. You can use Azure AI Search to index your data, create custom scoring profiles, and integrate with other Azure services. It also integrates with other Azure services, such as Azure Cognitive Services and Azure Machine Learning.",
237-
"category": "AI + Machine Learning"
238-
},
264+
"@search.score": 0.8399121,
265+
"HotelId": "49",
266+
"HotelName": "Old Carrabelle Hotel",
267+
"Description": "Spacious rooms, glamorous suites and residences, rooftop pool, walking access to shopping, dining, entertainment and the city center.",
268+
"Category": "Luxury",
269+
"Address": {
270+
"City": "Arlington"
271+
}
272+
}
239273
```
240274

241-
**Hybrid Search**: Combined keyword and vector search results using Reciprocal Rank Fusion.
275+
**Hybrid Search**: @search.score for hybrid results ranked using Reciprocal Rank Fusion.
242276

243277
```json
244278
{
245-
"@search.score": 0.03333333507180214,
246-
"title": "Azure AI Search",
247-
"content": "Azure AI Search is a fully managed search-as-a-service that enables you to build rich search experiences for your applications. It provides features like full-text search, faceted navigation, and filters. Azure AI Search supports various data sources, such as Azure SQL Database, Azure Blob Storage, and Azure Cosmos DB. You can use Azure AI Search to index your data, create custom scoring profiles, and integrate with other Azure services. It also integrates with other Azure services, such as Azure Cognitive Services and Azure Machine Learning.",
248-
"category": "AI + Machine Learning"
249-
},
279+
"@search.score": 0.032786883413791656,
280+
"HotelId": "49",
281+
"HotelName": "Old Carrabelle Hotel",
282+
"Description": "Spacious rooms, glamorous suites and residences, rooftop pool, walking access to shopping, dining, entertainment and the city center.",
283+
"Category": "Luxury",
284+
"Address": {
285+
"City": "Arlington"
286+
}
287+
}
250288
```
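
To see why RRF scores are small, it helps to work through the arithmetic. The sketch below assumes the common RRF form, in which each query contributes `1 / (k + rank)` with a 1-based rank, and assumes a constant of 60. Neither detail is stated in this article, so treat both as assumptions to check against the RRF ranking documentation. The function name `rrf_score` is hypothetical.

```python
def rrf_score(ranks, k=60):
    """Sum of reciprocal ranks; k=60 and 1-based ranks are assumptions."""
    return sum(1.0 / (k + rank) for rank in ranks)


# A document ranked first by both the text query and the vector query.
top_hit = rrf_score([1, 1])
print(round(top_hit, 6))  # prints 0.032787
```

Under these assumptions, a document that ranks first in both result sets scores 2/61, which is close to the 0.0327868 shown in the hybrid sample above. With two queries, no document can score higher than that, which is why a strong hybrid match still has a much lower `@search.score` than the 0.84 cosine similarity from the single vector query.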
251289

252290
## Next steps
