| 1 | +--- |
| 2 | +title: Hybrid search |
| 3 | +titleSuffix: Azure Cognitive Search |
| 4 | +description: Describes concepts and architecture of hybrid query processing and document retrieval. Hybrid queries combine vector search and full text search. |
| 5 | + |
| 6 | +author: robertklee |
| 7 | +ms.author: robertlee |
| 8 | +ms.service: cognitive-search |
| 9 | +ms.topic: conceptual |
| 10 | +ms.date: 09/27/2023 |
| 11 | +--- |
| 12 | + |
| 13 | +# Hybrid search using vectors and full text in Azure Cognitive Search |
| 14 | + |
| 15 | +> [!IMPORTANT] |
| 16 | +> Hybrid search uses the [vector features](vector-search-overview.md) currently in public preview under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). |
| 17 | + |
| 18 | +Hybrid search is a combination of full text and vector queries that execute against a search index that contains both searchable plain text content and generated embeddings. For query purposes, hybrid search is: |
| 19 | + |
| 20 | ++ A single query request that includes `search` and `vectors` parameters, multiple vector queries, or one vector query targeting multiple fields |
| 21 | ++ Parallel query execution |
| 22 | ++ Merged results in the query response, scored using [Reciprocal Rank Fusion (RRF)](hybrid-search-ranking.md) |
| 23 | + |
| 24 | +This article explains the concepts, benefits, and limitations of hybrid search. |
| 25 | + |
| 26 | +## How does hybrid search work? |
| 27 | + |
| 28 | +In Azure Cognitive Search, vector indexes containing embeddings can live alongside textual and numerical fields, allowing you to issue hybrid full text and vector queries. Hybrid queries can take advantage of existing functionality like filtering, faceting, sorting, scoring profiles, and [semantic ranking](semantic-search-overview.md) in a single search request. |
| 29 | + |
| 30 | +Hybrid search combines results from full text and vector queries, which are ranked by different functions: BM25 for full text, and a similarity metric such as cosine similarity for vectors. Because these scores aren't directly comparable, the ranked result lists must be merged into a single ranked list. Azure Cognitive Search uses Reciprocal Rank Fusion (RRF) for this step. |
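
To make the merging step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion over two ranked lists. It is an illustration only, not the service's implementation: the function name `reciprocal_rank_fusion`, the constant `k=60`, the 1-based ranks, and the sample document IDs are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank is 1-based. k=60 is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings: one from the full text query, one from a vector query.
text_results = ["49", "48", "13"]
vector_results = ["49", "2", "48"]

fused = reciprocal_rank_fusion([text_results, vector_results])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

A document that ranks well in both lists accumulates a higher fused score than one that appears in only one list, which is why the raw BM25 and similarity scores never need to be compared directly.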
| 31 | + |
| 32 | +## Structure of a hybrid query |
| 33 | + |
| 34 | +Hybrid search is predicated on having a search index that contains fields of various types, including plain text and numbers, geo coordinates for geospatial search, and vectors that provide a mathematical representation of a chunk of text or of image, audio, and video content. You can use almost all query capabilities in Cognitive Search with a vector query, except for client-side interactions such as autocomplete and suggestions. |
| 35 | + |
| 36 | +A representative hybrid query might be as follows (notice the vector is trimmed for brevity): |
| 37 | + |
| 38 | +```http |
| 39 | +POST https://{{searchServiceName}}.search.windows.net/indexes/hotels-vector-quickstart/docs/search?api-version=2023-07-01-Preview |
| 40 | + content-type: application/JSON |
| 41 | +{ |
| 42 | + "count": true, |
| 43 | + "search": "historic hotel walk to restaurants and shopping", |
| 44 | + "select": "HotelId, HotelName, Category, Description, Address/City, Address/StateProvince", |
| 45 | + "filter": "geo.distance(Location, geography'POINT(-77.03241 38.90166)') le 300", |
| 46 | + "facets": [ "Address/StateProvince"], |
| 47 | + "vectors": [ |
| 48 | + { |
| 49 | + "value": [ <array of embeddings> ], |
| 50 | + "k": 7, |
| 51 | + "fields": "DescriptionVector" |
| 52 | + }, |
| 53 | + { |
| 54 | + "value": [ <array of embeddings> ], |
| 55 | + "k": 7, |
| 56 | + "fields": "Description_frVector" |
| 57 | + } |
| 58 | + ], |
| 59 | + "queryType": "semantic", |
| 60 | + "queryLanguage": "en-us", |
| 61 | + "semanticConfiguration": "my-semantic-config" |
| 62 | +} |
| 63 | +``` |
| 64 | + |
| 65 | +Key points include: |
| 66 | + |
| 67 | ++ `search` specifies a full text search query. |
| 68 | ++ `vectors` specifies one or more vector queries, which can target multiple vector fields. If the embedding space includes multi-lingual content, vector queries can find the match with no language analyzers or translation required. |
| 69 | ++ `select` specifies which fields to return in results, which can be text fields that are human readable. |
| 70 | ++ `filter` can specify geospatial search or other include and exclude criteria, such as whether parking is included. The geospatial query in this example finds hotels within a 300-kilometer radius of Washington D.C. |
| 71 | ++ `facets` can be used to compute facet buckets over results that are returned from hybrid queries. |
| 72 | ++ `queryType=semantic` invokes semantic ranking, applying machine reading comprehension to surface more relevant search results. |
| 73 | + |
| 74 | +Filters and facets target data structures within the index that are distinct from the inverted indexes used for full text search and the vector indexes used for vector search. As such, when filters and faceted operations execute, the search engine can apply the operational result to the hybrid search results in the response. |
| 75 | + |
| 76 | +Notice how there's no `orderby` in the query. Explicit sort orders override relevance-ranked results, so if you want similarity and BM25 relevance, omit sorting in your query. |
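
The request structure described above can also be assembled programmatically. The following Python sketch builds the JSON body using only the parameter names shown in the example request (`search`, `vectors`, `value`, `k`, `fields`, `select`, `filter`, `facets`). The helper `build_hybrid_query` is hypothetical, and the three-element vector is a placeholder standing in for a real embedding. The sketch leaves out `orderby` so that relevance ranking is preserved.

```python
import json


def build_hybrid_query(search_text, vector_queries, select=None,
                       filter_expr=None, facets=None):
    """Assemble a hybrid query body (hypothetical helper, not an SDK call)."""
    if not search_text or not vector_queries:
        raise ValueError("a hybrid query needs search text and at least one vector query")
    body = {"count": True, "search": search_text, "vectors": []}
    for query in vector_queries:
        body["vectors"].append({
            "value": list(query["value"]),  # the query embedding
            "k": query["k"],                # number of nearest neighbors to return
            "fields": query["fields"],      # vector field to search
        })
    if select:
        body["select"] = select
    if filter_expr:
        body["filter"] = filter_expr
    if facets:
        body["facets"] = list(facets)
    # No "orderby": an explicit sort would override relevance ranking.
    return body


placeholder_embedding = [0.1, 0.2, 0.3]  # placeholder, not a real embedding

body = build_hybrid_query(
    "historic hotel walk to restaurants and shopping",
    [
        {"value": placeholder_embedding, "k": 7, "fields": "DescriptionVector"},
        {"value": placeholder_embedding, "k": 7, "fields": "Description_frVector"},
    ],
    select="HotelId, HotelName, Category, Description",
    filter_expr="geo.distance(Location, geography'POINT(-77.03241 38.90166)') le 300",
    facets=["Address/StateProvince"],
)
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` also avoids hand-written JSON mistakes such as a missing comma between properties.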
| 77 | + |
| 78 | +A response from the above query might look like this: |
| 79 | + |
| 80 | +```json |
| 81 | +{ |
| 82 | + "@odata.count": 3, |
| 83 | + "@search.facets": { |
| 84 | + "Address/StateProvince": [ |
| 85 | + { |
| 86 | + "count": 1, |
| 87 | + "value": "NY" |
| 88 | + }, |
| 89 | + { |
| 90 | + "count": 1, |
| 91 | + "value": "VA" |
| 92 | + } |
| 93 | + ] |
| 94 | + }, |
| 95 | + "value": [ |
| 96 | + { |
| 97 | + "@search.score": 0.03333333507180214, |
| 98 | + "@search.rerankerScore": 2.5229012966156006, |
| 99 | + "HotelId": "49", |
| 100 | + "HotelName": "Old Carrabelle Hotel", |
| 101 | + "Description": "Spacious rooms, glamorous suites and residences, rooftop pool, walking access to shopping, dining, entertainment and the city center.", |
| 102 | + "Category": "Luxury", |
| 103 | + "Address": { |
| 104 | + "City": "Arlington", |
| 105 | + "StateProvince": "VA" |
| 106 | + } |
| 107 | + }, |
| 108 | + { |
| 109 | + "@search.score": 0.032522473484277725, |
| 110 | + "@search.rerankerScore": 2.111117362976074, |
| 111 | + "HotelId": "48", |
| 112 | + "HotelName": "Nordick's Motel", |
| 113 | + "Description": "Only 90 miles (about 2 hours) from the nation's capital and nearby most everything the historic valley has to offer. Hiking? Wine Tasting? Exploring the caverns? It's all nearby and we have specially priced packages to help make our B&B your home base for fun while visiting the valley.", |
| 114 | + "Category": "Boutique", |
| 115 | + "Address": { |
| 116 | + "City": "Washington D.C.", |
| 117 | + "StateProvince": null |
| 118 | + } |
| 119 | + } |
| 120 | + ] |
| 121 | +} |
| 122 | +``` |
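
As a rough illustration of how a client might read such a response, the Python sketch below extracts the facet buckets and the per-document scores. The sample payload is a trimmed copy of the response above. The helper `summarize_response` is hypothetical, and the sketch assumes `@search.rerankerScore` might be absent from some responses, so it reads that property defensively.

```python
import json

# Trimmed copy of the sample response shown above.
sample_response = json.loads("""
{
  "@odata.count": 3,
  "@search.facets": {
    "Address/StateProvince": [
      {"count": 1, "value": "NY"},
      {"count": 1, "value": "VA"}
    ]
  },
  "value": [
    {"@search.score": 0.03333333507180214,
     "@search.rerankerScore": 2.5229012966156006,
     "HotelId": "49", "HotelName": "Old Carrabelle Hotel"},
    {"@search.score": 0.032522473484277725,
     "@search.rerankerScore": 2.111117362976074,
     "HotelId": "48", "HotelName": "Nordick's Motel"}
  ]
}
""")


def summarize_response(response):
    """Return (facets, hits) from a hybrid query response (hypothetical helper)."""
    facets = {
        name: {bucket["value"]: bucket["count"] for bucket in buckets}
        for name, buckets in response.get("@search.facets", {}).items()
    }
    hits = [
        {
            "id": doc["HotelId"],
            "name": doc["HotelName"],
            "score": doc["@search.score"],                  # fused relevance score
            "reranker": doc.get("@search.rerankerScore"),   # None if not present
        }
        for doc in response.get("value", [])
    ]
    return facets, hits


facets, hits = summarize_response(sample_response)
for hit in hits:
    print(hit["name"], hit["score"], hit["reranker"])
```

The hits are read in the order the service returned them; the sketch does not re-sort the results.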
| 123 | + |
| 124 | +## Benefits |
| 125 | + |
| 126 | +Hybrid search combines the strengths of vector search and keyword search. The advantage of vector search is finding information that's similar to your search query, even if there are no keyword matches in the inverted index. The advantage of keyword or full text search is precision, plus the ability to apply semantic ranking that improves the quality of the initial results. Queries involving product codes, highly specialized jargon, or dates can perform better with keyword search because it can identify exact matches. |
| 127 | + |
| 128 | +Testing on real-world and benchmark datasets indicates that hybrid retrieval with semantic ranking offers significant benefits in search relevance. |
| 129 | + |
| 130 | +## See also |
| 131 | + |
| 132 | +[Outperform vector search with hybrid retrieval and ranking (Tech blog)](https://techcommunity.microsoft.com/t5/azure-ai-services-blog/azure-cognitive-search-outperforming-vector-search-with-hybrid/ba-p/3929167) |