tracked in #123043
Hybrid search in ES|QL
Hybrid search is a retrieval method that combines multiple search strategies (lexical, kNN, semantic, etc.) into a single query.
In ES|QL we are looking at supporting:
- RRF (reciprocal rank fusion)
- linear combination
We will introduce a FUSE command that groups rows by _id and _index (both configurable) and applies a hybrid search ranking method to compute new relevance scores.
FUSE will support RRF and linear to start with.
FUSE will mostly be used in combination with FORK results, but it is also a standalone primitive.
Execution of FUSE
Conceptually, FUSE will be split into two phases:
- A scoring phase, in which each row receives a new score based on the hybrid search method in use
- A merge phase, which groups the rows by _id and _index (by default) and computes a final score from the scores assigned in the scoring phase
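The two phases can be illustrated with a minimal Python sketch for the RRF case. This is not the ES|QL implementation: the function name `fuse_rrf`, the row-as-dict representation, ranking each branch by descending score, and the default `rank_constant=60` are all assumptions made for illustration.

```python
from collections import defaultdict

def fuse_rrf(rows, rank_constant=60,
             group_cols=("_id", "_index"), score_col="_score", key_col="_fork"):
    """Hypothetical two-phase RRF over rows represented as dicts."""
    # Scoring phase: within each branch (identified by key_col), rank rows by
    # descending score and replace the score with 1 / (rank_constant + rank).
    by_branch = defaultdict(list)
    for row in rows:
        by_branch[row[key_col]].append(row)

    rescored = []
    for branch_rows in by_branch.values():
        ordered = sorted(branch_rows, key=lambda r: r[score_col], reverse=True)
        for rank, row in enumerate(ordered, start=1):
            rescored.append({**row, score_col: 1.0 / (rank_constant + rank)})

    # Merge phase: group by the group columns and sum the per-row scores.
    totals = defaultdict(float)
    for row in rescored:
        totals[tuple(row[c] for c in group_cols)] += row[score_col]

    # Highest fused score first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


rows = [
    {"_id": "a", "_index": "idx", "_score": 3.0, "_fork": "fork1"},
    {"_id": "b", "_index": "idx", "_score": 2.0, "_fork": "fork1"},
    {"_id": "b", "_index": "idx", "_score": 0.9, "_fork": "fork2"},
    {"_id": "c", "_index": "idx", "_score": 0.5, "_fork": "fork2"},
]
fused = fuse_rrf(rows)
```

In this example document `b` appears in both branches, so its two per-branch contributions are summed in the merge phase and it ends up ranked first.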
Feature work
Customizable columns used for fusing rows
- ability to customize the columns used for fusing rows (_id, _index are the defaults)
- ability to customize the score column (_score is used by default)
- ability to customize the discriminator/key column (_fork is used by default)
RRF support
- customization of the rank_constant
- ability to provide different weights
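As a rough illustration of how the two RRF options interact, the sketch below computes a weighted RRF score for a single document from its rank in each branch. The helper name `rrf_score`, the default weight of 1.0 for unlisted branches, and the default `rank_constant` of 60 are assumptions, not confirmed FUSE behaviour.

```python
def rrf_score(ranks, rank_constant=60, weights=None):
    """Hypothetical weighted RRF.

    ranks:   dict mapping branch name -> 1-based rank of the document there
    weights: optional dict mapping branch name -> weight (assumed default 1.0)
    """
    weights = weights or {}
    return sum(
        weights.get(branch, 1.0) / (rank_constant + rank)
        for branch, rank in ranks.items()
    )


# A document ranked 1st in fork1 and 3rd in fork2, with fork1 counted double.
weighted = rrf_score({"fork1": 1, "fork2": 3}, weights={"fork1": 2.0})

# A smaller rank_constant makes the gap between adjacent ranks larger.
gap_small_k = rrf_score({"fork1": 1}, rank_constant=1) - rrf_score({"fork1": 2}, rank_constant=1)
gap_large_k = rrf_score({"fork1": 1}, rank_constant=60) - rrf_score({"fork1": 2}, rank_constant=60)
```

The second comparison shows why the constant is worth exposing: with a small value, top ranks dominate the fused score, while a large value flattens the differences between ranks.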
Linear combination
- ability to provide different weights
- minmax score normalization
- l2_norm score normalization
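A minimal Python sketch of linear combination with optional per-branch normalization follows. The names `fuse_linear`, `minmax` and `l2_norm` as Python functions, the default weight of 1.0, and the choice to map constant scores to 0.0 are assumptions for illustration; the handling of constant and negative scores is exactly what the follow-up items in this issue still need to settle.

```python
from collections import defaultdict

def minmax(scores):
    """Rescale scores to [0, 1]. Assumption: constant scores all map to 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def l2_norm(scores):
    """Divide by the Euclidean norm. Negative scores are passed through as-is,
    which may not match the behaviour eventually chosen for FUSE."""
    norm = sum(s * s for s in scores) ** 0.5
    return [s / norm if norm else 0.0 for s in scores]

def fuse_linear(rows, weights=None, normalizer=None):
    """Hypothetical linear combination: weighted sum of (normalized) scores."""
    weights = weights or {}
    by_branch = defaultdict(list)
    for row in rows:
        by_branch[row["_fork"]].append(row)

    totals = defaultdict(float)
    for branch, branch_rows in by_branch.items():
        scores = [r["_score"] for r in branch_rows]
        if normalizer is not None:
            scores = normalizer(scores)  # normalize within each branch
        weight = weights.get(branch, 1.0)
        for row, score in zip(branch_rows, scores):
            totals[(row["_id"], row["_index"])] += weight * score
    return dict(totals)


rows = [
    {"_id": "a", "_index": "idx", "_score": 10.0, "_fork": "fork1"},
    {"_id": "b", "_index": "idx", "_score": 5.0, "_fork": "fork1"},
    {"_id": "c", "_index": "idx", "_score": 0.0, "_fork": "fork1"},
    {"_id": "b", "_index": "idx", "_score": 2.0, "_fork": "fork2"},
    {"_id": "c", "_index": "idx", "_score": 1.0, "_fork": "fork2"},
]
fused = fuse_linear(rows, weights={"fork1": 0.7, "fork2": 0.3}, normalizer=minmax)
```

Normalizing within each branch before weighting keeps a branch with a large raw score range (here `fork1`) from dominating the sum purely because of its scale.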
Implementation follow-ups
(items such as tests and bug fixes aimed at stabilising the implementation; these are required for tech preview)
- operator tests ES|QL: Add FUSE operator tests #135307
- handle the case where the group or score columns have multi values ES|QL: Handle multi values in FUSE #135448
- minmax/l2_norm should cover all cases - constant scores, negative scores, zero scores etc ES|QL: Handle multi values in FUSE #135448
- enforce that FUSE can only be applied on limited input (after a pipeline breaker, so on the coordinator) ES|QL: Set FUSE to execute on coordinator #135515
- gate l2_norm with a separate capability, since we need to clarify first what the behaviour should be for negative scores - part of ES|QL: Make FUSE available in release builds #135603
- handle unsupported fields in FUSE input ES|QL: Check for unsupported fields in FUSE #135530
- handle the case where the key columns have multi values
- docs ES|QL: Docs for FUSE #135693
PRs:
(all the implementation PRs are linked here)
Initial prototype: #131078
- ES|QL: Add support for RRF options #134227
- ES|QL: Linear combination in FUSE #134543
- ES|QL: Refactor FUSE planning #134038
- ES|QL: Configurable score, key and group by columns for FUSE #135079
- ES|QL: Add FUSE operator tests #135307
- ES|QL: Handle multi values in FUSE #135448
- ES|QL: Set FUSE to execute on coordinator #135515
- ES|QL: Check for unsupported fields in FUSE #135530
- ES|QL: Make FUSE available in release builds #135603
- ES|QL: Docs for FUSE #135693
- ES|QL: Remove redundant capability checks in tests #135695
Future enhancements
Not required for tech preview:
- provide different normalizers per result group
- allow the score column to be any numeric type (we currently enforce that it is always a DOUBLE)
- support l2_norm after we define what the behaviour should be for negative scores
- let FUSE output columns with unsupported data types which just return null values (like FORK)
- when ROW is used, we shouldn't require a LIMIT before FUSE