ES|QL: Hybrid search with FUSE (GA) #123389

@ioanatia

tracked in #123043

Hybrid search in ES|QL

Hybrid search is a retrieval method that combines multiple search strategies (lexical, kNN, semantic, etc.) into a single query.
In ES|QL we are looking at supporting:

  • RRF (reciprocal rank fusion)
  • linear combination

We will introduce a FUSE command that groups rows by _id and _index (these columns will be configurable) and applies a hybrid search ranking method to compute new relevance scores.
FUSE will support RRF and linear to start with.
FUSE will mostly be used in combination with FORK results, but it is also a standalone primitive.

Execution of FUSE

Conceptually FUSE will be split into multiple phases:

  • A phase where each row receives a new score based on the hybrid search method that is being used
  • A merge step which will:
    • group the rows by _id, _index (by default) and compute a new final score based on the scores from the previous step
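
The two phases above can be sketched in Python as follows. This is only an illustration of the idea, not the ES|QL implementation: the function name `fuse`, its parameters, and the sample rows are hypothetical, and summing the per-row scores in the merge step is an assumption (the description only says a final score is computed from the scores of the previous step).

```python
from collections import defaultdict

def fuse(rows, rescore, group_by=("_id", "_index"), score="_score"):
    """Hypothetical two-phase fuse: rescore every row, then merge per group key."""
    # Phase 1: each row receives a new score from the hybrid search method.
    rescored = [{**row, score: rescore(row)} for row in rows]

    # Phase 2 (merge): group rows by the key columns and combine their scores.
    # Summation is an assumption made for this sketch.
    totals = defaultdict(float)
    for row in rescored:
        key = tuple(row[col] for col in group_by)
        totals[key] += row[score]

    # Highest fused score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Sample rows as they might look after FORK; values are made up.
rows = [
    {"_id": "a", "_index": "books", "_fork": "fork1", "_score": 2.0},
    {"_id": "b", "_index": "books", "_fork": "fork1", "_score": 1.0},
    {"_id": "a", "_index": "books", "_fork": "fork2", "_score": 0.5},
]

# An identity rescoring function keeps the example easy to follow.
fused = fuse(rows, rescore=lambda row: row["_score"])
for key, value in fused:
    print(key, value)
```

Passing the group columns and the score column as parameters mirrors the plan to make them configurable, with _id, _index and _score as the defaults.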

Feature work

Customizable columns used for fusing rows

#135079

  • ability to customize the columns used for fusing rows (_id, _index are the defaults)
  • ability to customize the score column (_score is used by default)
  • ability to customize the discriminator/key column (_fork is used by default)

RRF support

#134227

  • customization of the rank_constant
  • ability to provide different weights
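
As a rough illustration of what rank_constant and per-group weights control, here is a minimal Python sketch of weighted reciprocal rank fusion. It uses the usual RRF formula, weight / (rank_constant + rank), summed over the result groups in which a document appears. The function name `rrf`, the default of 60 for rank_constant, the default weight of 1.0, and the sample rows are assumptions for this sketch and are not taken from the FUSE implementation.

```python
from collections import defaultdict

def rrf(rows, rank_constant=60, weights=None,
        key="_fork", group_by=("_id", "_index"), score="_score"):
    """Hypothetical weighted RRF over rows tagged with a discriminator column."""
    weights = weights or {}

    # Split the rows into result groups using the discriminator column.
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[key]].append(row)

    fused = defaultdict(float)
    for group, members in by_group.items():
        # Rank 1 is the highest-scoring row within its group.
        ranked = sorted(members, key=lambda r: r[score], reverse=True)
        weight = weights.get(group, 1.0)  # assumed default weight
        for rank, row in enumerate(ranked, start=1):
            doc = tuple(row[col] for col in group_by)
            fused[doc] += weight / (rank_constant + rank)
    return dict(fused)

# Made-up rows from two result groups.
rows = [
    {"_id": "a", "_index": "books", "_fork": "fork1", "_score": 2.0},
    {"_id": "b", "_index": "books", "_fork": "fork1", "_score": 1.0},
    {"_id": "b", "_index": "books", "_fork": "fork2", "_score": 0.9},
    {"_id": "c", "_index": "books", "_fork": "fork2", "_score": 0.4},
]

plain = rrf(rows)
weighted = rrf(rows, weights={"fork1": 2.0})
print(plain)
print(weighted)
```

A larger rank_constant flattens the difference between top and lower ranks, while a weight above 1.0 makes one result group count more than the others.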

Linear combination

#134543

  • ability to provide different weights
  • minmax score normalization
  • l2_norm score normalization
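
The following Python sketch shows one way the linear method could combine weights with minmax or l2_norm normalization. It is a sketch under stated assumptions, not the FUSE implementation: the function names, the default weight of 1.0, normalizing each result group separately, and the handling of degenerate inputs (all scores equal, or all scores zero) are hypothetical choices made for the example.

```python
import math
from collections import defaultdict

def minmax(scores):
    """Rescale scores to [0, 1]; equal scores map to 1.0 (assumption)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def l2_norm(scores):
    """Divide each score by the Euclidean norm of the group's scores."""
    norm = math.sqrt(sum(s * s for s in scores))
    if norm == 0.0:
        return [0.0 for _ in scores]  # assumption for an all-zero group
    return [s / norm for s in scores]

def linear(rows, weights=None, normalizer=None,
           key="_fork", group_by=("_id", "_index"), score="_score"):
    """Hypothetical weighted sum of (optionally normalized) scores."""
    weights = weights or {}
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[key]].append(row)

    fused = defaultdict(float)
    for group, members in by_group.items():
        scores = [row[score] for row in members]
        if normalizer is not None:
            scores = normalizer(scores)  # normalized within each result group
        weight = weights.get(group, 1.0)  # assumed default weight
        for row, value in zip(members, scores):
            doc = tuple(row[col] for col in group_by)
            fused[doc] += weight * value
    return dict(fused)

# Made-up rows from two result groups with different score scales.
rows = [
    {"_id": "a", "_index": "books", "_fork": "fork1", "_score": 10.0},
    {"_id": "b", "_index": "books", "_fork": "fork1", "_score": 5.0},
    {"_id": "c", "_index": "books", "_fork": "fork1", "_score": 0.0},
    {"_id": "b", "_index": "books", "_fork": "fork2", "_score": 0.8},
    {"_id": "c", "_index": "books", "_fork": "fork2", "_score": 0.4},
]

combined = linear(rows, weights={"fork1": 0.7, "fork2": 0.3}, normalizer=minmax)
print(combined)
```

Normalization matters here because the two groups produce scores on different scales; without it, the group with the larger raw scores would dominate the weighted sum.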

Implementation follow-ups

(items such as tests and bug fixes aimed at stabilising the implementation - these are required for tech preview)

PRs:

(all the implementation PRs are linked here)

Initial prototype: #131078

Future enhancements

Not required for tech preview:

  • provide different normalizers per result group
  • we currently require the score column to be a DOUBLE, but any numeric type could be accepted
  • support l2_norm after we define what the behaviour should be for negative scores
  • let FUSE output columns with unsupported data types which just return null values (like FORK)
  • when ROW is used, we shouldn't require a LIMIT before FUSE

Labels

:Search Relevance/Search, ES|QL-ui, Meta, Team:Search Relevance, priority:high, v9.3.0