feat: Add new components QueryEmbeddingRetriever and MultiRetriever#10872

Open
sjrl wants to merge 19 commits into main from mulitretriever

Conversation

@sjrl
Contributor

@sjrl sjrl commented Mar 18, 2026

Related Issues

Proposed Changes:

Added two new retriever components: MultiRetriever and QueryEmbeddingRetriever.

The MultiRetriever component is a generalisation of hybrid retrieval. Where hybrid retrieval traditionally combines keyword search (BM25) with vector search, MultiRetriever lets you compose any number of retrievers into a single component. All retrievers are queried in parallel and their results are deduplicated before being returned.

  • This allows a user to easily combine many different retrieval strategies together without needing to wire each retriever component individually in a Pipeline.
  • The active_retrievers parameter also lets users switch any of the retrievers provided in the init method on or off at run time. Replicating this behavior in a Pipeline is less straightforward, since it would require many ConditionalRouters.
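The behaviour described above (parallel querying, deduplication, and run-time selection through active_retrievers) can be illustrated with a minimal sketch. This is not the implementation in this PR: `Doc`, `StaticRetriever`, `SketchMultiRetriever`, the dict-of-named-retrievers constructor, and deduplication by document id are all assumptions made for illustration, and no Haystack classes are used.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Doc:
    # Hypothetical stand-in for a document; `id` is the assumed dedup key.
    id: str
    content: str


class StaticRetriever:
    """Hypothetical retriever that returns a fixed list for any query."""

    def __init__(self, docs: List[Doc]) -> None:
        self.docs = docs

    def run(self, query: str) -> List[Doc]:
        return list(self.docs)


class SketchMultiRetriever:
    """Illustrative only: fans a query out to named retrievers and merges results."""

    def __init__(self, retrievers: Dict[str, StaticRetriever]) -> None:
        self.retrievers = retrievers

    def run(self, query: str, active_retrievers: Optional[List[str]] = None) -> List[Doc]:
        # Default to every retriever passed at init time.
        names = list(self.retrievers) if active_retrievers is None else active_retrievers
        unknown = [name for name in names if name not in self.retrievers]
        if unknown:
            raise ValueError(f"Unknown retrievers: {unknown}")
        if not names:
            return []

        # Query the active retrievers in parallel; map() keeps the input order.
        with ThreadPoolExecutor(max_workers=len(names)) as pool:
            results = list(pool.map(lambda name: self.retrievers[name].run(query), names))

        # Deduplicate by document id, keeping the first occurrence.
        seen = set()
        merged: List[Doc] = []
        for docs in results:
            for doc in docs:
                if doc.id not in seen:
                    seen.add(doc.id)
                    merged.append(doc)
        return merged


d1 = Doc("1", "keyword search with BM25")
d2 = Doc("2", "dense vector search")
d3 = Doc("3", "reranking")
multi = SketchMultiRetriever(
    {"keyword": StaticRetriever([d1, d2]), "vector": StaticRetriever([d2, d3])}
)
all_docs = multi.run("search")                                  # both retrievers
vector_only = multi.run("search", active_retrievers=["vector"])  # one switched off
```

In this sketch, switching a retriever off is just a matter of leaving its name out of active_retrievers for that call, which is the part that would otherwise need routing logic in a Pipeline.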

QueryEmbeddingRetriever wraps an embedding-based retriever together with a query embedder into a single self-contained component. The new component therefore follows the TextRetriever protocol, which simplifies the implementation of MultiRetriever because MultiRetriever only needs to work with components that follow that protocol.
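The wrapping idea can be sketched as follows. The sketch assumes an embedder whose `run(text=...)` returns a dict with an `"embedding"` key and a retriever whose `run(query_embedding=...)` returns a dict with a `"documents"` key. `ToyEmbedder`, `ToyEmbeddingRetriever`, and `SketchQueryEmbeddingRetriever` are hypothetical stand-ins, not the classes added in this PR, and the actual TextRetriever protocol signature may differ.

```python
from typing import Any, Dict, List


class ToyEmbedder:
    """Hypothetical embedder: a 5-dim vowel-count vector instead of a real model."""

    def run(self, text: str) -> Dict[str, Any]:
        lowered = text.lower()
        return {"embedding": [float(lowered.count(v)) for v in "aeiou"]}


class ToyEmbeddingRetriever:
    """Hypothetical embedding retriever: ranks stored texts by dot product."""

    def __init__(self, documents: List[str], embedder: ToyEmbedder) -> None:
        # Pre-compute one embedding per stored text.
        self.index = [(doc, embedder.run(text=doc)["embedding"]) for doc in documents]

    def run(self, query_embedding: List[float], top_k: int = 2) -> Dict[str, Any]:
        def dot(a: List[float], b: List[float]) -> float:
            return sum(x * y for x, y in zip(a, b))

        ranked = sorted(self.index, key=lambda pair: dot(pair[1], query_embedding), reverse=True)
        return {"documents": [doc for doc, _ in ranked[:top_k]]}


class SketchQueryEmbeddingRetriever:
    """Illustrative wrapper: takes a text query, embeds it, then retrieves."""

    def __init__(self, query_embedder: ToyEmbedder, retriever: ToyEmbeddingRetriever) -> None:
        self.query_embedder = query_embedder
        self.retriever = retriever

    def run(self, query: str, top_k: int = 2) -> Dict[str, Any]:
        # Step 1: text -> embedding. Step 2: embedding -> documents.
        embedding = self.query_embedder.run(text=query)["embedding"]
        return self.retriever.run(query_embedding=embedding, top_k=top_k)


embedder = ToyEmbedder()
wrapped = SketchQueryEmbeddingRetriever(
    query_embedder=embedder,
    retriever=ToyEmbeddingRetriever(["aaa", "eee", "ooo"], embedder),
)
result = wrapped.run("aa", top_k=1)
```

Because the wrapper exposes the same text-in, documents-out interface as a keyword retriever, a component like MultiRetriever can treat both kinds uniformly.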

How did you test it?

Added new unit and integration tests

Notes for the reviewer

After talking with @julian-risch offline, we decided it makes sense to merge this into Haystack so that it is available via the nightly release and can be used and tested in the platform. As work on it continues in the platform, the component(s) may still need changes.

cc @c-bonucci @Amnah199

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@github-actions github-actions bot added the topic:tests and type:documentation labels Mar 18, 2026
@sjrl sjrl marked this pull request as ready for review March 19, 2026 10:54
@sjrl sjrl requested a review from a team as a code owner March 19, 2026 10:54
@sjrl sjrl requested review from anakin87 and removed request for a team March 19, 2026 10:54
Member

@anakin87 anakin87 left a comment

I left some questions.

In general, design and implementation are pretty simple (good!).

At this point, we are starting to have several different names for retrievers, and this might be a bit confusing for users. I'd suggest involving DevRel for suggestions on names/developer experience of these new components.

@kacperlukawski
Member

MultiRetriever is a good name. For QueryEmbeddingRetriever though, I'd suggest renaming it to TextEmbeddingRetriever. Since every retriever operates on some kind of query, the "Query" prefix doesn't add meaningful distinction - "Text" is more precise as it describes the actual input modality this component is designed to handle. Also, if we decide to add image queries, then we can follow the convention of <Modality><Method>Retriever.

@anakin87
Member

Hey @sjrl ping me when you need a final review.
(You might want to wait for reviews/feedback from others.)

@sjrl
Contributor Author

sjrl commented Mar 20, 2026

> MultiRetriever is a good name. For QueryEmbeddingRetriever though, I'd suggest renaming it to TextEmbeddingRetriever. Since every retriever operates on some kind of query, the "Query" prefix doesn't add meaningful distinction - "Text" is more precise as it describes the actual input modality this component is designed to handle. Also, if we decide to add image queries, then we can follow the convention of <Modality><Method>Retriever.

That's fair! I was following the naming we already use for MultiQueryEmbeddingRetriever, but I'm fine with going with TextEmbeddingRetriever here, and we could rename the other one later.
