feat: Add new components QueryEmbeddingRetriever and MultiRetriever #10872
anakin87 left a comment
I left some questions.
In general, design and implementation are pretty simple (good!).
At this point, we are starting to have several different names for retrievers, and this might be a bit confusing for users. I'd suggest involving DevRel for suggestions on names/developer experience of these new components.
Hey @sjrl, ping me when you need a final review.
That's fair! I was following the naming we already use for MultiQueryEmbeddingRetriever, but I'm fine with going
Related Issues
Proposed Changes:
Added two new retriever components: `MultiRetriever` and `QueryEmbeddingRetriever`.

The `MultiRetriever` component is a generalisation of hybrid retrieval: where hybrid retrieval traditionally combines keyword search (BM25) with vector search, `MultiRetriever` lets you compose any number of retrievers into a single component.
- All retrievers are queried in parallel and their results are deduplicated before being returned.
- The `active_retrievers` parameter lets users switch any of the retrievers provided in the init method on or off at run time. Replicating this behavior in a Pipeline is not as straightforward, since it would require a lot of `ConditionalRouter`s.

`QueryEmbeddingRetriever` wraps an embedding-based retriever together with a query embedder into a single self-contained component. This new component then follows the `TextRetriever` protocol, which simplifies the implementation of `MultiRetriever`, which uses the `TextRetriever` protocol.

How did you test it?
Added new unit and integration tests
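
For readers skimming the PR, the behaviour described under Proposed Changes can be sketched in plain Python. This is only an illustration under stated assumptions, not the code added in this PR: `Doc`, `KeywordRetriever`, `QueryEmbeddingRetrieverSketch`, `MultiRetrieverSketch`, the `run(query)` signature, and deduplication by document `id` are all hypothetical stand-ins and do not reflect the actual Haystack API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Protocol


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document; not Haystack's Document class."""
    id: str
    content: str


class TextRetriever(Protocol):
    """Assumed shape of a text-in, documents-out retriever."""
    def run(self, query: str) -> List[Doc]: ...


class KeywordRetriever:
    """Toy keyword retriever: returns docs containing any query term."""
    def __init__(self, docs: List[Doc]) -> None:
        self.docs = docs

    def run(self, query: str) -> List[Doc]:
        terms = query.lower().split()
        return [d for d in self.docs if any(t in d.content.lower() for t in terms)]


class QueryEmbeddingRetrieverSketch:
    """Bundles a query embedder and an embedding-based retriever behind run(query)."""
    def __init__(
        self,
        embed: Callable[[str], List[float]],
        retrieve: Callable[[List[float]], List[Doc]],
    ) -> None:
        self.embed = embed
        self.retrieve = retrieve

    def run(self, query: str) -> List[Doc]:
        # Embed the text query first, then hand the vector to the retriever.
        return self.retrieve(self.embed(query))


class MultiRetrieverSketch:
    """Queries the active retrievers in parallel and deduplicates their results."""
    def __init__(self, retrievers: Dict[str, TextRetriever]) -> None:
        self.retrievers = retrievers

    def run(self, query: str, active_retrievers: Optional[List[str]] = None) -> List[Doc]:
        # Default to all retrievers when no subset is requested at run time.
        names = list(self.retrievers) if active_retrievers is None else active_retrievers
        if not names:
            return []
        with ThreadPoolExecutor(max_workers=len(names)) as pool:
            # pool.map preserves the order of `names`, so results are deterministic.
            results = list(pool.map(lambda n: self.retrievers[n].run(query), names))
        unique: Dict[str, Doc] = {}
        for docs in results:
            for doc in docs:
                unique.setdefault(doc.id, doc)  # keep the first occurrence of each id
        return list(unique.values())


docs = [Doc("1", "bm25 keyword search"), Doc("2", "vector search"), Doc("3", "unrelated")]
multi = MultiRetrieverSketch({
    "keyword": KeywordRetriever(docs),
    # Toy embedder and retriever; a real setup would use an embedding model and a document store.
    "embedding": QueryEmbeddingRetrieverSketch(
        embed=lambda q: [float(len(q))],
        retrieve=lambda vector: [docs[1], docs[2]],
    ),
})
all_ids = [d.id for d in multi.run("search")]
keyword_only_ids = [d.id for d in multi.run("search", active_retrievers=["keyword"])]
```

Because both wrapped retrievers expose the same text-only `run`, the multi-retriever never needs to know which of them requires an embedding step, which is the simplification the description attributes to the `TextRetriever` protocol.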
Notes for the reviewer
After talking with @julian-risch offline, we decided it makes sense to merge this into Haystack so that it is available via the nightly release and can be used and tested in the platform. As it is worked on in the platform, changes to the component(s) may still be needed.
cc @c-bonucci @Amnah199
Checklist
`fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:` and added `!` in case the PR includes breaking changes.