Please include hybrid search in MultiVectorRetriever by LangChain for accurate multimodal RAG #30698
mahendra867
announced in
Ideas
Replies: 0 comments
Feature request
🔧 Feature Request
Title: Support Hybrid Search in MultiVectorRetriever
Description:
Please add native support for hybrid search in MultiVectorRetriever by enabling compatibility with retrievers such as PineconeHybridSearchRetriever. Currently, hybrid search retrievers do not subclass LangChain's VectorStore, which results in incompatibility errors when attempting to pass them to MultiVectorRetriever.
Relevant Links:
LangChain MultiVectorRetriever Docs
Pinecone Hybrid Search Retriever
LangChain VectorStore Interface
Motivation
💡 Motivation
In real-world multimodal Retrieval-Augmented Generation (RAG) systems, we often summarize content from different modalities, such as text, tables, and images, and embed those summaries as vectors. Retrieval over these summaries benefits significantly from hybrid search, which combines dense embeddings with sparse retrieval such as BM25. However, the current MultiVectorRetriever in LangChain only supports vector stores that subclass VectorStore, which excludes hybrid search retrievers like PineconeHybridSearchRetriever.
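To make the dense-plus-sparse combination concrete, here is a minimal sketch of one common way to fuse the two signals: a convex combination weighted by a parameter alpha. The helper names, document ids, and scores below are made up for illustration and are not taken from LangChain or Pinecone. The sketch also assumes both score sets are already normalized to a comparable range, which real systems have to handle explicitly.

```python
def hybrid_scores(dense, sparse, alpha=0.5):
    """Blend per-document dense and sparse scores (hypothetical helper).

    alpha=1.0 uses only the dense (embedding) score;
    alpha=0.0 uses only the sparse (e.g. BM25) score.
    A document missing from one side contributes 0.0 for that side.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    doc_ids = set(dense) | set(sparse)
    return {
        doc_id: alpha * dense.get(doc_id, 0.0)
        + (1.0 - alpha) * sparse.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


def rank(scores):
    """Return document ids from highest to lowest score (ties broken by id)."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative scores for three summaries of different modalities.
dense = {"text_summary": 0.9, "table_summary": 0.4, "image_summary": 0.2}
sparse = {"table_summary": 0.8, "text_summary": 0.1}

print(rank(hybrid_scores(dense, sparse, alpha=0.5)))
print(rank(hybrid_scores(dense, sparse, alpha=1.0)))
```

With an even weighting, the table summary moves ahead of the text summary because of its strong keyword match. With alpha set to 1.0, the ranking falls back to the dense scores alone.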
This makes it difficult to build hybrid multimodal RAG systems without custom patches or wrappers.
I'm always frustrated when I try to pass a hybrid search retriever to MultiVectorRetriever and encounter compatibility errors, even though it provides comparable indexing and search functionality.
Proposal (If applicable)
import os

from dotenv import load_dotenv
from langchain_community.retrievers import (
    PineconeHybridSearchRetriever,
)
from langchain_openai import AzureOpenAIEmbeddings
from pinecone import Pinecone, ServerlessSpec
from pinecone_text.sparse import BM25Encoder

load_dotenv()


def create_retriever(text, text_summary, table, table_summary, image, image_summary):
    ...  # body omitted
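Passing this retriever to MultiVectorRetriever fails because PineconeHybridSearchRetriever is not a subclass of VectorStore. It covers similar ground, though: it can index texts through add_texts and return the documents most relevant to a query, so it should be adaptable to the add_documents and similarity_search calls that a MultiVectorRetriever setup relies on.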
✅ Proposed Solution
Allow MultiVectorRetriever to optionally accept any retriever-like object that implements the add_documents and similarity_search interface.
Alternatively, create a wrapper or adapter that conforms hybrid retrievers to the VectorStore interface.
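Both options can be sketched without any dependencies. The code below is a rough illustration, not LangChain's API: `Document` is a stand-in dataclass, `SupportsVectorSearch` and `HybridRetrieverAdapter` are hypothetical names, and `FakeHybridRetriever` is a toy used only to exercise the adapter. It assumes the wrapped retriever offers `add_texts(texts, metadatas=...)` for indexing and `invoke(query)` for retrieval. An adapter meant for today's MultiVectorRetriever would most likely also have to subclass VectorStore, since that is the type the retriever declares for its `vectorstore` field.

```python
from dataclasses import dataclass, field
from typing import Any, List, Protocol, runtime_checkable


@dataclass
class Document:
    """Stand-in for LangChain's Document so the sketch runs on its own."""
    page_content: str
    metadata: dict = field(default_factory=dict)


@runtime_checkable
class SupportsVectorSearch(Protocol):
    """Hypothetical structural type: the two methods named in the proposal."""

    def add_documents(self, documents: List[Document], **kwargs: Any) -> None: ...

    def similarity_search(self, query: str, k: int = 4, **kwargs: Any) -> List[Document]: ...


class HybridRetrieverAdapter:
    """Hypothetical adapter exposing a retriever through those two methods."""

    def __init__(self, retriever: Any) -> None:
        self.retriever = retriever

    def add_documents(self, documents: List[Document], **kwargs: Any) -> None:
        # Split documents into texts and metadata; the metadata has to be kept
        # so that an id key such as "doc_id" survives indexing.
        texts = [doc.page_content for doc in documents]
        metadatas = [doc.metadata for doc in documents]
        self.retriever.add_texts(texts, metadatas=metadatas, **kwargs)

    def similarity_search(self, query: str, k: int = 4, **kwargs: Any) -> List[Document]:
        # Delegate to the retriever and keep at most k results.
        return self.retriever.invoke(query)[:k]


class FakeHybridRetriever:
    """Toy retriever for the demo; matches by case-insensitive substring."""

    def __init__(self) -> None:
        self._docs: List[Document] = []

    def add_texts(self, texts, metadatas=None, **kwargs):
        metadatas = metadatas or [{} for _ in texts]
        for text, metadata in zip(texts, metadatas):
            self._docs.append(Document(text, metadata))

    def invoke(self, query, **kwargs):
        return [d for d in self._docs if query.lower() in d.page_content.lower()]


adapter = HybridRetrieverAdapter(FakeHybridRetriever())
adapter.add_documents([
    Document("Quarterly revenue table summary", {"doc_id": "t1"}),
    Document("Summary of a cat photo", {"doc_id": "i1"}),
])
hits = adapter.similarity_search("revenue")
print([doc.metadata["doc_id"] for doc in hits])
print(isinstance(adapter, SupportsVectorSearch))
```

The `runtime_checkable` protocol only checks that the two methods exist, not their signatures, so it is a loose gate. It does, however, show how a retriever could be accepted based on what it can do instead of on its base class.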