Combine documents from multiple retrievers in create_retrieval_chain #24864

kartiksonaghela · 2024-07-31T06:54:00Z

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# `llm` and `agronomy_retriever` are defined elsewhere in the application.

# Prompt that rewrites the latest question into a standalone question
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, agronomy_retriever, contextualize_q_prompt
)

# Prompt that answers from the retrieved context
qa_system_prompt = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know could you please provide more information,don't try to make-up the answers.
Answers should be strictly from the retrieved context.
{context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        # Note: this variable requires an "inputQuestion" key at invoke time
        ("human", "{inputQuestion}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

Description

As you can see, in create_retrieval_chain(history_aware_retriever, question_answer_chain) the history_aware_retriever returns the n chunks most similar to the question, and these are passed to question_answer_chain to produce the answer. I want to add more retrieved chunks to the output of history_aware_retriever. For example, if history_aware_retriever returns around 10 documents, I want to add 3 more and then pass all of them to question_answer_chain. The 3 additional chunks come from a different version of the same question. In short, I want to supply question_answer_chain with extra chunks that do not come from this particular flow.
Thank you
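
The flow described above can be sketched in plain Python, independent of LangChain. Everything here is a hypothetical stand-in: `Doc` mimics a retrieved chunk, and `primary_retrieve` / `variant_retrieve` are toy callables in place of real retrievers. The sketch only illustrates the intended data flow: keep all primary chunks, then append up to three chunks found via reworded versions of the question.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a retrieved chunk."""
    page_content: str


Retrieve = Callable[[str], List[Doc]]


def retrieve_with_variants(
    primary_retrieve: Retrieve,
    variant_retrieve: Retrieve,
    question: str,
    variants: List[str],
    extra_k: int = 3,
) -> List[Doc]:
    """Return all primary chunks plus up to `extra_k` chunks from question variants."""
    docs = list(primary_retrieve(question))
    extras: List[Doc] = []
    for variant in variants:
        for doc in variant_retrieve(variant):
            # Skip chunks that are already present so the context has no repeats.
            if doc not in docs and doc not in extras:
                extras.append(doc)
    return docs + extras[:extra_k]


# Toy retrievers standing in for real ones (assumptions for illustration).
def primary_retrieve(question: str) -> List[Doc]:
    return [Doc(f"primary-{i}") for i in range(10)]


def variant_retrieve(question: str) -> List[Doc]:
    return [Doc(f"{question}-hit-{i}") for i in range(2)]


combined = retrieve_with_variants(
    primary_retrieve, variant_retrieve, "soil pH?", ["pH of soil", "soil acidity"]
)
print(len(combined))
```

With ten primary chunks and a cap of three extras, the combined list holds thirteen chunks, with the primary ones first.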

System Info

All latest versions

@kartiksonaghela · 2024-07-31T07:02:46Z

dosubot[bot]
bot Jul 31, 2024

Hey @kartiksonaghela! I'm here to help you with any bugs, questions, or contributions you have in mind. Let's tackle this together! 😊

To update the create_retrieval_chain function to combine documents from multiple retrievers, you can modify the function to include an additional retriever and then merge the documents retrieved from both retrievers before passing them to the question_answer_chain. Here is how you can do it:

from __future__ import annotations

from typing import Any, Dict, Union

from langchain_core.retrievers import (
    BaseRetriever,
    RetrieverOutput,
)
from langchain_core.runnables import Runnable, RunnablePassthrough

def create_retrieval_chain(
    retriever: Union[BaseRetriever, Runnable[dict, RetrieverOutput]],
    additional_retriever: Union[BaseRetriever, Runnable[dict, RetrieverOutput]],
    combine_docs_chain: Runnable[Dict[str, Any], str],
) -> Runnable:
    """Create retrieval chain that retrieves documents from multiple retrievers and then passes them on.

    Args:
        retriever: Primary retriever-like object that returns list of documents.
        additional_retriever: Additional retriever-like object to retrieve more documents.
        combine_docs_chain: Runnable that takes inputs and produces a string output.

    Returns:
        An LCEL Runnable. The Runnable return is a dictionary containing at the very
        least a `context` and `answer` key.
    """
    if not isinstance(retriever, BaseRetriever):
        primary_retrieval_docs: Runnable[dict, RetrieverOutput] = retriever
    else:
        primary_retrieval_docs = (lambda x: x["input"]) | retriever

    if not isinstance(additional_retriever, BaseRetriever):
        additional_retrieval_docs: Runnable[dict, RetrieverOutput] = additional_retriever
    else:
        additional_retrieval_docs = (lambda x: x["input"]) | additional_retriever

    def merge_documents(primary_docs, additional_docs):
        # Assuming both primary_docs and additional_docs are lists of documents
        return primary_docs + additional_docs[:3]  # Add 3 more documents from additional retriever

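    # Run both retrievers on the same input dict, then merge their results.
    # Piping one retriever into the other would pass the first retriever's
    # output, not the original input, to the second one.
    merged_retrieval_docs = RunnablePassthrough.assign(
        primary_docs=primary_retrieval_docs.with_config(run_name="retrieve_primary_documents"),
        additional_docs=additional_retrieval_docs.with_config(
            run_name="retrieve_additional_documents"
        ),
    ) | (lambda x: merge_documents(x["primary_docs"], x["additional_docs"]))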

    retrieval_chain = (
        RunnablePassthrough.assign(
            context=merged_retrieval_docs.with_config(run_name="merge_documents"),
        ).assign(answer=combine_docs_chain)
    ).with_config(run_name="retrieval_chain")

    return retrieval_chain

This updated function now takes an additional retriever and merges the documents retrieved from both retrievers, adding 3 more documents from the additional retriever to the primary retriever's documents. The merged documents are then passed to the combine_docs_chain.
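
Merging two retrievers requires that both receive the same input dictionary, which is a fan-out rather than a pipe. The following is a minimal plain-Python sketch of that pattern, not LangChain code: `fan_out`, `merge`, and the two toy retrievers are hypothetical names introduced only for illustration, and strings stand in for documents.

```python
from typing import Any, Callable, Dict, List

Inputs = Dict[str, Any]
Step = Callable[[Inputs], List[str]]


def fan_out(**steps: Step) -> Callable[[Inputs], Dict[str, List[str]]]:
    """Build a callable that runs every step on the same input dict."""
    def run(inputs: Inputs) -> Dict[str, List[str]]:
        return {name: step(inputs) for name, step in steps.items()}
    return run


def merge(results: Dict[str, List[str]], extra_limit: int = 3) -> List[str]:
    """Keep all primary results and append at most `extra_limit` additional ones."""
    return results["primary_docs"] + results["additional_docs"][:extra_limit]


# Toy retrievers (assumptions): each reads the question from inputs["input"].
def primary(inputs: Inputs) -> List[str]:
    return [f"primary:{inputs['input']}:{i}" for i in range(2)]


def additional(inputs: Inputs) -> List[str]:
    return [f"additional:{inputs['input']}:{i}" for i in range(5)]


retrieve_both = fan_out(primary_docs=primary, additional_docs=additional)
merged = merge(retrieve_both({"input": "q"}))
print(merged)
```

Both toy retrievers see the original `{"input": "q"}` dictionary, and only three of the five additional results survive the merge.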

In the context of the LangChain framework, you can use the create_history_aware_retriever to handle the historical context and then combine it with another retriever to get additional documents. This ensures that the retrieval process is aware of the conversation history and can provide more relevant documents.

For example, you can use the RunnableParallel to handle multiple inputs and combine them effectively, as shown in the LangChain documentation:

from operator import itemgetter
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# Define the prompt template
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Define the model
model = ChatOpenAI()

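# Define the chain (`retriever` is assumed to be an existing retriever instance;
# the dict is coerced to a RunnableParallel)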
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

This example shows how to set up a retrieval-augmented generation chain using LangChain, which can be adapted to include multiple retrievers and combine their outputs ^[1]^[2]^[3].

To continue talking to Dosu, mention @dosu.

5 replies

kartiksonaghela Jul 31, 2024
Author

Here you are suggesting that I modify the create_retrieval_chain module from LangChain. Is there any other way to tackle this problem?

dosubot[bot] bot Jul 31, 2024

Yes, you can tackle the problem of combining documents from multiple retrievers without updating the create_retrieval_chain module by using the MergerRetriever class. This class is designed to merge the results of multiple retrievers.

Here is an example of how you can use the MergerRetriever:

from langchain.retrievers.merger_retriever import MergerRetriever

# retriever1 and retriever2 are placeholders for any two existing retrievers,
# for example the result of calling as_retriever() on two different vector stores.

# Combine them using MergerRetriever
merger_retriever = MergerRetriever(retrievers=[retriever1, retriever2])

# Use the public invoke() method to get relevant documents
query = "your search query"
relevant_documents = merger_retriever.invoke(query)
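
One point to check before relying on MergerRetriever for this use case: as far as I understand it, the class interleaves the result lists round-robin instead of appending one list after the other, and it does not cap the second retriever at 3 documents. That behaviour is an assumption worth verifying against the installed version. A plain-Python sketch of round-robin merging, with a hypothetical `interleave` helper and strings in place of documents:

```python
from itertools import zip_longest
from typing import List, Sequence, TypeVar

T = TypeVar("T")
_MISSING = object()  # sentinel marking exhausted lists


def interleave(result_lists: Sequence[Sequence[T]]) -> List[T]:
    """Merge lists round-robin: first items of each list, then second items, and so on."""
    merged: List[T] = []
    for group in zip_longest(*result_lists, fillvalue=_MISSING):
        merged.extend(item for item in group if item is not _MISSING)
    return merged


merged = interleave([["a1", "a2", "a3"], ["b1"]])
print(merged)
```

If the primary chunks must stay first and the extras must be limited, a custom merge step is likely a better fit than round-robin merging.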

Additionally, you can create a custom retrieval chain by combining different retrievers and document chains. Here’s an example of how you can create a retrieval chain that combines multiple retrievers and uses a document chain to answer questions:

from langchain_core.runnables import RunnablePassthrough
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage

# Define the system template
SYSTEM_TEMPLATE = """
Answer the user's questions based on the below context. 
If the context doesn't contain any relevant information to the question, don't make something up and just say "I don't know":

<context>
{context}
</context>
"""

# Create a prompt template
question_answering_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", SYSTEM_TEMPLATE),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

# Create a document chain (`chat` is assumed to be an existing chat model instance)
document_chain = create_stuff_documents_chain(chat, question_answering_prompt)

# Define a function to parse the retriever input
def parse_retriever_input(params):
    return params["messages"][-1].content

# Create a retrieval chain
retrieval_chain = RunnablePassthrough.assign(
    context=parse_retriever_input | merger_retriever,
).assign(
    answer=document_chain,
)

# Use the retrieval chain to answer questions
response = retrieval_chain.invoke(
    {
        "messages": [
            HumanMessage(content="Can LangSmith help test my LLM applications?")
        ],
    }
)

This way, you can combine documents from multiple retrievers and use them in a document chain to answer questions without needing to update the create_retrieval_chain module ^[1]^[2].

kartiksonaghela Jul 31, 2024
Author

No, this is not working. Can't we just update history_aware_retriever in create_retrieval_chain(history_aware_retriever, question_answer_chain), where history_aware_retriever = create_history_aware_retriever(llm, agronomy_retriever, contextualize_q_prompt)? I am looking for some method by which history_aware_retriever gets more chunks added to it.

dosubot[bot] bot Jul 31, 2024

You can update the create_retrieval_chain function to add more chunks to the history_aware_retriever before passing them to the question_answer_chain. Here is how you can do it:

# history_aware_retriever and question_answer_chain are the objects built in the question.

# Define the function to add more chunks to the retrieved documents
def create_custom_retrieval_chain(history_aware_retriever, question_answer_chain, additional_docs):
    def custom_retrieval_chain(inputs):
        # Retrieve initial documents; runnables expose invoke(), not retrieve()
        initial_docs = history_aware_retriever.invoke(inputs)

        # Add additional documents
        combined_docs = initial_docs + additional_docs

        # Pass the combined documents to the question_answer_chain
        context = {"context": combined_docs, "input": inputs["input"], "chat_history": inputs.get("chat_history", [])}
        return question_answer_chain.invoke(context)

    return custom_retrieval_chain

# Example usage
additional_docs = [...]  # List of additional Document objects to add
custom_rag_chain = create_custom_retrieval_chain(history_aware_retriever, question_answer_chain, additional_docs)

# Invoke the custom chain; the document chain returns the answer text directly
response = custom_rag_chain({"input": "What is Task Decomposition?", "chat_history": []})
print(response)

In this code, additional_docs should be a list of documents that you want to add to the retrieval process. The custom_retrieval_chain function combines the initial retrieved documents with the additional documents before passing them to the question_answer_chain. The dictionary passed to question_answer_chain uses the "input" key rather than "question"; its keys must match the variables used in qa_prompt ^[1]^[2].
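
A document chain such as question_answer_chain normally yields the answer text itself, whereas create_retrieval_chain is documented to return a dictionary with at least `context` and `answer` keys. If downstream code expects that dictionary shape, the wrapper can rebuild it. The following is a plain-Python sketch of the idea; `make_chain`, `toy_retrieve`, and `toy_answer` are hypothetical stand-ins, and strings replace documents.

```python
from typing import Any, Callable, Dict, List

Inputs = Dict[str, Any]


def make_chain(
    retrieve: Callable[[Inputs], List[str]],
    answer: Callable[[Inputs], str],
    additional_docs: List[str],
) -> Callable[[Inputs], Inputs]:
    """Return a callable that adds `context` and `answer` keys to its input."""
    def run(inputs: Inputs) -> Inputs:
        # Retrieved chunks first, then the extra chunks supplied by the caller.
        context = retrieve(inputs) + additional_docs
        enriched = {**inputs, "context": context}
        return {**enriched, "answer": answer(enriched)}
    return run


# Toy stand-ins (assumptions) for the retriever and the question-answering chain.
def toy_retrieve(inputs: Inputs) -> List[str]:
    return [f"chunk about {inputs['input']}"]


def toy_answer(inputs: Inputs) -> str:
    return f"Answer based on {len(inputs['context'])} chunks"


chain = make_chain(toy_retrieve, toy_answer, ["extra chunk"])
result = chain({"input": "task decomposition", "chat_history": []})
print(result["answer"])
```

The result keeps the original `input` and `chat_history` keys, so it can be consumed the same way as the output of the stock retrieval chain.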

ngothankphuong · 2024-10-05T15:04:17Z

Did you solve this problem? I also want to combine a multi-query retriever with chat history, but I could not find the documentation.

0 replies
