Get (document, similarity score) pairs from retriever #4595
-
Example Code

```typescript
// Excerpt from a class; ConversationalRetrievalChainParams and
// ConversationalRetrievalQAChainInput are defined elsewhere.
public static createConversationalRetrievalChain(
  args: ConversationalRetrievalChainParams,
  onDocumentsRetrieved: (documents: Document[]) => void,
): RunnableSequence {
  const { llm, retriever } = args;

  const condenseQuestionTemplate = `Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`;
  const CONDENSE_QUESTION_PROMPT = PromptTemplate.fromTemplate(
    condenseQuestionTemplate
  );

  const answerTemplate = `Answer the question in as much detail as possible based only on the following context:
{context}
Question: {question}
`;
  const ANSWER_PROMPT = PromptTemplate.fromTemplate(answerTemplate);

  const formatChatHistory = (chatHistory: [string, string][]) => {
    const formattedDialogueTurns = chatHistory.map(
      (dialogueTurn) => `Human: ${dialogueTurn[0]}\nAssistant: ${dialogueTurn[1]}`
    );
    return formattedDialogueTurns.join("\n");
  };

  // Rephrase the follow-up question into a standalone question.
  const standaloneQuestionChain = RunnableSequence.from([
    {
      question: (input: ConversationalRetrievalQAChainInput) => input.question,
      chat_history: (input: ConversationalRetrievalQAChainInput) =>
        formatChatHistory(input.chat_history),
    },
    CONDENSE_QUESTION_PROMPT,
    llm,
    new StringOutputParser(),
  ]);

  const formatDocumentsAsStringAndStore = async (documents: Document[]) => {
    // Store or log documents for debugging
    onDocumentsRetrieved(documents);
    return formatDocumentsAsString(documents);
  };

  // Retrieve context for the standalone question, then answer from it.
  const answerChain = RunnableSequence.from([
    {
      context: retriever.pipe(formatDocumentsAsStringAndStore),
      question: new RunnablePassthrough(),
    },
    ANSWER_PROMPT,
    llm,
  ]);

  const conversationalRetrievalQAChain = standaloneQuestionChain.pipe(answerChain);
  return conversationalRetrievalQAChain as RunnableSequence;
}
```

Description

Need guidance on getting the (doc, similarity score) pairs from this LCEL setup. I have a callback (`onDocumentsRetrieved` above), but it only receives the documents, not their scores.

System Info

node v19.8.1
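For context, a minimal sketch of how a chain built by this factory would be invoked. Assumptions: `llm` and `retriever` are constructed elsewhere, and `MyChainClass` is just a stand-in name for whatever class hosts this method.

```typescript
// Sketch only: MyChainClass, llm, and retriever are illustrative stand-ins.
const chain = MyChainClass.createConversationalRetrievalChain(
  { llm, retriever },
  (docs) => console.log(`retrieved ${docs.length} documents`)
);

const answer = await chain.invoke({
  question: "What does the context say about similarity scores?",
  chat_history: [
    ["Hi, can you help me with my retriever?", "Sure, what do you need?"],
  ],
});
console.log(answer);
```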
Replies: 5 comments 5 replies
-
🤖 Hey @logancyang, great to see you diving into another challenge with langchainjs! Let's tackle this one together. 🚀

Based on the code you've provided, it seems you're trying to retrieve both the documents and their similarity scores from the retriever. However, the current implementation of the retriever only returns the documents, not their scores. To achieve this, you would need to modify or extend the retriever to also return the similarity scores. Here's a general approach:
```typescript
class ScoreThresholdRetrieverWithScores extends ScoreThresholdRetriever {
  async getRelevantDocumentsWithScores(
    query: string
  ): Promise<Array<{ document: Document; score: number }>> {
    // Your implementation here
  }
}
```
Please note that this is a general approach and the actual implementation would depend on the internals of the retriever you are using.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.
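For what it's worth, here is a sketch of one way that method body could be filled in, assuming the underlying vector store exposes `similaritySearchWithScore` (the actual `ScoreThresholdRetriever` internals may differ, and the names below are illustrative):

```typescript
import { Document } from "@langchain/core/documents";
import type { VectorStore } from "@langchain/core/vectorstores";

// Sketch: query the vector store directly, since similaritySearchWithScore
// already returns [Document, score] tuples, then apply a score threshold.
async function getRelevantDocumentsWithScores(
  vectorStore: VectorStore,
  query: string,
  k = 4,
  minScore = 0.5
): Promise<Array<{ document: Document; score: number }>> {
  const results = await vectorStore.similaritySearchWithScore(query, k);
  return results
    .filter(([, score]) => score >= minScore)
    .map(([document, score]) => ({ document, score }));
}

// Usage with any concrete store, e.g. MemoryVectorStore from
// "langchain/vectorstores/memory":
// const pairs = await getRelevantDocumentsWithScores(store, "What is LangChain?");
```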
-
@dosu-bot Assuming that I implemented this
-
Hey @logancyang, this is something we've been looking into a bit. The tricky thing is that we don't want to store scores as metadata on the documents themselves since those can be resaved to a store later.

If you're ok with that happening, you could create your own custom retriever: https://js.langchain.com/docs/modules/data_connection/retrievers/custom

Or you could just wrap the vectorstore directly in something like a `RunnableLambda` and return tuples.
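A rough sketch of that second option: wrap the vector store in a `RunnableLambda` that hands back (document, score) pairs to a callback before passing only the text on to the prompt. The function and variable names here are illustrative, not from this thread.

```typescript
import { Document } from "@langchain/core/documents";
import { RunnableLambda } from "@langchain/core/runnables";
import type { VectorStore } from "@langchain/core/vectorstores";

// A retriever-like runnable that keeps the similarity scores around.
const makeScoredRetriever = (
  vectorStore: VectorStore,
  onPairsRetrieved: (pairs: [Document, number][]) => void,
  k = 4
) =>
  RunnableLambda.from(async (query: string) => {
    // similaritySearchWithScore returns [Document, score] tuples.
    const pairs = await vectorStore.similaritySearchWithScore(query, k);
    onPairsRetrieved(pairs);
    // Only the document text flows on to the answer prompt.
    return pairs.map(([doc]) => doc.pageContent).join("\n\n");
  });

// It could then replace `retriever.pipe(...)` in the answer chain, e.g.:
// const answerChain = RunnableSequence.from([
//   {
//     context: makeScoredRetriever(vectorStore, (pairs) => console.log(pairs)),
//     question: new RunnablePassthrough(),
//   },
//   ANSWER_PROMPT,
//   llm,
// ]);
```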
-
@dosu-bot actually, how do I add console.log to show the output of
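The name of the step is cut off above, but a common way to log any intermediate output in an LCEL sequence (a sketch, not from this thread) is a pass-through `RunnableLambda` inserted between two steps:

```typescript
import { RunnableLambda } from "@langchain/core/runnables";

// A pass-through "tap" that logs whatever flows through it, unchanged.
const logStep = (label: string) =>
  RunnableLambda.from(async (value: unknown) => {
    console.log(`[${label}]`, value);
    return value;
  });

// For example, to see the condensed question produced by the chain above:
// const standaloneQuestionChain = RunnableSequence.from([
//   { question: ..., chat_history: ... },
//   CONDENSE_QUESTION_PROMPT,
//   llm,
//   new StringOutputParser(),
//   logStep("standalone question"),
// ]);
```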
-
@dosu-bot how come I do not see the console log from the 2nd snippet you suggested, but can see it from the 1st snippet? This is the snippet from which I do not see the message logged