Optimal way to do implement multiple RAG contexts #6707

kholiavko-roman · 2024-09-06T11:17:06Z

kholiavko-roman
Sep 6, 2024

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { formatDocumentsAsString } from "langchain/util/document";
import { PromptTemplate } from "@langchain/core/prompts";
import {
  RunnableSequence,
  RunnablePassthrough,
} from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

const model = new ChatOpenAI({});

const vectorStore = await HNSWLib.fromTexts(
  ["mitochondria is the powerhouse of the cell"],
  [{ id: 1 }],
  new OpenAIEmbeddings()
);
const retriever = vectorStore.asRetriever();

const anotherVectorStore = await HNSWLib.fromTexts(
  ["another mitochondria is the powerhouse of the cell"],
  [{ id: 2 }],
  new OpenAIEmbeddings()
);
const anotherRetriever = anotherVectorStore.asRetriever();

const prompt =
  PromptTemplate.fromTemplate(`Answer the question based only on the following context:
{context}

Another context:
{anotherContext}

Question: {question}`);

const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    // second context
    anotherContext: anotherRetriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);

const result = await chain.invoke("What is the powerhouse of the cell?");

Description

I have an application with RAG which has 2 different contexts.

I added code example with 2 contexts bases on the doc https://js.langchain.com/v0.1/docs/expression_language/cookbook/retrieval/

Every time I ask a question, I have 2 similarity searches for both contexts. I'm concerned that we duplicating the same action twice by converting the input to vector. This means that each retriever will send a separate request to the embedding model to receive the vector. However, since the input is the same, it's redundant to make multiple requests.

Is there an optimized way to implement multiple RAG contexts and have only 1 request to the embedding model for all the retrievers?

Currently, if I need 4 contexts, I will have 4 (the same) requests to the embedding model, which doesn't look good.

System Info

N/A

@kholiavko-roman · 2024-09-06T11:18:40Z

dosubot[bot]
bot Sep 6, 2024

Hey there, @kholiavko-roman! I'm here to help you with any bugs, questions, or becoming a contributor. Let's tackle this together! 🤖

I found a similar unsolved discussion on combining LangChain retrievers here ^[1].

Regarding your question, there is an optimized way to implement multiple RAG contexts in LangChain and have only one request to the embedding model for all the retrievers. You can use the MultiVectorRetriever to generate multiple embeddings per document and then use those embeddings for retrieval. This approach allows you to make a single request to the embedding model while handling multiple retrievers.

Here is an example of how to use the MultiVectorRetriever with smaller chunks:

import { OpenAIEmbeddings } from "@langchain/openai";
import { MultiVectorRetriever } from "langchain/retrievers_multi_vector";
import { TextSplitter } from "langchain/text_splitter";

// Initialize the embedding model
const embeddings = new OpenAIEmbeddings({
  apiKey: "your-openai-api-key",
});

// Define a text splitter to create smaller chunks
const splitter = new TextSplitter({
  chunkSize: 200,
  chunkOverlap: 50,
});

// Create the MultiVectorRetriever
const retriever = new MultiVectorRetriever({
  embeddings,
  splitter,
  documents: [
    {
      id: "doc1",
      text: "Your document text here...",
    },
    // Add more documents as needed
  ],
});

// Use the retriever to get relevant documents
const results = await retriever.retrieve("Your query here...");

console.log(results);

Additionally, you can enhance the retrieval process by using the Rewrite-Retrieve-Read method, which involves rewriting the query to optimize retrieval. This method is particularly useful for handling complex or "distracted" queries. Here is an example of how to implement this method:

import { PromptTemplate } from "npm:[email protected]/prompts";
import { ChatOpenAI } from "npm:[email protected]/chat_models/openai";
import { StringOutputParser } from "npm:[email protected]/schema/output_parser";
import { RunnableSequence, RunnablePassthrough } from "npm:[email protected]/schema/runnable";
import { TavilySearchAPIRetriever } from "npm:[email protected]/retrievers/tavily_search_api";
import type { Document } from "npm:[email protected]/schema/document";

const rewriteTemplate = `Provide a better search query for \
web search engine to answer the given question, end \
the queries with ’**’. Question: \
{x} Answer:`;
const rewritePrompt = PromptTemplate.fromTemplate(rewriteTemplate);

// Parser to remove the `**`
const _parse = (text) => text.replace("**", "");

const rewriter = RunnableSequence.from([
  rewritePrompt,
  new ChatOpenAI({ temperature: 0 }),
  new StringOutputParser(),
  _parse
]);

const template = `Answer the users question based only on the following context:

<context>
  {context}
</context>

Question: {question}`

const prompt = PromptTemplate.fromTemplate(template);

const model = new ChatOpenAI({
  temperature: 0,
  openAIApiKey: Deno.env.get("OPENAI_API_KEY"),
})

const retriever = new TavilySearchAPIRetriever({
  k: 3,
  apiKey: Deno.env.get("TAVILY_API_KEY"),
});

const formatDocs = (documents: Document[]) => documents.map((doc) => doc.pageContent).join("\n");

const rewriteRetrieveReadChain = RunnableSequence.from([
  {
    context: RunnableSequence.from([
      { x: new RunnablePassthrough() },
      rewriter,
      retriever,
      formatDocs,
    ]),
    question: new RunnablePassthrough()
  },
  prompt,
  model,
  new StringOutputParser()
]);

const distractedQuery = "man that sam bankman fried trial was crazy! what is langchain?";

await rewriteRetrieveReadChain.invoke(distractedQuery);

Moreover, you can use the RAG Fusion method, which employs multiple query generation and Reciprocal Rank Fusion to re-rank search results. Here is an example of how to implement this method:

import { OpenAIEmbeddings } from "npm:[email protected]/embeddings/openai";
import { MemoryVectorStore } from "npm:[email protected]/vectorstores/memory";
import { ChatOpenAI } from "npm:[email protected]/chat_models/openai";
import { pull } from "npm:[email protected]/hub";
import { StringOutputParser } from "npm:[email protected]/schema/output_parser";
import { RunnableLambda, RunnableSequence } from "npm:[email protected]/schema/runnable";
import { Document } from "npm:[email protected]/document";

// Initialize the embedding model and vector store
const vectorStore = await MemoryVectorStore.fromTexts(
  allDocuments.map(({ text }) => text),
  allDocuments.map(({ id }) => ({ id })),
  new OpenAIEmbeddings()
);
const retriever = vectorStore.asRetriever();

// Define the chat model
const model = new ChatOpenAI({
  temperature: 0,
});

// Pull a prompt from the hub
const prompt = await pull("langchain-ai/rag-fusion-query-generation");

// Define the chain for generating queries
const generateQueries = RunnableSequence.from([
  prompt,
  model,
  new StringOutputParser(),
  (output) => output.split("\n"),
]);

// Define the Reciprocal Rank Fusion function
const reciprocalRankFusion = (results: Document[][], k = 60) => {
  const fusedScores: Record<string, number> = {};
  for (const result of results) {
    result.forEach((item, index) => {
      const docString = item.pageContent;
      if (!(docString in fusedScores)) {
        fusedScores[docString] = 0;
      }
      fusedScores[docString] += 1 / (index + k);
    });
  }

  const rerankedResults = Object.entries(fusedScores)
    .sort((a, b) => b[1] - a[1])
    .map(
      ([doc, score]) => new Document({ pageContent: doc, metadata: { score } })
    );
  return rerankedResults;
};

// Define the full chain
const chain = RunnableSequence.from([
  generateQueries,
  retriever.map(),
  reciprocalRankFusion,
]);

const originalQuery = "impact of climate change";

const result = await chain.invoke({
  original_query: originalQuery,
});

console.log(result);

This code demonstrates how to set up the MultiVectorRetriever with smaller chunks, implement the Rewrite-Retrieve-Read method, and use the RAG Fusion method to optimize the embedding process and handle multiple retrievers with a single request to the embedding model ^[2]^[3]^[4].

To continue talking to Dosu, mention @dosu.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimal way to do implement multiple RAG contexts #6707

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Optimal way to do implement multiple RAG contexts #6707

Uh oh!

kholiavko-roman Sep 6, 2024

Checked other resources

Commit to Help

Example Code

Description

System Info

Replies: 1 comment

Uh oh!

dosubot[bot] bot Sep 6, 2024

kholiavko-roman
Sep 6, 2024

dosubot[bot]
bot Sep 6, 2024