Optimal way to implement multiple RAG contexts #6707
Hey there, @kholiavko-roman! I'm here to help you with any bugs, questions, or becoming a contributor. Let's tackle this together! 🤖

I found a similar unsolved discussion on combining LangChain retrievers here [1].

Regarding your question, there is an optimized way to implement multiple RAG contexts in LangChain and have only one request to the embedding model for all the retrievers. You can use the `MultiVectorRetriever`, which indexes smaller child chunks in a single vector store and maps matches back to their parent documents. Here is an example of how to use the `MultiVectorRetriever`:

```typescript
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { InMemoryStore } from "@langchain/core/stores";
import { MultiVectorRetriever } from "langchain/retrievers/multi_vector";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { Document } from "@langchain/core/documents";

// Initialize the embedding model
const embeddings = new OpenAIEmbeddings({
  apiKey: "your-openai-api-key",
});

// Parent documents (one per context); add more documents as needed
const idKey = "doc_id";
const parentDocuments = [
  new Document({ pageContent: "Your document text here...", metadata: { [idKey]: "doc1" } }),
];

// Define a text splitter to create smaller child chunks
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 200,
  chunkOverlap: 50,
});

// Split each parent document and tag every chunk with its parent's id
const childDocuments: Document[] = [];
for (const parent of parentDocuments) {
  const chunks = await splitter.splitDocuments([parent]);
  for (const chunk of chunks) {
    chunk.metadata[idKey] = parent.metadata[idKey];
    childDocuments.push(chunk);
  }
}

// Index all child chunks in a single vector store
const vectorstore = await MemoryVectorStore.fromDocuments(childDocuments, embeddings);

// Create the MultiVectorRetriever and register the parent documents in its docstore
const retriever = new MultiVectorRetriever({
  vectorstore,
  byteStore: new InMemoryStore<Uint8Array>(),
  idKey,
});
const keyValuePairs: [string, Document][] = parentDocuments.map((doc) => [
  String(doc.metadata[idKey]),
  doc,
]);
await retriever.docstore.mset(keyValuePairs);

// Use the retriever to get relevant (parent) documents
const results = await retriever.invoke("Your query here...");
console.log(results);
```

Additionally, you can enhance the retrieval process by using the Rewrite-Retrieve-Read method, which involves rewriting the query to optimize retrieval. This method is particularly useful for handling complex or "distracted" queries. Here is an example of how to implement this method:

```typescript
import { PromptTemplate } from "npm:langchain/prompts";
import { ChatOpenAI } from "npm:langchain/chat_models/openai";
import { StringOutputParser } from "npm:langchain/schema/output_parser";
import { RunnableSequence, RunnablePassthrough } from "npm:langchain/schema/runnable";
import { TavilySearchAPIRetriever } from "npm:langchain/retrievers/tavily_search_api";
import type { Document } from "npm:langchain/schema/document";
const rewriteTemplate = `Provide a better search query for \
web search engine to answer the given question, end \
the queries with '**'. Question: \
{x} Answer:`;
const rewritePrompt = PromptTemplate.fromTemplate(rewriteTemplate);
// Parser to remove the `**`
const _parse = (text: string) => text.replace("**", "");
const rewriter = RunnableSequence.from([
rewritePrompt,
new ChatOpenAI({ temperature: 0 }),
new StringOutputParser(),
_parse
]);
const template = `Answer the users question based only on the following context:
<context>
{context}
</context>
Question: {question}`
const prompt = PromptTemplate.fromTemplate(template);
const model = new ChatOpenAI({
temperature: 0,
openAIApiKey: Deno.env.get("OPENAI_API_KEY"),
})
const retriever = new TavilySearchAPIRetriever({
k: 3,
apiKey: Deno.env.get("TAVILY_API_KEY"),
});
const formatDocs = (documents: Document[]) => documents.map((doc) => doc.pageContent).join("\n");
const rewriteRetrieveReadChain = RunnableSequence.from([
{
context: RunnableSequence.from([
{ x: new RunnablePassthrough() },
rewriter,
retriever,
formatDocs,
]),
question: new RunnablePassthrough()
},
prompt,
model,
new StringOutputParser()
]);
const distractedQuery = "man that sam bankman fried trial was crazy! what is langchain?";
await rewriteRetrieveReadChain.invoke(distractedQuery);
```

Moreover, you can use the RAG Fusion method, which employs multiple query generation and Reciprocal Rank Fusion to re-rank search results. Here is an example of how to implement this method:

```typescript
import { OpenAIEmbeddings } from "npm:langchain/embeddings/openai";
import { MemoryVectorStore } from "npm:langchain/vectorstores/memory";
import { ChatOpenAI } from "npm:langchain/chat_models/openai";
import { pull } from "npm:langchain/hub";
import { StringOutputParser } from "npm:langchain/schema/output_parser";
import { RunnableLambda, RunnableSequence } from "npm:langchain/schema/runnable";
import { Document } from "npm:langchain/document";
// Sample documents to index (placeholder contexts)
const allDocuments = [
  { id: "doc1", text: "Climate change and economic impact." },
  { id: "doc2", text: "Public health concerns due to climate change." },
  { id: "doc3", text: "Climate change: A social perspective." },
];
// Initialize the embedding model and vector store
const vectorStore = await MemoryVectorStore.fromTexts(
  allDocuments.map(({ text }) => text),
  allDocuments.map(({ id }) => ({ id })),
  new OpenAIEmbeddings()
);
const retriever = vectorStore.asRetriever();
// Define the chat model
const model = new ChatOpenAI({
temperature: 0,
});
// Pull a prompt from the hub
const prompt = await pull("langchain-ai/rag-fusion-query-generation");
// Define the chain for generating queries
const generateQueries = RunnableSequence.from([
prompt,
model,
new StringOutputParser(),
  (output: string) => output.split("\n"),
]);
// Define the Reciprocal Rank Fusion function
const reciprocalRankFusion = (results: Document[][], k = 60) => {
const fusedScores: Record<string, number> = {};
for (const result of results) {
result.forEach((item, index) => {
const docString = item.pageContent;
if (!(docString in fusedScores)) {
fusedScores[docString] = 0;
}
fusedScores[docString] += 1 / (index + k);
});
}
const rerankedResults = Object.entries(fusedScores)
.sort((a, b) => b[1] - a[1])
.map(
([doc, score]) => new Document({ pageContent: doc, metadata: { score } })
);
return rerankedResults;
};
// Define the full chain
const chain = RunnableSequence.from([
generateQueries,
retriever.map(),
reciprocalRankFusion,
]);
const originalQuery = "impact of climate change";
const result = await chain.invoke({
original_query: originalQuery,
});
console.log(result);
```

This code demonstrates how to set up the RAG Fusion chain: several search queries are generated from the original query, documents are retrieved for each one, and the combined results are re-ranked with Reciprocal Rank Fusion.
Checked other resources
Commit to Help
Example Code
Description
I added a code example with 2 contexts based on the doc https://js.langchain.com/v0.1/docs/expression_language/cookbook/retrieval/
Every time I ask a question, two similarity searches run, one per context. I'm concerned that we're duplicating the same action by converting the input to a vector twice: each retriever sends a separate request to the embedding model to get the query vector, and since the input is the same, the extra requests are redundant.
Is there an optimized way to implement multiple RAG contexts and have only 1 request to the embedding model for all the retrievers?
Currently, if I need 4 contexts, I will have 4 identical requests to the embedding model, which doesn't look good.
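For illustration, here is a minimal sketch of the behavior I'm after: the question is embedded once with `embedQuery`, and that vector is reused for each store's `similaritySearchVectorWithScore`. The two `MemoryVectorStore` contexts, document texts, and `k` values below are placeholders, not my actual setup:

```typescript
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const embeddings = new OpenAIEmbeddings();

// Two separate contexts, each backed by its own vector store
const storeA = await MemoryVectorStore.fromTexts(
  ["First context document..."],
  [{ source: "contextA" }],
  embeddings
);
const storeB = await MemoryVectorStore.fromTexts(
  ["Second context document..."],
  [{ source: "contextB" }],
  embeddings
);

const question = "Your query here...";

// Embed the question once...
const queryVector = await embeddings.embedQuery(question);

// ...and reuse the same vector for every context's similarity search
const [resultsA, resultsB] = await Promise.all([
  storeA.similaritySearchVectorWithScore(queryVector, 3),
  storeB.similaritySearchVectorWithScore(queryVector, 3),
]);
```

This way there is a single embedding request per question, no matter how many contexts are searched.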
System Info
N/A