Can Langchain filter documents by metadata using AWS Bedrock? #5918
-
Hello @nick-elpedes! I'm here to help you with any issues or questions you have. I can assist you in solving bugs, answering questions, and becoming a contributor. Let's tackle this challenge together!
To use LangChain to retrieve documents from AWS Bedrock with a metadata filter applied, you can modify the AmazonKnowledgeBaseRetriever class so that it forwards a filter to the Bedrock Retrieve call:
import {
RetrieveCommand,
BedrockAgentRuntimeClient,
BedrockAgentRuntimeClientConfig,
} from "@aws-sdk/client-bedrock-agent-runtime";
import { BaseRetriever } from "@langchain/core/retrievers";
import { Document } from "@langchain/core/documents";
/**
* Interface for the arguments required to initialize an
* AmazonKnowledgeBaseRetriever instance.
*/
export interface AmazonKnowledgeBaseRetrieverArgs {
knowledgeBaseId: string;
topK: number;
region: string;
clientOptions?: BedrockAgentRuntimeClientConfig;
}
/**
* Class for interacting with Amazon Bedrock Knowledge Bases, a RAG workflow oriented service
* provided by AWS. Extends the BaseRetriever class.
* @example
* ```typescript
* const retriever = new AmazonKnowledgeBaseRetriever({
* topK: 10,
* knowledgeBaseId: "YOUR_KNOWLEDGE_BASE_ID",
* region: "us-east-2",
* clientOptions: {
* credentials: {
* accessKeyId: "YOUR_ACCESS_KEY_ID",
* secretAccessKey: "YOUR_SECRET_ACCESS_KEY",
* },
* },
* });
*
* const docs = await retriever.getRelevantDocuments("How are clouds formed?");
* ```
*/
export class AmazonKnowledgeBaseRetriever extends BaseRetriever {
static lc_name() {
return "AmazonKnowledgeBaseRetriever";
}
lc_namespace = ["langchain", "retrievers", "amazon_bedrock_knowledge_base"];
knowledgeBaseId: string;
topK: number;
bedrockAgentRuntimeClient: BedrockAgentRuntimeClient;
constructor({
knowledgeBaseId,
topK = 10,
clientOptions,
region,
}: AmazonKnowledgeBaseRetrieverArgs) {
super();
this.topK = topK;
this.bedrockAgentRuntimeClient = new BedrockAgentRuntimeClient({
region,
...clientOptions,
});
this.knowledgeBaseId = knowledgeBaseId;
}
/**
* Cleans the result text by replacing sequences of whitespace with a
* single space and removing ellipses.
* @param resText The result text to clean.
* @returns The cleaned result text.
*/
cleanResult(resText: string) {
const res = resText.replace(/\s+/g, " ").replace(/\.\.\./g, "");
return res;
}
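async queryKnowledgeBase(query: string, topK: number, metadataFilter?: Record<string, string>) {
// Bedrock takes the filter under retrievalConfiguration.vectorSearchConfiguration.filter,
// not on retrievalQuery. Each key/value pair becomes an exact-match `equals` condition.
const conditions = Object.entries(metadataFilter ?? {}).map(([key, value]) => ({
equals: { key, value },
}));
// `andAll` requires at least two conditions, so a single condition is sent on its own.
// With no conditions, `filter` is undefined and no filtering is applied.
const filter = conditions.length > 1 ? { andAll: conditions } : conditions[0];
const retrieveCommand = new RetrieveCommand({
knowledgeBaseId: this.knowledgeBaseId,
retrievalQuery: {
text: query,
},
retrievalConfiguration: {
vectorSearchConfiguration: {
numberOfResults: topK,
filter,
},
},
});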
const retrieveResponse = await this.bedrockAgentRuntimeClient.send(
retrieveCommand
);
return (
retrieveResponse.retrievalResults?.map((result) => ({
pageContent: this.cleanResult(result.content?.text || ""),
metadata: {
source: result.location?.s3Location?.uri,
score: result.score,
...result.metadata,
},
})) ?? ([] as Array<Document>)
);
}
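// Caveat: BaseRetriever.invoke() passes a callback run manager as the second argument,
// so the filter can only be supplied by calling this method directly.
async _getRelevantDocuments(query: string, metadataFilter?: Record<string, string>): Promise<Document[]> {
const docs = await this.queryKnowledgeBase(query, this.topK, metadataFilter);
return docs;
}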
}
You can then use the modified retriever and pass a metadata filter when retrieving documents:
const retriever = new AmazonKnowledgeBaseRetriever({
topK: 10,
knowledgeBaseId: "YOUR_KNOWLEDGE_BASE_ID",
region: "us-east-2",
clientOptions: {
credentials: {
accessKeyId: "YOUR_ACCESS_KEY_ID",
secretAccessKey: "YOUR_SECRET_ACCESS_KEY",
},
},
});
const metadataFilter = { author: "John Doe" };
const docs = await retriever._getRelevantDocuments("How are clouds formed?", metadataFilter);
This will retrieve documents from AWS Bedrock that match the specified metadata filter [1].
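For filters that go beyond a single exact match, the Bedrock Retrieve API takes a nested operator object (for example equals, greaterThan, in, andAll, orAll) under retrievalConfiguration.vectorSearchConfiguration.filter. The sketch below shows that shape using local stand-in types so it runs without the AWS SDK. The type names and the helpers eq, gt, oneOf, and allOf are hypothetical and are not part of LangChain or the SDK, where the corresponding type is RetrievalFilter; the metadata keys author and year are illustrative only.

```typescript
// Local stand-ins that mirror the shape of the SDK's RetrievalFilter (assumption:
// only a subset of operators is modelled here).
type FilterValue = string | number | boolean | string[];

interface FilterAttribute {
  key: string;
  value: FilterValue;
}

type Filter =
  | { equals: FilterAttribute }
  | { greaterThan: FilterAttribute }
  | { in: FilterAttribute }
  | { andAll: Filter[] }
  | { orAll: Filter[] };

// Hypothetical helpers for building individual conditions.
const eq = (key: string, value: FilterValue): Filter => ({ equals: { key, value } });
const gt = (key: string, value: number): Filter => ({ greaterThan: { key, value } });
const oneOf = (key: string, values: string[]): Filter => ({ in: { key, value: values } });

// andAll needs at least two operands, so a single condition is returned unwrapped
// and an empty list yields undefined (no filter).
function allOf(...filters: Filter[]): Filter | undefined {
  if (filters.length === 0) return undefined;
  return filters.length === 1 ? filters[0] : { andAll: filters };
}

const filter = allOf(eq("author", "John Doe"), gt("year", 2020));
console.log(JSON.stringify(filter, null, 2));
```

An object built this way could be supplied as the filter value in the RetrieveCommand input, assuming the referenced keys exist in the metadata attached to the knowledge base documents.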
-
Updated #5600 with a potential solution.
-
Description
How are you supposed to use LangChain to retrieve documents from AWS Bedrock with a metadata filter applied? I asked about this on Discord and in #5600, but haven't received a response in a month. There is also no way for me to file it as an issue to get more visibility, so I am asking here in case I missed the functionality somewhere.
System Info
N/A