Ask questions based not only on page_content but also on metadata. #9868
Replies: 1 comment 2 replies
-
🤖 Hello,

Based on your description, it seems like you're trying to configure the LLM Agent to utilize both page_content and metadata from a Document, and perform searches based on metadata fields in a question. To achieve this, you can use the `load_query_constructor_chain` function. Here's how you can use it:

```python
from langchain.chains.query_constructor.base import load_query_constructor_chain
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.chat_models import ChatOpenAI

# Assuming you have an instance of a BaseLanguageModel subclass; ChatOpenAI is one example
llm = ChatOpenAI()

# Document contents and metadata
document_contents = "<your document contents>"
attribute_info = [
    AttributeInfo(name="<metadata field 1>", type="<type>", description="<description>"),
    AttributeInfo(name="<metadata field 2>", type="<type>", description="<description>"),
    # Add more AttributeInfo objects as needed
]
# Load the query constructor chain
chain = load_query_constructor_chain(llm, document_contents, attribute_info)
```

In the code above, replace `<your document contents>`, `<metadata field 1>`, `<type>`, and `<description>` with your actual document contents and metadata field information. Regarding the `SelfQueryRetriever`, it builds this same chain internally: its `from_llm` constructor calls `load_query_constructor_chain` with the document contents and metadata field info you pass in. Here is the relevant code:

```python
llm_chain = load_query_constructor_chain(
    llm,
    document_contents,
    metadata_field_info,
    enable_limit=enable_limit,
    **chain_kwargs,
)
```

I hope this helps! If you have any other questions, feel free to ask.
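To make the snippet above concrete, here is a minimal sketch of what the constructed chain does with a metadata-laden question. The field names, the ChatOpenAI model, and the `predict_and_parse` call are illustrative assumptions about a typical LangChain setup, not part of the original reply, and the exact API may differ between versions.

```python
from langchain.chains.query_constructor.base import load_query_constructor_chain
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)  # gpt-3.5-turbo is the default model

# Hypothetical description and metadata fields for a collection of texts
document_contents = "Short texts collected into a DataFrame"
attribute_info = [
    AttributeInfo(name="title", type="string", description="Title of the text"),
    AttributeInfo(name="year", type="integer", description="Year the text was written"),
]

chain = load_query_constructor_chain(llm, document_contents, attribute_info)

# The chain turns a natural-language question into a StructuredQuery:
# a semantic search string plus a machine-readable metadata filter.
structured_query = chain.predict_and_parse(query="Texts written after 2000 about duels")
print(structured_query.query)   # the semantic part, e.g. "duels"
print(structured_query.filter)  # the filter part, e.g. a gt("year", 2000) comparison
```

The `StructuredQuery` produced here is what `SelfQueryRetriever` hands to its query translator to build a vector-store filter, which is why the same chain appears in the retriever's internals above.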
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
Hello everyone! I am pretty new to LangChain. I'm trying to build an LLM Agent that will answer questions based on a DataFrame. I am using gpt-3.5-turbo, so my context window is 4096 tokens, while my DataFrame contains about 195K tokens, so I can't feed it into the LLM in full. I have divided it into chunks, stored them in a vector store in embedded form, and tried several approaches to get an answer to a DataFrame-based question. I have used `AttributeInfo()` to specify which metadata attributes the LLM should use when it receives a question, and I have added the resulting `metadata_field_info` into a `SelfQueryRetriever`; a representative sketch of this kind of setup is shown below.
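A representative sketch of the kind of setup described above (not the original snippet): the DataFrame columns, chunk sizes, Chroma vector store, OpenAI embeddings, and metadata field names are all assumptions for illustration.

```python
import pandas as pd

from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Tiny hypothetical DataFrame standing in for the real 195K-token one
df = pd.DataFrame({
    "title": ["The shotgun duel", "Autumn fields"],
    "text": ["A long text about a duel...", "A long text about a harvest..."],
})

# Keep the columns we want to filter on as metadata of each Document
docs = [
    Document(page_content=row["text"], metadata={"title": row["title"]})
    for _, row in df.iterrows()
]

# Split long rows into chunks that fit the context window; metadata is copied onto every chunk
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed the chunks and store them in a vector store
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Describe the metadata fields so the LLM knows how to filter on them
metadata_field_info = [
    AttributeInfo(name="title", type="string", description="Title of the text"),
]

retriever = SelfQueryRetriever.from_llm(
    llm=ChatOpenAI(temperature=0),
    vectorstore=vectorstore,
    document_contents="Texts from the DataFrame",
    metadata_field_info=metadata_field_info,
    verbose=True,
)
```

With a setup like this, `retriever.get_relevant_documents(...)` should combine similarity search over page_content with a filter over the declared metadata fields.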
When I used the retriever, it actually found some Documents similar to the question, but when I tried more complex questions, for example "Provide texts whose title contains the word 'shotgun'" (and the dataset does contain several texts whose titles contain that word), I received an unexpected result.
That was the first problem that I encountered.
Then I tried to build an LLM Agent. The agent worked, but only on page_content data; it didn't use any metadata.
My question is: how can I build an LLM Agent that uses a Document's page_content together with its metadata, searches over the metadata fields mentioned in a question (as the SelfQueryRetriever with AttributeInfo did), and answers based on both kinds of data: page_content and metadata?
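One possible direction, sketched under assumptions rather than taken from this thread: wrap a metadata-aware SelfQueryRetriever in a RetrievalQA chain and hand that chain to the agent as a tool, so the agent's lookups go through the self-querying (metadata-filtering) path instead of a plain `vectorstore.as_retriever()` similarity search. All data, field names, and tool names below are illustrative.

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chains import RetrievalQA
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.schema import Document
from langchain.vectorstores import Chroma

llm = ChatOpenAI(temperature=0)  # gpt-3.5-turbo is the default model

# Tiny illustrative corpus; in practice these would be the DataFrame chunks
docs = [
    Document(page_content="A story about a duel at dawn.", metadata={"title": "The shotgun duel"}),
    Document(page_content="A story about a quiet harvest.", metadata={"title": "Autumn fields"}),
]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

# Metadata-aware retriever: the LLM turns the metadata part of a question into a filter
retriever = SelfQueryRetriever.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    document_contents="Short texts with titles",
    metadata_field_info=[AttributeInfo(name="title", type="string", description="Title of the text")],
)

# Expose the retriever to the agent as a tool via a RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
tools = [
    Tool(
        name="text_search",
        func=qa_chain.run,
        description="Answers questions about the texts; can filter by the 'title' metadata field.",
    )
]

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("What happens in the text titled 'The shotgun duel'?")
```

Note that which filter comparators are available (for example, equality versus a 'contains'-style substring match) depends on the vector store's query translator, so some phrasings may still fail even with a setup like this.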