How to get the context used in index.query() using langchain #11101

aina00001 · 2023-09-27T08:07:22Z

I am using LangChain to extract information from a PDF document.
The aim is to determine which page of the PDF the context was extracted from.

But my first step is to get this context.

This is my code:

response = self.index.query(question=req, llm=ChatOpenAI())

The index is from this function:

def set_file(self, file_path):
    loader = None
    loader = PyPDFLoader(file_path)
    data = loader.load()
    self.index = VectorstoreIndexCreator(
        vectorstore_cls=DocArrayInMemorySearch
    ).from_documents(data)

@dosu-bot · 2023-09-27T08:14:38Z

🤖

Hello,

From your code, it seems you're calling query() on the index wrapper (VectorStoreIndexWrapper) returned by VectorstoreIndexCreator.from_documents() to answer a question from a set of documents loaded from a PDF file.

The query() function answers a question in two steps. It first embeds the question with an embedding model and retrieves the most similar document chunks from the vector store. It then passes those chunks to the language model (here ChatOpenAI) as context and asks it to generate the answer.

However, query() returns only the answer as a string. It does not return the context the answer was generated from. If you want to know which page of the PDF the context came from, you need access to the retrieved chunks and to the page number stored with each of them.
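
To make the retrieval step concrete, here is a minimal, library-free sketch of the idea. It is not LangChain's implementation: the Chunk class, the bag-of-words embed() function, and the sample page texts are all hypothetical stand-ins (a real index would use a learned embedding model). It only illustrates that each stored chunk can carry its page number as metadata, so whatever is retrieved as context can be traced back to a page.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a stored document chunk."""
    text: str
    metadata: dict = field(default_factory=dict)


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts (a real index uses a learned model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    # Rank chunks by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]


# Made-up pages, one chunk per page, with a zero-based page index as metadata.
pages = [
    "Introduction and overview of the annual report",
    "Revenue grew by ten percent compared to last year",
    "The board of directors approved a new dividend policy",
]
chunks = [Chunk(text=t, metadata={"page": i}) for i, t in enumerate(pages)]

context = retrieve("How much did revenue grow?", chunks, k=1)
for chunk in context:
    print(chunk.metadata["page"], chunk.text)
```

The retrieved chunks are the "context": the language model only ever sees their text, while the metadata stays available to your own code.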

PyPDFLoader already keeps track of this for you: it returns one document per page and stores the zero-based page index in each document's metadata under the "page" key. Your set_file() function can therefore pass the loaded documents through unchanged:

def set_file(self, file_path):
    # PyPDFLoader yields one Document per page; metadata["page"] is the 0-based page index
    loader = PyPDFLoader(file_path)
    data = loader.load()
    self.index = VectorstoreIndexCreator(
        vectorstore_cls=DocArrayInMemorySearch
    ).from_documents(data)

Then, when you query the index, you can run the same similarity search against the underlying vector store and read the page number from the metadata of the returned chunks:

response = self.index.query(question=req, llm=ChatOpenAI())
# query() returns only the answer string, so fetch the matching chunks separately
for doc in self.index.vectorstore.similarity_search(req):
    # metadata["page"] is set by PyPDFLoader and is zero-based
    print(f"Page number: {doc.metadata['page']}")
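
If you want the pages as a compact list rather than one print per chunk, a small helper along these lines could collect them. This is a sketch under assumptions: cited_pages is a hypothetical name, and it assumes each retrieved document exposes a metadata dict with a zero-based integer "page" entry, as PyPDFLoader documents do. The SimpleNamespace objects only stand in for retrieved documents so that the snippet runs on its own.

```python
from types import SimpleNamespace


def cited_pages(docs):
    """Return the sorted, de-duplicated, 1-based page numbers found in docs.

    Assumes each item has a .metadata dict; items without an integer
    "page" entry are skipped.
    """
    pages = set()
    for doc in docs:
        page = getattr(doc, "metadata", {}).get("page")
        if isinstance(page, int):
            pages.add(page + 1)  # convert the 0-based index to a human page number
    return sorted(pages)


# Stand-ins for retrieved chunks; in practice these would come from the vector store.
retrieved = [
    SimpleNamespace(page_content="...", metadata={"source": "report.pdf", "page": 4}),
    SimpleNamespace(page_content="...", metadata={"source": "report.pdf", "page": 0}),
    SimpleNamespace(page_content="...", metadata={"source": "report.pdf", "page": 4}),
    SimpleNamespace(page_content="...", metadata={}),
]
print(cited_pages(retrieved))  # [1, 5]
```

With the index from the question, something like cited_pages(self.index.vectorstore.similarity_search(req)) should work, assuming the vector store's similarity_search method returns the chunks with their metadata intact.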

Please note that this is just a suggestion and the actual implementation may vary based on your specific requirements and the structure of your code.

I hope this helps! Let me know if you have any other questions.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
