Replies: 1 comment
-
🤖 Hello,

From your code, it seems like you're building the index from a PDF loaded with PyPDFLoader and then querying it. One way to find the page would be to include the page number in the document data when you load it from the PDF file. For example, you could modify your set_file method like this:

    def set_file(self, file_path):
        loader = PyPDFLoader(file_path)
        data = []
        for i, page in enumerate(loader.load()):
            # from_documents expects Document objects, so keep each page as it is
            # and record its position in the metadata instead of wrapping it in a dict.
            page.metadata['page_number'] = i
            data.append(page)
        self.index = VectorstoreIndexCreator(
            vectorstore_cls=DocArrayInMemorySearch
        ).from_documents(data)

Then, when you query the index, you can retrieve the page number from the matching documents:

    response = self.index.query(question=req, llm=ChatOpenAI())
    # query returns only the answer text, so look up the matching chunks separately.
    for doc in self.index.vectorstore.similarity_search(req):
        print(f"Page number: {doc.metadata['page_number']}")

Please note that this is just a suggestion and the actual implementation may vary based on your specific requirements and the structure of your code.

I hope this helps! Let me know if you have any other questions.
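The idea of carrying a page number alongside each page's text can be tried out without LangChain or an API key. The following is only a minimal sketch under stated assumptions: `Document`, `load_pages`, and `retrieve` are hypothetical stand-ins written for illustration, not LangChain APIs, and the word-overlap ranking is a toy replacement for the embedding similarity a real vector store would use.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a document object: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_pages(page_texts):
    """Wrap each page's text in a Document tagged with its 0-based page number."""
    return [
        Document(page_content=text, metadata={"page_number": i})
        for i, text in enumerate(page_texts)
    ]


def retrieve(docs, question, k=2):
    """Toy retriever: rank documents by how many question words they contain.

    A real vector store would rank by embedding similarity instead.
    """
    words = set(question.lower().split())
    scored = [
        (len(words & set(doc.page_content.lower().split())), doc)
        for doc in docs
    ]
    # Keep only documents with at least one matching word, best match first.
    hits = [(score, doc) for score, doc in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits[:k]]


# Made-up page texts standing in for the pages of a PDF.
pages = [
    "Introduction to the annual report",
    "Revenue grew in the third quarter",
    "Outlook for the next fiscal year",
]
docs = load_pages(pages)

# The retrieved documents are the context; each one still knows its page.
context = retrieve(docs, "How did revenue change in the third quarter?")
for doc in context:
    print(f"Page number: {doc.metadata['page_number']}")
```

The point of the design is that the page number travels in the metadata of each retrieved document, so it is read from the retrieved context and not from the generated answer string.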
-
I am using langchain to extract information from a PDF document. The aim is to determine which page of the PDF the context was extracted from. But my first step is to get this context. This is my code:

    response = self.index.query(question=req, llm=ChatOpenAI())
The index is from this function:

    def set_file(self, file_path):
        loader = None
        loader = PyPDFLoader(file_path)
        data = loader.load()
        self.index = VectorstoreIndexCreator(
            vectorstore_cls=DocArrayInMemorySearch
        ).from_documents(data)