Open
Labels
bug (Something isn't working)
Description
Name and Version
0.31.1
What steps will reproduce the bug?
- Upload the .docx document to the RAG, and ask a question about information that can only be found in an image. In my case, the document has 306 pages and the font is Calibri.
- Check the "Processing document" step. After the document is converted to PDF, the page count differs (306 pages in the original vs. 287 reported):
  DescriptionRetriever: Number of pages: 287
  Consequently, all references to images are incorrect.
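To confirm the mismatch without opening Word, a minimal sketch like the one below compares the page count Word stored in the .docx at save time (the `<Pages>` element of `docProps/app.xml`) with the count the retriever logs. The function names are hypothetical helpers, not part of the project, and the stored value is only as current as the last save in Word.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of docProps/app.xml (OOXML extended properties).
EP_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"


def docx_saved_page_count(docx_path):
    """Return the page count Word stored when the file was last saved, or None."""
    with zipfile.ZipFile(docx_path) as zf:
        try:
            data = zf.read("docProps/app.xml")
        except KeyError:
            return None  # the producing application wrote no extended properties
    pages = ET.fromstring(data).find(f"{EP_NS}Pages")
    if pages is None or not (pages.text or "").strip():
        return None
    return int(pages.text)


def page_count_mismatch(docx_path, pdf_page_count):
    """Return (word_pages, pdf_pages) if they differ, else None (hypothetical helper)."""
    word_pages = docx_saved_page_count(docx_path)
    if word_pages is None or word_pages == pdf_page_count:
        return None
    return word_pages, pdf_page_count
```

For the document in this report, calling `page_count_mismatch("report.docx", 287)` (hypothetical filename) would return `(306, 287)` if Word stored 306 pages.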
What is the expected behavior?
The page count of the converted PDF matches the original document, and each image reference points to the correct page.
What do you see instead?
All references to images are incorrect. In this case, the referenced page 238 contains no image.
Additional information
I did some testing with another font that is installed on the system (e.g. Arial), and the page calculation seems to be more accurate.
I also tried the "Embed fonts in the file" option, but it does not seem to have any effect.
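The Arial observation suggests, as an unconfirmed assumption, that the conversion step lays text out with a substitute font when a font is not installed on the host. The sketch below lists the fonts a .docx declares in `word/fontTable.xml` and reports those missing from a caller-supplied set of installed fonts (for example, family names gathered with `fc-list : family` on a Linux host). The helper names are hypothetical, and `fontTable.xml` lists declared fonts, which may include some the text never uses.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used by word/fontTable.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def declared_fonts(docx_path):
    """Return the set of font names declared in word/fontTable.xml."""
    with zipfile.ZipFile(docx_path) as zf:
        try:
            data = zf.read("word/fontTable.xml")
        except KeyError:
            return set()  # no font table in this package
    root = ET.fromstring(data)
    return {
        font.get(f"{W_NS}name")
        for font in root.iter(f"{W_NS}font")
        if font.get(f"{W_NS}name")
    }


def missing_fonts(docx_path, installed_fonts):
    """Declared fonts absent from installed_fonts, compared case-insensitively."""
    installed = {name.casefold() for name in installed_fonts}
    return sorted(n for n in declared_fonts(docx_path) if n.casefold() not in installed)
```

If Calibri shows up as missing on the processing host, that would be consistent with the page drift described above, though it does not prove the cause.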
Could you please recommend "best-practice" settings/fonts for the current implementation, and let us know whether this can be fixed?