Page number in Reference does not align with image location #44

@yuliiamashkovtseva

Name and Version

0.31.1

What steps will reproduce the bug?

  1. Upload the .docx document to the RAG and ask a question whose answer can be found only in an image. In my case, the document has 306 pages and uses the Calibri font.
  2. Check the "Processing document" step. After conversion to PDF, the number of pages differs:
    DescriptionRetriever: Number of pages: 287
  3. Consequently, all references to images are incorrect.
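
For anyone trying to reproduce step 2, the page count that Word cached in the .docx can be compared with the count the converter logs. This is a minimal sketch, assuming the file contains the optional `docProps/app.xml` part with a `Pages` element (Word writes it on save, but it is a cached value and other tools may omit it). The function name `docx_recorded_pages` is hypothetical and not part of the tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the extended-properties part (docProps/app.xml) in OOXML files.
EP_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/extended-properties}"

def docx_recorded_pages(path):
    """Return the page count cached in docProps/app.xml, or None if absent.

    `path` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(path) as zf:
        if "docProps/app.xml" not in zf.namelist():
            return None
        root = ET.fromstring(zf.read("docProps/app.xml"))
    pages = root.find(EP_NS + "Pages")
    # The element is optional and may be empty or non-numeric.
    if pages is None or not (pages.text or "").strip().isdigit():
        return None
    return int(pages.text)
```

For the document in this report, the cached value would be expected to be 306, against the 287 pages logged by DescriptionRetriever.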

What is the expected behavior?

The converted PDF has the same number of pages as the original document, and each image Reference points to the correct page.

What do you see instead?

All references to images are incorrect. In this case, there is no image on page 238.

Image

Additional information

I did some testing with another font (Arial, which is a system font), and the page calculation seems to be more accurate.
I also tried "Embed fonts in the file", and it does not seem to have any effect.
Could you please share the "best-practice" settings/fonts to use with the current implementation, and let us know whether this can be fixed?
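
If the drift comes from font substitution on the conversion host (an assumption, not confirmed here), it may help to list the fonts the document declares and check whether they are installed where the conversion runs. This is a minimal sketch that reads `word/fontTable.xml` from the .docx package. The function name `docx_declared_fonts` is hypothetical and not part of the tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/fontTable.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_declared_fonts(path):
    """Return the sorted font names declared in word/fontTable.xml.

    `path` may be a filesystem path or a binary file-like object.
    Returns an empty list if the package has no font table.
    """
    with zipfile.ZipFile(path) as zf:
        if "word/fontTable.xml" not in zf.namelist():
            return []
        root = ET.fromstring(zf.read("word/fontTable.xml"))
    # Each <w:font w:name="..."> entry names a font the document refers to.
    names = {font.get(W_NS + "name") for font in root.iter(W_NS + "font")}
    return sorted(name for name in names if name)
```

A declared font that is missing on the conversion host is a candidate for substitution, which can change line breaks and therefore the page count.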

Metadata

Labels: bug
