Description
Please provide us with the following information:
This issue is for a: (mark with an x)
- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
1. Run the app with PDF data that will be split into pages.
2. After the pages are indexed in Azure Search, find a section that includes text from the end of page x and the beginning of page x+1.
3. Ask the chatbot about information from the page x+1 part of that section.
4. The chat responds correctly, but the citation points to page x of the PDF, even though the information is on page x+1.
Expected/desired behavior
When prepdocs.py splits the PDF data into sections for indexing in Azure Search, it should respect page boundaries: text from page x and text from page x+1 should end up in separate sections.
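A minimal sketch of the expected behavior, assuming the page texts are already available as a list of strings in document order. The function name `split_by_page` and its `max_chars` parameter are hypothetical and do not reproduce the splitter in prepdocs.py; the point is only that each page is chunked on its own, so no section can straddle two pages.

```python
def split_by_page(pages, max_chars=1000):
    """Split each page independently so that no section spans two pages.

    `pages` is a list of page texts in document order (hypothetical input
    shape). Returns a list of (page_index, section_text) tuples.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    sections = []
    for page_index, text in enumerate(pages):
        start = 0
        while start < len(text):
            end = min(start + max_chars, len(text))
            # Prefer to break at the last space before the limit so words are not cut.
            if end < len(text):
                space = text.rfind(" ", start, end)
                if space > start:
                    end = space
            chunk = text[start:end].strip()
            if chunk:
                # The page index is fixed per page, so a citation built from it is exact.
                sections.append((page_index, chunk))
            start = end
    return sections


print(split_by_page(["alpha beta gamma", "delta epsilon"], max_chars=11))
```

The trade-off of cutting strictly at page boundaries is that a sentence continuing onto the next page is split across two sections, which may cost some retrieval context in exchange for accurate page citations.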
Mention any other details that might be useful
The find_page function in prepdocs.py returns the page on which a section begins, but it does not account for the possibility that the section ends on the next page.
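To illustrate the gap, here is a sketch that looks up the page for both the start and the end offset of a section. It assumes a hypothetical `page_map` of `(page_index, start_offset, page_text)` tuples in document order; this `find_page` is a reimplementation with `bisect` for illustration, not the repository's code, and `find_page_range` is a hypothetical helper.

```python
from bisect import bisect_right


def find_page(page_map, offset):
    """Return the index of the page containing character `offset`."""
    starts = [start for _, start, _ in page_map]
    # bisect_right finds the first page starting after `offset`;
    # the page containing `offset` is the one just before it.
    return max(bisect_right(starts, offset) - 1, 0)


def find_page_range(page_map, section_start, section_end):
    """Return (first_page, last_page) for the half-open span [start, end)."""
    first = find_page(page_map, section_start)
    # Use the last character of the section, not one past it.
    last = find_page(page_map, max(section_end - 1, section_start))
    return first, last


page_map = [(0, 0, "x" * 100), (1, 100, "y" * 100)]
print(find_page_range(page_map, 80, 130))
```

A section whose range covers two pages (as in the example, characters 80 to 129) could then be re-split at the page boundary or cited with both pages, instead of being attributed only to the page where it starts.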