@jiafu1115
Contributor

@jiafu1115 jiafu1115 commented Oct 14, 2025

Hi Team:
The PDF parsing logic has an issue: it doesn't always respect the pagesPerDocument setting. It fails the newly added unit test:
https://github.com/spring-projects/spring-ai/pull/4627/files#diff-14539564bf2af8df87bbbc6cf120abd9e706d120ec581d95af53e502e1a9ed64R76

Error: org.springframework.ai.reader.pdf.PagePdfDocumentReaderTests.testPagesPerDocument -- Time elapsed: 1.222 s <<< FAILURE!
java.lang.AssertionError:
Expected size: 2 but was: 3

So I refactored the code to clean it up and fix this issue at the same time.
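
For illustration only, here is a minimal sketch of grouping logic that respects a pages-per-document limit. It is not the code from this PR or from Spring AI: the class `PageGrouping` and the method `groupPages` are hypothetical names, and the sketch assumes zero-based page indices and a strictly positive `pagesPerDocument`.

```java
import java.util.ArrayList;
import java.util.List;

public class PageGrouping {

    /**
     * Hypothetical helper (not the Spring AI implementation): splits the page
     * indices 0..totalPages-1 into consecutive groups of at most
     * pagesPerDocument pages. Only the last group may be smaller.
     */
    public static List<List<Integer>> groupPages(int totalPages, int pagesPerDocument) {
        if (totalPages < 0 || pagesPerDocument <= 0) {
            throw new IllegalArgumentException(
                    "totalPages must be >= 0 and pagesPerDocument must be > 0");
        }
        List<List<Integer>> groups = new ArrayList<>();
        // Advance in steps of pagesPerDocument so each group starts on a boundary.
        for (int start = 0; start < totalPages; start += pagesPerDocument) {
            int end = Math.min(start + pagesPerDocument, totalPages);
            List<Integer> group = new ArrayList<>();
            for (int page = start; page < end; page++) {
                group.add(page);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupPages(4, 2)); // [[0, 1], [2, 3]]
        System.out.println(groupPages(5, 2)); // [[0, 1], [2, 3], [4]]
    }
}
```

With this approach the number of resulting groups is always ceil(totalPages / pagesPerDocument), which is the kind of invariant a size assertion like the one in the failing test can check.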

Thanks for the review!

@jiafu1115 changed the title from "Fix PDF grouping logic to respect pagesPerDocument" to "Fix PDF document reader's grouping logic to respect pagesPerDocument" Oct 14, 2025
@ericbottard ericbottard self-requested a review October 14, 2025 13:46
@ericbottard ericbottard self-assigned this Oct 14, 2025
@ericbottard
Member

Merged on main as aeb9f8a and marked for backport to 1.0.x

@jiafu1115
Contributor Author

thanks @ericbottard
