feat: preserve text and image order in content parts
Refactor content part generation to maintain page order for better AI consumption. Images now appear in proper sequence with their associated text.
Content Part Structure:
1. First part: JSON summary with results (backward compatible)
   - Includes image_info metadata (page, index, width, height, format)
   - Excludes base64 data from JSON to keep it manageable
2. Subsequent parts: Images in page order
   - For page_texts mode: Images grouped by page
   - For full_text mode: All images sorted by page number
   - Each image has proper mimeType for AI vision models
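The structure above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `ExtractedImage`, `ContentPart`, and `buildContentParts` are hypothetical names, the `image/<format>` mimeType mapping is an assumption, and the part shapes are modeled on MCP-style text and image content.

```typescript
// Hypothetical shape for an extracted image; the project's real type may differ.
interface ExtractedImage {
  page: number;
  index: number;
  width: number;
  height: number;
  format: string; // assumed to be a subtype such as "png" or "jpeg"
  data: string; // base64-encoded image bytes
}

// MCP-style content parts (assumed shapes for illustration).
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

function buildContentParts(
  results: Record<string, unknown>,
  images: ExtractedImage[],
): ContentPart[] {
  // Sort a copy by page number, then by position within the page.
  const ordered = [...images].sort(
    (a, b) => a.page - b.page || a.index - b.index,
  );

  // Keep only metadata in the JSON summary; the base64 payload is dropped.
  const image_info = ordered.map(({ data: _data, ...meta }) => meta);

  // First part: JSON summary, so existing consumers still find the results.
  const summary: ContentPart = {
    type: "text",
    text: JSON.stringify({ ...results, image_info }, null, 2),
  };

  // Subsequent parts: one image part per image, in page order.
  const imageParts: ContentPart[] = ordered.map((img) => ({
    type: "image",
    data: img.data,
    mimeType: `image/${img.format}`, // assumption: format maps directly to a MIME subtype
  }));

  return [summary, ...imageParts];
}
```

Sorting a copy rather than the input keeps the caller's array untouched, and stripping `data` before serialization is what keeps the first part small.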
Benefits:
- ✅ AI can see images in context with text
- ✅ Page order preserved (Page 1 images, then Page 2 images, etc.)
- ✅ Backward compatible (first part still has results JSON)
- ✅ Separate image parts for multimodal AI processing
- ✅ Image metadata in JSON for reference without base64 bulk
Testing:
- Added 2 new image extraction tests (91 total tests)
  - Test full_text mode with images
  - Test page_texts mode with images preserving order
- Coverage: 99.04% statements, 92.3% branches, 100% functions
Documentation:
- Enhanced README with detailed image extraction guide
- Added image data format example
- Clarified supported formats (RGB, RGBA, Grayscale)
- Added important considerations for image extraction
All 91 tests passing. Ready for production use with AI vision models.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
CHANGELOG.md: 46 additions & 0 deletions
@@ -2,6 +2,52 @@
All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.
+ * **Image Extraction**: Extract embedded images from PDF pages as base64-encoded data ([bd637f3](https://github.com/sylphxltd/pdf-reader-mcp/commit/bd637f3))
+   - Support for RGB, RGBA, and Grayscale formats
+   - Works with JPEG, PNG, and other embedded image types
+   - Includes image metadata (width, height, format, page number)