Commit bd637f3
feat: add embedded image extraction from PDFs
Add ability to extract embedded images from PDF pages as base64-encoded data with metadata.
New Features:
- Extract images from PDF pages using PDF.js operator list API
- Support for multiple image formats (JPEG, PNG, grayscale, RGB, RGBA)
- Images returned as base64-encoded strings with metadata
- Parallel image processing within pages
- Optional via include_images parameter (default: false)
Implementation:
- NEW extractImages() function in pdf/extractor.ts
- NEW extractImagesFromPage() helper for single page extraction
- Uses page.getOperatorList() to find paintImageXObject operations
- Callback-based page.objs.get() for async image resolution
- Proper error handling for missing or invalid images
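The extraction flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from pdf/extractor.ts: `PageLike`, `findImageNames`, `resolveImage`, and `resolvePageImages` are hypothetical names, the interfaces only mirror the parts of a PDF.js page that the commit message mentions, and the operator code is passed in as a parameter (in real code it would presumably come from the PDF.js `OPS` constants).

```typescript
// Hypothetical structural types standing in for the PDF.js page API.
interface OperatorListLike {
  fnArray: number[];
  argsArray: (unknown[] | null)[];
}

interface PageLike {
  getOperatorList(): Promise<OperatorListLike>;
  objs: { get(name: string, callback: (obj: unknown) => void): void };
}

// Scan the operator list and collect the object names of painted images.
async function findImageNames(page: PageLike, paintImageOp: number): Promise<string[]> {
  const { fnArray, argsArray } = await page.getOperatorList();
  const names: string[] = [];
  fnArray.forEach((fn, i) => {
    const args = argsArray[i];
    const name = args ? args[0] : undefined;
    if (fn === paintImageOp && typeof name === "string") {
      names.push(name);
    }
  });
  return names;
}

// Wrap the callback-based objs.get() in a Promise; a missing or
// failing lookup resolves to null instead of rejecting.
function resolveImage(page: PageLike, name: string): Promise<unknown> {
  return new Promise((resolve) => {
    try {
      page.objs.get(name, (obj) => resolve(obj ?? null));
    } catch {
      resolve(null);
    }
  });
}

// Resolve all images of one page in parallel.
async function resolvePageImages(page: PageLike, paintImageOp: number): Promise<unknown[]> {
  const names = await findImageNames(page, paintImageOp);
  return Promise.all(names.map((name) => resolveImage(page, name)));
}
```

Resolving failures to null keeps one bad image from rejecting the whole `Promise.all`; callers can filter out the null entries afterwards.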
Schema Changes:
- Add include_images: boolean parameter to readPdfArgsSchema
- Default false to preserve backward compatibility
- Add ExtractedImage interface with page, index, width, height, format, data
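A rough sketch of what these schema changes might look like in plain TypeScript. The field names come from the list above, but the field types are assumptions, and `resolveIncludeImages` and `isExtractedImage` are hypothetical helpers; the real readPdfArgsSchema may well be defined with a schema library instead.

```typescript
// Field names follow the commit message; the types are assumed.
interface ExtractedImage {
  page: number;
  index: number;
  width: number;
  height: number;
  format: string;
  data: string; // base64-encoded image bytes
}

// Apply the documented default: include_images is false when omitted.
function resolveIncludeImages(raw: { include_images?: unknown }): boolean {
  const value = raw.include_images;
  if (value === undefined) return false;
  if (typeof value !== "boolean") {
    throw new TypeError("include_images must be a boolean");
  }
  return value;
}

// Runtime check that a value has the ExtractedImage shape.
function isExtractedImage(value: unknown): value is ExtractedImage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const numericKeys = ["page", "index", "width", "height"];
  return (
    numericKeys.every((key) => typeof v[key] === "number") &&
    typeof v.format === "string" &&
    typeof v.data === "string"
  );
}
```

Because the default is false, existing callers that never send include_images keep receiving the same responses as before.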
Testing:
- 9 new tests for image extraction (90 total tests, up from 81, +11.1%)
- Test coverage for all image extraction paths
- Mock OPS constants in integration tests
- Edge cases: empty images, invalid data, errors
Coverage:
- Statements: 98.94% ✅
- Branches: 93.33% ✅
- Functions: 100% ✅
- All 90 tests passing
Documentation:
- Added Example 5: Extract images from PDF
- Updated feature list to include image extraction
- Updated roadmap to mark image extraction as completed
- Added notes about image format support and response size
Usage Example:
{
  "sources": [{ "path": "presentation.pdf", "pages": [1, 2] }],
  "include_images": true,
  "include_full_text": true
}
Returns:
- Text content from pages
- Embedded images as base64 with metadata (width, height, format)
- Each image tagged with page number and index
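Since each image carries its page number and index, a caller could regroup a flat list of images by page. The sketch below assumes only those two fields; `ImageRef` and `groupImagesByPage` are hypothetical names, and the actual response layout is not shown in this commit message.

```typescript
// Only the tagging fields named above are assumed here.
interface ImageRef {
  page: number;
  index: number;
}

// Group images by page number and order each group by image index.
function groupImagesByPage<T extends ImageRef>(images: T[]): Record<number, T[]> {
  const byPage: Record<number, T[]> = {};
  for (const image of images) {
    if (!byPage[image.page]) {
      byPage[image.page] = [];
    }
    byPage[image.page].push(image);
  }
  Object.keys(byPage).forEach((key) => {
    byPage[Number(key)].sort((a, b) => a.index - b.index);
  });
  return byPage;
}
```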
Image Format Support:
- ✅ JPEG images (best support)
- ✅ PNG images
- ✅ Grayscale, RGB, RGBA formats
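One plausible way to tell the grayscale, RGB, and RGBA cases apart is to infer the channel count from the raw buffer size. This sketch assumes uncompressed pixel data at 8 bits per channel; `detectPixelFormat` is a hypothetical helper, and the commit may distinguish formats differently (for example, from metadata on the image object).

```typescript
type PixelFormat = "grayscale" | "rgb" | "rgba";

// Infer the pixel format from bytes per pixel (assumes 8 bits per channel).
// Returns null for empty images or sizes that match no supported layout.
function detectPixelFormat(
  width: number,
  height: number,
  byteLength: number
): PixelFormat | null {
  const pixels = width * height;
  if (pixels <= 0 || byteLength % pixels !== 0) return null;
  switch (byteLength / pixels) {
    case 1:
      return "grayscale";
    case 3:
      return "rgb";
    case 4:
      return "rgba";
    default:
      return null;
  }
}
```

Returning null rather than throwing lets the caller skip empty or malformed images, in line with the edge cases listed under Testing.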
1 parent: e5f85e1 (commit bd637f3)
10 files changed, +562 -12 lines changed
File tree:
- dist
  - handlers
  - pdf
  - schemas
- src
  - handlers
  - pdf
  - schemas
  - types
- test
  - handlers
  - pdf