Commit 0b73978
The Issue:
When extracting images from pdfs, we use the metadata page number to
index into a list of the images. However, the metadata page number can
now be changed via `starting_page_number`. To get the true page index,
we need to subtract this value.
Testing:
Run this snippet in a python shell. Before the fix, this throws an
IndexError. On this branch, it will return the elements.
```
from unstructured.partition.auto import partition
filename = "example-docs/layout-parser-paper-with-table.pdf"
partition(filename, strategy="hi_res", extract_image_block_types=["Image", "Table"], starting_page_number=20)
```
---------
Co-authored-by: Matt Robinson <[email protected]>
Co-authored-by: christinestraub <[email protected]>
1 parent c3af03d commit 0b73978
File tree
6 files changed
+21
-6
lines changed- test_unstructured/partition/pdf_image
- unstructured
- partition
- pdf_image
6 files changed
+21
-6
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
| 1 | + | |
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
| 15 | + | |
15 | 16 | | |
16 | 17 | | |
17 | 18 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1223 | 1223 | | |
1224 | 1224 | | |
1225 | 1225 | | |
| 1226 | + | |
| 1227 | + | |
1226 | 1228 | | |
1227 | 1229 | | |
1228 | 1230 | | |
| |||
1231 | 1233 | | |
1232 | 1234 | | |
1233 | 1235 | | |
| 1236 | + | |
| 1237 | + | |
1234 | 1238 | | |
1235 | 1239 | | |
1236 | 1240 | | |
1237 | 1241 | | |
1238 | | - | |
| 1242 | + | |
1239 | 1243 | | |
1240 | 1244 | | |
1241 | 1245 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
117 | 117 | | |
118 | 118 | | |
119 | 119 | | |
| 120 | + | |
120 | 121 | | |
121 | 122 | | |
122 | 123 | | |
| |||
157 | 158 | | |
158 | 159 | | |
159 | 160 | | |
| 161 | + | |
160 | 162 | | |
161 | 163 | | |
162 | 164 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
| 1 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
660 | 660 | | |
661 | 661 | | |
662 | 662 | | |
| 663 | + | |
663 | 664 | | |
664 | 665 | | |
665 | 666 | | |
| |||
675 | 676 | | |
676 | 677 | | |
677 | 678 | | |
| 679 | + | |
678 | 680 | | |
679 | 681 | | |
680 | 682 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
121 | 121 | | |
122 | 122 | | |
123 | 123 | | |
| 124 | + | |
124 | 125 | | |
125 | 126 | | |
126 | 127 | | |
| |||
183 | 184 | | |
184 | 185 | | |
185 | 186 | | |
186 | | - | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
187 | 193 | | |
188 | 194 | | |
189 | 195 | | |
190 | 196 | | |
191 | 197 | | |
192 | 198 | | |
193 | | - | |
| 199 | + | |
194 | 200 | | |
195 | | - | |
| 201 | + | |
196 | 202 | | |
197 | 203 | | |
198 | 204 | | |
| |||
0 commit comments