Commit 1dede50
fix: parsing pdf error - new_cells as str has no "copy" (#3130)
Closes #3119.
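
The error named in the title is the usual `AttributeError` raised when `.copy()` is called on a `str`. The changed lines are not reproduced on this page, so the following is only a minimal sketch of that class of failure and one defensive way to handle it. `safe_copy_cells` and the `Cells` alias are hypothetical names for illustration, not the code introduced by this commit.

```python
import copy
from typing import Any, Dict, List, Union

# Assumption for illustration: cells arrive either as a list of dicts
# or as an already-serialized string.
Cells = Union[List[Dict[str, Any]], str]


def safe_copy_cells(new_cells: Cells) -> Cells:
    """Hypothetical helper: copy cells without assuming a .copy() method."""
    if isinstance(new_cells, str):
        # str is immutable and has no .copy(); returning it as-is is safe.
        return new_cells
    # Deep-copy so later mutation does not affect the caller's data.
    return copy.deepcopy(new_cells)


# Reproduce the failure mode from the commit title.
try:
    "<table></table>".copy()  # type: ignore[attr-defined]
except AttributeError as exc:
    error_message = str(exc)
```

With this guard, a string passes through unchanged while a list of cell dicts is copied independently of the original.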
### Testing
Parsing the provided PDF should be successful.
[testing_brochure_2.pdf](https://github.com/user-attachments/files/15518094/testing_brochure_2.pdf)
```
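from unstructured.partition.pdf import partition_pdf

filename = "testing_brochure_2.pdf"

# Partition the PDF with table structure inference and by-title chunking enabled.
with open(filename, "rb") as pdf_content:
    elements = partition_pdf(
        file=pdf_content,
        infer_table_structure=True,
        extract_image_block_types=["Image", "Table"],
        chunking_strategy="by_title",
        max_characters=1000,
        new_after_n_chars=3000,
        combine_text_under_n_chars=1000,
    )

# Print each resulting element separated by a blank line.
print("\n\n".join(str(el) for el in elements))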
```
1 parent 1b43102 commit 1dede50
File tree
- unstructured
- partition/pdf_image

3 files changed, +5 -3 lines changed

| File (in diff order) | Original line | Diff line | Change |
|---|---|---|---|
| 1 | 1 | 1 | 1 line replaced |
| 1 | | 15 | 1 line added |
| 2 | 1 | 1 | 1 line replaced |
| 3 | 283 | 283-284 | 1 line replaced by 2 lines |