Commit 087adb2
feat(docx): differentiate no-file from not-ZIP (#3306)
**Summary**
The `python-docx` exception `docx.opc.exceptions.PackageNotFoundError`
is raised both when no file exists at the given path and when the file
exists but is not a ZIP archive (and so cannot be a DOCX file).
This ambiguity hampers diagnosis, because the two conditions generally
call for different fixes: correcting the path in the first case, and
supplying an actual DOCX file in the second.
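The ambiguity can be reproduced with the standard library alone. The sketch below uses `zipfile.is_zipfile` as a stand-in for the ZIP check that `python-docx`'s package loader is believed to perform before falling through to `PackageNotFoundError` (an assumption for illustration; the sketch does not call `python-docx`). A ZIP-only check returns `False` both for a missing file and for a plain-text file, so the caller cannot tell the two apart.

```python
import os
import tempfile
import zipfile

# zipfile.is_zipfile() stands in here for the ZIP check in python-docx's
# package loader; both failure modes collapse to the same False result.
with tempfile.TemporaryDirectory() as tmp:
    missing_path = os.path.join(tmp, "missing.docx")  # never created
    plain_path = os.path.join(tmp, "plain.docx")      # exists, but is not a ZIP
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write("plain text, not a ZIP archive")

    missing_is_zip = zipfile.is_zipfile(missing_path)
    plain_is_zip = zipfile.is_zipfile(plain_path)

print(missing_is_zip, plain_is_zip)  # False False
```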
Add detailed validation to `DocxPartitionerOptions` that distinguishes
these two conditions and raises a more precise exception message for each.
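A minimal sketch of the kind of check described above, assuming the input is a plain filesystem path. `validate_docx_path` is a hypothetical name, and raising `FileNotFoundError` versus `ValueError` is an illustrative choice; this is not the actual `DocxPartitionerOptions` code from the commit.

```python
import os
import zipfile


def validate_docx_path(path: str) -> None:
    """Hypothetical check that reports *why* `path` cannot be loaded as a DOCX.

    Raises FileNotFoundError when no regular file exists at `path`, and
    ValueError when the file exists but is not a ZIP archive.
    """
    if not os.path.isfile(path):
        # Nothing to open: the caller most likely needs to fix the path.
        raise FileNotFoundError(f"no such file: {path!r}")
    if not zipfile.is_zipfile(path):
        # The file is there, but its contents cannot be a DOCX package.
        raise ValueError(f"{path!r} is not a ZIP archive, so it cannot be a DOCX file")
```

Checking existence first is what makes the messages distinct, since `zipfile.is_zipfile` alone returns `False` in both cases. A file that passes both checks is only known to be a ZIP archive; it can still fail later if it lacks the parts a DOCX package requires.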
**Additional Context**
- `python-pptx` shares the same OPC package (file) loading code used by
`python-docx`, so it has the same ambiguity.
- It would be preferable for this more specific exception behavior to live
upstream in `python-docx` and `python-pptx`. If we're willing to take
the resulting version bump, it may be worth doing that instead.

1 parent 54ec311
4 files changed, +66 -16 lines changed

File tree:
- test_unstructured/partition
- unstructured
  - partition