
Commit 4a96d54

chore: move logger error to debug when pdfminer extract fails (#3028)
### Summary

We are seeing the logger error `Invalid dictionary construct` for hosted APIs. This change moves that logger error to debug level. As before, partitioning continues when pdfminer text extraction fails; the failure is just no longer logged as an error.

### Test

I was able to reproduce the logger error with an internal-only file (please DM me if needed). The error trace looks like:

```
File "/Users/yumingl/develops/unstructured/unstructured/partition/pdf.py", line 709, in _process_pdfminer_pages
    annotation_list = get_uris(page.annots, height, coordinate_system, page_number)
File "/Users/yumingl/develops/unstructured/unstructured/partition/pdf.py", line 1049, in get_uris
    resolved_annots = annots.resolve()
...
```

We also cannot repair the PDF structure in `get_uris` (which is not at the page level), so this exception is moved to debug level.
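The behaviour described in the summary can be sketched roughly as follows. This is a minimal illustration and not the code in `unstructured/partition/pdf.py`: the logger name, `extract_text_or_skip`, and `failing_extract` are hypothetical names introduced only for this example.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("pdf_partition_sketch")


def extract_text_or_skip(extract, source):
    """Run `extract(source)`; on any failure, log quietly and return None.

    `extract` stands in for the pdfminer-based extraction step (hypothetical).
    """
    try:
        return extract(source)
    except Exception as e:
        # The exception detail is only useful when debugging, so log it at DEBUG.
        logger.debug(e)
        # Skipping text extraction is an expected fallback, so log it at INFO.
        logger.info("PDF text extraction failed, skip text extraction...")
        return None


def failing_extract(_source):
    # Simulates the failure mentioned in the summary.
    raise ValueError("Invalid dictionary construct")
```

Calling `extract_text_or_skip(failing_extract, "some.pdf")` returns `None` instead of raising, so a caller can carry on with the rest of partitioning.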
1 parent 865ef49 commit 4a96d54

3 files changed: +5 -4 lines changed

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
-## 0.14.4-dev4
+## 0.14.4-dev5
 
 ### Enhancements
 
+* **Move logger error to debug level when PDFminer fails to extract text** which includes error message for Invalid dictionary construct.
 * **Add support for Pinecone serverless** Adds Pinecone serverless to the connector tests. Pinecone
 serverless will work version versions >=0.14.2, but hadn't been tested until now.
 

unstructured/__version__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
__version__ = "0.14.4-dev4" # pragma: no cover
1+
__version__ = "0.14.4-dev5" # pragma: no cover

unstructured/partition/pdf.py

Lines changed: 2 additions & 2 deletions
@@ -266,8 +266,8 @@ def partition_pdf_or_image(
                 for el in page_elements
             )
         except Exception as e:
-            logger.error(e)
-            logger.warning("PDF text extraction failed, skip text extraction...")
+            logger.debug(e)
+            logger.info("PDF text extraction failed, skip text extraction...")
 
     strategy = determine_pdf_or_image_strategy(
         strategy,
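To see what this level change means for anyone reading the logs, the sketch below replays the two log calls from the hunk above under different logger thresholds, using only the standard `logging` module. The logger name `unstructured_sketch`, the `ListHandler` class, and `run_with_level` are assumptions made for illustration; they are not part of the library.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(f"{record.levelname}: {record.getMessage()}")


def run_with_level(level):
    """Replay the two log calls at the given threshold and return what was emitted."""
    log = logging.getLogger("unstructured_sketch")  # hypothetical logger name
    log.propagate = False  # keep the sketch isolated from the root logger
    handler = ListHandler()
    log.handlers = [handler]
    log.setLevel(level)
    try:
        raise ValueError("Invalid dictionary construct")
    except Exception as e:
        log.debug(e)
        log.info("PDF text extraction failed, skip text extraction...")
    return handler.lines
```

With the threshold at `logging.WARNING` nothing is emitted, at `logging.INFO` only the "skip text extraction" line appears, and at `logging.DEBUG` the exception message is emitted as well. Before this commit, both lines would have passed a `WARNING` threshold.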
