Closed
Labels
bug: Something isn't working
Description
Name and Version
ai-dial-rag 0.38.0
What steps will reproduce the bug?
Dial RAG fails to extract text from certain PDF files when using unstructured.
In unstructured, partition_pdf_or_image hits the exception "TypeError: unsupported format string passed to list.__format__" raised from pdfminer, and returns empty content for the document.
The document can still be processed by the visual retrieval pipeline, but the text pipeline receives empty text.
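For context, this message is what Python raises when a list is formatted with a numeric format spec. The snippet below is only a minimal illustration of that error class: the helper names are hypothetical, and it does not claim to show where pdfminer performs the formatting.

```python
def format_value(value, spec=".2f"):
    """Format a value with a numeric spec; works for floats, fails for lists."""
    return format(value, spec)


def describe_failure():
    """Return the error text produced when a list reaches a numeric format spec."""
    try:
        # A list does not support ".2f", so Python raises TypeError here.
        format_value([1.0, 2.0])
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(format_value(1.5))     # a plain float formats fine
    print(describe_failure())    # same message as in the report
```

This suggests that somewhere in the PDF parsing path a value expected to be a single number arrives as a list, although the exact location would need to be confirmed from the traceback.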
The issue appears to affect Dial RAG versions starting from 0.34.0, following the pdfplumber and pdfminer.six update in this PR: https://github.com/epam/ai-dial-rag/pull/30/changes
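To check whether an environment has the updated dependencies, the installed versions can be listed with the standard library. This is a small sketch; the function name is hypothetical, and the distribution names are assumed to be the ones published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed PyPI distribution names for the libraries named in the report.
    names = ["pdfminer.six", "pdfplumber", "unstructured"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing this output between a working release (before 0.34.0) and an affected one may help narrow down which dependency change introduced the failure.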