Hi @AaaBin! I'm Dosu and I’m helping the langchain team. Image extraction with PyPDFLoader can silently fail if certain dependencies are missing or if the parser configuration isn't compatible with your PDF. Here are some steps to help troubleshoot:
Let me know what you find after trying these steps! To reply, just mention @dosu.
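As a rough illustration of the "missing dependencies" point, here is a minimal sketch that reports which optional modules cannot be found in the current environment (for example, a fresh Colab runtime). The module names in `CANDIDATE_MODULES` are assumptions for illustration, not a list confirmed in this thread; swap in whatever your image parser actually needs. `missing_modules` is a hypothetical helper, not part of LangChain.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised modules;
            # treat those as unavailable too.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Assumed candidates (not confirmed by this thread); adjust to your setup.
CANDIDATE_MODULES = ["pypdf", "PIL", "rapidocr_onnxruntime", "pytesseract"]

if __name__ == "__main__":
    for name in missing_modules(CANDIDATE_MODULES):
        print(f"not installed: {name}")
```

An empty result only means the modules are importable; it does not prove the loader is configured to use them.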
Checked other resources
Commit to Help
Example Code
Description
I'm currently working with the PyPDFLoader, following the demo notebook located at docs/docs/integrations/document_loaders/pypdfloader.ipynb.
What I'm doing:
I am running the provided Jupyter notebook. The notebook demonstrates loading a PDF document and extracting its contents, including images, using PyPDFLoader.
What I expect to happen:
Based on the output shown in the demo notebook, I expect the PyPDFLoader to successfully extract and make available the images embedded within the PDF document.
What is currently happening:
When I execute the notebook in my Colab environment, the PDF content is loaded without any errors. However, I am not observing any images being extracted or returned. There are no error messages or warnings indicating a failure in image extraction. This results in a discrepancy between my output and the expected output shown in the pypdfloader.ipynb notebook.
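Since nothing fails loudly here, one way to make "no images extracted" concrete is to count image markers per page in the loaded text. The sketch below assumes the loader was configured to embed image results as Markdown-style image markers (as far as I understand, that is what the `images_inner_format="markdown-img"` option does, but treat that as an assumption). `count_image_markers` is a hypothetical helper, not a LangChain function.

```python
import re

# Matches Markdown image syntax such as ![some text](#).
# Assumption: image output appears in this form in the page text.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def count_image_markers(page_texts):
    """Return, for each page text, how many Markdown image markers it contains."""
    return [len(MARKDOWN_IMAGE.findall(text)) for text in page_texts]
```

It could be applied to something like `[doc.page_content for doc in docs]` after loading. A list of all zeros would be consistent with the behaviour described above, though it could also mean the markers use a different format than assumed.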
System Info
System Information
Package Information
Optional packages not installed
Other Dependencies