@pprados
Contributor

@pprados pprados commented Feb 10, 2025

Modify PDFPlumber to:

  • separate into loader/parser components
  • identify tables in PDFs
  • process images in PDFs like other parsers
  • support multi-threading

@pprados
Contributor Author

pprados commented Feb 10, 2025

@eyurtsev, here is the next one.
I use a special approach to handle capitalized properties; I hope this works for you.
If not, we can “double” the keys to maintain compatibility.

@pprados
Contributor Author

pprados commented Feb 12, 2025

@eyurtsev ping

@dosubot dosubot bot added the lgtm label Feb 16, 2025
@pprados
Contributor Author

pprados commented Mar 26, 2025

@eyurtsev
To fix issue 30454, can you merge this PR?

Note: The current implementation of PDFPlumber has several limitations:

  • It does not provide a parser
  • It uses `load()` instead of `lazy_load()`
  • It does not handle tables
  • It does not support image conversions
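
To illustrate the second limitation, here is a minimal sketch of how a generator-based `lazy_load()` differs from an eager `load()`. It assumes a hypothetical `SketchPageLoader` and a stand-in `Document` class; neither is the LangChain or PDFPlumber API, and the pages are plain strings rather than a parsed PDF.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Document:
    """Hypothetical stand-in for a document object (not the LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SketchPageLoader:
    """Hypothetical loader; pages are plain strings, not a parsed PDF."""

    def __init__(self, pages: List[str]) -> None:
        self.pages = pages
        self.pages_read = 0  # counts how many pages were actually processed

    def lazy_load(self) -> Iterator[Document]:
        # Yield one document per page, doing the work only on demand.
        for number, text in enumerate(self.pages):
            self.pages_read += 1
            yield Document(page_content=text, metadata={"page": number})

    def load(self) -> List[Document]:
        # Eager variant: materialize every page by draining lazy_load().
        return list(self.lazy_load())


loader = SketchPageLoader(["page one", "page two", "page three"])
first = next(loader.lazy_load())  # only the first page is processed here
```

With the lazy variant a caller can stop after the first page, whereas `load()` always processes every page before returning.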

…mber

# Conflicts:
#	docs/docs/integrations/document_loaders/pdfplumber.ipynb
#	docs/docs/integrations/document_loaders/pypdfium2.ipynb
#	docs/docs/integrations/document_loaders/pypdfloader.ipynb
@pprados pprados marked this pull request as draft April 3, 2025 16:24
pprados added 2 commits April 29, 2025 11:03
…mber

# Conflicts:
#	libs/community/langchain_community/document_loaders/parsers/pdf.py
#	libs/community/langchain_community/document_loaders/pdf.py
@pprados
Contributor Author

pprados commented Apr 29, 2025

@ccurme or @baskaryan, can you review this PR?

Collaborator

@ccurme ccurme left a comment

Closing as langchain-community has been moved to a standalone repo: https://github.com/langchain-ai/langchain-community

@ccurme ccurme closed this Apr 29, 2025
@github-project-automation github-project-automation bot moved this from In review to Closed in PR Reviews Apr 29, 2025