Pprados/fix pdfminer dep #410
Merged
Coniferish merged 24 commits into Unstructured-IO:main from pprados:pprados/fix_pdfminer_dep on Feb 20, 2025
Changes from 23 commits
Commits
ee62dc4 Add password with PDF files (pprados)
454cb35 Merge branch 'main' into pprados/fix_password (pprados)
16776f3 Merge branch 'main' into pprados/fix_password (pprados)
ed43d82 Add TU (pprados)
ce13fa1 merge with main (pprados)
fd35cb3 Merge branch 'main' into pprados/fix_password (Coniferish)
ecc95eb Fix CHANGELOG.md (pprados)
feea1f8 Add password with PDF files (pprados)
70be207 Add TU (pprados)
ee93ee2 merge with main (pprados)
458f05b Fix CHANGELOG.md (pprados)
c3ca1dd Fix make check (pprados)
d276d6e Merge remote-tracking branch 'origin/pprados/fix_password' into pprad… (pprados)
71bb5f7 Merge branch 'main' into pprados/fix_password (pprados)
ee6e638 Merge branch 'main' into pprados/fix_password (Coniferish)
307e901 Merge, make tidy and fix CHANGELOG.md (pprados)
cdd4fef Fix __version__.py (pprados)
30805b8 Fix mypy (pprados)
de80975 use unstructured-inference 0.8.7 (pprados)
e4a08f9 Fix pdfminer-six dependencies (pprados)
2e9a149 Merge branch 'main' of https://github.com/Unstructured-IO/unstructure… (pprados)
75f74cc Merge branch 'upstream/main' into pprados/fix_pdfminer_dep (pprados)
cff2c7b Update CHANGELOG.md (Coniferish)
64ecdc0 Update unstructured_inference/__version__.py (Coniferish)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,3 +1,7 @@ |
| | 1 | + ## 0.8.8-dev0 |
| | 2 | + |
| | 3 | + * fix: pdfminer-six dependencies |
| | 4 | + |
| 1 | 5 | ## 0.8.7 |
| 2 | 6 | |
| 3 | 7 | * fix: add `password` for PDF |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -14,4 +14,4 @@ rapidfuzz |
| 14 | 14 | pandas |
| 15 | 15 | scipy |
| 16 | 16 | pypdfium2 |
| 17 | | - pdfminer-six==20240706 |
| | 17 | + pdfminer-six>=20240706 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1 +1 @@ |
| 1 | | - __version__ = "0.8.7" # pragma: no cover |
| | 1 | + __version__ = "0.8.8-dev1" # pragma: no cover |
Coniferish marked this conversation as resolved.
Doesn't this always resolve to the same package, currently the latest one? Why not just remove the pin altogether?
Yes, but in the future, later versions will be required.
LangChain tracks new versions for PDFMinerLoader, so with the version frozen it will not be possible to combine it with unstructured.
The final objective of my series of pull requests for LangChain is to be able to choose the parser for each case, with PDFRouterLoader. This means being able to have several parsers installed at the same time. Freezing a version prevents this.
It's no problem if you do it yourself.
Take this opportunity to publish a new version, and update extra-pdf-image.in in unstructured to use the new version.
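To illustrate the point about frozen versions, here is a minimal sketch of why an exact pin can leave no installable version once another package in the same environment needs a newer release. Everything in it is an assumption for illustration: the `satisfies` and `compatible` helpers, the release list, and the downstream constraint are hypothetical, and the checker only handles date-style integer versions with `==`, `>=`, and `<=`, not full PEP 440 resolution as pip performs it.

```python
import operator

# Comparison operators supported by this toy checker (a small subset of PEP 440).
_OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def satisfies(version: str, specifier: str) -> bool:
    """Return True if a date-style version such as '20240706' meets the specifier."""
    for symbol, compare in _OPS.items():
        if specifier.startswith(symbol):
            return compare(int(version), int(specifier[len(symbol):]))
    raise ValueError(f"unsupported specifier: {specifier!r}")


def compatible(candidates: list[str], specifiers: list[str]) -> list[str]:
    """List the candidate versions that satisfy every specifier at once."""
    return [v for v in candidates if all(satisfies(v, s) for s in specifiers)]


# Hypothetical release list, not the actual pdfminer-six release history.
releases = ["20231228", "20240706", "20250101"]
# Assumed requirement of another package installed in the same environment.
downstream = ">=20250101"

print(compatible(releases, ["==20240706", downstream]))  # exact pin: no shared version
print(compatible(releases, [">=20240706", downstream]))  # lower bound: a newer release fits
```

With the exact pin the two constraints have an empty intersection, while the lower bound leaves room for a release that both packages accept.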
Just an FYI, @cragwolfe, but the initial pin for `pdfminer-six` was added just a few weeks ago when removing `pdfplumber` (here) to maintain required packages used by scripts to pass CI. It sounds like it's a workaround we might want to fix. For easy reference, though, here's extra-pdf-image.in.
@pprados, I'm confused about what you're saying needs to be adjusted in `extra-pdf-image.in`. The `pdfminer.six` version isn't pinned there, so it should install the latest, as @cragwolfe mentioned, right?
@cragwolfe, if this PR looks good to you, can you approve my duplicate PR here? The CI failure in this one is due to a secret that's needed in CI itself and not just particular tests, so I'm unsure how to fix that for contributor PRs at the moment.