-
Notifications
You must be signed in to change notification settings - Fork 71
Pprados/fix pdfminer dep #410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pprados/fix pdfminer dep #410
Conversation
…os/fix_password # Conflicts: # test_unstructured_inference/inference/test_layout.py # unstructured_inference/inference/layout.py
…d-inference into upstream/main
|
@Coniferish can you revue this tiny PR ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@pprados, I may have to clone your branch and make the PR myself to get the |
| scipy | ||
| pypdfium2 | ||
| pdfminer-six==20240706 | ||
| pdfminer-six>=20240706 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this always resolve to the same package, currently the latest one? just remove the pin altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but in future the following versions will be mandatory.
Langchain follows the evolution of versions for PDFMinerLoader and it will not be possible to combine it with unstructured.
The final objective of my series of Pull Request for LangChain is to be able to choose the parser for each case, with PDFRouterLoader. This means being able to have several parsers at the same time. Freezing a version prevents this.
no problem for you to do it yourself.
Take this opportunity to publish a new version, and adjust, in unstructured, extra-pdf-image.in, with the new version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an fyi, @cragwolfe, but the initial pin for pdfminer-six was added just a few weeks ago when removing pdfplumber (here) to maintain required packages used by scripts to pass CI. It sounds like it's a workaround we might want to fix. For easy reference, though, here's extra-pdf-image.in
@pprados, I'm confused what you're saying needs to be adjusted in extra-pdf-image.in. The pdfminer.six version isn't pinned there, so it should be install the latest as @cragwolfe mentioned, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cragwolfe, if this PR looks good to you, can you approve my duplicate PR here? The CI failure in this one is due to a secret that's needed in CI itself and not just particular tests, so I'm unsure how to fix that for contributor PRs at the moment.
46b67eb to
64ecdc0
Compare
|
Hi, this is bad idea (as opposed to your previous version pin which was a really bad idea). The minimum Is there any particular reason we can't set a lower bound on an older, less broken version of |
Duplicate of #410 because of CI issues with secrets from contributor-initiated PR --------- Co-authored-by: Philippe Prados <[email protected]> Co-authored-by: Yao You <[email protected]>
With langchain or other libraries, forcing a version of pdfminer.six makes it impossible to combine different modules.