Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
+## 0.8.8-dev1
+
+* fix: pdfminer-six dependencies
+
## 0.8.7

* fix: add `password` for PDF
2 changes: 1 addition & 1 deletion requirements/base.in
@@ -14,4 +14,4 @@ rapidfuzz
pandas
scipy
pypdfium2
-pdfminer-six==20240706
+pdfminer-six>=20240706
Contributor

Doesn't this always resolve to the same package, which is currently the latest one? Why not just remove the pin altogether?

Contributor Author

Yes, but in the future later versions will become mandatory.

LangChain keeps PDFMinerLoader up to date with new versions, so it will not be possible to combine it with unstructured while the version is frozen.
The final objective of my series of pull requests for LangChain is to be able to choose the parser for each case with PDFRouterLoader. This means being able to have several parsers installed at the same time, and freezing a version prevents this.

I have no problem with you doing it yourself.
Please take this opportunity to publish a new version, and update extra-pdf-image.in in unstructured to use the new version.
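
To illustrate the point about frozen versions, here is a minimal sketch of how an exact pin and a lower bound behave when a second project needs a later release. The helpers `satisfies` and `compatible` are hypothetical, handle only `==` and `>=` with single-integer date-style versions, and do not reflect how pip resolves dependencies. The version `20250101` is a made-up stand-in for some future release.

```python
import operator

# Hypothetical helpers for illustration only: they understand just the
# "==" and ">=" operators and single-integer versions such as 20240706.
_OPS = {"==": operator.eq, ">=": operator.ge}


def satisfies(version: str, specifier: str) -> bool:
    """Return True if `version` meets a single requirement specifier."""
    for symbol, compare in _OPS.items():
        if specifier.startswith(symbol):
            return compare(int(version), int(specifier[len(symbol):]))
    raise ValueError(f"unsupported specifier: {specifier}")


def compatible(version: str, *specifiers: str) -> bool:
    """Return True if one installed version meets every requirement."""
    return all(satisfies(version, s) for s in specifiers)


# Assumption: another project requires a later release (made-up number).
newer = "20250101"
print(compatible(newer, "==20240706", ">=20250101"))  # exact pin: False
print(compatible(newer, ">=20240706", ">=20250101"))  # lower bound: True
```

With the exact pin, no single installed version can satisfy both requirements, whereas the lower bound leaves room for both projects to share a later release.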

Contributor

Just an FYI, @cragwolfe: the initial pin for pdfminer-six was added just a few weeks ago when removing pdfplumber (here), to maintain the required packages used by scripts so that CI passes. It sounds like a workaround we might want to fix. For easy reference, though, here's extra-pdf-image.in.

@pprados, I'm confused about what you're saying needs to be adjusted in extra-pdf-image.in. The pdfminer.six version isn't pinned there, so it should install the latest, as @cragwolfe mentioned, right?

Contributor

@cragwolfe, if this PR looks good to you, can you approve my duplicate PR here? The CI failure in this one is due to a secret that's needed in CI itself and not just particular tests, so I'm unsure how to fix that for contributor PRs at the moment.

2 changes: 1 addition & 1 deletion unstructured_inference/__version__.py
@@ -1 +1 @@
-__version__ = "0.8.7" # pragma: no cover
+__version__ = "0.8.8-dev1" # pragma: no cover