Commit 1f4edf3

pprados and eyurtsev authored and committed
community[minor]: 04 - Refactoring PDFMiner parser (#29526)
This is one part of a larger Pull Request (PR) that is too large to be submitted all at once. This specific part focuses on updating the PDFMiner parser. For more details, see [PR 28970](langchain-ai/langchain#28970).

---------

Co-authored-by: Eugene Yurtsev <[email protected]>
1 parent 5cf13a9 commit 1f4edf3

8 files changed: 2551 additions & 765 deletions

docs/docs/integrations/document_loaders/pdfminer.ipynb

Lines changed: 1975 additions & 55 deletions
Large diffs are not rendered by default.

docs/docs/integrations/document_loaders/pymupdf.ipynb

Lines changed: 164 additions & 548 deletions
Large diffs are not rendered by default.

libs/community/extended_testing_deps.txt

Lines changed: 2 additions & 1 deletion
@@ -59,7 +59,7 @@ openapi-pydantic>=0.3.2,<0.4
 oracle-ads>=2.9.1,<3
 oracledb>=2.2.0,<3
 pandas>=2.0.1,<3
-pdfminer-six>=20221105,<20240706
+pdfminer-six==20231228
 pdfplumber>=0.11
 pgvector>=0.1.6,<0.2
 playwright>=1.48.0,<2
@@ -104,3 +104,4 @@ mlflow[genai]>=2.14.0
 databricks-sdk>=0.30.0
 websocket>=0.2.1,<1
 writer-sdk>=1.2.0
+unstructured[pdf]>=0.15
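
The pdfminer-six change above narrows a version range to a single pinned release. The sketch below illustrates the effect of that change. It assumes that pdfminer-six's date-style version numbers can be compared as plain integers for this purpose. The helper names `in_old_range` and `matches_new_pin` are hypothetical and are not part of the commit or of any packaging tool.

```python
# Illustrative sketch only: helper names are hypothetical, and versions are
# treated as plain integers because pdfminer-six uses date-style versions.

def in_old_range(version: int) -> bool:
    """Old constraint: pdfminer-six>=20221105,<20240706."""
    return 20221105 <= version < 20240706


def matches_new_pin(version: int) -> bool:
    """New constraint: pdfminer-six==20231228."""
    return version == 20231228


# Lower bound of the old range, the newly pinned release,
# and the excluded upper bound of the old range.
for v in (20221105, 20231228, 20240706):
    print(v, "old range:", in_old_range(v), "new pin:", matches_new_pin(v))
```

The pinned release already satisfied the old range, so the change only removes the other releases that range allowed.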
