Commit db29855

Refactor pdfminer
Fix PyMuPDFLoader
1 parent 0c782ee commit db29855

8 files changed: +2574 -782 lines changed

docs/docs/integrations/document_loaders/pdfminer.ipynb

Lines changed: 1975 additions & 55 deletions
Large diffs are not rendered by default.

docs/docs/integrations/document_loaders/pymupdf.ipynb

Lines changed: 164 additions & 548 deletions
Large diffs are not rendered by default.

libs/community/extended_testing_deps.txt

Lines changed: 2 additions & 1 deletion
@@ -59,7 +59,7 @@ openapi-pydantic>=0.3.2,<0.4
 oracle-ads>=2.9.1,<3
 oracledb>=2.2.0,<3
 pandas>=2.0.1,<3
-pdfminer-six>=20221105,<20240706
+pdfminer-six==20231228
 pdfplumber>=0.11
 pgvector>=0.1.6,<0.2
 playwright>=1.48.0,<2
@@ -104,3 +104,4 @@ mlflow[genai]>=2.14.0
 databricks-sdk>=0.30.0
 websocket>=0.2.1,<1
 writer-sdk>=1.2.0
+unstructured[pdf]>=0.15
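
The first hunk narrows the accepted pdfminer-six versions from a range to a single release. As a rough illustration only (not part of the commit), the sketch below compares the old and new constraints with plain integer comparison. This works here only because the pdfminer-six release identifiers in the diff are date-based numbers; the helper names `in_old_range` and `matches_new_pin` are hypothetical.

```python
# Illustrative sketch: helper names are hypothetical and not from the commit.
# The bounds are copied from the requirement lines in the diff.
OLD_LOWER = 20221105  # old constraint: pdfminer-six>=20221105
OLD_UPPER = 20240706  # old constraint: pdfminer-six<20240706
NEW_PIN = 20231228    # new constraint: pdfminer-six==20231228


def in_old_range(version: int) -> bool:
    """Return True if the version satisfied the old range constraint."""
    return OLD_LOWER <= version < OLD_UPPER


def matches_new_pin(version: int) -> bool:
    """Return True if the version satisfies the new exact pin."""
    return version == NEW_PIN


if __name__ == "__main__":
    # Print, for a few candidate versions, whether each constraint accepts it.
    for candidate in (20221105, 20231228, 20240706):
        print(candidate, in_old_range(candidate), matches_new_pin(candidate))
```

The pinned release also falls inside the old range, so environments already on 20231228 are unaffected, while any other previously accepted release no longer satisfies the requirement.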
