
Commit 350bb1d

qued, rbiseck3 and cragwolfe authored
enhancement: clean pdf elements (bump unstructured-inference) (#790)
* More deterministic element ordering when using hi_res PDF parsing strategy (from unstructured-inference bump to 0.5.4)
* Make large model available (from unstructured-inference bump to 0.5.3)
* Combine inferred elements with extracted elements (from unstructured-inference bump to 0.5.2)

---------

Co-authored-by: Roman Isecke <[email protected]>
Co-authored-by: Crag Wolfe <[email protected]>
1 parent 642562b commit 350bb1d

26 files changed with 6475 additions and 2832 deletions

CHANGELOG.md

Lines changed: 4 additions & 1 deletion
@@ -1,7 +1,10 @@
-## 0.7.11-dev2
+## 0.7.11
 
 ### Enhancements
 
+* More deterministic element ordering when using `hi_res` PDF parsing strategy (from unstructured-inference bump to 0.5.4)
+* Make large model available (from unstructured-inference bump to 0.5.3)
+* Combine inferred elements with extracted elements (from unstructured-inference bump to 0.5.2)
 * `partition_email` and `partition_msg` will now process attachments if `process_attachments=True`
 and a attachment partitioning functions is passed through with `attachment_partitioner=partition`.
 
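The three `hi_res` items come from the unstructured-inference bump; this commit only changes version pins, so how those features are implemented is not visible here. Purely as an illustration, the sketch below shows one way two of them can work. Regions predicted by a layout model ("inferred") are greedily matched to text boxes pulled from the PDF ("extracted") by intersection-over-union. The merged list is then sorted on a total key so the result does not depend on input order. The `Box` type, `merge`, `reading_order`, the 0.5 threshold and the sort key are all hypothetical. They are not unstructured-inference's API or algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical page element: top-left (x1, y1), bottom-right (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float
    text: str = ""
    kind: Optional[str] = None  # layout label from the model; None for extracted-only boxes


def order_key(e: Box) -> Tuple[float, float, float, float, str]:
    # Top-to-bottom, then left-to-right, with the text as a final tie-breaker.
    return (e.y1, e.x1, e.y2, e.x2, e.text)


def reading_order(elements: List[Box]) -> List[Box]:
    """Elements that differ in position or text always come out in the same order."""
    return sorted(elements, key=order_key)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (0.0 when they do not overlap)."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def merge(inferred: List[Box], extracted: List[Box], threshold: float = 0.5) -> List[Box]:
    """Give each inferred region the text of its best-overlapping, unclaimed extracted box."""
    # Canonical input order, so the greedy claiming below is itself order-independent.
    inferred, extracted = reading_order(inferred), reading_order(extracted)
    merged: List[Box] = []
    used: Set[int] = set()
    for region in inferred:
        best_idx, best_score = None, 0.0
        for idx, candidate in enumerate(extracted):
            if idx in used:
                continue
            score = iou(region, candidate)
            if score >= threshold and score > best_score:
                best_idx, best_score = idx, score
        if best_idx is None:
            merged.append(region)  # region with no matching extracted text
        else:
            used.add(best_idx)
            merged.append(Box(region.x1, region.y1, region.x2, region.y2,
                              extracted[best_idx].text, region.kind))
    # Extracted text that no inferred region claimed is kept instead of dropped.
    merged.extend(box for idx, box in enumerate(extracted) if idx not in used)
    return merged


inferred = [Box(0, 100, 200, 120, kind="NarrativeText"), Box(0, 0, 200, 20, kind="Title")]
extracted = [Box(0, 0, 200, 20, "Heading"), Box(0, 100, 200, 121, "Body text"),
             Box(300, 500, 400, 520, "Footer")]
for element in reading_order(merge(inferred, extracted)):
    print(element.kind, element.text)
```

Sorting both inputs before matching is what keeps the greedy assignment stable. If only the output were sorted, two regions competing for the same text box could still be resolved differently depending on input order.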

requirements/base.txt

Lines changed: 4 additions & 5 deletions
@@ -6,7 +6,7 @@
 #
 anyio==3.7.0
     # via httpcore
-argilla==1.10.0
+argilla==1.12.0
     # via -r requirements/base.in
 backoff==2.2.1
     # via argilla
@@ -53,7 +53,7 @@ idna==3.4
     #   rfc3986
 importlib-metadata==6.7.0
     # via markdown
-joblib==1.2.0
+joblib==1.3.1
     # via nltk
 lxml==4.9.2
     # via
@@ -130,13 +130,12 @@ tqdm==4.65.0
     # via
     #   argilla
     #   nltk
-typer==0.9.0
+typer==0.7.0
     # via argilla
-typing-extensions==4.6.3
+typing-extensions==4.7.0
     # via
     #   pydantic
     #   rich
-    #   typer
 urllib3==1.26.16
     # via
     #   -c requirements/constraints.in
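Most of the base.txt changes are upgrades, but `typer` moves backwards from 0.9.0 to 0.7.0 (still `# via argilla`, in the same commit that bumps argilla from 1.10.0 to 1.12.0), and `typing-extensions` is no longer annotated `# via typer`. When reviewing pip-compile output, one quick way to spot such downgrades is to compare the old and new `==` pins. The sketch below uses hypothetical helpers `parse_pins` and `classify`. It assumes every version is a plain dotted-integer string, which holds for the pins in this hunk but not for pre-releases or other PEP 440 forms (the `packaging` library's `Version` class handles those).

```python
import re
from typing import Dict, Optional, Tuple

# Matches pip-compile pin lines such as "typer==0.7.0"; "# via ..." comment lines never match.
# Assumption: versions are plain dotted integers (true for every pin in this hunk).
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([0-9]+(?:\.[0-9]+)*)$")


def parse_pins(text: str) -> Dict[str, Tuple[int, ...]]:
    """Map each pinned package name to its version as a tuple of ints."""
    pins: Dict[str, Tuple[int, ...]] = {}
    for line in text.splitlines():
        match = PIN.match(line.strip())
        if match:
            name, version = match.groups()
            pins[name.lower()] = tuple(int(part) for part in version.split("."))
    return pins


def classify(old: str, new: str) -> Dict[str, str]:
    """Label every package whose pin changed between two requirements texts."""
    before, after = parse_pins(old), parse_pins(new)
    changes: Dict[str, str] = {}
    for name in sorted(before.keys() | after.keys()):
        old_v: Optional[Tuple[int, ...]] = before.get(name)
        new_v = after.get(name)
        if old_v is None:
            changes[name] = "added"
        elif new_v is None:
            changes[name] = "removed"
        elif new_v > old_v:
            changes[name] = "upgraded"
        elif new_v < old_v:
            changes[name] = "downgraded"
        # Unchanged pins are left out of the result.
    return changes


# Pins taken from the base.txt hunks above (old side, then new side).
OLD = """argilla==1.10.0
    # via -r requirements/base.in
joblib==1.2.0
typer==0.9.0
typing-extensions==4.6.3
urllib3==1.26.16
"""
NEW = """argilla==1.12.0
    # via -r requirements/base.in
joblib==1.3.1
typer==0.7.0
typing-extensions==4.7.0
urllib3==1.26.16
"""

for package, change in classify(OLD, NEW).items():
    print(f"{package}: {change}")
```

Comparing integer tuples rather than raw strings matters for bumps like botocore's in ingest-s3.txt: as strings, `1.29.161` sorts before `1.29.76`.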

requirements/dev.txt

Lines changed: 7 additions & 7 deletions
@@ -82,7 +82,7 @@ importlib-metadata==6.7.0
     #   nbconvert
 importlib-resources==5.12.0
     # via jsonschema
-ipykernel==6.23.2
+ipykernel==6.23.3
     # via
     #   ipywidgets
     #   jupyter
@@ -121,7 +121,7 @@ jsonschema[format-nongpl]==4.17.3
     #   nbformat
 jupyter==1.0.0
     # via -r requirements/dev.in
-jupyter-client==8.2.0
+jupyter-client==8.3.0
     # via
     #   ipykernel
     #   jupyter-console
@@ -147,7 +147,7 @@ jupyter-core==5.3.1
     #   qtconsole
 jupyter-events==0.6.3
     # via jupyter-server
-jupyter-server==2.6.0
+jupyter-server==2.7.0
     # via
     #   nbclassic
     #   notebook-shim
@@ -219,7 +219,7 @@ pip-tools==6.13.0
     # via -r requirements/dev.in
 pkgutil-resolve-name==1.3.10
     # via jsonschema
-platformdirs==3.6.0
+platformdirs==3.8.0
     # via
     #   -c requirements/test.txt
     #   jupyter-core
@@ -352,12 +352,12 @@ traitlets==5.9.0
     #   nbformat
     #   notebook
     #   qtconsole
-typing-extensions==4.6.3
+typing-extensions==4.7.0
     # via
     #   -c requirements/base.txt
     #   -c requirements/test.txt
     #   ipython
-uri-template==1.2.0
+uri-template==1.3.0
     # via jsonschema
 virtualenv==20.23.1
     # via pre-commit
@@ -369,7 +369,7 @@ webencodings==0.5.1
     # via
     #   bleach
     #   tinycss2
-websocket-client==1.6.0
+websocket-client==1.6.1
     # via jupyter-server
 wheel==0.40.0
     # via

requirements/huggingface.txt

Lines changed: 2 additions & 2 deletions
@@ -32,7 +32,7 @@ idna==3.4
     #   requests
 jinja2==3.1.2
     # via torch
-joblib==1.2.0
+joblib==1.3.1
     # via
     #   -c requirements/base.txt
     #   sacremoses
@@ -92,7 +92,7 @@ tqdm==4.65.0
     #   transformers
 transformers==4.30.2
     # via -r requirements/huggingface.in
-typing-extensions==4.6.3
+typing-extensions==4.7.0
     # via
     #   -c requirements/base.txt
     #   huggingface-hub

requirements/ingest-azure.txt

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ six==1.16.0
     #   azure-core
     #   azure-identity
     #   isodate
-typing-extensions==4.6.3
+typing-extensions==4.7.0
     # via
     #   -c requirements/base.txt
     #   azure-core

requirements/ingest-discord.txt

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ charset-normalizer==3.1.0
     # via
     #   -c requirements/base.txt
     #   aiohttp
-discord-py==2.3.0
+discord-py==2.3.1
     # via -r requirements/ingest-discord.in
 frozenlist==1.3.3
     # via

requirements/ingest-gcs.txt

Lines changed: 2 additions & 2 deletions
@@ -40,7 +40,7 @@ google-api-core==2.11.1
     # via
     #   google-cloud-core
     #   google-cloud-storage
-google-auth==2.20.0
+google-auth==2.21.0
     # via
     #   gcsfs
     #   google-api-core
@@ -51,7 +51,7 @@ google-auth-oauthlib==1.0.0
     # via gcsfs
 google-cloud-core==2.3.2
     # via google-cloud-storage
-google-cloud-storage==2.9.0
+google-cloud-storage==2.10.0
     # via gcsfs
 google-crc32c==1.5.0
     # via google-resumable-media

requirements/ingest-google-drive.txt

Lines changed: 2 additions & 2 deletions
@@ -17,9 +17,9 @@ charset-normalizer==3.1.0
     #   requests
 google-api-core==2.11.1
     # via google-api-python-client
-google-api-python-client==2.90.0
+google-api-python-client==2.91.0
     # via -r requirements/ingest-google-drive.in
-google-auth==2.20.0
+google-auth==2.21.0
     # via
     #   google-api-core
     #   google-api-python-client

requirements/ingest-reddit.txt

Lines changed: 1 addition & 1 deletion
@@ -33,5 +33,5 @@ urllib3==1.26.16
     #   -c requirements/base.txt
     #   -c requirements/constraints.in
     #   requests
-websocket-client==1.6.0
+websocket-client==1.6.1
     # via praw

requirements/ingest-s3.txt

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@
 #
 #    pip-compile requirements/ingest-s3.in
 #
-aiobotocore==2.5.0
+aiobotocore==2.5.1
     # via s3fs
 aiohttp==3.8.4
     # via
@@ -18,7 +18,7 @@ async-timeout==4.0.2
     # via aiohttp
 attrs==23.1.0
     # via aiohttp
-botocore==1.29.76
+botocore==1.29.161
     # via aiobotocore
 charset-normalizer==3.1.0
     # via
@@ -52,7 +52,7 @@ six==1.16.0
     # via
     #   -c requirements/base.txt
     #   python-dateutil
-typing-extensions==4.6.3
+typing-extensions==4.7.0
     # via
     #   -c requirements/base.txt
     #   aioitertools
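Several of these files carry `-c requirements/base.txt` in their `# via` annotations, and the same new pins recur across them: `typing-extensions==4.7.0` in base.txt, dev.txt, huggingface.txt, ingest-azure.txt and ingest-s3.txt; `joblib==1.3.1` in base.txt and huggingface.txt; `google-auth==2.21.0` in both Google ingest files; and `websocket-client==1.6.1` in dev.txt and ingest-reddit.txt. If one file were recompiled and another were not, the same package could end up pinned to two different versions. The hypothetical `find_conflicts` helper below is a minimal sketch of a check for that. It is not part of this repository or of pip-tools, and it assumes pins appear as bare `name==version` lines.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, Set

# Matches pin lines like "typing-extensions==4.7.0"; comment lines never match.
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==(\S+)$")


def normalize(name: str) -> str:
    """PEP 503 name normalization, so "discord_py" and "discord-py" compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_conflicts(files: Dict[str, str]) -> Dict[str, Dict[str, Set[str]]]:
    """Return {package: {version: {file, ...}}} for packages pinned to more than one version."""
    seen: DefaultDict[str, DefaultDict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for filename, text in files.items():
        for line in text.splitlines():
            match = PIN.match(line.strip())
            if match:
                name, version = match.groups()
                seen[normalize(name)][version].add(filename)
    # Keep only packages that appear with two or more distinct versions.
    return {
        package: dict(versions)
        for package, versions in seen.items()
        if len(versions) > 1
    }


files = {
    "requirements/base.txt": "joblib==1.3.1\ntyping-extensions==4.7.0\n",
    "requirements/huggingface.txt": "joblib==1.3.1\n    # via\ntyping-extensions==4.7.0\n",
    # Hypothetical stale copy: ingest-s3.txt still holding its pre-commit typing-extensions pin.
    "requirements/ingest-s3.txt": "botocore==1.29.161\ntyping-extensions==4.6.3\n",
}
print(find_conflicts(files))
```

An empty result means every shared package resolves to a single version across the files checked.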
