Commit 1c519ef

luke-kucing and claude authored
Security Fixes - CVE Remediation (#4115)
Main Changes:

1. Removed Clarifai Dependency
- Completely removed the clarifai dependency which is no longer used in the codebase
- Removed clarifai from the unstructured-ingest extras list in requirements/ingest/ingest.txt:1
- Removed clarifai test script reference from test_unstructured_ingest/test-ingest-dest.sh:23

2. Updated Dependencies to Resolve CVEs
- pypdf: Updated from 6.1.1 → 6.1.3 (fixes GHSA-vr63-x8vc-m265)
- pip: Added explicit upgrade to >=25.3 in Dockerfile (fixes GHSA-4xh5-x5gv-qwph)
- uv: Addressed GHSA-8qf3-x8v5-2pj8 and GHSA-pqhf-p39g-3x64

3. Dockerfile Security Enhancements (Dockerfile:17,28-29)
- Added Alpine package upgrade for py3.12-pip
- Added explicit pip upgrade step before installing Python dependencies

4. General Dependency Updates
Ran pip-compile across all requirement files, resulting in updates to:
- cryptography: 46.0.2 → 46.0.3
- psutil: 7.1.0 → 7.1.3
- rapidfuzz: 3.14.1 → 3.14.3
- regex: 2025.9.18 → 2025.11.3
- wrapt: 1.17.3 → 2.0.0
- Plus many other transitive dependencies across all extra requirement files

5. Version Bump
- Updated version from 0.18.16 → 0.18.17 in unstructured/__version__.py:1
- Updated CHANGELOG.md with security fixes documentation

Impact: This PR resolves 4 CVEs total without introducing breaking changes, making it a pure security maintenance release.

---------

Co-authored-by: Claude <[email protected]>
1 parent c79cf3a commit 1c519ef

17 files changed: +168 -151 lines changed

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -1,3 +1,16 @@
+## 0.18.17
+
+### Enhancement
+
+### Features
+
+### Fixes
+- Removed `Clarifai` dependency as it is no longer used
+- Bumped dependencies via pip-compile to address the following CVEs:
+  - **pypdf**: GHSA-vr63-x8vc-m265
+  - **pip**: GHSA-4xh5-x5gv-qwph
+  - **uv**: GHSA-8qf3-x8v5-2pj8 GHSA-pqhf-p39g-3x64
+
 ## 0.18.16
 
 ### Enhancement

Dockerfile

Lines changed: 4 additions & 0 deletions
@@ -14,6 +14,7 @@ COPY example-docs example-docs
 
 RUN chown -R notebook-user:notebook-user /app && \
   apk add --no-cache font-ubuntu fontconfig git && \
+  apk upgrade --no-cache py3.12-pip && \
   fc-cache -fv && \
   [ -e /usr/bin/python3 ] || ln -s /usr/bin/$PYTHON /usr/bin/python3
 
@@ -24,6 +25,9 @@ ENV PATH="${PATH}:/home/notebook-user/.local/bin"
 ENV TESSDATA_PREFIX=/usr/local/share/tessdata
 ENV NLTK_DATA=/home/notebook-user/nltk_data
 
+# Upgrade pip to fix CVE-2025-8869
+RUN $PIP install --no-cache-dir --user --upgrade "pip>=25.3"
+
 # Install Python dependencies and download required NLTK packages
 RUN find requirements/ -type f -name "*.txt" ! -name "test.txt" ! -name "dev.txt" ! -name "constraints.txt" -exec $PIP install --no-cache-dir --user -r '{}' ';' && \
   mkdir -p ${NLTK_DATA} && \
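
The Dockerfile hunk above pins `pip>=25.3`. As a rough way to confirm that a built image (or any environment) actually ends up with a new enough pip, the following minimal sketch compares the installed version against that floor. The helper names (`numeric_parts`, `meets_minimum`, `installed_pip_is_patched`) are hypothetical and not part of this commit, and the comparison is deliberately simple: it only looks at the leading dotted integers, so a pre-release such as `25.3rc1` would be treated as `25.3`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Floor taken from the "pip>=25.3" specifier in the Dockerfile hunk.
MINIMUM_PIP = "25.3"


def numeric_parts(version_string: str) -> tuple[int, ...]:
    """Return the leading dotted-integer part, e.g. '25.3.1' -> (25, 3, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = numeric_parts(installed), numeric_parts(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_pip_is_patched() -> bool:
    """True if the pip visible to this interpreter is at least MINIMUM_PIP."""
    try:
        return meets_minimum(version("pip"), MINIMUM_PIP)
    except PackageNotFoundError:
        # pip is not installed in this environment at all.
        return False
```

Calling `installed_pip_is_patched()` with the image's interpreter would be one way to use this; a project that already depends on `packaging` could use its version parsing instead of the hand-rolled comparison here.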

requirements/base.txt

Lines changed: 8 additions & 8 deletions
@@ -27,7 +27,7 @@ click==8.3.0
     # via
     #   nltk
     #   python-oxmsg
-cryptography==46.0.2
+cryptography==46.0.3
     # via unstructured-client
 dataclasses-json==0.6.7
     # via
@@ -85,11 +85,11 @@ packaging==25.0
     # via
     #   marshmallow
     #   unstructured-client
-psutil==7.1.0
+psutil==7.1.3
     # via -r ./base.in
 pycparser==2.23
     # via cffi
-pypdf==6.1.1
+pypdf==6.1.3
     # via unstructured-client
 python-dateutil==2.9.0.post0
     # via unstructured-client
@@ -99,9 +99,9 @@ python-magic==0.4.27
     # via -r ./base.in
 python-oxmsg==0.0.2
     # via -r ./base.in
-rapidfuzz==3.14.1
+rapidfuzz==3.14.3
     # via -r ./base.in
-regex==2025.9.18
+regex==2025.11.3
     # via nltk
 requests==2.32.5
     # via
@@ -141,14 +141,14 @@ typing-inspect==0.9.0
     #   unstructured-client
 unstructured-client==0.25.9
     # via
-    #   -c ./deps/constraints.txt
+    #   -c /Users/luke/git/unstructured/requirements/deps/constraints.txt
     #   -r ./base.in
 urllib3==2.5.0
     # via
-    #   -c ./deps/constraints.txt
+    #   -c /Users/luke/git/unstructured/requirements/deps/constraints.txt
     #   requests
     #   unstructured-client
 webencodings==0.5.1
     # via html5lib
-wrapt==1.17.3
+wrapt==2.0.0
     # via -r ./base.in
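
The commit message names 6.1.3 as the pypdf version that fixes GHSA-vr63-x8vc-m265. A minimal sketch of how one might scan a compiled requirements file for pins that are still below such a floor is shown here. The function and constant names are hypothetical, and the parser only understands plain `name==X.Y.Z` pins with purely numeric versions; anything else (for example `2.9.0.post0`) is skipped rather than guessed at.

```python
import re

# Floor taken from the commit message ("pypdf: Updated from 6.1.1 → 6.1.3").
MINIMUMS = {"pypdf": "6.1.3"}

# Matches simple pins such as "pypdf==6.1.3"; annotation lines ("# via ...") do not match.
PIN = re.compile(r"^([A-Za-z0-9_.-]+)==([0-9]+(?:\.[0-9]+)*)$")


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '6.1.3' into (6, 1, 3); only handles numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def find_outdated_pins(requirements_text: str, minimums: dict[str, str]) -> dict[str, str]:
    """Return {package: pinned_version} for pins below the given minimums."""
    outdated = {}
    for line in requirements_text.splitlines():
        match = PIN.match(line.strip())
        if match is None:
            continue
        name, pinned = match.group(1).lower(), match.group(2)
        minimum = minimums.get(name)
        if minimum is not None and version_tuple(pinned) < version_tuple(minimum):
            outdated[name] = pinned
    return outdated
```

Run against the old `requirements/base.txt` contents, this would report the `pypdf==6.1.1` pin; against the updated file it would report nothing for pypdf.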

requirements/dev.txt

Lines changed: 9 additions & 9 deletions
@@ -10,8 +10,8 @@ cfgv==3.4.0
     # via pre-commit
 click==8.3.0
     # via
-    #   -c ./base.txt
-    #   -c ./test.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
+    #   -c /Users/luke/git/unstructured/requirements/test.txt
     #   pip-tools
 distlib==0.4.0
     # via virtualenv
@@ -23,14 +23,14 @@ nodeenv==1.9.1
     # via pre-commit
 packaging==25.0
     # via
-    #   -c ./base.txt
-    #   -c ./test.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
+    #   -c /Users/luke/git/unstructured/requirements/test.txt
     #   build
 pip-tools==7.5.1
     # via -r ./dev.in
 platformdirs==4.5.0
     # via
-    #   -c ./test.txt
+    #   -c /Users/luke/git/unstructured/requirements/test.txt
     #   virtualenv
 pre-commit==4.3.0
     # via -r ./dev.in
@@ -42,15 +42,15 @@ pyyaml==6.0.3
     # via pre-commit
 tomli==2.3.0
     # via
-    #   -c ./test.txt
+    #   -c /Users/luke/git/unstructured/requirements/test.txt
     #   build
     #   pip-tools
 typing-extensions==4.15.0
     # via
-    #   -c ./base.txt
-    #   -c ./test.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
+    #   -c /Users/luke/git/unstructured/requirements/test.txt
     #   virtualenv
-virtualenv==20.35.3
+virtualenv==20.35.4
     # via pre-commit
 wheel==0.45.1
     # via pip-tools
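
Several hunks in this commit change `-c ./base.txt` style annotations into absolute paths under `/Users/luke/...`. For anyone who wants to list such annotations in a compiled requirements file, the sketch below is one possible approach. The function name and regular expression are illustrative assumptions, not part of the commit or of pip-tools, and only POSIX-style absolute paths (starting with `/`) are detected.

```python
import re

# Matches pip-compile annotation lines of the form "    #   -c <path>".
CONSTRAINT_ANNOTATION = re.compile(r"^\s*#\s+-c\s+(\S+)")


def absolute_constraint_paths(requirements_text: str) -> list[str]:
    """Return constraint paths from '-c' annotations that are absolute (POSIX style)."""
    paths = []
    for line in requirements_text.splitlines():
        match = CONSTRAINT_ANNOTATION.match(line)
        if match and match.group(1).startswith("/"):
            paths.append(match.group(1))
    return paths
```

Applied to the new `requirements/dev.txt`, this would return the `/Users/luke/git/unstructured/requirements/...` entries; applied to the previous revision it would return an empty list.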

requirements/extra-csv.txt

Lines changed: 3 additions & 3 deletions
@@ -6,19 +6,19 @@
 #
 numpy==2.2.6
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   pandas
 pandas==2.3.3
     # via -r ./extra-csv.in
 python-dateutil==2.9.0.post0
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   pandas
 pytz==2025.2
     # via pandas
 six==1.17.0
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   python-dateutil
 tzdata==2025.2
     # via pandas

requirements/extra-docx.txt

Lines changed: 2 additions & 2 deletions
@@ -6,11 +6,11 @@
 #
 lxml==6.0.2
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   python-docx
 python-docx==1.2.0
     # via -r ./extra-docx.in
 typing-extensions==4.15.0
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   python-docx

requirements/extra-markdown.txt

Lines changed: 1 addition & 1 deletion
@@ -4,5 +4,5 @@
 #
 #    pip-compile ./extra-markdown.in
 #
-markdown==3.9
+markdown==3.10
     # via -r ./extra-markdown.in

requirements/extra-odt.txt

Lines changed: 2 additions & 2 deletions
@@ -6,13 +6,13 @@
 #
 lxml==6.0.2
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   python-docx
 pypandoc==1.15
     # via -r ./extra-odt.in
 python-docx==1.2.0
     # via -r ./extra-odt.in
 typing-extensions==4.15.0
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   python-docx

requirements/extra-paddleocr.txt

Lines changed: 31 additions & 31 deletions
@@ -14,65 +14,65 @@ annotated-types==0.7.0
     # via pydantic
 anyio==4.11.0
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   httpx
 beautifulsoup4==4.14.2
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   unstructured-paddleocr
 certifi==2025.10.5
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   httpcore
     #   httpx
     #   requests
 charset-normalizer==3.4.4
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   requests
-cython==3.1.4
+cython==3.2.0
     # via unstructured-paddleocr
 exceptiongroup==1.3.0
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   anyio
 fire==0.7.1
     # via unstructured-paddleocr
 fonttools==4.60.1
     # via unstructured-paddleocr
 h11==0.16.0
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   httpcore
 httpcore==1.0.9
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   httpx
 httpx==0.28.1
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   paddlepaddle
 idna==3.11
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   anyio
     #   httpx
     #   requests
-imageio==2.37.0
+imageio==2.37.2
     # via scikit-image
 lazy-loader==0.4
     # via scikit-image
 lxml==6.0.2
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   python-docx
 networkx==3.4.2
     # via
     #   paddlepaddle
     #   scikit-image
 numpy==2.2.6
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   albucore
     #   albumentations
     #   imageio
@@ -98,40 +98,40 @@ opt-einsum==3.3.0
     # via paddlepaddle
 packaging==25.0
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   lazy-loader
     #   scikit-image
-paddlepaddle==3.2.0
+paddlepaddle==3.2.1
     # via -r ./extra-paddleocr.in
-pillow==11.3.0
+pillow==12.0.0
     # via
     #   imageio
     #   paddlepaddle
     #   scikit-image
     #   unstructured-paddleocr
-protobuf==6.32.1
+protobuf==6.33.0
     # via
-    #   -c ./deps/constraints.txt
+    #   -c /Users/luke/git/unstructured/requirements/deps/constraints.txt
     #   paddlepaddle
 pyclipper==1.3.0.post6
     # via unstructured-paddleocr
-pydantic==2.12.2
+pydantic==2.12.4
     # via albumentations
-pydantic-core==2.41.4
+pydantic-core==2.41.5
     # via pydantic
 python-docx==1.2.0
     # via unstructured-paddleocr
 pyyaml==6.0.3
     # via
     #   albumentations
     #   unstructured-paddleocr
-rapidfuzz==3.14.1
+rapidfuzz==3.14.3
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   unstructured-paddleocr
 requests==2.32.5
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   unstructured-paddleocr
 safetensors==0.6.2
     # via paddlepaddle
@@ -147,25 +147,25 @@ simsimd==6.5.3
     # via albucore
 sniffio==1.3.1
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   anyio
 soupsieve==2.8
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   beautifulsoup4
-stringzilla==4.2.1
+stringzilla==4.2.3
     # via albucore
-termcolor==3.1.0
+termcolor==3.2.0
     # via fire
 tifffile==2025.5.10
     # via scikit-image
 tqdm==4.67.1
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   unstructured-paddleocr
 typing-extensions==4.15.0
     # via
-    #   -c ./base.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
     #   anyio
     #   beautifulsoup4
     #   exceptiongroup
@@ -180,6 +180,6 @@ unstructured-paddleocr==2.10.0
     # via -r ./extra-paddleocr.in
 urllib3==2.5.0
     # via
-    #   -c ./base.txt
-    #   -c ./deps/constraints.txt
+    #   -c /Users/luke/git/unstructured/requirements/base.txt
+    #   -c /Users/luke/git/unstructured/requirements/deps/constraints.txt
     #   requests
