
Commit ab25fb9

remove layoutparser lib (#403)
This PR removes the `layoutparser` lib, since we no longer rely on it; the `README.md` is accordingly updated to drop the note on supporting the layoutparser model zoo.
1 parent 6895ddb commit ab25fb9

9 files changed (+130, -191 lines)

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -1,3 +1,9 @@
+## 0.8.3
+
+* fix: removed `layoutelement.from_lp_textblock()` and related tests as it's not used
+* fix: update requirements to drop `layoutparser` lib
+* fix: update `README.md` to remove layoutparser model zoo support note
+
 ## 0.8.2
 
 * fix: fix bug when an empty list is passed into `TextRegions.from_list` triggers `IndexError`
```

README.md

Lines changed: 0 additions & 4 deletions
````diff
@@ -72,10 +72,6 @@ model = get_model("yolox")
 layout = DocumentLayout.from_file("sample-docs/layout-parser-paper.pdf", detection_model=model)
 ```
 
-### Using models from the layoutparser model zoo
-
-The `UnstructuredDetectronModel` class in `unstructured_inference.modelts.detectron2` uses the `faster_rcnn_R_50_FPN_3x` model pretrained on DocLayNet, but by using different construction parameters, any model in the `layoutparser` [model zoo](https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html) can be used. `UnstructuredDetectronModel` is a light wrapper around the `layoutparser` `Detectron2LayoutModel` object, and accepts the same arguments. See [layoutparser documentation](https://layout-parser.readthedocs.io/en/latest/api_doc/models.html#layoutparser.models.Detectron2LayoutModel) for details.
-
 ### Using your own model
 
 Any detection model can be used for in the `unstructured_inference` pipeline by wrapping the model in the `UnstructuredObjectDetectionModel` class. To integrate with the `DocumentLayout` class, a subclass of `UnstructuredObjectDetectionModel` must have a `predict` method that accepts a `PIL.Image.Image` and returns a list of `LayoutElement`s, and an `initialize` method, which loads the model and prepares it for inference.
````
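The contract described in the "Using your own model" context lines above (an `initialize` method that loads the model, and a `predict` method that maps a `PIL.Image.Image` to a list of `LayoutElement`s) can be sketched as follows. To keep the sketch self-contained, `UnstructuredObjectDetectionModel` and `LayoutElement` here are simplified, hypothetical stand-ins rather than the real `unstructured_inference` classes, whose constructors differ; `WholePageModel` is a toy invented for illustration, and a plain object with a `.size` attribute stands in for a PIL image.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, List


# Hypothetical, simplified stand-ins so the sketch runs without unstructured_inference.
# The real LayoutElement and UnstructuredObjectDetectionModel have different constructors.
@dataclass
class LayoutElement:
    x1: float
    y1: float
    x2: float
    y2: float
    type: str


class UnstructuredObjectDetectionModel(ABC):
    @abstractmethod
    def initialize(self, *args: Any, **kwargs: Any) -> None:
        """Load the model and prepare it for inference."""

    @abstractmethod
    def predict(self, x: Any) -> List[LayoutElement]:
        """Return the layout elements detected on a page image."""


class WholePageModel(UnstructuredObjectDetectionModel):
    """Toy detector (hypothetical) that labels the whole page as one region."""

    def initialize(self, label: str = "Text") -> None:
        # A real subclass would load weights or an inference session here.
        self.label = label

    def predict(self, x: Any) -> List[LayoutElement]:
        # PIL.Image.Image exposes its dimensions as .size == (width, height).
        width, height = x.size
        return [LayoutElement(0, 0, width, height, self.label)]


model = WholePageModel()
model.initialize()
page = SimpleNamespace(size=(612, 792))  # stand-in for a PIL.Image.Image
print(model.predict(page))
```

With the real classes, the README's `DocumentLayout.from_file(..., detection_model=model)` call suggests that an instance like this is what would be passed as `detection_model`.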

requirements/base.in

Lines changed: 3 additions & 1 deletion
```diff
@@ -1,5 +1,4 @@
 -c constraints.in
-layoutparser
 python-multipart
 huggingface-hub
 numpy<2
@@ -12,3 +11,6 @@ timm
 # NOTE(alan): Pinned because this is when the most recent module we import appeared
 transformers>=4.25.1
 rapidfuzz
+pandas
+scipy
+pdfplumber
```
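Why do `pandas`, `scipy` and `pdfplumber` now appear as direct requirements? In the pre-commit `requirements/base.txt` (diffed below) each of them is annotated only `# via layoutparser`, so removing `layoutparser` by itself would have dropped them from the compiled lock file. The following is a minimal sketch, not part of the repository, that finds such orphaned pins by walking the `# via` annotations that pip-compile writes. It assumes pip-compile's usual layout (`    # via parent`, or a bare `    # via` followed by `    #   parent` lines); `parse_via` and `orphaned` are hypothetical helper names, and `OLD_BASE_TXT` is a trimmed excerpt of the old pins.

```python
import re

# Trimmed excerpt of requirements/base.txt *before* this commit
# (only entries relevant to layoutparser are reproduced).
OLD_BASE_TXT = """\
iopath==0.1.10
    # via layoutparser
layoutparser==0.3.4
    # via -r requirements/base.in
opencv-python==4.10.0.84
    # via
    #   -r requirements/base.in
    #   layoutparser
pandas==2.2.2
    # via layoutparser
pdf2image==1.17.0
    # via layoutparser
pdfminer-six==20231228
    # via pdfplumber
pdfplumber==0.11.4
    # via layoutparser
portalocker==2.10.1
    # via iopath
scipy==1.13.1
    # via layoutparser
tqdm==4.66.5
    # via
    #   huggingface-hub
    #   iopath
    #   transformers
"""

PIN = re.compile(r"^([A-Za-z0-9_.\-]+)==")


def parse_via(text: str) -> dict:
    """Map each pinned package to the set of parents in its '# via' comment."""
    via = {}
    current = None
    multiline = False
    for raw in text.splitlines():
        line = raw.strip()
        pin = PIN.match(line)
        if pin:
            current = pin.group(1).lower()
            via[current] = set()
            multiline = False
        elif current is not None and line.startswith("#"):
            body = line.lstrip("#").strip()
            if body == "via":
                multiline = True  # parents follow on their own '#   name' lines
            elif body.startswith("via "):
                via[current].add(body[4:].strip())
                multiline = False
            elif multiline and body:
                via[current].add(body)
    return via


def orphaned(via: dict, removed: set, added_direct: frozenset = frozenset()) -> set:
    """Packages that transitively lose every parent once `removed` is dropped,
    unless they are re-declared directly in `added_direct`."""
    gone = set(removed)
    changed = True
    while changed:  # repeat until no new package becomes parentless
        changed = False
        for pkg, parents in via.items():
            if pkg in gone or pkg in added_direct:
                continue
            if parents and parents <= gone:
                gone.add(pkg)
                changed = True
    return gone - set(removed)


via = parse_via(OLD_BASE_TXT)
print(sorted(orphaned(via, {"layoutparser"})))
print(sorted(orphaned(via, {"layoutparser"}, frozenset({"pandas", "scipy", "pdfplumber"}))))
```

Without the new direct entries, the first call also drops `pdfminer-six`, which hangs off `pdfplumber`. Re-declaring `pandas`, `scipy` and `pdfplumber` leaves only `iopath`, `pdf2image` and `portalocker` orphaned, which matches the deletions in the `requirements/base.txt` diff.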

requirements/base.txt

Lines changed: 41 additions & 57 deletions
```diff
@@ -4,58 +4,54 @@
 #
 #    pip-compile requirements/base.in
 #
-certifi==2024.8.30
+certifi==2024.12.14
     # via requests
 cffi==1.17.1
     # via cryptography
-charset-normalizer==3.3.2
+charset-normalizer==3.4.1
     # via
     #   pdfminer-six
     #   requests
 coloredlogs==15.0.1
     # via onnxruntime
 contourpy==1.3.0
     # via matplotlib
-cryptography==43.0.1
+cryptography==44.0.0
     # via pdfminer-six
 cycler==0.12.1
     # via matplotlib
-filelock==3.16.0
+filelock==3.16.1
     # via
     #   huggingface-hub
     #   torch
     #   transformers
-flatbuffers==24.3.25
+flatbuffers==24.12.23
     # via onnxruntime
-fonttools==4.53.1
+fonttools==4.55.3
     # via matplotlib
-fsspec==2024.9.0
+fsspec==2024.12.0
     # via
     #   huggingface-hub
     #   torch
-huggingface-hub==0.24.7
+huggingface-hub==0.27.1
     # via
     #   -r requirements/base.in
     #   timm
     #   tokenizers
     #   transformers
 humanfriendly==10.0
     # via coloredlogs
-idna==3.8
+idna==3.10
     # via requests
-importlib-resources==6.4.5
+importlib-resources==6.5.2
     # via matplotlib
-iopath==0.1.10
-    # via layoutparser
-jinja2==3.1.4
+jinja2==3.1.5
     # via torch
 kiwisolver==1.4.7
     # via matplotlib
-layoutparser==0.3.4
-    # via -r requirements/base.in
-markupsafe==2.1.5
+markupsafe==3.0.2
     # via jinja2
-matplotlib==3.9.2
+matplotlib==3.9.4
     # via -r requirements/base.in
 mpmath==1.3.0
     # via sympy
@@ -65,7 +61,6 @@ numpy==1.26.4
     # via
     #   -r requirements/base.in
     #   contourpy
-    #   layoutparser
     #   matplotlib
     #   onnx
     #   onnxruntime
@@ -74,107 +69,96 @@ numpy==1.26.4
     #   scipy
     #   torchvision
     #   transformers
-onnx==1.16.2
+onnx==1.17.0
     # via -r requirements/base.in
 onnxruntime==1.19.2
     # via -r requirements/base.in
-opencv-python==4.10.0.84
-    # via
-    #   -r requirements/base.in
-    #   layoutparser
-packaging==24.1
+opencv-python==4.11.0.86
+    # via -r requirements/base.in
+packaging==24.2
     # via
     #   huggingface-hub
     #   matplotlib
     #   onnxruntime
     #   transformers
-pandas==2.2.2
-    # via layoutparser
-pdf2image==1.17.0
-    # via layoutparser
+pandas==2.2.3
+    # via -r requirements/base.in
 pdfminer-six==20231228
     # via pdfplumber
-pdfplumber==0.11.4
-    # via layoutparser
-pillow==10.4.0
+pdfplumber==0.11.5
+    # via -r requirements/base.in
+pillow==11.1.0
     # via
-    #   layoutparser
     #   matplotlib
-    #   pdf2image
     #   pdfplumber
     #   torchvision
-portalocker==2.10.1
-    # via iopath
-protobuf==5.28.1
+protobuf==5.29.3
     # via
     #   onnx
     #   onnxruntime
 pycparser==2.22
     # via cffi
-pyparsing==3.1.4
+pyparsing==3.2.1
     # via matplotlib
-pypdfium2==4.30.0
+pypdfium2==4.30.1
     # via pdfplumber
 python-dateutil==2.9.0.post0
     # via
     #   matplotlib
     #   pandas
-python-multipart==0.0.9
+python-multipart==0.0.20
     # via -r requirements/base.in
 pytz==2024.2
     # via pandas
 pyyaml==6.0.2
     # via
     #   huggingface-hub
-    #   layoutparser
     #   timm
     #   transformers
-rapidfuzz==3.9.7
+rapidfuzz==3.11.0
     # via -r requirements/base.in
-regex==2024.9.11
+regex==2024.11.6
     # via transformers
 requests==2.32.3
     # via
     #   huggingface-hub
     #   transformers
-safetensors==0.4.5
+safetensors==0.5.2
     # via
     #   timm
     #   transformers
 scipy==1.13.1
-    # via layoutparser
-six==1.16.0
+    # via -r requirements/base.in
+six==1.17.0
     # via python-dateutil
-sympy==1.13.2
+sympy==1.13.1
     # via
     #   onnxruntime
     #   torch
-timm==1.0.9
+timm==1.0.13
     # via -r requirements/base.in
-tokenizers==0.19.1
+tokenizers==0.21.0
     # via transformers
-torch==2.4.1
+torch==2.5.1
     # via
     #   -r requirements/base.in
     #   timm
     #   torchvision
-torchvision==0.19.1
+torchvision==0.20.1
     # via timm
-tqdm==4.66.5
+tqdm==4.67.1
     # via
     #   huggingface-hub
-    #   iopath
     #   transformers
-transformers==4.44.2
+transformers==4.48.0
     # via -r requirements/base.in
 typing-extensions==4.12.2
     # via
     #   huggingface-hub
-    #   iopath
     #   torch
-tzdata==2024.1
+tzdata==2024.2
     # via pandas
-urllib3==2.2.3
+urllib3==2.3.0
     # via requests
-zipp==3.20.2
+zipp==3.21.0
     # via importlib-resources
```
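Besides the `layoutparser`-related deletions, this diff re-pins most packages, and not always upward: `sympy` moves from 1.13.2 to 1.13.1, while `python-multipart` goes from 0.0.9 to 0.0.20, a case where comparing version strings lexically gives the wrong answer. A small hypothetical helper like the one below can summarise such a lock-file diff. It is a sketch that assumes every pin is a plain `name==version` line and compares versions as integer tuples, which is only valid for purely numeric dotted versions; `packaging.version.Version` is the PEP 440-aware choice for real use. `classify`, `naive_version` and `DIFF_EXCERPT` are illustrative names, and the excerpt is copied from the diff above.

```python
import re

# A few pin changes copied from the requirements/base.txt diff in this commit.
DIFF_EXCERPT = """\
-certifi==2024.8.30
+certifi==2024.12.14
-iopath==0.1.10
-    # via layoutparser
-python-multipart==0.0.9
+python-multipart==0.0.20
-sympy==1.13.2
+sympy==1.13.1
-torch==2.4.1
+torch==2.5.1
"""

# A removed/added pin line: sign, package name, version. Annotation lines do not match.
PIN = re.compile(r"^([+-])([A-Za-z0-9_.\-]+)==(\S+)$")


def naive_version(version: str) -> tuple:
    """Integer tuple for purely numeric dotted versions only (not PEP 440)."""
    return tuple(int(part) for part in version.split("."))


def classify(diff_text: str) -> dict:
    """Summarise removed, added, upgraded and downgraded pins in a unified diff."""
    old, new = {}, {}
    for line in diff_text.splitlines():
        match = PIN.match(line)
        if match:
            sign, name, version = match.groups()
            (old if sign == "-" else new)[name.lower()] = version
    summary = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            summary[name] = f"removed ({old[name]})"
        elif name not in old:
            summary[name] = f"added ({new[name]})"
        else:
            before, after = naive_version(old[name]), naive_version(new[name])
            if after > before:
                verdict = "upgraded"
            elif after < before:
                verdict = "downgraded"
            else:
                verdict = "unchanged"
            summary[name] = f"{verdict} {old[name]} -> {new[name]}"
    return summary


for name, change in classify(DIFF_EXCERPT).items():
    print(f"{name}: {change}")
```

Comparing integer tuples is what makes `(0, 0, 20) > (0, 0, 9)` come out right, whereas the string `"0.0.20"` sorts before `"0.0.9"`.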
