
Commit 6bc9140 (parent 80bf928)

Authored by Peter Staar (PeterStaar-IBM), with Christoph Auer (cau-git), Nikos Livathinos (nikos-livathinos), and Maksym Lysak (maxmnemonic)

Add omnidocbench, many optimizations (#4)

* adding the omnidocbench benchmark
* added the table-parsing in omnidocbench
* finished the OmniDocBench implementation
* reformatted the code
* updated the README
* updated the README and the cli
* clean up the DP-Bench example
* made the DPBench and OmniDocBench follow the same example code
* reformatted the code
* cleaned up the dp-bench create script
* added the ability to see the clusters and reading order for layout
* reformatted the code
* working on making datasets from pdf collections
* added the package_pdfs example
* added the FinTabNet-OTSL benchmark
* reformatted the code
* added the fintabnet example evaluation
* updated the README for FinTabNet
* updated the README
* refactored the table evaluations
* added the text inclusion in the table prediction
* fixed the header of the HTML
* reformatted the code
* fix: Formatting and unused code cleanup
* feat: Extend the CLI to create the OMNIDOCBENCH datasets for the layout and tableformer modalities
* Added exit to benchmark end-to-end scripts in case git-lfs is not installed (#5)
* fix: Use TableStructureModel from docling, use backends, fix bounding box coordinates
* Reinstate layout test on dpbench
* Comments
* Remove unused code
* Remove more unused code
* Fixes for Omnidoc
* Fixes for layout eval bounding boxes
* More fixes for OmniDoc, README updates
* Replace git-lfs with HF snapshot_download

Signed-off-by: Peter Staar <taa@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Maksym Lysak <mly@zurich.ibm.com>
Co-authored-by: Christoph Auer <cau@zurich.ibm.com>
Co-authored-by: Nikos Livathinos <nli@zurich.ibm.com>
Co-authored-by: Maxim Lysak <101627549+maxmnemonic@users.noreply.github.com>
Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>

28 files changed: +3199 / -824 lines

README.md: 255 additions, 15 deletions
````diff
@@ -61,18 +61,19 @@ poetry run evaluate -t create -m layout -b DPBench -i <location-of-dpbench> -o .
 poetry run evaluate -t evaluate -m layout -b DPBench -i ./benchmarks/dpbench-layout -o ./benchmarks/dpbench-layout
 ```
-| id | label          | MaP[0.5:0.95] |
-| -- | -------------- | ------------- |
-| 0  | page_header    | 0.151 |
-| 1  | text           | 0.678 |
-| 2  | section_header | 0.443 |
-| 3  | footnote       | 0.221 |
-| 4  | picture        | 0.761 |
-| 5  | caption        | 0.458 |
-| 6  | page_footer    | 0.344 |
-| 7  | document_index | 0.755 |
-| 8  | formula        | 0.066 |
-| 9  | table          | 0.891 |
+| label          | Class mAP[0.5:0.95] |
+|----------------|---------------------|
+| table          | 89.08 |
+| picture        | 76.1  |
+| document_index | 75.52 |
+| text           | 67.8  |
+| caption        | 45.8  |
+| section_header | 44.26 |
+| page_footer    | 34.42 |
+| list_item      | 29.04 |
+| footnote       | 22.08 |
+| page_header    | 15.11 |
+| formula        | 6.62  |
 </details>

 <details>
````
````diff
@@ -82,26 +83,265 @@ poetry run evaluate -t evaluate -m layout -b DPBench -i ./benchmarks/dpbench-lay
 👉 Create the dataset,

 ```sh
-poetry run evaluate -t create -m tableformer -b DPBench -i <location-of-dpbench> -o ./benchmarks/dpbench-tableformer
+poetry run evaluate -t create -m tableformer -b DPBench -i ./benchmarks/dpbench-original -o ./benchmarks/dpbench-dataset/tableformer
 ```

 👉 Evaluate the dataset,

 ```sh
-poetry run evaluate -t evaluate -m tableformer -b DPBench -i ./benchmarks/dpbench-tableformer -o ./benchmarks/dpbench-tableformer
+poetry run evaluate -t evaluate -m tableformer -b DPBench -i ./benchmarks/dpbench-dataset/tableformer -o ./benchmarks/dpbench-dataset/tableformer
 ```

 👉 Visualise the dataset,

 ```sh
-poetry run evaluate -t visualize -m tableformer -b DPBench -i ./benchmarks/dpbench-tableformer -o ./benchmarks/dpbench-tableformer
+poetry run evaluate -t visualize -m tableformer -b DPBench -i ./benchmarks/dpbench-dataset/tableformer -o ./benchmarks/dpbench-dataset/tableformer
 ```

 The final result can be visualised as,

 ![DPBench_TEDS](./docs/evaluations/evaluation_DPBench_tableformer.png)
 </details>
````
### OmniDocBench

Using a single command,

```sh
poetry run python ./docs/examples/benchmark_omnidocbench.py
```

<details>
<summary><b>Layout evaluation for OmniDocBench</b></summary>
<br>

👉 Create the dataset,

```sh
poetry run evaluate -t create -m layout -b OmniDocBench -i ./benchmarks/omnidocbench-original -o ./benchmarks/omnidocbench-dataset/layout
```

👉 Evaluate the dataset,

```sh
poetry run evaluate -t evaluate -m layout -b OmniDocBench -i ./benchmarks/omnidocbench-dataset/layout -o ./benchmarks/omnidocbench-dataset/layout
```

👉 Visualise the dataset,

```sh
poetry run evaluate -t visualize -m layout -b OmniDocBench -i ./benchmarks/omnidocbench-dataset/layout -o ./benchmarks/omnidocbench-dataset/layout
```

| label          | Class mAP[0.5:0.95] |
|----------------|---------------------|
| table          | 69.32 |
| picture        | 29.29 |
| text           | 23.99 |
| page_footer    | 16.14 |
| section_header | 13.09 |
| caption        | 10.74 |
| page_header    | 10.02 |
| formula        | 3.83  |
| footnote       | 2.48  |
</details>

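The layout tables above report Class mAP[0.5:0.95], i.e. average precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention). The sketch below illustrates just that thresholding step on a hypothetical single-box case; the helper names and boxes are ours, not from docling-eval.

```python
# Minimal illustration of the mAP[0.5:0.95] convention: a prediction is
# scored against each IoU threshold 0.50, 0.55, ..., 0.95 and the results
# are averaged. Boxes are (x0, y0, x1, y1) in pixels; toy data only.

def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def matched_fraction(pred, gt, thresholds):
    """Fraction of IoU thresholds at which pred still counts as a match
    (a stand-in for the per-threshold AP average on this one-box case)."""
    score = iou(pred, gt)
    return sum(score >= t for t in thresholds) / len(thresholds)

thresholds = [0.50 + 0.05 * i for i in range(10)]  # 0.50 .. 0.95
gt = (0, 0, 100, 100)
pred = (18, 0, 100, 100)  # IoU = 0.82: a match up to the 0.80 threshold
print(matched_fraction(pred, gt, thresholds))  # prints 0.7
```

A real evaluation additionally ranks predictions by confidence and integrates the precision-recall curve per class before averaging, which is why the tables report one number per label.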
<details>
<summary><b>Table evaluations for OmniDocBench</b></summary>
<br>

👉 Create the dataset,

```sh
poetry run evaluate -t create -m tableformer -b OmniDocBench -i ./benchmarks/omnidocbench-original -o ./benchmarks/omnidocbench-dataset/tableformer
```

👉 Evaluate the dataset,

```sh
poetry run evaluate -t evaluate -m tableformer -b OmniDocBench -i ./benchmarks/omnidocbench-dataset/tableformer -o ./benchmarks/omnidocbench-dataset/tableformer
```

👉 Visualise the dataset,

```sh
poetry run evaluate -t visualize -m tableformer -b OmniDocBench -i ./benchmarks/omnidocbench-dataset/tableformer -o ./benchmarks/omnidocbench-dataset/tableformer
```

The final result can be visualised as,

| x0<=TEDS | TEDS<=x1 | prob [%] | acc [%] | 1-acc [%] | total |
|----------|----------|----------|---------|-----------|-------|
| 0        | 0.05     | 0.61     | 0       | 100       | 2     |
| 0.05     | 0.1      | 0        | 0.61    | 99.39     | 0     |
| 0.1      | 0.15     | 0.61     | 0.61    | 99.39     | 2     |
| 0.15     | 0.2      | 0        | 1.21    | 98.79     | 0     |
| 0.2      | 0.25     | 0.3      | 1.21    | 98.79     | 1     |
| 0.25     | 0.3      | 1.21     | 1.52    | 98.48     | 4     |
| 0.3      | 0.35     | 2.12     | 2.73    | 97.27     | 7     |
| 0.35     | 0.4      | 0.91     | 4.85    | 95.15     | 3     |
| 0.4      | 0.45     | 0.91     | 5.76    | 94.24     | 3     |
| 0.45     | 0.5      | 0.91     | 6.67    | 93.33     | 3     |
| 0.5      | 0.55     | 2.12     | 7.58    | 92.42     | 7     |
| 0.55     | 0.6      | 3.03     | 9.7     | 90.3      | 10    |
| 0.6      | 0.65     | 3.33     | 12.73   | 87.27     | 11    |
| 0.65     | 0.7      | 3.94     | 16.06   | 83.94     | 13    |
| 0.7      | 0.75     | 7.27     | 20      | 80        | 24    |
| 0.75     | 0.8      | 6.97     | 27.27   | 72.73     | 23    |
| 0.8      | 0.85     | 13.33    | 34.24   | 65.76     | 44    |
| 0.85     | 0.9      | 13.33    | 47.58   | 52.42     | 44    |
| 0.9      | 0.95     | 22.12    | 60.91   | 39.09     | 73    |
| 0.95     | 1        | 16.97    | 83.03   | 16.97     | 56    |
</details>

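The TEDS tables above are histograms of per-table scores: each row is a 0.05-wide bin, prob [%] is the share of tables in that bin, total the raw count, acc [%] accumulates the share below the bin's lower edge, and 1-acc [%] is its complement. A minimal sketch with synthetic scores (the column semantics are inferred from the table; the function name is ours):

```python
# Build a TEDS-style histogram: 20 bins of width 0.05 with per-bin share
# and a running cumulative share, matching the columns in the tables above.
# The scores below are synthetic, for illustration only.

def teds_histogram(scores, nbins=20):
    rows, n = [], len(scores)
    acc = 0.0  # cumulative share below the current bin's lower edge
    for i in range(nbins):
        x0, x1 = i / nbins, (i + 1) / nbins
        # the last bin is closed on the right so a perfect 1.0 is counted
        total = sum(1 for s in scores
                    if x0 <= s < x1 or (i == nbins - 1 and s == 1.0))
        prob = 100.0 * total / n
        rows.append((x0, x1, prob, acc, 100.0 - acc, total))
        acc += prob
    return rows

scores = [0.92, 0.96, 0.88, 0.41, 0.96, 0.73, 0.91, 0.99, 0.86, 0.64]
for row in teds_histogram(scores):
    print("| %.2f | %.2f | %5.1f | %5.1f | %5.1f | %3d |" % row)
```

Note the identity visible in every table: each row's acc [%] equals the sum of prob [%] over all earlier rows, and the last row's acc plus its prob is 100.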
### FinTabNet

Using a single command (loading the dataset from Huggingface: [FinTabNet_OTSL](https://huggingface.co/datasets/ds4sd/FinTabNet_OTSL)),

```sh
poetry run python ./docs/examples/benchmark_fintabnet.py
```

<details>
<summary><b>Table evaluations for FinTabNet</b></summary>
<br>

👉 Evaluate the dataset,

```sh
poetry run evaluate -t evaluate -m tableformer -b FinTabNet -i ./benchmarks/fintabnet-dataset/tableformer -o ./benchmarks/fintabnet-dataset/tableformer
```

👉 Visualise the dataset,

```sh
poetry run evaluate -t visualize -m tableformer -b FinTabNet -i ./benchmarks/fintabnet-dataset/tableformer -o ./benchmarks/fintabnet-dataset/tableformer
```

The final result (struct only here) can be visualised as,

| x0<=TEDS | TEDS<=x1 | prob [%] | acc [%] | 1-acc [%] | total |
|----------|----------|----------|---------|-----------|-------|
| 0        | 0.05     | 0        | 0       | 100       | 0     |
| 0.05     | 0.1      | 0        | 0       | 100       | 0     |
| 0.1      | 0.15     | 0        | 0       | 100       | 0     |
| 0.15     | 0.2      | 0.2      | 0       | 100       | 2     |
| 0.2      | 0.25     | 0        | 0.2     | 99.8      | 0     |
| 0.25     | 0.3      | 0        | 0.2     | 99.8      | 0     |
| 0.3      | 0.35     | 0        | 0.2     | 99.8      | 0     |
| 0.35     | 0.4      | 0        | 0.2     | 99.8      | 0     |
| 0.4      | 0.45     | 0        | 0.2     | 99.8      | 0     |
| 0.45     | 0.5      | 0        | 0.2     | 99.8      | 0     |
| 0.5      | 0.55     | 0.3      | 0.2     | 99.8      | 3     |
| 0.55     | 0.6      | 0.5      | 0.5     | 99.5      | 5     |
| 0.6      | 0.65     | 0.7      | 1       | 99        | 7     |
| 0.65     | 0.7      | 0.6      | 1.7     | 98.3      | 6     |
| 0.7      | 0.75     | 1.5      | 2.3     | 97.7      | 15    |
| 0.75     | 0.8      | 3.3      | 3.8     | 96.2      | 33    |
| 0.8      | 0.85     | 15.3     | 7.1     | 92.9      | 153   |
| 0.85     | 0.9      | 19       | 22.4    | 77.6      | 190   |
| 0.9      | 0.95     | 30.7     | 41.4    | 58.6      | 307   |
| 0.95     | 1        | 27.9     | 72.1    | 27.9      | 279   |
</details>

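TEDS (Tree Edit Distance based Similarity) scores a predicted table against the ground truth as 1 - edit_distance / max(tree sizes); the "struct only" variant above drops cell text and compares the structure alone. The real metric uses a tree edit distance over HTML table trees; the sketch below substitutes a plain Levenshtein distance over flattened structure tokens, so it is an illustration of the formula, not the actual algorithm, and all names are ours.

```python
# Simplified TEDS-style score: 1 - edit_distance / max(sequence lengths),
# computed here over flattened HTML structure tokens with Levenshtein
# distance instead of a true tree edit distance (illustration only).

def levenshtein(a, b):
    """Classic dynamic-programming edit distance over two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[-1] + 1,        # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def struct_similarity(pred_tokens, gt_tokens):
    dist = levenshtein(pred_tokens, gt_tokens)
    return 1.0 - dist / max(len(pred_tokens), len(gt_tokens))

gt   = ["<table>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</table>"]
pred = ["<table>", "<tr>", "<td>", "</td>", "</tr>", "</table>"]  # one cell missing
print(struct_similarity(pred, gt))  # prints 0.75
```

A prediction missing one of two cells thus lands in the 0.75-0.8 bin of the tables above; a perfect structure scores 1.0.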
### Pub1M

Using a single command (loading the dataset from Huggingface: [Pub1M_OTSL](https://huggingface.co/datasets/ds4sd/Pub1M_OTSL)),

```sh
poetry run python ./docs/examples/benchmark_p1m.py
```

<details>
<summary><b>Table evaluations for Pub1M</b></summary>
<br>

👉 Evaluate the dataset,

```sh
poetry run evaluate -t evaluate -m tableformer -b Pub1M -i ./benchmarks/Pub1M-dataset/tableformer -o ./benchmarks/Pub1M-dataset/tableformer
```

👉 Visualise the dataset,

```sh
poetry run evaluate -t visualize -m tableformer -b Pub1M -i ./benchmarks/Pub1M-dataset/tableformer -o ./benchmarks/Pub1M-dataset/tableformer
```

| x0<=TEDS | TEDS<=x1 | prob [%] | acc [%] | 1-acc [%] | total |
|----------|----------|----------|---------|-----------|-------|
| 0        | 0.05     | 1.3      | 0       | 100       | 13    |
| 0.05     | 0.1      | 0.8      | 1.3     | 98.7      | 8     |
| 0.1      | 0.15     | 0.2      | 2.1     | 97.9      | 2     |
| 0.15     | 0.2      | 0.2      | 2.3     | 97.7      | 2     |
| 0.2      | 0.25     | 0        | 2.5     | 97.5      | 0     |
| 0.25     | 0.3      | 0        | 2.5     | 97.5      | 0     |
| 0.3      | 0.35     | 0.3      | 2.5     | 97.5      | 3     |
| 0.35     | 0.4      | 0        | 2.8     | 97.2      | 0     |
| 0.4      | 0.45     | 0.1      | 2.8     | 97.2      | 1     |
| 0.45     | 0.5      | 0.3      | 2.9     | 97.1      | 3     |
| 0.5      | 0.55     | 0.8      | 3.2     | 96.8      | 8     |
| 0.55     | 0.6      | 1.6      | 4       | 96        | 16    |
| 0.6      | 0.65     | 1.6      | 5.6     | 94.4      | 16    |
| 0.65     | 0.7      | 2.3      | 7.2     | 92.8      | 23    |
| 0.7      | 0.75     | 4.6      | 9.5     | 90.5      | 46    |
| 0.75     | 0.8      | 10.8     | 14.1    | 85.9      | 108   |
| 0.8      | 0.85     | 15.3     | 24.9    | 75.1      | 153   |
| 0.85     | 0.9      | 21.6     | 40.2    | 59.8      | 216   |
| 0.9      | 0.95     | 22.9     | 61.8    | 38.2      | 229   |
| 0.95     | 1        | 15.3     | 84.7    | 15.3      | 153   |
</details>

### PubTabNet

Using a single command (loading the dataset from Huggingface: [Pubtabnet_OTSL](https://huggingface.co/datasets/ds4sd/Pubtabnet_OTSL)),

```sh
poetry run python ./docs/examples/benchmark_pubtabnet.py
```

<details>
<summary><b>Table evaluations for Pubtabnet</b></summary>
<br>

👉 Evaluate the dataset,

```sh
poetry run evaluate -t evaluate -m tableformer -b Pubtabnet -i ./benchmarks/pubtabnet-dataset/tableformer -o ./benchmarks/pubtabnet-dataset/tableformer
```

👉 Visualise the dataset,

```sh
poetry run evaluate -t visualize -m tableformer -b Pubtabnet -i ./benchmarks/pubtabnet-dataset/tableformer -o ./benchmarks/pubtabnet-dataset/tableformer
```

The final result (struct only here) can be visualised as,

| x0<=TEDS | TEDS<=x1 | prob [%] | acc [%] | 1-acc [%] | total |
|----------|----------|----------|---------|-----------|-------|
| 0        | 0.05     | 0        | 0       | 100       | 0     |
| 0.05     | 0.1      | 0.01     | 0       | 100       | 1     |
| 0.1      | 0.15     | 0.01     | 0.01    | 99.99     | 1     |
| 0.15     | 0.2      | 0.02     | 0.02    | 99.98     | 2     |
| 0.2      | 0.25     | 0        | 0.04    | 99.96     | 0     |
| 0.25     | 0.3      | 0        | 0.04    | 99.96     | 0     |
| 0.3      | 0.35     | 0        | 0.04    | 99.96     | 0     |
| 0.35     | 0.4      | 0        | 0.04    | 99.96     | 0     |
| 0.4      | 0.45     | 0.02     | 0.04    | 99.96     | 2     |
| 0.45     | 0.5      | 0.1      | 0.06    | 99.94     | 10    |
| 0.5      | 0.55     | 0.1      | 0.15    | 99.85     | 10    |
| 0.55     | 0.6      | 0.24     | 0.25    | 99.75     | 25    |
| 0.6      | 0.65     | 0.47     | 0.49    | 99.51     | 49    |
| 0.65     | 0.7      | 1.04     | 0.96    | 99.04     | 108   |
| 0.7      | 0.75     | 2.44     | 2       | 98        | 254   |
| 0.75     | 0.8      | 4.65     | 4.44    | 95.56     | 483   |
| 0.8      | 0.85     | 13.71    | 9.09    | 90.91     | 1425  |
| 0.85     | 0.9      | 21.2     | 22.8    | 77.2      | 2204  |
| 0.9      | 0.95     | 28.48    | 43.99   | 56.01     | 2961  |
| 0.95     | 1        | 27.53    | 72.47   | 27.53     | 2862  |
</details>

## Contributing

Please read [Contributing to Docling](https://github.com/DS4SD/docling/blob/main/CONTRIBUTING.md) for details.

docling_eval/benchmarks/constants.py: 2 additions, 1 deletion

```diff
@@ -15,6 +15,7 @@ class BenchMarkColumns(str, Enum):
     PICTURES = "pictures"

     MIMETYPE = "mimetype"
+    TIMINGS = "timings"


 class EvaluationModality(str, Enum):
@@ -28,7 +29,7 @@ class BenchMarkNames(str, Enum):

     # End-to-End
     DPBENCH = "DPBench"
-    OMNIDOCBENCH = "OmniDcoBench"
+    OMNIDOCBENCH = "OmniDocBench"
     WORDSCAPE = "WordScape"

     # Layout
```
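Because BenchMarkNames mixes in str, the member *value* is what round-trips through CLI arguments and serialized datasets, so the transposed-letter value fixed above ("OmniDcoBench") would break any lookup by the correct spelling. A trimmed-down reproduction (only two members kept, mirroring the diff):

```python
from enum import Enum

# Trimmed version of the BenchMarkNames enum from the diff above; the str
# mixin makes members compare equal to their value string and allows
# constructing a member from that string (value lookup).

class BenchMarkNames(str, Enum):
    DPBENCH = "DPBench"
    OMNIDOCBENCH = "OmniDocBench"  # was "OmniDcoBench" before the fix

# Value lookup succeeds only with the exact value string:
member = BenchMarkNames("OmniDocBench")
print(member.name, member.value)

# With the old typo'd value, BenchMarkNames("OmniDocBench") would raise
# ValueError, even though the member *name* was spelled correctly.
```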
