Commit 917f6fc

chore: add vendored test documents for benchmark fixtures
Add 342 test documents from docling, markitdown, and pdfplumber that are referenced by benchmark fixture JSONs. Without these files, all benchmark CI jobs fail with "321 fixture document(s) not found".
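The failure mode described above can be reproduced locally with a small pre-flight check. This is a minimal sketch, not the project's actual benchmark loader: the function name `find_missing_documents`, the `"document"` key, and the directory layout are all assumptions made for illustration, since the fixture JSON schema is not shown in this commit.

```python
import json
from pathlib import Path


def find_missing_documents(fixture_dir: Path, repo_root: Path, key: str = "document") -> list[str]:
    """Return referenced document paths that do not exist under repo_root.

    Assumption: each fixture JSON is an object that names its document
    as a repo-relative path under `key`. The real schema may differ.
    """
    missing = []
    for fixture_path in sorted(fixture_dir.glob("*.json")):
        fixture = json.loads(fixture_path.read_text(encoding="utf-8"))
        doc = fixture.get(key)
        # Skip fixtures without the key; report paths that are not regular files.
        if doc is not None and not (repo_root / doc).is_file():
            missing.append(doc)
    return missing
```

Called with the fixture directory and the repository root, it returns the list of unresolved paths; a CI step could fail when that list is non-empty and print a count in the same style as the error quoted above.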
1 parent 828c26c commit 917f6fc

342 files changed

+246016
-0
lines changed

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8"/>
<title>word_tables</title>
<meta name="generator" content="Docling HTML Serializer"/>
<style>
html {
background-color: #f5f5f5;
font-family: Arial, sans-serif;
line-height: 1.6;
}
body {
max-width: 800px;
margin: 0 auto;
padding: 2rem;
background-color: white;
box-shadow: 0 0 10px rgba(0,0,0,0.1);
}
h1, h2, h3, h4, h5, h6 {
color: #333;
margin-top: 1.5em;
margin-bottom: 0.5em;
}
h1 {
font-size: 2em;
border-bottom: 1px solid #eee;
padding-bottom: 0.3em;
}
table {
border-collapse: collapse;
margin: 1em 0;
width: 100%;
}
th, td {
border: 1px solid #ddd;
padding: 8px;
text-align: left;
}
th {
background-color: #f2f2f2;
font-weight: bold;
}
figure {
margin: 1.5em 0;
text-align: center;
}
figcaption {
color: #666;
font-style: italic;
margin-top: 0.5em;
}
img {
max-width: 100%;
height: auto;
}
pre {
background-color: #f6f8fa;
border-radius: 3px;
padding: 1em;
overflow: auto;
}
code {
font-family: monospace;
background-color: #f6f8fa;
padding: 0.2em 0.4em;
border-radius: 3px;
}
pre code {
background-color: transparent;
padding: 0;
}
.formula {
text-align: center;
padding: 0.5em;
margin: 1em 0;
background-color: #f9f9f9;
}
.formula-not-decoded {
text-align: center;
padding: 0.5em;
margin: 1em 0;
background: repeating-linear-gradient(
45deg,
#f0f0f0,
#f0f0f0 10px,
#f9f9f9 10px,
#f9f9f9 20px
);
}
.page-break {
page-break-after: always;
border-top: 1px dashed #ccc;
margin: 2em 0;
}
.key-value-region {
background-color: #f9f9f9;
padding: 1em;
border-radius: 4px;
margin: 1em 0;
}
.key-value-region dt {
font-weight: bold;
}
.key-value-region dd {
margin-left: 1em;
margin-bottom: 0.5em;
}
.form-container {
border: 1px solid #ddd;
padding: 1em;
border-radius: 4px;
margin: 1em 0;
}
.form-item {
margin-bottom: 0.5em;
}
.image-classification {
font-size: 0.9em;
color: #666;
margin-top: 0.5em;
}
</style>
</head>
<body>
<div class='page'>
<h2>Test with tables</h2>
<p>A uniform table</p>
<table><tbody><tr><th>Header 0.0</th><th>Header 0.1</th><th>Header 0.2</th></tr><tr><td>Cell 1.0</td><td>Cell 1.1</td><td>Cell 1.2</td></tr><tr><td>Cell 2.0</td><td>Cell 2.1</td><td>Cell 2.2</td></tr></tbody></table>
<p></p>
<p>A non-uniform table with horizontal spans</p>
<table><tbody><tr><th>Header 0.0</th><th>Header 0.1</th><th>Header 0.2</th></tr><tr><td>Cell 1.0</td><td colspan="2">Merged Cell 1.1 1.2</td></tr><tr><td>Cell 2.0</td><td colspan="2">Merged Cell 2.1 2.2</td></tr></tbody></table>
<p></p>
<p>A non-uniform table with horizontal spans in inner columns</p>
<table><tbody><tr><th>Header 0.0</th><th>Header 0.1</th><th>Header 0.2</th><th>Header 0.3</th></tr><tr><td>Cell 1.0</td><td colspan="2">Merged Cell 1.1 1.2</td><td>Cell 1.3</td></tr><tr><td>Cell 2.0</td><td colspan="2">Merged Cell 2.1 2.2</td><td>Cell 2.3</td></tr></tbody></table>
<p></p>
<p>A non-uniform table with vertical spans</p>
<table><tbody><tr><th>Header 0.0</th><th>Header 0.1</th><th>Header 0.2</th></tr><tr><td>Cell 1.0</td><td rowspan="2">Merged Cell 1.1 2.1</td><td>Cell 1.2</td></tr><tr><td>Cell 2.0</td><td>Cell 2.2</td></tr><tr><td>Cell 3.0</td><td rowspan="2">Merged Cell 3.1 4.1</td><td>Cell 3.2</td></tr><tr><td>Cell 4.0</td><td>Cell 4.2</td></tr></tbody></table>
<p></p>
<p>A non-uniform table with all kinds of spans and empty cells</p>
<table><tbody><tr><th>Header 0.0</th><th>Header 0.1</th><th>Header 0.2</th><th></th><th></th></tr><tr><td>Cell 1.0</td><td rowspan="2">Merged Cell 1.1 2.1</td><td>Cell 1.2</td><td></td><td></td></tr><tr><td>Cell 2.0</td><td>Cell 2.2</td><td></td><td></td></tr><tr><td>Cell 3.0</td><td rowspan="2">Merged Cell 3.1 4.1</td><td>Cell 3.2</td><td rowspan="3"></td><td></td></tr><tr><td>Cell 4.0</td><td>Cell 4.2</td><td rowspan="2">Merged Cell 4.4 5.4</td></tr><tr><td></td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td colspan="5"></td></tr><tr><td></td><td></td><td></td><td></td><td>Cell 8.4</td></tr></tbody></table>
<p></p>
<p></p>
</div>
</body>
</html>
