Commit 6afdc2b

feat(benchmark-harness): add Rust fixture generation and size measurement
- Add generate.rs module for fixture generation from vendored test docs
- Add sizes.rs module for framework installation size measurement
- Add CLI commands: generate-fixtures, measure-framework-sizes
- Add 554 benchmark fixture JSON files across 20 file formats
- Update framework_sizes.json with 27 framework configurations
- Fix ground_truth schema to use object format {text_file, source}

Replaces Python scripts with proper Rust implementation including tests.
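
The commit message names the object form of `ground_truth` but the fixture files themselves are not visible in this view. As a minimal sketch of what checking that shape could look like: only the keys `text_file` and `source` come from the commit message; the function name, the idea that `ground_truth` may be absent, and the error wording are assumptions made for illustration.

```python
def validate_ground_truth(entry):
    """Return a list of problems found in a fixture's ground_truth field.

    Hypothetical helper: only the keys text_file and source are taken from
    the commit message; everything else here is an illustrative assumption.
    """
    gt = entry.get("ground_truth")
    if gt is None:
        # Assumption: a fixture without ground truth is acceptable.
        return []
    if not isinstance(gt, dict):
        # A bare string (or any non-object) does not match {text_file, source}.
        return [f"ground_truth must be an object, got {type(gt).__name__}"]
    problems = []
    for key in ("text_file", "source"):
        value = gt.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"ground_truth.{key} must be a non-empty string")
    return problems
```

A fixture whose `ground_truth` is an object with both keys yields an empty list; anything else yields one message per problem, which keeps the check easy to aggregate over many fixture files.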
1 parent 629ed63 commit 6afdc2b

1,238 files changed: +333,651 -50 lines changed

.github/workflows/benchmarks.yaml

Lines changed: 32 additions & 0 deletions
@@ -34,6 +34,8 @@ env:
   RUSTFLAGS: "-C strip=symbols"
   MEASURE_QUALITY: "true"
   OCR_ENABLED: "true"
+  RUN_OCR_BENCHMARKS: "true"
+  GROUND_TRUTH_DIR: "test_documents/ground_truth"
 
 permissions:
   contents: read
@@ -43,8 +45,27 @@ defaults:
     shell: bash
 
 jobs:
+  validate-ground-truth:
+    name: Validate Ground Truth
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.inputs.branch || github.ref }}
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Validate ground truth organization
+        run: python tools/benchmark-harness/scripts/validate_ground_truth.py
+
   setup:
     name: Build harness + native libs
+    needs: validate-ground-truth
     runs-on: ubuntu-latest
     timeout-minutes: 360
     permissions:
@@ -1960,6 +1981,17 @@ jobs:
           echo "Found $ARTIFACT_COUNT artifact directories"
           find benchmark-artifacts -mindepth 1 -maxdepth 1 -type d -exec basename {} \;
 
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Measure framework sizes
+        continue-on-error: true
+        run: |
+          python tools/benchmark-harness/scripts/measure_framework_sizes.py \
+            --output consolidated-output/framework-sizes.json || true
+
       - name: Consolidate results
         run: |
           set -euo pipefail
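
The "Measure framework sizes" step above calls a script whose contents are not part of the visible diff. As a rough sketch of the underlying idea only, an installation size can be approximated by summing the sizes of all regular files under a directory. The function name `directory_size_bytes` and the choice to skip symlinks are assumptions for illustration, not the script's actual behavior.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all files under root, in bytes.

    Hypothetical helper: symlinks are skipped so that linked files are not
    counted twice; the real measurement script may make different choices.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

This counts apparent file sizes rather than disk blocks, so the result can differ from what `du` reports on the same tree.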

Cargo.lock

Lines changed: 17 additions & 0 deletions
Some generated files are not rendered by default.
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+Sample Document Title
+
+Section 1
+
+This is some introductory text in section 1.
+
+Subsection 1.1
+
+- First list item
+
+- Second list item
+
+This is some introductory text in section 1.1.
+
+- - A dash list item
+
+Section 2
+
+This is some text in section 2.
+
+Header 1 Header 2
+
+Value 1 Value 2
+Value 3 Value 4
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{
+  "source": "docling_export",
+  "source_file": "vendored/docling/groundtruth/docling_v2/test_01.asciidoc.md",
+  "document_name": "test_01",
+  "original_md_filename": "test_01.asciidoc",
+  "file_type": "asciidoc",
+  "original_source_doc": null
+}
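
Each ground-truth text file in this commit is paired with a small metadata file like the one above. A minimal sketch of a consistency check over such a file follows; the six key names are taken from the diff, while treating all of them as required and the helper name `missing_metadata_keys` are assumptions.

```python
import json

# Keys observed in the metadata files of this commit. Treating every one of
# them as mandatory is an assumption made for this sketch.
REQUIRED_KEYS = {
    "source",
    "source_file",
    "document_name",
    "original_md_filename",
    "file_type",
    "original_source_doc",
}


def missing_metadata_keys(raw_json):
    """Return the sorted list of expected keys absent from a metadata document."""
    data = json.loads(raw_json)
    return sorted(REQUIRED_KEYS - set(data))
```

Note that `original_source_doc` is `null` in the AsciiDoc entries, so a check like this should test for the key's presence rather than for a non-null value.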
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+2nd Sample Document Title
+
+This is an abstract.
+
+Section 1: Testing nestedlists
+
+- First item
+- Nested item 1
+- Nested item 2
+- Second item
+- Nested ordered item 1
+- Nested ordered item 2
+- Deeper nested unordered item
+- Third item
+- Nested ordered item 1
+- Nested ordered item 2
+- Deeper nested unordered item
+- Nested ordered item 2
+
+Section 2
+
+bla bla
+
+bla bla bla
+
+Section 3: test image
+
+image::images/example1.png[Example Image, width=200, height=150, align=center]
+
+.An example caption for the image
+
+image::images/example2.png[Example Image, width=200, height=150, align=center]
+
+Section 4: test tables
+
+Header 1 Header 2
+
+Value 1 Value 2
+Value 3 Value 4
+
+.Caption for the table 1
+
+===
+
+Header 1 Header 2
+
+Value 1 Value 2
+Value 3 Value 4
+
+.Caption for the table 2
+
+===
+
+Column 1 Heading Column 2 Heading Column 3 Heading
+
+Cell 1 Cell 2 Cell 3
+Cell 4 Cell 5 colspan=2 Cell spans two columns
+
+.Caption for the table 3
+
+===
+
+Column 1 Heading Column 2 Heading Column 3 Heading
+
+Rowspan=2 Cell 2 Cell 3
+Cell 5 Cell 6
+
+.Caption for the table 4
+
+===
+
+Col 1 Col 2 Col 3 Col 4
+
+Rowspan=2.Colspan=2 Cell spanning 2 rows and 2 columns Col 3 Col 4
+Col 3 Col 4
+Col 1 Col 2 Col 3 Col 4
+
+SubSubSection 2.1.1
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{
+  "source": "docling_export",
+  "source_file": "vendored/docling/groundtruth/docling_v2/test_02.asciidoc.md",
+  "document_name": "test_02",
+  "original_md_filename": "test_02.asciidoc",
+  "file_type": "asciidoc",
+  "original_source_doc": null
+}
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+:\mod-docs-content-type: PROCEDURE :experimental:
+
+Renaming a bookmark
+
+[id="renaming-a-bookmark\{context}"]
+
+You can rename a bookmark to distinguish it from other bookmarks. If you have bookmarks to several folders that all share the same name, you can tell the bookmarks apart if you rename them.
+
+Renaming the bookmark does not rename the folder.
+
+- Check that the side bar lists the bookmark under the new name.
+
+Procedure . Right-click the bookmark in the side bar. . Select Rename…. +
+
+<!-- image -->
+
+In the Name field, enter the new name for the bookmark. +
+
+<!-- image -->
+
+Click btn:[Rename]. .Verification
+
+<!-- image -->
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{
+  "source": "docling_export",
+  "source_file": "vendored/docling/groundtruth/docling_v2/test_03.asciidoc.md",
+  "document_name": "test_03",
+  "original_md_filename": "test_03.asciidoc",
+  "file_type": "asciidoc",
+  "original_source_doc": null
+}
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+1 2 3 4
+
+a b c d
+a , c d
+a b c d
+a b c d
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{
+  "source": "docling_export",
+  "source_file": "vendored/docling/groundtruth/docling_v2/csv-comma-in-cell.csv.md",
+  "document_name": "csv-comma-in-cell",
+  "original_md_filename": "csv-comma-in-cell.csv",
+  "file_type": "csv",
+  "original_source_doc": "vendored/docling/csv/csv-comma-in-cell.csv"
+}
