feat: Visualizer tool and command for datasets by cau-git · Pull Request #186 · docling-project/docling-eval

cau-git · 2025-12-08T14:43:14Z

No description provided.

…oclingDocument from doctags and the GT image. - Introduce the staticmethod load_doctags() which covers all cases on page image loading. - Refactor the FilePredictionProvider to use the load_doctags() from ExternalDoclingDocumentLoader. - Refactor all evaluators to use the new ExternalDoclingDocumentLoader. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

github-actions · 2025-12-08T14:43:24Z

✅ DCO Check Passed

Thanks @cau-git, all your commits are properly signed off. 🎉

Copilot

Pull request overview

This PR adds a new standalone visualization tool for dataset predictions, allowing users to generate HTML visualizations without creating full evaluation datasets. The tool supports both embedded predictions (in dataset parquet files) and external predictions (from separate DoclingDocument files).

Key Changes:

- Added the `PredictionsVisualizer` utility class for generating GT vs. prediction HTML visualizations
- Added the CLI command `create_viz` for invoking the visualizer from the command line
- Added tests for both embedded and external prediction visualization modes

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| `docling_eval/utils/external_predictions_visualizer.py` | New visualizer class that renders paired ground-truth vs. prediction HTML outputs from datasets |
| `docling_eval/cli/main.py` | Added `create_viz` CLI command and unrelated OCR configuration changes |
| `tests/test_predictions_visualizer.py` | Integration tests for both embedded and external prediction visualization scenarios |
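
To make the idea of a paired ground-truth vs. prediction view concrete, here is a minimal sketch using only the Python standard library. It is not the `PredictionsVisualizer` implementation from this PR: the function name `render_comparison_page` and its parameters are hypothetical, and plain text stands in for rendered documents.

```python
from html import escape


def render_comparison_page(doc_id: str, gt_text: str, pred_text: str) -> str:
    """Build one HTML page: ground truth on the left, prediction on the right.

    Hypothetical helper for illustration only; the real visualizer works on
    DoclingDocument objects, which this sketch does not model.
    """

    def panel(title: str, body: str) -> str:
        # Escape all user-provided text so it cannot break the markup.
        return f'<div class="panel"><h2>{escape(title)}</h2><pre>{escape(body)}</pre></div>'

    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{escape(doc_id)}</title>"
        "<style>.row{display:flex;gap:1em}.panel{flex:1}</style></head><body>"
        f"<div class='row'>{panel('Ground truth', gt_text)}{panel('Prediction', pred_text)}</div>"
        "</body></html>"
    )


page = render_comparison_page("sample-doc", "Total: 42 & tax", "Total: 24 & tax")
print(len(page) > 0)
```

Writing one such page per document gives a reviewer a quick visual diff without computing any evaluation metric, which is the use case the overview describes.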

mergify · 2025-12-08T15:52:24Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
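
As a rough illustration of what this title rule accepts, the pattern can be tried locally with Python's `re` module. This is a sketch that assumes Mergify's `~=` operator behaves like a regular-expression match; the helper name `is_conventional_title` is hypothetical.

```python
import re

# Pattern copied verbatim from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional_title(title: str) -> bool:
    """Return True when the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.match(title) is not None


# The title of this PR passes; a GitHub-generated "Update <file>" title would not.
print(is_conventional_title("feat: Visualizer tool and command for datasets"))
print(is_conventional_title("Update docling_eval/utils/external_predictions_visualizer.py"))
```

The optional `(?:\(.+\))?` group allows a scope such as `fix(cli):`, and the optional `!` marks a breaking change.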

🟢 Require two reviewer for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

#approved-reviews-by >= 2

nikos-livathinos and others added 22 commits December 4, 2025 16:34

chore: Move the teds.py inside the subdir evaluators/table

6331723

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Introduce the external_predictions_path in BaseEvaluator and du…

85890fb

…mmy entries in all evaluators. Extend the CLI to support the --external-predictions-path Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Extend test_dataset_builder.py to save document predictions in …

5f9a279

…various formats Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Extend MarkDownTextEvaluator to support external_predictions_pa…

e6e8409

…th. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Extend LayoutEvaluator to support external_predictions_path. Ad…

5624e61

…d unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

Merge branch 'main' into nli/external_predictions

426b6d1

fix: Add missing pytest dependencies in tests

171ad74

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

fix: Fix loading the external predictions in LayoutEvaluator

0f0cfb5

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Introduce external predictions in DocStructureEvaluator. Add un…

8069571

…it test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Extend the TableEvaluator to support external predictions. Add …

8ba6b45

…unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Extend the KeyValueEvaluator to support external predictions. A…

949d6cc

…dd unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Extend the PixelLayoutEvaluator to support external predictions…

13badc5

…. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Extend the BboxTextEvaluator to support external predictions. A…

8c2a065

…dd unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Disable the OCREvaluator when using the external predictions

08391b3

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

fix: Fixing guard for external predictions in TimingsEvaluator, Readi…

595ba6c

…ngOrderEvaluator. Fix main Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

fix: Export the doctag files with the correct file extension

406b122

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

chore: Rename code file as external_docling_document_loader.py

33511c9

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

Merge branch 'main' into nli/external_predictions

b1525b6

fix: Fix typo

94b3938

Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Introduce examples how to evaluate using external predictions u…

ae10646

…sing the API and the CLI. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

feat: Prediction vizualizer

8c52e36

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

cau-git changed the base branch from main to nli/external_predictions December 8, 2025 14:43

feat: Prediction vizualizer

71f5e17

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

cau-git force-pushed the cau/add-external-vis-tool branch from 3794359 to 71f5e17 Compare December 8, 2025 14:45

cau-git requested review from Copilot and nikos-livathinos December 8, 2025 14:45

cau-git marked this pull request as ready for review December 8, 2025 14:46

Copilot started reviewing on behalf of cau-git December 8, 2025 14:46

Copilot AI reviewed Dec 8, 2025

Base automatically changed from nli/external_predictions to main December 8, 2025 15:51

cau-git and others added 3 commits December 9, 2025 08:20

Update docling_eval/utils/external_predictions_visualizer.py

6f7331c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>

Merge branch 'main' into cau/add-external-vis-tool

57bd131

feat: Update examples bash script to demonstrate visualisations on ex…

21eae30

…ternal predictions Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>

nikos-livathinos approved these changes Dec 9, 2025

nikos-livathinos requested review from PeterStaar-IBM and vagenas December 9, 2025 12:39

PeterStaar-IBM approved these changes Dec 9, 2025

nikos-livathinos merged commit 373f959 into main Dec 9, 2025
10 checks passed

nikos-livathinos deleted the cau/add-external-vis-tool branch December 9, 2025 13:47

nikos-livathinos mentioned this pull request Dec 10, 2025

Allow the direct evaluation of externally provided DocTag and DoclingDocument json files without having a HF parquet prediction dataset #112

Closed

feat: Visualizer tool and command for datasets #186
nikos-livathinos merged 26 commits into main from cau/add-external-vis-tool

4 participants
