feat: Visualizer tool and command for datasets #186
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…mmy entries in all evaluators. Extend the CLI to support the --external-predictions-path Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…various formats Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…th. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…d unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…it test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…dd unit test. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…. Add unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…dd unit test Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…ngOrderEvaluator. Fix main Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…oclingDocument from doctags and the GT image. - Introduce the staticmethod load_doctags() which covers all cases on page image loading. - Refactor the FilePredictionProvider to use the load_doctags() from ExternalDoclingDocumentLoader. - Refactor all evaluators to use the new ExternalDoclingDocumentLoader. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
…sing the API and the CLI. Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
✅ DCO Check Passed
Thanks @cau-git, all your commits are properly signed off. 🎉
Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
3794359 to 71f5e17
Pull request overview
This PR adds a new standalone visualization tool for dataset predictions, allowing users to generate HTML visualizations without creating full evaluation datasets. The tool supports both embedded predictions (in dataset parquet files) and external predictions (from separate DoclingDocument files).
Key Changes:
- Added `PredictionsVisualizer` utility class for generating GT vs. prediction HTML visualizations
- Added CLI command `create_viz` for invoking the visualizer from the command line
- Added tests for both embedded and external prediction visualization modes
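For readers unfamiliar with the two modes, here is a minimal, standalone sketch of the idea using only the Python standard library. It is not the `PredictionsVisualizer` code from this PR: the function names `render_side_by_side` and `resolve_prediction`, the plain-text inputs, the `<doc_id>.txt` file layout, and the rule that an external file takes precedence over an embedded prediction are all assumptions made for illustration.

```python
from html import escape
from pathlib import Path
from typing import Optional, Union


def render_side_by_side(doc_id: str, gt_text: str, pred_text: str) -> str:
    """Return a standalone HTML page: ground truth on the left, prediction on the right.

    Hypothetical helper; the real visualizer works on DoclingDocument objects.
    """
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset='utf-8'><title>{escape(doc_id)}</title></head>\n"
        "<body><table><tr><th>Ground truth</th><th>Prediction</th></tr>\n"
        f"<tr><td><pre>{escape(gt_text)}</pre></td>"
        f"<td><pre>{escape(pred_text)}</pre></td></tr>\n"
        "</table></body></html>\n"
    )


def resolve_prediction(
    doc_id: str,
    embedded_pred: Optional[str],
    external_dir: Optional[Union[str, Path]] = None,
) -> Optional[str]:
    """Pick the prediction to visualize for one document.

    Assumption for this sketch: if an external directory is given and holds
    a file named <doc_id>.txt, that file wins; otherwise the prediction
    embedded in the dataset row is used.
    """
    if external_dir is not None:
        candidate = Path(external_dir) / f"{doc_id}.txt"
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    return embedded_pred
```

Escaping both inputs with `html.escape` keeps document text containing `<` or `&` from breaking the generated page.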
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| docling_eval/utils/external_predictions_visualizer.py | New visualizer class that renders paired ground-truth vs. prediction HTML outputs from datasets |
| docling_eval/cli/main.py | Added create_viz CLI command and unrelated OCR configuration changes |
| tests/test_predictions_visualizer.py | Integration tests for both embedded and external prediction visualization scenarios |
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
🟢 Require two reviewer for test updates
Wonderful, this rule succeeded.
When test data is updated, we require two reviewers
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Christoph Auer <60343111+cau-git@users.noreply.github.com>
…ternal predictions Signed-off-by: Nikos Livathinos <nli@zurich.ibm.com>
No description provided.