Releases: docling-project/docling-eval

v0.10.0

05 Nov 18:26

Feature

  • Extend the create-eval CLI to accept the vlm-options and max_new_tokens parameters when the provider is GraniteDocling (#164) (8be2e83)
  • Harmonize picture classes for CVAT-to-Docling conversion (#167) (740157d)
  • Add more specific validation for reading order and enhance the validation report (5e5f2db)
  • Integrate textline_cells-based OCR evaluation (#156) (3a9543c)

Fix

  • Validation fixes for list item impurity check (#169) (74e7b3e)
  • Don't report content-layer group violation multiple times (cb71009)
  • Handle merged elements with regard to inclusion; don't flag single-element pages (c10fdfd)
  • Apply the missing transform to storage_scale for some items and table cells (1eb6b4e)
  • More CVAT validation and docling conversion fixes (#163) (6f59c7a)
  • Better control over scaling in CVAT transform, fixes for OCR (#162) (ef17b5a)
  • Fixes for CVAT validation, OCR in CVAT pipeline, logging, and more (#161) (80e449d)

Performance

v0.9.0

01 Oct 03:42

Feature

  • Expose the forced-ocr-option (#157) (ac21644)
  • Implementation of table structure conversion from CVAT to DoclingDocument (208cd14)

v0.8.1

16 Sep 08:23

Fix

  • Fix OCR visualization and add OCR recognition metrics (#144) (d63a439)

v0.8.0

02 Sep 21:18
758f6dc

What's Changed

  • feat: Extend the Consolidator to export LaTeX files alongside the Excel report by @nikos-livathinos in #143
  • feat: Extend the DoclingEvalCOCOExporter to export a parquet dataset in COCO format by @nikos-livathinos in #145
  • feat: Several fixes and campaign tools extensions by @cau-git in #150
  • feat: Add table structure evaluations for TEDS by @praveenmidde in #94

Full Changelog: v0.7.0...v0.8.0

v0.7.0

30 Jul 08:06

Feature

Fix

  • Prevent crash from invalid bbox coordinates in HTML export (#142) (c31b107)

v0.6.0

02 Jul 09:01

Feature

  • Layout evaluation fixes, mode control and cleanup (#133) (629a451)
  • Introduce a utility to export layout predictions from HF parquet files into pycocotools format (#125) (54f7c81)
  • Add specific language support for XFUND dataset builder (#122) (4ca6a0e)
  • Tooling for CVAT validation and CVAT-to-DoclingDocument transformation; new evaluators (#119) (2ee1104)

Fix

  • Move ibm-cos to hyperscaler (#135) (9aff6c1)
  • Update hyperscalers to support multiple image file types (#118) (a34f264)
  • Misc fixes (#131) (518e1ba)
  • CVAT to DoclingDoc: Ensure that nested list handling works across page boundaries (#129) (1b58377)
  • Important fixes for parquet serialization/deserialization, optimizations (#128) (53c22ef)
  • Fixes for the dataset visualizers (#127) (a127ea9)

Performance

  • Improve parquet writing with plain pyarrow (#134) (c08950b)

v0.5.0

11 Jun 15:48

Feature

  • Integrate OCR visualization (#121) (b39f2e7)
  • Add the segmentation layout evaluations to the consolidated Excel report. Update mypy overrides. (#120) (c4e7de0)
  • Update OCREvaluator with additional metrics (#78) (17e9fde)

Fix

  • Add the bbox to TableData from annotations (#123) (c4fe51f)
  • Treat th and td as equal for TEDS calculation (#114) (dbf9db7)
  • Add support for Google, AWS, and Azure prediction providers in cli (#115) (e8e7421)

v0.4.0

28 May 09:22

Feature

  • Extend the FileProvider and the CLI to accept parameters that control the source of the prediction images (#111) (42e1615)
  • Improvements for the MultiEvaluator (#95) (04fe2d9)
  • Add extra args for docling-provider and default annotations for CVAT (#98) (7903b6a)
  • Introduce SegmentedPage for OCR (#91) (be0ff6a)
  • Update CVAT for multi-page annotation, utility to create sliced PDFs (#90) (28d166d)
  • Add area-level F1 (#86) (54d013b)

Fix

  • Small fixes (#108) (0628fa6)
  • Layout text not correctly populated in AWS prediction provider, add tests (#100) (6441688)
  • Dataset feature spec fixes, CVAT improvements (#97) (b79dd19)
  • Update boto3 AWS client to accept service credentials (#88) (4e01d0b)
  • Handle unsupported END2END evaluation and fix variable name in OCR (#87) (75311da)
  • Propagate CVAT parameters (#82) (1e2040a)

Documentation

Release v0.3.0

22 Apr 13:59
be62102

What's Changed

  • feat: Update GoogleDocAIPredictionProvider to use service account creds by @samiuc in #73
  • fix: Add CLI option for FileDatasetBuilder by @cau-git in #76
  • feat: Consolidate multiple evaluation results and generate a comparison matrix by @nikos-livathinos in #64
  • feat: OCR evaluator by @cau-git and @samiuc in #63

Full Changelog: v0.2.0...v0.3.0

Release v0.2.0

17 Apr 14:47
fb6c623

What's Changed

  • dev: Add DocVQA questions, more fixes by @cau-git in #58
  • docs: Add README for Docling-DPBench by @cau-git in #60
  • feat: Azure prediction provider by @cau-git in #50
  • fix: Ensure that evaluators skip data samples without a SUCCESS or PARTIAL_SUCCESS status by @nikos-livathinos in #66
  • feat: Support for S3 datasource by @praveenmidde in #65
  • feat: PixParse OCR dataset builder by @cau-git in #61
  • fix: Address table export_to_html deprecation by @cau-git in #67
  • feat: AWS Textract and Google DocAI Prediction providers by @cau-git in #62
  • feat: Refactor CVAT builder by @cau-git in #68
  • fix: Address missing conversion status (PENDING), add artifacts path, remove unused CLI args by @cau-git in #69
  • feat: FileDatasetBuilder by @cau-git in #70

Full Changelog: v0.1.0...v0.2.0