feat: Allow subset to split routing in CVAT to HF exporter#182

Merged
cau-git merged 3 commits into main from cau/cvat-to-hf-exporter
Dec 5, 2025

cau-git (Member) commented Dec 3, 2025

Example to run the script:

uv run python -m docling_eval.campaign_tools.cvat_deliveries_to_hf \
  /path/to/submission_dirs/ \
  /path/to/huggingface_dataset_export/ \
  --export-kind ground_truth \
  --split default --subset-split pdf_val=validation --subset-split pdf_test=test --subset-split 'pdf_train_*'=train \
  --chunk-size 200 --datasets-root /path/to/base_cvat_dataset/
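
The routing implied by the `--subset-split` flags above could look roughly like the sketch below. It is an illustration under assumptions, not the exporter's actual code: it assumes each value is a `PATTERN=SPLIT` pair, that patterns are shell-style globs (suggested by `'pdf_train_*'=train`), that the first matching rule wins, and that unmatched subsets fall back to the `--split` value. The helper names `parse_subset_split` and `route_subset` are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_subset_split(specs):
    """Parse 'PATTERN=SPLIT' strings into ordered (pattern, split) pairs."""
    rules = []
    for spec in specs:
        pattern, sep, split = spec.partition("=")
        if not sep or not pattern or not split:
            raise ValueError(f"expected PATTERN=SPLIT, got {spec!r}")
        rules.append((pattern, split))
    return rules


def route_subset(subset, rules, default_split):
    """Return the split of the first rule whose glob matches the subset name.

    Assumption: first match wins; unmatched subsets use default_split.
    """
    for pattern, split in rules:
        if fnmatchcase(subset, pattern):
            return split
    return default_split


# Same rules as in the example command (the shell strips the quotes).
rules = parse_subset_split(
    ["pdf_val=validation", "pdf_test=test", "pdf_train_*=train"]
)
routed = {
    name: route_subset(name, rules, "default")
    for name in ["pdf_val", "pdf_test", "pdf_train_01", "misc"]
}
```

With these rules, `pdf_train_01` lands in `train`, while a subset that matches no pattern (here `misc`) stays in the fallback split.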

  • Load originals from CVAT overviews, filling hashes, mime types, and streams for combined builds
  • Add datasets-root/assets-dirname options plus submission allowlist support for discovery
  • Warn or fail on missing assets/subsets to keep delivery export deterministic

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

github-actions bot (Contributor) commented Dec 3, 2025

DCO Check Passed

Thanks @cau-git, all your commits are properly signed off. 🎉

mergify bot commented Dec 3, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
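
To check a title locally before opening a PR, the pattern from the rule above can be tried with Python's `re` module. This is a minimal sketch that assumes the `~=` operator behaves like a regular-expression match anchored by the leading `^`; the function name `is_conventional` is hypothetical.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.match(title) is not None


ok = is_conventional("feat: Allow subset to split routing in CVAT to HF exporter")
bad = is_conventional("Allow subset to split routing in CVAT to HF exporter")
```

The title of this PR passes because it begins with `feat:`; the same title without the prefix does not.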

PeterStaar-IBM (Member) left a comment

Can we please add some documentation? I have the feeling your comment will get lost.

cau-git merged commit ebb8800 into main on Dec 5, 2025
10 checks passed
cau-git deleted the cau/cvat-to-hf-exporter branch on December 5, 2025 10:37