
feat: ingest CVAT assets and filter submissions #180

Merged
cau-git merged 1 commit into main from cau/cvat-to-hf-exporter on Dec 3, 2025


Conversation

@cau-git (Member) commented Dec 3, 2025

When creating a HuggingFace dataset dump from CVAT-generated DoclingDocuments, this PR makes the exporter:

  • load the original PDFs or images from CVAT overviews, filling in hashes, MIME types, and streams for combined builds
  • add datasets-root/assets-dirname options plus submission allowlist support for discovery
  • warn or fail on missing assets/subsets to keep the delivery export deterministic

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
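
For readers without the diff at hand, the behaviour described above can be approximated roughly as follows. This is only a sketch under stated assumptions, not the code from this PR: the function names (`describe_asset`, `collect_assets`), the `strict` flag, the `<datasets_root>/<submission>/<assets_dirname>/` layout, and the choice of SHA-256 are all hypothetical and picked for illustration.

```python
import hashlib
import logging
import mimetypes
from pathlib import Path
from typing import Dict, Iterable, List, Optional

log = logging.getLogger(__name__)


def describe_asset(path: Path) -> dict:
    """Read one original file and record its hash, MIME type, and raw bytes.

    Hypothetical helper: the record keys and the use of SHA-256 are assumptions.
    """
    data = path.read_bytes()
    mime, _ = mimetypes.guess_type(path.name)
    return {
        "filename": path.name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "mimetype": mime or "application/octet-stream",
        "data": data,
    }


def collect_assets(
    datasets_root: Path,
    assets_dirname: str,
    allowlist: Optional[Iterable[str]] = None,
    strict: bool = False,
) -> Dict[str, List[dict]]:
    """Discover submissions under datasets_root and load their assets.

    Assumed layout: <datasets_root>/<submission>/<assets_dirname>/<files>.
    Submissions not named in the allowlist are skipped. A submission without
    an assets directory is logged and skipped, or raises when strict is True.
    Directories are visited in sorted order so repeated runs give the same result.
    """
    allowed = set(allowlist) if allowlist is not None else None
    result: Dict[str, List[dict]] = {}
    for sub_dir in sorted(p for p in datasets_root.iterdir() if p.is_dir()):
        if allowed is not None and sub_dir.name not in allowed:
            continue
        assets_dir = sub_dir / assets_dirname
        if not assets_dir.is_dir():
            message = f"missing assets directory: {assets_dir}"
            if strict:
                raise FileNotFoundError(message)
            log.warning(message)
            continue
        result[sub_dir.name] = [
            describe_asset(f) for f in sorted(assets_dir.iterdir()) if f.is_file()
        ]
    return result
```

Sorting the directory listings is one simple way to get the deterministic output the description asks for; the actual exporter may achieve this differently.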
cau-git requested a review from maxmnemonic on December 3, 2025 08:49
@github-actions bot (Contributor) commented Dec 3, 2025

DCO Check Passed

Thanks @cau-git, all your commits are properly signed off. 🎉

@mergify bot commented Dec 3, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
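
To see what the title rule accepts, the pattern can be tried locally. The sketch below assumes that the `~=` operator behaves like an unanchored regular-expression search, approximated here with Python's `re.search`; the helper name `is_conventional_title` is hypothetical.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional_title("feat: ingest CVAT assets and filter submissions"))  # True
print(is_conventional_title("fix(cvat)!: handle missing overview"))  # True
print(is_conventional_title("Ingest CVAT assets"))  # False
```

The optional `(?:\(.+\))?` group allows a scope such as `(cvat)`, and the optional `!` marks a breaking change, which is why the second title also passes.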

cau-git merged commit b55b2ea into main on Dec 3, 2025
10 checks passed
cau-git deleted the cau/cvat-to-hf-exporter branch on December 3, 2025 09:11
2 participants