
fix: normalize hyphens and spaces in layout label mapping for Egret models #3118

Closed
majiayu000 wants to merge 1 commit into docling-project:main from majiayu000:fix/issue-3053-egret-label-hyphen

Conversation

@majiayu000
Contributor

Summary

Fixes #3053

All Egret layout models (DOCLING_LAYOUT_EGRET_MEDIUM/LARGE/XLARGE) fail on init because _build_label_map() only calls .upper() on the Hugging Face label names. Egret configs use hyphenated or space-separated labels (e.g. List-item, Page-footer, Document Index), which don't match the DocItemLabel enum members, which use underscores (LIST_ITEM, PAGE_FOOTER, DOCUMENT_INDEX).

Changes

  • Added .replace("-", "_").replace(" ", "_") after .upper() in _build_label_map() to normalize all separator styles
  • Added tests/test_layout_label_map.py with tests for hyphenated, space-separated, underscore, and invalid labels
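
For readers who want to see the normalization in isolation, here is a minimal sketch. It is not the actual docling code: the `DocItemLabel` enum below is a trimmed stand-in with illustrative values, and `normalize_label` and `build_label_map` are hypothetical helper names that only mirror the chained `.upper().replace(...)` call described above. The `RuntimeError` on unknown labels is assumed from the linked issue title.

```python
from enum import Enum
from typing import Dict


class DocItemLabel(str, Enum):
    """Stand-in for docling's DocItemLabel; only a few members, values illustrative."""

    LIST_ITEM = "list_item"
    PAGE_FOOTER = "page_footer"
    DOCUMENT_INDEX = "document_index"
    TEXT = "text"


def normalize_label(name: str) -> str:
    """Upper-case the label and turn hyphens and spaces into underscores."""
    return name.upper().replace("-", "_").replace(" ", "_")


def build_label_map(id2label: Dict[int, str]) -> Dict[int, DocItemLabel]:
    """Map model class ids to enum members (hypothetical helper, not the real method)."""
    label_map: Dict[int, DocItemLabel] = {}
    for idx, name in id2label.items():
        key = normalize_label(name)
        try:
            # Look up the enum member by its name, e.g. "LIST_ITEM".
            label_map[idx] = DocItemLabel[key]
        except KeyError as exc:
            raise RuntimeError(
                f"Unknown layout label {name!r} (normalized to {key!r})"
            ) from exc
    return label_map
```

With only `.upper()`, `"List-item"` becomes `"LIST-ITEM"` and the enum lookup fails; with the two extra `.replace()` calls it becomes `"LIST_ITEM"`. Labels that already use underscores pass through unchanged, and labels with no matching member still raise.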

Test plan

  • pre-commit run --all-files passes (Ruff + MyPy)
  • 4 new unit tests covering all label separator styles pass
  • Verified the exact Egret label names from the issue (List-item, Page-footer, Document Index, etc.) map correctly to DocItemLabel enums

…odels

Fixes docling-project#3053

Signed-off-by: majiayu000 <1835304752@qq.com>
@github-actions
Contributor

DCO Check Passed

Thanks @majiayu000, all your commits are properly signed off. 🎉

@mergify

mergify bot commented Mar 13, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
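
As a rough illustration of what this rule accepts, the pattern can be tried locally with Python's `re` module. This is only a sketch: it assumes the `~=` condition behaves like an unanchored regex search (the pattern carries its own `^`), and `is_conventional` is a hypothetical helper name, not part of Mergify.

```python
import re

# The pattern from the merge protection rule, copied verbatim.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None
```

The title of this pull request starts with `fix:` and therefore matches; a scoped, breaking-change title such as `feat(layout)!: ...` also matches, while a title without a type prefix does not.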

@codecov

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@majiayu000 majiayu000 marked this pull request as ready for review March 13, 2026 12:04
@PeterStaar-IBM PeterStaar-IBM requested a review from cau-git March 13, 2026 12:06
@cau-git
Member

cau-git commented Mar 13, 2026

@majiayu000 thanks for the PR, but we will fix this at the source by updating the Egret family models on Hugging Face with the proper id2label config. Please see also #3053.

@cau-git cau-git closed this Mar 13, 2026

Development

Successfully merging this pull request may close these issues.

Egret layout models fail with RuntimeError: label hyphen/underscore mismatch in _build_label_map

2 participants