fix(docx): preserve multi-level outline numbering from lvlText #3085

Open
Br1an67 wants to merge 3 commits into docling-project:main from Br1an67:fix/issue-2758-outline-numbering

Br1an67 commented Mar 8, 2026

Issue resolved by this Pull Request:
Resolves #2758

Summary

Fixed a bug where multi-level outline numbering in DOCX documents (e.g., "3.1", "3.2") was rendered as single-level numbering (e.g., "1", "2", "3"), losing the hierarchical structure.

The DOCX backend used only the current level's counter when generating markers and ignored the w:lvlText format string, which specifies how multi-level numbers are constructed.

Changes

  • Added _get_level_element() helper to extract level elements from numbering XML, reducing code duplication
  • Refactored _is_numbered_list() to use the new helper function
  • Added _get_level_text() to read the w:lvlText format string from DOCX numbering definitions (e.g., "%1.%2")
  • Added _build_multi_level_marker() to construct multi-level markers by substituting placeholders (%1, %2, etc.) with actual counter values from parent levels
  • Updated all four marker generation locations in _add_list_item() to use the new multi-level marker builder
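
For illustration, a minimal sketch of how a w:lvlText lookup could work against a numbering part, using only the standard library. The function name get_level_text, its parameters, and the sample XML are assumptions made for this example; they are not the PR's actual helpers, which live in docling/backend/msword_backend.py.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/numbering.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical, heavily trimmed numbering definition for the example
SAMPLE_NUMBERING_XML = f"""
<w:numbering xmlns:w="{W_NS}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0"><w:numFmt w:val="decimal"/><w:lvlText w:val="%1."/></w:lvl>
    <w:lvl w:ilvl="1"><w:numFmt w:val="decimal"/><w:lvlText w:val="%1.%2"/></w:lvl>
  </w:abstractNum>
</w:numbering>
"""


def get_level_text(numbering_xml: str, abstract_num_id: int, ilvl: int) -> str | None:
    """Return the w:lvlText value for one level, or None if it is not defined."""
    root = ET.fromstring(numbering_xml)
    for abstract_num in root.findall("w:abstractNum", NS):
        if abstract_num.get(f"{{{W_NS}}}abstractNumId") != str(abstract_num_id):
            continue
        for lvl in abstract_num.findall("w:lvl", NS):
            if lvl.get(f"{{{W_NS}}}ilvl") == str(ilvl):
                lvl_text = lvl.find("w:lvlText", NS)
                return None if lvl_text is None else lvl_text.get(f"{{{W_NS}}}val")
    return None


print(get_level_text(SAMPLE_NUMBERING_XML, 0, 1))
```

Returning None for a missing level lets a caller fall back to the previous single-counter behavior.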

Technical Details

Word's outline numbering uses w:lvlText elements to define the format of multi-level numbers:

  • %1 = level 0 counter (placeholders are 1-based, w:ilvl levels are 0-based)
  • %2 = level 1 counter
  • %1.%2 produces "3.1" (parent counter + dot + current counter)
  • %1.%2.%3 produces "3.2.1" for three-level numbering

The fix reads these format strings and substitutes each placeholder with the counter value of the level it refers to.
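
The substitution described above can be sketched as follows. This is a simplified illustration, not the PR's _build_multi_level_marker(): the name build_marker and the list-based counters argument are assumptions, and only decimal counters are handled.

```python
from __future__ import annotations

import re


def build_marker(lvl_text: str, counters: list[int]) -> str:
    """Fill %N placeholders in lvl_text; counters[i] is the counter of level i (0-based)."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1  # %1 refers to level 0
        if 0 <= index < len(counters):
            return str(counters[index])
        # No counter known for this level: leave the placeholder untouched
        return match.group(0)

    return re.sub(r"%(\d)", substitute, lvl_text)


print(build_marker("%1.%2", [3, 1]))        # 3.1
print(build_marker("%1.%2.%3", [3, 2, 1]))  # 3.2.1
print(build_marker("(%1)", [4]))            # (4)
```

Text around the placeholders is copied verbatim, so custom separators come through unchanged.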

Testing

The fix handles:

  • Single-level numbering (backward compatible)
  • Two-level numbering (e.g., "3.1", "3.2")
  • Multi-level numbering (e.g., "3.2.1")
  • Custom separators in lvlText (periods, parentheses, etc.)
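
To make the listed cases concrete, here is a rough sketch of per-level counter tracking that yields markers such as "3.1", "3.2", and "3.2.1". The class OutlineCounters and its method are hypothetical names for this example. It assumes deeper levels restart whenever a shallower level advances and ignores w:start, w:lvlRestart, and non-decimal w:numFmt values, so it is not a full model of Word's numbering.

```python
import re


class OutlineCounters:
    """Tracks one decimal counter per outline level (level 0 is the top level)."""

    def __init__(self):
        self.counters = []

    def next_marker(self, ilvl, lvl_text):
        # Make sure a counter exists for every level up to ilvl
        while len(self.counters) <= ilvl:
            self.counters.append(0)
        self.counters[ilvl] += 1
        # Assumption: advancing a level restarts all deeper levels
        del self.counters[ilvl + 1:]

        def substitute(match):
            index = int(match.group(1)) - 1  # %1 refers to level 0
            if 0 <= index < len(self.counters):
                return str(self.counters[index])
            return match.group(0)

        return re.sub(r"%(\d)", substitute, lvl_text)


outline = OutlineCounters()
items = [
    (0, "%1"), (0, "%1"), (0, "%1"),
    (1, "%1.%2"), (1, "%1.%2"),
    (2, "%1.%2.%3"),
    (0, "%1"), (1, "%1.%2"),
]
markers = [outline.next_marker(ilvl, text) for ilvl, text in items]
print(markers)
```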

Diff Stats

docling/backend/msword_backend.py | 82 ++++++++++++++++++++++++++++++++-------
1 file changed, 69 insertions(+), 13 deletions(-)

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary. (Not required - existing tests should pass with the fix)

github-actions bot commented Mar 8, 2026

DCO Check Passed

Thanks @Br1an67, all your commits are properly signed off. 🎉

mergify bot commented Mar 8, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
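
As a quick way to check a title locally against the rule above, the same pattern can be run in Python. This is only a sketch: the helper name is_conventional is an assumption, and Mergify's own matching may differ in details.

```python
import re

# Pattern copied from the merge protection rule above
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


print(is_conventional("fix(docx): preserve multi-level outline numbering from lvlText"))
print(is_conventional("Fix outline numbering"))
```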

PeterStaar-IBM requested a review from ceberam on March 8, 2026 06:33
codecov bot commented Mar 8, 2026

Codecov Report

❌ Patch coverage is 21.91781% with 57 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
docling/backend/msword_backend.py | 21.91% | 57 Missing ⚠️

Br1an67 force-pushed the fix/issue-2758-outline-numbering branch 2 times, most recently from 3612ac2 to 59ebfb9 on March 9, 2026 03:59
Br1an67 added 3 commits March 18, 2026 07:49
Fixed a bug where multi-level outline numbering (e.g., "3.1", "3.2") was
rendered as single-level only (e.g., "1", "2", "3"), losing the hierarchical
structure.

Changes:
- Added _get_level_element() to extract level elements from numbering XML
- Refactored _is_numbered_list() to use the new helper function
- Added _get_level_text() to read the w:lvlText format string (e.g., "%1.%2")
- Added _build_multi_level_marker() to construct multi-level markers by
  substituting placeholders (%1, %2, etc.) with actual counter values
- Updated all marker generation in _add_list_item() to use the new function

The fix properly handles Word's outline numbering format strings, which define
how multi-level numbers should be constructed (e.g., "%1.%2" produces "3.1").

Signed-off-by: Br1an67 <932039080@qq.com>
- Replace Optional[X] with X | None for type annotations
- Remove unused Optional import from typing
- Apply ruff formatting

This resolves the UP045 lint errors in code-checks / lint (3.12)

Signed-off-by: Br1an67 <932039080@qq.com>
- Import Callable from collections.abc instead of typing (UP035)
- Replace Union[X, Y] with X | Y for type annotations (UP007)
- Remove unused Union import from typing
- Apply ruff formatting

This resolves the remaining UP lint errors in code-checks / lint (3.12)

Signed-off-by: Br1an67 <932039080@qq.com>
Br1an67 force-pushed the fix/issue-2758-outline-numbering branch 2 times, most recently from 3f5d1da to 292d0b5 on March 18, 2026 00:27
Development

Successfully merging this pull request may close these issues.

Multi-level outline numbering incorrectly rendered as single-level in DOCX