Commit d0a49da

fix: HTML serialization for single image documents (#261)
* fix for HTML serialization for single image documents

Signed-off-by: Peter Staar <[email protected]>

* minor refactor, add test

Signed-off-by: Panos Vagenas <[email protected]>

---------

Signed-off-by: Peter Staar <[email protected]>
Signed-off-by: Panos Vagenas <[email protected]>
Co-authored-by: Panos Vagenas <[email protected]>
Co-authored-by: Panos Vagenas <[email protected]>
1 parent 1af0721 commit d0a49da

6 files changed, +619 -3 lines changed

docling_core/experimental/serializer/html.py

Lines changed: 4 additions & 1 deletion
@@ -802,6 +802,8 @@ def serialize_doc(
         ]
 
         if self.params.output_style == HTMLOutputStyle.SPLIT_PAGE:
+            applicable_pages = self._get_applicable_pages()
+
             html_content = "\n".join([p.text for p in parts if p.text])
             next_page: Optional[int] = None
             prev_full_match_end = 0
@@ -814,11 +816,12 @@ def serialize_doc(
             # capture last page
             if next_page is not None:
                 pages[next_page] = html_content[prev_full_match_end:]
+            elif applicable_pages is not None and len(applicable_pages) == 1:
+                pages[applicable_pages[0]] = html_content
 
             html_parts.append("<table>")
             html_parts.append("<tbody>")
 
-            applicable_pages = self._get_applicable_pages()
             for page_no, page in pages.items():
 
                 if isinstance(page_no, int):
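
For reviewers, here is a minimal standalone sketch of the fallback this hunk appears to add. It is not the code in `html.py`: the `split_pages` helper and the `<!--page-break A B-->` marker format are hypothetical stand-ins, and only the `next_page`, `prev_full_match_end`, `applicable_pages` and `pages` names come from the diff. It illustrates the likely failure mode: with no page-break marker in the content, `next_page` stays `None`, so `pages` would remain empty without the new `elif` branch.

```python
import re
from typing import Dict, List, Optional

# Hypothetical marker format; the real serializer's placeholder may differ.
PAGE_BREAK = re.compile(r"<!--page-break (\d+) (\d+)-->")


def split_pages(
    html_content: str, applicable_pages: Optional[List[int]]
) -> Dict[int, str]:
    """Split serialized HTML into per-page chunks keyed by page number."""
    pages: Dict[int, str] = {}
    next_page: Optional[int] = None
    prev_full_match_end = 0

    # Each marker closes the previous page and announces the next one.
    for match in PAGE_BREAK.finditer(html_content):
        prev_page, next_page = int(match.group(1)), int(match.group(2))
        pages[prev_page] = html_content[prev_full_match_end : match.start()]
        prev_full_match_end = match.end()

    # capture last page
    if next_page is not None:
        pages[next_page] = html_content[prev_full_match_end:]
    elif applicable_pages is not None and len(applicable_pages) == 1:
        # No marker was found: a single-page document keeps all content.
        pages[applicable_pages[0]] = html_content
    return pages


single = split_pages("<img src='a.png'/>", [1])
multi = split_pages("<p>one</p><!--page-break 1 2--><p>two</p>", [1, 2])
print(single)
print(multi)
```

In this sketch, a document with one applicable page and no markers maps that page to the full content, while multi-page input is still split at each marker as before.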
