Commit 26c1a10

Merge branch 'release/2.3.1'
2 parents 53a5e8d + 499b937 commit 26c1a10

1 file changed: 3 additions, 3 deletions

chunking/chunkers/doc_analysis_chunker.py

Lines changed: 3 additions & 3 deletions
@@ -243,13 +243,13 @@ def _analyze_split_pdf(self, source_path: str, retries=3):

 if doc and doc.get("content"):
     markdown = doc["content"]
+    # Count actual pages in this part BEFORE renumbering adds synthetic markers.
+    page_breaks_in_part = markdown.count("<!-- PageBreak -->")
     # Offset page markers so downstream numbering is absolute.
     markdown = renumber_page_markers(markdown, page_offset)
     combined_content_parts.append(markdown)

-    # Count actual pages in this part for next offset.
-    page_breaks = markdown.count("<!-- PageBreak -->")
-    page_offset += page_breaks + 1 # pages = breaks + 1
+    page_offset += page_breaks_in_part + 1 # pages = breaks + 1

 # Delete part temp file right away to free disk.
 if not is_original:
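
The change above moves the page-break count ahead of the renumbering call. The sketch below illustrates why the order can matter. It is not the project's code: `renumber_page_markers_stub` is a hypothetical stand-in that assumes renumbering prepends one synthetic `<!-- PageBreak -->` marker to every part after the first. The real `renumber_page_markers` is not shown in this commit and may behave differently.

```python
PAGE_BREAK = "<!-- PageBreak -->"


def renumber_page_markers_stub(markdown: str, page_offset: int) -> str:
    """Hypothetical stand-in, NOT the project's renumber_page_markers.

    Assumption for illustration only: parts after the first get one
    synthetic page-break marker prepended during renumbering.
    """
    if page_offset == 0:
        return markdown
    return f"{PAGE_BREAK}\n{markdown}"


def final_offset_count_before(parts: list[str]) -> int:
    """Count breaks on the raw part, then renumber (the order after the fix)."""
    page_offset = 0
    for markdown in parts:
        page_breaks_in_part = markdown.count(PAGE_BREAK)
        markdown = renumber_page_markers_stub(markdown, page_offset)
        page_offset += page_breaks_in_part + 1  # pages = breaks + 1
    return page_offset


def final_offset_count_after(parts: list[str]) -> int:
    """Renumber first, then count (the order before the fix)."""
    page_offset = 0
    for markdown in parts:
        markdown = renumber_page_markers_stub(markdown, page_offset)
        page_breaks = markdown.count(PAGE_BREAK)  # also counts synthetic markers
        page_offset += page_breaks + 1
    return page_offset


# Three parts holding 2, 2, and 1 pages: 5 real pages in total.
parts = [
    f"page 1\n{PAGE_BREAK}\npage 2",
    f"page 3\n{PAGE_BREAK}\npage 4",
    "page 5",
]

print(final_offset_count_before(parts))
print(final_offset_count_after(parts))
```

Under that assumption, counting on the raw part yields a final offset equal to the real page total, while counting after renumbering also picks up the synthetic markers and overshoots for every part after the first.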
