Add page_indexes parameter to transform functions#325
Add page_indexes parameter to transform functions#325HuSharp wants to merge 2 commits intooomol-lab:mainfrom
Conversation
Signed-off-by: Jinhao Hu <jihu00003@stud.uni-saarland.de>
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughThis change adds an optional Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pdf_craft/transform.py (1)
129-152: Critical:page_indexesparameter not forwarded to_extract_from_pdf.The
page_indexesparameter is declared in thetransform_epubsignature but is never passed to the_extract_from_pdfcall at lines 136-152. This means EPUB transformations will ignore thepage_indexesparameter entirely, causing silent malfunction for users who expect selective page processing.🐛 Proposed fix: Forward page_indexes to _extract_from_pdf
asserts_path, chapters_path, toc_path, cover_path, metering = self._extract_from_pdf( pdf_path=pdf_path, analysing_path=analysing_path, ocr_size=ocr_size, dpi=dpi, max_page_image_file_size=max_page_image_file_size, includes_cover=includes_cover, includes_footnotes=includes_footnotes, ignore_pdf_errors=ignore_pdf_errors, ignore_ocr_errors=ignore_ocr_errors, generate_plot=generate_plot, toc_assumed=toc_assumed, aborted=aborted, max_tokens=max_ocr_tokens, max_output_tokens=max_ocr_output_tokens, on_ocr_event=on_ocr_event, + page_indexes=page_indexes, )
🧹 Nitpick comments (1)
pdf_craft/transform.py (1)
215-234: LGTM: Clean conditional parameter forwarding pattern.The
recognize_kwargsdictionary construction with conditional inclusion ofpage_indexesis well-implemented and correctly forwards parameters to the OCR layer.Consider whether restricting the parameter type to
range | Noneis intentional. The underlyingOCR.recognizemethod acceptsContainer[int], which would allow users to passlist[int],set[int], or other containers for more flexible page selection (e.g., non-contiguous pages like[1, 5, 10, 15]). If the API intentionally restricts to ranges for simplicity, this is fine; otherwise, broadening the type might improve flexibility.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pdf_craft/functions.pypdf_craft/transform.py
🧰 Additional context used
🧬 Code graph analysis (1)
pdf_craft/transform.py (1)
pdf_craft/pdf/ocr.py (1)
recognize(66-211)
🪛 Ruff (0.14.10)
pdf_craft/transform.py
129-129: Unused method argument: page_indexes
(ARG002)
🔇 Additional comments (4)
pdf_craft/transform.py (2)
58-58: LGTM: Parameter addition to transform_markdown.The
page_indexesparameter is properly added and correctly forwarded to_extract_from_pdfat line 85.
196-196: LGTM: Parameter addition to _extract_from_pdf.The parameter is properly integrated and correctly used in the
recognize_kwargsconstruction.pdf_craft/functions.py (2)
46-71: LGTM: Parameter addition and forwarding in transform_markdown.The
page_indexesparameter is correctly added to the public API and properly forwarded to the Transform layer.
100-129: LGTM: Parameter addition and forwarding in transform_epub.The parameter forwarding is correct at this layer. However, note that there's a critical bug downstream in
Transform.transform_epub(flagged in the transform.py review) where the parameter is not forwarded to_extract_from_pdf.
Signed-off-by: Jinhao Hu <jihu00003@stud.uni-saarland.de>
|
@coderabbitai help |
|
Tip For best results, initiate chat on the files or code changes. @HuSharp, hello! How can I assist you with this PR? I can help with:
Just let me know what you need! |
ChatThere are 3 ways to chat with CodeRabbit:
CodeRabbit commands
Other keywords and placeholders
CodeRabbit configuration file (
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Usage Example