@tibisabau tibisabau commented Dec 8, 2025

Summary

This enhancement introduces consistent page-range filtering across three CLI subcommands: extract-images, extract-annotated-pages, and extract-text. Each now supports two new optional arguments, --from and --end, enabling selective processing of only a portion of the PDF rather than the entire document.

The code was generated by GitHub Copilot (GPT-5).

Closes #194

Checklist:

  • A unit test is covering the code added / modified by this PR

  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

  • This PR is ready to be merged

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD 3-Clause license.

…s, and extract-text

This enhancement introduces consistent page-range filtering across three
CLI subcommands: extract-images, extract-annotated-pages, and
extract-text. Each now supports two new optional arguments,
--from-page and --to-page, enabling selective processing of only a
portion of the PDF rather than the entire document.
Copilot AI review requested due to automatic review settings December 8, 2025 13:54
@tibisabau tibisabau changed the title from "Draft: ENH: Add page-range support to extract-images, extract-annotated-pages, and extract-text" to "ENH: Add page-range support to extract-images, extract-annotated-pages, and extract-text" Dec 8, 2025
Copilot AI left a comment

Pull request overview

This PR introduces page-range filtering functionality to three CLI subcommands (extract-images, extract-annotated-pages, and extract-text) by adding optional --from and --end parameters. These parameters enable users to selectively process specific portions of a PDF document rather than the entire file, using 0-based inclusive indexing.
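The 0-based inclusive indexing described above can be sketched as a small helper. This is only an illustration, not the code from the PR: the function name `select_page_indices` and its clamping and validation behaviour are assumptions, since the diff itself is not shown here.

```python
from typing import Optional


def select_page_indices(
    num_pages: int,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> range:
    """Return the 0-based page indices in the inclusive range [start, end].

    Hypothetical helper: a missing bound means "from the first page" or
    "to the last page" respectively.
    """
    if start is not None and start < 0:
        raise ValueError("start must be non-negative")
    if end is not None and end < 0:
        raise ValueError("end must be non-negative")
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be greater than end")

    first = 0 if start is None else start
    # Clamp the upper bound so an oversized end does not index past the document.
    last = num_pages - 1 if end is None else min(end, num_pages - 1)
    # Inclusive end: add 1 because range() excludes its stop value.
    return range(first, last + 1)
```

With a five-page document, `select_page_indices(5, 1, 3)` covers pages 1, 2 and 3, and omitting both bounds covers the whole document. Whether the PR clamps or rejects an out-of-range end is not stated, so the clamping here is a design choice of the sketch.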

Key Changes:

  • Added start and end optional parameters to all three extraction commands
  • Implemented filtering logic in each command's main function
  • Added comprehensive test coverage for the new range functionality
  • Updated documentation with usage examples for the new parameters
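One detail worth illustrating about an option named `--from` is that `from` is a reserved word in Python, so the parsed value has to be stored under another name such as `start`. The sketch below uses the standard-library `argparse` as a stand-in; it is not pdfly's actual CLI definition, and the program name and help strings are made up for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser with --from / --end range options."""
    parser = argparse.ArgumentParser(prog="extract-text-sketch")
    parser.add_argument("pdf", help="path to the input PDF")
    # `args.from` would be a syntax error, so store the value as `start`.
    parser.add_argument(
        "--from",
        dest="start",
        type=int,
        default=None,
        help="first index to process (0-based, inclusive)",
    )
    parser.add_argument(
        "--end",
        dest="end",
        type=int,
        default=None,
        help="last index to process (0-based, inclusive)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.pdf, args.start, args.end)
```

Both options default to `None`, which lets the command distinguish "no bound given" from an explicit index of 0.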

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| pdfly/extract_images.py | Added range filtering for image extraction using global image index |
| pdfly/extract_annotated_pages.py | Added range filtering for page extraction based on page index |
| pdfly/cli.py | Added --from and --end options to extract-images, extract-annotated-pages, and extract-text commands |
| tests/test_extract_images.py | Added tests for single-image and multi-image range extraction |
| tests/test_extract_annotated_pages.py | Added tests for page range filtering with annotations |
| tests/test_cli.py | Added tests for extract-text command with range parameters |
| docs/user/subcommand-extract-text.md | Updated documentation with range parameter usage examples |
| docs/user/subcommand-extract-images.md | Updated documentation with range parameter usage examples |
| docs/user/subcommand-extract-annotated-pages.md | Updated documentation with range parameter usage examples |
| resources/file-with-invalid-offsets.pdf | File appears modified but change seems unrelated to this PR |
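The summary says image extraction filters on a "global image index" while annotated-page extraction filters on the page index. One plausible reading, sketched below, is a counter that runs across all pages rather than restarting on each page. This is an assumption about the PR's behaviour; the function name and the use of plain strings as stand-ins for image objects are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def filter_by_global_index(
    images_per_page: Iterable[List[str]],
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> Iterator[Tuple[int, int, str]]:
    """Yield (global_index, page_index, image) for images in [start, end].

    The global index counts images across the whole document, so a page
    without images does not consume an index.
    """
    global_index = 0
    for page_index, images in enumerate(images_per_page):
        for image in images:
            if end is not None and global_index > end:
                # Past the inclusive upper bound: nothing further can match.
                return
            if start is None or global_index >= start:
                yield global_index, page_index, image
            global_index += 1
```

For a document whose pages hold `[["a", "b"], [], ["c"], ["d", "e"]]`, the range 1 to 3 selects images `b`, `c` and `d`, which come from three different pages.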

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Development

Successfully merging this pull request may close these issues.

ENH: Add command to extract image from specific page(s)
