Migrate from PyPDF2 to pypdf and remove obsolete mobi_to_json test #88
codeperfectplus left a comment
Thanks for improving the audiobook.
Pull request overview
This PR modernizes the PDF handling library by migrating from the deprecated PyPDF2 to its actively maintained successor pypdf. The migration updates the dependency, refactors all PDF-related code to use the new API, and cleans up an obsolete test case for removed mobi functionality.
Key changes:
- Updated dependency from PyPDF2 3.0.1 to pypdf 4.0.1 with corresponding API migrations (PdfFileReader → PdfReader, method name updates)
- Renamed PyPDF2DocParser class to PyPDFDocParser to reflect the new library name
- Removed obsolete mobi_to_json test case that referenced a function no longer in the codebase
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| requirements.txt | Updated PDF library dependency from PyPDF2 3.0.1 to pypdf 4.0.1 |
| audiobook/doc_parser/pdf_parser.py | Migrated to pypdf API: updated imports, class name, and all method calls (PdfFileReader→PdfReader, numPages→len(pages), getPage→pages[], extractText→extract_text, getOutlines→outline) |
| audiobook/utils.py | Updated import statement to use PyPDFDocParser instead of PyPDF2DocParser |
| audiobook/main.py | Updated logger name from "PyPDF2" to "pypdf" to align with new library |
| tests/test_create_json_book.py | Renamed test from test_pdf_to_json_pypdf2 to test_pdf_to_json_pypdf; commented out obsolete mobi_to_json test |
| docs/command_line_usage.rst | Updated documentation to reference pypdf instead of pypdf2 in extraction engine table |
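
The API renames listed for pdf_parser.py can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the helper names `open_reader` and `summarize` are hypothetical, and the pypdf import is done lazily only so that `summarize` can be exercised with any object exposing `pages` and `outline`.

```python
from typing import Any


def open_reader(path: str) -> Any:
    """Open a PDF with pypdf (hypothetical helper, not from the PR)."""
    # Lazy import: pypdf is only needed when a real file is opened.
    from pypdf import PdfReader  # replaces PyPDF2's PdfFileReader

    return PdfReader(path)


def summarize(reader: Any) -> dict:
    """Collect page count, per-page text, and outline via the pypdf-style API."""
    pages = reader.pages  # indexing/iterating pages replaces getPage(i)
    return {
        "num_pages": len(pages),  # replaces reader.numPages
        # extract_text() replaces extractText(); fall back to "" for empty pages
        "text": [page.extract_text() or "" for page in pages],
        "outline": reader.outline,  # replaces reader.getOutlines()
    }
```

Assuming pypdf is installed, `summarize(open_reader("assets/sample.pdf"))` would return the page count, extracted text, and outline in one dictionary; the file path here is only an example.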
    # def test_docs_to_json(self):
    # self.assertEqual(ab.create_json_book("assets/sample.doc"), (output['docs'], {'book_name': 'sample', 'pages': 1}))
Copilot AI, Dec 9, 2025
This comment appears to contain commented-out code.
Suggested change:

    -    # def test_docs_to_json(self):
    -    # self.assertEqual(ab.create_json_book("assets/sample.doc"), (output['docs'], {'book_name': 'sample', 'pages': 1}))
    +    @unittest.skip("DOC to JSON test is currently disabled (e.g., due to missing support or failing test).")
    +    def test_docs_to_json(self):
    +        self.assertEqual(ab.create_json_book("assets/sample.doc"), (output['docs'], {'book_name': 'sample', 'pages': 1}))
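
To illustrate why a skip decorator is preferable to commenting a test out, here is a small self-contained sketch using only the standard library. The class name `CreateJsonBookTests` and the helper `run_suite` are hypothetical stand-ins, not code from this repository. The skipped test still appears in the test report with its reason, whereas commented-out code is invisible to the runner.

```python
import io
import unittest


class CreateJsonBookTests(unittest.TestCase):  # hypothetical test class
    @unittest.skip("DOC to JSON test is currently disabled.")
    def test_docs_to_json(self):
        # Never executed while the skip decorator is in place.
        self.fail("this body should not run")

    def test_placeholder(self):
        # A trivial passing test so the suite has something to run.
        self.assertEqual(1 + 1, 2)


def run_suite() -> unittest.TestResult:
    """Run the test class quietly and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CreateJsonBookTests)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

Calling `run_suite()` returns a result whose `skipped` list holds the disabled test together with the reason string, so the gap in coverage stays visible until the test is re-enabled.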
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Deepak Raj <[email protected]>
Pull Request Template
What have you changed
Updated the code to use the modern PdfReader API.
Issue no. (must): #87
Self check (tick after making pull request)
Join Us on Discord:- https://discord.gg/JfbK3bS