Preserve PDF metadata when splitting chapters#1

Merged
shankarpandala merged 1 commit into main from
codex/find-and-fix-a-bug-in-important-code
Feb 28, 2026

Conversation

@shankarpandala
Owner

Motivation

  • Splitting PDF chapters called set_metadata() twice, which unintentionally overwrote the original document metadata (e.g. author, producer) when applying a per-chapter title.
  • Add a regression test to ensure per-chapter titles are applied without losing existing metadata fields.

Description

  • Replace the double set_metadata() calls in src/pdf_splitter/splitter.py with a single merged metadata dictionary so existing metadata is preserved while setting the chapter title.
  • Construct a copy of the source metadata (dict(metadata)), set "title" to the chapter title, and pass that to new_doc.set_metadata().
  • Add a new unit test, tests/test_splitter.py, that mocks fitz documents and verifies the split document receives merged metadata containing title, author, and producer.
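
The merge described above can be sketched as follows. This is a minimal illustration, not the code from src/pdf_splitter/splitter.py: the helper name build_chapter_metadata, the sample metadata values, and the `or {}` guard are assumptions made for the example, and a MagicMock stands in for the fitz document so the snippet runs without PyMuPDF installed.

```python
from unittest.mock import MagicMock


def build_chapter_metadata(source_metadata, chapter_title):
    """Return a copy of the source metadata with only the title replaced.

    Hypothetical helper for illustration; the PR describes the same steps
    (copy via dict(metadata), set "title") without naming a function.
    """
    # Copy so the source document's metadata dict is never mutated.
    # The `or {}` guard for missing metadata is an assumption of this sketch.
    merged = dict(source_metadata or {})
    merged["title"] = chapter_title
    return merged


# Sample values are made up for the example.
source_metadata = {
    "title": "Whole Book",
    "author": "A. Writer",
    "producer": "SomeTool",
}

# Stand-in for the per-chapter fitz document.
new_doc = MagicMock()

# A single set_metadata() call with the merged dict, instead of two calls
# where one could overwrite the other.
new_doc.set_metadata(build_chapter_metadata(source_metadata, "Chapter 1"))
```

Because the dict is copied before the title is set, the source metadata keeps its original title, and author and producer carry over to the chapter document unchanged.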

Testing

  • Ran PYTHONPATH=src pytest -q, which executed the test suite, including the new tests/test_splitter.py test.
  • Test results: 5 items collected; the run completed with 4 passed and 1 skipped, so the full suite passes with the new regression test included.
  • The new test confirms the metadata merge behavior and that the change prevents metadata loss when splitting.

Codex Task

@shankarpandala shankarpandala merged commit 78a4f6a into main Feb 28, 2026
16 of 17 checks passed