Preserve PDF metadata when splitting chapters by shankarpandala · Pull Request #1 · shankarpandala/lazy-splitter

shankarpandala · 2026-02-28T11:42:03Z

The chapter-splitting code called set_metadata() twice, which unintentionally overwrote the original document metadata (e.g. author, producer) when applying a per-chapter title.
Add a regression test to ensure per-chapter titles are applied without losing existing metadata fields.
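To illustrate the failure mode, here is a minimal sketch that does not use the real library. FakeDoc is a hypothetical stand-in, and its set_metadata() is assumed to replace all stored fields on each call, which matches the overwrite behavior described above. The metadata values are invented for the example.

```python
class FakeDoc:
    """Hypothetical stand-in for a PDF document; not the real fitz API."""

    def __init__(self):
        self.metadata = {}

    def set_metadata(self, m):
        # Assumed semantics: each call replaces every field and keeps
        # nothing from earlier calls.
        self.metadata = dict(m)


# Invented sample values, for illustration only.
source_metadata = {
    "title": "Whole Book",
    "author": "A. Writer",
    "producer": "SomeTool",
}

# Buggy pattern: two calls, where the second one carries only the title.
buggy = FakeDoc()
buggy.set_metadata(source_metadata)
buggy.set_metadata({"title": "Chapter 1"})

# After the second call, author and producer are gone.
print(buggy.metadata)
```

Under that assumption, the chapter document ends up with only the title, which is the metadata loss this PR addresses.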

Replace the double set_metadata() calls in src/pdf_splitter/splitter.py with a single merged metadata dictionary so existing metadata is preserved while setting the chapter title.
Construct a copy of the source metadata (dict(metadata)), set "title" to the chapter title, and pass that to new_doc.set_metadata().
Add a new unit test file, tests/test_splitter.py, that mocks fitz documents and verifies that the split document receives merged metadata containing title, author, and producer.
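The change and its test can be sketched roughly as follows. This is not the code from src/pdf_splitter/splitter.py or tests/test_splitter.py: apply_chapter_metadata is a hypothetical helper name, the sample metadata is invented, and a MagicMock stands in for the fitz document so the sketch runs without the library installed.

```python
from unittest.mock import MagicMock


def apply_chapter_metadata(new_doc, source_metadata, chapter_title):
    """Hypothetical helper: copy the source metadata, override only the title."""
    merged = dict(source_metadata)   # copy, so the source dict is not mutated
    merged["title"] = chapter_title  # per-chapter title replaces the book title
    new_doc.set_metadata(merged)     # single call carrying every field
    return merged


# Invented sample values, for illustration only.
source_metadata = {
    "title": "Whole Book",
    "author": "A. Writer",
    "producer": "SomeTool",
}

new_doc = MagicMock()  # stands in for the split-off chapter document
merged = apply_chapter_metadata(new_doc, source_metadata, "Chapter 1")

# The mock records the call, so the merged dictionary can be checked directly.
new_doc.set_metadata.assert_called_once_with(
    {"title": "Chapter 1", "author": "A. Writer", "producer": "SomeTool"}
)
```

Copying with dict() before setting the title keeps the source document's metadata untouched, so later chapters still start from the original values.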

Ran PYTHONPATH=src pytest -q, which executed the test suite including the new tests/test_splitter.py test.
Test results: 5 items collected; 4 passed and 1 skipped, so the full suite passes with the new regression test.
The new test confirms the metadata merge behavior and the change prevents metadata loss when splitting.

Fix PDF split metadata overwrite bug

b38841d

shankarpandala added the codex label Feb 28, 2026 — with ChatGPT Codex Connector

shankarpandala merged commit 78a4f6a into main Feb 28, 2026
16 of 17 checks passed
