web2PDFbook crawls a website and compiles its pages into a single PDF. It is useful for archiving or offline reading.
Install from PyPI:
pip install web2pdfbook
To test a pre-release from TestPyPI:
pip install -i https://test.pypi.org/simple web2pdfbook
If you cloned the repository and want to invoke web2pdfbook locally, install the dependencies first:
pip install -r requirements.txt

- Link crawling – crawler.extract_links() retrieves all internal HTML links starting from the base URL.
- PDF rendering – renderer.render_to_pdf() uses Playwright to save each page as a PDF.
- Merging – merger.merge_documents() merges the PDFs into a single document.
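
To make the link-crawling stage concrete, here is a minimal standard-library sketch of how internal links could be pulled out of one page. It is not the project's crawler.extract_links(); the names `internal_links` and `_AnchorCollector` are hypothetical, and "internal" is assumed to mean "same host as the base URL".

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the raw href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, base_url):
    """Return absolute, same-host links found in html, in document order."""
    collector = _AnchorCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    seen, result = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" so each page appears once.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        is_web = parsed.scheme in ("http", "https")
        if is_web and parsed.netloc == base_host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Each URL returned this way could then be handed to the rendering stage, and the resulting per-page PDFs to the merging stage.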
Generate a book via the CLI:
web2pdfbook --help
web2pdfbook https://example.com output.pdf --timeout 20000 --use-index
- --timeout – render timeout in milliseconds.
- --use-index – only crawl links from index pages.
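
The shape of this command line can be illustrated with a small argparse sketch. This is an illustration under stated assumptions, not the project's actual CLI code: the positional names `url` and `output` and the 30000 ms default are hypothetical; only the two flags and their meanings come from the text above.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented invocation (a sketch)."""
    parser = argparse.ArgumentParser(prog="web2pdfbook")
    parser.add_argument("url", help="base URL to start crawling from")
    parser.add_argument("output", help="path of the merged PDF")
    # The default below is an assumption; the README does not state one.
    parser.add_argument("--timeout", type=int, default=30000,
                        help="render timeout in milliseconds")
    parser.add_argument("--use-index", action="store_true",
                        help="only crawl links from index pages")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    ["https://example.com", "output.pdf", "--timeout", "20000", "--use-index"]
)
```

Here `args.timeout` is the integer 20000 (20 seconds) and `args.use_index` is True; leaving out --use-index yields False.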
To run the tests, install the dependencies first:
pip install -r requirements.txt
Then run the test suite with coverage and print the report:
python -m coverage run -m pytest -q
python -m coverage report
To publish a release, install the packaging dependencies:
pip install -r dev-requirements.txt
Build and upload the distribution (defaults to TestPyPI):
./release/publish.sh
This script runs python -m build and uploads with twine. Set REPOSITORY_URL to publish elsewhere.
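
The two steps the script performs can be sketched in Python as follows. This is not the contents of release/publish.sh: the function name `publish_commands` is hypothetical, and passing the target through twine's `--repository-url` option is an assumption about how REPOSITORY_URL is consumed. The fallback URL is TestPyPI's upload endpoint.

```python
import os
import sys

# TestPyPI's upload endpoint, used when REPOSITORY_URL is unset or empty.
TESTPYPI_UPLOAD_URL = "https://test.pypi.org/legacy/"


def publish_commands(dist_files, env=None):
    """Return the build and upload commands as argument lists (nothing is run)."""
    env = os.environ if env is None else env
    repository_url = env.get("REPOSITORY_URL") or TESTPYPI_UPLOAD_URL
    build_cmd = [sys.executable, "-m", "build"]
    upload_cmd = [
        sys.executable, "-m", "twine", "upload",
        "--repository-url", repository_url,
        *dist_files,
    ]
    return [build_cmd, upload_cmd]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the REPOSITORY_URL override easy to inspect before anything is uploaded.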
The repository contains a .pypirc template with placeholder credentials for
TestPyPI and PyPI. Fill in your tokens (or copy it to ~/.pypirc) so twine
can authenticate during the upload.