nomenarkt/web2PDFbook

web2PDFbook

🧠 What it does

web2PDFbook crawls a website and compiles its pages into a single PDF. It is useful for archiving or offline reading.

⚙️ How to install

Install from PyPI:

pip install web2pdfbook

To test a pre-release from TestPyPI:

pip install -i https://test.pypi.org/simple web2pdfbook

If you cloned the repository and want to invoke web2pdfbook locally, install the dependencies first:

pip install -r requirements.txt

🔄 How it works

  1. Link crawling: crawler.extract_links() retrieves all internal HTML links, starting from the base URL.
  2. PDF rendering: renderer.render_to_pdf() uses Playwright to save each page as a PDF.
  3. Merging: merger.merge_documents() merges the per-page PDFs into a single document.
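The three steps above can be sketched in plain Python. This is a minimal, hypothetical stand-in, not the project's crawler.extract_links(): the names extract_internal_links and build_book are illustrative, links are parsed with the standard library's html.parser, and fetching, rendering and merging are passed in as callables (in web2PDFbook itself, rendering uses Playwright and merging is done by merger.merge_documents()). It also assumes "internal" means "http(s) on the same host as the page", which may differ from the real crawler's rule.

```python
from __future__ import annotations

from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_internal_links(html: str, page_url: str) -> list[str]:
    """Hypothetical helper: absolute, de-duplicated links on page_url's host."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links: list[str] = []
    for href in collector.hrefs:
        # Resolve relative hrefs against the page and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Assumption: "internal" means http(s) on the same host.
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in links:
                links.append(absolute)
    return links


def build_book(
    base_url: str,
    fetch: Callable[[str], str],
    render: Callable[[str], bytes],
    merge: Callable[[Iterable[bytes]], bytes],
) -> bytes:
    """Hypothetical pipeline: breadth-first crawl, render each page, merge."""
    order = [base_url]  # pages in discovery order, which becomes the book order
    seen = {base_url}
    queue = deque([base_url])
    while queue:
        url = queue.popleft()
        for link in extract_internal_links(fetch(url), url):
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append(link)
    # Render every discovered page, then hand the PDFs to the merger in order.
    return merge(render(url) for url in order)
```

Because fetch, render and merge are injected, the crawl order can be checked offline with a dictionary of pages and simple lambdas, while a Playwright-based renderer can be plugged in separately.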

Generate a book via the CLI:

web2pdfbook --help
web2pdfbook https://example.com output.pdf --timeout 20000 --use-index

  • --timeout – render timeout in milliseconds.
  • --use-index – only crawl links from index pages.
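For readers wiring the same options into their own scripts, the documented CLI surface maps onto argparse roughly as follows. This is a hypothetical sketch built only from the arguments and flags listed above; the real entry point may define its arguments, help text and defaults differently (the default timeout, for instance, is not documented here, so the sketch leaves it as None).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented web2pdfbook CLI."""
    parser = argparse.ArgumentParser(prog="web2pdfbook")
    parser.add_argument("url", help="base URL to start crawling from")
    parser.add_argument("output", help="path of the merged PDF to write")
    parser.add_argument(
        "--timeout",
        type=int,
        default=None,  # assumption: the real default is not documented here
        help="render timeout in milliseconds",
    )
    parser.add_argument(
        "--use-index",
        action="store_true",  # a boolean switch; argparse exposes it as use_index
        help="only crawl links from index pages",
    )
    return parser
```

With the example invocation above, build_parser().parse_args(["https://example.com", "output.pdf", "--timeout", "20000", "--use-index"]) yields timeout=20000 (20 seconds) and use_index=True.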

✅ How to test

Install the dependencies, then run the test suite under coverage and print the report:

pip install -r requirements.txt
python -m coverage run -m pytest -q
python -m coverage report

📦 How to release

Install packaging dependencies:

pip install -r dev-requirements.txt

Build and upload the distribution (defaults to TestPyPI):

./release/publish.sh

This script runs python -m build and uploads the resulting distributions with twine. Set REPOSITORY_URL to publish to a different package index.

The repository contains a .pypirc template with placeholder credentials for TestPyPI and PyPI. Fill in your tokens (or copy it to ~/.pypirc) so twine can authenticate during the upload.
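Before uploading, a quick sanity check can confirm that the placeholders were replaced. The sketch below is a hypothetical helper, not part of the repository: it parses a .pypirc with configparser and flags any pypi or testpypi section whose credentials do not follow the usual API-token form (username __token__, password starting with pypi-). It is only a heuristic and assumes the template uses those two section names; it cannot tell a real token from a placeholder that happens to start with pypi-.

```python
from __future__ import annotations

import configparser
from pathlib import Path


def unfilled_sections(
    pypirc_text: str, sections: tuple[str, ...] = ("pypi", "testpypi")
) -> list[str]:
    """Hypothetical check: sections that are missing or lack token-style credentials."""
    # interpolation=None so any "%" characters in values are taken literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(pypirc_text)
    problems: list[str] = []
    for name in sections:
        if not config.has_section(name):
            problems.append(name)
            continue
        username = config.get(name, "username", fallback="")
        password = config.get(name, "password", fallback="")
        # API tokens are used with the literal username __token__ and a "pypi-" prefix.
        if username != "__token__" or not password.startswith("pypi-"):
            problems.append(name)
    return problems


def check_pypirc(path: Path | None = None) -> list[str]:
    """Read ~/.pypirc (or the given file) and return the sections still needing tokens."""
    target = path if path is not None else Path.home() / ".pypirc"
    return unfilled_sections(target.read_text(encoding="utf-8"))
```

An empty list from check_pypirc() (or check_pypirc(Path(".pypirc")) for the copy in the repository) means both sections carry token-style credentials.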

About

Web2PDFbook converts web pages or entire websites into well-formatted PDF books, making online content easy to archive, share, and read offline.
