web2PDFbook

🧠 What it does

web2PDFbook crawls a website and compiles its pages into a single PDF. It is useful for archiving or offline reading.

⚙️ How to install

Install from PyPI:

pip install web2pdfbook

To test a pre-release from TestPyPI:

pip install -i https://test.pypi.org/simple web2pdfbook

If you cloned the repository and want to invoke web2pdfbook locally, install the dependencies first:

pip install -r requirements.txt

🔄 How it works

  1. Link crawling: crawler.extract_links() retrieves all internal HTML links starting from the base URL.
  2. PDF rendering: renderer.render_to_pdf() uses Playwright to save each page as a PDF.
  3. Merging: merger.merge_documents() merges the PDFs into a single document.
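To make the first step more concrete, here is a minimal sketch of how internal links could be collected from one page using only the standard library. It is not the project's actual crawler.extract_links() code: the names internal_links and _AnchorCollector are hypothetical, and "internal" is assumed to mean "same host as the base URL".

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the raw href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(base_url, html):
    """Return absolute, de-duplicated links that share base_url's host.

    Assumption: a link counts as internal when its host matches the base URL.
    """
    collector = _AnchorCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc

    seen = set()
    result = []
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" so one page is listed once.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skips mailto:, javascript:, and similar schemes
        if parsed.netloc == base_host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


sample_html = (
    '<a href="/docs/intro.html">Intro</a>'
    '<a href="guide.html#setup">Guide</a>'
    '<a href="https://other.org/x">External</a>'
    '<a href="/docs/intro.html">Intro again</a>'
    '<a href="mailto:someone@example.com">Mail</a>'
)
links = internal_links("https://example.com/docs/", sample_html)
```

A real crawler would repeat this for every newly discovered page and also check that each response is HTML; the sketch covers only the filtering of a single page.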

Generate a book via the CLI:

web2pdfbook --help
web2pdfbook https://example.com output.pdf --timeout 20000 --use-index
  • --timeout – render timeout in milliseconds.
  • --use-index – only crawl links from index pages.
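For readers who want to see how such a command line maps onto parsed values, the following argparse sketch mirrors the invocation above. It is an illustration only: the real web2pdfbook parser may be built differently, the help strings are paraphrased, and the None default for --timeout is an assumption, since the README does not state the actual default.

```python
import argparse


def build_parser():
    """Build a parser shaped like the documented CLI (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        prog="web2pdfbook",
        description="Crawl a website and compile its pages into a single PDF.",
    )
    parser.add_argument("url", help="base URL to start crawling from")
    parser.add_argument("output", help="path of the merged PDF to write")
    parser.add_argument(
        "--timeout",
        type=int,
        default=None,  # assumed placeholder; the real default is not documented here
        help="render timeout in milliseconds",
    )
    parser.add_argument(
        "--use-index",
        action="store_true",
        help="only crawl links from index pages",
    )
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    ["https://example.com", "output.pdf", "--timeout", "20000", "--use-index"]
)
```

With this shape, args.timeout is the integer 20000 and args.use_index is True; omitting --use-index leaves it False.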

✅ How to test

Install the dependencies, then run the test suite under coverage and print the report:

pip install -r requirements.txt
python -m coverage run -m pytest -q
python -m coverage report

📦 How to release

Install packaging dependencies:

pip install -r dev-requirements.txt

Build and upload the distribution (defaults to TestPyPI):

./release/publish.sh

This script runs python -m build and uploads with twine. Set REPOSITORY_URL to publish elsewhere.
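As a rough picture of what the script does, the two steps can be expressed as command lists in Python. This is a sketch, not a transcription of release/publish.sh: the function name release_commands is hypothetical, and the way REPOSITORY_URL is read and falls back to the TestPyPI upload endpoint is assumed from the description above.

```python
import os
import sys

# Upload endpoint for TestPyPI, assumed to be the script's default target.
TESTPYPI_UPLOAD_URL = "https://test.pypi.org/legacy/"


def release_commands(dist_files, env=None):
    """Return the build and upload commands without running them.

    dist_files: paths of the built archives (for example from glob("dist/*")).
    env: mapping to read REPOSITORY_URL from; defaults to os.environ.
    """
    env = os.environ if env is None else env
    repository_url = env.get("REPOSITORY_URL", TESTPYPI_UPLOAD_URL)
    build_cmd = [sys.executable, "-m", "build"]
    upload_cmd = [
        sys.executable, "-m", "twine", "upload",
        "--repository-url", repository_url,
        *dist_files,
    ]
    return [build_cmd, upload_cmd]


default_cmds = release_commands(["dist/pkg-0.1.0.tar.gz"], env={})
custom_cmds = release_commands(
    ["dist/pkg-0.1.0.tar.gz"],
    env={"REPOSITORY_URL": "https://upload.pypi.org/legacy/"},
)
```

Each command list could be passed to subprocess.run(cmd, check=True) in order, collecting the dist files only after the build step has finished. The archive name dist/pkg-0.1.0.tar.gz is a placeholder.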

The repository contains a .pypirc template with placeholder credentials for TestPyPI and PyPI. Fill in your tokens (or copy it to ~/.pypirc) so twine can authenticate during the upload.