39 commits
e5d8d06 create pypi package (kumaF, Oct 8, 2020)
304dfde update readme (kumaF, Oct 8, 2020)
ef32c85 update readme (kumaF, Oct 8, 2020)
611591a file compression added (kumaF, Oct 24, 2020)
1ffbe2c pyhtml2pdf v0.0.2 (kumaF, Oct 24, 2020)
e234d46 update readme.md (kumaF, Oct 24, 2020)
912bb0d update install chrome driver (kumaF, Oct 28, 2020)
9dc4837 Merge branch 'dev' (kumaF, Oct 28, 2020)
e329803 fix selenium tag selector issue (kumaF, Jun 24, 2022)
d4b7244 Merge branch 'dev' (kumaF, Jun 24, 2022)
b5db357 update version (kumaF, Jun 24, 2022)
fb3a2d8 fix version with dependecies (kumaF, Jun 24, 2022)
1791478 Merge branch 'dev' (kumaF, Jun 24, 2022)
fa5723f Added print_options to convert method (Klius, Aug 27, 2022)
2c5eb23 Merge pull request #1 from Klius/master (kumaF, Oct 20, 2022)
acbe6d8 update version (kumaF, Nov 28, 2022)
0e8098d update readme (kumaF, Nov 28, 2022)
ed8d6f1 update readme (kumaF, Nov 28, 2022)
9b2e66b Document install_driver parameter to convert(), which was previously … (Jun 9, 2023)
a5b58b5 Update for compatibility with Selenium 4.10 which removed some deprec… (Jun 9, 2023)
a7a9140 Merge pull request #2 from macnewbold/master (kumaF, Aug 27, 2023)
82722ca Create python-publish.yml (kumaF, Aug 27, 2023)
af3a7c0 Compatibility with Windows (KarelKenens, Nov 17, 2023)
455fe0b Change version (KarelKenens, Nov 17, 2023)
e3d44cc Changed naming from "__private" convention to "_internal" convention,… (KarelKenens, Nov 17, 2023)
cc28a5b Add support for BytesIO object and html content (m-abdi, Mar 9, 2024)
1e22299 Release driver in all situations (m-abdi, Mar 9, 2024)
91ab167 Merge pull request #3 from KarelKenens/windows-bugfix (kumaF, Dec 10, 2024)
4752b8d Merge branch 'master' into master (kumaF, Jan 2, 2025)
0aded26 Merge pull request #4 from m-abdi/master (kumaF, Jan 2, 2025)
d0932c9 Fix merge conflict (inganault, Apr 27, 2025)
fabc26b Fix selenium internal API changes (inganault, Apr 27, 2025)
ed647d4 Merge pull request #5 from inganault/fix-new-selenium (kumaF, Dec 20, 2025)
1111fd5 update workflow configs (kumaF, Dec 20, 2025)
a2e8c0f add new version (kumaF, Dec 20, 2025)
ef8592d add more validations to ensure the pdf file type (kumaF, Dec 20, 2025)
6d0ccae code refactor (kumaF, Dec 20, 2025)
564ec85 fix pdf related sec issues (kumaF, Dec 20, 2025)
758158a add security.md (kumaF, Dec 20, 2025)
39 changes: 39 additions & 0 deletions .github/workflows/python-publish.yml
@@ -0,0 +1,39 @@
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries

# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.

name: Upload Python Package

on:
  release:
    types: [published]

permissions:
  contents: read

jobs:
  deploy:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v3
      with:
        python-version: '3.10'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install build
    - name: Build package
      run: python -m build
    - name: Publish package
      uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}
2 changes: 2 additions & 0 deletions .gitignore
@@ -126,3 +126,5 @@ dmypy.json
*.pdf
*.html
chromedriver
requirements.txt
main.py
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2019 Maksim
Copyright (c) 2020

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
104 changes: 59 additions & 45 deletions README.md
@@ -1,66 +1,80 @@
# python-selenium-chrome-html-to-pdf-converter
# pyhtml2pdf
Simple python wrapper to convert HTML to PDF with headless Chrome via selenium.

## Installation
Clone the repository, move to the project root directory, create a virtual environment, and install the dependencies:
## Install
```
git clone https://github.com/maxvst/python-selenium-chrome-html-to-pdf-converter.git
cd python-selenium-chrome-html-to-pdf-converter
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install pyhtml2pdf
```
Install chrome (chromium) browser.

Download chromedriver from http://chromedriver.chromium.org/ and put it in the project root directory.
## Dependencies

- [Selenium Chrome Webdriver](https://chromedriver.chromium.org/downloads) (if Chrome is installed on the machine, you won't need to install the Chrome driver)
- [Ghostscript](https://www.ghostscript.com/download.html)

## Example

### **Convert to PDF**

**Use with website url**

## Demo
```
cd examples
python converter.py https://google.com google.pdf
from pyhtml2pdf import converter

converter.convert('https://pypi.org', 'sample.pdf')
```

## Why use selenium?
TODO: Add description
**Use with html file from local machine**

## CSS recommendations
```
import os
from pyhtml2pdf import converter

Basic configuration for single page:
path = os.path.abspath('index.html')
converter.convert(f'file:///{path}', 'sample.pdf')
```
@page {
size: A4;
margin: 0mm;
}
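The local-file example builds its URL as `f'file:///{path}'`. On POSIX systems, `os.path.abspath` already returns a path that starts with `/`, so that string ends up with four slashes (`file:////home/...`). That is not the canonical form, and spaces in the path are also left unencoded. A minimal alternative that needs only the standard library is `pathlib.Path.as_uri()`. The helper name `local_file_url` is hypothetical and is not part of pyhtml2pdf.

```python
from pathlib import Path


def local_file_url(filename: str) -> str:
    """Turn a local HTML filename into a well-formed file:// URL (hypothetical helper)."""
    # resolve() makes the path absolute; as_uri() percent-encodes spaces and
    # other unsafe characters and emits exactly three slashes after "file:"
    return Path(filename).resolve().as_uri()
```

It could then be used as `converter.convert(local_file_url('index.html'), 'sample.pdf')`.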

**Some JS objects may have animations or take some time to render. You can set a timeout, in seconds, to give those objects time to render.**

```
converter.convert(source, target, timeout=2)
```

For printing double-sided documents use
**Compress the converted PDF**

Some PDFs may be oversized, so there is a built-in PDF compression feature.

The compression power can be one of:
- 0: default
- 1: prepress
- 2: printer
- 3: ebook
- 4: screen

```
@page :left {
margin-left: 4cm;
margin-right: 2cm;
}

@page :right {
margin-left: 4cm;
margin-right: 2cm;
}

@page :first {
margin-top: 10cm /* Top margin on first page 10cm */
}
converter.convert(source, target, compress=True, power=0)
```
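The power values map onto Ghostscript's `-dPDFSETTINGS` presets, matching the `quality` dictionary in `pyhtml2pdf/compressor.py`. According to Ghostscript's documentation, `/screen` targets images of roughly 72 dpi, `/ebook` roughly 150 dpi, and `/printer` and `/prepress` roughly 300 dpi. Higher power values therefore generally produce smaller, lower-resolution output.

The sketch below rebuilds the same argument list for a given power. It also rejects out-of-range values up front, whereas `compress` itself would raise a bare `KeyError`. `ghostscript_args` is a hypothetical helper, not part of pyhtml2pdf.

```python
from pathlib import Path
from typing import List

# Power levels from the README, mapped to Ghostscript -dPDFSETTINGS presets
QUALITY = {0: "/default", 1: "/prepress", 2: "/printer", 3: "/ebook", 4: "/screen"}


def ghostscript_args(source: str, target: str, power: int = 0, gs: str = "gs") -> List[str]:
    """Build a Ghostscript pdfwrite argument list for the given power (hypothetical helper)."""
    if power not in QUALITY:
        # Fail early with a clear message instead of a KeyError
        raise ValueError(f"power must be one of {sorted(QUALITY)}, got {power!r}")
    return [
        gs,
        "-dSAFER",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={QUALITY[power]}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={Path(target).as_posix()}",
        Path(source).as_posix(),
    ]
```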

Control pagination with page-break-before, page-break-after, page-break-inside like
### **Pass Print Options**

You can use the print options listed [here](https://vanilla.aslushnikov.com/?Page.printToPDF).

```
h1 { page-break-before : right }
h2 { page-break-after : avoid }
table { page-break-inside : avoid }
converter.convert(f"file:///{path}", "sample.pdf", print_options={"scale": 0.95})
```
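The linked `Page.printToPDF` command measures `paperWidth`, `paperHeight`, and the `margin*` parameters in inches. Assuming `print_options` is passed through to that command unchanged, the A4, zero-margin page from the `@page` rule in the old README could be requested as shown below. `a4_print_options` is a hypothetical helper. The same command also accepts `preferCSSPageSize`, which tells Chrome to prefer the page size defined in CSS.

```python
MM_PER_INCH = 25.4


def mm_to_inches(mm: float) -> float:
    """Convert millimetres to inches, the unit Page.printToPDF uses."""
    return mm / MM_PER_INCH


def a4_print_options(margin_mm: float = 0.0, scale: float = 1.0) -> dict:
    """Build a print_options dict for an A4 page (210 mm x 297 mm), hypothetical helper."""
    margin = mm_to_inches(margin_mm)
    return {
        # Paper size and margins are expressed in inches
        "paperWidth": mm_to_inches(210),
        "paperHeight": mm_to_inches(297),
        "marginTop": margin,
        "marginBottom": margin,
        "marginLeft": margin,
        "marginRight": margin,
        "scale": scale,
    }
```

For example: `converter.convert(f"file:///{path}", "sample.pdf", print_options=a4_print_options(margin_mm=10))`.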
Control widows and orphans like

### **Compress PDF**

**Use it to compress a PDF file from local machine**

```
@page {
orphans:4;
widows:2;
}
import os
from pyhtml2pdf import compressor

compressor.compress('sample.pdf', 'compressed_sample.pdf')
```
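Before it runs Ghostscript, `compressor.compress` checks that the source exists and has a `.pdf` suffix. It then hands content checks to `_pdf_has_suspicious_content`, whose code is not part of this diff. As a cheap complementary guard, a caller could also confirm that the file starts with the `%PDF-` header and stays under a size limit.

`check_pdf_input` is a hypothetical helper, and its 25 MB default simply mirrors `MAX_BYTES` in the compressor. Some PDF readers tolerate a few stray bytes before the header, and this strict version would reject such files.

```python
import os
from pathlib import Path
from typing import Union

PDF_MAGIC = b"%PDF-"
# Hypothetical default, chosen to match compressor.MAX_BYTES (25 MB)
MAX_BYTES = 25 * 1024 * 1024


def check_pdf_input(path: Union[str, os.PathLike], max_bytes: int = MAX_BYTES) -> Path:
    """Reject inputs that are missing, too large, or lack a PDF header."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"{p} does not exist")
    size = p.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{p} is {size} bytes; the limit is {max_bytes}")
    with p.open("rb") as fh:
        header = fh.read(len(PDF_MAGIC))
    # Strict check: the %PDF- marker must be the very first bytes of the file
    if header != PDF_MAGIC:
        raise ValueError(f"{p} does not start with a %PDF- header")
    return p
```

The returned `Path` can be passed straight on, e.g. `compressor.compress(check_pdf_input('sample.pdf'), 'compressed_sample.pdf')`.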
For more details, see https://www.tutorialspoint.com/css/css_paged_media.htm

Inspired by the work of:

- https://github.com/maxvst/python-selenium-chrome-html-to-pdf-converter.git
- https://github.com/theeko74/pdfc

51 changes: 51 additions & 0 deletions SECURITY.md
@@ -0,0 +1,51 @@
# Security Policy

## Supported Versions

We support security fixes for the latest released version and the `master` branch.

| Version | Supported |
| ------- | --------- |
| Latest | ✅ |
| Older | ❌ |

## Reporting a Vulnerability

If you believe you’ve found a security vulnerability, **please do not open a public GitHub issue**.

Instead, report it privately using one of the following:

### Preferred: GitHub Private Vulnerability Reporting
- Go to: **Security** → **Advisories** → **Report a vulnerability**
- Provide as much detail as possible (see “What to include” below).

### Alternative: Email
- Email: **[email protected]**

## What to Include

Please include:
- A clear description of the issue and potential impact
- Steps to reproduce (proof-of-concept if available)
- Affected versions/branches
- Any suggested fix or mitigation (if you have one)

## Response Timeline

We aim to:
- Acknowledge receipt within **3 business days**
- Provide a status update within **7 business days**
- Release a fix as soon as practical based on severity and complexity

## Coordinated Disclosure

We follow coordinated disclosure practices. Please allow reasonable time to investigate and remediate before any public disclosure.

## Security Updates

Security fixes may be released as:
- Patch releases
- Advisory notes (GitHub Security Advisory)
- Changelog entries (when appropriate)

Thank you for helping keep this project and its users safe.
12 changes: 0 additions & 12 deletions examples/converter.py

This file was deleted.

Empty file added pyhtml2pdf/__init__.py
131 changes: 131 additions & 0 deletions pyhtml2pdf/compressor.py
@@ -0,0 +1,131 @@
import logging
import os
import platform
import subprocess
from pathlib import Path
from tempfile import NamedTemporaryFile, _TemporaryFileWrapper
from typing import Literal, Union

from .utils import _pdf_has_suspicious_content

MAX_BYTES = 25 * 1024 * 1024

logger = logging.getLogger(__name__)


def compress(
    source: str | os.PathLike | _TemporaryFileWrapper,
    target: str | os.PathLike,
    power: int = 0,
    ghostscript_command: Union[Literal["gs", "gswin64c", "gswin32c"], None] = None,
    max_pdf_size: int = MAX_BYTES,
    timeout: int = 10,
    force_process: bool = False,
) -> None:
"""

:param source: Source PDF file
:param target: Target location to save the compressed PDF
:param power: Power of the compression. Default value is 0. This can be
0: default,
1: prepress,
2: printer,
3: ebook,
4: screen
:param ghostscript_command: The name of the ghostscript executable. If set to the default value None, is attempted
to be inferred from the OS.
If the OS is not Windows, "gs" is used as executable name.
If the OS is Windows, and it is a 64-bit version, "gswin64c" is used. If it is a 32-bit
version, "gswin32c" is used.
:param max_pdf_size: Maximum allowed size for the PDF in bytes. Default is 25 MB.
:param timeout: Timeout in seconds
:param force_process: Whether to process even if suspicious content is found (Be extra careful with this setting).
"""
    quality = {0: "/default", 1: "/prepress", 2: "/printer", 3: "/ebook", 4: "/screen"}

    if ghostscript_command is None:
        if platform.system() == "Windows":
            if platform.machine().endswith("64"):
                ghostscript_command = "gswin64c"
            else:
                ghostscript_command = "gswin32c"
        else:
            ghostscript_command = "gs"

    if isinstance(source, _TemporaryFileWrapper):
        source = source.name

    source = Path(source)
    target = Path(target)

    if not source.is_file():
        raise FileNotFoundError("Source file does not exist")

    if source.suffix.lower() != ".pdf":
        raise ValueError("Source file is not a PDF")

    issues = _pdf_has_suspicious_content(source, max_pdf_size)

    if issues:
        logger.warning(
            "Warning: The PDF file has been flagged for suspicious content.\n\n- %s\n\nProcessing has been skipped to avoid potential security risks.\n\n"
            "If you believe this is an error, you can set force_process=True to override this behavior. Proceed with caution!\n",
            "\n- ".join(issues),
        )

        if not force_process:
            logger.error(
                "PDF file flagged for suspicious content. Process aborted.\n\n"
            )
            raise RuntimeError(
                "PDF file flagged for suspicious content. Process aborted."
            )

    try:
        subprocess.call(
            [
                ghostscript_command,
                "-dSAFER",
                "-sDEVICE=pdfwrite",
                "-dCompatibilityLevel=1.4",
                "-dPDFSETTINGS={}".format(quality[power]),
                "-dNOPAUSE",
                "-dQUIET",
                "-dBATCH",
                "-sOutputFile={}".format(target.as_posix()),
                source.as_posix(),
            ],
            shell=platform.system() == "Windows",
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        logger.error(
            "PDF processing took too long (DoS protection triggered). If you believe this is an error, try increasing the timeout parameter."
        )

        raise TimeoutError


def _compress(
    result: bytes,
    target: str | os.PathLike,
    power: int,
    timeout: int,
    ghostscript_command: Union[Literal["gs", "gswin64c", "gswin32c"], None] = None,
):
    with NamedTemporaryFile(
        suffix=".pdf", delete=platform.system() != "Windows"
    ) as tmp_file:
        tmp_file.write(result)
        # Flush buffered bytes so the size check and Ghostscript read the complete PDF
        tmp_file.flush()

        # Ensure a minimum timeout of 20 seconds for compression when called from converter.py
        _timeout: int = max(timeout, 20)

        compress(
            source=tmp_file,
            target=target,
            power=power,
            ghostscript_command=ghostscript_command,
            max_pdf_size=Path(tmp_file.name).stat().st_size + 1_000_000,
            timeout=_timeout,
        )
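`compress` runs Ghostscript with `subprocess.call` and discards the return value. Apart from a timeout, a Ghostscript failure such as a malformed input is therefore not reported to the caller, and the target may simply not be written. A minimal sketch of one alternative uses `subprocess.run(..., check=True)`, so that a non-zero exit status surfaces as an exception that includes Ghostscript's stderr. `run_checked` is a hypothetical helper, not part of pyhtml2pdf, and unlike `compress` it does not set `shell=True` on Windows.

```python
import subprocess
from typing import Sequence


def run_checked(cmd: Sequence[str], timeout: float) -> None:
    """Run an external command; raise on timeout or a non-zero exit status."""
    try:
        # check=True raises CalledProcessError when the exit status is non-zero;
        # capture_output=True keeps stderr so it can go into the error message
        subprocess.run(list(cmd), timeout=timeout, check=True, capture_output=True)
    except subprocess.TimeoutExpired as exc:
        # Mirror compress(): surface timeouts as the built-in TimeoutError
        raise TimeoutError(f"{cmd[0]} did not finish within {timeout}s") from exc
    except subprocess.CalledProcessError as exc:
        stderr = (exc.stderr or b"").decode(errors="replace").strip()
        raise RuntimeError(
            f"{cmd[0]} exited with status {exc.returncode}: {stderr}"
        ) from exc
```

Inside `compress`, the existing argument list could be passed as `run_checked([...], timeout=timeout)` in place of the `subprocess.call(...)` block.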