Convert any raw HTML string into a clean, printable A4 PDF in a single run. This tool streamlines HTML-to-PDF generation for invoices, reports, emails, and dynamic templates without manual browser actions. It’s designed for developers and automation workflows that need reliable, repeatable HTML-to-PDF conversion at scale.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
The HTML String to PDF Scraper takes an HTML string as input and transforms it into a high-quality A4 PDF file. Instead of rendering pages manually in a browser or fighting with OS-dependent print settings, you can generate PDFs programmatically in a controlled environment.
This project solves the common pain point of generating consistent PDFs from HTML templates across different machines and environments. It’s ideal for backend services, automation pipelines, and batch jobs that need predictable PDF output.
It’s built for:
- Developers who generate PDFs from HTML templates.
- Teams automating invoices, statements, or reports.
- Integrations that need to attach or store PDFs generated from dynamic HTML.

What it does:
- Accepts a full HTML string as input (inline, from file, or JSON).
- Renders HTML in a headless browser using a modern rendering engine.
- Outputs a standardized A4-sized PDF with configurable margins and options.
- Stores the resulting PDF file and exposes its path and metadata.
- Provides structured logging and error reporting for failed conversions.
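
The steps above can be sketched with Puppeteer, which the feature list names as the rendering engine. This is a minimal sketch under stated assumptions, not the project's actual code: the names `buildPdfOptions` and `htmlToPdf` are hypothetical, and the 10mm default margins are borrowed from the example output. `puppeteer.launch`, `page.setContent`, and `page.pdf` are standard Puppeteer calls.

```javascript
const path = require("path");

// Hypothetical defaults (A4, portrait, backgrounds on, 10mm margins).
const DEFAULT_PDF_OPTIONS = {
  format: "A4",
  landscape: false,
  printBackground: true,
  margin: { top: "10mm", right: "10mm", bottom: "10mm", left: "10mm" },
};

// Merge caller overrides into the defaults; margins are merged per side.
function buildPdfOptions(outputPath, overrides = {}) {
  return {
    ...DEFAULT_PDF_OPTIONS,
    ...overrides,
    margin: { ...DEFAULT_PDF_OPTIONS.margin, ...(overrides.margin || {}) },
    path: outputPath,
  };
}

// Render an HTML string to a PDF file and return the file's absolute path.
async function htmlToPdf(html, outputPath, overrides = {}) {
  // Loaded lazily so buildPdfOptions can be used without Puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so stylesheets, fonts, and images can load.
    await page.setContent(html, { waitUntil: "networkidle0" });
    await page.pdf(buildPdfOptions(outputPath, overrides));
    return path.resolve(outputPath);
  } finally {
    // Always release the browser, even if rendering fails.
    await browser.close();
  }
}
```

A call such as `htmlToPdf(html, "output/order-1234.pdf", { landscape: true })` would write the file and resolve to its absolute path. The output directory must already exist, because `page.pdf` does not create it.
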

Features:

| Feature | Description |
|---|---|
| Single HTML string input | Pass a complete HTML string and get a fully rendered A4 PDF without additional setup. |
| A4 page size by default | Outputs PDFs in standard A4 format for easy printing and document sharing. |
| Headless browser rendering | Uses a headless browser engine (via Puppeteer) to accurately render modern HTML, CSS, and fonts. |
| Configurable PDF options | Adjust margins, orientation, print background, and other PDF options through configuration. |
| Robust error handling | Validates input and reports detailed errors for invalid HTML or rendering failures. |
| File-based output | Stores the generated PDF file on disk and returns file metadata for downstream use. |

Output fields:

| Field Name | Field Description |
|---|---|
| inputHtml | The original HTML string used for PDF generation. |
| pdfPath | Absolute or relative file path to the generated PDF file. |
| pdfUrl | Public or internal URL where the generated PDF can be accessed or downloaded. |
| fileName | Name of the generated PDF file, including extension. |
| fileSizeBytes | Size of the PDF file in bytes for storage and bandwidth planning. |
| pageCount | Number of pages in the generated PDF (at least 1). |
| createdAt | Timestamp when the PDF was generated. |
| metadata | Optional object holding extra information like orientation, margins, and print options. |
| status | Status of the conversion process (e.g., "success", "failed"). |
| errorMessage | Error description if PDF generation fails. |

Example output:

    [
      {
        "inputHtml": "<html><head><title>Invoice</title></head><body><h1>Order #1234</h1><p>Thank you for your purchase.</p></body></html>",
        "pdfPath": "output/invoices/order-1234.pdf",
        "pdfUrl": "https://example.com/files/order-1234.pdf",
        "fileName": "order-1234.pdf",
        "fileSizeBytes": 28432,
        "pageCount": 1,
        "createdAt": "2025-12-12T02:30:15.123Z",
        "metadata": {
          "format": "A4",
          "orientation": "portrait",
          "printBackground": true,
          "margin": {
            "top": "10mm",
            "right": "10mm",
            "bottom": "10mm",
            "left": "10mm"
          }
        },
        "status": "success",
        "errorMessage": null
      }
    ]
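
A record with these fields could be assembled once the PDF is on disk. The sketch below is an assumption about how that might look, not the project's implementation: `buildSuccessRecord` and `buildFailureRecord` are hypothetical names, `pageCount` and `metadata` are passed in by the caller rather than read from the PDF, and `pdfUrl` is left `null` because it depends on whatever storage layer serves the file.

```javascript
const fs = require("fs");
const path = require("path");

// Build a success record for a PDF that already exists at pdfPath.
// This sketch does not parse the PDF, so pageCount comes from the caller.
function buildSuccessRecord({ inputHtml, pdfPath, pageCount, metadata = {} }) {
  return {
    inputHtml,
    pdfPath,
    pdfUrl: null, // assumption: filled in later by a storage or delivery layer
    fileName: path.basename(pdfPath),
    fileSizeBytes: fs.statSync(pdfPath).size,
    pageCount,
    createdAt: new Date().toISOString(),
    metadata,
    status: "success",
    errorMessage: null,
  };
}

// Build a failure record that keeps the same shape for downstream consumers.
function buildFailureRecord(inputHtml, error) {
  return {
    inputHtml,
    pdfPath: null,
    pdfUrl: null,
    fileName: null,
    fileSizeBytes: 0,
    pageCount: 0,
    createdAt: new Date().toISOString(),
    metadata: {},
    status: "failed",
    errorMessage: error.message,
  };
}
```

Keeping both records in the same shape lets a consumer branch on `status` alone without checking which fields are present.
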

Project structure:

    html-string-to-pdf-scraper/
    ├── src/
    │   ├── index.js
    │   ├── browser/
    │   │   ├── launchBrowser.js
    │   │   └── createPdf.js
    │   ├── config/
    │   │   └── defaultConfig.json
    │   ├── services/
    │   │   └── htmlToPdfService.js
    │   └── utils/
    │       ├── logger.js
    │       └── validateInput.js
    ├── input_examples/
    │   ├── simple-invoice.html
    │   └── input.sample.json
    ├── output/
    │   └── .gitkeep
    ├── tests/
    │   ├── htmlToPdfService.test.js
    │   └── validation.test.js
    ├── package.json
    ├── config.json
    ├── .gitignore
    └── README.md

- SaaS platforms use it to generate branded invoices and billing statements from HTML templates, so they can deliver consistent PDF documents to customers automatically.
- Internal tools teams use it to convert HTML reports into PDFs on a schedule, so they can archive, email, or share standardized reports across the organization.
- Marketing teams use it to render email or landing page content as PDFs, so they can share campaign previews and approvals in a portable format.
- Developers use it to turn dynamic HTML dashboards into PDFs, so they can attach snapshots to notifications, tickets, or documentation.
- Consultants and agencies use it to automate proposal and contract generation from HTML templates, so they can save time and reduce manual formatting work.
Q1: What input format does this tool expect?
It expects a complete HTML string, including <html>, <head>, and <body> sections. You can supply this HTML from a file, a template engine, or a JSON payload, as long as the final value is a valid string of HTML.
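
A pre-flight check along these lines can catch obviously incomplete input before a browser is launched. This is a shallow sketch, not the project's `validateInput.js`: the function name `validateHtmlInput` and the error messages are assumptions, and the check only looks for opening tags rather than parsing the HTML.

```javascript
// Return a list of problems; an empty list means the input looks usable.
function validateHtmlInput(input) {
  if (typeof input !== "string") return ["input must be a string"];
  if (input.trim() === "") return ["input must not be empty"];

  const errors = [];
  for (const tag of ["html", "head", "body"]) {
    // Match an opening tag such as <body> or <body class="x">, case-insensitively.
    // The pattern does not match longer tag names such as <header>.
    const opening = new RegExp(`<${tag}(\\s[^>]*)?>`, "i");
    if (!opening.test(input)) errors.push(`missing <${tag}> element`);
  }
  return errors;
}
```

Returning a list instead of throwing lets the caller report every problem at once, for example in the `errorMessage` field.
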
Q2: Can I change the page size or orientation from A4?
Yes. While A4 portrait is the default, you can adjust page format (e.g., Letter), orientation (portrait or landscape), margins, and whether to print backgrounds via configuration options defined in config.json or environment variables.
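
One way such configuration could be resolved is sketched below. Everything specific here is an assumption for illustration: the environment variable names (`PDF_FORMAT`, `PDF_ORIENTATION`, `PDF_PRINT_BACKGROUND`), the precedence order (environment over `config.json` over defaults), and the function names are hypothetical. The final step maps the result onto the `format`, `landscape`, and `printBackground` options that Puppeteer's `page.pdf` accepts.

```javascript
// Hypothetical built-in defaults: A4 portrait with backgrounds printed.
const CONFIG_DEFAULTS = { format: "A4", orientation: "portrait", printBackground: true };

// Resolve settings with assumed precedence: env vars > config.json values > defaults.
function resolvePdfConfig(fileConfig = {}, env = process.env) {
  const resolved = { ...CONFIG_DEFAULTS, ...fileConfig };
  if (env.PDF_FORMAT) resolved.format = env.PDF_FORMAT;
  if (env.PDF_ORIENTATION) resolved.orientation = env.PDF_ORIENTATION;
  if (env.PDF_PRINT_BACKGROUND) {
    // Environment variables are strings, so compare against "true" explicitly.
    resolved.printBackground = env.PDF_PRINT_BACKGROUND === "true";
  }
  if (!["portrait", "landscape"].includes(resolved.orientation)) {
    throw new Error(`unsupported orientation: ${resolved.orientation}`);
  }
  return resolved;
}

// Translate the resolved settings into option names used by page.pdf().
function toPuppeteerOptions(config) {
  return {
    format: config.format,
    landscape: config.orientation === "landscape",
    printBackground: config.printBackground,
  };
}
```

For example, `toPuppeteerOptions(resolvePdfConfig({ format: "Letter" }, { PDF_ORIENTATION: "landscape" }))` yields a Letter-sized landscape configuration with backgrounds enabled.
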
Q3: Does it support external stylesheets, fonts, or images?
As long as your HTML references reachable URLs or bundled assets, the headless browser will attempt to load them when rendering the page. For best results, use fully qualified URLs and ensure any required assets are accessible from the runtime environment.
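
To find references that are not fully qualified before rendering, a heuristic scan like the following may help. It is a sketch only: `findRelativeAssetUrls` is a hypothetical helper, and it uses a regular expression over quoted `src` and `href` attributes rather than a real HTML parser, so unusual markup can slip through.

```javascript
// Collect quoted src/href attribute values that are not fully qualified.
function findRelativeAssetUrls(html) {
  const attribute = /\b(?:src|href)\s*=\s*["']([^"']+)["']/gi;
  const relative = [];
  for (const match of html.matchAll(attribute)) {
    const url = match[1].trim();
    // http(s) URLs, data URIs, mailto links, and in-page anchors need no base URL.
    if (!/^(https?:|data:|mailto:|#)/i.test(url)) relative.push(url);
  }
  return relative;
}
```

Any paths it reports, such as `styles/main.css`, are candidates for rewriting to absolute URLs or inlining before conversion.
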
Q4: How do I access the generated PDF after conversion?
After a successful run, the tool returns fields like pdfPath and pdfUrl. You can use pdfPath for local file system operations or pdfUrl if you integrate with a storage or delivery layer that exposes the file over HTTP.
Primary Metric: On a typical server with a modern CPU, generating a single-page A4 PDF from a medium-complexity HTML template takes 300–700 ms on average when an already-running headless browser instance is reused, which avoids paying browser startup cost on every conversion.
Reliability Metric: In continuous use with clean HTML input, conversion success rates regularly exceed 99%, with failures usually tied to unreachable external assets or malformed HTML.
Efficiency Metric: When batching multiple HTML strings and reusing the same headless browser instance, the tool can reliably process 30–60 PDF conversions per minute while keeping CPU and memory usage within safe limits for standard container configurations.
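
Batching with a shared browser instance could look roughly like this. It is a sketch under assumptions: `convertBatch` and the `{ html, outputPath }` job shape are hypothetical, and the browser is passed in by the caller (for instance, the object returned by `puppeteer.launch()`), so it is launched once and closed once outside the loop. The calls made on it, `newPage`, `setContent`, `pdf`, and `close`, are standard Puppeteer methods.

```javascript
// Convert several HTML strings using one already-launched browser.
// Each job gets its own page; a failed job does not stop the batch.
async function convertBatch(browser, jobs, pdfOptions = { format: "A4", printBackground: true }) {
  const results = [];
  for (const job of jobs) {
    const page = await browser.newPage();
    try {
      await page.setContent(job.html, { waitUntil: "networkidle0" });
      await page.pdf({ ...pdfOptions, path: job.outputPath });
      results.push({ pdfPath: job.outputPath, status: "success", errorMessage: null });
    } catch (error) {
      results.push({ pdfPath: job.outputPath, status: "failed", errorMessage: error.message });
    } finally {
      // Close the page so memory does not grow with the batch size.
      await page.close();
    }
  }
  return results;
}
```

Jobs run sequentially here to keep memory use predictable; running several pages in parallel is possible but trades memory for throughput.
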
Quality Metric: Generated PDFs maintain layout fidelity for modern HTML and CSS, including fonts and background images; animations are captured as static frames. Page content remains crisp when printed or zoomed, providing production-ready documents suitable for client delivery and archival.
