High-performance HTML → Markdown conversion powered by Rust. Shipping as a Rust crate, Python package, Node.js bindings, WebAssembly, and standalone CLI with identical rendering behaviour.
- JavaScript/TypeScript guides:
- Node.js/Bun (native) – Node.js README
- WebAssembly (universal) – WASM README
- Python guide – Python README
- Rust guide – Rust README
- Contributing – CONTRIBUTING.md ⭐ Start here!
- Changelog – CHANGELOG.md
Target | Command |
---|---|
Node.js/Bun (native) | npm install html-to-markdown-node |
WebAssembly (universal) | npm install html-to-markdown-wasm |
Deno | import { convert } from "npm:html-to-markdown-wasm" |
Python (bindings + CLI) | pip install html-to-markdown |
Rust crate | cargo add html-to-markdown-rs |
Rust CLI | cargo install html-to-markdown-cli |
Homebrew CLI | brew tap goldziher/tap brew install html-to-markdown |
Releases | GitHub Releases |
Node.js / Bun (Native - Fastest):
import { convert } from 'html-to-markdown-node';
const html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>';
const markdown = convert(html, {
headingStyle: 'Atx',
codeBlockStyle: 'Backticks',
wrap: true,
});
Deno / Browsers / Edge (Universal):
import { convert } from "npm:html-to-markdown-wasm"; // Deno
// or: import { convert } from 'html-to-markdown-wasm'; // Bundlers
const markdown = convert(html, {
headingStyle: 'atx',
listIndentWidth: 2,
});
Performance: Native bindings average ~19k ops/sec, WASM averages ~16k ops/sec (benchmarked on complex real-world documents).
See the JavaScript guides for full API documentation:
# Convert a file
html-to-markdown input.html > output.md
# Stream from stdin
curl https://example.com | html-to-markdown > output.md
# Apply options
html-to-markdown --heading-style atx --list-indent-width 2 input.html
from html_to_markdown import convert, convert_with_inline_images, InlineImageConfig
html = "<h1>Hello</h1><p>Rust ❤️ Markdown</p>"
markdown = convert(html)
markdown, inline_images, warnings = convert_with_inline_images(
'<img src="data:image/png;base64,...==" alt="Pixel">',
image_config=InlineImageConfig(max_decoded_size_bytes=1024, infer_dimensions=True),
)
use html_to_markdown_rs::{convert, ConversionOptions, HeadingStyle};
let html = "<h1>Welcome</h1><p>Fast conversion</p>";
let markdown = convert(html, None)?;
let options = ConversionOptions {
heading_style: HeadingStyle::Atx,
..Default::default()
};
let markdown = convert(html, Some(options))?;
See the language-specific READMEs for complete configuration, hOCR workflows, and inline image extraction.
Benchmarked on Apple M4 with complex real-world documents (Wikipedia articles, tables, lists):
Document Type | Node.js (NAPI) | WASM | Python (PyO3) | Speedup (Node vs Python) |
---|---|---|---|---|
Small (5 paragraphs) | 86,233 | 70,300 | 8,443 | 10.2× |
Medium (25 paragraphs) | 18,979 | 15,282 | 1,846 | 10.3× |
Large (100 paragraphs) | 4,907 | 3,836 | 438 | 11.2× |
Tables (complex) | 5,003 | 3,748 | 4,829 | 1.0× |
Lists (nested) | 1,819 | 1,391 | 1,165 | 1.6× |
Wikipedia (129KB) | 1,125 | 1,022 | - | - |
Wikipedia (653KB) | 156 | 147 | - | - |
Implementation | Avg ops/sec | vs WASM | vs Python | Best For |
---|---|---|---|---|
Node.js (NAPI-RS) | 18,162 | 1.17× faster | 7.4× faster | Maximum throughput in Node.js/Bun |
WebAssembly | 15,536 | baseline | 6.3× faster | Universal (Deno, browsers, edge) |
Python (PyO3) | 2,465 | 6.3× slower | baseline | Python ecosystem integration |
Rust CLI/Binary | 150-210 MB/s | - | - | Standalone processing |
- JavaScript bindings are fastest: Native Node.js bindings achieve ~18k ops/sec average, with WASM close behind at ~16k ops/sec
- Python is 6-10× slower: Despite using the same Rust core, PyO3 FFI overhead significantly impacts Python performance
- Small documents: Both JS implementations reach 70-90k ops/sec on simple HTML
- Large documents: Performance gap widens with complexity
Note on Python performance: The current Python bindings have optimization opportunities. The v2 API with direct convert()
calls performs best; avoid the v1 compatibility layer for performance-critical applications.
- V2’s Rust core sustains 150–210 MB/s throughput; V1 averaged ≈ 2.5 MB/s in its Python/BeautifulSoup implementation (60–80× faster).
- The Python package offers a compatibility shim in
html_to_markdown.v1_compat
(convert_to_markdown
,convert_to_markdown_stream
,markdownify
). Details and keyword mappings live in Python README. - CLI flag changes, option renames, and other breaking updates are summarised in CHANGELOG.
- Chat with us on Discord
- Explore the broader Kreuzberg document-processing ecosystem
- Sponsor development via GitHub Sponsors