html-to-markdown

High-performance HTML → Markdown conversion powered by Rust. Ships as a Rust crate, Python package, Node.js bindings, WebAssembly module, and standalone CLI, all with identical rendering behaviour.

Documentation

- JavaScript/TypeScript guides:
  - Node.js/Bun (native) – Node.js README
  - WebAssembly (universal) – WASM README
- Python guide – Python README
- Rust guide – Rust README
- Contributing – CONTRIBUTING.md ⭐ Start here!
- Changelog – CHANGELOG.md

Installation

Target	Command
Node.js/Bun (native)	`npm install html-to-markdown-node`
WebAssembly (universal)	`npm install html-to-markdown-wasm`
Deno	`import { convert } from "npm:html-to-markdown-wasm"`
Python (bindings + CLI)	`pip install html-to-markdown`
Rust crate	`cargo add html-to-markdown-rs`
Rust CLI	`cargo install html-to-markdown-cli`
Homebrew CLI	`brew tap goldziher/tap && brew install html-to-markdown`
Releases	GitHub Releases

Quick Start

JavaScript/TypeScript

Node.js / Bun (Native - Fastest):

import { convert } from 'html-to-markdown-node';

const html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>';
const markdown = convert(html, {
  headingStyle: 'Atx',
  codeBlockStyle: 'Backticks',
  wrap: true,
});

Deno / Browsers / Edge (Universal):

import { convert } from "npm:html-to-markdown-wasm"; // Deno
// or: import { convert } from 'html-to-markdown-wasm'; // Bundlers

const html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>';
const markdown = convert(html, {
  headingStyle: 'atx',
  listIndentWidth: 2,
});
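Both snippets above request ATX-style headings via `headingStyle`. To show what that choice means in the output, here is a toy formatter contrasting CommonMark's two heading syntaxes: ATX (`#` prefixes, levels 1–6) and Setext (an underline of `=` or `-`, levels 1–2 only). The function `format_heading` and its style names are hypothetical and not part of html-to-markdown's API; the fallback from Setext to ATX for levels 3–6 and the minimum underline length of 3 are assumptions of this sketch.

```python
def format_heading(text: str, level: int, style: str = "atx") -> str:
    """Render one heading line in ATX or Setext style (toy illustration, not the library)."""
    if not 1 <= level <= 6:
        raise ValueError("heading level must be between 1 and 6")
    if style not in ("atx", "setext"):
        raise ValueError("style must be 'atx' or 'setext'")
    if style == "setext" and level <= 2:
        # Setext defines only two levels: '=' underlines h1, '-' underlines h2.
        marker = "=" if level == 1 else "-"
        underline = marker * max(len(text), 3)  # assumption: at least 3 characters
        return f"{text}\n{underline}"
    # ATX: one '#' per level, a space, then the text.
    # Setext requests for levels 3-6 fall back to ATX here (assumption of this sketch).
    return f"{'#' * level} {text}"


print(format_heading("Hello", 1))            # ATX h1
print(format_heading("Hello", 1, "setext"))  # Setext h1
```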

Performance: native bindings average ~18k ops/sec and WASM ~16k ops/sec (benchmarked on complex real-world documents).

See the JavaScript guides (Node.js README and WASM README) for full API documentation.

CLI

# Convert a file
html-to-markdown input.html > output.md

# Stream from stdin
curl https://example.com | html-to-markdown > output.md

# Apply options
html-to-markdown --heading-style atx --list-indent-width 2 input.html
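Because the CLI reads a file path and writes Markdown to stdout, batch conversion can be scripted around it. A minimal sketch, assuming `html-to-markdown` is on `PATH`; `build_command` and `convert_directory` are hypothetical helpers, and only the two flags shown in the examples above are used.

```python
import subprocess
from pathlib import Path


def build_command(input_path, heading_style: str = "atx", list_indent_width: int = 2) -> list[str]:
    """Assemble argv for one conversion, using only the flags shown in the CLI examples."""
    return [
        "html-to-markdown",
        "--heading-style", heading_style,
        "--list-indent-width", str(list_indent_width),
        str(input_path),
    ]


def convert_directory(src_dir, out_dir) -> None:
    """Convert every *.html file in src_dir, writing a matching .md file into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for html_file in sorted(Path(src_dir).glob("*.html")):
        # The CLI prints Markdown to stdout; check=True raises on a non-zero exit.
        result = subprocess.run(build_command(html_file), capture_output=True, text=True, check=True)
        (out / html_file.with_suffix(".md").name).write_text(result.stdout, encoding="utf-8")
```

For example, `convert_directory("site/", "markdown/")` would write `site/index.html` to `markdown/index.md`.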

Python (v2 API)

from html_to_markdown import convert, convert_with_inline_images, InlineImageConfig

html = "<h1>Hello</h1><p>Rust ❤️ Markdown</p>"
markdown = convert(html)

markdown, inline_images, warnings = convert_with_inline_images(
    '<img src="data:image/png;base64,...==" alt="Pixel">',
    image_config=InlineImageConfig(max_decoded_size_bytes=1024, infer_dimensions=True),
)
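`max_decoded_size_bytes` caps the size of an image after base64 decoding, which is roughly three quarters of the encoded payload length. The sketch below only illustrates what that limit measures; it is not how html-to-markdown implements the check, and `decoded_size` and `fits_limit` are hypothetical helpers.

```python
import base64


def decoded_size(data_uri: str) -> int:
    """Return the number of bytes a base64 data: URI decodes to."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data: URI")
    # validate=True rejects characters outside the base64 alphabet.
    return len(base64.b64decode(payload, validate=True))


def fits_limit(data_uri: str, max_decoded_size_bytes: int = 1024) -> bool:
    """True if the decoded payload is within the byte limit."""
    return decoded_size(data_uri) <= max_decoded_size_bytes


uri = "data:text/plain;base64,aGVsbG8="  # decodes to b"hello"
print(decoded_size(uri), fits_limit(uri))
```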

Rust

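use html_to_markdown_rs::{convert, ConversionOptions, HeadingStyle};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<h1>Welcome</h1><p>Fast conversion</p>";

    // Default options
    let markdown = convert(html, None)?;
    println!("{markdown}");

    // ATX headings, everything else left at its default
    let options = ConversionOptions {
        heading_style: HeadingStyle::Atx,
        ..Default::default()
    };
    let markdown = convert(html, Some(options))?;
    println!("{markdown}");
    Ok(())
}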

See the language-specific READMEs for complete configuration, hOCR workflows, and inline image extraction.

Performance

Benchmarked on Apple M4 with complex real-world documents (Wikipedia articles, tables, lists):

Operations per Second (higher is better)

Document Type	Node.js (NAPI)	WASM	Python (PyO3)	Speedup (Node vs Python)
Small (5 paragraphs)	86,233	70,300	8,443	10.2×
Medium (25 paragraphs)	18,979	15,282	1,846	10.3×
Large (100 paragraphs)	4,907	3,836	438	11.2×
Tables (complex)	5,003	3,748	4,829	1.0×
Lists (nested)	1,819	1,391	1,165	1.6×
Wikipedia (129KB)	1,125	1,022	-	-
Wikipedia (653KB)	156	147	-	-
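The speedup column is the Node.js figure divided by the Python figure, rounded to one decimal. The sketch below recomputes it from the table values (Wikipedia rows omitted, since they have no Python results).

```python
# Ops/sec from the table above: (Node.js NAPI, Python PyO3)
results = {
    "Small (5 paragraphs)": (86_233, 8_443),
    "Medium (25 paragraphs)": (18_979, 1_846),
    "Large (100 paragraphs)": (4_907, 438),
    "Tables (complex)": (5_003, 4_829),
    "Lists (nested)": (1_819, 1_165),
}


def speedup(node_ops: float, python_ops: float) -> float:
    """Node-vs-Python speedup, rounded to one decimal as in the table."""
    return round(node_ops / python_ops, 1)


speedups = {name: speedup(*ops) for name, ops in results.items()}
for name, value in speedups.items():
    print(f"{name}: {value}x")
```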

Average Performance Summary

Implementation	Avg ops/sec	vs WASM	vs Python	Best For
Node.js (NAPI-RS)	18,162	1.17× faster	7.4× faster	Maximum throughput in Node.js/Bun
WebAssembly	15,536	baseline	6.3× faster	Universal (Deno, browsers, edge)
Python (PyO3)	2,465	6.3× slower	baseline	Python ecosystem integration
Rust CLI/Binary	150-210 MB/s	-	-	Standalone processing
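Inverting an ops/sec figure gives the mean time per conversion, which is often easier to reason about in a request path. This sketch takes the averages in the table as given and also reproduces the relative columns.

```python
# Average ops/sec from the summary table (Rust CLI row omitted: it is quoted in MB/s).
AVG_OPS_PER_SEC = {
    "Node.js (NAPI-RS)": 18_162,
    "WebAssembly": 15_536,
    "Python (PyO3)": 2_465,
}


def microseconds_per_op(ops_per_sec: float) -> float:
    """Mean time per conversion in microseconds."""
    return 1_000_000 / ops_per_sec


latency_us = {name: round(microseconds_per_op(ops), 1) for name, ops in AVG_OPS_PER_SEC.items()}

node, wasm, python = AVG_OPS_PER_SEC.values()
node_vs_wasm = round(node / wasm, 2)      # "vs WASM" column for Node.js
node_vs_python = round(node / python, 1)  # "vs Python" column for Node.js
wasm_vs_python = round(wasm / python, 1)  # "vs Python" column for WebAssembly

for name, us in latency_us.items():
    print(f"{name}: ~{us} µs per conversion")
```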

Key Insights

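JavaScript bindings are fastest: native Node.js bindings average ~18k ops/sec, with WASM close behind at ~16k ops/sec.
Python is roughly 6–7× slower on average (7.4× vs Node.js, 6.3× vs WASM): despite using the same Rust core, PyO3 FFI overhead significantly impacts Python performance. The gap reaches 10–11× on paragraph-heavy documents but shrinks to 1.0–1.6× on complex tables and nested lists.
Small documents: both JS implementations reach 70–86k ops/sec on simple HTML.
Large documents: the Node.js-vs-Python gap widens with document size, from 10.2× (5 paragraphs) to 11.2× (100 paragraphs).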

Note on Python performance: The current Python bindings have optimization opportunities. The v2 API with direct convert() calls performs best; avoid the v1 compatibility layer for performance-critical applications.
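To check this advice on your own documents, a minimal ops/sec harness like the one below could be used. It is a sketch, not the benchmark suite behind the tables above; `ops_per_sec` is a hypothetical helper that accepts any callable taking an HTML string.

```python
import time
from typing import Callable


def ops_per_sec(convert_fn: Callable[[str], str], html: str,
                min_seconds: float = 1.0, warmup: int = 10) -> float:
    """Call convert_fn(html) repeatedly for at least min_seconds; return calls per second."""
    for _ in range(warmup):  # untimed warm-up calls
        convert_fn(html)
    iterations = 0
    start = time.perf_counter()
    while True:
        convert_fn(html)
        iterations += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return iterations / elapsed
```

For example, timing `convert` from `html_to_markdown` against `html_to_markdown.v1_compat.convert_to_markdown` on the same document would compare the two APIs, assuming both accept the HTML string as their only required argument.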

Compatibility (v1 → v2)

V2’s Rust core sustains 150–210 MB/s, roughly 60–84× faster than V1, which averaged ≈ 2.5 MB/s in its Python/BeautifulSoup implementation.
The Python package offers a compatibility shim in html_to_markdown.v1_compat (convert_to_markdown, convert_to_markdown_stream, markdownify). Details and keyword mappings live in the Python README.
CLI flag changes, option renames, and other breaking updates are summarised in the CHANGELOG.
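Dividing the quoted V2 throughput range by the V1 average gives the speedup range, and the same figures translate into wall-clock time for a corpus. The 1 GB corpus below is a hypothetical input, and sustained throughput is assumed to hold for its whole duration.

```python
# Throughput figures quoted in the compatibility note.
V2_MB_PER_SEC = (150.0, 210.0)  # Rust core, low and high end
V1_MB_PER_SEC = 2.5             # V1 Python/BeautifulSoup average


def seconds_for(corpus_mb: float, mb_per_sec: float) -> float:
    """Wall-clock seconds to process corpus_mb at a sustained throughput."""
    return corpus_mb / mb_per_sec


speedup_range = tuple(v2 / V1_MB_PER_SEC for v2 in V2_MB_PER_SEC)

corpus_mb = 1_000  # hypothetical 1 GB corpus
print(f"V1: {seconds_for(corpus_mb, V1_MB_PER_SEC):.0f} s")
for rate in V2_MB_PER_SEC:
    print(f"V2 at {rate:.0f} MB/s: {seconds_for(corpus_mb, rate):.1f} s")
```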

Community

Chat with us on Discord
Explore the broader Kreuzberg document-processing ecosystem
Sponsor development via GitHub Sponsors
