HTML to markdown converter

MrTomRod/html-to-markdown

 
 

html-to-markdown

High-performance HTML → Markdown conversion powered by Rust. It ships as a Rust crate, a Python package, Node.js bindings, a WebAssembly module, and a standalone CLI, all with identical rendering behaviour.

License: MIT

🎮 Live Demo

Try it now: https://goldziher.github.io/html-to-markdown/

The demo runs the WebAssembly build of the converter directly in your browser.

Installation

Target                   Command
Node.js/Bun (native)     npm install html-to-markdown-node
WebAssembly (universal)  npm install html-to-markdown-wasm
Deno                     import { convert } from "npm:html-to-markdown-wasm"
Python (bindings + CLI)  pip install html-to-markdown
Rust crate               cargo add html-to-markdown-rs
Rust CLI                 cargo install html-to-markdown-cli
Homebrew CLI             brew tap goldziher/tap
                         brew install html-to-markdown
Releases                 GitHub Releases

Quick Start

JavaScript/TypeScript

Node.js / Bun (Native - Fastest):

import { convert } from 'html-to-markdown-node';

const html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>';
const markdown = convert(html, {
  headingStyle: 'Atx',
  codeBlockStyle: 'Backticks',
  wrap: true,
});

Deno / Browsers / Edge (Universal):

import { convert } from "npm:html-to-markdown-wasm"; // Deno
// or: import { convert } from 'html-to-markdown-wasm'; // Bundlers

const html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>';
const markdown = convert(html, {
  headingStyle: 'atx',
  listIndentWidth: 2,
});

Performance: Native bindings average ~18k ops/sec, WASM averages ~16k ops/sec (benchmarked on complex real-world documents).

See the JavaScript guides for full API documentation.

CLI

# Convert a file
html-to-markdown input.html > output.md

# Stream from stdin
curl https://example.com | html-to-markdown > output.md

# Apply options
html-to-markdown --heading-style atx --list-indent-width 2 input.html
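The same CLI invocations can be driven from a script. The following is a minimal Python sketch, not part of the project: it assumes the `html-to-markdown` binary is on `PATH`, and uses only the two flags and the stdin streaming shown above. The helper names `build_command` and `convert_via_cli` are hypothetical.

```python
import subprocess


def build_command(heading_style=None, list_indent_width=None):
    """Build the argv list for the html-to-markdown CLI.

    Only the two flags shown in the CLI examples are supported here.
    """
    cmd = ["html-to-markdown"]
    if heading_style is not None:
        cmd += ["--heading-style", heading_style]
    if list_indent_width is not None:
        cmd += ["--list-indent-width", str(list_indent_width)]
    return cmd


def convert_via_cli(html, **options):
    """Pipe HTML to the CLI on stdin and return the Markdown read from stdout.

    Assumes the binary is installed; raises CalledProcessError on a
    non-zero exit status.
    """
    result = subprocess.run(
        build_command(**options),
        input=html,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

Calling `convert_via_cli("<h1>Hello</h1>", heading_style="atx")` mirrors the "Stream from stdin" and "Apply options" examples combined. Keeping the argv construction separate from the process call makes the flag handling easy to check without the binary installed.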

Python (v2 API)

from html_to_markdown import convert, convert_with_inline_images, InlineImageConfig

html = "<h1>Hello</h1><p>Rust ❤️ Markdown</p>"
markdown = convert(html)

markdown, inline_images, warnings = convert_with_inline_images(
    '<img src="data:image/png;base64,...==" alt="Pixel">',
    image_config=InlineImageConfig(max_decoded_size_bytes=1024, infer_dimensions=True),
)

Rust

use html_to_markdown_rs::{convert, ConversionOptions, HeadingStyle};

let html = "<h1>Welcome</h1><p>Fast conversion</p>";
let markdown = convert(html, None)?;

let options = ConversionOptions {
    heading_style: HeadingStyle::Atx,
    ..Default::default()
};
let markdown = convert(html, Some(options))?;

See the language-specific READMEs for complete configuration, hOCR workflows, and inline image extraction.

Performance

Benchmarked on Apple M4 with complex real-world documents (Wikipedia articles, tables, lists):

Operations per Second (higher is better)

Document Type           Node.js (NAPI)  WASM    Python (PyO3)  Speedup (Node vs Python)
Small (5 paragraphs)    86,233          70,300  8,443          10.2×
Medium (25 paragraphs)  18,979          15,282  1,846          10.3×
Large (100 paragraphs)  4,907           3,836   438            11.2×
Tables (complex)        5,003           3,748   4,829          1.0×
Lists (nested)          1,819           1,391   1,165          1.6×
Wikipedia (129KB)       1,125           1,022   -              -
Wikipedia (653KB)       156             147     -              -
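The speedup column is the Node.js figure divided by the Python figure, rounded to one decimal place. As a quick check, the short Python sketch below recomputes it from the ops/sec values in the table; the names `OPS` and `speedup` are illustrative only.

```python
# ops/sec copied from the table above: (Node.js NAPI, Python PyO3)
OPS = {
    "Small (5 paragraphs)": (86_233, 8_443),
    "Medium (25 paragraphs)": (18_979, 1_846),
    "Large (100 paragraphs)": (4_907, 438),
    "Tables (complex)": (5_003, 4_829),
    "Lists (nested)": (1_819, 1_165),
}


def speedup(node_ops, python_ops):
    """Return how many times faster Node.js is than Python, to one decimal."""
    return round(node_ops / python_ops, 1)


speedups = {doc: speedup(node, py) for doc, (node, py) in OPS.items()}

for doc, factor in speedups.items():
    print(f"{doc}: {factor}x")
```

The recomputed factors match the published column. The two Wikipedia rows are left out because the table reports no Python figure for them.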

Average Performance Summary

Implementation      Avg ops/sec   vs WASM        vs Python     Best For
Node.js (NAPI-RS)   18,162        1.17× faster   7.4× faster   Maximum throughput in Node.js/Bun
WebAssembly         15,536        baseline       6.3× faster   Universal (Deno, browsers, edge)
Python (PyO3)       2,465         6.3× slower    baseline      Python ecosystem integration
Rust CLI/Binary     150-210 MB/s  -              -             Standalone processing
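The summary mixes two units: ops/sec for the bindings and MB/s for the Rust CLI. One rough way to compare them is to multiply document size by ops/sec for the two Wikipedia rows of the benchmark table. The Python sketch below does that under stated assumptions: decimal units (1 MB = 1000 KB) and document sizes exactly as labelled. The benchmark methodology behind the MB/s figure is not given here, so treat the result as an estimate only.

```python
# (size in KB, Node.js ops/sec, WASM ops/sec) from the Wikipedia benchmark rows
WIKIPEDIA = {
    "Wikipedia (129KB)": (129, 1_125, 1_022),
    "Wikipedia (653KB)": (653, 156, 147),
}


def throughput_mb_per_s(size_kb, ops_per_sec):
    """Convert ops/sec on a document of size_kb into MB/s.

    Assumption: decimal units, 1 MB = 1000 KB.
    """
    return size_kb * ops_per_sec / 1000


implied = {
    doc: (throughput_mb_per_s(kb, node), throughput_mb_per_s(kb, wasm))
    for doc, (kb, node, wasm) in WIKIPEDIA.items()
}

for doc, (node_mb, wasm_mb) in implied.items():
    print(f"{doc}: Node.js {node_mb:.1f} MB/s, WASM {wasm_mb:.1f} MB/s")
```

Under these assumptions the bindings land at roughly 96 to 145 MB/s on the Wikipedia documents, somewhat below the 150-210 MB/s quoted for the Rust CLI.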

Key Insights

  • JavaScript bindings are fastest: Native Node.js bindings achieve ~18k ops/sec average, with WASM close behind at ~16k ops/sec
  • Python is 6-10× slower: although it uses the same Rust core, PyO3 FFI overhead significantly impacts Python performance; the gap nearly closes on complex tables (1.0×) and nested lists (1.6×)
  • Small documents: Both JS implementations reach 70-90k ops/sec on simple HTML
  • Large documents: the Node.js-vs-Python gap widens with document size, from 10.2× on small to 11.2× on large paragraph documents

Note on Python performance: The current Python bindings have optimization opportunities. The v2 API with direct convert() calls performs best; avoid the v1 compatibility layer for performance-critical applications.

Compatibility (v1 → v2)

  • V2’s Rust core sustains 150–210 MB/s throughput, roughly 60–80× faster than V1, whose Python/BeautifulSoup implementation averaged ≈ 2.5 MB/s.
  • The Python package offers a compatibility shim in html_to_markdown.v1_compat (convert_to_markdown, convert_to_markdown_stream, markdownify). Details and keyword mappings live in the Python README.
  • CLI flag changes, option renames, and other breaking updates are summarised in the CHANGELOG.
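The V1 to V2 speedup follows from dividing the V2 throughput range by the V1 average. The Python sketch below works through that arithmetic and applies it to a hypothetical 1000 MB corpus. The corpus size is an assumption chosen for illustration, not a figure from the project.

```python
V1_MB_PER_S = 2.5              # V1 average throughput, from the text above
V2_MB_PER_S = (150.0, 210.0)   # V2 throughput range, from the text above


def speedup_range(v2_range, v1):
    """Return the (low, high) speedup of V2 over V1."""
    low, high = v2_range
    return low / v1, high / v1


def seconds_to_convert(size_mb, mb_per_s):
    """Time to convert size_mb of HTML at a sustained throughput."""
    return size_mb / mb_per_s


low, high = speedup_range(V2_MB_PER_S, V1_MB_PER_S)
print(f"V2 vs V1: {low:.0f}x to {high:.0f}x")

corpus_mb = 1000  # hypothetical corpus size, for illustration only
v1_seconds = seconds_to_convert(corpus_mb, V1_MB_PER_S)
v2_seconds = [seconds_to_convert(corpus_mb, t) for t in V2_MB_PER_S]
print(f"V1: {v1_seconds:.0f} s, V2: {v2_seconds[1]:.1f} to {v2_seconds[0]:.1f} s")
```

The computed range is 60× to 84×, which the text rounds to 60–80×.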
