React PDF to Markdown (Modernized)

The fastest client-side PDF to Markdown converter. Rebuilt for the AI era.

Transform PDFs into clean, structured Markdown directly in the browser. Perfect for RAG (Retrieval-Augmented Generation) pipelines, LLM data ingestion, and content migration.

🚀 Live Demo

⚡ Why This Fork?

This is a complete modernization of the original pdf-to-markdown tool. We moved from a legacy 2017 stack to a 2025 architecture, resulting in faster parsing (roughly 9s down to 7s on a 500-page PDF) and a better developer experience.

Performance & Stack Comparison

Metric	Legacy (v0.x)	Modern (v1.0)
Framework	React 15 (Class Components)	React 19 (Hooks/Functional)
Build Tool	Webpack	Vite (Instant HMR)
Language	JavaScript	TypeScript (Strict Mode)
Testing	Mocha/Chai	Vitest
Styling	Bootstrap 3	Tailwind CSS v4
Parsing Speed	~9s (500 pages)	~7s (500 pages) 🚀
PDF Engine	pdf.js v2	pdf.js v5.4

✨ Features

RAG-Ready Output: Extracts clean structure (headers, lists, tables) optimized for LLM context windows.
100% Client-Side: No server required. Your documents never leave the browser.
Smart Layout Detection: Correctly identifies:
- H1-H6 Headers (based on font size/weight)
- Complex nested lists
- Code blocks & Blockquotes
- Table of Contents
Debug Mode: Visualise the 12-stage transformation pipeline step-by-step.
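
As an illustration of font-size-based header detection, here is a minimal sketch. It is not the project's actual implementation: the TextLine shape and the function names are hypothetical, and font weight is ignored for brevity. It assumes the most common font size (weighted by character count) is body text and that each larger size maps to a heading level, largest first.

```typescript
// Hypothetical shape of one extracted line; not the project's actual type.
interface TextLine {
  text: string;
  fontSize: number; // glyph height in PDF units
}

// Treat the font size that covers the most characters as body text.
function bodyFontSize(lines: TextLine[]): number {
  const weight = new Map<number, number>();
  for (const line of lines) {
    weight.set(line.fontSize, (weight.get(line.fontSize) ?? 0) + line.text.length);
  }
  let best = 0;
  let bestWeight = -1;
  weight.forEach((w, size) => {
    if (w > bestWeight) {
      best = size;
      bestWeight = w;
    }
  });
  return best;
}

// Map every distinct size larger than body text to a heading level:
// the largest size becomes level 1, the next level 2, capped at level 6.
function headingLevels(lines: TextLine[]): Map<number, number> {
  const body = bodyFontSize(lines);
  const larger = Array.from(new Set(lines.map((l) => l.fontSize)))
    .filter((size) => size > body)
    .sort((a, b) => b - a);
  const levels = new Map<number, number>();
  larger.forEach((size, i) => levels.set(size, Math.min(i + 1, 6)));
  return levels;
}

// Prefix heading lines with the matching number of '#' characters.
function toMarkdown(lines: TextLine[]): string {
  const levels = headingLevels(lines);
  return lines
    .map((line) => {
      const level = levels.get(line.fontSize);
      return level ? `${"#".repeat(level)} ${line.text}` : line.text;
    })
    .join("\n");
}
```

A real detector would likely also consider weight, line length, and position on the page; this sketch only shows the size-ranking idea.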

🚀 Getting Started

Prerequisites

Node.js 18+
npm or pnpm

Installation

git clone https://github.com/namtroi/pdf-to-markdown.git
cd pdf-to-markdown
npm install

Development

# Start the Vite dev server (Fast HMR)
npm run dev

# Run the test suite (Vitest)
npm run test

# Type-check and build for production
npm run build

🛠 Architecture

The conversion pipeline consists of 12 granular transformation stages, grouped into four phases:

Extract: Raw text & geometry from PDF.js.
Structure: Detect headers, list items, and blocks.
Refine: Remove repetition, merge lines, detect code blocks.
Output: Render final Markdown.
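
The idea of a staged pipeline can be sketched as a list of functions applied in order, with each intermediate result kept so a debug view can show it. This is only an assumption about the general shape: the Page and Stage types, both example stages, and runPipeline are hypothetical names, not the project's API.

```typescript
// Hypothetical data model: a page is a list of text lines.
type Page = string[];
type Stage = (pages: Page[]) => Page[];

// Example refine stage: drop lines that repeat on most pages
// (typically running headers and footers).
const removeRepetition: Stage = (pages) => {
  const counts = new Map<string, number>();
  for (const page of pages) {
    // Count each distinct line once per page.
    new Set(page).forEach((line) => counts.set(line, (counts.get(line) ?? 0) + 1));
  }
  // A line is "repeated" if it appears on at least 80% of pages (minimum 2).
  const threshold = Math.max(2, Math.ceil(pages.length * 0.8));
  return pages.map((page) => page.filter((line) => (counts.get(line) ?? 0) < threshold));
};

// Example refine stage: join a line ending in a hyphen with the next line.
const mergeHyphenated: Stage = (pages) =>
  pages.map((page) => {
    const out: string[] = [];
    for (const line of page) {
      const prev = out[out.length - 1];
      if (prev !== undefined && prev.endsWith("-")) {
        out[out.length - 1] = prev.slice(0, -1) + line;
      } else {
        out.push(line);
      }
    }
    return out;
  });

// Run the stages in order and keep every intermediate result.
function runPipeline(
  input: Page[],
  stages: Stage[]
): { output: Page[]; snapshots: Page[][] } {
  const snapshots: Page[][] = [input];
  let current = input;
  for (const stage of stages) {
    current = stage(current);
    snapshots.push(current);
  }
  return { output: current, snapshots };
}
```

Keeping every stage a pure function of the previous result is what makes step-by-step inspection cheap: the snapshots array holds the input plus one entry per stage.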

See ARCHITECTURE.md for a deep dive into the logic.
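
To make the Extract phase more concrete, the sketch below groups positioned text items into lines. PDF.js text items carry a str and a six-element transform whose last two entries are the x and y origin; everything else here (the groupIntoLines name, the tolerance parameter, joining with a single space) is an assumption for illustration rather than the project's code.

```typescript
// Subset of the fields PDF.js reports for each text item.
interface TextItem {
  str: string;
  transform: number[]; // [a, b, c, d, x, y]; x and y are the item's origin
}

// Group items whose baselines lie within `tolerance` units of each other
// into one line. Lines are ordered top to bottom (PDF y grows upward),
// and items within a line are ordered left to right.
function groupIntoLines(items: TextItem[], tolerance = 2): string[] {
  const sorted = items.slice().sort((p, q) => q.transform[5] - p.transform[5]);
  const lines: TextItem[][] = [];
  for (const item of sorted) {
    const last = lines[lines.length - 1];
    if (last && Math.abs(last[0].transform[5] - item.transform[5]) <= tolerance) {
      last.push(item);
    } else {
      lines.push([item]);
    }
  }
  return lines.map((line) =>
    line
      .sort((p, q) => p.transform[4] - q.transform[4])
      .map((item) => item.str)
      .join(" ")
  );
}
```

Real documents need more care (multi-column layouts, rotated text, items that already contain their own whitespace), which is where the later pipeline stages come in.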

🚧 Current Status & Roadmap

This project is currently maintained by a solo developer. While the core logic is robust and fully typed, we are pragmatic about the rest of the codebase:

✅ Core Pipeline: Fully migrated to TypeScript with strict types.
✅ Tests: >77% coverage with Vitest.
⚠️ UI Components: You might spot a few @ts-ignore or any types in the legacy view components.

Contributions are welcome! If you're looking for a Good First Issue, helping us kill the last few any types in the UI would be amazing.

🤝 Contributing

Fork the repo.
Create your feature branch (git checkout -b feature/amazing-feature).
Commit your changes (git commit -m 'Add some amazing feature').
Push to the branch (git push origin feature/amazing-feature).
Open a Pull Request.

📜 Credits

Original Author: Heavily inspired by and forked from jzillmann/pdf-to-markdown. Massive kudos for the original algorithm.
PDF Engine: Powered by Mozilla's PDF.js.

📄 License

Distributed under the MIT License. See LICENSE for more information.
