pdfplumber-rs

Extract chars, words, lines, rects, and tables from PDF documents with precise coordinates.

pdfplumber-rs is a Rust port of Python's pdfplumber. It extracts structured content from PDF files with coordinate-accurate positioning, including characters, words, lines, rectangles, curves, images, and tables.

Features

Text extraction with spatial grouping into words, lines, and text blocks
Table detection using lattice (line-based), stream (text-alignment), and explicit strategies
Spatial filtering via crop, within_bbox, and outside_bbox
CJK support including CID fonts, Identity-H/V CMaps, and CJK-aware word grouping
Page-level streaming for memory-efficient processing of large documents
WASM support via wasm32-unknown-unknown target
Optional serde serialization for all data types
Optional parallel processing via rayon

Installation

Add to your Cargo.toml:

[dependencies]
pdfplumber = "0.1"

Feature Flags

Feature	Default	Description
`std`	Yes	Enables file-path APIs (`Pdf::open_file`). Disable for WASM.
`serde`	No	Adds `Serialize`/`Deserialize` to all public data types.
`parallel`	No	Enables `Pdf::pages_parallel()` via rayon. Not WASM-compatible.

Quick Start

Extract Text

use pdfplumber::{Pdf, TextOptions};

fn main() {
    let pdf = Pdf::open_file("document.pdf", None).unwrap();
    for page_result in pdf.pages_iter() {
        let page = page_result.unwrap();
        let text = page.extract_text(&TextOptions::default());
        println!("Page {}: {}", page.page_number(), text);
    }
}

Extract Tables

use pdfplumber::{Pdf, TableSettings};

fn main() {
    let pdf = Pdf::open_file("document.pdf", None).unwrap();
    let page = pdf.page(0).unwrap();
    let tables = page.find_tables(&TableSettings::default());
    for table in &tables {
        for row in &table.rows {
            let cells: Vec<&str> = row.iter()
                .map(|c| c.text.as_deref().unwrap_or(""))
                .collect();
            println!("{:?}", cells);
        }
    }
}

Extract Characters

use pdfplumber::Pdf;

fn main() {
    let pdf = Pdf::open_file("document.pdf", None).unwrap();
    let page = pdf.page(0).unwrap();
    for ch in page.chars() {
        println!(
            "'{}' at ({:.1}, {:.1}) font={} size={:.1}",
            ch.text, ch.bbox.x0, ch.bbox.top, ch.fontname, ch.size
        );
    }
}

WASM Support

For wasm32-unknown-unknown targets, disable the default std feature:

[dependencies]
pdfplumber = { version = "0.1", default-features = false }

Use the bytes-based API:

let pdf = Pdf::open(pdf_bytes, None)?;
let page = pdf.page(0)?;
let text = page.extract_text(&TextOptions::default());

Architecture

+--------------------------------------------------------------+
|  Layer 5: Table Detection (Lattice / Stream / Explicit)      |
+--------------------------------------------------------------+
|  Layer 4: Text Grouping & Reading Order                      |
|  Characters -> Words -> Lines -> TextBlocks                  |
+--------------------------------------------------------------+
|  Layer 3: Object Extraction                                  |
|  Chars (bbox/font/size/color), Paths (lines/rects/curves)    |
+--------------------------------------------------------------+
|  Layer 2: Content Stream Interpreter                         |
|  Text state, Graphics state, CTM, XObject Do                 |
+--------------------------------------------------------------+
|  Layer 1: PDF Parsing (pluggable backend via PdfBackend)     |
|  lopdf (default)                                             |
+--------------------------------------------------------------+

The library is split into three crates:

Crate	Description
`pdfplumber-core`	Backend-independent data types and algorithms
`pdfplumber-parse`	PDF parsing and content stream interpretation
`pdfplumber`	Public API facade (this is what you depend on)

Minimum Supported Rust Version

Rust 1.85 or later.

License

Licensed under either of:

at your option.

Name		Name	Last commit message	Last commit date
Latest commit History 467 Commits
.github/workflows		.github/workflows
crates		crates
references		references
scripts		scripts
tests		tests
.gitignore		.gitignore
.ralph-worktree		.ralph-worktree
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
METHODOLOGY.md		METHODOLOGY.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pdfplumber-rs

Features

Installation

Feature Flags

Quick Start

Extract Text

Extract Tables

Extract Characters

WASM Support

Architecture

Minimum Supported Rust Version

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pdfplumber-rs

Features

Installation

Feature Flags

Quick Start

Extract Text

Extract Tables

Extract Characters

WASM Support

Architecture

Minimum Supported Rust Version

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages