pdfplumber-rs

Extract chars, words, lines, rects, and tables from PDF documents with precise coordinates.

pdfplumber-rs is a Rust port of Python's pdfplumber. It extracts structured content from PDF files with coordinate-accurate positioning, including characters, words, lines, rectangles, curves, images, and tables.

Features

Text extraction with spatial grouping into words, lines, and text blocks
Table detection using lattice (line-based), stream (text-alignment), and explicit strategies
Spatial filtering via crop, within_bbox, and outside_bbox
CJK support including CID fonts, Identity-H/V CMaps, and CJK-aware word grouping
Page-level streaming for memory-efficient processing of large documents
WASM support via the wasm32-unknown-unknown target
Optional serde serialization for all data types
Optional parallel processing via rayon
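
The three spatial filters differ in how they treat objects that straddle the region boundary. In Python's pdfplumber, crop keeps overlapping objects (clipped to the region), within_bbox keeps only objects that lie entirely inside, and outside_bbox keeps only objects that lie entirely outside. The sketch below illustrates those three tests, assuming the Rust port follows the same semantics. `BBox` and its methods here are hypothetical stand-ins, not the crate's own types.

```rust
/// Hypothetical bounding box with a top-left origin (not the crate's own type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BBox {
    pub x0: f64,
    pub top: f64,
    pub x1: f64,
    pub bottom: f64,
}

impl BBox {
    /// True when `self` lies entirely inside `region` (within_bbox-style test).
    pub fn is_within(&self, region: &BBox) -> bool {
        self.x0 >= region.x0
            && self.x1 <= region.x1
            && self.top >= region.top
            && self.bottom <= region.bottom
    }

    /// True when `self` and `region` share any area (crop-style test).
    pub fn overlaps(&self, region: &BBox) -> bool {
        self.x0 < region.x1
            && self.x1 > region.x0
            && self.top < region.bottom
            && self.bottom > region.top
    }

    /// True when `self` shares no area with `region` (outside_bbox-style test).
    pub fn is_outside(&self, region: &BBox) -> bool {
        !self.overlaps(region)
    }

    /// Clip `self` to `region`; returns `None` when the two do not overlap.
    pub fn clip_to(&self, region: &BBox) -> Option<BBox> {
        if !self.overlaps(region) {
            return None;
        }
        Some(BBox {
            x0: self.x0.max(region.x0),
            top: self.top.max(region.top),
            x1: self.x1.min(region.x1),
            bottom: self.bottom.min(region.bottom),
        })
    }
}
```

An object that crosses the region edge fails `is_within` but passes `overlaps`, which is why a crop-style filter can return more objects than a within_bbox-style one for the same region.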

Installation

Add to your Cargo.toml:

[dependencies]
pdfplumber = "0.1"

Feature Flags

Feature	Default	Description
`std`	Yes	Enables file-path APIs (`Pdf::open_file`). Disable for WASM.
`serde`	No	Adds `Serialize`/`Deserialize` to all public data types.
`parallel`	No	Enables `Pdf::pages_parallel()` via rayon. Not WASM-compatible.
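
Optional features are enabled with Cargo's standard `features` key. For example, to turn on both `serde` and `parallel` while keeping the default `std` feature:

```toml
[dependencies]
pdfplumber = { version = "0.1", features = ["serde", "parallel"] }
```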

Quick Start

Extract Text

use pdfplumber::{Pdf, TextOptions};

fn main() {
    let pdf = Pdf::open_file("document.pdf", None).unwrap();
    for page_result in pdf.pages_iter() {
        let page = page_result.unwrap();
        let text = page.extract_text(&TextOptions::default());
        println!("Page {}: {}", page.page_number(), text);
    }
}

Extract Tables

use pdfplumber::{Pdf, TableSettings};

fn main() {
    let pdf = Pdf::open_file("document.pdf", None).unwrap();
    let page = pdf.page(0).unwrap();
    let tables = page.find_tables(&TableSettings::default());
    for table in &tables {
        for row in &table.rows {
            let cells: Vec<&str> = row.iter()
                .map(|c| c.text.as_deref().unwrap_or(""))
                .collect();
            println!("{:?}", cells);
        }
    }
}
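
To give a feel for the lattice strategy, here is a deliberately simplified sketch of how ruling lines can define table cells. It assumes every vertical and horizontal rule spans the whole table, so each pair of adjacent rules bounds a cell. A real detector also has to snap nearly aligned lines together and check that edges actually intersect. `Cell` and `lattice_cells` are hypothetical names, not part of the crate's API.

```rust
/// Hypothetical cell rectangle as (x0, top, x1, bottom); not the crate's own type.
pub type Cell = (f64, f64, f64, f64);

/// Build rows of cells from the x positions of vertical rules and the
/// y positions of horizontal rules. Assumes every rule spans the full table.
pub fn lattice_cells(vertical_xs: &[f64], horizontal_ys: &[f64]) -> Vec<Vec<Cell>> {
    let mut xs = vertical_xs.to_vec();
    let mut ys = horizontal_ys.to_vec();

    // Sort the rule positions and drop exact duplicates.
    xs.sort_by(|a, b| a.total_cmp(b));
    ys.sort_by(|a, b| a.total_cmp(b));
    xs.dedup();
    ys.dedup();

    // Each adjacent pair of horizontal rules bounds a row;
    // each adjacent pair of vertical rules bounds a column.
    ys.windows(2)
        .map(|row| {
            xs.windows(2)
                .map(|col| (col[0], row[0], col[1], row[1]))
                .collect::<Vec<Cell>>()
        })
        .collect()
}
```

Three vertical rules and three horizontal rules yield a 2 x 2 grid. Cell text would then come from the characters whose boxes fall inside each rectangle.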

Extract Characters

use pdfplumber::Pdf;

fn main() {
    let pdf = Pdf::open_file("document.pdf", None).unwrap();
    let page = pdf.page(0).unwrap();
    for ch in page.chars() {
        println!(
            "'{}' at ({:.1}, {:.1}) font={} size={:.1}",
            ch.text, ch.bbox.x0, ch.bbox.top, ch.fontname, ch.size
        );
    }
}
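
The `top` field printed above suggests a top-left origin, as in Python's pdfplumber, whereas PDF user space normally places the origin at the bottom-left with y increasing upward. Assuming the port follows the Python convention, the conversion can be sketched as below. `to_top_left` is a hypothetical helper, not a function exported by the crate, and it ignores page rotation and non-zero MediaBox origins.

```rust
/// Convert a PDF-space rectangle (x0, y0, x1, y1), with the origin at the
/// bottom-left, into (x0, top, x1, bottom) measured from the top of the page.
pub fn to_top_left(page_height: f64, rect: (f64, f64, f64, f64)) -> (f64, f64, f64, f64) {
    let (x0, y0, x1, y1) = rect;
    // The larger PDF y value is nearer the top of the page, so it becomes `top`.
    let top = page_height - y0.max(y1);
    let bottom = page_height - y0.min(y1);
    (x0.min(x1), top, x0.max(x1), bottom)
}
```

On a US Letter page (792 points tall), a glyph occupying y = 700 to 712 in PDF space sits 80 to 92 points below the top edge.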

WASM Support

For wasm32-unknown-unknown targets, disable the default std feature:

[dependencies]
pdfplumber = { version = "0.1", default-features = false }

Use the bytes-based API:

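use pdfplumber::{Pdf, TextOptions};

// `pdf_bytes` is the PDF already in memory; `?` needs an enclosing function returning `Result`.
let pdf = Pdf::open(pdf_bytes, None)?;
let page = pdf.page(0)?;
let text = page.extract_text(&TextOptions::default());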

Architecture

+--------------------------------------------------------------+
|  Layer 5: Table Detection (Lattice / Stream / Explicit)      |
+--------------------------------------------------------------+
|  Layer 4: Text Grouping & Reading Order                      |
|  Characters -> Words -> Lines -> TextBlocks                  |
+--------------------------------------------------------------+
|  Layer 3: Object Extraction                                  |
|  Chars (bbox/font/size/color), Paths (lines/rects/curves)    |
+--------------------------------------------------------------+
|  Layer 2: Content Stream Interpreter                         |
|  Text state, Graphics state, CTM, XObject Do                 |
+--------------------------------------------------------------+
|  Layer 1: PDF Parsing (pluggable backend via PdfBackend)     |
|  lopdf (default)                                             |
+--------------------------------------------------------------+
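
As an illustration of the Layer 4 step from characters to words, the sketch below groups the glyphs of a single text line by horizontal gap. A new word starts at a whitespace glyph, or when the gap to the previous glyph exceeds a tolerance (Python's pdfplumber uses a default `x_tolerance` of 3). `Glyph` and `group_words` are hypothetical names. The crate's actual grouping, including its CJK-aware rules and line detection, is not shown here.

```rust
/// Hypothetical character record; not the crate's own type.
#[derive(Debug, Clone)]
pub struct Glyph {
    pub text: String,
    pub x0: f64,
    pub x1: f64,
}

/// Group the glyphs of one text line into words, splitting on whitespace
/// glyphs and on horizontal gaps wider than `x_tolerance`.
pub fn group_words(glyphs: &[Glyph], x_tolerance: f64) -> Vec<String> {
    // Work in left-to-right order regardless of input order.
    let mut sorted: Vec<&Glyph> = glyphs.iter().collect();
    sorted.sort_by(|a, b| a.x0.total_cmp(&b.x0));

    let mut words = Vec::new();
    let mut current = String::new();
    let mut last_x1: Option<f64> = None;

    for g in sorted {
        let is_space = g.text.trim().is_empty();
        let gap_too_wide = last_x1.is_some_and(|x1| g.x0 - x1 > x_tolerance);

        // Close the current word at a space or a wide gap.
        if (is_space || gap_too_wide) && !current.is_empty() {
            words.push(std::mem::take(&mut current));
        }

        if is_space {
            // Whitespace never joins a word; forget the previous edge.
            last_x1 = None;
        } else {
            current.push_str(&g.text);
            last_x1 = Some(g.x1);
        }
    }

    if !current.is_empty() {
        words.push(current);
    }
    words
}
```

A gap-based rule like this is what lets word boundaries be recovered from PDFs that position glyphs individually and never emit explicit space characters.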

The library is split into three crates:

Crate	Description
`pdfplumber-core`	Backend-independent data types and algorithms
`pdfplumber-parse`	PDF parsing and content stream interpretation
`pdfplumber`	Public API facade (this is what you depend on)

Minimum Supported Rust Version

Rust 1.85 or later.

License

Licensed under either of:

at your option.
