Add structured PDF link support and MuPDF URI parsing#199

Open
JustForFun88 wants to merge 3 commits into messense:main from JustForFun88:mupdf_uri_parsing

JustForFun88 (Contributor) commented Feb 11, 2026

This PR adds structured PDF link support, enabling parsing of MuPDF URI strings into typed Rust structs and the construction of PDF annotation dictionaries.

Based on #198 and #200.

What's new:

  • New pdf::links module with types:
    pub struct PdfLink {
        pub bounds: Rect,
        pub action: PdfAction,
    }
    
    /// PDF link destination representing an action associated with a link annotation
    /// (see [PDF 32000-1:2008, 12.6.4]).
    #[derive(Debug, Clone, PartialEq)]
    pub enum PdfAction {
        /// Go-to action (`S`=`GoTo`): changes the view to a destination in the current document
        /// (see PDF 32000-1:2008, [12.6.4.2], Table 199).
        GoTo(PdfDestination),
        /// Remote go-to action (`S`=`GoToR`): jumps to a destination in another PDF file
        /// (see PDF 32000-1:2008, [12.6.4.3], Table 200).
        GoToR {
            file: FileSpec,
            dest: PdfDestination,
        },
        /// Launch action (`S`=`Launch`): launches an application or opens/prints a document
        /// (see PDF 32000-1:2008, [12.6.4.5], Table 203).
        Launch(FileSpec),
        /// URI action (`S`=`URI`): resolves a uniform resource identifier
        /// (see PDF 32000-1:2008, [12.6.4.7], Table 206).
        Uri(String),
    }
    
    /// PDF file specification (see [PDF 32000-1:2008, 7.11]).
    #[derive(Debug, Clone, PartialEq)]
    pub enum FileSpec {
        /// Local filesystem path (e.g., `/Docs/path/file.pdf`, `path/file.pdf`, or `../file.pdf`).
        Path(String),
    
        /// URL-based file specification (e.g., `http://example.com/file.pdf`).
        Url(String),
    }
    
    /// Destination within a PDF document (see [PDF 32000-1:2008], 12.3.2).
    #[derive(Debug, Clone, PartialEq)]
    pub enum PdfDestination {
        /// Explicit destination: zero-based page number with view settings (e.g., page 0, Fit).
        Page { page: u32, kind: DestinationKind },
        /// Named destination string resolved in the remote document's name tree (e.g., `"Chapter1"`).
        Named(String),
    }
  • URI parsing (parse_external_link): converts MuPDF's URI strings back into structured action types
  • URI formatting (Display for PdfAction/DestinationKind): serializes actions back to MuPDF-compatible URI strings
  • Annotation building (build_link_annotation): constructs PDF link annotation dictionaries from PdfLink structs
  • PdfPage::pdf_links() iterator for extracting typed link data from pages
  • PdfPage::insert_links() for adding link annotations with Fitz-to-PDF coordinate transforms
  • Moved encode_into from Destination to DestinationKind and added Default implementation
  • Added percent-encoding dependency for URI component encoding/decoding
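
As a rough illustration of how the types above compose, here is a minimal sketch that mirrors them with local stand-ins. `Rect` and `DestinationKind` are simplified assumptions (the real crate types are richer), and `action_type` is a hypothetical helper that is not part of the PR; it only maps each variant to the `S` name given in the doc comments.

```rust
// Local stand-ins for the PR's types. `Rect` and `DestinationKind` are
// simplified assumptions, not the crate's actual definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct Rect {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DestinationKind {
    Fit, // assumption: a single view kind is enough for this sketch
}

#[derive(Debug, Clone, PartialEq)]
pub enum FileSpec {
    Path(String),
    Url(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum PdfDestination {
    Page { page: u32, kind: DestinationKind },
    Named(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum PdfAction {
    GoTo(PdfDestination),
    GoToR { file: FileSpec, dest: PdfDestination },
    Launch(FileSpec),
    Uri(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct PdfLink {
    pub bounds: Rect,
    pub action: PdfAction,
}

/// Hypothetical helper (not in the PR): the PDF action type name (`S` entry)
/// that corresponds to each variant, as listed in the doc comments.
pub fn action_type(action: &PdfAction) -> &'static str {
    match action {
        PdfAction::GoTo(_) => "GoTo",
        PdfAction::GoToR { .. } => "GoToR",
        PdfAction::Launch(_) => "Launch",
        PdfAction::Uri(_) => "URI",
    }
}
```

Because every variant carries typed data, a caller can match exhaustively on `PdfAction` instead of inspecting a raw URI string.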

Testing:

  • Added tests covering direct URI parsing and PdfAction string formatting.
  • Implemented round-trip (encode/parse) tests: construct a PdfAction, write it into a PDF, read it back via MuPDF, and verify that:
    • the parsed PdfAction matches the original
    • the raw MuPDF URI string matches the Display output of the original PdfAction
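
The round-trip property described above can be sketched on a toy type. Everything here is an assumption made for illustration: `ToyAction`, `parse_toy_action`, and the `#page=N` string form are hypothetical and do not claim to reproduce the PR's `PdfAction` or MuPDF's actual URI grammar. The real tests also go through a PDF file, which this sketch skips.

```rust
use std::fmt;

/// Toy stand-in for an action type; not the PR's `PdfAction`.
#[derive(Debug, Clone, PartialEq)]
enum ToyAction {
    /// Zero-based page index.
    GoToPage(u32),
    /// Any other string is kept verbatim.
    Uri(String),
}

impl fmt::Display for ToyAction {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Assumed string form: one-based page number after `#page=`.
            // Widening to u64 avoids overflow for `u32::MAX`.
            ToyAction::GoToPage(page) => write!(f, "#page={}", u64::from(*page) + 1),
            ToyAction::Uri(uri) => f.write_str(uri),
        }
    }
}

/// Inverse of the `Display` impl above; returns `None` for a malformed page.
fn parse_toy_action(s: &str) -> Option<ToyAction> {
    match s.strip_prefix("#page=") {
        Some(rest) => {
            let one_based: u64 = rest.parse().ok()?;
            let zero_based = one_based.checked_sub(1)?;
            u32::try_from(zero_based).ok().map(ToyAction::GoToPage)
        }
        None => Some(ToyAction::Uri(s.to_owned())),
    }
}
```

A round-trip check then formats each value, parses the string back, and compares the result with the original, which is the same shape as the two assertions listed above.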

P.S. Although the PR looks large (~4000 lines changed), it's mostly tests (60%) and docs. The actual logic changes are relatively small.

Copilot AI left a comment


Pull request overview

This PR adds comprehensive structured PDF link support to the mupdf-rs crate, enabling bidirectional conversion between MuPDF's URI strings and typed Rust structs. The implementation provides a high-level API for extracting and creating PDF link annotations while maintaining full compatibility with MuPDF's internal link representation.

Changes:

  • Introduces a new pdf::links module with types PdfLink, PdfAction, PdfDestination, and FileSpec to represent PDF link annotations and actions
  • Implements URI parsing (parse_external_link) to convert MuPDF URI strings into structured PdfAction variants
  • Implements URI formatting (Display for PdfAction/DestinationKind) to serialize actions back to MuPDF-compatible URI strings
  • Adds PdfPage::pdf_links() iterator for extracting typed link data from pages and PdfPage::add_links() for inserting link annotations
  • Refactors Matrix::invert() to return Option<Matrix> instead of falling back to identity (based on PR #198)
  • Moves encode_into from Destination to DestinationKind and adds Default implementation
  • Fixes memory leak in LinkIter by adding Drop implementation (based on PR #200)
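
To make the `Matrix::invert()` change concrete, here is a minimal sketch of an affine-matrix inverse that reports singular input as `None` rather than silently returning the identity. The `Matrix` struct below is a local stand-in with the usual six affine coefficients, and the `f32::EPSILON` threshold is an assumption; the crate's actual implementation may differ.

```rust
/// Local stand-in for a 2D affine matrix; a point maps as
/// x' = x*a + y*c + e and y' = x*b + y*d + f.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Matrix {
    pub a: f32,
    pub b: f32,
    pub c: f32,
    pub d: f32,
    pub e: f32,
    pub f: f32,
}

impl Matrix {
    /// Returns the inverse, or `None` when the determinant is (nearly) zero.
    /// The epsilon threshold is an assumption made for this sketch.
    pub fn invert(&self) -> Option<Matrix> {
        let det = self.a * self.d - self.b * self.c;
        if det.abs() <= f32::EPSILON {
            return None;
        }
        let inv_det = 1.0 / det;
        // Inverse of the linear 2x2 part.
        let a = self.d * inv_det;
        let b = -self.b * inv_det;
        let c = -self.c * inv_det;
        let d = self.a * inv_det;
        // Inverse translation: minus the original translation mapped
        // through the inverted linear part.
        let e = -(self.e * a + self.f * c);
        let f = -(self.e * b + self.f * d);
        Some(Matrix { a, b, c, d, e, f })
    }
}
```

Returning `Option` forces callers to decide what a non-invertible transform means for them, instead of continuing with coordinates that were never actually transformed.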

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/pdf/links/mod.rs | Module entry point defining public types (PdfLink, PdfAction, PdfDestination, FileSpec) with Display implementation for URI formatting |
| src/pdf/links/extraction.rs | URI parsing logic to convert MuPDF URI strings to structured PdfAction types |
| src/pdf/links/build.rs | Annotation building logic to construct PDF link annotation dictionaries from PdfLink structs |
| src/pdf/links/tests_format.rs | Comprehensive tests for URI formatting (Display implementation) |
| src/pdf/links/tests_extraction.rs | Comprehensive tests for URI parsing with edge cases |
| src/pdf/links/tests_build.rs | Round-trip tests verifying write/read symmetry and MuPDF compatibility |
| src/pdf/page.rs | Page-level API with pdf_links() iterator and add_links() methods, including PdfLinkIter implementation |
| src/pdf/object.rs | Adds array_push_ref and dict_put_ref helpers and refactors dict_put to delegate to dict_put_ref |
| src/pdf/document.rs | Adds new_array_with_capacity, new_dict_with_capacity, load_pdf_page helpers and updates existing code to handle Matrix::invert() returning Option |
| src/rect.rs | Adds encode_into method for encoding rectangles into PDF arrays |
| src/destination.rs | Moves encode_into from Destination to DestinationKind, adds Default impl, and adds Display impl for URI fragment formatting |
| src/matrix.rs | Changes invert() to return Option<Matrix> instead of Matrix::IDENTITY on singular matrices |
| src/page.rs | Adds Drop implementation for LinkIter to fix memory leak |
| Cargo.toml | Adds percent-encoding = "2.3.1" dependency for URI encoding/decoding |
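
The `Drop` fix for `LinkIter` follows a common pattern for iterators that own a native resource. The sketch below is only an illustration of that pattern: `NativeList` and `ListIter` are hypothetical stand-ins, and a shared flag takes the place of the native free call, so it says nothing about how the crate's `LinkIter` is actually laid out.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a native list handle that must be released once.
struct NativeList {
    items: Vec<u32>,
    /// Set to `true` when the handle has been released.
    released: Rc<Cell<bool>>,
}

/// Iterator that owns the handle for its whole lifetime.
struct ListIter {
    list: NativeList,
    pos: usize,
}

impl Iterator for ListIter {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let item = self.list.items.get(self.pos).copied();
        self.pos += 1;
        item
    }
}

impl Drop for ListIter {
    fn drop(&mut self) {
        // In real FFI code the native free function would be called here.
        // Without this impl the handle would never be released.
        self.list.released.set(true);
    }
}
```

Tying the release to `Drop` means the resource is freed even when the caller stops iterating early.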

impl Default for DestinationKind {
    fn default() -> Self {
        // This analogue of MuPDF's `fz_make_link_dest_none` function
Copilot AI Feb 14, 2026

Grammar: "This analogue" should be "This is an analogue" or "Analogue".

Suggested change
- // This analogue of MuPDF's `fz_make_link_dest_none` function
+ // This is an analogue of MuPDF's `fz_make_link_dest_none` function

pub(super) fn is_pdf_path(file_name: &str) -> bool {
    file_name
        .get(file_name.len().saturating_sub(4)..)
        .is_some_and(|extention| extention.eq_ignore_ascii_case(".pdf"))
Copilot AI Feb 14, 2026

Typo in variable name: "extention" should be "extension".

Suggested change
- .is_some_and(|extention| extention.eq_ignore_ascii_case(".pdf"))
+ .is_some_and(|extension| extension.eq_ignore_ascii_case(".pdf"))
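
For context, the reviewed snippet can be closed into a standalone function as in the sketch below, with the corrected spelling applied. The visibility is changed from `pub(super)` to private so the block compiles on its own; that change is an adaptation for this example, not part of the suggestion.

```rust
/// Returns `true` when `file_name` ends with `.pdf`, ignoring ASCII case.
fn is_pdf_path(file_name: &str) -> bool {
    file_name
        // Take the last four bytes. `get` returns `None` instead of panicking
        // if that offset falls inside a multi-byte character.
        .get(file_name.len().saturating_sub(4)..)
        .is_some_and(|extension| extension.eq_ignore_ascii_case(".pdf"))
}
```

Using `str::get` with `saturating_sub` keeps the check panic-free both for names shorter than four bytes and for names where the cut would split a UTF-8 character.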

// https://github.com/ArtifexSoftware/mupdf/blob/60bf95d09f496ab67a5e4ea872bdd37a74b745fe/source/pdf/pdf-link.c#L1325
dest.array_push_ref(dest_page_obj)?;

// MuPDF uses inv_ctm to transform coodinates
Copilot AI Feb 14, 2026

Typo in comment: "coodinates" should be "coordinates".

Suggested change
- // MuPDF uses inv_ctm to transform coodinates
+ // MuPDF uses inv_ctm to transform coordinates
