Add optional PDF image preservation and PyMuPDF pipeline #27

@mspinolaeie

Summary

Add an optional PDF-focused enhancement to the GUI wrapper so that PDF conversions can preserve extracted images, preview them before saving, and optionally use a PyMuPDF-based PDF pipeline. MarkItDown remains the default behavior.

Scope

  • Keep the GUI as a wrapper around MarkItDown by default.
  • Add a PDF pipeline toggle: markitdown or pymupdf.
  • Add optional PDF image preservation in Markdown.
  • Support two asset layouts, separate and single, independent of the combined/separate Markdown save mode.
  • Preserve preview/save flows and batch conversion behavior.
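
The toggles listed above could be modeled as one small options object plus a routing function. This is only a minimal sketch to make the scope concrete: the names PdfOptions and choose_route are hypothetical and not taken from the project, and the only values assumed are the ones named in this issue (markitdown/pymupdf, separate/single).

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical names for illustration; not the project's actual API.
PIPELINES = ("markitdown", "pymupdf")
ASSET_LAYOUTS = ("separate", "single")


@dataclass(frozen=True)
class PdfOptions:
    pipeline: str = "markitdown"     # default keeps the plain wrapper behavior
    preserve_images: bool = False    # image preservation is opt-in
    asset_layout: str = "separate"   # chosen independently of the Markdown save mode

    def __post_init__(self):
        # Reject values outside the two toggles described in the scope.
        if self.pipeline not in PIPELINES:
            raise ValueError(f"unknown pipeline: {self.pipeline!r}")
        if self.asset_layout not in ASSET_LAYOUTS:
            raise ValueError(f"unknown asset layout: {self.asset_layout!r}")


def choose_route(source: Path, options: PdfOptions) -> tuple[str, bool]:
    """Return (pipeline, extract_images) for one input file."""
    if source.suffix.lower() != ".pdf":
        # Non-PDF formats ignore the PDF options entirely.
        return ("markitdown", False)
    return (options.pipeline, options.preserve_images)
```

Keeping the routing decision in a pure function like this would make the "conversion routing" tests independent of the GUI and of either PDF library.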

Acceptance Criteria

  • Non-PDF formats keep existing behavior.
  • PDFs keep existing behavior when image preservation is disabled.
  • PDFs can preserve extracted images as files and link them from Markdown.
  • The rendered preview can resolve extracted PDF image assets before save.
  • The pymupdf pipeline can place extracted images near the closest preceding text block on a best-effort basis.
  • Combined and separate save modes both rewrite image links correctly.
  • Tests cover conversion routing, runtime UI paths, real generated PDFs, and a packaging smoke test.
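
The "closest preceding text block" criterion can be illustrated without any PDF library. The sketch below assumes text blocks and images have already been extracted with a page index and a top y coordinate (in a PyMuPDF pipeline these might come from something like page.get_text("blocks"), but that wiring is not shown). TextBlock, ImageRef, and interleave are hypothetical names, and the empty alt text is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    page: int
    top: float      # y coordinate of the block's top edge, growing downward
    text: str


@dataclass
class ImageRef:
    page: int
    top: float
    rel_path: str   # link target relative to the Markdown file


def interleave(blocks: list[TextBlock], images: list[ImageRef]) -> str:
    """Emit Markdown with each image after its closest preceding text block."""
    blocks = sorted(blocks, key=lambda b: (b.page, b.top))
    images = sorted(images, key=lambda i: (i.page, i.top))
    leading: list[ImageRef] = []                       # images above all text
    attached: list[list[ImageRef]] = [[] for _ in blocks]
    for image in images:
        index = -1
        # Find the last block at or above the image in (page, top) order.
        for i, block in enumerate(blocks):
            if (block.page, block.top) <= (image.page, image.top):
                index = i
            else:
                break
        (leading if index < 0 else attached[index]).append(image)
    parts = [f"![]({img.rel_path})" for img in leading]
    for block, imgs in zip(blocks, attached):
        parts.append(block.text)
        parts.extend(f"![]({img.rel_path})" for img in imgs)
    return "\n\n".join(parts)
```

This is best-effort in the same sense as the criterion: it ignores columns and horizontal position, so an image in a multi-column layout may attach to a block that is not its visual neighbor. Because the links are plain relative paths, a later save step could rewrite them for combined or separate output without touching the placement logic.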
