🚀 What's new in v0.9.7
- PaddleOCR-VL PDF Parser (with Restoration + Split Tables): New PaddleOCRVL-powered PDF parser that combines layout-aware OCR, visual-language understanding, page restoration, and split table merging in a single high-level pipeline.
- Split Table Merging Everywhere: Split table detection & merging is now available across ChartTablePDFParser and EnhancedPDFParser, so multi-page tables are reconstructed consistently whether you’re extracting text, tables, or charts.
- Restoration-Friendly Flow: The new parser plays nicely with restoration steps (denoising, deblurring, cleanups), improving OCR and structure extraction on noisy reports and scanned PDFs.
- Docs Upgrade: Documentation updated to explain when to use the new PaddleOCR-VL parser, how split table merging works across parsers, and how to configure these features in real-world workflows.
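
To make the idea of split table merging concrete, here is a minimal, self-contained Python sketch. It is not Doctra's implementation and does not use Doctra's API: `TableFragment`, `MergedTable`, and `merge_split_tables` are hypothetical names, and the continuation rule (next page, same column count, repeated header dropped) is an assumed heuristic chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TableFragment:
    """Hypothetical container for one table piece detected on a single page."""
    page: int               # 1-based page number where the fragment was found
    rows: list[list[str]]   # cell text; the first row is assumed to be a header


@dataclass
class MergedTable:
    """Hypothetical result: one logical table spanning one or more pages."""
    pages: list[int]
    rows: list[list[str]]


def _continues(prev: MergedTable, frag: TableFragment) -> bool:
    # Assumed heuristic: a fragment continues the previous table when it sits
    # on the very next page and has the same number of columns.
    if not prev.rows or not frag.rows:
        return False
    on_next_page = frag.page == prev.pages[-1] + 1
    same_width = len(frag.rows[0]) == len(prev.rows[0])
    return on_next_page and same_width


def merge_split_tables(fragments: list[TableFragment]) -> list[MergedTable]:
    merged: list[MergedTable] = []
    for frag in sorted(fragments, key=lambda f: f.page):
        if merged and _continues(merged[-1], frag):
            table = merged[-1]
            body = frag.rows
            # Drop a header row that is repeated at the top of the new page.
            if body[0] == table.rows[0]:
                body = body[1:]
            table.rows.extend(body)
            table.pages.append(frag.page)
        else:
            # Copy the row list so the input fragment is not mutated later.
            merged.append(MergedTable(pages=[frag.page], rows=list(frag.rows)))
    return merged
```

Under these assumptions, two two-column fragments on pages 3 and 4 collapse into one table, while a three-column fragment on page 6 stays separate. A production merger would typically also compare column positions and the table's placement on the page, since column count alone can join unrelated tables.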
✅ Motivation
Doctra is increasingly used on messy, real-world PDFs where tables are split across pages and visual context matters (charts, complex layouts, degraded scans). This release focuses on:
- Making split-table merging a first-class feature across multiple parsers.
- Introducing a PaddleOCR-VL–based parser that can better understand visual + textual context.
- Tightening the integration with restoration so that users get more reliable structured outputs from imperfect documents.
🛠 What’s Changed
- feat: Add PaddleOCRVL PDF parser with restoration and split table merging by @AdemBoukhris457 in #82
- feat: Add split table merging to ChartTablePDFParser by @AdemBoukhris457 in #81
- feat: Add split table merging support to EnhancedPDFParser by @AdemBoukhris457 in #80
- docs: Document new PaddleOCR-VL parser & split-table merging behavior (usage, configuration, and examples)
📦 Version
v0.8.0 → v0.9.7
Minor feature-focused release that extends split-table merging across parsers and introduces a PaddleOCR-VL–powered PDF parser with restoration support. No breaking changes to the public API — existing workflows keep working, but gain access to smarter parsing options.