Releases: AdemBoukhris457/Doctra

v0.9.7

16 Nov 13:18
0d2ebf7

🚀 What's new in v0.9.7

  • PaddleOCR-VL PDF Parser (with Restoration + Split Tables): New PaddleOCRVL-powered PDF parser that combines layout-aware OCR, visual-language understanding, page restoration, and split table merging in a single high-level pipeline.
  • Split Table Merging Everywhere: Split table detection & merging is now available across ChartTablePDFParser and EnhancedPDFParser, so multi-page tables are reconstructed consistently whether you’re extracting text, tables, or charts.
  • Restoration-Friendly Flow: The new parser plays nicely with restoration steps (denoising, deblurring, cleanups), improving OCR and structure extraction on noisy reports and scanned PDFs.
  • Docs Upgrade: Documentation updated to explain when to use the new PaddleOCR-VL parser, how split table merging works across parsers, and how to configure these features in real-world workflows.

Motivation

Doctra is increasingly used on messy, real-world PDFs where tables are split across pages and visual context matters (charts, complex layouts, degraded scans). This release focuses on:

  • Making split-table merging a first-class feature across multiple parsers.
  • Introducing a PaddleOCR-VL–based parser that can better understand visual + textual context.
  • Tightening the integration with restoration so that users get more reliable structured outputs from imperfect documents.

🛠 What’s Changed

  • feat: Add PaddleOCRVL PDF parser with restoration and split table merging by @AdemBoukhris457 in #82
  • feat: Add split table merging to ChartTablePDFParser by @AdemBoukhris457 in #81
  • feat: Add split table merging support to EnhancedPDFParser by @AdemBoukhris457 in #80
  • docs: Document new PaddleOCR-VL parser & split-table merging behavior (usage, configuration, and examples)

📦 Version

v0.8.0 → v0.9.7
Minor feature-focused release that extends split-table merging across parsers and introduces a PaddleOCR-VL–powered PDF parser with restoration support. No breaking changes to the public API: existing workflows keep working and gain access to the new parsing options.

v0.8.0

10 Nov 20:04
95b5979

🚀 What's new in v0.8.0

  • Dependency-Based OCR & VLM Configuration: Doctra’s OCR and VLM engines now use a clean dependency pattern, making it easier to plug in, swap, or extend engines (PaddleOCR, Tesseract, VLMs, etc.) in a consistent way.
  • Cleaner Engine Setup: Centralized configuration logic reduces duplication, improves readability, and makes it simpler to maintain multi-backend pipelines.
  • Codebase Cleanup: Removed noisy / redundant comments and streamlined internals for a more professional, focused contributor experience.
  • Docs Alignment: Documentation updated to reflect the new dependency-based configuration flow so users and contributors can follow the architecture easily.
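
The dependency pattern described above can be illustrated with a minimal sketch. This is not Doctra's actual API: `OCREngine`, `FakeTesseract`, `FakePaddle`, and `Parser` are hypothetical names invented for the illustration. The idea shown is only that the parser receives a ready-made engine object rather than constructing one internally, so engines can be swapped without touching the parser.

```python
from dataclasses import dataclass
from typing import Protocol


class OCREngine(Protocol):
    """Hypothetical interface: anything with recognize() can act as an engine."""

    def recognize(self, image: bytes) -> str: ...


@dataclass
class FakeTesseract:
    """Stand-in engine (hypothetical); a real one would call an OCR backend."""
    lang: str = "eng"

    def recognize(self, image: bytes) -> str:
        return f"[tesseract:{self.lang}] {len(image)} bytes"


@dataclass
class FakePaddle:
    """Second stand-in engine (hypothetical) with its own configuration."""
    device: str = "cpu"

    def recognize(self, image: bytes) -> str:
        return f"[paddle:{self.device}] {len(image)} bytes"


class Parser:
    """Hypothetical parser that is handed its engine as a dependency."""

    def __init__(self, ocr_engine: OCREngine) -> None:
        self.ocr_engine = ocr_engine

    def parse(self, image: bytes) -> str:
        # The parser only relies on the shared recognize() method.
        return self.ocr_engine.recognize(image)


# Swapping engines means passing a different object; Parser is unchanged.
print(Parser(FakeTesseract()).parse(b"abc"))
print(Parser(FakePaddle(device="gpu")).parse(b"abcd"))
```

Engine-specific settings live on the engine object, which is what keeps the configuration logic in one place.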

Motivation

As Doctra adds more OCR engines and VLM backends, a scalable configuration pattern becomes critical. This release focuses on making the engine wiring predictable, extensible, and maintainable, while keeping the public behavior stable and the onboarding experience clearer.


🛠 What’s Changed


📦 Version

v0.7.1 → v0.8.0
Minor release focused on architecture & maintainability (dependency-based configuration, cleaner code, updated docs). No breaking changes to the public API.

v0.7.1

10 Nov 19:58
95b5979

🚀 What's new in v0.7.1

  • PaddleOCR PP-OCRv5 Server Support: Doctra now supports the high-performance PP-OCRv5_server engine for faster and more accurate OCR in production-style workflows.
  • Seamless Engine Integration: PP-OCRv5_server plugs into the existing OCR selection flow, so users can easily switch between lightweight and server-grade models depending on their use case.
  • Docs Updated: README and docs now clearly show how to enable and configure PP-OCRv5_server within Doctra.

Motivation

Many users run Doctra in server or batch environments and need a stronger OCR backend without changing their pipeline. This release introduces first-class support for PP-OCRv5_server, making Doctra more flexible for heavy workloads while keeping configuration simple.


🛠 What’s Changed


📦 Version

v0.7.0 → v0.7.1 (patch release; new OCR engine option + documentation update, no breaking changes).

v0.7.0

02 Nov 09:59
7a7f8f9

🚀 What's new in v0.7.0

  • Automatic Split Table Detection & Merging: Doctra can now detect when a table is split across two pages (bottom of one page → top of the next) and automatically merge the two parts into a single structured table.
  • Configurable Heuristics: Merging is based on layout proximity + column alignment (via line/structure detection), making it robust for multi-page PDF reports.
  • New Documentation Section: Added a full “Split Table Merging” guide with flow, conditions, and examples.
  • Visual Diagrams: Mermaid diagrams added to explain the detection → matching → merge pipeline.
  • Docs Navigation Fixes: Split-table docs are now properly added to MkDocs nav and broken internal links are fixed.
  • Better Onboarding: README and docs now include Colab badges, quick start tutorial, and an interactive notebook showcase table.
  • Poppler Docs Update: Updated Poppler installation URLs to the official source.
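
To make the proximity and column-alignment heuristic from the list above more concrete, here is a rough sketch. It is not Doctra's implementation: `TableRegion`, `is_split_pair`, `merge_rows`, the thresholds, and the normalized-coordinate convention are all assumptions made for illustration. Dropping a repeated header row during the merge is likewise an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableRegion:
    """Hypothetical detected table; coordinates are fractions of page size."""
    page: int
    top: float                 # y of the top edge (0.0 = top of page)
    bottom: float              # y of the bottom edge (1.0 = bottom of page)
    column_edges: List[float]  # x positions of column separators


def is_split_pair(first: TableRegion, second: TableRegion,
                  edge_margin: float = 0.15, align_tol: float = 0.02) -> bool:
    """Return True if `second` looks like the continuation of `first`."""
    # The candidate must start on the page right after the first part.
    if second.page != first.page + 1:
        return False
    # Layout proximity: first part ends near the page bottom,
    # second part starts near the page top.
    if first.bottom < 1.0 - edge_margin or second.top > edge_margin:
        return False
    # Column alignment: same number of separators at nearly the same x.
    if len(first.column_edges) != len(second.column_edges):
        return False
    return all(abs(a - b) <= align_tol
               for a, b in zip(first.column_edges, second.column_edges))


def merge_rows(first_rows: List[List[str]],
               second_rows: List[List[str]]) -> List[List[str]]:
    """Concatenate rows, skipping a header repeated on the second page."""
    if first_rows and second_rows and first_rows[0] == second_rows[0]:
        second_rows = second_rows[1:]
    return first_rows + second_rows


upper = TableRegion(page=1, top=0.60, bottom=0.95, column_edges=[0.1, 0.4, 0.7, 0.9])
lower = TableRegion(page=2, top=0.05, bottom=0.30, column_edges=[0.1, 0.41, 0.7, 0.9])
if is_split_pair(upper, lower):
    print(merge_rows([["Item", "Qty"], ["A", "1"]], [["Item", "Qty"], ["B", "2"]]))
```

The two tolerances in this sketch (`edge_margin`, `align_tol`) are the kind of knobs a configurable heuristic would expose; the values here are arbitrary.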

Motivation

Many PDFs (invoices, financial statements, hotel reports, academic PDFs) break a long table across pages. Previously, Doctra extracted each part as a separate table. This release focuses on making multi-page tables “just work” out of the box and on documenting the feature clearly so users can extend or tune it.


🛠 What’s Changed


📦 Version

v0.6.2 → v0.7.0 (minor release; new feature: automatic split-table merging, plus large docs/navigation improvements; no breaking changes for existing parsers).

v0.6.2

25 Oct 15:24
9c5d74f

🚀 What's new in v0.6.2

  • Enhanced Output Suppression: Comprehensive silence context manager for cleaner PaddleOCR operations
  • Google Colab Compatibility: Resolved multiple dependency conflicts and installation issues
  • Improved User Experience: Cleaner console output during OCR model loading and processing
  • Dependency Management: Standardized google-genai usage and removed conflicting websockets dependencies
  • Warning Suppression: Better handling of Hugging Face token warnings in Google Colab environments
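
As a rough illustration of what a silence context manager can look like, here is a minimal sketch built only on the standard library. It is not Doctra's code: the name `silence` and its `logger_names` parameter are assumptions. It only captures Python-level `stdout`/`stderr` and loggers; output written directly to file descriptors by native libraries would need extra handling that is not shown here.

```python
import contextlib
import io
import logging


@contextlib.contextmanager
def silence(logger_names=()):
    """Temporarily discard stdout/stderr and mute the named loggers."""
    loggers = [logging.getLogger(name) for name in logger_names]
    previous_levels = [lg.level for lg in loggers]
    for lg in loggers:
        # A level above CRITICAL filters out every standard log record.
        lg.setLevel(logging.CRITICAL + 1)
    sink = io.StringIO()
    try:
        with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
            yield
    finally:
        # Restore logger levels even if the wrapped code raised.
        for lg, level in zip(loggers, previous_levels):
            lg.setLevel(level)


with silence(logger_names=["some_noisy_library"]):
    print("this line is swallowed")
print("this line is shown")
```

Wrapping model loading or inference calls in such a block is one way to keep console output clean while leaving the rest of the program's output untouched.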

Motivation

Improve user experience by providing cleaner output during OCR operations and ensure seamless installation and usage in Google Colab environments. This patch focuses on reliability and user-friendliness.

What's Changed
  • feat: Enhance silence context manager for comprehensive PaddleOCR output suppression by @AdemBoukhris457 in #61
  • feat: Add silence context manager for PaddleOCR model loading by @AdemBoukhris457 in #60
  • fix: Improve Hugging Face warning suppression for Google Colab by @AdemBoukhris457 in #59
  • fix: Suppress Hugging Face token warnings in Google Colab by @AdemBoukhris457 in #58
  • fix: Resolve gradio-websockets dependency conflict in Google Colab by @AdemBoukhris457 in #57
  • fix: Remove google-genai version constraints to resolve websockets conflicts by @AdemBoukhris457 in #56
  • fix: Replace google-generativeai with google-genai and standardize versions by @AdemBoukhris457 in #55
  • fix: Remove unnecessary websockets dependency to resolve Google Colab installation conflicts by @AdemBoukhris457 in #54

📦 Version

v0.6.1 → v0.6.2 (patch release; enhanced output suppression and Google Colab compatibility fixes, no breaking changes)

v0.6.1

25 Oct 10:14
81330f7

🚀 What's new in v0.6.1

  • Dependency fixes: Added missing runtime dependencies to prevent ModuleNotFoundError on fresh installs
  • Packaging alignment: Synced pyproject/extras with docs to avoid environment drift
  • Install reliability: Smoother pip install doctra across clean environments
  • Docs tweak: Clarified install commands and extras usage

Motivation

Ensure that the v0.6.0 feature set is accessible without installation hiccups. This patch focuses on reliability so users can get running quickly in new environments.

What's Changed
  • fix: Add missing dependencies to resolve installation issues by @AdemBoukhris457 in #52

📦 Version

v0.6.0 → v0.6.1 (patch release; installation and packaging fixes, no breaking changes)

v0.6.0

18 Oct 19:56
eea1548

🚀 What's new in v0.6.0

  • DOCX Parser: Add Microsoft Word document parsing with VLM integration for enhanced layout analysis
  • Hugging Face Spaces: Add web-based deployment with Gradio interface for easy document processing
  • Documentation Updates: Updated banner image to Doctra_Banner_MultiDoc for better visual representation
  • Navigation Fixes: Resolved MkDocs navigation issues and broken internal links
  • Enhanced UX: Improved documentation structure and user experience

✅ Motivation

Expand document processing capabilities to support Microsoft Word documents and provide accessible web-based deployment options through Hugging Face Spaces. This release significantly enhances Doctra's parsing capabilities while making the tool more accessible to users through multiple deployment methods.

What's Changed

  • feat: Add DOCX parser with VLM integration by @AdemBoukhris457 in #48
  • feat: Add Hugging Face Spaces deployment with Gradio interface by @AdemBoukhris457 in #47
  • docs: Update banner image to Doctra_Banner_MultiDoc by @AdemBoukhris457 in #50
  • fix: Fix MkDocs navigation and broken internal links by @AdemBoukhris457 in #49
  • release: Prepare v0.6.0 - DOCX Parser & HF Spaces Deployment by @AdemBoukhris457 in [current PR]

📚 Documentation & Project Improvements

  • docs: Enhanced documentation structure and navigation
  • fix: Resolved broken internal links across documentation
  • ui: Updated banner images for better visual representation
  • deployment: Added comprehensive Hugging Face Spaces deployment guides

📦 Version

v0.5.1 → v0.6.0 (minor version, new features, no breaking changes)

🔧 Enhanced Document Processing

The release now supports multiple document formats with advanced parsing:

  1. PDF - Enhanced layout analysis and table extraction
  2. DOCX - Microsoft Word documents with VLM integration
  3. PowerPoint - Presentation document processing
  4. Images - OCR and layout detection capabilities

🚀 Deployment Options

  • CLI Interface - Command-line tool for developers
  • Python Library - Direct integration in Python projects
  • Hugging Face Spaces - Web-based interface for easy access
  • Gradio Interface - User-friendly web UI for document processing

📚 Documentation Improvements

  • Updated banner images and visual assets
  • Fixed navigation structure and internal links
  • Enhanced deployment guides for HF Spaces
  • Improved user experience across all documentation
  • Comprehensive setup instructions for new deployment options

v0.5.1

11 Oct 22:40
99441f9

🚀 What's new in v0.5.1

  • Qianfan Provider: Add Baidu AI Cloud ERNIE model support with OpenAI-compatible interface
  • OpenRouter Provider: Add access to multiple models via OpenRouter platform
  • Ollama Provider: Add local model support (no API key required)
  • Documentation Overhaul: Complete VLM provider documentation coverage across all guides
  • README Fixes: Correct Doctra logo display and update provider lists
  • Project Templates: Enhanced contribution workflow with comprehensive templates

✅ Motivation

Expand VLM provider options to support more use cases (cloud providers, local models) while ensuring comprehensive documentation coverage and improved project governance. Backward-compatible patch release with enhanced capabilities.

What's Changed

  • feat: Add Qianfan ERNIE model support to VLM provider by @AdemBoukhris457 in #43
  • docs: Complete VLM provider documentation coverage by @AdemBoukhris457 in #44
  • fix: Correct Doctra logo URL in README by @AdemBoukhris457 in #45
  • release: Prepare v0.5.1 - Enhanced VLM Support & Documentation by @AdemBoukhris457 in #46

📚 Documentation & Project Improvements

  • docs: Add comprehensive pull request template by @AdemBoukhris457 in #42
  • docs: Add/update issue templates (bug, feature, question) by @AdemBoukhris457 in #41
  • docs: Add SECURITY.md (coordinated vulnerability disclosure) by @AdemBoukhris457 in #40
  • docs: Add CONTRIBUTING.md (contribution guide) by @AdemBoukhris457 in #39

📦 Version

v0.5.0 → v0.5.1 (patch, no breaking changes)

🔧 Complete VLM Provider Support

The release now supports 6 VLM providers:

  1. OpenAI - GPT-4 Vision, GPT-4o
  2. Gemini - Google's vision models
  3. Anthropic - Claude with vision
  4. OpenRouter - Access multiple models
  5. Qianfan - Baidu AI Cloud ERNIE models
  6. Ollama - Local models (no API key required)

📚 Documentation Improvements

  • Complete VLM provider configuration guides
  • Updated README with all supported providers
  • Enhanced code examples and setup instructions
  • Fixed logo display issues
  • Consistent documentation across all guides
  • Comprehensive contribution and security guidelines
  • Enhanced issue and PR templates for better project governance

v0.5.0

04 Oct 18:34
7a0721c

🚀 What’s new in v0.5.0

• Ollama provider: Add support across Core, UI (Gradio), and CLI for chart/diagram understanding and table → structured output.
• Docs overhaul: Material for MkDocs site, rendering/logo fixes, asset-path CI fix, and README badges linking to the docs.

Motivation

Bring a new provider option (Ollama) to broaden vision/table pipelines while making the documentation easier to discover and maintain. Backward-compatible minor release.

What’s Changed
• feat: Add Ollama provider (core + UI + CLI) by @AdemBoukhris457 in #32
• docs: Add comprehensive documentation with Material for MkDocs by @AdemBoukhris457 in #33
• docs: Fix MkDocs rendering issues and update documentation logo by @AdemBoukhris457 in #34
• ci/docs: Fix MkDocs asset URLs to resolve CI build issues by @AdemBoukhris457 in #35
• docs(readme): add Doc Status/Docs badges linking to GitHub Pages by @AdemBoukhris457 in #36, #37
• docs: re-add README banner with improved resolution by @AdemBoukhris457 in #31

📦 Version
v0.4.3 → v0.5.0 (minor, no breaking changes)

v0.4.3

02 Oct 21:01
877b9d2

🚀 What’s new in v0.4.3

  • CLI restored: Re-enable doctra command by registering console_scripts in pyproject.toml and setup.py.
  • Docs polish: New README banner + clearer structure with acknowledgments.

Motivation

Fix the broken CLI for a smoother developer experience and make the project page clearer at a glance. Safe, non-breaking patch.

What’s Changed

  • Fix: restore CLI entrypoint (register console_scripts in pyproject.toml + setup.py) by @AdemBoukhris457 in #28
  • Docs: Refresh README with new banner, clearer structure, and acknowledgments by @AdemBoukhris457 in #27
  • Docs: Update README banner with a new design by @AdemBoukhris457 in #29