Releases: AdemBoukhris457/Doctra
v0.9.7
🚀 What's new in v0.9.7
- PaddleOCR-VL PDF Parser (with Restoration + Split Tables): New PaddleOCRVL-powered PDF parser that combines layout-aware OCR, visual-language understanding, page restoration, and split table merging in a single high-level pipeline.
- Split Table Merging Everywhere: Split table detection & merging is now available across ChartTablePDFParser and EnhancedPDFParser, so multi-page tables are reconstructed consistently whether you’re extracting text, tables, or charts.
- Restoration-Friendly Flow: The new parser plays nicely with restoration steps (denoising, deblurring, cleanups), improving OCR and structure extraction on noisy reports and scanned PDFs.
- Docs Upgrade: Documentation updated to explain when to use the new PaddleOCR-VL parser, how split table merging works across parsers, and how to configure these features in real-world workflows.
✅ Motivation
Doctra is increasingly used on messy, real-world PDFs where tables are split across pages and visual context matters (charts, complex layouts, degraded scans). This release focuses on:
- Making split-table merging a first-class feature across multiple parsers.
- Introducing a PaddleOCR-VL–based parser that can better understand visual + textual context.
- Tightening the integration with restoration so that users get more reliable structured outputs from imperfect documents.
🛠 What’s Changed
- feat: Add PaddleOCRVL PDF parser with restoration and split table merging by @AdemBoukhris457 in #82
- feat: Add split table merging to ChartTablePDFParser by @AdemBoukhris457 in #81
- feat: Add split table merging support to EnhancedPDFParser by @AdemBoukhris457 in #80
- docs: Document new PaddleOCR-VL parser & split-table merging behavior (usage, configuration, and examples)
📦 Version
v0.8.0 → v0.9.7
Minor feature-focused release that extends split-table merging across parsers and introduces a PaddleOCR-VL–powered PDF parser with restoration support. No breaking changes to the public API: existing workflows keep working and gain access to smarter parsing options.
v0.8.0
🚀 What's new in v0.8.0
- Dependency-Based OCR & VLM Configuration: Doctra’s OCR and VLM engines now use a clean dependency pattern, making it easier to plug in, swap, or extend engines (PaddleOCR, Tesseract, VLMs, etc.) in a consistent way.
- Cleaner Engine Setup: Centralized configuration logic reduces duplication, improves readability, and makes it simpler to maintain multi-backend pipelines.
- Codebase Cleanup: Removed noisy / redundant comments and streamlined internals for a more professional, focused contributor experience.
- Docs Alignment: Documentation updated to reflect the new dependency-based configuration flow so users and contributors can follow the architecture easily.
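As a rough illustration of what a dependency-based engine configuration can look like, here is a minimal sketch that assumes the pattern means passing a pre-configured engine object into the parser. Every name in it (OCREngine, FakeTesseract, FakePaddle, Parser) is hypothetical and is not Doctra's actual API.

```python
from dataclasses import dataclass
from typing import Protocol


class OCREngine(Protocol):
    """Hypothetical interface: anything with recognize() can act as an OCR engine."""

    def recognize(self, image_bytes: bytes) -> str: ...


@dataclass
class FakeTesseract:
    """Stand-in engine; a real one would call an OCR library here."""

    lang: str = "eng"

    def recognize(self, image_bytes: bytes) -> str:
        return f"[tesseract:{self.lang}] {len(image_bytes)} bytes"


@dataclass
class FakePaddle:
    """Second stand-in engine with its own configuration field."""

    model: str = "server"

    def recognize(self, image_bytes: bytes) -> str:
        return f"[paddle:{self.model}] {len(image_bytes)} bytes"


class Parser:
    """Receives an already-configured engine instead of building one itself."""

    def __init__(self, ocr_engine: OCREngine) -> None:
        self.ocr_engine = ocr_engine

    def parse(self, image_bytes: bytes) -> str:
        return self.ocr_engine.recognize(image_bytes)


if __name__ == "__main__":
    # Swapping engines only changes the object passed in, not the parser.
    print(Parser(FakeTesseract(lang="eng")).parse(b"abc"))
    print(Parser(FakePaddle(model="server")).parse(b"abcd"))
```

In this sketch the engine's settings live on the engine object, so the parser needs no per-backend branches.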
✅ Motivation
As Doctra adds more OCR engines and VLM backends, a scalable configuration pattern becomes critical. This release focuses on making the engine wiring predictable, extensible, and maintainable, while keeping the public behavior stable and the onboarding experience clearer.
🛠 What’s Changed
- refactor: Apply dependency pattern to VLM configuration by @AdemBoukhris457 in #78
- refactor: Apply dependency pattern to OCR engine configuration by @AdemBoukhris457 in #77
- refactor: Remove unnecessary comments and tidy up codebase by @AdemBoukhris457 in #76
📦 Version
v0.7.1 → v0.8.0
Minor release focused on architecture & maintainability (dependency-based configuration, cleaner code, updated docs). No breaking changes to the public API.
v0.7.1
🚀 What's new in v0.7.1
- PaddleOCR PP-OCRv5 Server Support: Doctra now supports the high-performance PP-OCRv5_server engine for faster and more accurate OCR in production-style workflows.
- Seamless Engine Integration: PP-OCRv5_server plugs into the existing OCR selection flow, so users can easily switch between lightweight and server-grade models depending on their use case.
- Docs Updated: README and docs now clearly show how to enable and configure PP-OCRv5_server within Doctra.
✅ Motivation
Many users run Doctra in server or batch environments and need a stronger OCR backend without changing their pipeline. This release introduces first-class support for PP-OCRv5_server, making Doctra more flexible for heavy workloads while keeping configuration simple.
🛠 What’s Changed
- feat: Add PaddleOCR PP-OCRv5_server engine support by @AdemBoukhris457 in #73
- docs: Update README with PaddleOCR PP-OCRv5_server usage and configuration details by @AdemBoukhris457 in #74
📦 Version
v0.7.0 → v0.7.1 (patch release; new OCR engine option + documentation update, no breaking changes).
v0.7.0
🚀 What's new in v0.7.0
- Automatic Split Table Detection & Merging: Doctra can now detect when a table is split across two pages (bottom of page → top of next page) and automatically merge them into a single structured table.
- Configurable Heuristics: Merging is based on layout proximity + column alignment (via line/structure detection), making it robust for multi-page PDF reports.
- New Documentation Section: Added a full “Split Table Merging” guide with flow, conditions, and examples.
- Visual Diagrams: Mermaid diagrams added to explain the detection → matching → merge pipeline.
- Docs Navigation Fixes: Split-table docs are now properly added to MkDocs nav and broken internal links are fixed.
- Better Onboarding: README and docs now include Colab badges, quick start tutorial, and an interactive notebook showcase table.
- Poppler Docs Update: Updated Poppler installation URLs to the official source.
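The layout proximity + column alignment idea can be sketched as follows. This is an illustrative heuristic only: the TableRegion fields, the thresholds, and the function names are assumptions made for the example, not Doctra's implementation or its default values.

```python
from dataclasses import dataclass


@dataclass
class TableRegion:
    page: int                  # 1-based page number
    top: float                 # top edge as a fraction of page height (0.0 = top)
    bottom: float              # bottom edge as a fraction of page height (1.0 = bottom)
    column_edges: list[float]  # x positions of column separators, as page-width fractions


def columns_align(a: list[float], b: list[float], tol: float = 0.02) -> bool:
    """Same number of columns, each separator within `tol` of its counterpart."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def is_split_pair(upper: TableRegion, lower: TableRegion,
                  edge_margin: float = 0.15, tol: float = 0.02) -> bool:
    """Guess whether `lower` continues `upper` on the following page."""
    consecutive = lower.page == upper.page + 1
    near_bottom = upper.bottom >= 1.0 - edge_margin  # upper table ends near the page bottom
    near_top = lower.top <= edge_margin              # lower table starts near the page top
    return (consecutive and near_bottom and near_top
            and columns_align(upper.column_edges, lower.column_edges, tol))


if __name__ == "__main__":
    first = TableRegion(page=3, top=0.40, bottom=0.95, column_edges=[0.1, 0.4, 0.7, 0.9])
    second = TableRegion(page=4, top=0.05, bottom=0.30, column_edges=[0.1, 0.41, 0.7, 0.9])
    print(is_split_pair(first, second))
```

Pairs that pass a check like this would then be merged into one table. Both tolerances are the knobs you would tune for a given document family.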
✅ Motivation
Many PDFs (invoices, financial statements, hotel reports, academic PDFs) break a long table across pages. Earlier, Doctra extracted them as separate tables. This release focuses on making multi-page tables “just work” out of the box and on documenting the feature clearly so users can extend or tune it.
🛠 What’s Changed
- feat: Add automatic split table detection and merging enhancement by @AdemBoukhris457 in #67
- docs: Add split table merging documentation by @AdemBoukhris457 in #68
- fix(mkdocs): Add split table merging docs to navigation and fix broken links by @AdemBoukhris457 in #69
- fix(docs): Fix broken link paths in split table merging documentation by @AdemBoukhris457 in #70
- docs: Add Mermaid diagrams to split table merging documentation by @AdemBoukhris457 in #71
- release: prepare v0.7.0 – Split Table Merging Feature, documentation enhancement by @AdemBoukhris457 in #72
- docs: Add Colab badges to README and docs headers by @AdemBoukhris457 in #65
- docs: Add interactive notebook showcase table to README and documentation by @AdemBoukhris457 in #64
- docs: Add comprehensive Doctra quick start tutorial notebook by @AdemBoukhris457 in #63
- fix(docs): Update Poppler URLs to official website by @AdemBoukhris457 in #66
📦 Version
v0.6.2 → v0.7.0 (minor release; new feature: automatic split-table merging, plus large docs/navigation improvements; no breaking changes for existing parsers).
v0.6.2
🚀 What's new in v0.6.2
• Enhanced Output Suppression: Comprehensive silence context manager for cleaner PaddleOCR operations
• Google Colab Compatibility: Resolved multiple dependency conflicts and installation issues
• Improved User Experience: Cleaner console output during OCR model loading and processing
• Dependency Management: Standardized google-genai usage and removed conflicting websockets dependencies
• Warning Suppression: Better handling of Hugging Face token warnings in Google Colab environments
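For readers curious what a silence context manager might look like, here is a minimal standard-library sketch. It is an assumption about the general technique, not Doctra's actual code, and it only suppresses Python-level output. Text written directly to file descriptors by native libraries would need extra handling.

```python
import contextlib
import io
import logging
import warnings


@contextlib.contextmanager
def silence():
    """Hypothetical helper: hide Python-level stdout, stderr, warnings, and log records."""
    sink = io.StringIO()
    previous_disable = logging.root.manager.disable  # remember the current logging cutoff
    logging.disable(logging.CRITICAL)                # drop all log records while active
    try:
        with contextlib.redirect_stdout(sink), \
                contextlib.redirect_stderr(sink), \
                warnings.catch_warnings():
            warnings.simplefilter("ignore")
            yield
    finally:
        logging.disable(previous_disable)            # restore the earlier logging cutoff


if __name__ == "__main__":
    print("visible")
    with silence():
        print("hidden while a noisy model loads")
        warnings.warn("also hidden")
    print("visible again")
```

Wrapping model loading in a block like this keeps the console clean while leaving output outside the block untouched.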
✅ Motivation
Improve user experience by providing cleaner output during OCR operations and ensure seamless installation and usage in Google Colab environments. This patch focuses on reliability and user-friendliness.
What's Changed
• feat: Enhance silence context manager for comprehensive PaddleOCR output suppression by @AdemBoukhris457 in #61
• feat: Add silence context manager for PaddleOCR model loading by @AdemBoukhris457 in #60
• fix: Improve Hugging Face warning suppression for Google Colab by @AdemBoukhris457 in #59
• fix: Suppress Hugging Face token warnings in Google Colab by @AdemBoukhris457 in #58
• fix: Resolve gradio-websockets dependency conflict in Google Colab by @AdemBoukhris457 in #57
• fix: Remove google-genai version constraints to resolve websockets conflicts by @AdemBoukhris457 in #56
• fix: Replace google-generativeai with google-genai and standardize versions by @AdemBoukhris457 in #55
• fix: Remove unnecessary websockets dependency to resolve Google Colab installation conflicts by @AdemBoukhris457 in #54
📦 Version
v0.6.1 → v0.6.2 (patch release; enhanced output suppression and Google Colab compatibility fixes, no breaking changes)
v0.6.1
🚀 What's new in v0.6.1
• Dependency fixes: Added missing runtime dependencies to prevent ModuleNotFoundError on fresh installs
• Packaging alignment: Synced pyproject/extras with docs to avoid environment drift
• Install reliability: Smoother pip install doctra across clean environments
• Docs tweak: Clarified install commands and extras usage
✅ Motivation
Ensure that the v0.6.0 feature set is accessible without installation hiccups. This patch focuses on reliability so users can get running quickly in new environments.
What's Changed
• fix: Add missing dependencies to resolve installation issues by @AdemBoukhris457 in #52
📦 Version
v0.6.0 → v0.6.1 (patch release; installation and packaging fixes, no breaking changes)
v0.6.0
🚀 What's new in v0.6.0
• DOCX Parser: Add Microsoft Word document parsing with VLM integration for enhanced layout analysis
• Hugging Face Spaces: Add web-based deployment with Gradio interface for easy document processing
• Documentation Updates: Updated banner image to Doctra_Banner_MultiDoc for better visual representation
• Navigation Fixes: Resolved MkDocs navigation issues and broken internal links
• Enhanced UX: Improved documentation structure and user experience
✅ Motivation
Expand document processing capabilities to support Microsoft Word documents and provide accessible web-based deployment options through Hugging Face Spaces. This release significantly enhances Doctra's parsing capabilities while making the tool more accessible to users through multiple deployment methods.
What's Changed
• feat: Add DOCX parser with VLM integration by @AdemBoukhris457 in #48
• feat: Add Hugging Face Spaces deployment with Gradio interface by @AdemBoukhris457 in #47
• docs: Update banner image to Doctra_Banner_MultiDoc by @AdemBoukhris457 in #50
• fix: Fix MkDocs navigation and broken internal links by @AdemBoukhris457 in #49
• release: Prepare v0.6.0 - DOCX Parser & HF Spaces Deployment by @AdemBoukhris457 in [current PR]
📚 Documentation & Project Improvements
• docs: Enhanced documentation structure and navigation
• fix: Resolved broken internal links across documentation
• ui: Updated banner images for better visual representation
• deployment: Added comprehensive Hugging Face Spaces deployment guides
📦 Version
v0.5.1 → v0.6.0 (minor version, new features, no breaking changes)
🔧 Enhanced Document Processing
The release now supports multiple document formats with advanced parsing:
- PDF - Enhanced layout analysis and table extraction
- DOCX - Microsoft Word documents with VLM integration
- PowerPoint - Presentation document processing
- Images - OCR and layout detection capabilities
🚀 Deployment Options
- CLI Interface - Command-line tool for developers
- Python Library - Direct integration in Python projects
- Hugging Face Spaces - Web-based interface for easy access
- Gradio Interface - User-friendly web UI for document processing
📚 Documentation Improvements
- Updated banner images and visual assets
- Fixed navigation structure and internal links
- Enhanced deployment guides for HF Spaces
- Improved user experience across all documentation
- Comprehensive setup instructions for new deployment options
v0.5.1
🚀 What's new in v0.5.1
• Qianfan Provider: Add Baidu AI Cloud ERNIE model support with OpenAI-compatible interface
• OpenRouter Provider: Add access to multiple models via OpenRouter platform
• Ollama Provider: Add local model support (no API key required)
• Documentation Overhaul: Complete VLM provider documentation coverage across all guides
• README Fixes: Correct Doctra logo display and update provider lists
• Project Templates: Enhanced contribution workflow with comprehensive templates
✅ Motivation
Expand VLM provider options to support more use cases (cloud providers, local models) while ensuring comprehensive documentation coverage and improved project governance. Backward-compatible patch release with enhanced capabilities.
What's Changed
• feat: Add Qianfan ERNIE model support to VLM provider by @AdemBoukhris457 in #43
• docs: Complete VLM provider documentation coverage by @AdemBoukhris457 in #44
• fix: Correct Doctra logo URL in README by @AdemBoukhris457 in #45
• release: Prepare v0.5.1 - Enhanced VLM Support & Documentation by @AdemBoukhris457 in #46
📚 Documentation & Project Improvements
• docs: Add comprehensive pull request template by @AdemBoukhris457 in #42
• docs: Add/update issue templates (bug, feature, question) by @AdemBoukhris457 in #41
• docs: Add SECURITY.md (coordinated vulnerability disclosure) by @AdemBoukhris457 in #40
• docs: Add CONTRIBUTING.md (contribution guide) by @AdemBoukhris457 in #39
📦 Version
v0.5.0 → v0.5.1 (patch, no breaking changes)
🔧 Complete VLM Provider Support
The release now supports 6 VLM providers:
- OpenAI - GPT-4 Vision, GPT-4o
- Gemini - Google's vision models
- Anthropic - Claude with vision
- OpenRouter - Access multiple models
- Qianfan - Baidu AI Cloud ERNIE models
- Ollama - Local models (no API key required)
📚 Documentation Improvements
- Complete VLM provider configuration guides
- Updated README with all supported providers
- Enhanced code examples and setup instructions
- Fixed logo display issues
- Consistent documentation across all guides
- Comprehensive contribution and security guidelines
- Enhanced issue and PR templates for better project governance
v0.5.0
🚀 What’s new in v0.5.0
• Ollama provider: Add support across Core, UI (Gradio), and CLI for chart/diagram understanding and table → structured output.
• Docs overhaul: Material for MkDocs site, rendering/logo fixes, asset-path CI fix, and README badges linking to the docs.
✅ Motivation
Bring a new provider option (Ollama) to broaden vision/table pipelines while making the documentation easier to discover and maintain. Backward-compatible minor release.
What’s Changed
• feat: Add Ollama provider (core + UI + CLI) by @AdemBoukhris457 in #32
• docs: Add comprehensive documentation with Material for MkDocs by @AdemBoukhris457 in #33
• docs: Fix MkDocs rendering issues and update documentation logo by @AdemBoukhris457 in #34
• ci/docs: Fix MkDocs asset URLs to resolve CI build issues by @AdemBoukhris457 in #35
• docs(readme): add Doc Status/Docs badges linking to GitHub Pages by @AdemBoukhris457 in #36, #37
• docs: re-add README banner with improved resolution by @AdemBoukhris457 in #31
📦 Version
v0.4.3 → v0.5.0 (minor, no breaking changes)
v0.4.3
🚀 What’s new in v0.4.3
- CLI restored: Re-enable doctra command by registering console_scripts in pyproject.toml and setup.py.
- Docs polish: New README banner + clearer structure with acknowledgments.
✅ Motivation
Fix the broken CLI for a smoother developer experience and make the project page clearer at a glance. Safe, non-breaking patch.
What’s Changed
- Fix: restore CLI entrypoint (register console_scripts in pyproject.toml + setup.py) by @AdemBoukhris457 in #28
- Docs: Refresh README with new banner, clearer structure, and acknowledgments by @AdemBoukhris457 in #27
- Docs: Update README banner with a new design by @AdemBoukhris457 in #29