Releases · AdemBoukhris457/Doctra

16 Nov 13:18

AdemBoukhris457

v0.9.7

0d2ebf7

v0.9.7 Latest

🚀 What's new in v0.9.7

PaddleOCR-VL PDF Parser (with Restoration + Split Tables): New PaddleOCRVL-powered PDF parser that combines layout-aware OCR, visual-language understanding, page restoration, and split table merging in a single high-level pipeline.
Split Table Merging Everywhere: Split table detection & merging is now available across ChartTablePDFParser and EnhancedPDFParser, so multi-page tables are reconstructed consistently whether you’re extracting text, tables, or charts.
Restoration-Friendly Flow: The new parser works well with restoration steps (denoising, deblurring, cleanup), improving OCR and structure extraction on noisy reports and scanned PDFs.
Docs Upgrade: Documentation updated to explain when to use the new PaddleOCR-VL parser, how split table merging works across parsers, and how to configure these features in real-world workflows.

✅ Motivation

Doctra is increasingly used on messy, real-world PDFs where tables are split across pages and visual context matters (charts, complex layouts, degraded scans). This release focuses on:

Making split-table merging a first-class feature across multiple parsers.
Introducing a PaddleOCR-VL–based parser that can better understand visual + textual context.
Tightening the integration with restoration so that users get more reliable structured outputs from imperfect documents.

🛠 What’s Changed

feat: Add PaddleOCRVL PDF parser with restoration and split table merging by @AdemBoukhris457 in #82
feat: Add split table merging to ChartTablePDFParser by @AdemBoukhris457 in #81
feat: Add split table merging support to EnhancedPDFParser by @AdemBoukhris457 in #80
docs: Document new PaddleOCR-VL parser & split-table merging behavior (usage, configuration, and examples)

📦 Version

v0.8.0 → v0.9.7
Minor feature-focused release that extends split-table merging across parsers and introduces a PaddleOCR-VL–powered PDF parser with restoration support. No breaking changes to the public API: existing workflows keep working and gain access to smarter parsing options.

Contributors

AdemBoukhris457

10 Nov 20:04

AdemBoukhris457

v0.8.0

95b5979

v0.8.0

🚀 What's new in v0.8.0

Dependency-Based OCR & VLM Configuration: Doctra’s OCR and VLM engines now use a clean dependency pattern, making it easier to plug in, swap, or extend engines (PaddleOCR, Tesseract, VLMs, etc.) in a consistent way.
Cleaner Engine Setup: Centralized configuration logic reduces duplication, improves readability, and makes it simpler to maintain multi-backend pipelines.
Codebase Cleanup: Removed noisy / redundant comments and streamlined internals for a more professional, focused contributor experience.
Docs Alignment: Documentation updated to reflect the new dependency-based configuration flow so users and contributors can follow the architecture easily.
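
The dependency-based configuration described above can be pictured with a minimal sketch. This is not Doctra's actual API: `OCREngine`, `EchoOCR`, `UppercaseOCR`, and `Parser` are hypothetical names invented here purely to illustrate the idea that a parser receives an already-configured engine instead of building one from scattered options.

```python
from dataclasses import dataclass
from typing import List, Protocol


class OCREngine(Protocol):
    """Hypothetical interface: any object with a recognize() method qualifies."""

    def recognize(self, image: bytes) -> str: ...


@dataclass
class EchoOCR:
    """Stand-in engine that decodes bytes as UTF-8 (illustration only)."""

    def recognize(self, image: bytes) -> str:
        return image.decode("utf-8")


@dataclass
class UppercaseOCR:
    """Second stand-in engine, showing that engines are interchangeable."""

    def recognize(self, image: bytes) -> str:
        return image.decode("utf-8").upper()


@dataclass
class Parser:
    """Hypothetical parser that depends on an injected engine."""

    ocr_engine: OCREngine

    def parse(self, pages: List[bytes]) -> List[str]:
        # The parser never inspects which backend it was given.
        return [self.ocr_engine.recognize(page) for page in pages]


pages = [b"total: 42", b"page two"]
print(Parser(ocr_engine=EchoOCR()).parse(pages))
print(Parser(ocr_engine=UppercaseOCR()).parse(pages))
```

Swapping the backend changes only the object passed to the constructor, which is the property the release notes attribute to the new configuration flow.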

✅ Motivation

As Doctra adds more OCR engines and VLM backends, a scalable configuration pattern becomes critical. This release focuses on making the engine wiring predictable, extensible, and maintainable, while keeping the public behavior stable and the onboarding experience clearer.

🛠 What’s Changed

refactor: Apply dependency pattern to VLM configuration by @AdemBoukhris457 in #78
refactor: Apply dependency pattern to OCR engine configuration by @AdemBoukhris457 in #77
refactor: Remove unnecessary comments and tidy up codebase by @AdemBoukhris457 in #76

📦 Version

v0.7.1 → v0.8.0
Minor release focused on architecture & maintainability (dependency-based configuration, cleaner code, updated docs). No breaking changes to the public API.

Contributors

AdemBoukhris457

10 Nov 19:58

AdemBoukhris457

v0.7.1

95b5979

v0.7.1

🚀 What's new in v0.7.1

PaddleOCR PP-OCRv5 Server Support: Doctra now supports the high-performance PP-OCRv5_server engine for faster and more accurate OCR in production-style workflows.
Seamless Engine Integration: PP-OCRv5_server plugs into the existing OCR selection flow, so users can easily switch between lightweight and server-grade models depending on their use case.
Docs Updated: README and docs now clearly show how to enable and configure PP-OCRv5_server within Doctra.

✅ Motivation

Many users run Doctra in server or batch environments and need a stronger OCR backend without changing their pipeline. This release introduces first-class support for PP-OCRv5_server, making Doctra more flexible for heavy workloads while keeping configuration simple.

🛠 What’s Changed

feat: Add PaddleOCR PP-OCRv5_server engine support by @AdemBoukhris457 in #73
docs: Update README with PaddleOCR PP-OCRv5_server usage and configuration details by @AdemBoukhris457 in #74

📦 Version

v0.7.0 → v0.7.1 (patch release; new OCR engine option + documentation update, no breaking changes).

Contributors

AdemBoukhris457

02 Nov 09:59

AdemBoukhris457

v0.7.0

7a7f8f9

v0.7.0

🚀 What's new in v0.7.0

Automatic Split Table Detection & Merging: Doctra can now detect when a table is split across two pages (bottom of page → top of next page) and automatically merge the two parts into a single structured table.
Configurable Heuristics: Merging is based on layout proximity + column alignment (via line/structure detection), making it robust for multi-page PDF reports.
New Documentation Section: Added a full “Split Table Merging” guide with flow, conditions, and examples.
Visual Diagrams: Mermaid diagrams added to explain the detection → matching → merge pipeline.
Docs Navigation Fixes: Split-table docs are now properly added to MkDocs nav and broken internal links are fixed.
Better Onboarding: README and docs now include Colab badges, quick start tutorial, and an interactive notebook showcase table.
Poppler Docs Update: Updated Poppler installation URLs to the official source.
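
To make the "layout proximity + column alignment" idea concrete, here is a rough sketch of how such a check could look. It is an assumption-laden illustration, not Doctra's implementation: the `TableRegion` type, the function names, the normalized coordinates, and the threshold values `edge_margin` and `column_tol` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableRegion:
    """Hypothetical table fragment; coordinates are normalized to 0..1."""

    page: int
    top: float                 # y of the top edge (0 = top of page)
    bottom: float              # y of the bottom edge (1 = bottom of page)
    column_edges: List[float]  # x positions of vertical column boundaries
    rows: List[List[str]]


def looks_split(first: TableRegion, second: TableRegion,
                edge_margin: float = 0.15, column_tol: float = 0.02) -> bool:
    """Return True if `second` plausibly continues `first` on the next page."""
    # Condition 1: the fragments sit on consecutive pages.
    if second.page != first.page + 1:
        return False
    # Condition 2: layout proximity (bottom of one page, top of the next).
    if first.bottom < 1.0 - edge_margin or second.top > edge_margin:
        return False
    # Condition 3: same number of columns, aligned within a small tolerance.
    if len(first.column_edges) != len(second.column_edges):
        return False
    return all(abs(a - b) <= column_tol
               for a, b in zip(first.column_edges, second.column_edges))


def merge(first: TableRegion, second: TableRegion) -> TableRegion:
    """Concatenate the rows, keeping the first fragment's geometry."""
    return TableRegion(first.page, first.top, first.bottom,
                       first.column_edges, first.rows + second.rows)


upper = TableRegion(1, 0.60, 0.95, [0.1, 0.5, 0.9], [["Item", "Qty"], ["A", "1"]])
lower = TableRegion(2, 0.05, 0.30, [0.1, 0.51, 0.9], [["B", "2"]])
if looks_split(upper, lower):
    print(merge(upper, lower).rows)
```

Real documents would likely need extra handling that this sketch ignores, such as repeated header rows on the continuation page.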

✅ Motivation

Many PDFs (invoices, financial statements, hotel reports, academic PDFs) break a long table across pages. Previously, Doctra extracted each part as a separate table. This release focuses on making multi-page tables “just work” out of the box and on documenting the feature clearly so users can extend or tune it.

🛠 What’s Changed

feat: Add automatic split table detection and merging enhancement by @AdemBoukhris457 in #67
docs: Add split table merging documentation by @AdemBoukhris457 in #68
fix(mkdocs): Add split table merging docs to navigation and fix broken links by @AdemBoukhris457 in #69
fix(docs): Fix broken link paths in split table merging documentation by @AdemBoukhris457 in #70
docs: Add Mermaid diagrams to split table merging documentation by @AdemBoukhris457 in #71
release: prepare v0.7.0 – Split Table Merging Feature, documentation enhancement by @AdemBoukhris457 in #72
docs: Add Colab badges to README and docs headers by @AdemBoukhris457 in #65
docs: Add interactive notebook showcase table to README and documentation by @AdemBoukhris457 in #64
docs: Add comprehensive Doctra quick start tutorial notebook by @AdemBoukhris457 in #63
fix(docs): Update Poppler URLs to official website by @AdemBoukhris457 in #66

📦 Version

v0.6.2 → v0.7.0 (minor release; new feature: automatic split-table merging, plus large docs/navigation improvements; no breaking changes for existing parsers).

Contributors

AdemBoukhris457

25 Oct 15:24

AdemBoukhris457

v0.6.2

9c5d74f

v0.6.2

🚀 What's new in v0.6.2

• Enhanced Output Suppression: Comprehensive silence context manager for cleaner PaddleOCR operations
• Google Colab Compatibility: Resolved multiple dependency conflicts and installation issues
• Improved User Experience: Cleaner console output during OCR model loading and processing
• Dependency Management: Standardized google-genai usage and removed conflicting websockets dependencies
• Warning Suppression: Better handling of Hugging Face token warnings in Google Colab environments
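
As an illustration of what a silence context manager can look like, the sketch below uses only the Python standard library. The name `silence` and its behavior are assumptions made for this example; Doctra's own helper may work differently.

```python
import contextlib
import io
import logging
import warnings


@contextlib.contextmanager
def silence():
    """Hypothetical helper: hide stdout, stderr, warnings, and log records."""
    sink = io.StringIO()
    previous_disable = logging.root.manager.disable
    logging.disable(logging.CRITICAL)  # mute every logging call
    try:
        with contextlib.redirect_stdout(sink), \
                contextlib.redirect_stderr(sink), \
                warnings.catch_warnings():
            warnings.simplefilter("ignore")
            yield sink  # callers may inspect what was swallowed
    finally:
        # Restore the previous logging threshold even if the body raised.
        logging.disable(previous_disable)


with silence() as captured:
    print("noisy model-loading banner")
print("visible again")
```

Note that `redirect_stdout` and `redirect_stderr` only intercept writes made through Python's `sys.stdout` and `sys.stderr`; output written directly by native code would need file-descriptor-level redirection, which this sketch does not attempt.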

✅ Motivation

Improve user experience by providing cleaner output during OCR operations and ensure seamless installation and usage in Google Colab environments. This patch focuses on reliability and user-friendliness.

What's Changed
• feat: Enhance silence context manager for comprehensive PaddleOCR output suppression by @AdemBoukhris457 in #61
• feat: Add silence context manager for PaddleOCR model loading by @AdemBoukhris457 in #60
• fix: Improve Hugging Face warning suppression for Google Colab by @AdemBoukhris457 in #59
• fix: Suppress Hugging Face token warnings in Google Colab by @AdemBoukhris457 in #58
• fix: Resolve gradio-websockets dependency conflict in Google Colab by @AdemBoukhris457 in #57
• fix: Remove google-genai version constraints to resolve websockets conflicts by @AdemBoukhris457 in #56
• fix: Replace google-generativeai with google-genai and standardize versions by @AdemBoukhris457 in #55
• fix: Remove unnecessary websockets dependency to resolve Google Colab installation conflicts by @AdemBoukhris457 in #54

📦 Version

v0.6.1 → v0.6.2 (patch release; enhanced output suppression and Google Colab compatibility fixes, no breaking changes)

Contributors

AdemBoukhris457

25 Oct 10:14

AdemBoukhris457

v0.6.1

81330f7

v0.6.1

🚀 What's new in v0.6.1

• Dependency fixes: Added missing runtime dependencies to prevent ModuleNotFoundError on fresh installs
• Packaging alignment: Synced pyproject/extras with docs to avoid environment drift
• Install reliability: Smoother pip install doctra across clean environments
• Docs tweak: Clarified install commands and extras usage

✅ Motivation

Ensure that the v0.6.0 feature set is accessible without installation hiccups. This patch focuses on reliability so users can get running quickly in new environments.

What's Changed
• fix: Add missing dependencies to resolve installation issues by @AdemBoukhris457 in #52

📦 Version

v0.6.0 → v0.6.1 (patch release; installation and packaging fixes, no breaking changes)

Contributors

AdemBoukhris457

18 Oct 19:56

AdemBoukhris457

v0.6.0

eea1548

v0.6.0

🚀 What's new in v0.6.0

• DOCX Parser: Add Microsoft Word document parsing with VLM integration for enhanced layout analysis
• Hugging Face Spaces: Add web-based deployment with Gradio interface for easy document processing
• Documentation Updates: Updated banner image to Doctra_Banner_MultiDoc for better visual representation
• Navigation Fixes: Resolved MkDocs navigation issues and broken internal links
• Enhanced UX: Improved documentation structure and user experience

✅ Motivation

Expand document processing capabilities to support Microsoft Word documents and provide accessible web-based deployment options through Hugging Face Spaces. This release significantly enhances Doctra's parsing capabilities while making the tool more accessible to users through multiple deployment methods.

What's Changed

• feat: Add DOCX parser with VLM integration by @AdemBoukhris457 in #48
• feat: Add Hugging Face Spaces deployment with Gradio interface by @AdemBoukhris457 in #47
• docs: Update banner image to Doctra_Banner_MultiDoc by @AdemBoukhris457 in #50
• fix: Fix MkDocs navigation and broken internal links by @AdemBoukhris457 in #49
• release: Prepare v0.6.0 - DOCX Parser & HF Spaces Deployment by @AdemBoukhris457 in [current PR]

📚 Documentation & Project Improvements

• docs: Enhanced documentation structure and navigation
• fix: Resolved broken internal links across documentation
• ui: Updated banner images for better visual representation
• deployment: Added comprehensive Hugging Face Spaces deployment guides

📦 Version

v0.5.1 → v0.6.0 (minor version, new features, no breaking changes)

🔧 Enhanced Document Processing

The release now supports multiple document formats with advanced parsing:

• PDF - Enhanced layout analysis and table extraction
• DOCX - Microsoft Word documents with VLM integration
• PowerPoint - Presentation document processing
• Images - OCR and layout detection capabilities

🚀 Deployment Options

• CLI Interface - Command-line tool for developers
• Python Library - Direct integration in Python projects
• Hugging Face Spaces - Web-based interface for easy access
• Gradio Interface - User-friendly web UI for document processing

📚 Documentation Improvements

• Updated banner images and visual assets
• Fixed navigation structure and internal links
• Enhanced deployment guides for HF Spaces
• Improved user experience across all documentation
• Comprehensive setup instructions for new deployment options

Contributors

AdemBoukhris457

11 Oct 22:40

AdemBoukhris457

v0.5.1

99441f9

v0.5.1

🚀 What's new in v0.5.1

• Qianfan Provider: Add Baidu AI Cloud ERNIE model support with OpenAI-compatible interface
• OpenRouter Provider: Add access to multiple models via OpenRouter platform
• Ollama Provider: Add local model support (no API key required)
• Documentation Overhaul: Complete VLM provider documentation coverage across all guides
• README Fixes: Correct Doctra logo display and update provider lists
• Project Templates: Enhanced contribution workflow with comprehensive templates

✅ Motivation

Expand VLM provider options to support more use cases (cloud providers, local models) while ensuring comprehensive documentation coverage and improved project governance. This is a backward-compatible patch release.

What's Changed

• feat: Add Qianfan ERNIE model support to VLM provider by @AdemBoukhris457 in #43
• docs: Complete VLM provider documentation coverage by @AdemBoukhris457 in #44
• fix: Correct Doctra logo URL in README by @AdemBoukhris457 in #45
• release: Prepare v0.5.1 - Enhanced VLM Support & Documentation by @AdemBoukhris457 in #46

📚 Documentation & Project Improvements

• docs: Add comprehensive pull request template by @AdemBoukhris457 in #42
• docs: Add/update issue templates (bug, feature, question) by @AdemBoukhris457 in #41
• docs: Add SECURITY.md (coordinated vulnerability disclosure) by @AdemBoukhris457 in #40
• docs: Add CONTRIBUTING.md (contribution guide) by @AdemBoukhris457 in #39

📦 Version

v0.5.0 → v0.5.1 (patch, no breaking changes)

🔧 Complete VLM Provider Support

The release now supports 6 VLM providers:

• OpenAI - GPT-4 Vision, GPT-4o
• Gemini - Google's vision models
• Anthropic - Claude with vision
• OpenRouter - Access multiple models
• Qianfan - Baidu AI Cloud ERNIE models
• Ollama - Local models (no API key required)

📚 Documentation Improvements

• Complete VLM provider configuration guides
• Updated README with all supported providers
• Enhanced code examples and setup instructions
• Fixed logo display issues
• Consistent documentation across all guides
• Comprehensive contribution and security guidelines
• Enhanced issue and PR templates for better project governance

Contributors

AdemBoukhris457

04 Oct 18:34

AdemBoukhris457

v0.5.0

7a0721c

v0.5.0

🚀 What’s new in v0.5.0

• Ollama provider: Add support across Core, UI (Gradio), and CLI for chart/diagram understanding and table → structured output.
• Docs overhaul: Material for MkDocs site, rendering/logo fixes, asset-path CI fix, and README badges linking to the docs.

✅ Motivation

Bring a new provider option (Ollama) to broaden vision/table pipelines while making the documentation easier to discover and maintain. Backward-compatible minor release.

What’s Changed
• feat: Add Ollama provider (core + UI + CLI) by @AdemBoukhris457 in #32
• docs: Add comprehensive documentation with Material for MkDocs by @AdemBoukhris457 in #33
• docs: Fix MkDocs rendering issues and update documentation logo by @AdemBoukhris457 in #34
• ci/docs: Fix MkDocs asset URLs to resolve CI build issues by @AdemBoukhris457 in #35
• docs(readme): add Doc Status/Docs badges linking to GitHub Pages by @AdemBoukhris457 in #36, #37
• docs: re-add README banner with improved resolution by @AdemBoukhris457 in #31

📦 Version
v0.4.3 → v0.5.0 (minor, no breaking changes)

Contributors

AdemBoukhris457

02 Oct 21:01

AdemBoukhris457

v0.4.3

877b9d2

v0.4.3

🚀 What’s new in v0.4.3

CLI restored: Re-enable doctra command by registering console_scripts in pyproject.toml and setup.py.
Docs polish: New README banner + clearer structure with acknowledgments.
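
For readers unfamiliar with console-script entry points, the sketch below shows the general shape of the mapping a `setup.py` passes to `setuptools.setup(entry_points=...)`. The target `doctra.cli.main:cli` is a placeholder assumed for illustration; the release notes do not state the real module path or function name. In `pyproject.toml`, the equivalent registration lives in the `[project.scripts]` table.

```python
# Sketch of the entry-point mapping handed to setuptools.setup(entry_points=...).
# The target "doctra.cli.main:cli" is a placeholder, not the project's confirmed path.
ENTRY_POINTS = {
    "console_scripts": [
        "doctra = doctra.cli.main:cli",
    ],
}


def split_entry_point(spec: str):
    """Split 'command = package.module:function' into its three parts."""
    command, target = (part.strip() for part in spec.split("=", 1))
    module, function = target.split(":", 1)
    return command, module, function


# Shows which command name maps to which importable callable.
print(split_entry_point(ENTRY_POINTS["console_scripts"][0]))
```

When the package is installed, the installer generates an executable named after the left-hand side that imports the module and calls the function on the right-hand side, which is why a missing registration makes the command disappear.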

✅ Motivation

Fix the broken CLI for a smoother developer experience and make the project page clearer at a glance. Safe, non-breaking patch.

What’s Changed

Fix: restore CLI entrypoint (register console_scripts in pyproject.toml + setup.py) by @AdemBoukhris457 in #28
Docs: Refresh README with new banner, clearer structure, and acknowledgments by @AdemBoukhris457 in #27
Docs: Update README banner with a new design by @AdemBoukhris457 in #29

Contributors

AdemBoukhris457
