# OmniReader

A scalable multi-model text extraction solution for unstructured documents.

OmniReader is built for teams who routinely work with unstructured documents (e.g., PDFs, images, scanned forms) and want a scalable workflow for structured text extraction. It provides an end-to-end batch OCR pipeline with optional multi-model comparison to help ML engineers evaluate different OCR solutions before deployment.

Extraction results are compared side by side, highlighting differences in accuracy, formatting, and content recognition across document types, languages, and layout complexity. OmniReader delivers reproducible, automated, and cloud-agnostic analysis, with comprehensive metrics on extraction quality, processing time, and confidence scores for each model, and it supports parallel processing so an arbitrary number of models can be compared in a single batch run.
<div align="center">
  <img src="assets/demo/html_visualization.png" alt="HTML Visualization of OCR Results" width="800"/>
  <p><em>HTML visualization showing metrics and comparison results from the OCR pipeline</em></p>
</div>
## 🌟 Key Features
- **End-to-end workflow management** from evaluation to production deployment
- **Multi-model comparison** to identify the best model for your specific document types
- **Scalable batch processing** that can handle enterprise document volumes
- **Quantitative evaluation metrics** to inform business and technical decisions
- **ZenML integration** providing reproducibility, cloud-agnostic deployment, and monitoring
## 🎭 How It Works
OmniReader provides two primary pipeline workflows that can be run separately:

1. **Batch OCR Pipeline**: Run large batches of documents through a single model to extract structured text and metadata.
2. **Evaluation Pipeline**: Compare multiple OCR models side by side and generate evaluation reports, scoring each model's output with CER/WER against ground-truth text files and rendering HTML visualizations (see the sketch below).
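The error-rate scoring used by the evaluation pipeline can be reproduced with standard tooling. The sketch below uses the `jiwer` package to compute CER and WER for one model's output against a ground-truth transcript; it illustrates the metrics only and is not the project's own evaluation step.

```python
# Minimal sketch: score one model's OCR output against a ground-truth
# transcript using character and word error rates (jiwer package).
import jiwer

ground_truth = "Invoice #1042\nTotal due: $1,250.00"
model_output = "Invoice #1O42\nTotal due: $1,250.00"  # OCR confused 0 with O

cer = jiwer.cer(ground_truth, model_output)  # character error rate
wer = jiwer.wer(ground_truth, model_output)  # word error rate
print(f"CER: {cer:.3f}  WER: {wer:.3f}")
```

CER is usually the more sensitive of the two metrics for OCR, since single-character confusions such as `0` vs `O` are common.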
Behind the scenes, OmniReader leverages state-of-the-art vision-language models and ZenML's MLOps framework to create a reproducible, scalable document processing system.
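To give a feel for the ZenML side, here is a minimal sketch of how a batch OCR pipeline can be wired from steps. The step names, parameters, and placeholder logic are illustrative assumptions, not OmniReader's actual implementation.

```python
# Illustrative ZenML pipeline: load a batch of documents and run one OCR
# model over them. Step names and logic are placeholders.
from pathlib import Path

from zenml import pipeline, step


@step
def load_documents(input_dir: str) -> list[str]:
    """Collect the document paths to process."""
    return [str(p) for p in Path(input_dir).glob("*.pdf")]


@step
def run_ocr(documents: list[str], model_name: str) -> dict[str, str]:
    """Extract text from each document (placeholder for a real model call)."""
    return {doc: f"<text extracted by {model_name}>" for doc in documents}


@pipeline
def batch_ocr_pipeline(input_dir: str = "data/docs", model_name: str = "gpt-4o-mini"):
    documents = load_documents(input_dir)
    run_ocr(documents, model_name)


if __name__ == "__main__":
    batch_ocr_pipeline()
```

Because each step's inputs and outputs are tracked as versioned artifacts, reruns with the same configuration are reproducible, and the same pipeline definition can execute locally or on a cloud orchestrator.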
## 📚 Supported Models
OmniReader supports a wide range of OCR models, including:

- **Mistral/pixtral-12b-2409**: Mistral AI's vision-language model specializing in document understanding with strong OCR capabilities for complex layouts.
- **GPT-4o-mini**: OpenAI's efficient vision model offering a good balance of accuracy and speed for general document processing tasks.
- **Gemma3:27b**: Google's open-source multimodal model supporting 140+ languages with a 128K context window, optimized for text extraction from diverse document types.
- **Llava:34b**: Large multilingual vision-language model with strong performance on document understanding tasks requiring contextual interpretation.
- **Llava-phi3**: An efficient multimodal model combining Microsoft's Phi-3 language capabilities with vision understanding, ideal for mixed text-image documents.
- **Granite3.2-vision**: Specialized for visual document understanding, offering excellent performance on tables, charts, and technical diagrams.
> ⚠️ Note: For production deployments, we recommend using the non-GGUF hosted model versions via their respective APIs for better performance and accuracy. The Ollama models mentioned here are primarily for convenience.
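As an illustration of the hosted-API route, a single page image can be sent to GPT-4o-mini for text extraction with the OpenAI Python SDK. This is a standalone sketch with a placeholder file name and prompt, not OmniReader's internal model wrapper.

```python
# Standalone sketch: OCR one image with a hosted vision model via the OpenAI
# API. Requires OPENAI_API_KEY in the environment; the image path is a placeholder.
import base64

from openai import OpenAI

client = OpenAI()

with open("sample_invoice.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract all text from this document, preserving the layout."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```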
## 🚀 Getting Started

- Mistral API key (set as environment variable `MISTRAL_API_KEY`)
- OpenAI API key (set as environment variable `OPENAI_API_KEY`)
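A small pre-flight check (not part of the project's CLI) can confirm these variables are set before launching a run:

```python
# Fail early if the API keys required by the hosted models are missing.
import os
import sys

REQUIRED_KEYS = ["MISTRAL_API_KEY", "OPENAI_API_KEY"]

missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
if missing:
    sys.exit(f"Missing environment variables: {', '.join(missing)}")
print("All required API keys are set.")
```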
<p><em>Side-by-side comparison of OCR results across different models</em></p>
</div>

## ☁️ Cloud Deployment

OmniReader supports storing artifacts remotely and executing pipelines on cloud infrastructure.

### Set Up Cloud Provider Integrations

Install the appropriate ZenML integrations for your preferred cloud provider:

```bash
# For AWS
zenml integration install aws s3

# For Azure
zenml integration install azure

# For Google Cloud
zenml integration install gcp gcs
```
The OCR comparison pipeline consists of the following components:

### Steps

1. **Multi-Model OCR Step**: Processes images with multiple models in parallel
   - Supports any number of models defined in configuration
   - Models run in parallel using a ThreadPoolExecutor
   - Each model processes its assigned images with parallelized execution
   - Progress tracking during batch processing
2. **Ground Truth Step**: Optional step that uses a reference model for evaluation (default: GPT-4o-mini)
3. **Evaluation Step**: Compares results and calculates metrics

The pipeline supports configurable models, allowing you to easily swap out the models used for OCR comparison and ground truth generation via the YAML configuration file. It also supports processing any number of models in parallel for more comprehensive comparisons.
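The parallel fan-out described above follows the standard `concurrent.futures` pattern. The sketch below shows the shape of it, with `run_ocr` standing in for the real model calls; it is not the project's actual step code.

```python
# Simplified sketch of the parallel fan-out: each configured model processes
# the image batch in its own worker thread.
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_ocr(model_name: str, image_paths: list[str]) -> dict[str, str]:
    # Placeholder for the real OCR model invocation.
    return {path: f"<text from {model_name}>" for path in image_paths}


def compare_models(model_names: list[str], image_paths: list[str]) -> dict[str, dict[str, str]]:
    results: dict[str, dict[str, str]] = {}
    with ThreadPoolExecutor(max_workers=len(model_names)) as executor:
        futures = {executor.submit(run_ocr, name, image_paths): name for name in model_names}
        for future in as_completed(futures):
            model_name = futures[future]
            results[model_name] = future.result()
            print(f"{model_name}: processed {len(results[model_name])} images")
    return results
```

Since the model calls are I/O-bound API requests, threads are a good fit, and the worker count can simply match the number of configured models.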
### Configuration Management

The configuration system provides:

- Structured YAML files for experiment parameters
- Parameter validation and intelligent defaults
- Easy sharing and version control of experiment settings
- A configuration generator for quickly creating new experiment setups
- Support for multi-model configuration via arrays
- Flexible model selection and comparison
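For illustration, a multi-model experiment configuration of this kind might be loaded as follows. The field names are hypothetical assumptions, not OmniReader's actual schema.

```python
# Illustrative only: load an experiment configuration that declares the models
# to compare as a YAML array. Field names are hypothetical.
import yaml

EXAMPLE_CONFIG = """
models:
  - name: pixtral-12b-2409
    provider: mistral
  - name: gpt-4o-mini
    provider: openai
ground_truth_model: gpt-4o-mini
"""

config = yaml.safe_load(EXAMPLE_CONFIG)
for model in config["models"]:
    print(f"Will compare {model['name']} ({model['provider']})")
```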
### Metadata Tracking

ZenML's metadata tracking is used throughout the pipeline:

- Processing times and performance metrics
- Extracted text length and entity counts
- Comparison metrics between models (CER, WER)
- Progress tracking for batch operations
- Parallel processing statistics

Run your pipeline in the cloud:

```bash
# Configure your cloud stack
zenml stack register my-cloud-stack -a cloud-artifact-store -o cloud-orchestrator
```

For detailed configuration options and other components, refer to the ZenML documentation: