# OmniReader - Multi-model text extraction comparison
OmniReader is a document processing workflow that ingests unstructured documents (PDFs, images, scans) and extracts text using multiple OCR models. It provides side-by-side comparison of extraction results, highlighting differences in accuracy, formatting, and content recognition. The multi-model approach allows users to evaluate OCR performance across different document types, languages, and formatting complexity. OmniReader delivers reproducible, automated, and cloud-agnostic analysis, with comprehensive metrics on extraction quality, processing time, and confidence scores for each model. It also supports parallel processing for faster batch operations and can compare an arbitrary number of models simultaneously.
## 🚀 Getting Started
### Prerequisites
- Python 3.9+
- Mistral API key (set as environment variable `MISTRAL_API_KEY`)
- ZenML installed and configured
- OpenAI API key (set as environment variable `OPENAI_API_KEY`)
```bash
# Run comparison with all available models in parallel
python run_compare_ocr.py --image assets/your_image.jpg --model all
```
### Using the Streamlit App
The OCR comparison pipeline consists of the following components:
### Steps
1. **Multi-Model OCR Step**: Processes images with multiple models in parallel
   - Supports any number of models defined in configuration
   - Models run in parallel using `ThreadPoolExecutor`
   - Each model processes its assigned images with parallelized execution
   - Progress tracking during batch processing
2. **Ground Truth Step**: Optional step that uses a reference model for evaluation (default: GPT-4o Mini)
3. **Evaluation Step**: Compares results and calculates metrics
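The parallel fan-out described in the Multi-Model OCR Step can be sketched with Python's `ThreadPoolExecutor`. This is a minimal illustration, not the project's actual code: `run_model` is a hypothetical stand-in for a real OCR client call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_model(model_name, image_paths):
    """Hypothetical stand-in for a real OCR client call."""
    return {path: f"[{model_name}] extracted text for {path}" for path in image_paths}


def multi_model_ocr(model_names, image_paths):
    """Fan each configured model out to its own worker thread and collect results."""
    with ThreadPoolExecutor(max_workers=len(model_names)) as pool:
        # Submit one task per model; each processes the full image batch
        futures = {name: pool.submit(run_model, name, image_paths) for name in model_names}
        # Block until every model finishes, keyed by model name
        return {name: future.result() for name, future in futures.items()}
```

Because OCR calls are I/O-bound (network requests to model APIs), threads give real concurrency here despite the GIL.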
The pipeline supports configurable models, allowing you to easily swap out the models used for OCR comparison and ground truth generation via the YAML configuration file. It also supports processing any number of models in parallel for more comprehensive comparisons.
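The YAML file itself is not shown in this README excerpt, so the following is only a hypothetical sketch of what a multi-model array configuration could look like; all key names are assumptions, not the project's actual schema:

```yaml
# Hypothetical sketch only -- key names are assumptions, not the real schema
ocr:
  models:            # array of models to run and compare
    - gemma3
    - pixtral-12b
ground_truth:
  model: gpt-4o-mini # optional reference model for evaluation
```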
### Configuration Management
The new configuration system provides:
- Parameter validation and intelligent defaults
- Easy sharing and version control of experiment settings
- Configuration generator for quickly creating new experiment setups
- Support for multi-model configuration via arrays
- Flexible model selection and comparison
### Metadata Tracking
ZenML's metadata tracking is used throughout the pipeline:
- Processing times and performance metrics
- Extracted text length and entity counts
- Comparison metrics between models (CER, WER)
- Progress tracking for batch operations
- Parallel processing statistics
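CER and WER are standard edit-distance metrics, so the math behind the comparison can be illustrated with a short, self-contained sketch (the project's actual implementation may differ):

```python
def _edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (0 cost on match)
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits per reference character."""
    return _edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: word edits per reference word."""
    ref_words = reference.split()
    return _edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

For example, `cer("hello", "hallo")` is 0.2 (one substitution over five reference characters).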
### Results Visualization
## 🔗 Links
- [ZenML Documentation](https://docs.zenml.io/)
- [Mistral AI Vision Documentation](https://docs.mistral.ai/capabilities/vision/)