
Commit e316cfc

Author: marwan37
Message: update README
1 parent: 9ad7c8a


omni-reader/README.md

Lines changed: 37 additions & 25 deletions
@@ -1,15 +1,15 @@
 # OmniReader - Multi-model text extraction comparison
 
-OmniReader is a document processing workflow that ingests unstructured documents (PDFs, images, scans) and extracts text using multiple OCR models - specifically Gemma 3 and Mistral AI Pixtral12B. It provides side-by-side comparison of extraction results, highlighting differences in accuracy, formatting, and content recognition. This dual-model approach allows users to evaluate OCR performance across different document types, languages, and formatting complexity. OmniReader delivers reproducible, automated, and cloud-agnostic analysis, with comprehensive metrics on extraction quality, processing time, and confidence scores for each model.
+OmniReader is a document processing workflow that ingests unstructured documents (PDFs, images, scans) and extracts text using multiple OCR models. It provides side-by-side comparison of extraction results, highlighting differences in accuracy, formatting, and content recognition. The multi-model approach allows users to evaluate OCR performance across different document types, languages, and formatting complexity. OmniReader delivers reproducible, automated, and cloud-agnostic analysis, with comprehensive metrics on extraction quality, processing time, and confidence scores for each model. It also supports parallel processing for faster batch operations and can compare an arbitrary number of models simultaneously.
 
 ## 🚀 Getting Started
 
 ### Prerequisites
 
 - Python 3.9+
-- [Ollama](https://ollama.ai/) installed and configured for Gemma3
 - Mistral API key (set as environment variable `MISTRAL_API_KEY`)
-- ZenML installed and configured
+- OpenAI API key (set as environment variable `OPENAI_API_KEY`)
+- ZenML >= 0.80.0
 
 ### Installation
 
@@ -23,35 +23,32 @@ pip install -r requirements.txt
 1. Ensure Ollama is running with the required models:
 
 ```bash
-# For using the default Gemma3 model
-ollama pull gemma3:27b
+# For using the default models
+ollama pull llama3.2-vision:11b
+ollama pull gemma3:12b
 
 # If using other Ollama models in your config, pull those as well
 # ollama pull llama3:70b
 # ollama pull dolphin-mixtral:8x7b
 ```
 
-NOTE: When using Ollama models in your configuration, you must ensure they are pulled locally on your system.
-
-2. Set your Mistral API key:
+2. Set the following environment variables:
 
 ```bash
-export MISTRAL_API_KEY=your_mistral_api_key
+MISTRAL_API_KEY=your_mistral_api_key
+OPENAI_API_KEY=your_openai_api_key
 ```
 
 ## 📌 Usage
 
 ### Using YAML Configuration (Recommended)
 
 ```bash
-# Generate a default configuration file
-python run.py --create-default-config my_config.yaml
+# Use the default config (ocr_config.yaml)
+python run.py
 
-# Run with a configuration file
+# Run with a custom config file
 python run.py --config my_config.yaml
-
-# Generate a configuration for a specific experiment
-python config_generator.py --name invoice_test --image-folder ./invoices --ground-truth-source openai
 ```
 
 ### Configuration Structure
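
A note on step 2 above: the updated instructions list the keys as bare `KEY=value` pairs rather than `export` statements, which reads most naturally as a `.env` file. A minimal sketch of how an entrypoint might load and check both keys, assuming (the diff does not confirm this) that `python-dotenv` is available:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is a project dependency

load_dotenv()  # pick up MISTRAL_API_KEY / OPENAI_API_KEY from a local .env file

missing = [key for key in ("MISTRAL_API_KEY", "OPENAI_API_KEY") if not os.getenv(key)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
```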
@@ -66,9 +63,12 @@ input:
 
 # OCR model configuration
 models:
-  custom_prompt: null # Optional custom prompt for both models
-  model1: "ollama/gemma3:27b" # First model for comparison
-  model2: "pixtral-12b-2409" # Second model for comparison
+  custom_prompt: null # Optional custom prompt for all models
+  # Either specify individual models (for backward compatibility)
+  model1: "llama3.2-vision:11b" # First model for comparison
+  model2: "gemma3:12b" # Second model for comparison
+  # Or specify multiple models as a list (new approach)
+  models: ["llama3.2-vision:11b", "gemma3:12b"]
   ground_truth_model: "gpt-4o-mini" # Model to use for ground truth when source is "openai"
 
 # Ground truth configuration
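
The `models` block now accepts either the legacy `model1`/`model2` keys or a `models` list. The repository's loader is not shown in this diff; the sketch below illustrates one way the two forms could be normalized, with `load_model_list` being an illustrative name rather than the project's actual API:

```python
from typing import Any


def load_model_list(models_cfg: dict[str, Any]) -> list[str]:
    """Prefer the new list form; fall back to the legacy model1/model2 keys."""
    if isinstance(models_cfg.get("models"), list):
        return list(models_cfg["models"])
    legacy = (models_cfg.get("model1"), models_cfg.get("model2"))
    return [m for m in legacy if m]


# With the legacy keys from the diff above:
cfg = {"model1": "llama3.2-vision:11b", "model2": "gemma3:12b"}
print(load_model_list(cfg))  # ['llama3.2-vision:11b', 'gemma3:12b']
```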
@@ -114,6 +114,12 @@ python run.py --list-ground-truth-files
 
 # For quicker processing of a single image without metadata or artifact tracking
 python run_compare_ocr.py --image assets/your_image.jpg --model both
+
+# Run comparison with multiple specific models in parallel
+python run_compare_ocr.py --image assets/your_image.jpg --model "gemma3:12b,llama3.2-vision:11b,moondream"
+
+# Run comparison with all available models in parallel
+python run_compare_ocr.py --image assets/your_image.jpg --model all
 ```
 
 ### Using the Streamlit App
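
`--model` now accepts `both`, `all`, or a comma-separated list of model names. The real CLI lives in `run_compare_ocr.py`; the following is a hedged sketch of how such a flag is typically expanded, with the model registry and the `both` semantics as assumptions:

```python
import argparse

# Hypothetical registry; the real set of available models comes from the project config.
AVAILABLE_MODELS = ["llama3.2-vision:11b", "gemma3:12b", "moondream"]


def expand_model_arg(value: str) -> list[str]:
    """Expand --model into a concrete list of model names."""
    if value == "all":
        return list(AVAILABLE_MODELS)
    if value == "both":
        return AVAILABLE_MODELS[:2]  # assumption: 'both' means the two default models
    return [name.strip() for name in value.split(",") if name.strip()]


parser = argparse.ArgumentParser()
parser.add_argument("--image", required=True)
parser.add_argument("--model", default="both")
args = parser.parse_args()
print(expand_model_arg(args.model))
```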
@@ -130,12 +136,15 @@ The OCR comparison pipeline consists of the following components:
 
 ### Steps
 
-1. **Model 1 OCR Step**: Processes images with the first model (default: Ollama Gemma3)
-2. **Model 2 OCR Step**: Processes images with the second model (default: Pixtral 12B)
-3. **Ground Truth Step**: Optional step that uses a reference model for evaluation (default: GPT-4o Mini)
-4. **Evaluation Step**: Compares results and calculates metrics
+1. **Multi-Model OCR Step**: Processes images with multiple models in parallel
+   - Supports any number of models defined in configuration
+   - Models run in parallel using ThreadPoolExecutor
+   - Each model processes its assigned images with parallelized execution
+   - Progress tracking during batch processing
+2. **Ground Truth Step**: Optional step that uses a reference model for evaluation (default: GPT-4o Mini)
+3. **Evaluation Step**: Compares results and calculates metrics
 
-The pipeline now supports configurable models, allowing you to easily swap out the models used for OCR comparison and ground truth generation via the YAML configuration file.
+The pipeline supports configurable models, allowing you to easily swap out the models used for OCR comparison and ground truth generation via the YAML configuration file. It also supports processing any number of models in parallel for more comprehensive comparisons.
 
 ### Configuration Management
 
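The step description names `ThreadPoolExecutor` explicitly. Below is a minimal sketch of the fan-out pattern that implies, with `run_ocr` as a placeholder for the project's per-model OCR call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_ocr(model: str, image_path: str) -> str:
    """Placeholder for the project's per-model OCR call."""
    return f"<text extracted from {image_path} by {model}>"


def run_models_in_parallel(models: list[str], image_path: str) -> dict[str, str]:
    """Fan one image out to every configured model and collect results as they finish."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {pool.submit(run_ocr, m, image_path): m for m in models}
        for done, future in enumerate(as_completed(futures), start=1):
            model = futures[future]
            results[model] = future.result()
            print(f"[{done}/{len(futures)}] {model} finished")  # simple progress tracking
    return results


print(run_models_in_parallel(["gemma3:12b", "llama3.2-vision:11b"], "assets/your_image.jpg"))
```
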
@@ -145,6 +154,8 @@ The new configuration system provides:
 - Parameter validation and intelligent defaults
 - Easy sharing and version control of experiment settings
 - Configuration generator for quickly creating new experiment setups
+- Support for multi-model configuration via arrays
+- Flexible model selection and comparison
 
 ### Metadata Tracking
 
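On "parameter validation and intelligent defaults": a common shape for this is a recursive overlay of user-supplied YAML onto built-in defaults. A sketch under that assumption; the default values here are illustrative, drawn loosely from this README rather than from the repo's code:

```python
# Illustrative defaults only; the project's real defaults may differ.
DEFAULT_CONFIG = {
    "models": {"custom_prompt": None, "models": ["llama3.2-vision:11b", "gemma3:12b"]},
    "ground_truth": {"model": "gpt-4o-mini"},
}


def with_defaults(user_cfg: dict, defaults: dict) -> dict:
    """Recursively overlay user-supplied settings on top of the defaults."""
    merged = dict(defaults)
    for key, value in user_cfg.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])
        else:
            merged[key] = value
    return merged


print(with_defaults({"models": {"custom_prompt": "Transcribe verbatim."}}, DEFAULT_CONFIG))
```
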
@@ -153,6 +164,8 @@ ZenML's metadata tracking is used throughout the pipeline:
 - Processing times and performance metrics
 - Extracted text length and entity counts
 - Comparison metrics between models (CER, WER)
+- Progress tracking for batch operations
+- Parallel processing statistics
 
 ### Results Visualization
 
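CER and WER are both edit-distance ratios: Levenshtein distance computed over characters (CER) or words (WER), divided by the reference length. This diff does not show the repo's implementation; a standard reference sketch:

```python
def levenshtein(a: list, b: list) -> int:
    """Edit distance (insertions, deletions, substitutions) between token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                     # deletion
                curr[j - 1] + 1,                 # insertion
                prev[j - 1] + (tok_a != tok_b),  # substitution
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character Error Rate."""
    return levenshtein(list(hypothesis), list(reference)) / max(len(reference), 1)


def wer(hypothesis: str, reference: str) -> float:
    """Word Error Rate."""
    ref_words = reference.split()
    return levenshtein(hypothesis.split(), ref_words) / max(len(ref_words), 1)


print(wer("the cat sat", "the cat sat down"))  # 0.25
```
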
@@ -178,5 +191,4 @@ omni-reader/
 ## 🔗 Links
 
 - [ZenML Documentation](https://docs.zenml.io/)
-- [Mistral AI Vision Documentation](https://docs.mistral.ai/capabilities/vision/)
-- [Gemma3 Documentation](https://ai.google.dev/gemma/docs/core)
+- [Mistral AI Vision](https://docs.mistral.ai/capabilities/vision/)
