Commit ee4bc8d

Author: marwan37
Commit message: update README.md
1 parent: 8f44467


omni-reader/README.md

Lines changed: 21 additions & 14 deletions
@@ -20,12 +20,19 @@ pip install -r requirements.txt

### Configuration

-1. Ensure Ollama is running with Gemma3 model:
+1. Ensure Ollama is running with the required models:

```bash
+# For using the default Gemma3 model
ollama pull gemma3:27b
+
+# If using other Ollama models in your config, pull those as well
+# ollama pull llama3:70b
+# ollama pull dolphin-mixtral:8x7b
```

+NOTE: When using Ollama models in your configuration, you must ensure they are pulled locally on your system.
+
2. Set your Mistral API key:

```bash
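
To confirm the models referenced in the config are actually present locally before running the pipeline, the stock Ollama CLI has a listing command; a minimal check (`ollama list` is standard Ollama tooling, not something this change adds):

```bash
# Show models already pulled into the local Ollama store;
# the tags referenced in the config (e.g. gemma3:27b) should appear here
ollama list
```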
@@ -36,8 +43,6 @@ export MISTRAL_API_KEY=your_mistral_api_key

### Using YAML Configuration (Recommended)

-OmniReader now supports YAML configuration files for easier management of pipeline parameters:
-
```bash
# Generate a default configuration file
python run.py --create-default-config my_config.yaml
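
The hunk above shows how to generate a default configuration file but not how to feed it back in; a plausible sketch, assuming a hypothetical `--config` flag whose real name is not visible in this diff:

```bash
# Hypothetical flag: load pipeline parameters from the generated YAML file
python run.py --config my_config.yaml
```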
@@ -58,23 +63,25 @@ YAML configuration files organize parameters into logical sections:
input:
  image_paths: [] # List of specific image paths
  image_folder: "./assets" # Folder containing images
-  image_patterns: ["*.jpg", "*.jpeg", "*.png", "*.webp"] # Glob patterns

# OCR model configuration
models:
  custom_prompt: null # Optional custom prompt for both models
+  model1: "ollama/gemma3:27b" # First model for comparison
+  model2: "pixtral-12b-2409" # Second model for comparison
+  ground_truth_model: "gpt-4o-mini" # Model to use for ground truth when source is "openai"

# Ground truth configuration
ground_truth:
-  source: "none" # Source: "openai", "manual", "file", or "none"
+  source: "openai" # Source: "openai", "manual", "file", or "none"
  texts: [] # Ground truth texts (for manual source)
  file: null # Path to ground truth JSON file (for file source)

# Output configuration
output:
  ground_truth:
    save: false # Whether to save ground truth data
-    directory: "ground_truth" # Directory for ground truth data
+    directory: "ocr_results" # Directory for ground truth data

  ocr_results:
    save: false # Whether to save OCR results
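
Assembled into a standalone file, a minimal config drawing only on keys that appear in the hunk above might look like this (a sketch; the values are the defaults shown in the diff, with ground truth saving switched on):

```yaml
input:
  image_folder: "./assets" # Folder containing images

models:
  model1: "ollama/gemma3:27b"
  model2: "pixtral-12b-2409"
  ground_truth_model: "gpt-4o-mini"

ground_truth:
  source: "openai"

output:
  ground_truth:
    save: true
    directory: "ocr_results"
```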
@@ -94,21 +101,18 @@ You can still use command line arguments for quick runs:
python run.py --image-folder assets

# Run with custom prompt
-python run.py --image-folder assets --custom-prompt "Extract all text from this image and identify any named entities."
+python run.py --image-folder assets --custom-prompt "Extract all text from this image."

# Run with specific images
python run.py --image-paths assets/image1.jpg assets/image2.png

# Run with OpenAI ground truth for evaluation
python run.py --image-folder assets --ground-truth openai --save-ground-truth

-# Run with manual ground truth
-python run.py --image-paths assets/image1.jpg assets/image2.jpg --ground-truth manual --ground-truth-texts "Text from image 1" "Text from image 2"
-
# List available ground truth files
python run.py --list-ground-truth-files

-# For quicker non-ZenML processing of a single image
+# For quicker processing of a single image without metadata or artifact tracking
python run_compare_ocr.py --image assets/your_image.jpg --model both
```

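Since `--model both` implies run_compare_ocr.py can also target a single model, a hypothetical single-model invocation might look like the following; the actual values `--model` accepts (beyond "both") are not shown in this diff:

```bash
# Hypothetical: run only one of the two configured models on a single image
python run_compare_ocr.py --image assets/your_image.jpg --model gemma3
```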

@@ -126,9 +130,12 @@ The OCR comparison pipeline consists of the following components:

### Steps

-1. **Gemma3 OCR Step**: Uses Ollama Gemma3 model for OCR
-2. **Mistral OCR Step**: Uses Mistral's Pixtral model for OCR
-3. **Evaluation Step**: Compares results and calculates metrics
+1. **Model 1 OCR Step**: Processes images with the first model (default: Ollama Gemma3)
+2. **Model 2 OCR Step**: Processes images with the second model (default: Pixtral 12B)
+3. **Ground Truth Step**: Optional step that uses a reference model for evaluation (default: GPT-4o Mini)
+4. **Evaluation Step**: Compares results and calculates metrics
+
+The pipeline now supports configurable models, allowing you to easily swap out the models used for OCR comparison and ground truth generation via the YAML configuration file.

### Configuration Management

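As a concrete illustration of the configurable-model swap described in the hunk above, the model keys from the earlier YAML hunk can simply be overridden; the alternative model name below is the one the diff itself suggests pulling:

```yaml
models:
  model1: "ollama/llama3:70b"       # alternative Ollama model mentioned in the diff
  model2: "pixtral-12b-2409"        # second model, unchanged
  ground_truth_model: "gpt-4o-mini" # ground truth model, unchanged
```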
