Commit 51c8eae

author marwan37 committed: update README
1 parent 09e847f commit 51c8eae

1 file changed: omni-reader/README.md (+175 additions, −63 deletions)
@@ -14,6 +14,14 @@ OmniReader is built for teams who routinely work with unstructured documents (e.
 <p><em>HTML visualization showing metrics and comparison results from the OCR pipeline</em></p>
 </div>

+## 🔮 Use Cases
+
+- **Document Processing Automation**: Extract structured data from invoices, receipts, and forms
+- **Content Digitization**: Convert scanned documents and books into searchable digital content
+- **Regulatory Compliance**: Extract and validate information from compliance documents
+- **Data Migration**: Convert legacy paper documents into structured digital formats
+- **Research & Analysis**: Extract data from academic papers, reports, and publications
+
 ## 🌟 Key Features

 - **End-to-end workflow management** from evaluation to production deployment
@@ -44,6 +52,74 @@ OmniReader supports a wide range of OCR models, including:

 > ⚠️ Note: For production deployments, we recommend using the non-GGUF hosted model versions via their respective APIs for better performance and accuracy. The Ollama models mentioned here are primarily for convenience.

+### 🔧 OCR Processor Configuration
+
+OmniReader supports multiple OCR processors to handle different models:
+
+1. **litellm**: For LiteLLM-compatible models, including those from Mistral and other providers.
+
+   - Set the API keys for your providers (e.g., `MISTRAL_API_KEY`)
+   - **Important**: When using `litellm` as the processor, you must specify the `provider` field in your model configuration.
+
+2. **ollama**: For running local models through Ollama.
+
+   - Requires [Ollama](https://ollama.com/) installed and running
+   - Set `OLLAMA_HOST` (defaults to "http://localhost:11434/api/generate")
+   - Local models must be pulled before use with `ollama pull model_name`
+
+3. **openai**: For OpenAI models such as GPT-4o.
+
+   - Set the `OPENAI_API_KEY` environment variable
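To make the `ollama` processor concrete, the request it ultimately issues can be sketched against Ollama's documented `/api/generate` endpoint. This is an illustrative helper, not OmniReader's actual implementation; the function names and prompt are hypothetical:

```python
import base64
import json
import os
import urllib.request


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("utf-8")],
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def ollama_ocr(image_path: str, model: str = "gemma3:27b") -> str:
    """Send one image to a local Ollama model and return the generated text."""
    host = os.getenv("OLLAMA_HOST", "http://localhost:11434/api/generate")
    with open(image_path, "rb") as f:
        payload = build_generate_payload(
            model, "Extract all text from this image.", f.read()
        )
    req = urllib.request.Request(
        host,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ollama_ocr("some_image.png")` assumes Ollama is running locally and the model has already been pulled.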
+
+Example model configurations in `configs/batch_pipeline.yaml`:
+
+```yaml
+models_registry:
+  - name: "gpt-4o-mini"
+    shorthand: "gpt4o"
+    ocr_processor: "openai"
+    # No provider needed for OpenAI
+
+  - name: "gemma3:27b"
+    shorthand: "gemma3"
+    ocr_processor: "ollama"
+    # No provider needed for Ollama
+
+  - name: "mistral/pixtral-12b-2409"
+    shorthand: "pixtral"
+    ocr_processor: "litellm"
+    provider: "mistral" # Provider field required for the litellm processor
+```
+
+To add your own models, extend the `models_registry` with the appropriate processor and provider configurations based on the model source.
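The `provider` rule above is easy to enforce when the registry is loaded. A minimal sketch of such a check (the class and function names here are hypothetical; OmniReader's own validation in `utils/config.py` may differ):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    """One entry of the models_registry shown above."""
    name: str
    shorthand: str
    ocr_processor: str          # "litellm", "ollama", or "openai"
    provider: Optional[str] = None


def validate_registry(registry: list) -> None:
    """Reject entries that use the litellm processor without a provider."""
    for cfg in registry:
        if cfg.ocr_processor == "litellm" and not cfg.provider:
            raise ValueError(
                f"{cfg.name}: 'provider' is required when ocr_processor is 'litellm'"
            )
```

Running this over a parsed YAML registry would fail fast on the misconfiguration the note above warns about, instead of surfacing it mid-pipeline.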
+
+## 🛠️ Project Structure
+
+```
+omni-reader/
+├── app.py                     # Streamlit UI for interactive document processing
+├── assets/                    # Sample images for OCR
+├── configs/                   # YAML configuration files
+├── ground_truth_texts/        # Text files containing ground truth for evaluation
+├── pipelines/                 # ZenML pipeline definitions
+│   ├── batch_pipeline.py      # Batch OCR pipeline (single or multiple models)
+│   └── evaluation_pipeline.py # Evaluation pipeline (multiple models)
+├── steps/                     # Pipeline step implementations
+│   ├── evaluate_models.py     # Model comparison and metrics
+│   ├── loaders.py             # Loading images and ground truth texts
+│   └── run_ocr.py             # Running OCR with selected models
+├── utils/                     # Utility functions and helpers
+│   ├── ocr_processing.py      # Core OCR processing logic
+│   ├── metrics.py             # Metrics for evaluation
+│   ├── visualizations.py      # Visualization utilities for the evaluation pipeline
+│   ├── encode_image.py        # Image encoding utilities for OCR processing
+│   ├── prompt.py              # Prompt template for vision models
+│   ├── config.py              # Utilities for loading and validating configs
+│   └── model_configs.py       # Model configuration and registry
+├── run.py                     # Main entrypoint for running the pipeline
+└── README.md                  # Project documentation
+```
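The tree above lists `utils/metrics.py` for evaluation. As an illustration of the kind of comparison an OCR evaluation performs, a character error rate (CER) is a standard metric: edit distance between prediction and ground truth, normalized by ground-truth length. A minimal sketch, not necessarily the metrics the repository actually implements:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(prediction: str, ground_truth: str) -> float:
    """Edit distance normalized by ground-truth length (lower is better)."""
    if not ground_truth:
        return float(len(prediction) > 0)
    return levenshtein(prediction, ground_truth) / len(ground_truth)
```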
 ## 🚀 Getting Started

 ### Prerequisites
@@ -57,29 +133,13 @@ OmniReader supports a wide range of OCR models, including:

 ### Quick Start

 ```bash
-# Clone the repository
-git clone https://github.com/yourusername/omni-reader.git
-
-# Navigate to OmniReader
-cd omni-reader
-
 # Install dependencies
 pip install -r requirements.txt

 # Start Ollama (if using local models)
 ollama serve
 ```

-### Prepare Your Models
-
-If using local models, ensure any Ollama models you want to use are pulled:
-
-```bash
-ollama pull gemma3:27b
-ollama pull llava-phi3
-ollama pull granite3.2-vision
-```
-
 ### Set Up Your Environment

 Configure your API keys:
@@ -93,11 +153,20 @@ export OLLAMA_HOST=base_url_for_ollama_host # defaults to "http://localhost:1143
 ### Run OmniReader

 ```bash
-# Use the default config (config.yaml)
+# Run the batch pipeline (default)
 python run.py

+# Run the evaluation pipeline
+python run.py --eval
+
 # Run with a custom config file
-python run.py --config my_config.yaml
+python run.py --config my_custom_config.yaml
+
+# Run with custom input
+python run.py --image-folder ./my_images
+
+# List ground truth files
+python run.py --list-ground-truth-files
 ```
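The flags above suggest a CLI along these lines. This is a hypothetical reconstruction of `run.py`'s argument parsing, inferred only from the commands shown (including the default config path); the real entrypoint may differ:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Reconstructed parser for the run.py flags shown above (hypothetical)."""
    parser = argparse.ArgumentParser(description="Run OmniReader pipelines")
    parser.add_argument("--eval", action="store_true",
                        help="run the evaluation pipeline instead of the batch pipeline")
    parser.add_argument("--config", default="configs/batch_pipeline.yaml",
                        help="path to a custom YAML config file")
    parser.add_argument("--image-folder", default=None,
                        help="folder of input images to OCR")
    parser.add_argument("--list-ground-truth-files", action="store_true",
                        help="list available ground truth files and exit")
    return parser
```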
### Interactive UI
@@ -120,66 +189,107 @@ streamlit run app.py

 ## ☁️ Cloud Deployment

-OmniReader supports storing artifacts remotely and executing pipelines on cloud infrastructure:
+OmniReader supports storing artifacts remotely and executing pipelines on cloud infrastructure. This example uses AWS, but any supported cloud provider works.

-### Set Up Cloud Provider Integrations
-
-```bash
-# For AWS
-zenml integration install aws s3
-
-# For Azure
-zenml integration install azure
-
-# For Google Cloud
-zenml integration install gcp gcs
-```
-
-Run your pipeline in the cloud:
-
-```bash
-# Configure your cloud stack
-zenml stack register my-cloud-stack -a cloud-artifact-store -o cloud-orchestrator
-```
-
-For detailed configuration options and other components, refer to the ZenML documentation:
+### AWS Setup
+
+1. **Install the required integrations**:
+
+   ```bash
+   zenml integration install aws s3
+   ```
+
+2. **Set up your AWS credentials**:
+
+   - Create an IAM role with the appropriate permissions (S3, ECR, SageMaker)
+   - Configure your role ARN and region
+
+3. **Register an AWS service connector**:
+
+   ```bash
+   zenml service-connector register aws_connector \
+     --type aws \
+     --auth-method iam-role \
+     --role_arn=<ROLE_ARN> \
+     --region=<YOUR_REGION> \
+     --aws_access_key_id=<YOUR_ACCESS_KEY_ID> \
+     --aws_secret_access_key=<YOUR_SECRET_ACCESS_KEY>
+   ```
+
+4. **Configure stack components**:
+
+   a. **S3 Artifact Store**:
+
+   ```bash
+   zenml artifact-store register s3_artifact_store \
+     -f s3 \
+     --path=s3://<YOUR_BUCKET_NAME> \
+     --connector aws_connector
+   ```
+
+   b. **SageMaker Orchestrator**:
+
+   ```bash
+   zenml orchestrator register sagemaker_orchestrator \
+     --flavor=sagemaker \
+     --region=<YOUR_REGION> \
+     --execution_role=<ROLE_ARN>
+   ```
+
+   c. **ECR Container Registry**:
+
+   ```bash
+   zenml container-registry register ecr_registry \
+     --flavor=aws \
+     --uri=<ACCOUNT_ID>.dkr.ecr.<YOUR_REGION>.amazonaws.com \
+     --connector aws_connector
+   ```
+
+5. **Register and activate your stack**:
+
+   ```bash
+   zenml stack register aws_stack \
+     -a s3_artifact_store \
+     -o sagemaker_orchestrator \
+     -c ecr_registry \
+     --set
+   ```
+
+### Other Cloud Providers
+
+A similar setup process applies to other cloud providers:
+
+- **Azure**: Install the Azure integration (`zenml integration install azure`) and set up Azure Blob Storage, AzureML, and Azure Container Registry
+- **Google Cloud**: Install the GCP integration (`zenml integration install gcp gcs`) and set up GCS, Vertex AI, and GCR
+
+For detailed configuration options for these providers, refer to the ZenML documentation:

 - [AWS Integration Guide](https://docs.zenml.io/how-to/popular-integrations/aws-guide)
 - [GCP Integration Guide](https://docs.zenml.io/how-to/popular-integrations/gcp-guide)
 - [Azure Integration Guide](https://docs.zenml.io/how-to/popular-integrations/azure-guide)

-## 🛠️ Project Structure
-
-```
-omni-reader/
-├── app.py                 # Streamlit UI for interactive document processing
-├── assets/                # Sample images for ocr
-├── configs/               # YAML configuration files
-├── ground_truth_texts/    # Text files containing ground truth for evaluation
-├── pipelines/             # ZenML pipeline definitions
-│   ├── batch_pipeline.py      # Batch OCR pipeline (single or multiple models)
-│   └── evaluation_pipeline.py # Evaluation pipeline (multiple models)
-├── steps/                 # Pipeline step implementations
-│   ├── evaluate_models.py # Model comparison and metrics
-│   ├── loaders.py         # Loading images and ground truth texts
-│   ├── run_ocr.py         # Running OCR with selected models
-│   └── save_results.py    # Saving results and visualizations
-├── utils/                 # Utility functions and helpers
-│   ├── ocr_processing.py  # OCR processing core logic
-│   ├── config.py          # Configuration utilities
-│   └── model_configs.py   # Model configuration and registry
-├── run.py                 # Main entrypoint for running the pipeline
-└── README.md              # Project documentation
-```
-
-## 🔮 Use Cases
-
-- **Document Processing Automation**: Extract structured data from invoices, receipts, and forms
-- **Content Digitization**: Convert scanned documents and books into searchable digital content
-- **Regulatory Compliance**: Extract and validate information from compliance documents
-- **Data Migration**: Convert legacy paper documents into structured digital formats
-- **Research & Analysis**: Extract data from academic papers, reports, and publications
+### 🐳 Docker Settings for Cloud Deployment
+
+For cloud execution, you'll need to configure Docker settings in your pipeline:
+
+```python
+import os
+
+from zenml import pipeline
+from zenml.config import DockerSettings
+
+# Create Docker settings
+docker_settings = DockerSettings(
+    required_integrations=["aws", "s3"],  # Based on your cloud provider
+    requirements="requirements.txt",
+    python_package_installer="uv",  # Optional, defaults to "pip"
+    environment={
+        "OPENAI_API_KEY": os.getenv("OPENAI_API_KEY"),
+        "MISTRAL_API_KEY": os.getenv("MISTRAL_API_KEY"),
+    },
+)
+
+# Use in your pipeline definition
+@pipeline(settings={"docker": docker_settings})
+def batch_ocr_pipeline(...):
+    ...
+```

## 📚 Documentation
@@ -188,4 +298,6 @@ For more information about ZenML and building MLOps pipelines, refer to the [Zen
 For model-specific documentation:

 - [Mistral AI Vision Documentation](https://docs.mistral.ai/capabilities/vision/)
+- [LiteLLM Providers Documentation](https://docs.litellm.ai/docs/providers)
+- [Gemma3 Documentation](https://ai.google.dev/gemma/docs/integrations/ollama)
 - [Ollama Models Library](https://ollama.com/library)