
Commit 962e8f5

Merge branch 'main' into project/adddeployment
2 parents: 1b570d5 + b820cc9

21 files changed (+97 / -52 lines)

omni-reader/Dockerfile.sandbox

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# Sandbox base image
+FROM safoinext/zenml-sandbox:latest
+
+# Install project-specific dependencies
+# Install polars-lts-cpu instead of polars (version compiled for CPU compatibility)
+RUN pip install --no-cache-dir \
+"instructor==1.7.7" \
+"jiwer==3.0.5" \
+"jiter==0.8.2" \
+"importlib-metadata<7.0,>=1.4.0" \
+"litellm==1.64.1" \
+"mistralai==1.0.3" \
+"numpy<2.0,>=1.9.0" \
+"openai==1.69.0" \
+"Pillow==11.1.0" \
+"polars-lts-cpu==1.26.0" \
+"pyarrow>=7.0.0" \
+"python-dotenv==1.0.1" \
+"streamlit==1.44.0" \
+"pydantic>=2.8.2,<2.9.0" \
+"tqdm==4.66.4" \
+"zenml>=0.80.0" \
+uv
+
+# Set workspace directory
+WORKDIR /workspace
+
+# Clone only the omni-reader directory and reorganize
+RUN git clone --depth 1 https://github.com/zenml-io/zenml-projects.git /tmp/zenml-projects && \
+cp -r /tmp/zenml-projects/omni-reader/* /workspace/ && \
+rm -rf /tmp/zenml-projects
+
+# Create a template .env file for API keys
+RUN echo "OPENAI_API_KEY=YOUR_OPENAI_API_KEY_HERE" > .env && \
+echo "MISTRAL_API_KEY=YOUR_MISTRAL_API_KEY_HERE" >> .env
+
+# Create a .vscode directory (mainly to auto-apply the dark theme)
+RUN mkdir -p /workspace/.vscode
+# Copy settings file
+COPY settings.json /workspace/.vscode/settings.json
+
+# Set environment variable to skip CPU checks for Polars as a fallback
+ENV POLARS_SKIP_CPU_CHECK=1
+
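Dockerfile.sandbox writes a template .env whose values are placeholders, so API calls made from the sandbox would likely fail until the keys are filled in. The following is a minimal Python sketch of a pre-flight check, assuming only the two key names and placeholder strings used in the Dockerfile above. `read_env_file` and `unset_api_keys` are hypothetical helpers, not part of omni-reader, and the parser handles only simple `KEY=VALUE` lines (python-dotenv covers the full .env syntax).

```python
from pathlib import Path

# Placeholder values written by Dockerfile.sandbox into the template .env file.
PLACEHOLDERS = {
    "OPENAI_API_KEY": "YOUR_OPENAI_API_KEY_HERE",
    "MISTRAL_API_KEY": "YOUR_MISTRAL_API_KEY_HERE",
}


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical, deliberately minimal parser: no quoting, escapes, or
    `export` prefixes are supported.
    """
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unset_api_keys(path: Path) -> list[str]:
    """Return the API keys that are missing, empty, or still hold their placeholder."""
    values = read_env_file(path)
    return [
        key
        for key, placeholder in PLACEHOLDERS.items()
        if values.get(key, "") in ("", placeholder)
    ]
```

Because the Dockerfile sets `WORKDIR /workspace` before writing the template, calling `unset_api_keys(Path("/workspace/.env"))` in a freshly built container would report both keys until the file is edited.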

omni-reader/README.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 A scalable multi-model text extraction solution for unstructured documents.
 
 <div align="center">
-<img src="assets/docs/pipeline_dags.png" alt="Pipeline DAG" width="600" />
+<img src="assets/docs/pipeline_dags.png" alt="Pipeline DAG" width="800" />
 </div>
 
 **Extract Structured Text from Any Document**

omni-reader/assets/samples_for_ocr/easy_example.jpeg renamed to omni-reader/assets/samples_for_ocr/handwritten/easy_example.jpeg

File renamed without changes.

omni-reader/assets/samples_for_ocr/education_article_excerpt.webp renamed to omni-reader/assets/samples_for_ocr/handwritten/education_article_excerpt.webp

File renamed without changes.

omni-reader/assets/samples_for_ocr/incomplete_sentence.png renamed to omni-reader/assets/samples_for_ocr/handwritten/incomplete_sentence.png

File renamed without changes.

omni-reader/assets/samples_for_ocr/reporter_notes.png renamed to omni-reader/assets/samples_for_ocr/handwritten/reporter_notes.png

File renamed without changes.

omni-reader/assets/samples_for_ocr/lexus_vin_number.webp renamed to omni-reader/assets/samples_for_ocr/numbers/lexus_vin_number.webp

File renamed without changes.

omni-reader/assets/samples_for_ocr/tire_serial_number.jpg renamed to omni-reader/assets/samples_for_ocr/numbers/tire_serial_number.jpg

File renamed without changes.

omni-reader/assets/samples_for_ocr/rx_prescription_clear.jpg renamed to omni-reader/assets/samples_for_ocr/rx_prescriptions/rx_prescription_clear.jpg

File renamed without changes.
