Add Omni Reader Project #187
Conversation
strickvl
left a comment
A few quick general observations:
- you have a poetry.lock, a requirements.txt and a pyproject.toml, all seemingly with different dependencies... I think you need to pick one format and coalesce on that :) I think requirements.txt is generally what we've done in projects prior to now so I'd suggest you keep that.
- seems like maybe too much logic is contained within the pipeline itself that normally we'd see happen inside the step. The loop over the images is a pattern we'd normally see inside a step. We'd also often have a loader step which would load the images (even if it was just returning their paths vs the actual Image object) just for purposes of logging the filenames etc.
- where are you running these pipelines? I can't see them on the internal or the demo tenant?
- if you put the looping logic inside the step, it'd be nice then to have the average processing times etc (i.e. across multiple images) logged as metadata which we can then compare in the dashboard
- I think we do have room for at least one Ollama project in our projects, but I wonder a bit whether it's this one. Maybe it depends a bit on whether we get standout results here or not.
- Feels maybe a bit unfair to compare gemma3 vs pixtral. They're almost in different classes (see https://huggingface.co/spaces/opencompass/open_vlm_leaderboard, one of the popular leaderboards, to get a sense of this). Wondering whether maybe to switch out pixtral for either mistral-small 3.1, or one of the Qwen vision models of a similar size.
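To sketch the step structure I mean (names like `load_images` and `run_ocr` and the metrics shape are placeholders, not your actual code, and in the real project these would carry ZenML `@step` decorators):

```python
import time
from pathlib import Path
from statistics import mean

# Hypothetical loader step: returns paths rather than actual Image objects,
# just so the filenames get logged and tracked as an artifact.
def load_images(image_dir: str) -> list[str]:
    paths = sorted(str(p) for p in Path(image_dir).glob("*.png"))
    print(f"Loaded {len(paths)} images: {[Path(p).name for p in paths]}")
    return paths

# Hypothetical OCR step: the loop over images lives INSIDE the step, and the
# average per-image processing time is computed so it can be logged as step
# metadata and compared across runs in the dashboard.
def run_ocr(image_paths: list[str], model: str) -> dict:
    results, timings = {}, []
    for path in image_paths:
        start = time.perf_counter()
        results[path] = f"<text extracted by {model}>"  # placeholder for the real OCR call
        timings.append(time.perf_counter() - start)
    avg = mean(timings) if timings else 0.0
    # In a real ZenML step you'd log this as metadata, e.g.:
    # log_metadata({"avg_processing_time_s": avg})
    return {"model": model, "results": results, "avg_processing_time_s": avg}
```

Not suggesting this is exactly how you wire it up, just that the loop and the timing aggregation sit inside the step so the metadata ends up on the step's outputs.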
Otherwise I think the README can be a bit nicer probably, including some image of the streamlit app, perhaps.
Also def return and log the results of the evaluate_models step as dict etc, but hoping you can add in an HTML visualization (i.e. return an HTMLString (see ZenML docs or other projects for how to do this)) for the data. Maybe with one sample result included inside or something as well so that you have an actual report at the end?
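Something along these lines for the report (the metrics dict shape and `build_eval_report` name are just an assumption about your data; in the step you'd wrap the returned string in ZenML's `HTMLString` so the dashboard renders it):

```python
# Sketch of the kind of HTML report the evaluate_models step could return,
# with one sample result embedded so there's an actual report at the end.
def build_eval_report(metrics: dict[str, dict], sample: dict) -> str:
    rows = "".join(
        f"<tr><td>{model}</td><td>{m['accuracy']:.2%}</td>"
        f"<td>{m['avg_time_s']:.2f}s</td></tr>"
        for model, m in metrics.items()
    )
    return f"""
    <h2>OCR Model Comparison</h2>
    <table border="1">
      <tr><th>Model</th><th>Accuracy</th><th>Avg time / image</th></tr>
      {rows}
    </table>
    <h3>Sample result ({sample['image']})</h3>
    <pre>{sample['text']}</pre>
    """
```

Then the step returns both the raw dict (for downstream use) and the `HTMLString` (for the visualization).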
I was thinking about ways to improve the evals part of the pipeline and collaboratively came up with this document: https://gist.github.com/strickvl/c165a3d73310f7d91e75cde670aa428d The code suggestions may or may not be actually how you implement things, but I think some of the ideas are maybe worth exploring, esp the quick wins. You should also def do the
htahir1
left a comment
strickvl
left a comment
Basically you're almost there. Just some small changes and this is ready to publish.
OmniReader is built for teams who routinely work with unstructured documents (e.g., PDFs, images, scanned forms) and want a scalable workflow for structured text extraction. It provides an end-to-end batch OCR pipeline with optional multi-model comparison to help ML engineers evaluate different OCR solutions before deployment.