Conversation

@marwan37
Contributor

@marwan37 marwan37 commented Mar 28, 2025

OmniReader is built for teams who routinely work with unstructured documents (e.g., PDFs, images, scanned forms) and want a scalable workflow for structured text extraction. It provides an end-to-end batch OCR pipeline with optional multi-model comparison to help ML engineers evaluate different OCR solutions before deployment.

@marwan37 marwan37 added the "enhancement" (New feature or request) and "wip" (work in progress, don't merge) labels Mar 28, 2025
@dagshub
dagshub bot commented Mar 28, 2025

Contributor

@strickvl strickvl left a comment

A few quick general observations:

  • You have a poetry.lock, a requirements.txt and a pyproject.toml, all seemingly with different dependencies. I think you need to pick one format and consolidate on that :) requirements.txt is generally what we've used in projects so far, so I'd suggest you keep that.
  • It seems like too much logic lives in the pipeline itself that we'd normally see inside a step. The loop over the images is a pattern we'd normally put inside a step. We'd also often have a loader step that loads the images (even if it just returns their paths rather than the actual Image objects), if only to log the filenames etc.
  • where are you running these pipelines? I can't see them on the internal or the demo tenant?
  • If you put the looping logic inside the step, it'd be nice to also log the average processing times etc. (i.e. across multiple images) as metadata, which we can then compare in the dashboard.
  • I think we do have room for at least one Ollama project in our projects, but I wonder a bit whether it's this one. Maybe it depends a bit on whether we get standout results here or not.
  • It feels maybe a bit unfair to compare gemma3 vs pixtral; they're almost in different classes (see https://huggingface.co/spaces/opencompass/open_vlm_leaderboard, one of the popular leaderboards, to get a sense of this). I wonder whether to swap out pixtral for either mistral-small 3.1 or one of the Qwen vision models of a similar size.
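
To make the step-structure points above concrete, here is a rough sketch of what I have in mind. It is plain Python so it runs on its own; in the project each function would be a ZenML step, and the metadata dict would go through ZenML's metadata logging. The names load_images, run_ocr_batch and ocr_fn are placeholders I made up, not anything from this PR.

```python
import time
from pathlib import Path
from statistics import mean
from typing import Callable, Dict, List


def load_images(image_dir: str) -> List[str]:
    """Loader step (hypothetical): return image paths, not Image objects,
    so the filenames are visible as a step output."""
    exts = {".png", ".jpg", ".jpeg"}
    return sorted(
        str(p) for p in Path(image_dir).iterdir() if p.suffix.lower() in exts
    )


def run_ocr_batch(
    image_paths: List[str], ocr_fn: Callable[[str], str]
) -> Dict[str, object]:
    """OCR step (hypothetical): the loop over images lives here,
    not in the pipeline function."""
    results: Dict[str, str] = {}
    timings: List[float] = []
    for path in image_paths:
        start = time.perf_counter()
        results[path] = ocr_fn(path)  # ocr_fn stands in for the model call
        timings.append(time.perf_counter() - start)

    # Aggregate stats that could be logged as step metadata and
    # compared across runs in the dashboard.
    metadata = {
        "num_images": len(image_paths),
        "avg_processing_time_s": mean(timings) if timings else 0.0,
    }
    return {"results": results, "metadata": metadata}
```

The pipeline body then shrinks to two calls: load the paths, then hand them to the batch step once per model.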

Otherwise I think the README could be a bit nicer, perhaps including an image of the Streamlit app.

Also definitely return and log the results of the evaluate_models step as a dict etc., but I'm hoping you can add an HTML visualization for the data, i.e. return an HTMLString (see the ZenML docs or other projects for how to do this). Maybe include one sample result in it as well, so that you have an actual report at the end?
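
As a rough idea of the report, something like the sketch below would do. It only builds the HTML with the standard library; in the step you would wrap the returned string in ZenML's HTMLString before returning it. The function name, the metric name and the data shapes are assumptions on my side, not what evaluate_models currently returns.

```python
from html import escape
from typing import Dict


def build_report_html(
    metrics: Dict[str, Dict[str, float]], sample: Dict[str, str]
) -> str:
    """Render per-model metrics plus one sample OCR output as HTML.

    metrics: {model_name: {metric_name: value}}  (assumed shape)
    sample:  {model_name: extracted_text}        (assumed shape)
    """
    header = "<tr><th>Model</th><th>Metric</th><th>Value</th></tr>"
    rows = "".join(
        f"<tr><td>{escape(model)}</td><td>{escape(name)}</td>"
        f"<td>{value:.3f}</td></tr>"
        for model, scores in metrics.items()
        for name, value in scores.items()
    )
    # Escape the OCR text so stray markup in a document cannot break the page.
    samples = "".join(
        f"<h3>{escape(model)}</h3><pre>{escape(text)}</pre>"
        for model, text in sample.items()
    )
    return (
        "<html><body><h2>OCR evaluation</h2>"
        f"<table>{header}{rows}</table>"
        f"<h2>Sample result</h2>{samples}</body></html>"
    )
```

Returning the dict as one output and the HTML as a second output keeps the raw numbers queryable while still giving you a readable report.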

@strickvl
Contributor

I was thinking about ways to improve the evals part of the pipeline and collaboratively came up with this document: https://gist.github.com/strickvl/c165a3d73310f7d91e75cde670aa428d

The code suggestions may or may not be actually how you implement things, but I think some of the ideas are maybe worth exploring, esp the quick wins.

You should also definitely do the HTMLString visualization I mentioned above, as it'll let you lay out the results really nicely.

Contributor

@htahir1 htahir1 left a comment

I like it! It's getting better. I think it could use some materialisers and custom visualizations. Read docs here, here and here to get an idea.

Contributor

@strickvl strickvl left a comment

Basically you're almost there. Just some small changes and this is ready to publish.

@marwan37 marwan37 requested review from htahir1 and strickvl April 8, 2025 13:16
@marwan37 marwan37 merged commit f9b2b6a into main Apr 9, 2025
3 of 4 checks passed

Labels

enhancement New feature or request internal

4 participants