- We use the VLM model opendatalab/MinerU2.0-2505-0.9B to achieve end-to-end output. Thanks to its optimized model architecture and reduced parameter count, it offers advantages in resource utilization and parsing speed.
-
This model (Nanonets-OCR-s) was just released and has already reached 201,530 downloads in a few days. It requires only 8 GB of VRAM to run and is reported to perform very well.
Nanonets-OCR-s: https://huggingface.co/nanonets/Nanonets-OCR-s/tree/main
In addition, there are other impressive models, such as Ovis2-4B and InternVL3-8B. Rankings and scores can be found here:
https://rank.opencompass.org.cn/leaderboard-multimodal/?m=REALTIME