- We use the VLM model opendatalab/MinerU2.0-2505-0.9B to achieve end-to-end output. Thanks to its optimized model architecture and reduced parameter count, it offers advantages in resource utilization and parsing speed.
-
This model (Nanonets-OCR-s) was just released and has already reached 201,530 downloads in a few days. It requires only 8 GB of VRAM to run and is reported to perform very well.
Nanonets-OCR-s: https://huggingface.co/nanonets/Nanonets-OCR-s/tree/main
In addition, there are other impressive models, such as Ovis2-4B and InternVL3-8B. Rankings and scores can be found here:
https://rank.opencompass.org.cn/leaderboard-multimodal/?m=REALTIME